Training: 2022-01-07 17:13:51,071-rank_id: 0
Training: 2022-01-07 17:14:17,828-: loss                     cosface
Training: 2022-01-07 17:14:17,829-: network                  r100
Training: 2022-01-07 17:14:17,829-: resume                   False
Training: 2022-01-07 18:37:51,584-: output                   work_dirs/webface42m_r100_lr01_pfc02_16gpus
Training: 2022-01-07 18:37:51,584-: embedding_size           512
Training: 2022-01-07 18:37:51,584-: sample_rate              0.2
Training: 2022-01-07 18:37:51,584-: fp16                     True
Training: 2022-01-07 18:37:51,584-: momentum                 0.9
Training: 2022-01-07 18:37:51,584-: weight_decay             0.0005
Training: 2022-01-07 18:37:51,584-: batch_size               256
Training: 2022-01-07 18:37:51,584-: lr                       0.3
Training: 2022-01-07 18:37:51,584-: dali                     True
Training: 2022-01-07 18:37:51,584-: verbose                  2000
Training: 2022-01-07 18:37:51,584-: frequent                 10
Training: 2022-01-07 18:37:51,584-: if_hard_scale            False
Training: 2022-01-07 18:37:51,585-: score                    None
Training: 2022-01-07 18:37:51,585-: rec                      /train_tmp/WebFace42M
Training: 2022-01-07 18:37:51,585-: num_classes              2059906
Training: 2022-01-07 18:37:51,585-: num_image                42474557
Training: 2022-01-07 18:37:51,585-: num_epoch                20
Training: 2022-01-07 18:37:51,585-: warmup_epoch             1
Training: 2022-01-07 18:37:51,585-: val_targets              ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-01-07 18:37:51,585-: warmup_step              10369
Training: 2022-01-07 18:37:51,585-: total_step               207380
Training: 2022-01-07 18:38:05,243-Speed 5498.09 samples/sec   Loss 39.6443   LearningRate 0.0159   Epoch: 0   Global Step: 550   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:38:12,670-Speed 5516.75 samples/sec   Loss 39.6036   LearningRate 0.0162   Epoch: 0   Global Step: 560   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:38:20,106-Speed 5509.71 samples/sec   Loss 39.5587   LearningRate 0.0165   Epoch: 0   Global Step: 570   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:38:27,549-Speed 5504.42 samples/sec   Loss 39.5510   LearningRate 0.0168   Epoch: 0   Global Step: 580   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:38:35,042-Speed 5468.04 samples/sec   Loss 39.5291   LearningRate 0.0171   Epoch: 0   Global Step: 590   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:38:42,644-Speed 5388.85 samples/sec   Loss 39.4717   LearningRate 0.0174   Epoch: 0   Global Step: 600   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:38:50,096-Speed 5497.64 samples/sec   Loss 39.4198   LearningRate 0.0176   Epoch: 0   Global Step: 610   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:38:57,510-Speed 5526.22 samples/sec   Loss 39.4227   LearningRate 0.0179   Epoch: 0   Global Step: 620   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:39:04,968-Speed 5493.36 samples/sec   Loss 39.3887   LearningRate 0.0182   Epoch: 0   Global Step: 630   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:39:12,403-Speed 5510.44 samples/sec   Loss 39.3821   LearningRate 0.0185   Epoch: 0   Global Step: 640   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:39:19,944-Speed 5432.65 samples/sec   Loss 39.3465   LearningRate 0.0188   Epoch: 0   Global Step: 650   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:39:27,369-Speed 5517.94 samples/sec   Loss 39.3211   LearningRate 0.0191   Epoch: 0   Global Step: 660   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:39:34,955-Speed 5400.76 samples/sec   Loss 39.3001   LearningRate 0.0194   Epoch: 0   Global Step: 670   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:39:42,357-Speed 5535.23 samples/sec   Loss 39.2760   LearningRate 0.0197   Epoch: 0   Global Step: 680   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:39:49,724-Speed 5560.82 samples/sec   Loss 39.2780   LearningRate 0.0200   Epoch: 0   Global Step: 690   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:39:51,469-Speed 5584.63 samples/sec   Loss 42.4764   LearningRate 0.0023   Epoch: 0   Global Step: 80   Fp16 Grad Scale: 8192   Required: 44 hours
Training: 2022-01-07 18:39:57,092-Speed 5560.67 samples/sec   Loss 39.2595   LearningRate 0.0203   Epoch: 0   Global Step: 700   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:40:04,755-Speed 5350.07 samples/sec   Loss 39.2531   LearningRate 0.0205   Epoch: 0   Global Step: 710   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:40:12,366-Speed 5383.32 samples/sec   Loss 39.2174   LearningRate 0.0208   Epoch: 0   Global Step: 720   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:40:19,696-Speed 5589.52 samples/sec   Loss 39.2185   LearningRate 0.0211   Epoch: 0   Global Step: 730   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:40:20,771-Speed 5479.16 samples/sec   Loss 42.4339   LearningRate 0.0035   Epoch: 0   Global Step: 120   Fp16 Grad Scale: 16384   Required: 43 hours
Training: 2022-01-07 18:40:27,265-Speed 5411.99 samples/sec   Loss 39.1960   LearningRate 0.0214   Epoch: 0   Global Step: 740   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:40:28,143-Speed 5557.25 samples/sec   Loss 42.4347   LearningRate 0.0038   Epoch: 0   Global Step: 130   Fp16 Grad Scale: 16384   Required: 43 hours
Training: 2022-01-07 18:40:35,382-Speed 5661.93 samples/sec   Loss 42.4128   LearningRate 0.0041   Epoch: 0   Global Step: 140   Fp16 Grad Scale: 16384   Required: 43 hours
Training: 2022-01-07 18:40:42,621-Speed 5658.90 samples/sec   Loss 42.3886   LearningRate 0.0043   Epoch: 0   Global Step: 150   Fp16 Grad Scale: 16384   Required: 43 hours
Training: 2022-01-07 18:40:50,093-Speed 5483.51 samples/sec   Loss 42.3571   LearningRate 0.0046   Epoch: 0   Global Step: 160   Fp16 Grad Scale: 16384   Required: 43 hours
Training: 2022-01-07 18:40:56,844-Speed 5497.71 samples/sec   Loss 39.2052   LearningRate 0.0226   Epoch: 0   Global Step: 780   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:41:04,119-Speed 5633.38 samples/sec   Loss 39.1676   LearningRate 0.0229   Epoch: 0   Global Step: 790   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:41:11,600-Speed 5476.66 samples/sec   Loss 39.1966   LearningRate 0.0231   Epoch: 0   Global Step: 800   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:41:19,205-Speed 5387.43 samples/sec   Loss 39.2049   LearningRate 0.0234   Epoch: 0   Global Step: 810   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:41:26,630-Speed 5517.57 samples/sec   Loss 39.1738   LearningRate 0.0237   Epoch: 0   Global Step: 820   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:41:34,039-Speed 5530.22 samples/sec   Loss 39.1547   LearningRate 0.0240   Epoch: 0   Global Step: 830   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:41:41,538-Speed 5463.09 samples/sec   Loss 39.2001   LearningRate 0.0243   Epoch: 0   Global Step: 840   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:41:49,165-Speed 5372.02 samples/sec   Loss 39.1941   LearningRate 0.0246   Epoch: 0   Global Step: 850   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:41:56,819-Speed 5351.96 samples/sec   Loss 39.1715   LearningRate 0.0249   Epoch: 0   Global Step: 860   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:42:04,290-Speed 5484.49 samples/sec   Loss 39.1698   LearningRate 0.0252   Epoch: 0   Global Step: 870   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:42:11,790-Speed 5462.60 samples/sec   Loss 39.1889   LearningRate 0.0255   Epoch: 0   Global Step: 880   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:42:19,050-Speed 5643.81 samples/sec   Loss 39.1620   LearningRate 0.0257   Epoch: 0   Global Step: 890   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:42:26,244-Speed 5695.20 samples/sec   Loss 39.1756   LearningRate 0.0260   Epoch: 0   Global Step: 900   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:42:33,393-Speed 5731.05 samples/sec   Loss 39.1696   LearningRate 0.0263   Epoch: 0   Global Step: 910   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:42:40,833-Speed 5506.69 samples/sec   Loss 39.1989   LearningRate 0.0266   Epoch: 0   Global Step: 920   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:42:48,272-Speed 5507.02 samples/sec   Loss 39.1680   LearningRate 0.0269   Epoch: 0   Global Step: 930   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:42:55,811-Speed 5434.35 samples/sec   Loss 39.1873   LearningRate 0.0272   Epoch: 0   Global Step: 940   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:43:03,286-Speed 5480.87 samples/sec   Loss 39.1872   LearningRate 0.0275   Epoch: 0   Global Step: 950   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:43:10,743-Speed 5493.97 samples/sec   Loss 39.1804   LearningRate 0.0278   Epoch: 0   Global Step: 960   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:43:18,192-Speed 5499.56 samples/sec   Loss 39.1828   LearningRate 0.0281   Epoch: 0   Global Step: 970   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:43:24,241-Speed 5636.82 samples/sec   Loss 40.6593   LearningRate 0.0107   Epoch: 0   Global Step: 370   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:43:25,627-Speed 5509.75 samples/sec   Loss 39.2133   LearningRate 0.0284   Epoch: 0   Global Step: 980   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:43:33,069-Speed 5506.43 samples/sec   Loss 39.2146   LearningRate 0.0286   Epoch: 0   Global Step: 990   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:43:40,603-Speed 5437.84 samples/sec   Loss 39.2198   LearningRate 0.0289   Epoch: 0   Global Step: 1000   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:43:48,022-Speed 5521.40 samples/sec   Loss 39.2090   LearningRate 0.0292   Epoch: 0   Global Step: 1010   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:43:53,343-Speed 5667.45 samples/sec   Loss 40.3711   LearningRate 0.0119   Epoch: 0   Global Step: 410   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-01-07 18:43:55,480-Speed 5492.99 samples/sec   Loss 39.2389   LearningRate 0.0295   Epoch: 0   Global Step: 1020   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:44:02,954-Speed 5483.64 samples/sec   Loss 39.2298   LearningRate 0.0298   Epoch: 0   Global Step: 1030   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:44:10,375-Speed 5519.81 samples/sec   Loss 39.2127   LearningRate 0.0301   Epoch: 0   Global Step: 1040   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:44:17,820-Speed 5502.64 samples/sec   Loss 39.2076   LearningRate 0.0304   Epoch: 0   Global Step: 1050   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:44:22,696-Speed 5618.41 samples/sec   Loss 40.0572   LearningRate 0.0130   Epoch: 0   Global Step: 450   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-01-07 18:44:25,302-Speed 5475.10 samples/sec   Loss 39.2280   LearningRate 0.0307   Epoch: 0   Global Step: 1060   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:44:32,915-Speed 5382.85 samples/sec   Loss 39.2349   LearningRate 0.0310   Epoch: 0   Global Step: 1070   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:44:40,527-Speed 5381.43 samples/sec   Loss 39.2379   LearningRate 0.0312   Epoch: 0   Global Step: 1080   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:44:47,933-Speed 5532.06 samples/sec   Loss 39.2408   LearningRate 0.0315   Epoch: 0   Global Step: 1090   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:44:52,197-Speed 5600.57 samples/sec   Loss 39.8258   LearningRate 0.0142   Epoch: 0   Global Step: 490   Fp16 Grad Scale: 32768   Required: 42 hours
Training: 2022-01-07 18:44:55,373-Speed 5505.53 samples/sec   Loss 39.2582   LearningRate 0.0318   Epoch: 0   Global Step: 1100   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:45:02,819-Speed 5504.47 samples/sec   Loss 39.2440   LearningRate 0.0321   Epoch: 0   Global Step: 1110   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:45:10,302-Speed 5474.30 samples/sec   Loss 39.2359   LearningRate 0.0324   Epoch: 0   Global Step: 1120   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:45:17,720-Speed 5523.38 samples/sec   Loss 39.2752   LearningRate 0.0327   Epoch: 0   Global Step: 1130   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:45:25,146-Speed 5516.10 samples/sec   Loss 39.2780   LearningRate 0.0330   Epoch: 0   Global Step: 1140   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:45:32,557-Speed 5527.83 samples/sec   Loss 39.2744   LearningRate 0.0333   Epoch: 0   Global Step: 1150   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:45:39,978-Speed 5520.37 samples/sec   Loss 39.2987   LearningRate 0.0336   Epoch: 0   Global Step: 1160   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:45:47,399-Speed 5520.66 samples/sec   Loss 39.2949   LearningRate 0.0339   Epoch: 0   Global Step: 1170   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:45:54,826-Speed 5515.07 samples/sec   Loss 39.2948   LearningRate 0.0341   Epoch: 0   Global Step: 1180   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:46:02,301-Speed 5481.11 samples/sec   Loss 39.2977   LearningRate 0.0344   Epoch: 0   Global Step: 1190   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:46:09,791-Speed 5469.36 samples/sec   Loss 39.3209   LearningRate 0.0347   Epoch: 0   Global Step: 1200   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:46:17,235-Speed 5502.73 samples/sec   Loss 39.3473   LearningRate 0.0350   Epoch: 0   Global Step: 1210   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:46:24,697-Speed 5489.61 samples/sec   Loss 39.3534   LearningRate 0.0353   Epoch: 0   Global Step: 1220   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:46:32,130-Speed 5512.19 samples/sec   Loss 39.3334   LearningRate 0.0356   Epoch: 0   Global Step: 1230   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:46:39,586-Speed 5494.28 samples/sec   Loss 39.3281   LearningRate 0.0359   Epoch: 0   Global Step: 1240   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:46:47,037-Speed 5497.47 samples/sec   Loss 39.3505   LearningRate 0.0362   Epoch: 0   Global Step: 1250   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:46:54,502-Speed 5487.73 samples/sec   Loss 39.3721   LearningRate 0.0365   Epoch: 0   Global Step: 1260   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:47:01,957-Speed 5495.09 samples/sec   Loss 39.3805   LearningRate 0.0367   Epoch: 0   Global Step: 1270   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:47:06,580-Speed 5312.02 samples/sec   Loss 39.2506   LearningRate 0.0194   Epoch: 0   Global Step: 670   Fp16 Grad Scale: 16384   Required: 43 hours
Training: 2022-01-07 18:47:14,394-Speed 5246.81 samples/sec   Loss 39.2250   LearningRate 0.0197   Epoch: 0   Global Step: 680   Fp16 Grad Scale: 16384   Required: 43 hours
Training: 2022-01-07 18:47:22,178-Speed 5263.14 samples/sec   Loss 39.2187   LearningRate 0.0200   Epoch: 0   Global Step: 690   Fp16 Grad Scale: 16384   Required: 43 hours
Training: 2022-01-07 18:47:29,897-Speed 5307.10 samples/sec   Loss 39.2036   LearningRate 0.0203   Epoch: 0   Global Step: 700   Fp16 Grad Scale: 16384   Required: 43 hours
Training: 2022-01-07 18:47:37,637-Speed 5292.97 samples/sec   Loss 39.1947   LearningRate 0.0205   Epoch: 0   Global Step: 710   Fp16 Grad Scale: 16384   Required: 43 hours
Training: 2022-01-07 18:47:45,365-Speed 5304.02 samples/sec   Loss 39.1989   LearningRate 0.0208   Epoch: 0   Global Step: 720   Fp16 Grad Scale: 16384   Required: 43 hours
Training: 2022-01-07 18:47:53,036-Speed 5340.50 samples/sec   Loss 39.1970   LearningRate 0.0211   Epoch: 0   Global Step: 730   Fp16 Grad Scale: 16384   Required: 43 hours
Training: 2022-01-07 18:48:00,835-Speed 5253.25 samples/sec   Loss 39.2106   LearningRate 0.0214   Epoch: 0   Global Step: 740   Fp16 Grad Scale: 16384   Required: 43 hours
Training: 2022-01-07 18:48:08,541-Speed 5316.07 samples/sec   Loss 39.1846   LearningRate 0.0217   Epoch: 0   Global Step: 750   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:48:16,117-Speed 5407.92 samples/sec   Loss 39.1867   LearningRate 0.0220   Epoch: 0   Global Step: 760   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:48:23,462-Speed 5577.85 samples/sec   Loss 39.1850   LearningRate 0.0223   Epoch: 0   Global Step: 770   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:48:31,006-Speed 5430.42 samples/sec   Loss 39.2002   LearningRate 0.0226   Epoch:2   Epoch: 0   Global Step: 1390   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:48:39,314-Speed 5483.07 samples/sec   Loss 39.4691   LearningRate 0.0405   Epoch: 0   Global Step: 1400   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:48:46,840-Speed 5443.01 samples/sec   Loss 39.4508   LearningRate 0.0408   Epoch: 0   Global Step: 1410   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:48:54,382-Speed 5431.75 samples/sec   Loss 39.4505   LearningRate 0.0411   Epoch: 0   Global Step: 1420   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:49:01,895-Speed 5453.39 samples/sec   Loss 39.4577   LearningRate 0.0414   Epoch: 0   Global Step: 1430   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:49:09,515-Speed 5375.74 samples/sec   Loss 39.4651   LearningRate 0.0417   Epoch: 0   Global Step: 1440   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:49:16,965-Speed 5498.71 samples/sec   Loss 39.4912   LearningRate 0.0420   Epoch: 0   Global Step: 1450   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:49:24,494-Speed 5441.04 samples/sec   Loss 39.4540   LearningRate 0.0422   Epoch: 0   Global Step: 1460   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:49:31,997-Speed 5459.90 samples/sec   Loss 39.4867   LearningRate 0.0425   Epoch: 0   Global Step: 1470   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:49:39,539-Speed 5431.80 samples/sec   Loss 39.4755   LearningRate 0.0428   Epoch: 0   Global Step: 1480   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:49:45,580-Speed 5559.86 samples/sec   Loss 39.2483   LearningRate 0.0255   Epoch: 0   Global Step: 880   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:49:53,074-Speed 5469.29 samples/sec   Loss 39.2726   LearningRate 0.0257   Epoch: 0   Global Step: 890   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:50:00,468-Speed 5540.56 samples/sec   Loss 39.2569   LearningRate 0.0260   Epoch: 0   Global Step: 900   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:50:02,074-Speed 5436.83 samples/sec   Loss 39.5017   LearningRate 0.0437   Epoch: 0   Global Step: 1510   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:50:09,600-Speed 5448.27 samples/sec   Loss 39.4714   LearningRate 0.0440   Epoch: 0   Global Step: 1520   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:50:15,404-Speed 5526.08 samples/sec   Loss 39.2703   LearningRate 0.0266   Epoch: 0   Global Step: 920   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:50:22,679-Speed 5633.52 samples/sec   Loss 39.2809   LearningRate 0.0269   Epoch: 0   Global Step: 930   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:50:29,947-Speed 5636.98 samples/sec   Loss 39.2958   LearningRate 0.0272   Epoch: 0   Global Step: 940   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:50:32,363-Speed 5266.49 samples/sec   Loss 39.4985   LearningRate 0.0448   Epoch: 0   Global Step: 1550   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:50:39,954-Speed 5397.81 samples/sec   Loss 39.4815   LearningRate 0.0451   Epoch: 0   Global Step: 1560   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:50:47,411-Speed 5493.68 samples/sec   Loss 39.5012   LearningRate 0.0454   Epoch: 0   Global Step: 1570   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:50:54,880-Speed 5484.84 samples/sec   Loss 39.5174   LearningRate 0.0457   Epoch: 0   Global Step: 1580   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:51:02,354-Speed 5481.27 samples/sec   Loss 39.4911   LearningRate 0.0460   Epoch: 0   Global Step: 1590   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:51:09,845-Speed 5468.51 samples/sec   Loss 39.4959   LearningRate 0.0463   Epoch: 0   Global Step: 1600   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:51:17,292-Speed 5501.29 samples/sec   Loss 39.4971   LearningRate 0.0466   Epoch: 0   Global Step: 1610   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:51:24,736-Speed 5502.73 samples/sec   Loss 39.4911   LearningRate 0.0469   Epoch: 0   Global Step: 1620   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:51:32,243-Speed 5457.25 samples/sec   Loss 39.4939   LearningRate 0.0472   Epoch: 0   Global Step: 1630   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:51:39,828-Speed 5401.39 samples/sec   Loss 39.4735   LearningRate 0.0474   Epoch: 0   Global Step: 1640   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:51:44,495-Speed 5509.43 samples/sec   Loss 39.3612   LearningRate 0.0301   Epoch: 0   Global Step: 1040   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:51:51,808-Speed 5607.29 samples/sec   Loss 39.3666   LearningRate 0.0304   Epoch: 0   Global Step: 1050   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:51:59,191-Speed 5548.64 samples/sec   Loss 39.4078   LearningRate 0.0307   Epoch: 0   Global Step: 1060   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:52:02,284-Speed 5506.25 samples/sec   Loss 39.4541   LearningRate 0.0483   Epoch: 0   Global Step: 1670   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:52:09,772-Speed 5473.73 samples/sec   Loss 39.4923   LearningRate 0.0486   Epoch: 0   Global Step: 1680   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:52:13,984-Speed 5592.76 samples/sec   Loss 39.4264   LearningRate 0.0312   Epoch: 0   Global Step: 1080   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:52:21,544-Speed 5428.66 samples/sec   Loss 39.4239   LearningRate 0.0315   Epoch: 0   Global Step: 1090   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:52:28,918-Speed 5557.64 samples/sec   Loss 39.4139   LearningRate 0.0318   Epoch: 0   Global Step: 1100   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:52:32,165-Speed 5473.31 samples/sec   Loss 39.4546   LearningRate 0.0495   Epoch: 0   Global Step: 1710   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:52:39,659-Speed 5468.06 samples/sec   Loss 39.4632   LearningRate 0.0498   Epoch: 0   Global Step: 1720   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:52:43,782-Speed 5527.94 samples/sec   Loss 39.4360   LearningRate 0.0324   Epoch: 0   Global Step: 1120   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 18:52:51,273-Speed 5472.03 samples/sec   Loss 39.4344   LearningRate 0.0327   Epoch: 0   Global Step: 1130   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:52:58,753-Speed 5476.95 samples/sec   Loss 39.4446   LearningRate 0.0330   Epoch: 0   Global Step: 1140   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:53:02,048-Speed 5502.50 samples/sec   Loss 39.4940   LearningRate 0.0506   Epoch: 0   Global Step: 1750   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:53:09,508-Speed 5494.48 samples/sec   Loss 39.4427   LearningRate 0.0509   Epoch: 0   Global Step: 1760   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 18:53:13,785-Speed 5387.15 samples/sec   Loss 39.4590   LearningRate 0.0336   Epoch: 0   Global Step: 1160   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:53:21,155-Speed 5560.95 samples/sec   Loss 39.4655   LearningRate 0.0339   Epoch: 0   Global Step: 1170   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:53:28,563-Speed 5529.94 samples/sec   Loss 39.4995   LearningRate 0.0341   Epoch: 0   Global Step: 1180   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:53:35,996-Speed 5511.14 samples/sec   Loss 39.4858   LearningRate 0.0344   Epoch: 0   Global Step: 1190   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:53:39,501-Speed 5493.68 samples/sec   Loss 39.4168   LearningRate 0.0521   Epoch: 0   Global Step: 1800   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:53:47,000-Speed 5463.40 samples/sec   Loss 39.4033   LearningRate 0.0524   Epoch: 0   Global Step: 1810   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:53:54,441-Speed 5505.02 samples/sec   Loss 39.4054   LearningRate 0.0527   Epoch: 0   Global Step: 1820   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:54:01,906-Speed 5488.21 samples/sec   Loss 39.3479   LearningRate 0.0529   Epoch: 0   Global Step: 1830   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:54:09,343-Speed 5508.18 samples/sec   Loss 39.3927   LearningRate 0.0532   Epoch: 0   Global Step: 1840   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:54:13,212-Speed 5514.77 samples/sec   Loss 39.5079   LearningRate 0.0359   Epoch: 0   Global Step: 1240   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:54:20,628-Speed 5526.24 samples/sec   Loss 39.5323   LearningRate 0.0362   Epoch: 0   Global Step: 1250   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:54:28,080-Speed 5497.58 samples/sec   Loss 39.5191   LearningRate 0.0365   Epoch: 0   Global Step: 1260   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:54:31,748-Speed 5502.77 samples/sec   Loss 39.3351   LearningRate 0.0541   Epoch: 0   Global Step: 1870   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:54:39,189-Speed 5507.69 samples/sec   Loss 39.3343   LearningRate 0.0544   Epoch: 0   Global Step: 1880   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:54:42,724-Speed 5572.05 samples/sec   Loss 39.5339   LearningRate 0.0370   Epoch: 0   Global Step: 1280   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:54:50,043-Speed 5599.64 samples/sec   Loss 39.5414   LearningRate 0.0373   Epoch: 0   Global Step: 1290   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:54:57,481-Speed 5508.26 samples/sec   Loss 39.5520   LearningRate 0.0376   Epoch: 0   Global Step: 1300   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:55:01,710-Speed 5463.32 samples/sec   Loss 39.2786   LearningRate 0.0553   Epoch: 0   Global Step: 1910   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:55:09,173-Speed 5491.74 samples/sec   Loss 39.2420   LearningRate 0.0556   Epoch: 0   Global Step: 1920   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:55:12,156-Speed 5587.76 samples/sec   Loss 39.5262   LearningRate 0.0382   Epoch: 0   Global Step: 1320   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:55:19,559-Speed 5538.23 samples/sec   Loss 39.5398   LearningRate 0.0385   Epoch: 0   Global Step: 1330   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:55:26,930-Speed 5557.66 samples/sec   Loss 39.5331   LearningRate 0.0388   Epoch: 0   Global Step: 1340   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:55:31,479-Speed 5497.32 samples/sec   Loss 39.1896   LearningRate 0.0564   Epoch: 0   Global Step: 1950   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:55:38,912-Speed 5521.93 samples/sec   Loss 39.1933   LearningRate 0.0567   Epoch: 0   Global Step: 1960   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:55:41,810-Speed 5475.10 samples/sec   Loss 39.5673   LearningRate 0.0393   Epoch: 0   Global Step: 1360   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:55:49,938-Speed 5042.59 samples/sec   Loss 39.5428   LearningRate 0.0396   Epoch: 0   Global Step: 1370   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:55:58,007-Speed 5077.20 samples/sec   Loss 39.5418   LearningRate 0.0399   Epoch: 0   Global Step: 1380   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 18:56:06,002-Speed 5124.11 samples/sec   Loss 39.5541   LearningRate 0.0402   Epoch: 0   Global Step: 1390   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:56:14,077-Speed 5073.76 samples/sec   Loss 39.5674   LearningRate 0.0405   Epoch: 0   Global Step: 1400   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:56:22,072-Speed 5125.88 samples/sec   Loss 39.5585   LearningRate 0.0408   Epoch: 0   Global Step: 1410   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:56:30,071-Speed 5121.24 samples/sec   Loss 39.5631   LearningRate 0.0411   Epoch: 0   Global Step: 1420   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:56:38,180-Speed 5051.72 samples/sec   Loss 39.5640   LearningRate 0.0414   Epoch: 0   Global Step: 1430   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:56:46,397-Speed 4986.23 samples/sec   Loss 39.5618   LearningRate 0.0417   Epoch: 0   Global Step: 1440   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:56:54,672-Speed 4950.82 samples/sec   Loss 39.5726   LearningRate 0.0420   Epoch: 0   Global Step: 1450   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:57:02,694-Speed 5106.31 samples/sec   Loss 39.5725   LearningRate 0.0422   Epoch: 0   Global Step: 1460   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:57:10,719-Speed 5105.00 samples/sec   Loss 39.5873   LearningRate 0.0425   Epoch: 0   Global Step: 1470   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:57:18,721-Speed 5119.05 samples/sec   Loss 39.5725   LearningRate 0.0428   Epoch: 0   Global Step: 1480   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:57:26,706-Speed 5130.66 samples/sec   Loss 39.5576   LearningRate 0.0431   Epoch: 0   Global Step: 1490   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:57:34,675-Speed 5140.35 samples/sec   Loss 39.5729   LearningRate 0.0434   Epoch: 0   Global Step: 1500   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:57:42,687-Speed 5113.73 samples/sec   Loss 39.5956   LearningRate 0.0437   Epoch: 0   Global Step: 1510   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:57:50,721-Speed 5099.38 samples/sec   Loss 39.5641   LearningRate 0.0440   Epoch: 0   Global Step: 1520   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:57:58,728-Speed 5116.38 samples/sec   Loss 39.5526   LearningRate 0.0443   Epoch: 0   Global Step: 1530   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:58:06,886-Speed 5021.98 samples/sec   Loss 39.5827   LearningRate 0.0446   Epoch: 0   Global Step: 1540   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:58:14,910-Speed 5106.56 samples/sec   Loss 39.5695   LearningRate 0.0448   Epoch: 0   Global Step: 1550   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:58:23,145-Speed 4975.02 samples/sec   Loss 39.5463   LearningRate 0.0451   Epoch: 0   Global Step: 1560   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:58:31,579-Speed 4857.21 samples/sec   Loss 39.5315   LearningRate 0.0454   Epoch: 0   Global Step: 1570   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:58:39,590-Speed 5113.74 samples/sec   Loss 39.5589   LearningRate 0.0457   Epoch: 0   Global Step: 1580   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:58:47,272-Speed 5526.44 samples/sec   Loss 39.0886   LearningRate 0.0584   Epoch: 0   Global Step: 2020   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-01-07 18:58:54,714-Speed 5507.43 samples/sec   Loss 39.0395   LearningRate 0.0587   Epoch: 0   Global Step: 2030   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-01-07 18:59:02,109-Speed 5540.12 samples/sec   Loss 39.0065   LearningRate 0.0590   Epoch: 0   Global Step: 2040   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-01-07 18:59:09,513-Speed 5533.45 samples/sec   Loss 38.9746   LearningRate 0.0593   Epoch: 0   Global Step: 2050   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-01-07 18:59:11,868-Speed 5079.67 samples/sec   Loss 39.6042   LearningRate 0.0469   Epoch: 0   Global Step: 1620   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 18:59:16,916-Speed 5534.65 samples/sec   Loss 38.9635   LearningRate 0.0596   Epoch: 0   Global Step: 2060   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-01-07 18:59:24,311-Speed 5540.22 samples/sec   Loss 38.9349   LearningRate 0.0599   Epoch: 0   Global Step: 2070   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-01-07 18:59:31,757-Speed 5502.44 samples/sec   Loss 38.9255   LearningRate 0.0602   Epoch: 0   Global Step: 2080   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-01-07 18:59:39,188-Speed 5512.93 samples/sec   Loss 38.8944   LearningRate 0.0605   Epoch: 0   Global Step: 2090   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-01-07 18:59:46,593-Speed 5532.63 samples/sec   Loss 38.8868   LearningRate 0.0608   Epoch: 0   Global Step: 2100   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-01-07 18:59:54,000-Speed 5531.40 samples/sec   Loss 38.8464   LearningRate 0.0610   Epoch: 0   Global Step: 2110   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-01-07 19:00:01,407-Speed 5531.46 samples/sec   Loss 38.8233   LearningRate 0.0613   Epoch: 0   Global Step: 2120   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-01-07 19:00:08,816-Speed 5529.28 samples/sec   Loss 38.8056   LearningRate 0.0616   Epoch: 0   Global Step: 2130   Fp16 Grad Scale: 131072   Required: 47 hours
Training: 2022-01-07 19:00:16,255-Speed 5507.87 samples/sec   Loss 38.7604   LearningRate 0.0619   Epoch: 0   Global Step: 2140   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:00:23,748-Speed 5467.40 samples/sec   Loss 38.7654   LearningRate 0.0622   Epoch: 0   Global Step: 2150   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:00:31,042-Speed 5617.18 samples/sec   Loss 38.7448   LearningRate 0.0625   Epoch: 0   Global Step: 2160   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:00:38,242-Speed 5690.54 samples/sec   Loss 38.7204   LearningRate 0.0628   Epoch: 0   Global Step: 2170   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:00:45,364-Speed 5752.83 samples/sec   Loss 38.6850   LearningRate 0.0631   Epoch: 0   Global Step: 2180   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-01-07 19:00:52,481-Speed 5756.39 samples/sec   Loss 38.6535   LearningRate 0.0634   Epoch: 0   Global Step: 2190   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:00:59,840-Speed 5567.14 samples/sec   Loss 38.5824   LearningRate 0.0637   Epoch: 0   Global Step: 2200   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:01:07,254-Speed 5526.24 samples/sec   Loss 38.5746   LearningRate 0.0639   Epoch: 0   Global Step: 2210   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:01:14,668-Speed 5525.70 samples/sec   Loss 38.5995   LearningRate 0.0642   Epoch: 0   Global Step: 2220   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:01:22,061-Speed 5541.77 samples/sec   Loss 38.5608   LearningRate 0.0645   Epoch: 0   Global Step: 2230   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:01:29,482-Speed 5520.74 samples/sec   Loss 38.5198   LearningRate 0.0648   Epoch: 0   Global Step: 2240   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:01:36,859-Speed 5553.80 samples/sec   Loss 38.4989   LearningRate 0.0651   Epoch: 0   Global Step: 2250   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:01:44,102-Speed 5656.58 samples/sec   Loss 38.4296   LearningRate 0.0654   Epoch: 0   Global Step: 2260   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:01:51,567-Speed 5488.55 samples/sec   Loss 38.4549   LearningRate 0.0657   Epoch: 0   Global Step: 2270   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:01:58,965-Speed 5537.68 samples/sec   Loss 38.4344   LearningRate 0.0660   Epoch: 0   Global Step: 2280   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:02:06,254-Speed 5620.31 samples/sec   Loss 38.3981   LearningRate 0.0663   Epoch: 0   Global Step: 2290   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-01-07 19:02:13,682-Speed 5516.07 samples/sec   Loss 38.3636   LearningRate 0.0665   Epoch: 0   Global Step: 2300   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:02:21,055-Speed 5556.83 samples/sec   Loss 38.3443   LearningRate 0.0668   Epoch: 0   Global Step: 2310   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:02:28,392-Speed 5584.11 samples/sec   Loss 38.3023   LearningRate 0.0671   Epoch: 0   Global Step: 2320   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:02:35,759-Speed 5560.87 samples/sec   Loss 38.2830   LearningRate 0.0674   Epoch: 0   Global Step: 2330   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:02:43,148-Speed 5544.14 samples/sec   Loss 38.2760   LearningRate 0.0677   Epoch: 0   Global Step: 2340   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:02:50,576-Speed 5515.95 samples/sec   Loss 38.2683   LearningRate 0.0680   Epoch: 0   Global Step: 2350   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:02:58,019-Speed 5505.20 samples/sec   Loss 38.2192   LearningRate 0.0683   Epoch: 0   Global Step: 2360   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:03:05,411-Speed 5542.19 samples/sec   Loss 38.1707   LearningRate 0.0686   Epoch: 0   Global Step: 2370   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:03:12,855-Speed 5503.93 samples/sec   Loss 38.1459   LearningRate 0.0689   Epoch: 0   Global Step: 2380   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:03:20,307-Speed 5497.33 samples/sec   Loss 38.0994   LearningRate 0.0691   Epoch: 0   Global Step: 2390   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:03:27,358-Speed 5811.01 samples/sec   Loss 38.1053   LearningRate 0.0694   Epoch: 0   Global Step: 2400   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:03:34,710-Speed 5572.06 samples/sec   Loss 38.0341   LearningRate 0.0697   Epoch: 0   Global Step: 2410   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:03:42,282-Speed 5410.20 samples/sec   Loss 38.0479   LearningRate 0.0700   Epoch: 0   Global Step: 2420   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:03:49,671-Speed 5546.11 samples/sec   Loss 38.0104   LearningRate 0.0703   Epoch: 0   Global Step: 2430   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:03:57,116-Speed 5502.83 samples/sec   Loss 37.9638   LearningRate 0.0706   Epoch: 0   Global Step: 2440   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:04:04,675-Speed 5419.68 samples/sec   Loss 37.9485   LearningRate 0.0709   Epoch: 0   Global Step: 2450   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:04:12,111-Speed 5509.68 samples/sec   Loss 37.9332   LearningRate 0.0712   Epoch: 0   Global Step: 2460   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:04:19,668-Speed 5421.04 samples/sec   Loss 37.8870   LearningRate 0.0715   Epoch: 0   Global Step: 2470   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:04:27,168-Speed 5462.02 samples/sec   Loss 37.8213   LearningRate 0.0718   Epoch: 0   Global Step: 2480   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:04:34,606-Speed 5508.40 samples/sec   Loss 37.8112   LearningRate 0.0720   Epoch: 0   Global Step: 2490   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:04:42,007-Speed 5535.20 samples/sec   Loss 37.7860   LearningRate 0.0723   Epoch: 0   Global Step: 2500   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:04:49,453-Speed 5501.29 samples/sec   Loss 37.7495   LearningRate 0.0726   Epoch: 0   Global Step: 2510   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:04:56,891-Speed 5508.10 samples/sec   Loss 37.7184   LearningRate 0.0729   Epoch: 0   Global Step: 2520   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:05:04,400-Speed 5455.68 samples/sec   Loss 37.6822   LearningRate 0.0732   Epoch: 0   Global Step: 2530   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:05:11,888-Speed 5470.80 samples/sec   Loss 37.6603   LearningRate 0.0735   Epoch: 0   Global Step: 2540   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:05:19,383-Speed 5465.92 samples/sec   Loss 37.6211   LearningRate 0.0738   Epoch: 0   Global Step: 2550   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:05:26,824-Speed 5505.38 samples/sec   Loss 37.5768   LearningRate 0.0741   Epoch: 0   Global Step: 2560   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:05:34,343-Speed 5449.15 samples/sec   Loss 37.5733   LearningRate 0.0744   Epoch: 0   Global Step: 2570   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:05:41,946-Speed 5387.91 samples/sec   Loss 37.5189   LearningRate 0.0746   Epoch: 0   Global Step: 2580   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:05:49,575-Speed 5369.94 samples/sec   Loss 37.4435   LearningRate 0.0749   Epoch: 0   Global Step: 2590   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:05:57,133-Speed 5420.42 samples/sec   Loss 37.4136   LearningRate 0.0752   Epoch: 0   Global Step: 2600   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-01-07 19:06:04,637-Speed 5459.53 samples/sec   Loss 37.3582   LearningRate 0.0755   Epoch: 0   Global Step: 2610   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-01-07 19:06:12,048-Speed 5527.80 samples/sec   Loss 37.3890   LearningRate 0.0758   Epoch: 0   Global Step: 2620   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:06:19,489-Speed 5505.76 samples/sec   Loss 37.3441   LearningRate 0.0761   Epoch: 0   Global Step: 2630   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:06:26,896-Speed 5531.04 samples/sec   Loss 37.2965   LearningRate 0.0764   Epoch: 0   Global Step: 2640   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:06:34,350-Speed 5495.62 samples/sec   Loss 37.2542   LearningRate 0.0767   Epoch: 0   Global Step: 2650   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:06:41,770-Speed 5521.03 samples/sec   Loss 37.1895   LearningRate 0.0770   Epoch: 0   Global Step: 2660   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:06:49,200-Speed 5513.27 samples/sec   Loss 37.1949   LearningRate 0.0772   Epoch: 0   Global Step: 2670   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:06:56,648-Speed 5500.81 samples/sec   Loss 37.1469   LearningRate 0.0775   Epoch: 0   Global Step: 2680   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:07:04,084-Speed 5509.21 samples/sec   Loss 37.1170   LearningRate 0.0778   Epoch: 0   Global Step: 2690   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:07:11,576-Speed 5467.77 samples/sec   Loss 37.1162   LearningRate 0.0781   Epoch: 0   Global Step: 2700   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:07:19,084-Speed 5456.59 samples/sec   Loss 37.0394   LearningRate 0.0784   Epoch: 0   Global Step: 2710   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:07:26,624-Speed 5433.14 samples/sec   Loss 36.9839   LearningRate 0.0787   Epoch: 0   Global Step: 2720   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 19:07:34,102-Speed 5478.61 samples/sec   Loss 36.9716   LearningRate 0.0790   Epoch: 0   Global Step: 2730   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:07:41,562-Speed 5490.65 samples/sec   Loss 36.8934   LearningRate 0.0793   Epoch: 0   Global Step: 2740   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:07:49,011-Speed 5500.03 samples/sec   Loss 36.8685   LearningRate 0.0796   Epoch: 0   Global Step: 2750   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:07:56,470-Speed 5492.65 samples/sec   Loss 36.8238   LearningRate 0.0799   Epoch: 0   Global Step: 2760   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:08:04,032-Speed 5416.97 samples/sec   Loss 36.8370   LearningRate 0.0801   Epoch: 0   Global Step: 2770   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:08:11,613-Speed 5404.25 samples/sec   Loss 36.7488   LearningRate 0.0804   Epoch: 0   Global Step: 2780   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:08:19,072-Speed 5491.93 samples/sec   Loss 36.7479   LearningRate 0.0807   Epoch: 0   Global Step: 2790   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:08:26,596-Speed 5444.62 samples/sec   Loss 36.6917   LearningRate 0.0810   Epoch: 0   Global Step: 2800   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:08:34,058-Speed 5490.32 samples/sec   Loss 36.6001   LearningRate 0.0813   Epoch: 0   Global Step: 2810   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:08:41,491-Speed 5510.82 samples/sec   Loss 36.6034   LearningRate 0.0816   Epoch: 0   Global Step: 2820   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:08:48,924-Speed 5511.61 samples/sec   Loss 36.5371   LearningRate 0.0819   Epoch: 0   Global Step: 2830   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 19:08:56,368-Speed 5503.80 samples/sec   Loss 36.5431   LearningRate 0.0822   Epoch: 0   Global Step: 2840   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:09:03,813-Speed 5502.40 samples/sec   Loss 36.5060   LearningRate 0.0825   Epoch: 0   Global Step: 2850   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:09:11,277-Speed 5487.80 samples/sec   Loss 36.4598   LearningRate 0.0827   Epoch: 0   Global Step: 2860   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:09:18,713-Speed 5509.56 samples/sec   Loss 36.3733   LearningRate 0.0830   Epoch: 0   Global Step: 2870   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:09:26,142-Speed 5514.26 samples/sec   Loss 36.3937   LearningRate 0.0833   Epoch: 0   Global Step: 2880   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:09:33,579-Speed 5508.35 samples/sec   Loss 36.3008   LearningRate 0.0836   Epoch: 0   Global Step: 2890   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:09:41,027-Speed 5500.03 samples/sec   Loss 36.2913   LearningRate 0.0839   Epoch: 0   Global Step: 2900   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:09:48,682-Speed 5352.09 samples/sec   Loss 36.2412   LearningRate 0.0842   Epoch: 0   Global Step: 2910   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:09:56,151-Speed 5484.51 samples/sec   Loss 36.2182   LearningRate 0.0845   Epoch: 0   Global Step: 2920   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:10:03,621-Speed 5484.03 samples/sec   Loss 36.1121   LearningRate 0.0848   Epoch: 0   Global Step: 2930   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:10:11,114-Speed 5467.37 samples/sec   Loss 36.1283   LearningRate 0.0851   Epoch: 0   Global Step: 2940   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 19:10:18,598-Speed 5473.95 samples/sec   Loss 36.0817   LearningRate 0.0854   Epoch: 0   Global Step: 2950   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:10:26,075-Speed 5479.08 samples/sec   Loss 36.0565   LearningRate 0.0856   Epoch: 0   Global Step: 2960   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:10:33,513-Speed 5507.23 samples/sec   Loss 35.9862   LearningRate 0.0859   Epoch: 0   Global Step: 2970   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:10:40,962-Speed 5499.84 samples/sec   Loss 35.9798   LearningRate 0.0862   Epoch: 0   Global Step: 2980   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:10:48,406-Speed 5503.45 samples/sec   Loss 35.9151   LearningRate 0.0865   Epoch: 0   Global Step: 2990   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:10:55,864-Speed 5493.26 samples/sec   Loss 35.8793   LearningRate 0.0868   Epoch: 0   Global Step: 3000   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:11:03,309-Speed 5502.21 samples/sec   Loss 35.8429   LearningRate 0.0871   Epoch: 0   Global Step: 3010   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:11:10,747-Speed 5507.89 samples/sec   Loss 35.7457   LearningRate 0.0874   Epoch: 0   Global Step: 3020   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:11:18,207-Speed 5491.48 samples/sec   Loss 35.6966   LearningRate 0.0877   Epoch: 0   Global Step: 3030   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:11:25,861-Speed 5351.85 samples/sec   Loss 35.6121   LearningRate 0.0880   Epoch: 0   Global Step: 3040   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:11:33,525-Speed 5345.35 samples/sec   Loss 35.6466   LearningRate 0.0882   Epoch: 0   Global Step: 3050   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 19:11:40,987-Speed 5490.67 samples/sec   Loss 35.6335   LearningRate 0.0885   Epoch: 0   Global Step: 3060   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:11:48,418-Speed 5512.76 samples/sec   Loss 35.5558   LearningRate 0.0888   Epoch: 0   Global Step: 3070   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:11:55,860-Speed 5504.62 samples/sec   Loss 35.4720   LearningRate 0.0891   Epoch: 0   Global Step: 3080   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:12:03,332-Speed 5482.63 samples/sec   Loss 35.4817   LearningRate 0.0894   Epoch: 0   Global Step: 3090   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:12:10,869-Speed 5435.32 samples/sec   Loss 35.4046   LearningRate 0.0897   Epoch: 0   Global Step: 3100   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:12:18,427-Speed 5420.01 samples/sec   Loss 35.3655   LearningRate 0.0900   Epoch: 0   Global Step: 3110   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:12:25,842-Speed 5524.89 samples/sec   Loss 35.3294   LearningRate 0.0903   Epoch: 0   Global Step: 3120   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:12:33,271-Speed 5514.59 samples/sec   Loss 35.2389   LearningRate 0.0906   Epoch: 0   Global Step: 3130   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:12:40,763-Speed 5468.32 samples/sec   Loss 35.2263   LearningRate 0.0908   Epoch: 0   Global Step: 3140   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:12:48,220-Speed 5493.44 samples/sec   Loss 35.1906   LearningRate 0.0911   Epoch: 0   Global Step: 3150   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:12:55,669-Speed 5499.35 samples/sec   Loss 35.1434   LearningRate 0.0914   Epoch: 0   Global Step: 3160   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 19:13:03,168-Speed 5462.80 samples/sec   Loss 35.1307   LearningRate 0.0917   Epoch: 0   Global Step: 3170   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 19:13:10,617-Speed 5499.84 samples/sec   Loss 35.0310   LearningRate 0.0920   Epoch: 0   Global Step: 3180   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:13:18,060-Speed 5503.68 samples/sec   Loss 35.0358   LearningRate 0.0923   Epoch: 0   Global Step: 3190   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:13:25,494-Speed 5511.27 samples/sec   Loss 34.9901   LearningRate 0.0926   Epoch: 0   Global Step: 3200   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:13:33,063-Speed 5412.11 samples/sec   Loss 34.9040   LearningRate 0.0929   Epoch: 0   Global Step: 3210   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:13:40,564-Speed 5461.82 samples/sec   Loss 34.8330   LearningRate 0.0932   Epoch: 0   Global Step: 3220   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:13:48,010-Speed 5501.23 samples/sec   Loss 34.8000   LearningRate 0.0935   Epoch: 0   Global Step: 3230   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:13:55,462-Speed 5497.22 samples/sec   Loss 34.7616   LearningRate 0.0937   Epoch: 0   Global Step: 3240   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:14:02,955-Speed 5467.33 samples/sec   Loss 34.7479   LearningRate 0.0940   Epoch: 0   Global Step: 3250   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:14:10,434-Speed 5477.64 samples/sec   Loss 34.7198   LearningRate 0.0943   Epoch: 0   Global Step: 3260   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:14:17,884-Speed 5498.93 samples/sec   Loss 34.6152   LearningRate 0.0946   Epoch: 0   Global Step: 3270   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:14:25,310-Speed 5516.44 samples/sec   Loss 34.6095   LearningRate 0.0949   Epoch: 0   Global Step: 3280   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:14:32,818-Speed 5456.11 samples/sec   Loss 34.5498   LearningRate 0.0952   Epoch: 0   Global Step: 3290   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:14:40,228-Speed 5529.12 samples/sec   Loss 34.5093   LearningRate 0.0955   Epoch: 0   Global Step: 3300   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:14:47,654-Speed 5516.61 samples/sec   Loss 34.4279   LearningRate 0.0958   Epoch: 0   Global Step: 3310   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:14:55,128-Speed 5480.54 samples/sec   Loss 34.4010   LearningRate 0.0961   Epoch: 0   Global Step: 3320   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:15:02,575-Speed 5501.42 samples/sec   Loss 34.3092   LearningRate 0.0963   Epoch: 0   Global Step: 3330   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:15:10,074-Speed 5463.15 samples/sec   Loss 34.2828   LearningRate 0.0966   Epoch: 0   Global Step: 3340   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:15:17,491-Speed 5523.00 samples/sec   Loss 34.2441   LearningRate 0.0969   Epoch: 0   Global Step: 3350   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:15:24,950-Speed 5491.83 samples/sec   Loss 34.2252   LearningRate 0.0972   Epoch: 0   Global Step: 3360   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:15:32,393-Speed 5504.20 samples/sec   Loss 34.0922   LearningRate 0.0975   Epoch: 0   Global Step: 3370   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:15:39,808-Speed 5524.98 samples/sec   Loss 34.1155   LearningRate 0.0978   Epoch: 0   Global Step: 3380   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:15:47,240-Speed 5512.04 samples/sec   Loss 34.0180   LearningRate 0.0981   Epoch: 0   Global Step: 3390   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:15:54,694-Speed 5495.31 samples/sec   Loss 34.0093   LearningRate 0.0984   Epoch: 0   Global Step: 3400   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:16:02,122-Speed 5515.13 samples/sec   Loss 33.9363   LearningRate 0.0987   Epoch: 0   Global Step: 3410   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:16:09,619-Speed 5464.61 samples/sec   Loss 33.8485   LearningRate 0.0989   Epoch: 0   Global Step: 3420   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:16:17,118-Speed 5463.26 samples/sec   Loss 33.8903   LearningRate 0.0992   Epoch: 0   Global Step: 3430   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:16:24,565-Speed 5500.55 samples/sec   Loss 33.7892   LearningRate 0.0995   Epoch: 0   Global Step: 3440   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:16:32,090-Speed 5444.43 samples/sec   Loss 33.7509   LearningRate 0.0998   Epoch: 0   Global Step: 3450   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:16:39,624-Speed 5437.62 samples/sec   Loss 33.7078   LearningRate 0.1001   Epoch: 0   Global Step: 3460   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:16:47,079-Speed 5495.38 samples/sec   Loss 33.5981   LearningRate 0.1004   Epoch: 0   Global Step: 3470   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:16:54,564-Speed 5472.44 samples/sec   Loss 33.6528   LearningRate 0.1007   Epoch: 0   Global Step: 3480   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 19:17:01,982-Speed 5522.76 samples/sec   Loss 33.5801   LearningRate 0.1010   Epoch: 0   Global Step: 3490   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:17:09,417-Speed 5509.46 samples/sec   Loss 33.5091   LearningRate 0.1013   Epoch: 0   Global Step: 3500   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:17:16,912-Speed 5466.34 samples/sec   Loss 33.4475   LearningRate 0.1016   Epoch: 0   Global Step: 3510   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:17:24,359-Speed 5501.06 samples/sec   Loss 33.3904   LearningRate 0.1018   Epoch: 0   Global Step: 3520   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:17:31,845-Speed 5472.21 samples/sec   Loss 33.3961   LearningRate 0.1021   Epoch: 0   Global Step: 3530   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:17:39,334-Speed 5470.17 samples/sec   Loss 33.3168   LearningRate 0.1024   Epoch: 0   Global Step: 3540   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:17:46,788-Speed 5496.10 samples/sec   Loss 33.2825   LearningRate 0.1027   Epoch: 0   Global Step: 3550   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:17:54,278-Speed 5469.09 samples/sec   Loss 33.1769   LearningRate 0.1030   Epoch: 0   Global Step: 3560   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:18:01,713-Speed 5510.12 samples/sec   Loss 33.1615   LearningRate 0.1033   Epoch: 0   Global Step: 3570   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:18:09,162-Speed 5499.77 samples/sec   Loss 33.1202   LearningRate 0.1036   Epoch: 0   Global Step: 3580   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:18:16,595-Speed 5511.70 samples/sec   Loss 33.1008   LearningRate 0.1039   Epoch: 0   Global Step: 3590   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 19:18:24,015-Speed 5520.54 samples/sec   Loss 33.0179   LearningRate 0.1042   Epoch: 0   Global Step: 3600   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:18:31,529-Speed 5451.86 samples/sec   Loss 32.9805   LearningRate 0.1044   Epoch: 0   Global Step: 3610   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:18:38,967-Speed 5508.12 samples/sec   Loss 32.8369   LearningRate 0.1047   Epoch: 0   Global Step: 3620   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:18:46,434-Speed 5486.54 samples/sec   Loss 32.8570   LearningRate 0.1050   Epoch: 0   Global Step: 3630   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:18:53,946-Speed 5453.36 samples/sec   Loss 32.7292   LearningRate 0.1053   Epoch: 0   Global Step: 3640   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:19:01,407-Speed 5490.06 samples/sec   Loss 32.6648   LearningRate 0.1056   Epoch: 0   Global Step: 3650   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:19:08,898-Speed 5469.14 samples/sec   Loss 32.6007   LearningRate 0.1059   Epoch: 0   Global Step: 3660   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:19:16,371-Speed 5482.03 samples/sec   Loss 32.6322   LearningRate 0.1062   Epoch: 0   Global Step: 3670   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:19:23,918-Speed 5428.32 samples/sec   Loss 32.5113   LearningRate 0.1065   Epoch: 0   Global Step: 3680   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:19:31,423-Speed 5458.40 samples/sec   Loss 32.5448   LearningRate 0.1068   Epoch: 0   Global Step: 3690   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:19:38,888-Speed 5487.71 samples/sec   Loss 32.4174   LearningRate 0.1070   Epoch: 0   Global Step: 3700   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:19:46,346-Speed 5493.36 samples/sec   Loss 32.4315   LearningRate 0.1073   Epoch: 0   Global Step: 3710   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:19:53,826-Speed 5476.25 samples/sec   Loss 32.3280   LearningRate 0.1076   Epoch: 0   Global Step: 3720   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:20:01,268-Speed 5504.63 samples/sec   Loss 32.3105   LearningRate 0.1079   Epoch: 0   Global Step: 3730   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:20:08,704-Speed 5509.49 samples/sec   Loss 32.2432   LearningRate 0.1082   Epoch: 0   Global Step: 3740   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:20:16,185-Speed 5475.74 samples/sec   Loss 32.2148   LearningRate 0.1085   Epoch: 0   Global Step: 3750   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:20:23,637-Speed 5497.36 samples/sec   Loss 32.1930   LearningRate 0.1088   Epoch: 0   Global Step: 3760   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:20:31,078-Speed 5506.15 samples/sec   Loss 32.0994   LearningRate 0.1091   Epoch: 0   Global Step: 3770   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:20:38,584-Speed 5456.96 samples/sec   Loss 31.9849   LearningRate 0.1094   Epoch: 0   Global Step: 3780   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:20:46,011-Speed 5516.46 samples/sec   Loss 32.0466   LearningRate 0.1097   Epoch: 0   Global Step: 3790   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:20:53,474-Speed 5489.11 samples/sec   Loss 31.9820   LearningRate 0.1099   Epoch: 0   Global Step: 3800   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 19:21:00,918-Speed 5503.70 samples/sec   Loss 31.9008   LearningRate 0.1102   Epoch: 0   Global Step: 3810   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:21:08,339-Speed 5519.47 samples/sec   Loss 31.8188   LearningRate 0.1105   Epoch: 0   Global Step: 3820   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:21:15,775-Speed 5509.80 samples/sec   Loss 31.7723   LearningRate 0.1108   Epoch: 0   Global Step: 3830   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:21:23,247-Speed 5482.67 samples/sec   Loss 31.6902   LearningRate 0.1111   Epoch: 0   Global Step: 3840   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:21:30,744-Speed 5464.18 samples/sec   Loss 31.6909   LearningRate 0.1114   Epoch: 0   Global Step: 3850   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:21:38,237-Speed 5467.44 samples/sec   Loss 31.6343   LearningRate 0.1117   Epoch: 0   Global Step: 3860   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:21:45,694-Speed 5493.64 samples/sec   Loss 31.5717   LearningRate 0.1120   Epoch: 0   Global Step: 3870   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:21:53,184-Speed 5469.14 samples/sec   Loss 31.5208   LearningRate 0.1123   Epoch: 0   Global Step: 3880   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:22:00,692-Speed 5456.52 samples/sec   Loss 31.4977   LearningRate 0.1125   Epoch: 0   Global Step: 3890   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:22:08,161-Speed 5485.38 samples/sec   Loss 31.3727   LearningRate 0.1128   Epoch: 0   Global Step: 3900   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:22:15,613-Speed 5497.08 samples/sec   Loss 31.3758   LearningRate 0.1131   Epoch: 0   Global Step: 3910   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 19:22:23,045-Speed 5512.22 samples/sec   Loss 31.3269   LearningRate 0.1134   Epoch: 0   Global Step: 3920   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:22:30,566-Speed 5446.70 samples/sec   Loss 31.2029   LearningRate 0.1137   Epoch: 0   Global Step: 3930   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:22:38,023-Speed 5493.86 samples/sec   Loss 31.1777   LearningRate 0.1140   Epoch: 0   Global Step: 3940   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:22:45,512-Speed 5470.27 samples/sec   Loss 31.0566   LearningRate 0.1143   Epoch: 0   Global Step: 3950   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:22:52,951-Speed 5507.32 samples/sec   Loss 31.1066   LearningRate 0.1146   Epoch: 0   Global Step: 3960   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:23:00,384-Speed 5512.65 samples/sec   Loss 30.9683   LearningRate 0.1149   Epoch: 0   Global Step: 3970   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:23:07,832-Speed 5499.91 samples/sec   Loss 31.0086   LearningRate 0.1152   Epoch: 0   Global Step: 3980   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:23:15,265-Speed 5511.23 samples/sec   Loss 30.9280   LearningRate 0.1154   Epoch: 0   Global Step: 3990   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:23:22,714-Speed 5500.25 samples/sec   Loss 30.7091   LearningRate 0.1157   Epoch: 0   Global Step: 4000   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:24:08,011-[lfw][4000]XNorm: 21.241078
Training: 2022-01-07 19:24:08,012-[lfw][4000]Accuracy-Flip: 0.98033+-0.00785
Training: 2022-01-07 19:24:08,013-[lfw][4000]Accuracy-Highest: 0.98033
Training: 2022-01-07 19:25:00,678-[cfp_fp][4000]XNorm: 18.989971
Training: 2022-01-07 19:25:00,679-[cfp_fp][4000]Accuracy-Flip: 0.88014+-0.01442
Training: 2022-01-07 19:25:00,680-[cfp_fp][4000]Accuracy-Highest: 0.88014
Training: 2022-01-07 19:25:46,348-[agedb_30][4000]XNorm: 20.819124
Training: 2022-01-07 19:25:46,349-[agedb_30][4000]Accuracy-Flip: 0.84617+-0.01346
Training: 2022-01-07 19:25:46,349-[agedb_30][4000]Accuracy-Highest: 0.84617
Training: 2022-01-07 19:25:53,839-Speed 271.04 samples/sec   Loss 30.8200   LearningRate 0.1160   Epoch: 0   Global Step: 4010   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:26:01,244-Speed 5533.24 samples/sec   Loss 30.7546   LearningRate 0.1163   Epoch: 0   Global Step: 4020   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:26:08,818-Speed 5409.12 samples/sec   Loss 30.5659   LearningRate 0.1166   Epoch: 0   Global Step: 4030   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:26:16,320-Speed 5460.91 samples/sec   Loss 30.6485   LearningRate 0.1169   Epoch: 0   Global Step: 4040   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:26:23,849-Speed 5441.48 samples/sec   Loss 30.5623   LearningRate 0.1172   Epoch: 0   Global Step: 4050   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:26:31,227-Speed 5553.28 samples/sec   Loss 30.4052   LearningRate 0.1175   Epoch: 0   Global Step: 4060   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:26:38,631-Speed 5532.84 samples/sec   Loss 30.4505   LearningRate 0.1178   Epoch: 0   Global Step: 4070   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:26:46,054-Speed 5519.22 samples/sec   Loss 30.3762   LearningRate 0.1180   Epoch: 0   Global Step: 4080   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:26:53,569-Speed 5451.84 samples/sec   Loss 30.3233   LearningRate 0.1183   Epoch: 0   Global Step: 4090   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:27:00,985-Speed 5524.72 samples/sec   Loss 30.3058   LearningRate 0.1186   Epoch: 0   Global Step: 4100   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:27:08,249-Speed 5641.32 samples/sec   Loss 30.1089   LearningRate 0.1189   Epoch: 0   Global Step: 4110   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:27:15,540-Speed 5619.35 samples/sec   Loss 30.1502   LearningRate 0.1192   Epoch: 0   Global Step: 4120   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-01-07 19:27:22,482-Speed 5901.51 samples/sec   Loss 30.1601   LearningRate 0.1195   Epoch: 0   Global Step: 4130   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:27:29,841-Speed 5566.48 samples/sec   Loss 30.0609   LearningRate 0.1198   Epoch: 0   Global Step: 4140   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:27:37,335-Speed 5466.84 samples/sec   Loss 29.9687   LearningRate 0.1201   Epoch: 0   Global Step: 4150   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:27:44,850-Speed 5452.25 samples/sec   Loss 29.9292   LearningRate 0.1204   Epoch: 0   Global Step: 4160   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:27:52,255-Speed 5532.43 samples/sec   Loss 29.8077   LearningRate 0.1206   Epoch: 0   Global Step: 4170   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:27:59,743-Speed 5471.68 samples/sec   Loss 29.7437   LearningRate 0.1209   Epoch: 0   Global Step: 4180   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:28:07,408-Speed 5344.90 samples/sec   Loss 29.7582   LearningRate 0.1212   Epoch: 0   Global Step: 4190   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:28:14,798-Speed 5544.10 samples/sec   Loss 29.6465   LearningRate 0.1215   Epoch: 0   Global Step: 4200   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:28:22,077-Speed 5627.96 samples/sec   Loss 29.7081   LearningRate 0.1218   Epoch: 0   Global Step: 4210   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:28:29,469-Speed 5542.53 samples/sec   Loss 29.6010   LearningRate 0.1221   Epoch: 0   Global Step: 4220   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:28:36,851-Speed 5550.32 samples/sec   Loss 29.4316   LearningRate 0.1224   Epoch: 0   Global Step: 4230   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:28:44,219-Speed 5560.86 samples/sec   Loss 29.4878   LearningRate 0.1227   Epoch: 0   Global Step: 4240   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:28:51,598-Speed 5551.92 samples/sec   Loss 29.4699   LearningRate 0.1230   Epoch: 0   Global Step: 4250   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:28:59,300-Speed 5318.55 samples/sec   Loss 29.3212   LearningRate 0.1233   Epoch: 0   Global Step: 4260   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:29:06,724-Speed 5519.52 samples/sec   Loss 29.2851   LearningRate 0.1235   Epoch: 0   Global Step: 4270   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:29:14,210-Speed 5473.14 samples/sec   Loss 29.2386   LearningRate 0.1238   Epoch: 0   Global Step: 4280   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:29:21,630-Speed 5521.33 samples/sec   Loss 29.1283   LearningRate 0.1241   Epoch: 0   Global Step: 4290   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:29:29,102-Speed 5482.72 samples/sec   Loss 29.0931   LearningRate 0.1244   Epoch: 0   Global Step: 4300   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:29:36,483-Speed 5551.36 samples/sec   Loss 29.0149   LearningRate 0.1247   Epoch: 0   Global Step: 4310   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:29:43,866-Speed 5550.07 samples/sec   Loss 28.9880   LearningRate 0.1250   Epoch: 0   Global Step: 4320   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:29:51,286-Speed 5521.54 samples/sec   Loss 29.0315   LearningRate 0.1253   Epoch: 0   Global Step: 4330   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-01-07 19:29:58,687-Speed 5535.97 samples/sec   Loss 28.8204   LearningRate 0.1256   Epoch: 0   Global Step: 4340   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-01-07 19:30:06,126-Speed 5506.95 samples/sec   Loss 28.8564   LearningRate 0.1259   Epoch: 0   Global Step: 4350   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:30:13,480-Speed 5571.57 samples/sec   Loss 28.7699   LearningRate 0.1261   Epoch: 0   Global Step: 4360   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:30:20,900-Speed 5521.68 samples/sec   Loss 28.7069   LearningRate 0.1264   Epoch: 0   Global Step: 4370   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:30:28,275-Speed 5554.79 samples/sec   Loss 28.6134   LearningRate 0.1267   Epoch: 0   Global Step: 4380   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:30:35,496-Speed 5673.77 samples/sec   Loss 28.5316   LearningRate 0.1270   Epoch: 0   Global Step: 4390   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:30:42,658-Speed 5720.42 samples/sec   Loss 28.5494   LearningRate 0.1273   Epoch: 0   Global Step: 4400   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:30:50,046-Speed 5545.52 samples/sec   Loss 28.4340   LearningRate 0.1276   Epoch: 0   Global Step: 4410   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:30:57,489-Speed 5504.53 samples/sec   Loss 28.3495   LearningRate 0.1279   Epoch: 0   Global Step: 4420   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:31:04,875-Speed 5547.06 samples/sec   Loss 28.2765   LearningRate 0.1282   Epoch: 0   Global Step: 4430   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:31:12,283-Speed 5530.57 samples/sec   Loss 28.3250   LearningRate 0.1285   Epoch: 0   Global Step: 4440   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:31:19,701-Speed 5522.67 samples/sec   Loss 28.3031   LearningRate 0.1287   Epoch: 0   Global Step: 4450   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-01-07 19:31:27,124-Speed 5519.19 samples/sec   Loss 28.2639   LearningRate 0.1290   Epoch: 0   Global Step: 4460   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:31:34,304-Speed 5705.68 samples/sec   Loss 28.1209   LearningRate 0.1293   Epoch: 0   Global Step: 4470   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:31:41,613-Speed 5605.45 samples/sec   Loss 28.0575   LearningRate 0.1296   Epoch: 0   Global Step: 4480   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:31:49,111-Speed 5464.67 samples/sec   Loss 27.9526   LearningRate 0.1299   Epoch: 0   Global Step: 4490   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:31:56,579-Speed 5487.40 samples/sec   Loss 27.9264   LearningRate 0.1302   Epoch: 0   Global Step: 4500   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:32:04,070-Speed 5469.12 samples/sec   Loss 27.8773   LearningRate 0.1305   Epoch: 0   Global Step: 4510   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:32:11,499-Speed 5515.52 samples/sec   Loss 27.8037   LearningRate 0.1308   Epoch: 0   Global Step: 4520   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-01-07 19:32:18,983-Speed 5474.95 samples/sec   Loss 27.7692   LearningRate 0.1311   Epoch: 0   Global Step: 4530   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-01-07 19:32:26,411-Speed 5517.41 samples/sec   Loss 27.7005   LearningRate 0.1314   Epoch: 0   Global Step: 4540   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-01-07 19:32:33,851-Speed 5506.45 samples/sec   Loss 27.5878   LearningRate 0.1316   Epoch: 0   Global Step: 4550   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-01-07 19:32:41,288-Speed 5508.62 samples/sec   Loss 27.6981   LearningRate 0.1319   Epoch: 0   Global Step: 4560   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-01-07 19:32:48,442-Speed 5726.58 samples/sec   Loss 27.5588   LearningRate 0.1322   Epoch: 0   Global Step: 4570   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-01-07 19:32:55,805-Speed 5564.56 samples/sec   Loss 27.5906   LearningRate 0.1325   Epoch: 0   Global Step: 4580   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-01-07 19:33:03,140-Speed 5585.75 samples/sec   Loss 27.4110   LearningRate 0.1328   Epoch: 0   Global Step: 4590   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-01-07 19:33:10,673-Speed 5438.48 samples/sec   Loss 27.3393   LearningRate 0.1331   Epoch: 0   Global Step: 4600   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-01-07 19:33:18,176-Speed 5460.72 samples/sec   Loss 27.2801   LearningRate 0.1334   Epoch: 0   Global Step: 4610   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-01-07 19:33:25,832-Speed 5351.51 samples/sec   Loss 27.2408   LearningRate 0.1337   Epoch: 0   Global Step: 4620   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-01-07 19:33:33,428-Speed 5393.18 samples/sec   Loss 27.3004   LearningRate 0.1340   Epoch: 0   Global Step: 4630   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-01-07 19:33:40,957-Speed 5441.70 samples/sec   Loss 27.2225   LearningRate 0.1342   Epoch: 0   Global Step: 4640   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-01-07 19:33:48,364-Speed 5531.45 samples/sec   Loss 27.0071   LearningRate 0.1345   Epoch: 0   Global Step: 4650   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-01-07 19:33:55,834-Speed 5484.33 samples/sec   Loss 27.0637   LearningRate 0.1348   Epoch: 0   Global Step: 4660   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:34:03,486-Speed 5353.48 samples/sec   Loss 26.9365   LearningRate 0.1351   Epoch: 0   Global Step: 4670   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:34:11,028-Speed 5432.13 samples/sec   Loss 26.9100   LearningRate 0.1354   Epoch: 0   Global Step: 4680   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:34:18,428-Speed 5536.45 samples/sec   Loss 26.7849   LearningRate 0.1357   Epoch: 0   Global Step: 4690   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:34:25,561-Speed 5744.20 samples/sec   Loss 26.8040   LearningRate 0.1360   Epoch: 0   Global Step: 4700   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:34:32,633-Speed 5793.46 samples/sec   Loss 26.8127   LearningRate 0.1363   Epoch: 0   Global Step: 4710   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:34:40,089-Speed 5494.35 samples/sec   Loss 26.7107   LearningRate 0.1366   Epoch: 0   Global Step: 4720   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:34:47,479-Speed 5543.85 samples/sec   Loss 26.5709   LearningRate 0.1369   Epoch: 0   Global Step: 4730   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:34:54,710-Speed 5665.89 samples/sec   Loss 26.5512   LearningRate 0.1371   Epoch: 0   Global Step: 4740   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:35:02,133-Speed 5519.06 samples/sec   Loss 26.5511   LearningRate 0.1374   Epoch: 0   Global Step: 4750   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:35:09,625-Speed 5468.53 samples/sec   Loss 26.4141   LearningRate 0.1377   Epoch: 0   Global Step: 4760   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:35:17,021-Speed 5538.88 samples/sec   Loss 26.3589   LearningRate 0.1380   Epoch: 0   Global Step: 4770   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:35:24,532-Speed 5455.00 samples/sec   Loss 26.2682   LearningRate 0.1383   Epoch: 0   Global Step: 4780   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:35:32,098-Speed 5414.98 samples/sec   Loss 26.2487   LearningRate 0.1386   Epoch: 0   Global Step: 4790   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:35:39,656-Speed 5420.30 samples/sec   Loss 26.1727   LearningRate 0.1389   Epoch: 0   Global Step: 4800   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:35:47,225-Speed 5412.61 samples/sec   Loss 26.2012   LearningRate 0.1392   Epoch: 0   Global Step: 4810   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:35:54,687-Speed 5490.69 samples/sec   Loss 26.1295   LearningRate 0.1395   Epoch: 0   Global Step: 4820   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:36:02,211-Speed 5445.69 samples/sec   Loss 25.9380   LearningRate 0.1397   Epoch: 0   Global Step: 4830   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:36:09,674-Speed 5489.20 samples/sec   Loss 25.9406   LearningRate 0.1400   Epoch: 0   Global Step: 4840   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:36:17,028-Speed 5571.45 samples/sec   Loss 25.9077   LearningRate 0.1403   Epoch: 0   Global Step: 4850   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:36:24,607-Speed 5405.74 samples/sec   Loss 25.8182   LearningRate 0.1406   Epoch: 0   Global Step: 4860   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:36:32,010-Speed 5533.58 samples/sec   Loss 25.8517   LearningRate 0.1409   Epoch: 0   Global Step: 4870   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:36:39,409-Speed 5537.17 samples/sec   Loss 25.6565   LearningRate 0.1412   Epoch: 0   Global Step: 4880   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:36:46,815-Speed 5533.09 samples/sec   Loss 25.6534   LearningRate 0.1415   Epoch: 0   Global Step: 4890   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:36:54,259-Speed 5503.49 samples/sec   Loss 25.6398   LearningRate 0.1418   Epoch: 0   Global Step: 4900   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:37:01,646-Speed 5546.72 samples/sec   Loss 25.5083   LearningRate 0.1421   Epoch: 0   Global Step: 4910   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:37:09,064-Speed 5522.84 samples/sec   Loss 25.4993   LearningRate 0.1423   Epoch: 0   Global Step: 4920   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 19:37:16,458-Speed 5540.99 samples/sec   Loss 25.3682   LearningRate 0.1426   Epoch: 0   Global Step: 4930   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:37:23,963-Speed 5459.06 samples/sec   Loss 25.3217   LearningRate 0.1429   Epoch: 0   Global Step: 4940   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:37:31,385-Speed 5520.12 samples/sec   Loss 25.3297   LearningRate 0.1432   Epoch: 0   Global Step: 4950   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:37:38,849-Speed 5489.03 samples/sec   Loss 25.2991   LearningRate 0.1435   Epoch: 0   Global Step: 4960   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:37:46,273-Speed 5517.62 samples/sec   Loss 25.2393   LearningRate 0.1438   Epoch: 0   Global Step: 4970   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:37:53,675-Speed 5535.76 samples/sec   Loss 25.2393   LearningRate 0.1441   Epoch: 0   Global Step: 4980   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:38:01,171-Speed 5465.85 samples/sec   Loss 25.1180   LearningRate 0.1444   Epoch: 0   Global Step: 4990   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:38:08,552-Speed 5550.03 samples/sec   Loss 25.0682   LearningRate 0.1447   Epoch: 0   Global Step: 5000   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:38:15,893-Speed 5580.86 samples/sec   Loss 24.9828   LearningRate 0.1450   Epoch: 0   Global Step: 5010   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:38:23,362-Speed 5485.44 samples/sec   Loss 24.8543   LearningRate 0.1452   Epoch: 0   Global Step: 5020   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:38:30,757-Speed 5540.19 samples/sec   Loss 24.8137   LearningRate 0.1455   Epoch: 0   Global Step: 5030   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:38:38,505-Speed 5287.88 samples/sec   Loss 24.7575   LearningRate 0.1458   Epoch: 0   Global Step: 5040   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:38:46,013-Speed 5456.35 samples/sec   Loss 24.6868   LearningRate 0.1461   Epoch: 0   Global Step: 5050   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:38:53,330-Speed 5598.75 samples/sec   Loss 24.6542   LearningRate 0.1464   Epoch: 0   Global Step: 5060   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:39:00,591-Speed 5642.72 samples/sec   Loss 24.5479   LearningRate 0.1467   Epoch: 0   Global Step: 5070   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:39:07,885-Speed 5618.63 samples/sec   Loss 24.5193   LearningRate 0.1470   Epoch: 0   Global Step: 5080   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:39:15,433-Speed 5427.36 samples/sec   Loss 24.5077   LearningRate 0.1473   Epoch: 0   Global Step: 5090   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:39:22,822-Speed 5544.02 samples/sec   Loss 24.4262   LearningRate 0.1476   Epoch: 0   Global Step: 5100   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:39:30,263-Speed 5506.49 samples/sec   Loss 24.3795   LearningRate 0.1478   Epoch: 0   Global Step: 5110   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:39:37,665-Speed 5534.55 samples/sec   Loss 24.2670   LearningRate 0.1481   Epoch: 0   Global Step: 5120   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:39:45,113-Speed 5501.61 samples/sec   Loss 24.1709   LearningRate 0.1484   Epoch: 0   Global Step: 5130   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:39:52,542-Speed 5514.73 samples/sec   Loss 24.2716   LearningRate 0.1487   Epoch: 0   Global Step: 5140   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:39:59,908-Speed 5562.20 samples/sec   Loss 24.1719   LearningRate 0.1490   Epoch: 0   Global Step: 5150   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:40:07,308-Speed 5536.33 samples/sec   Loss 24.1014   LearningRate 0.1493   Epoch: 0   Global Step: 5160   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:40:14,729-Speed 5520.84 samples/sec   Loss 23.9848   LearningRate 0.1496   Epoch: 0   Global Step: 5170   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:40:22,127-Speed 5537.75 samples/sec   Loss 24.0386   LearningRate 0.1499   Epoch: 0   Global Step: 5180   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:40:29,526-Speed 5537.25 samples/sec   Loss 23.9327   LearningRate 0.1502   Epoch: 0   Global Step: 5190   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:40:37,007-Speed 5476.74 samples/sec   Loss 23.9090   LearningRate 0.1504   Epoch: 0   Global Step: 5200   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:40:44,445-Speed 5508.20 samples/sec   Loss 23.7552   LearningRate 0.1507   Epoch: 0   Global Step: 5210   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:40:51,863-Speed 5522.80 samples/sec   Loss 23.7235   LearningRate 0.1510   Epoch: 0   Global Step: 5220   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:40:59,335-Speed 5483.25 samples/sec   Loss 23.7381   LearningRate 0.1513   Epoch: 0   Global Step: 5230   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:41:06,939-Speed 5388.41 samples/sec   Loss 23.5935   LearningRate 0.1516   Epoch: 0   Global Step: 5240   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:41:14,069-Speed 5745.48 samples/sec   Loss 23.6460   LearningRate 0.1519   Epoch: 0   Global Step: 5250   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:41:21,057-Speed 5862.88 samples/sec   Loss 23.5765   LearningRate 0.1522   Epoch: 0   Global Step: 5260   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:41:28,446-Speed 5544.82 samples/sec   Loss 23.4556   LearningRate 0.1525   Epoch: 0   Global Step: 5270   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:41:35,870-Speed 5519.23 samples/sec   Loss 23.4387   LearningRate 0.1528   Epoch: 0   Global Step: 5280   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:41:43,312-Speed 5504.90 samples/sec   Loss 23.3230   LearningRate 0.1531   Epoch: 0   Global Step: 5290   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:41:50,727-Speed 5525.47 samples/sec   Loss 23.3808   LearningRate 0.1533   Epoch: 0   Global Step: 5300   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:41:58,153-Speed 5517.09 samples/sec   Loss 23.3308   LearningRate 0.1536   Epoch: 0   Global Step: 5310   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:42:05,585-Speed 5512.53 samples/sec   Loss 23.3077   LearningRate 0.1539   Epoch: 0   Global Step: 5320   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:42:13,065-Speed 5477.38 samples/sec   Loss 23.1330   LearningRate 0.1542   Epoch: 0   Global Step: 5330   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:42:20,625-Speed 5419.06 samples/sec   Loss 23.0965   LearningRate 0.1545   Epoch: 0   Global Step: 5340   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:42:27,921-Speed 5615.21 samples/sec   Loss 23.1013   LearningRate 0.1548   Epoch: 0   Global Step: 5350   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:42:35,309-Speed 5545.43 samples/sec   Loss 23.0804   LearningRate 0.1551   Epoch: 0   Global Step: 5360   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:42:42,806-Speed 5465.21 samples/sec   Loss 22.8829   LearningRate 0.1554   Epoch: 0   Global Step: 5370   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:42:50,327-Speed 5446.26 samples/sec   Loss 22.8602   LearningRate 0.1557   Epoch: 0   Global Step: 5380   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:42:57,860-Speed 5438.86 samples/sec   Loss 22.8181   LearningRate 0.1559   Epoch: 0   Global Step: 5390   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:43:05,281-Speed 5520.73 samples/sec   Loss 22.7181   LearningRate 0.1562   Epoch: 0   Global Step: 5400   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:43:12,745-Speed 5489.17 samples/sec   Loss 22.6841   LearningRate 0.1565   Epoch: 0   Global Step: 5410   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:43:20,275-Speed 5440.74 samples/sec   Loss 22.6541   LearningRate 0.1568   Epoch: 0   Global Step: 5420   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:43:27,848-Speed 5410.18 samples/sec   Loss 22.6078   LearningRate 0.1571   Epoch: 0   Global Step: 5430   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:43:35,289-Speed 5505.38 samples/sec   Loss 22.6401   LearningRate 0.1574   Epoch: 0   Global Step: 5440   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 19:43:42,701-Speed 5527.59 samples/sec   Loss 22.6112   LearningRate 0.1577   Epoch: 0   Global Step: 5450   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:43:50,153-Speed 5497.48 samples/sec   Loss 22.4965   LearningRate 0.1580   Epoch: 0   Global Step: 5460   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:43:57,587-Speed 5511.09 samples/sec   Loss 22.4275   LearningRate 0.1583   Epoch: 0   Global Step: 5470   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:44:05,276-Speed 5327.69 samples/sec   Loss 22.3580   LearningRate 0.1585   Epoch: 0   Global Step: 5480   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:44:12,761-Speed 5474.35 samples/sec   Loss 22.3150   LearningRate 0.1588   Epoch: 0   Global Step: 5490   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:44:20,216-Speed 5494.80 samples/sec   Loss 22.3175   LearningRate 0.1591   Epoch: 0   Global Step: 5500   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:44:27,717-Speed 5462.19 samples/sec   Loss 22.2088   LearningRate 0.1594   Epoch: 0   Global Step: 5510   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:44:35,185-Speed 5485.85 samples/sec   Loss 22.2189   LearningRate 0.1597   Epoch: 0   Global Step: 5520   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:44:42,593-Speed 5530.62 samples/sec   Loss 22.1567   LearningRate 0.1600   Epoch: 0   Global Step: 5530   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:44:50,026-Speed 5511.83 samples/sec   Loss 22.0710   LearningRate 0.1603   Epoch: 0   Global Step: 5540   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:44:57,471-Speed 5502.73 samples/sec   Loss 21.9900   LearningRate 0.1606   Epoch: 0   Global Step: 5550   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:45:05,117-Speed 5358.04 samples/sec   Loss 21.9654   LearningRate 0.1609   Epoch: 0   Global Step: 5560   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:45:12,773-Speed 5351.36 samples/sec   Loss 21.9420   LearningRate 0.1612   Epoch: 0   Global Step: 5570   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:45:20,401-Speed 5371.07 samples/sec   Loss 21.8924   LearningRate 0.1614   Epoch: 0   Global Step: 5580   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:45:28,145-Speed 5290.01 samples/sec   Loss 21.8744   LearningRate 0.1617   Epoch: 0   Global Step: 5590   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:45:35,693-Speed 5427.32 samples/sec   Loss 21.8290   LearningRate 0.1620   Epoch: 0   Global Step: 5600   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:45:43,234-Speed 5432.68 samples/sec   Loss 21.7355   LearningRate 0.1623   Epoch: 0   Global Step: 5610   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:45:50,788-Speed 5423.82 samples/sec   Loss 21.6304   LearningRate 0.1626   Epoch: 0   Global Step: 5620   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:45:58,302-Speed 5452.49 samples/sec   Loss 21.6079   LearningRate 0.1629   Epoch: 0   Global Step: 5630   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:46:05,884-Speed 5403.08 samples/sec   Loss 21.5606   LearningRate 0.1632   Epoch: 0   Global Step: 5640   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:46:13,535-Speed 5355.15 samples/sec   Loss 21.5759   LearningRate 0.1635   Epoch: 0   Global Step: 5650   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 19:46:21,072-Speed 5435.87 samples/sec   Loss 21.4573   LearningRate 0.1638   Epoch: 0   Global Step: 5660   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:46:28,572-Speed 5462.84 samples/sec   Loss 21.5045   LearningRate 0.1640   Epoch: 0   Global Step: 5670   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:46:36,083-Speed 5454.32 samples/sec   Loss 21.4279   LearningRate 0.1643   Epoch: 0   Global Step: 5680   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:46:43,599-Speed 5452.77 samples/sec   Loss 21.3697   LearningRate 0.1646   Epoch: 0   Global Step: 5690   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:46:51,095-Speed 5465.96 samples/sec   Loss 21.1908   LearningRate 0.1649   Epoch: 0   Global Step: 5700   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:46:58,451-Speed 5570.01 samples/sec   Loss 21.1978   LearningRate 0.1652   Epoch: 0   Global Step: 5710   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:47:05,924-Speed 5482.41 samples/sec   Loss 21.2516   LearningRate 0.1655   Epoch: 0   Global Step: 5720   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:47:13,497-Speed 5410.23 samples/sec   Loss 21.2353   LearningRate 0.1658   Epoch: 0   Global Step: 5730   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:47:21,089-Speed 5395.58 samples/sec   Loss 21.0360   LearningRate 0.1661   Epoch: 0   Global Step: 5740   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:47:29,014-Speed 5169.55 samples/sec   Loss 21.0284   LearningRate 0.1664   Epoch: 0   Global Step: 5750   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:47:36,573-Speed 5420.33 samples/sec   Loss 21.0027   LearningRate 0.1667   Epoch: 0   Global Step: 5760   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:47:44,130-Speed 5421.73 samples/sec   Loss 21.0548   LearningRate 0.1669   Epoch: 0   Global Step: 5770   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:47:51,708-Speed 5405.96 samples/sec   Loss 20.9004   LearningRate 0.1672   Epoch: 0   Global Step: 5780   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:47:59,261-Speed 5424.70 samples/sec   Loss 20.8589   LearningRate 0.1675   Epoch: 0   Global Step: 5790   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:48:06,858-Speed 5393.32 samples/sec   Loss 20.8797   LearningRate 0.1678   Epoch: 0   Global Step: 5800   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:48:14,484-Speed 5372.62 samples/sec   Loss 20.8691   LearningRate 0.1681   Epoch: 0   Global Step: 5810   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:48:22,161-Speed 5336.18 samples/sec   Loss 20.7739   LearningRate 0.1684   Epoch: 0   Global Step: 5820   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:48:29,850-Speed 5328.24 samples/sec   Loss 20.9255   LearningRate 0.1687   Epoch: 0   Global Step: 5830   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:48:37,446-Speed 5393.18 samples/sec   Loss 20.7084   LearningRate 0.1690   Epoch: 0   Global Step: 5840   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:48:44,956-Speed 5455.30 samples/sec   Loss 20.6774   LearningRate 0.1693   Epoch: 0   Global Step: 5850   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:48:52,435-Speed 5477.85 samples/sec   Loss 20.6532   LearningRate 0.1695   Epoch: 0   Global Step: 5860   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 19:48:59,884-Speed 5498.96 samples/sec   Loss 20.6166   LearningRate 0.1698   Epoch: 0   Global Step: 5870   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 19:49:07,386-Speed 5460.25 samples/sec   Loss 20.4948   LearningRate 0.1701   Epoch: 0   Global Step: 5880   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 19:49:14,872-Speed 5473.09 samples/sec   Loss 20.5017   LearningRate 0.1704   Epoch: 0   Global Step: 5890   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:49:22,336-Speed 5488.52 samples/sec   Loss 20.5409   LearningRate 0.1707   Epoch: 0   Global Step: 5900   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:49:29,790-Speed 5495.21 samples/sec   Loss 20.3403   LearningRate 0.1710   Epoch: 0   Global Step: 5910   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:49:37,244-Speed 5495.28 samples/sec   Loss 20.3292   LearningRate 0.1713   Epoch: 0   Global Step: 5920   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:49:44,805-Speed 5418.88 samples/sec   Loss 20.2235   LearningRate 0.1716   Epoch: 0   Global Step: 5930   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:49:52,361-Speed 5421.42 samples/sec   Loss 20.1557   LearningRate 0.1719   Epoch: 0   Global Step: 5940   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:49:59,827-Speed 5486.74 samples/sec   Loss 20.2184   LearningRate 0.1721   Epoch: 0   Global Step: 5950   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:50:07,295-Speed 5485.10 samples/sec   Loss 20.1251   LearningRate 0.1724   Epoch: 0   Global Step: 5960   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:50:14,744-Speed 5499.61 samples/sec   Loss 20.0960   LearningRate 0.1727   Epoch: 0   Global Step: 5970   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:50:22,182-Speed 5507.92 samples/sec   Loss 20.1102   LearningRate 0.1730   Epoch: 0   Global Step: 5980   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:50:29,649-Speed 5486.20 samples/sec   Loss 20.0471   LearningRate 0.1733   Epoch: 0   Global Step: 5990   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 19:50:37,077-Speed 5514.49 samples/sec   Loss 20.0268   LearningRate 0.1736   Epoch: 0   Global Step: 6000   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 19:51:22,551-[lfw][6000]XNorm: 24.317117
Training: 2022-01-07 19:51:22,552-[lfw][6000]Accuracy-Flip: 0.99367+-0.00407
Training: 2022-01-07 19:51:22,552-[lfw][6000]Accuracy-Highest: 0.99367
Training: 2022-01-07 19:52:15,361-[cfp_fp][6000]XNorm: 22.341087
Training: 2022-01-07 19:52:15,362-[cfp_fp][6000]Accuracy-Flip: 0.94243+-0.00780
Training: 2022-01-07 19:52:15,363-[cfp_fp][6000]Accuracy-Highest: 0.94243
Training: 2022-01-07 19:53:00,944-[agedb_30][6000]XNorm: 23.721091
Training: 2022-01-07 19:53:00,945-[agedb_30][6000]Accuracy-Flip: 0.92217+-0.01402
Training: 2022-01-07 19:53:00,945-[agedb_30][6000]Accuracy-Highest: 0.92217
Training: 2022-01-07 19:53:08,512-Speed 270.48 samples/sec   Loss 20.0161   LearningRate 0.1739   Epoch: 0   Global Step: 6010   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:53:16,052-Speed 5433.82 samples/sec   Loss 19.9685   LearningRate 0.1742   Epoch: 0   Global Step: 6020   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:53:23,504-Speed 5497.55 samples/sec   Loss 19.9108   LearningRate 0.1745   Epoch: 0   Global Step: 6030   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:53:31,027-Speed 5446.30 samples/sec   Loss 19.8129   LearningRate 0.1748   Epoch: 0   Global Step: 6040   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:53:38,469-Speed 5505.22 samples/sec   Loss 19.8113   LearningRate 0.1750   Epoch: 0   Global Step: 6050   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:53:45,946-Speed 5479.55 samples/sec   Loss 19.7808   LearningRate 0.1753   Epoch: 0   Global Step: 6060   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:53:53,489-Speed 5430.89 samples/sec   Loss 19.7104   LearningRate 0.1756   Epoch: 0   Global Step: 6070   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:54:01,007-Speed 5449.36 samples/sec   Loss 19.6467   LearningRate 0.1759   Epoch: 0   Global Step: 6080   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:54:08,500-Speed 5467.45 samples/sec   Loss 19.7205   LearningRate 0.1762   Epoch: 0   Global Step: 6090   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:54:15,958-Speed 5493.27 samples/sec   Loss 19.6130   LearningRate 0.1765   Epoch: 0   Global Step: 6100   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-01-07 19:54:23,420-Speed 5490.52 samples/sec   Loss 19.5520   LearningRate 0.1768   Epoch: 0   Global Step: 6110   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-01-07 19:54:30,882-Speed 5490.34 samples/sec   Loss 19.4433   LearningRate 0.1771   Epoch: 0   Global Step: 6120   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-01-07 19:54:38,075-Speed 5696.53 samples/sec   Loss 19.4297   LearningRate 0.1774   Epoch: 0   Global Step: 6130   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-01-07 19:54:45,565-Speed 5469.56 samples/sec   Loss 19.4453   LearningRate 0.1776   Epoch: 0   Global Step: 6140   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-01-07 19:54:53,112-Speed 5429.36 samples/sec   Loss 19.4192   LearningRate 0.1779   Epoch: 0   Global Step: 6150   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:55:00,630-Speed 5448.98 samples/sec   Loss 19.4534   LearningRate 0.1782   Epoch: 0   Global Step: 6160   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:55:08,063-Speed 5512.02 samples/sec   Loss 19.3592   LearningRate 0.1785   Epoch: 0   Global Step: 6170   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:55:15,520-Speed 5493.98 samples/sec   Loss 19.3500   LearningRate 0.1788   Epoch: 0   Global Step: 6180   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:55:23,046-Speed 5443.93 samples/sec   Loss 19.3020   LearningRate 0.1791   Epoch: 0   Global Step: 6190   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:55:30,549-Speed 5460.38 samples/sec   Loss 19.2346   LearningRate 0.1794   Epoch: 0   Global Step: 6200   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:55:38,119-Speed 5412.24 samples/sec   Loss 19.1614   LearningRate 0.1797   Epoch: 0   Global Step: 6210   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:55:45,592-Speed 5481.86 samples/sec   Loss 19.1560   LearningRate 0.1800   Epoch: 0   Global Step: 6220   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:55:53,006-Speed 5526.01 samples/sec   Loss 19.1178   LearningRate 0.1802   Epoch: 0   Global Step: 6230   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:56:00,468-Speed 5491.14 samples/sec   Loss 19.0824   LearningRate 0.1805   Epoch: 0   Global Step: 6240   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:56:07,912-Speed 5503.58 samples/sec   Loss 19.1459   LearningRate 0.1808   Epoch: 0   Global Step: 6250   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-01-07 19:56:15,323-Speed 5528.24 samples/sec   Loss 18.9952   LearningRate 0.1811   Epoch: 0   Global Step: 6260   Fp16 Grad Scale: 262144   Required: 46 hours
Training: 2022-01-07 19:56:22,782-Speed 5492.51 samples/sec   Loss 19.0514   LearningRate 0.1814   Epoch: 0   Global Step: 6270   Fp16 Grad Scale: 131072   Required: 46 hours
Training: 2022-01-07 19:56:30,286-Speed 5459.46 samples/sec   Loss 18.9426   LearningRate 0.1817   Epoch: 0   Global Step: 6280   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-01-07 19:56:37,774-Speed 5471.94 samples/sec   Loss 18.9009   LearningRate 0.1820   Epoch: 0   Global Step: 6290   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-01-07 19:56:45,315-Speed 5432.68 samples/sec   Loss 18.8152   LearningRate 0.1823   Epoch: 0   Global Step: 6300   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-01-07 19:56:52,953-Speed 5363.83 samples/sec   Loss 18.8247   LearningRate 0.1826   Epoch: 0   Global Step: 6310   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-01-07 19:57:00,321-Speed 5560.76 samples/sec   Loss 18.8242   LearningRate 0.1829   Epoch: 0   Global Step: 6320   Fp16 Grad Scale: 65536   Required: 46 hours
Training: 2022-01-07 19:57:07,811-Speed 5469.71 samples/sec   Loss 18.8048   LearningRate 0.1831   Epoch: 0   Global Step: 6330   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:57:15,290-Speed 5477.33 samples/sec   Loss 18.7400   LearningRate 0.1834   Epoch: 0   Global Step: 6340   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:57:22,969-Speed 5334.86 samples/sec   Loss 18.6827   LearningRate 0.1837   Epoch: 0   Global Step: 6350   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:57:30,598-Speed 5370.14 samples/sec   Loss 18.6426   LearningRate 0.1840   Epoch: 0   Global Step: 6360   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:57:38,031-Speed 5512.68 samples/sec   Loss 18.6247   LearningRate 0.1843   Epoch: 0   Global Step: 6370   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 19:57:45,173-Speed 5736.11 samples/sec   Loss 18.5976   LearningRate 0.1846   Epoch: 0   Global Step: 6380   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:57:52,617-Speed 5503.44 samples/sec   Loss 18.4729   LearningRate 0.1849   Epoch: 0   Global Step: 6390   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:58:00,120-Speed 5460.47 samples/sec   Loss 18.5630   LearningRate 0.1852   Epoch: 0   Global Step: 6400   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:58:07,565-Speed 5502.31 samples/sec   Loss 18.3917   LearningRate 0.1855   Epoch: 0   Global Step: 6410   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:58:15,041-Speed 5480.32 samples/sec   Loss 18.5088   LearningRate 0.1857   Epoch: 0   Global Step: 6420   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:58:22,538-Speed 5464.53 samples/sec   Loss 18.3972   LearningRate 0.1860   Epoch: 0   Global Step: 6430   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:58:30,002-Speed 5488.76 samples/sec   Loss 18.3619   LearningRate 0.1863   Epoch: 0   Global Step: 6440   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:58:37,479-Speed 5479.33 samples/sec   Loss 18.4116   LearningRate 0.1866   Epoch: 0   Global Step: 6450   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:58:44,791-Speed 5603.31 samples/sec   Loss 18.3470   LearningRate 0.1869   Epoch: 0   Global Step: 6460   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:58:51,949-Speed 5723.78 samples/sec   Loss 18.3289   LearningRate 0.1872   Epoch: 0   Global Step: 6470   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:58:59,425-Speed 5479.90 samples/sec   Loss 18.2687   LearningRate 0.1875   Epoch: 0   Global Step: 6480   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 19:59:06,848-Speed 5518.99 samples/sec   Loss 18.2842   LearningRate 0.1878   Epoch: 0   Global Step: 6490   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:59:14,009-Speed 5721.27 samples/sec   Loss 18.2267   LearningRate 0.1881   Epoch: 0   Global Step: 6500   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:59:21,414-Speed 5532.41 samples/sec   Loss 18.0968   LearningRate 0.1883   Epoch: 0   Global Step: 6510   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:59:28,986-Speed 5410.99 samples/sec   Loss 18.2099   LearningRate 0.1886   Epoch: 0   Global Step: 6520   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:59:36,525-Speed 5434.11 samples/sec   Loss 18.0042   LearningRate 0.1889   Epoch: 0   Global Step: 6530   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:59:44,002-Speed 5479.72 samples/sec   Loss 17.9923   LearningRate 0.1892   Epoch: 0   Global Step: 6540   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:59:51,450-Speed 5500.26 samples/sec   Loss 18.0607   LearningRate 0.1895   Epoch: 0   Global Step: 6550   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 19:59:58,908-Speed 5493.45 samples/sec   Loss 18.0829   LearningRate 0.1898   Epoch: 0   Global Step: 6560   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:00:06,397-Speed 5470.55 samples/sec   Loss 17.9495   LearningRate 0.1901   Epoch: 0   Global Step: 6570   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:00:13,861-Speed 5489.01 samples/sec   Loss 17.9123   LearningRate 0.1904   Epoch: 0   Global Step: 6580   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:00:21,419-Speed 5420.15 samples/sec   Loss 17.8779   LearningRate 0.1907   Epoch: 0   Global Step: 6590   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:00:28,905-Speed 5472.57 samples/sec   Loss 17.8210   LearningRate 0.1910   Epoch: 0   Global Step: 6600   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:00:36,428-Speed 5446.14 samples/sec   Loss 17.8541   LearningRate 0.1912   Epoch: 0   Global Step: 6610   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:00:43,921-Speed 5467.86 samples/sec   Loss 17.9578   LearningRate 0.1915   Epoch: 0   Global Step: 6620   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:00:51,543-Speed 5375.32 samples/sec   Loss 17.8066   LearningRate 0.1918   Epoch: 0   Global Step: 6630   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:00:59,173-Speed 5368.87 samples/sec   Loss 17.8634   LearningRate 0.1921   Epoch: 0   Global Step: 6640   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:01:06,549-Speed 5554.44 samples/sec   Loss 17.8018   LearningRate 0.1924   Epoch: 0   Global Step: 6650   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:01:14,042-Speed 5468.07 samples/sec   Loss 17.6944   LearningRate 0.1927   Epoch: 0   Global Step: 6660   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:01:21,416-Speed 5557.06 samples/sec   Loss 17.7702   LearningRate 0.1930   Epoch: 0   Global Step: 6670   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:01:28,819-Speed 5533.69 samples/sec   Loss 17.6261   LearningRate 0.1933   Epoch: 0   Global Step: 6680   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:01:36,285-Speed 5487.61 samples/sec   Loss 17.5719   LearningRate 0.1936   Epoch: 0   Global Step: 6690   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:01:43,637-Speed 5572.87 samples/sec   Loss 17.5273   LearningRate 0.1938   Epoch: 0   Global Step: 6700   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:01:50,951-Speed 5600.96 samples/sec   Loss 17.6439   LearningRate 0.1941   Epoch: 0   Global Step: 6710   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:01:58,365-Speed 5525.59 samples/sec   Loss 17.5789   LearningRate 0.1944   Epoch: 0   Global Step: 6720   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:02:05,775-Speed 5528.71 samples/sec   Loss 17.5115   LearningRate 0.1947   Epoch: 0   Global Step: 6730   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:02:13,195-Speed 5521.24 samples/sec   Loss 17.5467   LearningRate 0.1950   Epoch: 0   Global Step: 6740   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:02:20,671-Speed 5480.59 samples/sec   Loss 17.5204   LearningRate 0.1953   Epoch: 0   Global Step: 6750   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:02:28,080-Speed 5529.14 samples/sec   Loss 17.5821   LearningRate 0.1956   Epoch: 0   Global Step: 6760   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:02:35,502-Speed 5520.55 samples/sec   Loss 17.4434   LearningRate 0.1959   Epoch: 0   Global Step: 6770   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:02:42,935-Speed 5511.43 samples/sec   Loss 17.4371   LearningRate 0.1962   Epoch: 0   Global Step: 6780   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:02:50,421-Speed 5473.16 samples/sec   Loss 17.3750   LearningRate 0.1965   Epoch: 0   Global Step: 6790   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:02:57,878-Speed 5493.33 samples/sec   Loss 17.2812   LearningRate 0.1967   Epoch: 0   Global Step: 6800   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:03:05,470-Speed 5396.57 samples/sec   Loss 17.2558   LearningRate 0.1970   Epoch: 0   Global Step: 6810   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:03:12,876-Speed 5532.02 samples/sec   Loss 17.3264   LearningRate 0.1973   Epoch: 0   Global Step: 6820   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:03:20,359-Speed 5474.41 samples/sec   Loss 17.3109   LearningRate 0.1976   Epoch: 0   Global Step: 6830   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:03:27,820-Speed 5491.46 samples/sec   Loss 17.2885   LearningRate 0.1979   Epoch: 0   Global Step: 6840   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:03:35,424-Speed 5387.96 samples/sec   Loss 17.2359   LearningRate 0.1982   Epoch: 0   Global Step: 6850   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:03:42,959-Speed 5436.59 samples/sec   Loss 17.1097   LearningRate 0.1985   Epoch: 0   Global Step: 6860   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:03:50,460-Speed 5461.90 samples/sec   Loss 17.0890   LearningRate 0.1988   Epoch: 0   Global Step: 6870   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:03:57,938-Speed 5478.73 samples/sec   Loss 17.1852   LearningRate 0.1991   Epoch: 0   Global Step: 6880   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:04:05,542-Speed 5387.47 samples/sec   Loss 17.1903   LearningRate 0.1993   Epoch: 0   Global Step: 6890   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:04:13,136-Speed 5395.42 samples/sec   Loss 17.0419   LearningRate 0.1996   Epoch: 0   Global Step: 6900   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:04:20,596-Speed 5492.05 samples/sec   Loss 17.0394   LearningRate 0.1999   Epoch: 0   Global Step: 6910   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:04:28,021-Speed 5517.17 samples/sec   Loss 16.9957   LearningRate 0.2002   Epoch: 0   Global Step: 6920   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:04:35,465-Speed 5503.79 samples/sec   Loss 16.9898   LearningRate 0.2005   Epoch: 0   Global Step: 6930   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:04:42,960-Speed 5466.26 samples/sec   Loss 17.0505   LearningRate 0.2008   Epoch: 0   Global Step: 6940   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:04:50,352-Speed 5542.08 samples/sec   Loss 16.8923   LearningRate 0.2011   Epoch: 0   Global Step: 6950   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:04:57,861-Speed 5455.49 samples/sec   Loss 17.0299   LearningRate 0.2014   Epoch: 0   Global Step: 6960   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:05:05,290-Speed 5514.40 samples/sec   Loss 16.9096   LearningRate 0.2017   Epoch: 0   Global Step: 6970   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:05:12,513-Speed 5672.02 samples/sec   Loss 16.9186   LearningRate 0.2019   Epoch: 0   Global Step: 6980   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:05:19,829-Speed 5600.53 samples/sec   Loss 16.7981   LearningRate 0.2022   Epoch: 0   Global Step: 6990   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:05:27,273-Speed 5503.69 samples/sec   Loss 16.8215   LearningRate 0.2025   Epoch: 0   Global Step: 7000   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:05:34,691-Speed 5522.37 samples/sec   Loss 16.8175   LearningRate 0.2028   Epoch: 0   Global Step: 7010   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:05:42,135-Speed 5504.06 samples/sec   Loss 16.7989   LearningRate 0.2031   Epoch: 0   Global Step: 7020   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:05:49,663-Speed 5442.08 samples/sec   Loss 16.7380   LearningRate 0.2034   Epoch: 0   Global Step: 7030   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:05:57,067-Speed 5533.09 samples/sec   Loss 16.7160   LearningRate 0.2037   Epoch: 0   Global Step: 7040   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:06:04,803-Speed 5296.18 samples/sec   Loss 16.7093   LearningRate 0.2040   Epoch: 0   Global Step: 7050   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:06:12,201-Speed 5537.45 samples/sec   Loss 16.6190   LearningRate 0.2043   Epoch: 0   Global Step: 7060   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:06:19,869-Speed 5342.51 samples/sec   Loss 16.6783   LearningRate 0.2046   Epoch: 0   Global Step: 7070   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:06:27,298-Speed 5515.06 samples/sec   Loss 16.7052   LearningRate 0.2048   Epoch: 0   Global Step: 7080   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:06:34,730-Speed 5512.39 samples/sec   Loss 16.6336   LearningRate 0.2051   Epoch: 0   Global Step: 7090   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:06:42,257-Speed 5442.47 samples/sec   Loss 16.6597   LearningRate 0.2054   Epoch: 0   Global Step: 7100   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:06:49,877-Speed 5376.93 samples/sec   Loss 16.6475   LearningRate 0.2057   Epoch: 0   Global Step: 7110   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:06:57,267-Speed 5543.45 samples/sec   Loss 16.4910   LearningRate 0.2060   Epoch: 0   Global Step: 7120   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:07:04,726-Speed 5492.99 samples/sec   Loss 16.5009   LearningRate 0.2063   Epoch: 0   Global Step: 7130   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:07:11,782-Speed 5805.82 samples/sec   Loss 16.4465   LearningRate 0.2066   Epoch: 0   Global Step: 7140   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:07:18,904-Speed 5752.30 samples/sec   Loss 16.5654   LearningRate 0.2069   Epoch: 0   Global Step: 7150   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:07:26,447-Speed 5431.79 samples/sec   Loss 16.4001   LearningRate 0.2072   Epoch: 0   Global Step: 7160   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:07:33,857-Speed 5528.79 samples/sec   Loss 16.4053   LearningRate 0.2074   Epoch: 0   Global Step: 7170   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:07:41,308-Speed 5498.47 samples/sec   Loss 16.4448   LearningRate 0.2077   Epoch: 0   Global Step: 7180   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:07:48,735-Speed 5515.68 samples/sec   Loss 16.4301   LearningRate 0.2080   Epoch: 0   Global Step: 7190   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:07:56,135-Speed 5536.88 samples/sec   Loss 16.3543   LearningRate 0.2083   Epoch: 0   Global Step: 7200   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:08:03,582-Speed 5501.74 samples/sec   Loss 16.4164   LearningRate 0.2086   Epoch: 0   Global Step: 7210   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:08:11,084-Speed 5461.05 samples/sec   Loss 16.3323   LearningRate 0.2089   Epoch: 0   Global Step: 7220   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:08:18,598-Speed 5452.38 samples/sec   Loss 16.3451   LearningRate 0.2092   Epoch: 0   Global Step: 7230   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:08:26,144-Speed 5429.17 samples/sec   Loss 16.2984   LearningRate 0.2095   Epoch: 0   Global Step: 7240   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:08:33,604-Speed 5492.55 samples/sec   Loss 16.2480   LearningRate 0.2098   Epoch: 0   Global Step: 7250   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:08:41,090-Speed 5472.88 samples/sec   Loss 16.2566   LearningRate 0.2100   Epoch: 0   Global Step: 7260   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:08:48,595-Speed 5458.32 samples/sec   Loss 16.3576   LearningRate 0.2103   Epoch: 0   Global Step: 7270   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:08:56,549-Speed 5151.16 samples/sec   Loss 16.2728   LearningRate 0.2106   Epoch: 0   Global Step: 7280   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:09:04,077-Speed 5442.78 samples/sec   Loss 16.2869   LearningRate 0.2109   Epoch: 0   Global Step: 7290   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:09:11,501-Speed 5517.70 samples/sec   Loss 16.2606   LearningRate 0.2112   Epoch: 0   Global Step: 7300   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:09:19,059-Speed 5420.56 samples/sec   Loss 16.2300   LearningRate 0.2115   Epoch: 0   Global Step: 7310   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:09:26,802-Speed 5292.04 samples/sec   Loss 16.1754   LearningRate 0.2118   Epoch: 0   Global Step: 7320   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:09:34,422-Speed 5376.85 samples/sec   Loss 16.1130   LearningRate 0.2121   Epoch: 0   Global Step: 7330   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:09:41,596-Speed 5710.28 samples/sec   Loss 16.2005   LearningRate 0.2124   Epoch: 0   Global Step: 7340   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:09:49,189-Speed 5395.55 samples/sec   Loss 16.1524   LearningRate 0.2127   Epoch: 0   Global Step: 7350   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:09:56,594-Speed 5532.96 samples/sec   Loss 16.0834   LearningRate 0.2129   Epoch: 0   Global Step: 7360   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:10:04,149-Speed 5422.95 samples/sec   Loss 16.0488   LearningRate 0.2132   Epoch: 0   Global Step: 7370   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:10:11,587-Speed 5507.26 samples/sec   Loss 16.0058   LearningRate 0.2135   Epoch: 0   Global Step: 7380   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:10:19,010-Speed 5519.79 samples/sec   Loss 16.0627   LearningRate 0.2138   Epoch: 0   Global Step: 7390   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:10:26,466-Speed 5494.91 samples/sec   Loss 16.0630   LearningRate 0.2141   Epoch: 0   Global Step: 7400   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:10:33,850-Speed 5548.77 samples/sec   Loss 16.0269   LearningRate 0.2144   Epoch: 0   Global Step: 7410   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:10:41,415-Speed 5414.84 samples/sec   Loss 16.0382   LearningRate 0.2147   Epoch: 0   Global Step: 7420   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:10:49,014-Speed 5391.83 samples/sec   Loss 15.9221   LearningRate 0.2150   Epoch: 0   Global Step: 7430   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:10:56,723-Speed 5315.10 samples/sec   Loss 16.0316   LearningRate 0.2153   Epoch: 0   Global Step: 7440   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:11:04,290-Speed 5413.77 samples/sec   Loss 15.9327   LearningRate 0.2155   Epoch: 0   Global Step: 7450   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:11:11,845-Speed 5422.77 samples/sec   Loss 15.8868   LearningRate 0.2158   Epoch: 0   Global Step: 7460   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:11:19,344-Speed 5463.14 samples/sec   Loss 15.8266   LearningRate 0.2161   Epoch: 0   Global Step: 7470   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:11:26,893-Speed 5427.47 samples/sec   Loss 15.7827   LearningRate 0.2164   Epoch: 0   Global Step: 7480   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:11:34,385-Speed 5468.68 samples/sec   Loss 15.7641   LearningRate 0.2167   Epoch: 0   Global Step: 7490   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:11:41,919-Speed 5437.35 samples/sec   Loss 15.8835   LearningRate 0.2170   Epoch: 0   Global Step: 7500   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:11:49,450-Speed 5439.96 samples/sec   Loss 15.8785   LearningRate 0.2173   Epoch: 0   Global Step: 7510   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:11:56,793-Speed 5580.09 samples/sec   Loss 15.7159   LearningRate 0.2176   Epoch: 0   Global Step: 7520   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:12:04,406-Speed 5381.65 samples/sec   Loss 15.8402   LearningRate 0.2179   Epoch: 0   Global Step: 7530   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:12:11,885-Speed 5477.65 samples/sec   Loss 15.7591   LearningRate 0.2182   Epoch: 0   Global Step: 7540   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:12:19,295-Speed 5529.59 samples/sec   Loss 15.7267   LearningRate 0.2184   Epoch: 0   Global Step: 7550   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:12:26,693-Speed 5537.96 samples/sec   Loss 15.8257   LearningRate 0.2187   Epoch: 0   Global Step: 7560   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:12:34,156-Speed 5490.20 samples/sec   Loss 15.6745   LearningRate 0.2190   Epoch: 0   Global Step: 7570   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:12:41,637-Speed 5475.71 samples/sec   Loss 15.7864   LearningRate 0.2193   Epoch: 0   Global Step: 7580   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:12:49,094-Speed 5494.09 samples/sec   Loss 15.7427   LearningRate 0.2196   Epoch: 0   Global Step: 7590   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:12:56,551-Speed 5494.82 samples/sec   Loss 15.6878   LearningRate 0.2199   Epoch: 0   Global Step: 7600   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:13:03,991-Speed 5507.02 samples/sec   Loss 15.6863   LearningRate 0.2202   Epoch: 0   Global Step: 7610   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:13:11,416-Speed 5517.06 samples/sec   Loss 15.6490   LearningRate 0.2205   Epoch: 0   Global Step: 7620   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:13:18,874-Speed 5493.15 samples/sec   Loss 15.6776   LearningRate 0.2208   Epoch: 0   Global Step: 7630   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:13:26,439-Speed 5415.28 samples/sec   Loss 15.6791   LearningRate 0.2210   Epoch: 0   Global Step: 7640   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:13:34,134-Speed 5324.08 samples/sec   Loss 15.6689   LearningRate 0.2213   Epoch: 0   Global Step: 7650   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:13:41,582-Speed 5501.28 samples/sec   Loss 15.5300   LearningRate 0.2216   Epoch: 0   Global Step: 7660   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:13:49,199-Speed 5377.78 samples/sec   Loss 15.5036   LearningRate 0.2219   Epoch: 0   Global Step: 7670   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:13:56,731-Speed 5439.24 samples/sec   Loss 15.5316   LearningRate 0.2222   Epoch: 0   Global Step: 7680   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:14:04,142-Speed 5527.90 samples/sec   Loss 15.5402   LearningRate 0.2225   Epoch: 0   Global Step: 7690   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:14:11,568-Speed 5517.20 samples/sec   Loss 15.4393   LearningRate 0.2228   Epoch: 0   Global Step: 7700   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:14:19,038-Speed 5484.03 samples/sec   Loss 15.4199   LearningRate 0.2231   Epoch: 0   Global Step: 7710   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:14:26,455-Speed 5523.46 samples/sec   Loss 15.5028   LearningRate 0.2234   Epoch: 0   Global Step: 7720   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:14:33,970-Speed 5452.08 samples/sec   Loss 15.4933   LearningRate 0.2236   Epoch: 0   Global Step: 7730   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:14:41,015-Speed 5815.72 samples/sec   Loss 15.4717   LearningRate 0.2239   Epoch: 0   Global Step: 7740   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:14:48,384-Speed 5559.31 samples/sec   Loss 15.4054   LearningRate 0.2242   Epoch: 0   Global Step: 7750   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:14:55,930-Speed 5429.04 samples/sec   Loss 15.4853   LearningRate 0.2245   Epoch: 0   Global Step: 7760   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:15:03,357-Speed 5516.68 samples/sec   Loss 15.3310   LearningRate 0.2248   Epoch: 0   Global Step: 7770   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:15:10,803-Speed 5502.29 samples/sec   Loss 15.4989   LearningRate 0.2251   Epoch: 0   Global Step: 7780   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:15:18,269-Speed 5487.19 samples/sec   Loss 15.3973   LearningRate 0.2254   Epoch: 0   Global Step: 7790   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:15:25,774-Speed 5459.62 samples/sec   Loss 15.4000   LearningRate 0.2257   Epoch: 0   Global Step: 7800   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:15:33,301-Speed 5442.96 samples/sec   Loss 15.3901   LearningRate 0.2260   Epoch: 0   Global Step: 7810   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:15:40,787-Speed 5473.34 samples/sec   Loss 15.3886   LearningRate 0.2263   Epoch: 0   Global Step: 7820   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:15:48,209-Speed 5519.77 samples/sec   Loss 15.3312   LearningRate 0.2265   Epoch: 0   Global Step: 7830   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:15:55,658-Speed 5499.69 samples/sec   Loss 15.2532   LearningRate 0.2268   Epoch: 0   Global Step: 7840   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:16:03,098-Speed 5506.93 samples/sec   Loss 15.3900   LearningRate 0.2271   Epoch: 0   Global Step: 7850   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:16:10,562-Speed 5489.07 samples/sec   Loss 15.3200   LearningRate 0.2274   Epoch: 0   Global Step: 7860   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:16:18,132-Speed 5411.31 samples/sec   Loss 15.2019   LearningRate 0.2277   Epoch: 0   Global Step: 7870   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:16:25,686-Speed 5423.50 samples/sec   Loss 15.2642   LearningRate 0.2280   Epoch: 0   Global Step: 7880   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:16:33,247-Speed 5418.46 samples/sec   Loss 15.2952   LearningRate 0.2283   Epoch: 0   Global Step: 7890   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:16:40,707-Speed 5492.23 samples/sec   Loss 15.2963   LearningRate 0.2286   Epoch: 0   Global Step: 7900   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:16:48,304-Speed 5392.87 samples/sec   Loss 15.2850   LearningRate 0.2289   Epoch: 0   Global Step: 7910   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:16:55,740-Speed 5509.28 samples/sec   Loss 15.3123   LearningRate 0.2291   Epoch: 0   Global Step: 7920   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:17:03,307-Speed 5413.89 samples/sec   Loss 15.2716   LearningRate 0.2294   Epoch: 0   Global Step: 7930   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:17:10,767-Speed 5491.60 samples/sec   Loss 15.2446   LearningRate 0.2297   Epoch: 0   Global Step: 7940   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:17:18,248-Speed 5476.91 samples/sec   Loss 15.2185   LearningRate 0.2300   Epoch: 0   Global Step: 7950   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:17:26,673-Speed 4861.99 samples/sec   Loss 15.1207   LearningRate 0.2303   Epoch: 0   Global Step: 7960   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:17:34,130-Speed 5493.80 samples/sec   Loss 15.1824   LearningRate 0.2306   Epoch: 0   Global Step: 7970   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:17:41,657-Speed 5443.52 samples/sec   Loss 15.1881   LearningRate 0.2309   Epoch: 0   Global Step: 7980   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:17:49,200-Speed 5431.22 samples/sec   Loss 15.2180   LearningRate 0.2312   Epoch: 0   Global Step: 7990   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:17:56,728-Speed 5442.08 samples/sec   Loss 15.1600   LearningRate 0.2315   Epoch: 0   Global Step: 8000   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:18:42,198-[lfw][8000]XNorm: 22.129730
Training: 2022-01-07 20:18:42,199-[lfw][8000]Accuracy-Flip: 0.99450+-0.00395
Training: 2022-01-07 20:18:42,200-[lfw][8000]Accuracy-Highest: 0.99450
Training: 2022-01-07 20:19:34,979-[cfp_fp][8000]XNorm: 20.460830
Training: 2022-01-07 20:19:34,980-[cfp_fp][8000]Accuracy-Flip: 0.96786+-0.01033
Training: 2022-01-07 20:19:34,981-[cfp_fp][8000]Accuracy-Highest: 0.96786
Training: 2022-01-07 20:20:20,531-[agedb_30][8000]XNorm: 21.737388
Training: 2022-01-07 20:20:20,533-[agedb_30][8000]Accuracy-Flip: 0.94783+-0.01088
Training: 2022-01-07 20:20:20,533-[agedb_30][8000]Accuracy-Highest: 0.94783
Training: 2022-01-07 20:20:28,244-Speed 270.34 samples/sec   Loss 15.1302   LearningRate 0.2317   Epoch: 0   Global Step: 8010   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:20:35,683-Speed 5510.00 samples/sec   Loss 15.0885   LearningRate 0.2320   Epoch: 0   Global Step: 8020   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:20:43,223-Speed 5433.50 samples/sec   Loss 15.0030   LearningRate 0.2323   Epoch: 0   Global Step: 8030   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:20:50,680-Speed 5494.63 samples/sec   Loss 15.0732   LearningRate 0.2326   Epoch: 0   Global Step: 8040   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:20:58,161-Speed 5476.49 samples/sec   Loss 15.0429   LearningRate 0.2329   Epoch: 0   Global Step: 8050   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:21:05,669-Speed 5456.27 samples/sec   Loss 15.1155   LearningRate 0.2332   Epoch: 0   Global Step: 8060   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:21:13,206-Speed 5435.36 samples/sec   Loss 15.0025   LearningRate 0.2335   Epoch: 0   Global Step: 8070   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:21:20,363-Speed 5724.83 samples/sec   Loss 15.1174   LearningRate 0.2338   Epoch: 0   Global Step: 8080   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:21:27,484-Speed 5752.59 samples/sec   Loss 15.0260   LearningRate 0.2341   Epoch: 0   Global Step: 8090   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:21:35,072-Speed 5399.07 samples/sec   Loss 15.0343   LearningRate 0.2344   Epoch: 0   Global Step: 8100   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:21:42,675-Speed 5388.77 samples/sec   Loss 15.0548   LearningRate 0.2346   Epoch: 0   Global Step: 8110   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:21:50,195-Speed 5448.95 samples/sec   Loss 15.0523   LearningRate 0.2349   Epoch: 0   Global Step: 8120   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:21:57,705-Speed 5455.30 samples/sec   Loss 14.9050   LearningRate 0.2352   Epoch: 0   Global Step: 8130   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:22:05,239-Speed 5438.39 samples/sec   Loss 14.9479   LearningRate 0.2355   Epoch: 0   Global Step: 8140   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:22:12,697-Speed 5492.78 samples/sec   Loss 14.9536   LearningRate 0.2358   Epoch: 0   Global Step: 8150   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:22:20,190-Speed 5467.46 samples/sec   Loss 14.8065   LearningRate 0.2361   Epoch: 0   Global Step: 8160   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:22:27,724-Speed 5437.54 samples/sec   Loss 14.9941   LearningRate 0.2364   Epoch: 0   Global Step: 8170   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:22:35,196-Speed 5483.15 samples/sec   Loss 14.9332   LearningRate 0.2367   Epoch: 0   Global Step: 8180   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:22:42,616-Speed 5521.59 samples/sec   Loss 14.9612   LearningRate 0.2370   Epoch: 0   Global Step: 8190   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:22:49,700-Speed 5783.23 samples/sec   Loss 14.8972   LearningRate 0.2372   Epoch: 0   Global Step: 8200   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:22:56,863-Speed 5719.41 samples/sec   Loss 14.9157   LearningRate 0.2375   Epoch: 0   Global Step: 8210   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:23:03,958-Speed 5774.11 samples/sec   Loss 14.8167   LearningRate 0.2378   Epoch: 0   Global Step: 8220   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:23:10,956-Speed 5854.80 samples/sec   Loss 14.8535   LearningRate 0.2381   Epoch: 0   Global Step: 8230   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:23:17,923-Speed 5879.74 samples/sec   Loss 14.8961   LearningRate 0.2384   Epoch: 0   Global Step: 8240   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:23:25,150-Speed 5668.79 samples/sec   Loss 14.8553   LearningRate 0.2387   Epoch: 0   Global Step: 8250   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:23:32,180-Speed 5828.02 samples/sec   Loss 14.9242   LearningRate 0.2390   Epoch: 0   Global Step: 8260   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:23:39,202-Speed 5834.29 samples/sec   Loss 14.8096   LearningRate 0.2393   Epoch: 0   Global Step: 8270   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:23:46,447-Speed 5655.13 samples/sec   Loss 14.8255   LearningRate 0.2396   Epoch: 0   Global Step: 8280   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:23:53,993-Speed 5429.05 samples/sec   Loss 14.9068   LearningRate 0.2398   Epoch: 0   Global Step: 8290   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:24:01,557-Speed 5416.62 samples/sec   Loss 14.7583   LearningRate 0.2401   Epoch: 0   Global Step: 8300   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:24:09,104-Speed 5428.73 samples/sec   Loss 14.7785   LearningRate 0.2404   Epoch: 0   Global Step: 8310   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:24:16,621-Speed 5450.17 samples/sec   Loss 14.6930   LearningRate 0.2407   Epoch: 0   Global Step: 8320   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:24:24,298-Speed 5336.09 samples/sec   Loss 14.7619   LearningRate 0.2410   Epoch: 0   Global Step: 8330   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:24:31,707-Speed 5529.37 samples/sec   Loss 14.8096   LearningRate 0.2413   Epoch: 0   Global Step: 8340   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:24:39,207-Speed 5462.71 samples/sec   Loss 14.7541   LearningRate 0.2416   Epoch: 0   Global Step: 8350   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:24:46,675-Speed 5485.44 samples/sec   Loss 14.7991   LearningRate 0.2419   Epoch: 0   Global Step: 8360   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:24:54,095-Speed 5520.46 samples/sec   Loss 14.6666   LearningRate 0.2422   Epoch: 0   Global Step: 8370   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:25:01,552-Speed 5493.86 samples/sec   Loss 14.7610   LearningRate 0.2425   Epoch: 0   Global Step: 8380   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:25:09,102-Speed 5425.92 samples/sec   Loss 14.6799   LearningRate 0.2427   Epoch: 0   Global Step: 8390   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:25:16,590-Speed 5470.92 samples/sec   Loss 14.7241   LearningRate 0.2430   Epoch: 0   Global Step: 8400   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:25:24,019-Speed 5514.01 samples/sec   Loss 14.6770   LearningRate 0.2433   Epoch: 0   Global Step: 8410   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:25:31,485-Speed 5487.50 samples/sec   Loss 14.6275   LearningRate 0.2436   Epoch: 0   Global Step: 8420   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:25:38,975-Speed 5469.55 samples/sec   Loss 14.6758   LearningRate 0.2439   Epoch: 0   Global Step: 8430   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:25:46,514-Speed 5433.52 samples/sec   Loss 14.6971   LearningRate 0.2442   Epoch: 0   Global Step: 8440   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:25:53,962-Speed 5500.09 samples/sec   Loss 14.7048   LearningRate 0.2445   Epoch: 0   Global Step: 8450   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:26:01,407-Speed 5502.54 samples/sec   Loss 14.5848   LearningRate 0.2448   Epoch: 0   Global Step: 8460   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:26:08,932-Speed 5443.81 samples/sec   Loss 14.6470   LearningRate 0.2451   Epoch: 0   Global Step: 8470   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:26:16,425-Speed 5467.59 samples/sec   Loss 14.6031   LearningRate 0.2453   Epoch: 0   Global Step: 8480   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:26:23,979-Speed 5422.46 samples/sec   Loss 14.7359   LearningRate 0.2456   Epoch: 0   Global Step: 8490   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:26:31,536-Speed 5421.23 samples/sec   Loss 14.5745   LearningRate 0.2459   Epoch: 0   Global Step: 8500   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:26:39,089-Speed 5424.22 samples/sec   Loss 14.6010   LearningRate 0.2462   Epoch: 0   Global Step: 8510   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:26:46,593-Speed 5459.25 samples/sec   Loss 14.5508   LearningRate 0.2465   Epoch: 0   Global Step: 8520   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:26:54,181-Speed 5398.33 samples/sec   Loss 14.6168   LearningRate 0.2468   Epoch: 0   Global Step: 8530   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:27:01,678-Speed 5464.72 samples/sec   Loss 14.5784   LearningRate 0.2471   Epoch: 0   Global Step: 8540   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:27:09,236-Speed 5420.06 samples/sec   Loss 14.5925   LearningRate 0.2474   Epoch: 0   Global Step: 8550   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:27:16,821-Speed 5400.63 samples/sec   Loss 14.5534   LearningRate 0.2477   Epoch: 0   Global Step: 8560   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:27:24,235-Speed 5525.45 samples/sec   Loss 14.6015   LearningRate 0.2480   Epoch: 0   Global Step: 8570   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:27:31,681-Speed 5501.99 samples/sec   Loss 14.5618   LearningRate 0.2482   Epoch: 0   Global Step: 8580   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:27:39,148-Speed 5485.62 samples/sec   Loss 14.5738   LearningRate 0.2485   Epoch: 0   Global Step: 8590   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:27:46,552-Speed 5532.78 samples/sec   Loss 14.4994   LearningRate 0.2488   Epoch: 0   Global Step: 8600   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:27:54,048-Speed 5465.59 samples/sec   Loss 14.4922   LearningRate 0.2491   Epoch: 0   Global Step: 8610   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:28:01,551-Speed 5459.85 samples/sec   Loss 14.5864   LearningRate 0.2494   Epoch: 0   Global Step: 8620   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:28:09,064-Speed 5452.81 samples/sec   Loss 14.5305   LearningRate 0.2497   Epoch: 0   Global Step: 8630   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:28:16,677-Speed 5380.64 samples/sec   Loss 14.5211   LearningRate 0.2500   Epoch: 0   Global Step: 8640   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:28:24,250-Speed 5409.91 samples/sec   Loss 14.5503   LearningRate 0.2503   Epoch: 0   Global Step: 8650   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:28:31,759-Speed 5455.09 samples/sec   Loss 14.5495   LearningRate 0.2506   Epoch: 0   Global Step: 8660   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:28:39,308-Speed 5426.88 samples/sec   Loss 14.4733   LearningRate 0.2508   Epoch: 0   Global Step: 8670   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:28:46,767-Speed 5491.78 samples/sec   Loss 14.5276   LearningRate 0.2511   Epoch: 0   Global Step: 8680   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:28:54,220-Speed 5496.76 samples/sec   Loss 14.4284   LearningRate 0.2514   Epoch: 0   Global Step: 8690   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:29:01,729-Speed 5455.42 samples/sec   Loss 14.4725   LearningRate 0.2517   Epoch: 0   Global Step: 8700   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:29:09,262-Speed 5438.03 samples/sec   Loss 14.4623   LearningRate 0.2520   Epoch: 0   Global Step: 8710   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:29:16,712-Speed 5499.22 samples/sec   Loss 14.4117   LearningRate 0.2523   Epoch: 0   Global Step: 8720   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:29:24,244-Speed 5438.88 samples/sec   Loss 14.3941   LearningRate 0.2526   Epoch: 0   Global Step: 8730   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:29:31,771-Speed 5442.69 samples/sec   Loss 14.4603   LearningRate 0.2529   Epoch: 0   Global Step: 8740   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:29:39,250-Speed 5477.16 samples/sec   Loss 14.4350   LearningRate 0.2532   Epoch: 0   Global Step: 8750   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:29:46,693-Speed 5504.09 samples/sec   Loss 14.3575   LearningRate 0.2534   Epoch: 0   Global Step: 8760   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:29:54,142-Speed 5499.16 samples/sec   Loss 14.4979   LearningRate 0.2537   Epoch: 0   Global Step: 8770   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:30:01,691-Speed 5426.73 samples/sec   Loss 14.4839   LearningRate 0.2540   Epoch: 0   Global Step: 8780   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:30:09,219-Speed 5441.93 samples/sec   Loss 14.4145   LearningRate 0.2543   Epoch: 0   Global Step: 8790   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:30:16,721-Speed 5460.39 samples/sec   Loss 14.3597   LearningRate 0.2546   Epoch: 0   Global Step: 8800   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:30:24,206-Speed 5473.44 samples/sec   Loss 14.4121   LearningRate 0.2549   Epoch: 0   Global Step: 8810   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:30:31,867-Speed 5346.75 samples/sec   Loss 14.3764   LearningRate 0.2552   Epoch: 0   Global Step: 8820   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:30:39,427-Speed 5419.20 samples/sec   Loss 14.2898   LearningRate 0.2555   Epoch: 0   Global Step: 8830   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:30:46,922-Speed 5465.45 samples/sec   Loss 14.3456   LearningRate 0.2558   Epoch: 0   Global Step: 8840   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:30:54,383-Speed 5490.41 samples/sec   Loss 14.4902   LearningRate 0.2561   Epoch: 0   Global Step: 8850   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:31:01,930-Speed 5428.51 samples/sec   Loss 14.3578   LearningRate 0.2563   Epoch: 0   Global Step: 8860   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:31:09,433-Speed 5459.00 samples/sec   Loss 14.3215   LearningRate 0.2566   Epoch: 0   Global Step: 8870   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:31:16,952-Speed 5448.81 samples/sec   Loss 14.3375   LearningRate 0.2569   Epoch: 0   Global Step: 8880   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:31:24,432-Speed 5476.21 samples/sec   Loss 14.3980   LearningRate 0.2572   Epoch: 0   Global Step: 8890   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:31:31,952-Speed 5447.54 samples/sec   Loss 14.2738   LearningRate 0.2575   Epoch: 0   Global Step: 8900   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:31:39,402-Speed 5498.61 samples/sec   Loss 14.3431   LearningRate 0.2578   Epoch: 0   Global Step: 8910   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:31:46,916-Speed 5451.97 samples/sec   Loss 14.2911   LearningRate 0.2581   Epoch: 0   Global Step: 8920   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:31:54,528-Speed 5381.86 samples/sec   Loss 14.4411   LearningRate 0.2584   Epoch: 0   Global Step: 8930   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:32:02,104-Speed 5407.23 samples/sec   Loss 14.2622   LearningRate 0.2587   Epoch: 0   Global Step: 8940   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:32:09,564-Speed 5491.21 samples/sec   Loss 14.1865   LearningRate 0.2589   Epoch: 0   Global Step: 8950   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:32:17,160-Speed 5393.25 samples/sec   Loss 14.3095   LearningRate 0.2592   Epoch: 0   Global Step: 8960   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:32:24,577-Speed 5523.46 samples/sec   Loss 14.3218   LearningRate 0.2595   Epoch: 0   Global Step: 8970   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:32:31,977-Speed 5535.48 samples/sec   Loss 14.2826   LearningRate 0.2598   Epoch: 0   Global Step: 8980   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:32:39,461-Speed 5473.69 samples/sec   Loss 14.2258   LearningRate 0.2601   Epoch: 0   Global Step: 8990   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:32:46,876-Speed 5524.93 samples/sec   Loss 14.2779   LearningRate 0.2604   Epoch: 0   Global Step: 9000   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:32:54,374-Speed 5463.63 samples/sec   Loss 14.1928   LearningRate 0.2607   Epoch: 0   Global Step: 9010   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:33:01,905-Speed 5439.56 samples/sec   Loss 14.2994   LearningRate 0.2610   Epoch: 0   Global Step: 9020   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:33:09,389-Speed 5473.12 samples/sec   Loss 14.2448   LearningRate 0.2613   Epoch: 0   Global Step: 9030   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:33:16,875-Speed 5472.97 samples/sec   Loss 14.1702   LearningRate 0.2615   Epoch: 0   Global Step: 9040   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:33:24,305-Speed 5513.55 samples/sec   Loss 14.2905   LearningRate 0.2618   Epoch: 0   Global Step: 9050   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:33:31,783-Speed 5477.81 samples/sec   Loss 14.2198   LearningRate 0.2621   Epoch: 0   Global Step: 9060   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:33:39,282-Speed 5462.79 samples/sec   Loss 14.2746   LearningRate 0.2624   Epoch: 0   Global Step: 9070   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:33:46,777-Speed 5466.15 samples/sec   Loss 14.2273   LearningRate 0.2627   Epoch: 0   Global Step: 9080   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:33:54,188-Speed 5528.20 samples/sec   Loss 14.1881   LearningRate 0.2630   Epoch: 0   Global Step: 9090   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:34:01,683-Speed 5465.87 samples/sec   Loss 14.2516   LearningRate 0.2633   Epoch: 0   Global Step: 9100   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:34:09,227-Speed 5430.06 samples/sec   Loss 14.2326   LearningRate 0.2636   Epoch: 0   Global Step: 9110   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:34:16,624-Speed 5538.34 samples/sec   Loss 14.2858   LearningRate 0.2639   Epoch: 0   Global Step: 9120   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:34:24,043-Speed 5521.58 samples/sec   Loss 14.2020   LearningRate 0.2642   Epoch: 0   Global Step: 9130   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:34:31,626-Speed 5402.13 samples/sec   Loss 14.2066   LearningRate 0.2644   Epoch: 0   Global Step: 9140   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:34:39,084-Speed 5493.03 samples/sec   Loss 14.2058   LearningRate 0.2647   Epoch: 0   Global Step: 9150   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:34:46,540-Speed 5493.95 samples/sec   Loss 14.2304   LearningRate 0.2650   Epoch: 0   Global Step: 9160   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:34:54,103-Speed 5416.75 samples/sec   Loss 14.1849   LearningRate 0.2653   Epoch: 0   Global Step: 9170   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:35:01,605-Speed 5460.75 samples/sec   Loss 14.1222   LearningRate 0.2656   Epoch: 0   Global Step: 9180   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:35:09,107-Speed 5460.49 samples/sec   Loss 14.1990   LearningRate 0.2659   Epoch: 0   Global Step: 9190   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:35:16,737-Speed 5368.78 samples/sec   Loss 14.1229   LearningRate 0.2662   Epoch: 0   Global Step: 9200   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:35:24,269-Speed 5439.05 samples/sec   Loss 14.0223   LearningRate 0.2665   Epoch: 0   Global Step: 9210   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:35:31,715-Speed 5501.87 samples/sec   Loss 14.1704   LearningRate 0.2668   Epoch: 0   Global Step: 9220   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:35:39,214-Speed 5463.10 samples/sec   Loss 14.2064   LearningRate 0.2670   Epoch: 0   Global Step: 9230   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:35:46,768-Speed 5423.09 samples/sec   Loss 14.1549   LearningRate 0.2673   Epoch: 0   Global Step: 9240   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:35:54,321-Speed 5423.12 samples/sec   Loss 14.1625   LearningRate 0.2676   Epoch: 0   Global Step: 9250   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:36:01,752-Speed 5513.52 samples/sec   Loss 14.1597   LearningRate 0.2679   Epoch: 0   Global Step: 9260   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:36:09,209-Speed 5493.35 samples/sec   Loss 14.0770   LearningRate 0.2682   Epoch: 0   Global Step: 9270   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:36:16,615-Speed 5531.57 samples/sec   Loss 14.0218   LearningRate 0.2685   Epoch: 0   Global Step: 9280   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:36:24,142-Speed 5442.13 samples/sec   Loss 14.1522   LearningRate 0.2688   Epoch: 0   Global Step: 9290   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:36:31,595-Speed 5496.60 samples/sec   Loss 14.1133   LearningRate 0.2691   Epoch: 0   Global Step: 9300   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 20:36:39,064-Speed 5485.53 samples/sec   Loss 14.1543   LearningRate 0.2694   Epoch: 0   Global Step: 9310   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 20:36:46,525-Speed 5489.92 samples/sec   Loss 14.2882   LearningRate 0.2696   Epoch: 0   Global Step: 9320   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 20:36:53,976-Speed 5498.27 samples/sec   Loss 14.1360   LearningRate 0.2699   Epoch: 0   Global Step: 9330   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 20:37:01,497-Speed 5446.93 samples/sec   Loss 14.1478   LearningRate 0.2702   Epoch: 0   Global Step: 9340   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 20:37:08,927-Speed 5514.06 samples/sec   Loss 14.1237   LearningRate 0.2705   Epoch: 0   Global Step: 9350   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 20:37:16,365-Speed 5506.99 samples/sec   Loss 14.2116   LearningRate 0.2708   Epoch: 0   Global Step: 9360   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:37:23,823-Speed 5492.89 samples/sec   Loss 14.1346   LearningRate 0.2711   Epoch: 0   Global Step: 9370   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:37:31,306-Speed 5474.64 samples/sec   Loss 14.1033   LearningRate 0.2714   Epoch: 0   Global Step: 9380   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:37:38,751-Speed 5502.50 samples/sec   Loss 14.0806   LearningRate 0.2717   Epoch: 0   Global Step: 9390   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:37:46,247-Speed 5465.03 samples/sec   Loss 14.0646   LearningRate 0.2720   Epoch: 0   Global Step: 9400   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:37:53,738-Speed 5468.28 samples/sec   Loss 14.1265   LearningRate 0.2723   Epoch: 0   Global Step: 9410   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:38:01,187-Speed 5500.42 samples/sec   Loss 14.0264   LearningRate 0.2725   Epoch: 0   Global Step: 9420   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:38:08,627-Speed 5505.85 samples/sec   Loss 14.0642   LearningRate 0.2728   Epoch: 0   Global Step: 9430   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:38:16,083-Speed 5494.25 samples/sec   Loss 14.0569   LearningRate 0.2731   Epoch: 0   Global Step: 9440   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:38:23,506-Speed 5518.40 samples/sec   Loss 14.0781   LearningRate 0.2734   Epoch: 0   Global Step: 9450   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:38:30,954-Speed 5500.30 samples/sec   Loss 14.0731   LearningRate 0.2737   Epoch: 0   Global Step: 9460   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 20:38:38,605-Speed 5354.69 samples/sec   Loss 14.0206   LearningRate 0.2740   Epoch: 0   Global Step: 9470   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 20:38:46,041-Speed 5509.26 samples/sec   Loss 14.0869   LearningRate 0.2743   Epoch: 0   Global Step: 9480   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 20:38:53,482-Speed 5504.66 samples/sec   Loss 14.0266   LearningRate 0.2746   Epoch: 0   Global Step: 9490   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 20:39:01,032-Speed 5426.19 samples/sec   Loss 14.0427   LearningRate 0.2749   Epoch: 0   Global Step: 9500   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 20:39:08,487-Speed 5495.46 samples/sec   Loss 13.9802   LearningRate 0.2751   Epoch: 0   Global Step: 9510   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 20:39:15,981-Speed 5466.23 samples/sec   Loss 13.9596   LearningRate 0.2754   Epoch: 0   Global Step: 9520   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:39:23,509-Speed 5441.97 samples/sec   Loss 14.1009   LearningRate 0.2757   Epoch: 0   Global Step: 9530   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:39:30,924-Speed 5524.46 samples/sec   Loss 13.9731   LearningRate 0.2760   Epoch: 0   Global Step: 9540   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:39:38,362-Speed 5507.73 samples/sec   Loss 13.9817   LearningRate 0.2763   Epoch: 0   Global Step: 9550   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:39:45,801-Speed 5506.77 samples/sec   Loss 13.9634   LearningRate 0.2766   Epoch: 0   Global Step: 9560   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:39:53,413-Speed 5381.46 samples/sec   Loss 14.1002   LearningRate 0.2769   Epoch: 0   Global Step: 9570   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:40:00,878-Speed 5488.57 samples/sec   Loss 13.9308   LearningRate 0.2772   Epoch: 0   Global Step: 9580   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:40:08,292-Speed 5524.96 samples/sec   Loss 14.4544   LearningRate 0.2775   Epoch: 0   Global Step: 9590   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:40:15,712-Speed 5521.61 samples/sec   Loss 14.4927   LearningRate 0.2778   Epoch: 0   Global Step: 9600   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 20:40:23,211-Speed 5462.05 samples/sec   Loss 14.2164   LearningRate 0.2780   Epoch: 0   Global Step: 9610   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 20:40:30,807-Speed 5393.32 samples/sec   Loss 14.0396   LearningRate 0.2783   Epoch: 0   Global Step: 9620   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 20:40:38,233-Speed 5516.38 samples/sec   Loss 14.0909   LearningRate 0.2786   Epoch: 0   Global Step: 9630   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 20:40:45,684-Speed 5498.16 samples/sec   Loss 13.9813   LearningRate 0.2789   Epoch: 0   Global Step: 9640   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 20:40:53,153-Speed 5485.05 samples/sec   Loss 13.9863   LearningRate 0.2792   Epoch: 0   Global Step: 9650   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 20:41:00,604-Speed 5497.55 samples/sec   Loss 14.0429   LearningRate 0.2795   Epoch: 0   Global Step: 9660   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 20:41:08,028-Speed 5518.75 samples/sec   Loss 14.0537   LearningRate 0.2798   Epoch: 0   Global Step: 9670   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 20:41:15,445-Speed 5523.23 samples/sec   Loss 13.9689   LearningRate 0.2801   Epoch: 0   Global Step: 9680   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 20:41:22,855-Speed 5527.71 samples/sec   Loss 13.9905   LearningRate 0.2804   Epoch: 0   Global Step: 9690   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 20:41:30,309-Speed 5496.28 samples/sec   Loss 14.0016   LearningRate 0.2806   Epoch: 0   Global Step: 9700   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:41:37,810-Speed 5461.23 samples/sec   Loss 13.9536   LearningRate 0.2809   Epoch: 0   Global Step: 9710   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:41:45,278-Speed 5485.69 samples/sec   Loss 13.9510   LearningRate 0.2812   Epoch: 0   Global Step: 9720   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:41:52,700-Speed 5519.74 samples/sec   Loss 13.9670   LearningRate 0.2815   Epoch: 0   Global Step: 9730   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:42:00,169-Speed 5484.86 samples/sec   Loss 13.9725   LearningRate 0.2818   Epoch: 0   Global Step: 9740   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:42:07,614-Speed 5502.39 samples/sec   Loss 13.9324   LearningRate 0.2821   Epoch: 0   Global Step: 9750   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:42:15,045-Speed 5513.36 samples/sec   Loss 13.9971   LearningRate 0.2824   Epoch: 0   Global Step: 9760   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:42:22,485-Speed 5505.99 samples/sec   Loss 14.0568   LearningRate 0.2827   Epoch: 0   Global Step: 9770   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:42:29,926-Speed 5505.34 samples/sec   Loss 14.0047   LearningRate 0.2830   Epoch: 0   Global Step: 9780   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:42:37,426-Speed 5462.42 samples/sec   Loss 13.9211   LearningRate 0.2832   Epoch: 0   Global Step: 9790   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 20:42:44,867-Speed 5505.02 samples/sec   Loss 13.9178   LearningRate 0.2835   Epoch: 0   Global Step: 9800   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:42:52,328-Speed 5490.98 samples/sec   Loss 13.9492   LearningRate 0.2838   Epoch: 0   Global Step: 9810   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:42:59,759-Speed 5512.38 samples/sec   Loss 13.9100   LearningRate 0.2841   Epoch: 0   Global Step: 9820   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:43:07,316-Speed 5421.14 samples/sec   Loss 14.0000   LearningRate 0.2844   Epoch: 0   Global Step: 9830   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:43:14,740-Speed 5518.59 samples/sec   Loss 13.8934   LearningRate 0.2847   Epoch: 0   Global Step: 9840   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:43:22,169-Speed 5513.89 samples/sec   Loss 13.9084   LearningRate 0.2850   Epoch: 0   Global Step: 9850   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:43:29,600-Speed 5512.73 samples/sec   Loss 13.8353   LearningRate 0.2853   Epoch: 0   Global Step: 9860   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:43:37,054-Speed 5496.17 samples/sec   Loss 13.9456   LearningRate 0.2856   Epoch: 0   Global Step: 9870   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:43:44,514-Speed 5491.07 samples/sec   Loss 13.9694   LearningRate 0.2859   Epoch: 0   Global Step: 9880   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:43:51,992-Speed 5478.44 samples/sec   Loss 13.9083   LearningRate 0.2861   Epoch: 0   Global Step: 9890   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:43:59,426-Speed 5510.45 samples/sec   Loss 13.9515   LearningRate 0.2864   Epoch: 0   Global Step: 9900   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:44:06,871-Speed 5502.48 samples/sec   Loss 13.9572   LearningRate 0.2867   Epoch: 0   Global Step: 9910   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:44:14,348-Speed 5478.70 samples/sec   Loss 13.9070   LearningRate 0.2870   Epoch: 0   Global Step: 9920   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:44:21,771-Speed 5518.68 samples/sec   Loss 13.8922   LearningRate 0.2873   Epoch: 0   Global Step: 9930   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:44:29,250-Speed 5477.61 samples/sec   Loss 13.8573   LearningRate 0.2876   Epoch: 0   Global Step: 9940   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:44:36,742-Speed 5467.43 samples/sec   Loss 13.9540   LearningRate 0.2879   Epoch: 0   Global Step: 9950   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:44:44,199-Speed 5493.95 samples/sec   Loss 13.8472   LearningRate 0.2882   Epoch: 0   Global Step: 9960   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:44:51,635-Speed 5508.90 samples/sec   Loss 13.9780   LearningRate 0.2885   Epoch: 0   Global Step: 9970   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:44:59,020-Speed 5547.70 samples/sec   Loss 13.8697   LearningRate 0.2887   Epoch: 0   Global Step: 9980   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:45:06,473-Speed 5496.36 samples/sec   Loss 13.9913   LearningRate 0.2890   Epoch: 0   Global Step: 9990   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:45:13,926-Speed 5496.59 samples/sec   Loss 13.9089   LearningRate 0.2893   Epoch: 0   Global Step: 10000   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 20:45:58,511-[lfw][10000]XNorm: 23.119454
Training: 2022-01-07 20:45:58,512-[lfw][10000]Accuracy-Flip: 0.99567+-0.00327
Training: 2022-01-07 20:45:58,513-[lfw][10000]Accuracy-Highest: 0.99567
Training: 2022-01-07 20:46:51,699-[cfp_fp][10000]XNorm: 21.330772
Training: 2022-01-07 20:46:51,700-[cfp_fp][10000]Accuracy-Flip: 0.97143+-0.00740
Training: 2022-01-07 20:46:51,701-[cfp_fp][10000]Accuracy-Highest: 0.97143
Training: 2022-01-07 20:47:37,423-[agedb_30][10000]XNorm: 23.151996
Training: 2022-01-07 20:47:37,424-[agedb_30][10000]Accuracy-Flip: 0.95500+-0.00869
Training: 2022-01-07 20:47:37,425-[agedb_30][10000]Accuracy-Highest: 0.95500
Training: 2022-01-07 20:47:45,028-Speed 271.08 samples/sec   Loss 13.8010   LearningRate 0.2896   Epoch: 0   Global Step: 10010   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:47:52,651-Speed 5374.29 samples/sec   Loss 13.9753   LearningRate 0.2899   Epoch: 0   Global Step: 10020   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:48:00,252-Speed 5390.08 samples/sec   Loss 13.9044   LearningRate 0.2902   Epoch: 0   Global Step: 10030   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:48:07,725-Speed 5482.56 samples/sec   Loss 13.8992   LearningRate 0.2905   Epoch: 0   Global Step: 10040   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:48:15,164-Speed 5507.72 samples/sec   Loss 13.8962   LearningRate 0.2908   Epoch: 0   Global Step: 10050   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:48:22,674-Speed 5455.18 samples/sec   Loss 13.8347   LearningRate 0.2911   Epoch: 0   Global Step: 10060   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:48:30,257-Speed 5403.37 samples/sec   Loss 13.8300   LearningRate 0.2913   Epoch: 0   Global Step: 10070   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:48:37,852-Speed 5394.21 samples/sec   Loss 13.8547   LearningRate 0.2916   Epoch: 0   Global Step: 10080   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:48:45,290-Speed 5507.43 samples/sec   Loss 13.9728   LearningRate 0.2919   Epoch: 0   Global Step: 10090   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:48:52,816-Speed 5444.12 samples/sec   Loss 13.9322   LearningRate 0.2922   Epoch: 0   Global Step: 10100   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:49:00,484-Speed 5342.66 samples/sec   Loss 13.8914   LearningRate 0.2925   Epoch: 0   Global Step: 10110   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:49:07,753-Speed 5636.14 samples/sec   Loss 13.9653   LearningRate 0.2928   Epoch: 0   Global Step: 10120   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:49:15,263-Speed 5455.07 samples/sec   Loss 13.9788   LearningRate 0.2931   Epoch: 0   Global Step: 10130   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:49:22,716-Speed 5497.09 samples/sec   Loss 13.8336   LearningRate 0.2934   Epoch: 0   Global Step: 10140   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:49:30,280-Speed 5416.80 samples/sec   Loss 13.9003   LearningRate 0.2937   Epoch: 0   Global Step: 10150   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:49:37,809-Speed 5440.91 samples/sec   Loss 13.9053   LearningRate 0.2940   Epoch: 0   Global Step: 10160   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:49:45,234-Speed 5518.29 samples/sec   Loss 13.8760   LearningRate 0.2942   Epoch: 0   Global Step: 10170   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:49:52,701-Speed 5486.60 samples/sec   Loss 13.8570   LearningRate 0.2945   Epoch: 0   Global Step: 10180   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:50:00,044-Speed 5579.55 samples/sec   Loss 13.9280   LearningRate 0.2948   Epoch: 0   Global Step: 10190   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:50:07,494-Speed 5499.27 samples/sec   Loss 13.8596   LearningRate 0.2951   Epoch: 0   Global Step: 10200   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:50:14,906-Speed 5528.11 samples/sec   Loss 13.8332   LearningRate 0.2954   Epoch: 0   Global Step: 10210   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:50:22,586-Speed 5334.23 samples/sec   Loss 13.8616   LearningRate 0.2957   Epoch: 0   Global Step: 10220   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:50:30,107-Speed 5447.23 samples/sec   Loss 13.8835   LearningRate 0.2960   Epoch: 0   Global Step: 10230   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:50:37,193-Speed 5781.84 samples/sec   Loss 13.7577   LearningRate 0.2963   Epoch: 0   Global Step: 10240   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:50:44,518-Speed 5592.89 samples/sec   Loss 13.8189   LearningRate 0.2966   Epoch: 0   Global Step: 10250   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:50:52,057-Speed 5434.15 samples/sec   Loss 13.7958   LearningRate 0.2968   Epoch: 0   Global Step: 10260   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:50:59,221-Speed 5719.24 samples/sec   Loss 13.7974   LearningRate 0.2971   Epoch: 0   Global Step: 10270   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:51:06,792-Speed 5411.02 samples/sec   Loss 13.9565   LearningRate 0.2974   Epoch: 0   Global Step: 10280   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:51:14,549-Speed 5281.15 samples/sec   Loss 13.7485   LearningRate 0.2977   Epoch: 0   Global Step: 10290   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:51:22,206-Speed 5350.67 samples/sec   Loss 13.8137   LearningRate 0.2980   Epoch: 0   Global Step: 10300   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:51:29,789-Speed 5403.05 samples/sec   Loss 13.8153   LearningRate 0.2983   Epoch: 0   Global Step: 10310   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:51:37,344-Speed 5422.51 samples/sec   Loss 13.7642   LearningRate 0.2986   Epoch: 0   Global Step: 10320   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:51:45,027-Speed 5332.59 samples/sec   Loss 13.8676   LearningRate 0.2989   Epoch: 0   Global Step: 10330   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:51:52,727-Speed 5319.98 samples/sec   Loss 13.8651   LearningRate 0.2992   Epoch: 0   Global Step: 10340   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:52:00,269-Speed 5432.38 samples/sec   Loss 13.8303   LearningRate 0.2995   Epoch: 0   Global Step: 10350   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:52:07,783-Speed 5452.27 samples/sec   Loss 13.8039   LearningRate 0.2997   Epoch: 0   Global Step: 10360   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:52:15,342-Speed 5420.19 samples/sec   Loss 13.8142   LearningRate 0.3000   Epoch: 0   Global Step: 10370   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:52:38,276-Speed 1786.27 samples/sec   Loss 13.7174   LearningRate 0.3000   Epoch: 1   Global Step: 10380   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:52:45,669-Speed 5541.46 samples/sec   Loss 13.7615   LearningRate 0.2999   Epoch: 1   Global Step: 10390   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:52:53,047-Speed 5552.55 samples/sec   Loss 13.7743   LearningRate 0.2999   Epoch: 1   Global Step: 10400   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:53:00,468-Speed 5521.14 samples/sec   Loss 13.8331   LearningRate 0.2999   Epoch: 1   Global Step: 10410   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:53:07,958-Speed 5470.15 samples/sec   Loss 13.8493   LearningRate 0.2998   Epoch: 1   Global Step: 10420   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:53:15,362-Speed 5534.05 samples/sec   Loss 13.8542   LearningRate 0.2998   Epoch: 1   Global Step: 10430   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:53:22,921-Speed 5419.70 samples/sec   Loss 13.7681   LearningRate 0.2998   Epoch: 1   Global Step: 10440   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:53:30,398-Speed 5479.17 samples/sec   Loss 13.7879   LearningRate 0.2998   Epoch: 1   Global Step: 10450   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:53:37,791-Speed 5542.09 samples/sec   Loss 13.8541   LearningRate 0.2997   Epoch: 1   Global Step: 10460   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:53:45,397-Speed 5385.80 samples/sec   Loss 13.7615   LearningRate 0.2997   Epoch: 1   Global Step: 10470   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:53:52,800-Speed 5534.17 samples/sec   Loss 13.8087   LearningRate 0.2997   Epoch: 1   Global Step: 10480   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:53:59,883-Speed 5784.60 samples/sec   Loss 13.7360   LearningRate 0.2996   Epoch: 1   Global Step: 10490   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:54:07,167-Speed 5624.73 samples/sec   Loss 13.8070   LearningRate 0.2996   Epoch: 1   Global Step: 10500   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:54:14,398-Speed 5666.58 samples/sec   Loss 13.7994   LearningRate 0.2996   Epoch: 1   Global Step: 10510   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:54:21,678-Speed 5627.51 samples/sec   Loss 13.7025   LearningRate 0.2995   Epoch: 1   Global Step: 10520   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:54:29,112-Speed 5511.26 samples/sec   Loss 13.7201   LearningRate 0.2995   Epoch: 1   Global Step: 10530   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:54:36,314-Speed 5688.94 samples/sec   Loss 13.7639   LearningRate 0.2995   Epoch: 1   Global Step: 10540   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:54:43,544-Speed 5667.15 samples/sec   Loss 13.7682   LearningRate 0.2994   Epoch: 1   Global Step: 10550   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:54:51,058-Speed 5451.98 samples/sec   Loss 13.7215   LearningRate 0.2994   Epoch: 1   Global Step: 10560   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:54:58,525-Speed 5486.55 samples/sec   Loss 13.7943   LearningRate 0.2994   Epoch: 1   Global Step: 10570   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:55:06,031-Speed 5458.12 samples/sec   Loss 13.7260   LearningRate 0.2994   Epoch: 1   Global Step: 10580   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:55:13,455-Speed 5518.84 samples/sec   Loss 13.6908   LearningRate 0.2993   Epoch: 1   Global Step: 10590   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:55:20,609-Speed 5726.39 samples/sec   Loss 13.7780   LearningRate 0.2993   Epoch: 1   Global Step: 10600   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:55:27,585-Speed 5873.12 samples/sec   Loss 13.7498   LearningRate 0.2993   Epoch: 1   Global Step: 10610   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:55:34,932-Speed 5575.88 samples/sec   Loss 13.7494   LearningRate 0.2992   Epoch: 1   Global Step: 10620   Fp16 Grad Scale: 65536   Required: 45 hours
Training: 2022-01-07 20:55:42,391-Speed 5491.92 samples/sec   Loss 13.7498   LearningRate 0.2992   Epoch: 1   Global Step: 10630   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:55:49,816-Speed 5518.23 samples/sec   Loss 13.7834   LearningRate 0.2992   Epoch: 1   Global Step: 10640   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:55:57,109-Speed 5617.25 samples/sec   Loss 13.7155   LearningRate 0.2991   Epoch: 1   Global Step: 10650   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:56:04,247-Speed 5739.62 samples/sec   Loss 13.7926   LearningRate 0.2991   Epoch: 1   Global Step: 10660   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:56:11,639-Speed 5541.97 samples/sec   Loss 13.6340   LearningRate 0.2991   Epoch: 1   Global Step: 10670   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:56:19,054-Speed 5525.29 samples/sec   Loss 13.6930   LearningRate 0.2991   Epoch: 1   Global Step: 10680   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:56:26,405-Speed 5573.65 samples/sec   Loss 13.7176   LearningRate 0.2990   Epoch: 1   Global Step: 10690   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:56:33,868-Speed 5489.23 samples/sec   Loss 13.6654   LearningRate 0.2990   Epoch: 1   Global Step: 10700   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:56:41,217-Speed 5574.99 samples/sec   Loss 13.6777   LearningRate 0.2990   Epoch: 1   Global Step: 10710   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:56:48,286-Speed 5795.69 samples/sec   Loss 13.6046   LearningRate 0.2989   Epoch: 1   Global Step: 10720   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:56:55,897-Speed 5382.72 samples/sec   Loss 13.6660   LearningRate 0.2989   Epoch: 1   Global Step: 10730   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 20:57:03,369-Speed 5482.65 samples/sec   Loss 13.6465   LearningRate 0.2989   Epoch: 1   Global Step: 10740   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:57:10,897-Speed 5443.42 samples/sec   Loss 13.6251   LearningRate 0.2988   Epoch: 1   Global Step: 10750   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:57:18,600-Speed 5318.62 samples/sec   Loss 13.6724   LearningRate 0.2988   Epoch: 1   Global Step: 10760   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 20:57:26,196-Speed 5393.99 samples/sec   Loss 13.6349   LearningRate 0.2988   Epoch: 1   Global Step: 10770   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:57:33,747-Speed 5425.23 samples/sec   Loss 13.6547   LearningRate 0.2987   Epoch: 1   Global Step: 10780   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:57:41,207-Speed 5492.38 samples/sec   Loss 13.6461   LearningRate 0.2987   Epoch: 1   Global Step: 10790   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:57:48,609-Speed 5534.89 samples/sec   Loss 13.6551   LearningRate 0.2987   Epoch: 1   Global Step: 10800   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:57:56,029-Speed 5521.45 samples/sec   Loss 13.6284   LearningRate 0.2987   Epoch: 1   Global Step: 10810   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:58:03,567-Speed 5434.96 samples/sec   Loss 13.5989   LearningRate 0.2986   Epoch: 1   Global Step: 10820   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:58:11,028-Speed 5491.28 samples/sec   Loss 13.6195   LearningRate 0.2986   Epoch: 1   Global Step: 10830   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:58:18,491-Speed 5489.69 samples/sec   Loss 13.5961   LearningRate 0.2986   Epoch: 1   Global Step: 10840   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 20:58:25,898-Speed 5533.22 samples/sec   Loss 13.5596   LearningRate 0.2985   Epoch: 1   Global Step: 10850   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:58:33,306-Speed 5530.26 samples/sec   Loss 13.6033   LearningRate 0.2985   Epoch: 1   Global Step: 10860   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:58:40,725-Speed 5522.13 samples/sec   Loss 13.5722   LearningRate 0.2985   Epoch: 1   Global Step: 10870   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:58:48,190-Speed 5488.17 samples/sec   Loss 13.6217   LearningRate 0.2984   Epoch: 1   Global Step: 10880   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:58:55,703-Speed 5453.44 samples/sec   Loss 13.5656   LearningRate 0.2984   Epoch: 1   Global Step: 10890   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:59:03,211-Speed 5455.65 samples/sec   Loss 13.5524   LearningRate 0.2984   Epoch: 1   Global Step: 10900   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:59:10,721-Speed 5455.58 samples/sec   Loss 13.5366   LearningRate 0.2984   Epoch: 1   Global Step: 10910   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:59:18,233-Speed 5453.04 samples/sec   Loss 13.5964   LearningRate 0.2983   Epoch: 1   Global Step: 10920   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:59:25,726-Speed 5467.64 samples/sec   Loss 13.6064   LearningRate 0.2983   Epoch: 1   Global Step: 10930   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:59:33,343-Speed 5377.65 samples/sec   Loss 13.5003   LearningRate 0.2983   Epoch: 1   Global Step: 10940   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 20:59:40,802-Speed 5492.44 samples/sec   Loss 13.5403   LearningRate 0.2982   Epoch: 1   Global Step: 10950   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 20:59:48,264-Speed 5490.38 samples/sec   Loss 13.5805   LearningRate 0.2982   Epoch: 1   Global Step: 10960   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 20:59:55,886-Speed 5374.49 samples/sec   Loss 13.5662   LearningRate 0.2982   Epoch: 1   Global Step: 10970   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:00:03,482-Speed 5392.75 samples/sec   Loss 13.6030   LearningRate 0.2981   Epoch: 1   Global Step: 10980   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:00:11,086-Speed 5387.04 samples/sec   Loss 13.5207   LearningRate 0.2981   Epoch: 1   Global Step: 10990   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:00:18,674-Speed 5399.11 samples/sec   Loss 13.6135   LearningRate 0.2981   Epoch: 1   Global Step: 11000   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:00:26,309-Speed 5365.29 samples/sec   Loss 13.4981   LearningRate 0.2981   Epoch: 1   Global Step: 11010   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:00:33,974-Speed 5344.60 samples/sec   Loss 13.4796   LearningRate 0.2980   Epoch: 1   Global Step: 11020   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:00:41,594-Speed 5376.15 samples/sec   Loss 13.5199   LearningRate 0.2980   Epoch: 1   Global Step: 11030   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:00:49,177-Speed 5402.71 samples/sec   Loss 13.4643   LearningRate 0.2980   Epoch: 1   Global Step: 11040   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:00:56,935-Speed 5280.49 samples/sec   Loss 13.3979   LearningRate 0.2979   Epoch: 1   Global Step: 11050   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:01:04,610-Speed 5336.99 samples/sec   Loss 13.4865   LearningRate 0.2979   Epoch: 1   Global Step: 11060   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:01:12,399-Speed 5259.37 samples/sec   Loss 13.5801   LearningRate 0.2979   Epoch: 1   Global Step: 11070   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:01:20,089-Speed 5326.90 samples/sec   Loss 13.4871   LearningRate 0.2978   Epoch: 1   Global Step: 11080   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:01:27,856-Speed 5274.62 samples/sec   Loss 13.4998   LearningRate 0.2978   Epoch: 1   Global Step: 11090   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:01:35,550-Speed 5324.48 samples/sec   Loss 13.5059   LearningRate 0.2978   Epoch: 1   Global Step: 11100   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:01:43,251-Speed 5319.45 samples/sec   Loss 13.5007   LearningRate 0.2977   Epoch: 1   Global Step: 11110   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:01:50,941-Speed 5326.57 samples/sec   Loss 13.4613   LearningRate 0.2977   Epoch: 1   Global Step: 11120   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:01:58,620-Speed 5334.81 samples/sec   Loss 13.4809   LearningRate 0.2977   Epoch: 1   Global Step: 11130   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:02:06,330-Speed 5313.56 samples/sec   Loss 13.4768   LearningRate 0.2977   Epoch: 1   Global Step: 11140   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:02:14,044-Speed 5310.57 samples/sec   Loss 13.4295   LearningRate 0.2976   Epoch: 1   Global Step: 11150   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:02:21,693-Speed 5355.67 samples/sec   Loss 13.4623   LearningRate 0.2976   Epoch: 1   Global Step: 11160   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:02:29,375-Speed 5332.03 samples/sec   Loss 13.4085   LearningRate 0.2976   Epoch: 1   Global Step: 11170   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:02:37,162-Speed 5261.03 samples/sec   Loss 13.6190   LearningRate 0.2975   Epoch: 1   Global Step: 11180   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:02:44,935-Speed 5270.11 samples/sec   Loss 13.4382   LearningRate 0.2975   Epoch: 1   Global Step: 11190   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:02:52,683-Speed 5287.12 samples/sec   Loss 13.4068   LearningRate 0.2975   Epoch: 1   Global Step: 11200   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:03:00,406-Speed 5304.45 samples/sec   Loss 13.4025   LearningRate 0.2974   Epoch: 1   Global Step: 11210   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:03:08,176-Speed 5272.83 samples/sec   Loss 13.4626   LearningRate 0.2974   Epoch: 1   Global Step: 11220   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:03:15,888-Speed 5311.71 samples/sec   Loss 13.4027   LearningRate 0.2974   Epoch: 1   Global Step: 11230   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:03:23,669-Speed 5264.89 samples/sec   Loss 13.4063   LearningRate 0.2974   Epoch: 1   Global Step: 11240   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:03:31,414-Speed 5288.94 samples/sec   Loss 13.4272   LearningRate 0.2973   Epoch: 1   Global Step: 11250   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:03:39,162-Speed 5287.31 samples/sec   Loss 13.4636   LearningRate 0.2973   Epoch: 1   Global Step: 11260   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:03:46,897-Speed 5296.58 samples/sec   Loss 13.3905   LearningRate 0.2973   Epoch: 1   Global Step: 11270   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:03:54,624-Speed 5301.19 samples/sec   Loss 13.3754   LearningRate 0.2972   Epoch: 1   Global Step: 11280   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:04:02,377-Speed 5284.07 samples/sec   Loss 13.3913   LearningRate 0.2972   Epoch: 1   Global Step: 11290   Fp16 Grad Scale: 524288   Required: 44 hours
Training: 2022-01-07 21:04:10,143-Speed 5274.60 samples/sec   Loss 13.4145   LearningRate 0.2972   Epoch: 1   Global Step: 11300   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:04:17,989-Speed 5221.49 samples/sec   Loss 13.3946   LearningRate 0.2971   Epoch: 1   Global Step: 11310   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:04:25,762-Speed 5269.73 samples/sec   Loss 13.4338   LearningRate 0.2971   Epoch: 1   Global Step: 11320   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:04:33,553-Speed 5258.46 samples/sec   Loss 13.3936   LearningRate 0.2971   Epoch: 1   Global Step: 11330   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:04:41,332-Speed 5266.39 samples/sec   Loss 13.3635   LearningRate 0.2971   Epoch: 1   Global Step: 11340   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:04:49,121-Speed 5258.81 samples/sec   Loss 13.4264   LearningRate 0.2970   Epoch: 1   Global Step: 11350   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:04:56,912-Speed 5258.10 samples/sec   Loss 13.3395   LearningRate 0.2970   Epoch: 1   Global Step: 11360   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:05:04,706-Speed 5256.00 samples/sec   Loss 13.3704   LearningRate 0.2970   Epoch: 1   Global Step: 11370   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:05:12,484-Speed 5267.55 samples/sec   Loss 13.3746   LearningRate 0.2969   Epoch: 1   Global Step: 11380   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:05:20,267-Speed 5262.92 samples/sec   Loss 13.4048   LearningRate 0.2969   Epoch: 1   Global Step: 11390   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:05:28,159-Speed 5190.33 samples/sec   Loss 13.4194   LearningRate 0.2969   Epoch: 1   Global Step: 11400   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:05:35,888-Speed 5301.04 samples/sec   Loss 13.4849   LearningRate 0.2968   Epoch: 1   Global Step: 11410   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:05:43,646-Speed 5279.99 samples/sec   Loss 13.3605   LearningRate 0.2968   Epoch: 1   Global Step: 11420   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:05:51,260-Speed 5380.50 samples/sec   Loss 13.2974   LearningRate 0.2968   Epoch: 1   Global Step: 11430   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:05:58,933-Speed 5338.67 samples/sec   Loss 13.3457   LearningRate 0.2967   Epoch: 1   Global Step: 11440   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:06:06,589-Speed 5350.97 samples/sec   Loss 13.2438   LearningRate 0.2967   Epoch: 1   Global Step: 11450   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:06:14,284-Speed 5323.85 samples/sec   Loss 13.1299   LearningRate 0.2967   Epoch: 1   Global Step: 11460   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:06:22,008-Speed 5303.41 samples/sec   Loss 13.3141   LearningRate 0.2967   Epoch: 1   Global Step: 11470   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:06:29,818-Speed 5245.47 samples/sec   Loss 13.2452   LearningRate 0.2966   Epoch: 1   Global Step: 11480   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:06:37,602-Speed 5262.49 samples/sec   Loss 13.2575   LearningRate 0.2966   Epoch: 1   Global Step: 11490   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:06:45,377-Speed 5269.30 samples/sec   Loss 13.2523   LearningRate 0.2966   Epoch: 1   Global Step: 11500   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:06:53,142-Speed 5275.54 samples/sec   Loss 13.3322   LearningRate 0.2965   Epoch: 1   Global Step: 11510   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:07:00,879-Speed 5297.15 samples/sec   Loss 13.2028   LearningRate 0.2965   Epoch: 1   Global Step: 11520   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:07:08,740-Speed 5211.54 samples/sec   Loss 13.2290   LearningRate 0.2965   Epoch: 1   Global Step: 11530   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:07:16,469-Speed 5300.26 samples/sec   Loss 13.2599   LearningRate 0.2964   Epoch: 1   Global Step: 11540   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:07:24,102-Speed 5367.18 samples/sec   Loss 13.2839   LearningRate 0.2964   Epoch: 1   Global Step: 11550   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:07:31,694-Speed 5396.16 samples/sec   Loss 13.2566   LearningRate 0.2964   Epoch: 1   Global Step: 11560   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:07:39,352-Speed 5348.90 samples/sec   Loss 13.2513   LearningRate 0.2964   Epoch: 1   Global Step: 11570   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:07:46,983-Speed 5368.06 samples/sec   Loss 13.2679   LearningRate 0.2963   Epoch: 1   Global Step: 11580   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:07:54,631-Speed 5356.26 samples/sec   Loss 13.2683   LearningRate 0.2963   Epoch: 1   Global Step: 11590   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:08:02,311-Speed 5334.90 samples/sec   Loss 13.2650   LearningRate 0.2963   Epoch: 1   Global Step: 11600   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:08:09,885-Speed 5408.48 samples/sec   Loss 13.2123   LearningRate 0.2962   Epoch: 1   Global Step: 11610   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:08:17,512-Speed 5371.02 samples/sec   Loss 13.2267   LearningRate 0.2962   Epoch: 1   Global Step: 11620   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:08:25,127-Speed 5379.54 samples/sec   Loss 13.3253   LearningRate 0.2962   Epoch: 1   Global Step: 11630   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:08:32,733-Speed 5385.97 samples/sec   Loss 13.1986   LearningRate 0.2961   Epoch: 1   Global Step: 11640   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:08:40,338-Speed 5386.75 samples/sec   Loss 13.1991   LearningRate 0.2961   Epoch: 1   Global Step: 11650   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:08:47,831-Speed 5466.96 samples/sec   Loss 13.2160   LearningRate 0.2961   Epoch: 1   Global Step: 11660   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:08:55,384-Speed 5423.91 samples/sec   Loss 13.0999   LearningRate 0.2961   Epoch: 1   Global Step: 11670   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:09:02,921-Speed 5435.53 samples/sec   Loss 13.1795   LearningRate 0.2960   Epoch: 1   Global Step: 11680   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:09:10,465-Speed 5429.88 samples/sec   Loss 13.1979   LearningRate 0.2960   Epoch: 1   Global Step: 11690   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:09:17,917-Speed 5497.10 samples/sec   Loss 13.2174   LearningRate 0.2960   Epoch: 1   Global Step: 11700   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:09:25,468-Speed 5425.64 samples/sec   Loss 13.2076   LearningRate 0.2959   Epoch: 1   Global Step: 11710   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:09:32,978-Speed 5454.18 samples/sec   Loss 13.1710   LearningRate 0.2959   Epoch: 1   Global Step: 11720   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:09:40,511-Speed 5438.64 samples/sec   Loss 13.1435   LearningRate 0.2959   Epoch: 1   Global Step: 11730   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:09:48,012-Speed 5461.72 samples/sec   Loss 13.0338   LearningRate 0.2958   Epoch: 1   Global Step: 11740   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:09:55,599-Speed 5398.87 samples/sec   Loss 13.1484   LearningRate 0.2958   Epoch: 1   Global Step: 11750   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:10:03,196-Speed 5392.68 samples/sec   Loss 13.1845   LearningRate 0.2958   Epoch: 1   Global Step: 11760   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:10:10,802-Speed 5385.81 samples/sec   Loss 13.1962   LearningRate 0.2957   Epoch: 1   Global Step: 11770   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:10:18,364-Speed 5416.90 samples/sec   Loss 13.2717   LearningRate 0.2957   Epoch: 1   Global Step: 11780   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:10:25,884-Speed 5447.57 samples/sec   Loss 13.1031   LearningRate 0.2957   Epoch: 1   Global Step: 11790   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:10:33,491-Speed 5385.50 samples/sec   Loss 13.1908   LearningRate 0.2957   Epoch: 1   Global Step: 11800   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:10:41,035-Speed 5430.31 samples/sec   Loss 13.1675   LearningRate 0.2956   Epoch: 1   Global Step: 11810   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:10:48,475-Speed 5505.90 samples/sec   Loss 13.1607   LearningRate 0.2956   Epoch: 1   Global Step: 11820   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:10:56,054-Speed 5404.83 samples/sec   Loss 13.0563   LearningRate 0.2956   Epoch: 1   Global Step: 11830   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:11:03,603-Speed 5426.97 samples/sec   Loss 13.1686   LearningRate 0.2955   Epoch: 1   Global Step: 11840   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:11:11,100-Speed 5463.94 samples/sec   Loss 13.1308   LearningRate 0.2955   Epoch: 1   Global Step: 11850   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:11:18,607-Speed 5456.95 samples/sec   Loss 13.2911   LearningRate 0.2955   Epoch: 1   Global Step: 11860   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:11:26,245-Speed 5363.84 samples/sec   Loss 13.1577   LearningRate 0.2954   Epoch: 1   Global Step: 11870   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:11:33,845-Speed 5389.99 samples/sec   Loss 13.0823   LearningRate 0.2954   Epoch: 1   Global Step: 11880   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:11:41,644-Speed 5252.82 samples/sec   Loss 13.0828   LearningRate 0.2954   Epoch: 1   Global Step: 11890   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:11:49,354-Speed 5313.15 samples/sec   Loss 13.1124   LearningRate 0.2954   Epoch: 1   Global Step: 11900   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:11:56,809-Speed 5495.51 samples/sec   Loss 13.1026   LearningRate 0.2953   Epoch: 1   Global Step: 11910   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:12:04,358-Speed 5426.17 samples/sec   Loss 13.1142   LearningRate 0.2953   Epoch: 1   Global Step: 11920   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:12:11,865-Speed 5456.84 samples/sec   Loss 13.0623   LearningRate 0.2953   Epoch: 1   Global Step: 11930   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:12:19,415-Speed 5426.14 samples/sec   Loss 13.0022   LearningRate 0.2952   Epoch: 1   Global Step: 11940   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:12:26,869-Speed 5496.41 samples/sec   Loss 13.0730   LearningRate 0.2952   Epoch: 1   Global Step: 11950   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:12:34,417-Speed 5426.63 samples/sec   Loss 13.0846   LearningRate 0.2952   Epoch: 1   Global Step: 11960   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:12:41,921-Speed 5459.79 samples/sec   Loss 12.9961   LearningRate 0.2951   Epoch: 1   Global Step: 11970   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:12:49,562-Speed 5360.64 samples/sec   Loss 13.1302   LearningRate 0.2951   Epoch: 1   Global Step: 11980   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:12:57,078-Speed 5451.04 samples/sec   Loss 13.0519   LearningRate 0.2951   Epoch: 1   Global Step: 11990   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:13:04,622-Speed 5430.38 samples/sec   Loss 13.1204   LearningRate 0.2951   Epoch: 1   Global Step: 12000   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:13:49,845-[lfw][12000]XNorm: 23.169394
Training: 2022-01-07 21:13:49,846-[lfw][12000]Accuracy-Flip: 0.99567+-0.00291
Training: 2022-01-07 21:13:49,847-[lfw][12000]Accuracy-Highest: 0.99567
Training: 2022-01-07 21:14:42,993-[cfp_fp][12000]XNorm: 20.977489
Training: 2022-01-07 21:14:42,995-[cfp_fp][12000]Accuracy-Flip: 0.97486+-0.00756
Training: 2022-01-07 21:14:42,995-[cfp_fp][12000]Accuracy-Highest: 0.97486
Training: 2022-01-07 21:15:28,466-[agedb_30][12000]XNorm: 22.756491
Training: 2022-01-07 21:15:28,467-[agedb_30][12000]Accuracy-Flip: 0.95033+-0.00856
Training: 2022-01-07 21:15:28,468-[agedb_30][12000]Accuracy-Highest: 0.95500
Training: 2022-01-07 21:15:35,913-Speed 270.74 samples/sec   Loss 13.0954   LearningRate 0.2950   Epoch: 1   Global Step: 12010   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 21:15:43,457-Speed 5431.47 samples/sec   Loss 12.9869   LearningRate 0.2950   Epoch: 1   Global Step: 12020   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 21:15:50,971-Speed 5452.39 samples/sec   Loss 13.0627   LearningRate 0.2950   Epoch: 1   Global Step: 12030   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 21:15:58,555-Speed 5401.61 samples/sec   Loss 13.0714   LearningRate 0.2949   Epoch: 1   Global Step: 12040   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 21:16:06,050-Speed 5466.31 samples/sec   Loss 12.9489   LearningRate 0.2949   Epoch: 1   Global Step: 12050   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 21:16:13,504-Speed 5495.28 samples/sec   Loss 13.0250   LearningRate 0.2949   Epoch: 1   Global Step: 12060   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 21:16:20,990-Speed 5473.00 samples/sec   Loss 13.0391   LearningRate 0.2948   Epoch: 1   Global Step: 12070   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 21:16:28,604-Speed 5380.25 samples/sec   Loss 12.9319   LearningRate 0.2948   Epoch: 1   Global Step: 12080   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 21:16:36,167-Speed 5416.41 samples/sec   Loss 13.0468   LearningRate 0.2948   Epoch: 1   Global Step: 12090   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 21:16:43,687-Speed 5447.49 samples/sec   Loss 13.0873   LearningRate 0.2948   Epoch: 1   Global Step: 12100   Fp16 Grad Scale: 262144   Required: 45 hours
Training: 2022-01-07 21:16:51,284-Speed 5392.56 samples/sec   Loss 13.0246   LearningRate 0.2947   Epoch: 1   Global Step: 12110   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 21:16:58,835-Speed 5424.87 samples/sec   Loss 12.9656   LearningRate 0.2947   Epoch: 1   Global Step: 12120   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 21:17:06,416-Speed 5403.58 samples/sec   Loss 12.9858   LearningRate 0.2947   Epoch: 1   Global Step: 12130   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 21:17:14,008-Speed 5395.61 samples/sec   Loss 13.0218   LearningRate 0.2946   Epoch: 1   Global Step: 12140   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 21:17:21,576-Speed 5413.09 samples/sec   Loss 13.0186   LearningRate 0.2946   Epoch: 1   Global Step: 12150   Fp16 Grad Scale: 131072   Required: 45 hours
Training: 2022-01-07 21:17:29,217-Speed 5361.09 samples/sec   Loss 12.9985   LearningRate 0.2946   Epoch: 1   Global Step: 12160   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:17:36,767-Speed 5426.00 samples/sec   Loss 12.9943   LearningRate 0.2945   Epoch: 1   Global Step: 12170   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:17:44,331-Speed 5415.65 samples/sec   Loss 12.9677   LearningRate 0.2945   Epoch: 1   Global Step: 12180   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:17:51,815-Speed 5474.15 samples/sec   Loss 12.9440   LearningRate 0.2945   Epoch: 1   Global Step: 12190   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:17:59,377-Speed 5417.13 samples/sec   Loss 12.9398   LearningRate 0.2944   Epoch: 1   Global Step: 12200   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:18:06,956-Speed 5405.03 samples/sec   Loss 12.9105   LearningRate 0.2944   Epoch: 1   Global Step: 12210   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:18:14,500-Speed 5429.99 samples/sec   Loss 12.8647   LearningRate 0.2944   Epoch: 1   Global Step: 12220   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:18:22,166-Speed 5344.60 samples/sec   Loss 13.0062   LearningRate 0.2944   Epoch: 1   Global Step: 12230   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:18:29,683-Speed 5449.61 samples/sec   Loss 13.0078   LearningRate 0.2943   Epoch: 1   Global Step: 12240   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:18:37,289-Speed 5385.24 samples/sec   Loss 12.9056   LearningRate 0.2943   Epoch: 1   Global Step: 12250   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:18:44,801-Speed 5453.68 samples/sec   Loss 12.9464   LearningRate 0.2943   Epoch: 1   Global Step: 12260   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:18:52,354-Speed 5424.23 samples/sec   Loss 12.8854   LearningRate 0.2942   Epoch: 1   Global Step: 12270   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:18:59,896-Speed 5431.11 samples/sec   Loss 12.9327   LearningRate 0.2942   Epoch: 1   Global Step: 12280   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:19:07,562-Speed 5343.86 samples/sec   Loss 12.8387   LearningRate 0.2942   Epoch: 1   Global Step: 12290   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:19:15,102-Speed 5432.50 samples/sec   Loss 12.8677   LearningRate 0.2941   Epoch: 1   Global Step: 12300   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:19:22,561-Speed 5492.98 samples/sec   Loss 12.8989   LearningRate 0.2941   Epoch: 1   Global Step: 12310   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:19:30,052-Speed 5468.51 samples/sec   Loss 12.9954   LearningRate 0.2941   Epoch: 1   Global Step: 12320   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:19:37,655-Speed 5387.35 samples/sec   Loss 12.8717   LearningRate 0.2941   Epoch: 1   Global Step: 12330   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:19:45,197-Speed 5431.68 samples/sec   Loss 12.9747   LearningRate 0.2940   Epoch: 1   Global Step: 12340   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:19:52,792-Speed 5394.14 samples/sec   Loss 12.8236   LearningRate 0.2940   Epoch: 1   Global Step: 12350   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:20:00,312-Speed 5447.39 samples/sec   Loss 12.8641   LearningRate 0.2940   Epoch: 1   Global Step: 12360   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:20:07,982-Speed 5340.63 samples/sec   Loss 12.9403   LearningRate 0.2939   Epoch: 1   Global Step: 12370   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:20:15,527-Speed 5429.57 samples/sec   Loss 12.9187   LearningRate 0.2939   Epoch: 1   Global Step: 12380   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:20:23,159-Speed 5368.13 samples/sec   Loss 12.8235   LearningRate 0.2939   Epoch: 1   Global Step: 12390   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:20:30,782-Speed 5373.62 samples/sec   Loss 12.7978   LearningRate 0.2938   Epoch: 1   Global Step: 12400   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:20:38,410-Speed 5370.27 samples/sec   Loss 12.8439   LearningRate 0.2938   Epoch: 1   Global Step: 12410   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:20:46,056-Speed 5358.10 samples/sec   Loss 12.9074   LearningRate 0.2938   Epoch: 1   Global Step: 12420   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:20:53,661-Speed 5386.51 samples/sec   Loss 12.8756   LearningRate 0.2938   Epoch: 1   Global Step: 12430   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:21:01,312-Speed 5354.85 samples/sec   Loss 12.8546   LearningRate 0.2937   Epoch: 1   Global Step: 12440   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:21:08,786-Speed 5480.30 samples/sec   Loss 12.8946   LearningRate 0.2937   Epoch: 1   Global Step: 12450   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:21:16,355-Speed 5412.96 samples/sec   Loss 12.9205   LearningRate 0.2937   Epoch: 1   Global Step: 12460   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:21:23,853-Speed 5463.06 samples/sec   Loss 12.8130   LearningRate 0.2936   Epoch: 1   Global Step: 12470   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:21:31,389-Speed 5436.18 samples/sec   Loss 12.7886   LearningRate 0.2936   Epoch: 1   Global Step: 12480   Fp16 Grad Scale: 16384   Required: 44 hours
Training: 2022-01-07 21:21:38,852-Speed 5488.82 samples/sec   Loss 12.8410   LearningRate 0.2936   Epoch: 1   Global Step: 12490   Fp16 Grad Scale: 16384   Required: 44 hours
Training: 2022-01-07 21:21:46,497-Speed 5358.16 samples/sec   Loss 12.8369   LearningRate 0.2935   Epoch: 1   Global Step: 12500   Fp16 Grad Scale: 16384   Required: 44 hours
Training: 2022-01-07 21:21:54,060-Speed 5416.54 samples/sec   Loss 12.8686   LearningRate 0.2935   Epoch: 1   Global Step: 12510   Fp16 Grad Scale: 16384   Required: 44 hours
Training: 2022-01-07 21:22:01,586-Speed 5443.70 samples/sec   Loss 12.7894   LearningRate 0.2935   Epoch: 1   Global Step: 12520   Fp16 Grad Scale: 16384   Required: 44 hours
Training: 2022-01-07 21:22:09,066-Speed 5476.22 samples/sec   Loss 12.7894   LearningRate 0.2935   Epoch: 1   Global Step: 12530   Fp16 Grad Scale: 16384   Required: 44 hours
Training: 2022-01-07 21:22:16,674-Speed 5384.76 samples/sec   Loss 12.8669   LearningRate 0.2934   Epoch: 1   Global Step: 12540   Fp16 Grad Scale: 16384   Required: 44 hours
Training: 2022-01-07 21:22:24,165-Speed 5468.80 samples/sec   Loss 12.7800   LearningRate 0.2934   Epoch: 1   Global Step: 12550   Fp16 Grad Scale: 16384   Required: 44 hours
Training: 2022-01-07 21:22:31,783-Speed 5377.21 samples/sec   Loss 12.8728   LearningRate 0.2934   Epoch: 1   Global Step: 12560   Fp16 Grad Scale: 16384   Required: 44 hours
Training: 2022-01-07 21:22:39,363-Speed 5404.65 samples/sec   Loss 12.8529   LearningRate 0.2933   Epoch: 1   Global Step: 12570   Fp16 Grad Scale: 16384   Required: 44 hours
Training: 2022-01-07 21:22:47,048-Speed 5330.25 samples/sec   Loss 12.7943   LearningRate 0.2933   Epoch: 1   Global Step: 12580   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 21:22:54,521-Speed 5481.46 samples/sec   Loss 12.7103   LearningRate 0.2933   Epoch: 1   Global Step: 12590   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 21:23:02,089-Speed 5413.32 samples/sec   Loss 12.8623   LearningRate 0.2932   Epoch: 1   Global Step: 12600   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 21:23:09,577-Speed 5470.78 samples/sec   Loss 12.8043   LearningRate 0.2932   Epoch: 1   Global Step: 12610   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 21:23:17,162-Speed 5400.36 samples/sec   Loss 12.8633   LearningRate 0.2932   Epoch: 1   Global Step: 12620   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 21:23:24,652-Speed 5469.17 samples/sec   Loss 12.7585   LearningRate 0.2932   Epoch: 1   Global Step: 12630   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 21:23:32,276-Speed 5373.37 samples/sec   Loss 12.7518   LearningRate 0.2931   Epoch: 1   Global Step: 12640   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 21:23:39,920-Speed 5359.56 samples/sec   Loss 12.7895   LearningRate 0.2931   Epoch: 1   Global Step: 12650   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 21:23:47,604-Speed 5331.36 samples/sec   Loss 12.8228   LearningRate 0.2931   Epoch: 1   Global Step: 12660   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 21:23:55,097-Speed 5466.74 samples/sec   Loss 12.8635   LearningRate 0.2930   Epoch: 1   Global Step: 12670   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 21:24:02,671-Speed 5408.21 samples/sec   Loss 12.8678   LearningRate 0.2930   Epoch: 1   Global Step: 12680   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:24:10,180-Speed 5456.21 samples/sec   Loss 12.9057   LearningRate 0.2930   Epoch: 1   Global Step: 12690   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:24:17,759-Speed 5404.80 samples/sec   Loss 12.8906   LearningRate 0.2929   Epoch: 1   Global Step: 12700   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:24:25,301-Speed 5431.26 samples/sec   Loss 12.7420   LearningRate 0.2929   Epoch: 1   Global Step: 12710   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:24:32,885-Speed 5401.96 samples/sec   Loss 12.7021   LearningRate 0.2929   Epoch: 1   Global Step: 12720   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:24:40,461-Speed 5407.23 samples/sec   Loss 12.8497   LearningRate 0.2929   Epoch: 1   Global Step: 12730   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:24:47,939-Speed 5478.06 samples/sec   Loss 12.8073   LearningRate 0.2928   Epoch: 1   Global Step: 12740   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:24:55,467-Speed 5441.48 samples/sec   Loss 12.6738   LearningRate 0.2928   Epoch: 1   Global Step: 12750   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:25:03,074-Speed 5385.00 samples/sec   Loss 12.7060   LearningRate 0.2928   Epoch: 1   Global Step: 12760   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:25:10,658-Speed 5402.52 samples/sec   Loss 12.7301   LearningRate 0.2927   Epoch: 1   Global Step: 12770   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:25:18,133-Speed 5479.84 samples/sec   Loss 12.7431   LearningRate 0.2927   Epoch: 1   Global Step: 12780   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:25:25,678-Speed 5428.87 samples/sec   Loss 12.8561   LearningRate 0.2927   Epoch: 1   Global Step: 12790   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:25:33,284-Speed 5386.28 samples/sec   Loss 12.7123   LearningRate 0.2926   Epoch: 1   Global Step: 12800   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:25:40,866-Speed 5416.22 samples/sec   Loss 12.6940   LearningRate 0.2926   Epoch: 1   Global Step: 12810   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:25:48,343-Speed 5479.14 samples/sec   Loss 12.6908   LearningRate 0.2926   Epoch: 1   Global Step: 12820   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:25:55,833-Speed 5468.79 samples/sec   Loss 12.6478   LearningRate 0.2926   Epoch: 1   Global Step: 12830   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:26:03,336-Speed 5459.82 samples/sec   Loss 12.7603   LearningRate 0.2925   Epoch: 1   Global Step: 12840   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:26:10,789-Speed 5496.66 samples/sec   Loss 12.6845   LearningRate 0.2925   Epoch: 1   Global Step: 12850   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:26:18,351-Speed 5417.51 samples/sec   Loss 12.6644   LearningRate 0.2925   Epoch: 1   Global Step: 12860   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:26:25,881-Speed 5439.75 samples/sec   Loss 12.7377   LearningRate 0.2924   Epoch: 1   Global Step: 12870   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:26:33,562-Speed 5333.10 samples/sec   Loss 12.6519   LearningRate 0.2924   Epoch: 1   Global Step: 12880   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:26:41,063-Speed 5462.07 samples/sec   Loss 12.7072   LearningRate 0.2924   Epoch: 1   Global Step: 12890   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:26:48,628-Speed 5414.64 samples/sec   Loss 12.6974   LearningRate 0.2923   Epoch: 1   Global Step: 12900   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:26:56,180-Speed 5424.29 samples/sec   Loss 12.7156   LearningRate 0.2923   Epoch: 1   Global Step: 12910   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:27:03,993-Speed 5244.05 samples/sec   Loss 12.6229   LearningRate 0.2923   Epoch: 1   Global Step: 12920   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:27:11,561-Speed 5413.08 samples/sec   Loss 12.6430   LearningRate 0.2923   Epoch: 1   Global Step: 12930   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:27:19,290-Speed 5300.19 samples/sec   Loss 12.7075   LearningRate 0.2922   Epoch: 1   Global Step: 12940   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:27:27,043-Speed 5283.61 samples/sec   Loss 12.7042   LearningRate 0.2922   Epoch: 1   Global Step: 12950   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:27:34,759-Speed 5309.20 samples/sec   Loss 12.6862   LearningRate 0.2922   Epoch: 1   Global Step: 12960   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:27:42,385-Speed 5372.54 samples/sec   Loss 12.6520   LearningRate 0.2921   Epoch: 1   Global Step: 12970   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:27:50,034-Speed 5355.09 samples/sec   Loss 12.7256   LearningRate 0.2921   Epoch: 1   Global Step: 12980   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:27:57,536-Speed 5460.28 samples/sec   Loss 12.6721   LearningRate 0.2921   Epoch: 1   Global Step: 12990   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:28:05,145-Speed 5384.02 samples/sec   Loss 12.7526   LearningRate 0.2920   Epoch: 1   Global Step: 13000   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:28:12,714-Speed 5412.61 samples/sec   Loss 12.7668   LearningRate 0.2920   Epoch: 1   Global Step: 13010   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:28:20,405-Speed 5326.38 samples/sec   Loss 12.6150   LearningRate 0.2920   Epoch: 1   Global Step: 13020   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:28:27,984-Speed 5405.37 samples/sec   Loss 12.6571   LearningRate 0.2920   Epoch: 1   Global Step: 13030   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:28:35,512-Speed 5441.77 samples/sec   Loss 12.6693   LearningRate 0.2919   Epoch: 1   Global Step: 13040   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:28:43,059-Speed 5428.38 samples/sec   Loss 12.6376   LearningRate 0.2919   Epoch: 1   Global Step: 13050   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:28:50,716-Speed 5349.68 samples/sec   Loss 12.6570   LearningRate 0.2919   Epoch: 1   Global Step: 13060   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:28:58,179-Speed 5488.96 samples/sec   Loss 12.7047   LearningRate 0.2918   Epoch: 1   Global Step: 13070   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:29:05,890-Speed 5312.89 samples/sec   Loss 12.5767   LearningRate 0.2918   Epoch: 1   Global Step: 13080   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:29:13,471-Speed 5403.89 samples/sec   Loss 12.7304   LearningRate 0.2918   Epoch: 1   Global Step: 13090   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:29:21,086-Speed 5379.12 samples/sec   Loss 12.6837   LearningRate 0.2917   Epoch: 1   Global Step: 13100   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:29:28,787-Speed 5319.87 samples/sec   Loss 12.5683   LearningRate 0.2917   Epoch: 1   Global Step: 13110   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:29:36,317-Speed 5440.31 samples/sec   Loss 12.7078   LearningRate 0.2917   Epoch: 1   Global Step: 13120   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:29:43,981-Speed 5345.32 samples/sec   Loss 12.6879   LearningRate 0.2917   Epoch: 1   Global Step: 13130   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:29:51,547-Speed 5413.92 samples/sec   Loss 12.7269   LearningRate 0.2916   Epoch: 1   Global Step: 13140   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:29:59,101-Speed 5423.27 samples/sec   Loss 12.6655   LearningRate 0.2916   Epoch: 1   Global Step: 13150   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:30:06,641-Speed 5433.52 samples/sec   Loss 12.6647   LearningRate 0.2916   Epoch: 1   Global Step: 13160   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:30:14,224-Speed 5402.49 samples/sec   Loss 12.6477   LearningRate 0.2915   Epoch: 1   Global Step: 13170   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:30:21,758-Speed 5437.00 samples/sec   Loss 12.6472   LearningRate 0.2915   Epoch: 1   Global Step: 13180   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:30:29,283-Speed 5443.66 samples/sec   Loss 12.6025   LearningRate 0.2915   Epoch: 1   Global Step: 13190   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:30:36,807-Speed 5444.64 samples/sec   Loss 12.6491   LearningRate 0.2914   Epoch: 1   Global Step: 13200   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:30:44,350-Speed 5431.49 samples/sec   Loss 12.5566   LearningRate 0.2914   Epoch: 1   Global Step: 13210   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:30:51,967-Speed 5378.12 samples/sec   Loss 12.5478   LearningRate 0.2914   Epoch: 1   Global Step: 13220   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:30:59,496-Speed 5440.52 samples/sec   Loss 12.6399   LearningRate 0.2914   Epoch: 1   Global Step: 13230   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:31:07,166-Speed 5340.85 samples/sec   Loss 12.5809   LearningRate 0.2913   Epoch: 1   Global Step: 13240   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:31:14,675-Speed 5455.92 samples/sec   Loss 12.5658   LearningRate 0.2913   Epoch: 1   Global Step: 13250   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:31:22,282-Speed 5385.36 samples/sec   Loss 12.6075   LearningRate 0.2913   Epoch: 1   Global Step: 13260   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:31:29,783-Speed 5460.84 samples/sec   Loss 12.6126   LearningRate 0.2912   Epoch: 1   Global Step: 13270   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:31:37,354-Speed 5411.00 samples/sec   Loss 12.6383   LearningRate 0.2912   Epoch: 1   Global Step: 13280   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:31:44,882-Speed 5441.63 samples/sec   Loss 12.5946   LearningRate 0.2912   Epoch: 1   Global Step: 13290   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:31:52,499-Speed 5378.19 samples/sec   Loss 12.5775   LearningRate 0.2911   Epoch: 1   Global Step: 13300   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:32:00,033-Speed 5437.43 samples/sec   Loss 12.5519   LearningRate 0.2911   Epoch: 1   Global Step: 13310   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:32:07,625-Speed 5395.43 samples/sec   Loss 12.5724   LearningRate 0.2911   Epoch: 1   Global Step: 13320   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:32:15,168-Speed 5431.16 samples/sec   Loss 12.5032   LearningRate 0.2910   Epoch: 1   Global Step: 13330   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:32:22,717-Speed 5426.59 samples/sec   Loss 12.6105   LearningRate 0.2910   Epoch: 1   Global Step: 13340   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:32:30,523-Speed 5248.19 samples/sec   Loss 12.5599   LearningRate 0.2910   Epoch: 1   Global Step: 13350   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:32:38,245-Speed 5304.72 samples/sec   Loss 12.5549   LearningRate 0.2910   Epoch: 1   Global Step: 13360   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:32:45,869-Speed 5373.12 samples/sec   Loss 12.5501   LearningRate 0.2909   Epoch: 1   Global Step: 13370   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:32:53,510-Speed 5361.50 samples/sec   Loss 12.4573   LearningRate 0.2909   Epoch: 1   Global Step: 13380   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:33:01,063-Speed 5423.36 samples/sec   Loss 12.4904   LearningRate 0.2909   Epoch: 1   Global Step: 13390   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:33:08,633-Speed 5411.42 samples/sec   Loss 12.5328   LearningRate 0.2908   Epoch: 1   Global Step: 13400   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:33:16,174-Speed 5432.75 samples/sec   Loss 12.5116   LearningRate 0.2908   Epoch: 1   Global Step: 13410   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:33:23,758-Speed 5401.14 samples/sec   Loss 12.5449   LearningRate 0.2908   Epoch: 1   Global Step: 13420   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:33:31,347-Speed 5397.89 samples/sec   Loss 12.5721   LearningRate 0.2908   Epoch: 1   Global Step: 13430   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:33:38,790-Speed 5504.42 samples/sec   Loss 12.5708   LearningRate 0.2907   Epoch: 1   Global Step: 13440   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:33:46,337-Speed 5427.69 samples/sec   Loss 12.5217   LearningRate 0.2907   Epoch: 1   Global Step: 13450   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:33:53,904-Speed 5413.67 samples/sec   Loss 12.4674   LearningRate 0.2907   Epoch: 1   Global Step: 13460   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:34:01,540-Speed 5365.21 samples/sec   Loss 12.5462   LearningRate 0.2906   Epoch: 1   Global Step: 13470   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:34:09,005-Speed 5487.92 samples/sec   Loss 12.5960   LearningRate 0.2906   Epoch: 1   Global Step: 13480   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:34:16,595-Speed 5396.90 samples/sec   Loss 12.5483   LearningRate 0.2906   Epoch: 1   Global Step: 13490   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:34:24,073-Speed 5478.38 samples/sec   Loss 12.4156   LearningRate 0.2905   Epoch: 1   Global Step: 13500   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:34:31,800-Speed 5301.92 samples/sec   Loss 12.5294   LearningRate 0.2905   Epoch: 1   Global Step: 13510   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:34:39,354-Speed 5423.18 samples/sec   Loss 12.4290   LearningRate 0.2905   Epoch: 1   Global Step: 13520   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:34:46,781-Speed 5515.63 samples/sec   Loss 12.5417   LearningRate 0.2905   Epoch: 1   Global Step: 13530   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:34:54,307-Speed 5442.80 samples/sec   Loss 12.5317   LearningRate 0.2904   Epoch: 1   Global Step: 13540   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:35:01,766-Speed 5492.61 samples/sec   Loss 12.4753   LearningRate 0.2904   Epoch: 1   Global Step: 13550   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:35:09,253-Speed 5471.85 samples/sec   Loss 12.4583   LearningRate 0.2904   Epoch: 1   Global Step: 13560   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:35:16,705-Speed 5497.08 samples/sec   Loss 12.4278   LearningRate 0.2903   Epoch: 1   Global Step: 13570   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:35:24,203-Speed 5463.12 samples/sec   Loss 12.4822   LearningRate 0.2903   Epoch: 1   Global Step: 13580   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:35:31,749-Speed 5428.94 samples/sec   Loss 12.4341   LearningRate 0.2903   Epoch: 1   Global Step: 13590   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:35:39,224-Speed 5480.21 samples/sec   Loss 12.5106   LearningRate 0.2902   Epoch: 1   Global Step: 13600   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:35:46,710-Speed 5472.45 samples/sec   Loss 12.4517   LearningRate 0.2902   Epoch: 1   Global Step: 13610   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:35:54,213-Speed 5459.69 samples/sec   Loss 12.5062   LearningRate 0.2902   Epoch: 1   Global Step: 13620   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:36:01,769-Speed 5421.50 samples/sec   Loss 12.4933   LearningRate 0.2902   Epoch: 1   Global Step: 13630   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:36:09,282-Speed 5453.04 samples/sec   Loss 12.5672   LearningRate 0.2901   Epoch: 1   Global Step: 13640   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:36:16,815-Speed 5437.64 samples/sec   Loss 12.5784   LearningRate 0.2901   Epoch: 1   Global Step: 13650   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:36:24,380-Speed 5415.14 samples/sec   Loss 12.4710   LearningRate 0.2901   Epoch: 1   Global Step: 13660   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:36:31,954-Speed 5409.27 samples/sec   Loss 12.4979   LearningRate 0.2900   Epoch: 1   Global Step: 13670   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:36:39,434-Speed 5476.71 samples/sec   Loss 12.4648   LearningRate 0.2900   Epoch: 1   Global Step: 13680   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:36:46,871-Speed 5508.05 samples/sec   Loss 12.3590   LearningRate 0.2900   Epoch: 1   Global Step: 13690   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:36:54,479-Speed 5384.45 samples/sec   Loss 12.4690   LearningRate 0.2899   Epoch: 1   Global Step: 13700   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:37:02,005-Speed 5442.81 samples/sec   Loss 12.4087   LearningRate 0.2899   Epoch: 1   Global Step: 13710   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:37:09,543-Speed 5434.76 samples/sec   Loss 12.5434   LearningRate 0.2899   Epoch: 1   Global Step: 13720   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:37:16,990-Speed 5500.71 samples/sec   Loss 12.4477   LearningRate 0.2899   Epoch: 1   Global Step: 13730   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:37:24,461-Speed 5483.52 samples/sec   Loss 12.4172   LearningRate 0.2898   Epoch: 1   Global Step: 13740   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:37:31,912-Speed 5497.93 samples/sec   Loss 12.4540   LearningRate 0.2898   Epoch: 1   Global Step: 13750   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:37:39,499-Speed 5399.46 samples/sec   Loss 12.3987   LearningRate 0.2898   Epoch: 1   Global Step: 13760   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:37:46,993-Speed 5466.38 samples/sec   Loss 12.3342   LearningRate 0.2897   Epoch: 1   Global Step: 13770   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:37:54,591-Speed 5391.60 samples/sec   Loss 12.4145   LearningRate 0.2897   Epoch: 1   Global Step: 13780   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:38:02,076-Speed 5472.88 samples/sec   Loss 12.4494   LearningRate 0.2897   Epoch: 1   Global Step: 13790   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:38:09,603-Speed 5443.10 samples/sec   Loss 12.4702   LearningRate 0.2896   Epoch: 1   Global Step: 13800   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:38:17,112-Speed 5455.05 samples/sec   Loss 12.4359   LearningRate 0.2896   Epoch: 1   Global Step: 13810   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:38:24,655-Speed 5430.76 samples/sec   Loss 12.4807   LearningRate 0.2896   Epoch: 1   Global Step: 13820   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:38:32,197-Speed 5431.41 samples/sec   Loss 12.3940   LearningRate 0.2896   Epoch: 1   Global Step: 13830   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:38:39,730-Speed 5438.86 samples/sec   Loss 12.3387   LearningRate 0.2895   Epoch: 1   Global Step: 13840   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:38:47,199-Speed 5484.15 samples/sec   Loss 12.4800   LearningRate 0.2895   Epoch: 1   Global Step: 13850   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:38:54,783-Speed 5401.33 samples/sec   Loss 12.2948   LearningRate 0.2895   Epoch: 1   Global Step: 13860   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:39:02,402-Speed 5376.99 samples/sec   Loss 12.3457   LearningRate 0.2894   Epoch: 1   Global Step: 13870   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:39:09,969-Speed 5413.96 samples/sec   Loss 12.4804   LearningRate 0.2894   Epoch: 1   Global Step: 13880   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:39:17,449-Speed 5476.40 samples/sec   Loss 12.3843   LearningRate 0.2894   Epoch: 1   Global Step: 13890   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:39:24,906-Speed 5493.43 samples/sec   Loss 12.3827   LearningRate 0.2893   Epoch: 1   Global Step: 13900   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:39:32,442-Speed 5436.21 samples/sec   Loss 12.5234   LearningRate 0.2893   Epoch: 1   Global Step: 13910   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:39:39,868-Speed 5516.39 samples/sec   Loss 12.4804   LearningRate 0.2893   Epoch: 1   Global Step: 13920   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:39:47,318-Speed 5498.71 samples/sec   Loss 12.4513   LearningRate 0.2893   Epoch: 1   Global Step: 13930   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:39:54,838-Speed 5447.42 samples/sec   Loss 12.3627   LearningRate 0.2892   Epoch: 1   Global Step: 13940   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:40:02,385-Speed 5428.14 samples/sec   Loss 12.3609   LearningRate 0.2892   Epoch: 1   Global Step: 13950   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:40:09,852-Speed 5486.62 samples/sec   Loss 12.3971   LearningRate 0.2892   Epoch: 1   Global Step: 13960   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:40:17,438-Speed 5399.98 samples/sec   Loss 12.3756   LearningRate 0.2891   Epoch: 1   Global Step: 13970   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:40:25,016-Speed 5405.77 samples/sec   Loss 12.3415   LearningRate 0.2891   Epoch: 1   Global Step: 13980   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:40:32,507-Speed 5469.02 samples/sec   Loss 12.3365   LearningRate 0.2891   Epoch: 1   Global Step: 13990   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:40:40,066-Speed 5419.14 samples/sec   Loss 12.4009   LearningRate 0.2890   Epoch: 1   Global Step: 14000   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:41:24,458-[lfw][14000]XNorm: 22.441222
Training: 2022-01-07 21:41:24,459-[lfw][14000]Accuracy-Flip: 0.99583+-0.00271
Training: 2022-01-07 21:41:24,460-[lfw][14000]Accuracy-Highest: 0.99583
Training: 2022-01-07 21:42:17,480-[cfp_fp][14000]XNorm: 20.112162
Training: 2022-01-07 21:42:17,482-[cfp_fp][14000]Accuracy-Flip: 0.97571+-0.00723
Training: 2022-01-07 21:42:17,482-[cfp_fp][14000]Accuracy-Highest: 0.97571
Training: 2022-01-07 21:43:02,910-[agedb_30][14000]XNorm: 21.935689
Training: 2022-01-07 21:43:02,911-[agedb_30][14000]Accuracy-Flip: 0.96083+-0.01023
Training: 2022-01-07 21:43:02,911-[agedb_30][14000]Accuracy-Highest: 0.96083
Training: 2022-01-07 21:43:10,388-Speed 272.49 samples/sec   Loss 12.4301   LearningRate 0.2890   Epoch: 1   Global Step: 14010   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:43:17,823-Speed 5510.53 samples/sec   Loss 12.3692   LearningRate 0.2890   Epoch: 1   Global Step: 14020   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:43:25,363-Speed 5433.47 samples/sec   Loss 12.3483   LearningRate 0.2890   Epoch: 1   Global Step: 14030   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:43:32,835-Speed 5482.86 samples/sec   Loss 12.4098   LearningRate 0.2889   Epoch: 1   Global Step: 14040   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:43:40,328-Speed 5467.42 samples/sec   Loss 12.3178   LearningRate 0.2889   Epoch: 1   Global Step: 14050   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:43:47,849-Speed 5447.12 samples/sec   Loss 12.3039   LearningRate 0.2889   Epoch: 1   Global Step: 14060   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:43:55,289-Speed 5505.65 samples/sec   Loss 12.4757   LearningRate 0.2888   Epoch: 1   Global Step: 14070   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:44:02,766-Speed 5478.66 samples/sec   Loss 12.4649   LearningRate 0.2888   Epoch: 1   Global Step: 14080   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:44:10,344-Speed 5406.28 samples/sec   Loss 12.3674   LearningRate 0.2888   Epoch: 1   Global Step: 14090   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:44:17,887-Speed 5431.08 samples/sec   Loss 12.3085   LearningRate 0.2887   Epoch: 1   Global Step: 14100   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:44:25,404-Speed 5449.04 samples/sec   Loss 12.3516   LearningRate 0.2887   Epoch: 1   Global Step: 14110   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:44:32,970-Speed 5414.21 samples/sec   Loss 12.2761   LearningRate 0.2887   Epoch: 1   Global Step: 14120   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:44:40,378-Speed 5530.47 samples/sec   Loss 12.3513   LearningRate 0.2887   Epoch: 1   Global Step: 14130   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:44:48,069-Speed 5326.45 samples/sec   Loss 12.3753   LearningRate 0.2886   Epoch: 1   Global Step: 14140   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:44:55,667-Speed 5391.47 samples/sec   Loss 12.3664   LearningRate 0.2886   Epoch: 1   Global Step: 14150   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:45:03,179-Speed 5452.95 samples/sec   Loss 12.3876   LearningRate 0.2886   Epoch: 1   Global Step: 14160   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:45:10,853-Speed 5338.08 samples/sec   Loss 12.3115   LearningRate 0.2885   Epoch: 1   Global Step: 14170   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:45:18,404-Speed 5425.62 samples/sec   Loss 12.3335   LearningRate 0.2885   Epoch: 1   Global Step: 14180   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:45:25,834-Speed 5512.83 samples/sec   Loss 12.2460   LearningRate 0.2885   Epoch: 1   Global Step: 14190   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:45:33,277-Speed 5504.09 samples/sec   Loss 12.3597   LearningRate 0.2884   Epoch: 1   Global Step: 14200   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:45:40,880-Speed 5388.71 samples/sec   Loss 12.3711   LearningRate 0.2884   Epoch: 1   Global Step: 14210   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:45:48,391-Speed 5453.88 samples/sec   Loss 12.2943   LearningRate 0.2884   Epoch: 1   Global Step: 14220   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:45:55,956-Speed 5415.29 samples/sec   Loss 12.2129   LearningRate 0.2884   Epoch: 1   Global Step: 14230   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:46:03,370-Speed 5525.21 samples/sec   Loss 12.3628   LearningRate 0.2883   Epoch: 1   Global Step: 14240   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:46:10,906-Speed 5436.10 samples/sec   Loss 12.3240   LearningRate 0.2883   Epoch: 1   Global Step: 14250   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:46:18,415-Speed 5455.60 samples/sec   Loss 12.2284   LearningRate 0.2883   Epoch: 1   Global Step: 14260   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:46:25,960-Speed 5429.45 samples/sec   Loss 12.3104   LearningRate 0.2882   Epoch: 1   Global Step: 14270   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:46:33,480-Speed 5447.29 samples/sec   Loss 12.3352   LearningRate 0.2882   Epoch: 1   Global Step: 14280   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:46:40,953-Speed 5481.74 samples/sec   Loss 12.2764   LearningRate 0.2882   Epoch: 1   Global Step: 14290   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:46:48,459-Speed 5457.09 samples/sec   Loss 12.2269   LearningRate 0.2881   Epoch: 1   Global Step: 14300   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:46:55,861-Speed 5534.53 samples/sec   Loss 12.3147   LearningRate 0.2881   Epoch: 1   Global Step: 14310   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:47:03,322-Speed 5490.99 samples/sec   Loss 12.3252   LearningRate 0.2881   Epoch: 1   Global Step: 14320   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:47:10,729-Speed 5530.34 samples/sec   Loss 12.2893   LearningRate 0.2881   Epoch: 1   Global Step: 14330   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:47:18,203-Speed 5480.85 samples/sec   Loss 12.2000   LearningRate 0.2880   Epoch: 1   Global Step: 14340   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:47:25,652-Speed 5499.94 samples/sec   Loss 12.3291   LearningRate 0.2880   Epoch: 1   Global Step: 14350   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:47:33,126-Speed 5481.05 samples/sec   Loss 12.3541   LearningRate 0.2880   Epoch: 1   Global Step: 14360   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:47:40,554-Speed 5514.68 samples/sec   Loss 12.3236   LearningRate 0.2879   Epoch: 1   Global Step: 14370   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:47:47,973-Speed 5522.25 samples/sec   Loss 12.3368   LearningRate 0.2879   Epoch: 1   Global Step: 14380   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:47:55,428-Speed 5494.72 samples/sec   Loss 12.2552   LearningRate 0.2879   Epoch: 1   Global Step: 14390   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:48:02,858-Speed 5513.96 samples/sec   Loss 12.2552   LearningRate 0.2878   Epoch: 1   Global Step: 14400   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:48:10,290-Speed 5511.81 samples/sec   Loss 12.2727   LearningRate 0.2878   Epoch: 1   Global Step: 14410   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:48:17,664-Speed 5555.32 samples/sec   Loss 12.2455   LearningRate 0.2878   Epoch: 1   Global Step: 14420   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:48:25,069-Speed 5532.64 samples/sec   Loss 12.2629   LearningRate 0.2878   Epoch: 1   Global Step: 14430   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:48:32,520-Speed 5497.80 samples/sec   Loss 12.2335   LearningRate 0.2877   Epoch: 1   Global Step: 14440   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:48:39,970-Speed 5498.69 samples/sec   Loss 12.2537   LearningRate 0.2877   Epoch: 1   Global Step: 14450   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:48:47,472-Speed 5461.01 samples/sec   Loss 12.2840   LearningRate 0.2877   Epoch: 1   Global Step: 14460   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:48:54,946-Speed 5481.15 samples/sec   Loss 12.2940   LearningRate 0.2876   Epoch: 1   Global Step: 14470   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:49:02,455-Speed 5455.29 samples/sec   Loss 12.2502   LearningRate 0.2876   Epoch: 1   Global Step: 14480   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:49:09,929-Speed 5481.20 samples/sec   Loss 12.1756   LearningRate 0.2876   Epoch: 1   Global Step: 14490   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:49:17,412-Speed 5474.52 samples/sec   Loss 12.3102   LearningRate 0.2876   Epoch: 1   Global Step: 14500   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:49:24,905-Speed 5466.52 samples/sec   Loss 12.2763   LearningRate 0.2875   Epoch: 1   Global Step: 14510   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:49:32,334-Speed 5514.65 samples/sec   Loss 12.1500   LearningRate 0.2875   Epoch: 1   Global Step: 14520   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:49:39,787-Speed 5496.56 samples/sec   Loss 12.1094   LearningRate 0.2875   Epoch: 1   Global Step: 14530   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:49:47,341-Speed 5422.90 samples/sec   Loss 12.1229   LearningRate 0.2874   Epoch: 1   Global Step: 14540   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:49:55,093-Speed 5284.06 samples/sec   Loss 12.2309   LearningRate 0.2874   Epoch: 1   Global Step: 14550   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:50:02,567-Speed 5481.49 samples/sec   Loss 12.2540   LearningRate 0.2874   Epoch: 1   Global Step: 14560   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:50:10,028-Speed 5490.69 samples/sec   Loss 12.2431   LearningRate 0.2873   Epoch: 1   Global Step: 14570   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:50:17,480-Speed 5497.49 samples/sec   Loss 12.2370   LearningRate 0.2873   Epoch: 1   Global Step: 14580   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:50:25,078-Speed 5391.15 samples/sec   Loss 12.3006   LearningRate 0.2873   Epoch: 1   Global Step: 14590   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:50:32,556-Speed 5478.45 samples/sec   Loss 12.2801   LearningRate 0.2873   Epoch: 1   Global Step: 14600   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:50:40,021-Speed 5487.88 samples/sec   Loss 12.1951   LearningRate 0.2872   Epoch: 1   Global Step: 14610   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:50:47,438-Speed 5523.50 samples/sec   Loss 12.2061   LearningRate 0.2872   Epoch: 1   Global Step: 14620   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:50:54,944-Speed 5457.25 samples/sec   Loss 12.1984   LearningRate 0.2872   Epoch: 1   Global Step: 14630   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:51:02,470-Speed 5443.16 samples/sec   Loss 12.1669   LearningRate 0.2871   Epoch: 1   Global Step: 14640   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:51:09,944-Speed 5481.15 samples/sec   Loss 12.1411   LearningRate 0.2871   Epoch: 1   Global Step: 14650   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:51:17,533-Speed 5398.02 samples/sec   Loss 12.2352   LearningRate 0.2871   Epoch: 1   Global Step: 14660   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:51:25,135-Speed 5388.73 samples/sec   Loss 12.2891   LearningRate 0.2870   Epoch: 1   Global Step: 14670   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:51:32,641-Speed 5457.93 samples/sec   Loss 12.2388   LearningRate 0.2870   Epoch: 1   Global Step: 14680   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:51:40,177-Speed 5436.58 samples/sec   Loss 12.3763   LearningRate 0.2870   Epoch: 1   Global Step: 14690   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:51:47,668-Speed 5468.32 samples/sec   Loss 12.2010   LearningRate 0.2870   Epoch: 1   Global Step: 14700   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:51:55,143-Speed 5480.79 samples/sec   Loss 12.2460   LearningRate 0.2869   Epoch: 1   Global Step: 14710   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:52:02,659-Speed 5450.41 samples/sec   Loss 12.2294   LearningRate 0.2869   Epoch: 1   Global Step: 14720   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:52:10,197-Speed 5433.99 samples/sec   Loss 12.2499   LearningRate 0.2869   Epoch: 1   Global Step: 14730   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:52:17,649-Speed 5497.84 samples/sec   Loss 12.2377   LearningRate 0.2868   Epoch: 1   Global Step: 14740   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:52:25,107-Speed 5493.33 samples/sec   Loss 12.2317   LearningRate 0.2868   Epoch: 1   Global Step: 14750   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:52:32,554-Speed 5500.82 samples/sec   Loss 12.2076   LearningRate 0.2868   Epoch: 1   Global Step: 14760   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:52:40,050-Speed 5465.14 samples/sec   Loss 12.1738   LearningRate 0.2867   Epoch: 1   Global Step: 14770   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:52:47,469-Speed 5521.28 samples/sec   Loss 12.1298   LearningRate 0.2867   Epoch: 1   Global Step: 14780   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:52:54,919-Speed 5499.30 samples/sec   Loss 12.2354   LearningRate 0.2867   Epoch: 1   Global Step: 14790   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:53:02,468-Speed 5426.25 samples/sec   Loss 12.2319   LearningRate 0.2867   Epoch: 1   Global Step: 14800   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:53:09,955-Speed 5471.43 samples/sec   Loss 12.2164   LearningRate 0.2866   Epoch: 1   Global Step: 14810   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:53:17,367-Speed 5527.41 samples/sec   Loss 12.2844   LearningRate 0.2866   Epoch: 1   Global Step: 14820   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:53:24,805-Speed 5507.47 samples/sec   Loss 12.2772   LearningRate 0.2866   Epoch: 1   Global Step: 14830   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:53:32,238-Speed 5511.54 samples/sec   Loss 12.1402   LearningRate 0.2865   Epoch: 1   Global Step: 14840   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:53:39,721-Speed 5474.01 samples/sec   Loss 12.1795   LearningRate 0.2865   Epoch: 1   Global Step: 14850   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:53:47,270-Speed 5427.15 samples/sec   Loss 12.1402   LearningRate 0.2865   Epoch: 1   Global Step: 14860   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:53:54,890-Speed 5375.67 samples/sec   Loss 12.2010   LearningRate 0.2864   Epoch: 1   Global Step: 14870   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:54:02,400-Speed 5455.14 samples/sec   Loss 12.2071   LearningRate 0.2864   Epoch: 1   Global Step: 14880   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:54:09,898-Speed 5463.66 samples/sec   Loss 12.0670   LearningRate 0.2864   Epoch: 1   Global Step: 14890   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:54:17,423-Speed 5443.33 samples/sec   Loss 12.1757   LearningRate 0.2864   Epoch: 1   Global Step: 14900   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:54:25,047-Speed 5374.06 samples/sec   Loss 12.1603   LearningRate 0.2863   Epoch: 1   Global Step: 14910   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:54:32,526-Speed 5477.01 samples/sec   Loss 12.1607   LearningRate 0.2863   Epoch: 1   Global Step: 14920   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:54:40,011-Speed 5473.32 samples/sec   Loss 12.1211   LearningRate 0.2863   Epoch: 1   Global Step: 14930   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:54:47,511-Speed 5461.88 samples/sec   Loss 12.0481   LearningRate 0.2862   Epoch: 1   Global Step: 14940   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:54:55,127-Speed 5379.22 samples/sec   Loss 12.2598   LearningRate 0.2862   Epoch: 1   Global Step: 14950   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:55:02,649-Speed 5445.51 samples/sec   Loss 12.1879   LearningRate 0.2862   Epoch: 1   Global Step: 14960   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:55:10,200-Speed 5425.65 samples/sec   Loss 12.1482   LearningRate 0.2862   Epoch: 1   Global Step: 14970   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:55:17,772-Speed 5410.16 samples/sec   Loss 12.1169   LearningRate 0.2861   Epoch: 1   Global Step: 14980   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:55:25,308-Speed 5435.82 samples/sec   Loss 12.1402   LearningRate 0.2861   Epoch: 1   Global Step: 14990   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:55:32,922-Speed 5380.65 samples/sec   Loss 12.2179   LearningRate 0.2861   Epoch: 1   Global Step: 15000   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:55:40,484-Speed 5416.40 samples/sec   Loss 12.1599   LearningRate 0.2860   Epoch: 1   Global Step: 15010   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:55:47,975-Speed 5469.22 samples/sec   Loss 12.1988   LearningRate 0.2860   Epoch: 1   Global Step: 15020   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:55:55,486-Speed 5454.27 samples/sec   Loss 12.1944   LearningRate 0.2860   Epoch: 1   Global Step: 15030   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:56:03,143-Speed 5349.71 samples/sec   Loss 12.1404   LearningRate 0.2859   Epoch: 1   Global Step: 15040   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:56:10,746-Speed 5387.99 samples/sec   Loss 12.0876   LearningRate 0.2859   Epoch: 1   Global Step: 15050   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:56:18,416-Speed 5340.59 samples/sec   Loss 12.1564   LearningRate 0.2859   Epoch: 1   Global Step: 15060   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:56:26,032-Speed 5379.52 samples/sec   Loss 12.1160   LearningRate 0.2859   Epoch: 1   Global Step: 15070   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:56:33,596-Speed 5415.50 samples/sec   Loss 12.0452   LearningRate 0.2858   Epoch: 1   Global Step: 15080   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:56:41,047-Speed 5497.88 samples/sec   Loss 12.2411   LearningRate 0.2858   Epoch: 1   Global Step: 15090   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:56:48,567-Speed 5447.68 samples/sec   Loss 12.1706   LearningRate 0.2858   Epoch: 1   Global Step: 15100   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 21:56:56,134-Speed 5413.70 samples/sec   Loss 12.2155   LearningRate 0.2857   Epoch: 1   Global Step: 15110   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:57:03,690-Speed 5422.02 samples/sec   Loss 12.1229   LearningRate 0.2857   Epoch: 1   Global Step: 15120   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:57:11,347-Speed 5350.17 samples/sec   Loss 12.2142   LearningRate 0.2857   Epoch: 1   Global Step: 15130   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:57:18,880-Speed 5438.37 samples/sec   Loss 12.1915   LearningRate 0.2856   Epoch: 1   Global Step: 15140   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:57:26,479-Speed 5390.44 samples/sec   Loss 12.1047   LearningRate 0.2856   Epoch: 1   Global Step: 15150   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:57:33,988-Speed 5455.58 samples/sec   Loss 12.1248   LearningRate 0.2856   Epoch: 1   Global Step: 15160   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:57:41,499-Speed 5453.98 samples/sec   Loss 12.1798   LearningRate 0.2856   Epoch: 1   Global Step: 15170   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:57:49,030-Speed 5439.93 samples/sec   Loss 12.1065   LearningRate 0.2855   Epoch: 1   Global Step: 15180   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:57:56,615-Speed 5400.95 samples/sec   Loss 12.0946   LearningRate 0.2855   Epoch: 1   Global Step: 15190   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:58:04,153-Speed 5434.01 samples/sec   Loss 12.0681   LearningRate 0.2855   Epoch: 1   Global Step: 15200   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:58:11,722-Speed 5412.29 samples/sec   Loss 12.0872   LearningRate 0.2854   Epoch: 1   Global Step: 15210   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 21:58:19,373-Speed 5354.74 samples/sec   Loss 12.1810   LearningRate 0.2854   Epoch: 1   Global Step: 15220   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:58:26,832-Speed 5491.43 samples/sec   Loss 12.1379   LearningRate 0.2854   Epoch: 1   Global Step: 15230   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:58:34,469-Speed 5364.53 samples/sec   Loss 12.1496   LearningRate 0.2853   Epoch: 1   Global Step: 15240   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:58:42,161-Speed 5325.15 samples/sec   Loss 12.0580   LearningRate 0.2853   Epoch: 1   Global Step: 15250   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:58:49,676-Speed 5451.54 samples/sec   Loss 12.0333   LearningRate 0.2853   Epoch: 1   Global Step: 15260   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:58:57,379-Speed 5317.88 samples/sec   Loss 12.1592   LearningRate 0.2853   Epoch: 1   Global Step: 15270   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:59:04,911-Speed 5438.97 samples/sec   Loss 12.0997   LearningRate 0.2852   Epoch: 1   Global Step: 15280   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:59:12,416-Speed 5458.25 samples/sec   Loss 12.1326   LearningRate 0.2852   Epoch: 1   Global Step: 15290   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:59:20,103-Speed 5329.20 samples/sec   Loss 12.0860   LearningRate 0.2852   Epoch: 1   Global Step: 15300   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:59:27,517-Speed 5525.48 samples/sec   Loss 12.0671   LearningRate 0.2851   Epoch: 1   Global Step: 15310   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:59:35,069-Speed 5424.54 samples/sec   Loss 12.1547   LearningRate 0.2851   Epoch: 1   Global Step: 15320   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:59:42,546-Speed 5478.35 samples/sec   Loss 12.0429   LearningRate 0.2851   Epoch: 1   Global Step: 15330   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:59:50,046-Speed 5461.93 samples/sec   Loss 12.0837   LearningRate 0.2851   Epoch: 1   Global Step: 15340   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 21:59:57,525-Speed 5477.69 samples/sec   Loss 12.0140   LearningRate 0.2850   Epoch: 1   Global Step: 15350   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:00:05,171-Speed 5357.91 samples/sec   Loss 12.1251   LearningRate 0.2850   Epoch: 1   Global Step: 15360   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:00:12,812-Speed 5360.90 samples/sec   Loss 12.0295   LearningRate 0.2850   Epoch: 1   Global Step: 15370   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:00:20,340-Speed 5442.10 samples/sec   Loss 12.1015   LearningRate 0.2849   Epoch: 1   Global Step: 15380   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:00:27,873-Speed 5437.65 samples/sec   Loss 12.1639   LearningRate 0.2849   Epoch: 1   Global Step: 15390   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:00:35,488-Speed 5380.10 samples/sec   Loss 12.0662   LearningRate 0.2849   Epoch: 1   Global Step: 15400   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:00:43,102-Speed 5380.02 samples/sec   Loss 12.0423   LearningRate 0.2848   Epoch: 1   Global Step: 15410   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:00:50,591-Speed 5470.23 samples/sec   Loss 12.0610   LearningRate 0.2848   Epoch: 1   Global Step: 15420   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 22:00:58,101-Speed 5454.83 samples/sec   Loss 12.1212   LearningRate 0.2848   Epoch: 1   Global Step: 15430   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:01:05,648-Speed 5427.88 samples/sec   Loss 12.0466   LearningRate 0.2848   Epoch: 1   Global Step: 15440   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:01:13,201-Speed 5424.08 samples/sec   Loss 11.9914   LearningRate 0.2847   Epoch: 1   Global Step: 15450   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:01:20,754-Speed 5423.72 samples/sec   Loss 12.1407   LearningRate 0.2847   Epoch: 1   Global Step: 15460   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:01:28,382-Speed 5370.06 samples/sec   Loss 12.1449   LearningRate 0.2847   Epoch: 1   Global Step: 15470   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:01:35,887-Speed 5458.68 samples/sec   Loss 12.1123   LearningRate 0.2846   Epoch: 1   Global Step: 15480   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:01:43,332-Speed 5502.25 samples/sec   Loss 12.1090   LearningRate 0.2846   Epoch: 1   Global Step: 15490   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:01:50,856-Speed 5445.04 samples/sec   Loss 12.0826   LearningRate 0.2846   Epoch: 1   Global Step: 15500   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:01:58,537-Speed 5332.86 samples/sec   Loss 12.0643   LearningRate 0.2845   Epoch: 1   Global Step: 15510   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:02:06,236-Speed 5320.73 samples/sec   Loss 11.9840   LearningRate 0.2845   Epoch: 1   Global Step: 15520   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:02:13,836-Speed 5390.70 samples/sec   Loss 12.0048   LearningRate 0.2845   Epoch: 1   Global Step: 15530   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:02:21,538-Speed 5318.34 samples/sec   Loss 12.0624   LearningRate 0.2845   Epoch: 1   Global Step: 15540   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 22:02:29,458-Speed 5172.29 samples/sec   Loss 12.0722   LearningRate 0.2844   Epoch: 1   Global Step: 15550   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 22:02:37,066-Speed 5384.95 samples/sec   Loss 11.9984   LearningRate 0.2844   Epoch: 1   Global Step: 15560   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 22:02:44,641-Speed 5407.82 samples/sec   Loss 12.1318   LearningRate 0.2844   Epoch: 1   Global Step: 15570   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 22:02:52,132-Speed 5468.64 samples/sec   Loss 12.0888   LearningRate 0.2843   Epoch: 1   Global Step: 15580   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 22:02:59,669-Speed 5434.63 samples/sec   Loss 12.0377   LearningRate 0.2843   Epoch: 1   Global Step: 15590   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 22:03:07,191-Speed 5446.15 samples/sec   Loss 12.0455   LearningRate 0.2843   Epoch: 1   Global Step: 15600   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 22:03:14,766-Speed 5408.31 samples/sec   Loss 12.0350   LearningRate 0.2843   Epoch: 1   Global Step: 15610   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 22:03:22,402-Speed 5364.63 samples/sec   Loss 12.1025   LearningRate 0.2842   Epoch: 1   Global Step: 15620   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 22:03:29,908-Speed 5457.56 samples/sec   Loss 12.0606   LearningRate 0.2842   Epoch: 1   Global Step: 15630   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 22:03:37,464-Speed 5421.47 samples/sec   Loss 12.0020   LearningRate 0.2842   Epoch: 1   Global Step: 15640   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:03:45,015-Speed 5425.40 samples/sec   Loss 12.0277   LearningRate 0.2841   Epoch: 1   Global Step: 15650   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:03:52,469-Speed 5496.06 samples/sec   Loss 12.0875   LearningRate 0.2841   Epoch: 1   Global Step: 15660   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:03:59,974-Speed 5458.14 samples/sec   Loss 12.0739   LearningRate 0.2841   Epoch: 1   Global Step: 15670   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:04:07,471-Speed 5464.08 samples/sec   Loss 11.9298   LearningRate 0.2840   Epoch: 1   Global Step: 15680   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:04:14,966-Speed 5467.51 samples/sec   Loss 12.1075   LearningRate 0.2840   Epoch: 1   Global Step: 15690   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:04:22,418-Speed 5496.85 samples/sec   Loss 12.0433   LearningRate 0.2840   Epoch: 1   Global Step: 15700   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:04:29,987-Speed 5412.73 samples/sec   Loss 11.9214   LearningRate 0.2840   Epoch: 1   Global Step: 15710   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:04:37,529-Speed 5431.12 samples/sec   Loss 11.9440   LearningRate 0.2839   Epoch: 1   Global Step: 15720   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:04:45,024-Speed 5466.17 samples/sec   Loss 12.0464   LearningRate 0.2839   Epoch: 1   Global Step: 15730   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:04:52,522-Speed 5463.38 samples/sec   Loss 12.0619   LearningRate 0.2839   Epoch: 1   Global Step: 15740   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:05:00,010-Speed 5470.88 samples/sec   Loss 12.0800   LearningRate 0.2838   Epoch: 1   Global Step: 15750   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:05:07,604-Speed 5394.80 samples/sec   Loss 12.0668   LearningRate 0.2838   Epoch: 1   Global Step: 15760   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:05:15,091-Speed 5471.56 samples/sec   Loss 12.0013   LearningRate 0.2838   Epoch: 1   Global Step: 15770   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:05:22,670-Speed 5405.36 samples/sec   Loss 11.9690   LearningRate 0.2837   Epoch: 1   Global Step: 15780   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:05:30,270-Speed 5389.76 samples/sec   Loss 11.9724   LearningRate 0.2837   Epoch: 1   Global Step: 15790   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:05:37,828-Speed 5420.15 samples/sec   Loss 11.9900   LearningRate 0.2837   Epoch: 1   Global Step: 15800   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:05:45,346-Speed 5449.41 samples/sec   Loss 11.9519   LearningRate 0.2837   Epoch: 1   Global Step: 15810   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:05:52,875-Speed 5440.79 samples/sec   Loss 12.0236   LearningRate 0.2836   Epoch: 1   Global Step: 15820   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:06:00,379-Speed 5459.36 samples/sec   Loss 11.9751   LearningRate 0.2836   Epoch: 1   Global Step: 15830   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:06:07,872-Speed 5467.33 samples/sec   Loss 12.0442   LearningRate 0.2836   Epoch: 1   Global Step: 15840   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:06:15,334-Speed 5489.21 samples/sec   Loss 12.0976   LearningRate 0.2835   Epoch: 1   Global Step: 15850   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:06:22,763-Speed 5514.94 samples/sec   Loss 11.9989   LearningRate 0.2835   Epoch: 1   Global Step: 15860   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:06:30,252-Speed 5469.88 samples/sec   Loss 11.9487   LearningRate 0.2835   Epoch: 1   Global Step: 15870   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:06:37,740-Speed 5470.65 samples/sec   Loss 12.0144   LearningRate 0.2835   Epoch: 1   Global Step: 15880   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:06:45,164-Speed 5518.48 samples/sec   Loss 12.0102   LearningRate 0.2834   Epoch: 1   Global Step: 15890   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:06:52,640-Speed 5479.70 samples/sec   Loss 11.9485   LearningRate 0.2834   Epoch: 1   Global Step: 15900   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:07:00,139-Speed 5463.12 samples/sec   Loss 11.9680   LearningRate 0.2834   Epoch: 1   Global Step: 15910   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:07:07,727-Speed 5398.07 samples/sec   Loss 12.0135   LearningRate 0.2833   Epoch: 1   Global Step: 15920   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:07:15,225-Speed 5463.60 samples/sec   Loss 11.9721   LearningRate 0.2833   Epoch: 1   Global Step: 15930   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:07:22,626-Speed 5535.29 samples/sec   Loss 11.9583   LearningRate 0.2833   Epoch: 1   Global Step: 15940   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:07:30,098-Speed 5482.97 samples/sec   Loss 12.0381   LearningRate 0.2832   Epoch: 1   Global Step: 15950   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:07:37,644-Speed 5428.16 samples/sec   Loss 11.9562   LearningRate 0.2832   Epoch: 1   Global Step: 15960   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:07:45,186-Speed 5432.29 samples/sec   Loss 11.9560   LearningRate 0.2832   Epoch: 1   Global Step: 15970   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:07:52,745-Speed 5419.50 samples/sec   Loss 12.0759   LearningRate 0.2832   Epoch: 1   Global Step: 15980   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:08:00,226-Speed 5475.87 samples/sec   Loss 11.9686   LearningRate 0.2831   Epoch: 1   Global Step: 15990   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:08:07,733-Speed 5456.95 samples/sec   Loss 12.0538   LearningRate 0.2831   Epoch: 1   Global Step: 16000   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:08:52,706-[lfw][16000]XNorm: 21.761526
Training: 2022-01-07 22:08:52,707-[lfw][16000]Accuracy-Flip: 0.99650+-0.00320
Training: 2022-01-07 22:08:52,707-[lfw][16000]Accuracy-Highest: 0.99650
Training: 2022-01-07 22:09:45,165-[cfp_fp][16000]XNorm: 19.362699
Training: 2022-01-07 22:09:45,168-[cfp_fp][16000]Accuracy-Flip: 0.97871+-0.00891
Training: 2022-01-07 22:09:45,168-[cfp_fp][16000]Accuracy-Highest: 0.97871
Training: 2022-01-07 22:10:30,781-[agedb_30][16000]XNorm: 21.484574
Training: 2022-01-07 22:10:30,782-[agedb_30][16000]Accuracy-Flip: 0.96367+-0.00942
Training: 2022-01-07 22:10:30,783-[agedb_30][16000]Accuracy-Highest: 0.96367
Training: 2022-01-07 22:10:38,427-Speed 271.81 samples/sec   Loss 11.9727   LearningRate 0.2831   Epoch: 1   Global Step: 16010   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:10:45,922-Speed 5466.15 samples/sec   Loss 12.0170   LearningRate 0.2830   Epoch: 1   Global Step: 16020   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:10:53,428-Speed 5457.66 samples/sec   Loss 11.8415   LearningRate 0.2830   Epoch: 1   Global Step: 16030   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:11:00,996-Speed 5413.80 samples/sec   Loss 11.9885   LearningRate 0.2830   Epoch: 1   Global Step: 16040   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:11:08,496-Speed 5462.99 samples/sec   Loss 11.9564   LearningRate 0.2829   Epoch: 1   Global Step: 16050   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:11:15,997-Speed 5461.22 samples/sec   Loss 11.9800   LearningRate 0.2829   Epoch: 1   Global Step: 16060   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:11:23,461-Speed 5488.89 samples/sec   Loss 11.9398   LearningRate 0.2829   Epoch: 1   Global Step: 16070   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:11:31,383-Speed 5170.81 samples/sec   Loss 11.9938   LearningRate 0.2829   Epoch: 1   Global Step: 16080   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:11:38,932-Speed 5426.14 samples/sec   Loss 11.9241   LearningRate 0.2828   Epoch: 1   Global Step: 16090   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:11:46,409-Speed 5479.00 samples/sec   Loss 11.9227   LearningRate 0.2828   Epoch: 1   Global Step: 16100   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 22:11:53,912-Speed 5460.03 samples/sec   Loss 11.9694   LearningRate 0.2828   Epoch: 1   Global Step: 16110   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 22:12:01,408-Speed 5464.80 samples/sec   Loss 11.8951   LearningRate 0.2827   Epoch: 1   Global Step: 16120   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 22:12:08,924-Speed 5449.95 samples/sec   Loss 12.1204   LearningRate 0.2827   Epoch: 1   Global Step: 16130   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 22:12:16,417-Speed 5467.63 samples/sec   Loss 11.9497   LearningRate 0.2827   Epoch: 1   Global Step: 16140   Fp16 Grad Scale: 262144   Required: 44 hours
Training: 2022-01-07 22:12:23,863-Speed 5501.47 samples/sec   Loss 12.0267   LearningRate 0.2827   Epoch: 1   Global Step: 16150   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:12:31,368-Speed 5458.60 samples/sec   Loss 11.9111   LearningRate 0.2826   Epoch: 1   Global Step: 16160   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:12:38,962-Speed 5394.27 samples/sec   Loss 11.9392   LearningRate 0.2826   Epoch: 1   Global Step: 16170   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:12:46,565-Speed 5388.55 samples/sec   Loss 11.9020   LearningRate 0.2826   Epoch: 1   Global Step: 16180   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:12:54,084-Speed 5448.39 samples/sec   Loss 11.9080   LearningRate 0.2825   Epoch: 1   Global Step: 16190   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:13:01,544-Speed 5490.87 samples/sec   Loss 11.9647   LearningRate 0.2825   Epoch: 1   Global Step: 16200   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:13:09,059-Speed 5451.24 samples/sec   Loss 11.9416   LearningRate 0.2825   Epoch: 1   Global Step: 16210   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:13:16,515-Speed 5494.22 samples/sec   Loss 12.0265   LearningRate 0.2824   Epoch: 1   Global Step: 16220   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:13:23,964-Speed 5499.84 samples/sec   Loss 11.8712   LearningRate 0.2824   Epoch: 1   Global Step: 16230   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:13:31,424-Speed 5490.64 samples/sec   Loss 11.8683   LearningRate 0.2824   Epoch: 1   Global Step: 16240   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:13:38,914-Speed 5469.56 samples/sec   Loss 11.9481   LearningRate 0.2824   Epoch: 1   Global Step: 16250   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:13:46,362-Speed 5500.06 samples/sec   Loss 11.8664   LearningRate 0.2823   Epoch: 1   Global Step: 16260   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:13:53,973-Speed 5382.15 samples/sec   Loss 11.8610   LearningRate 0.2823   Epoch: 1   Global Step: 16270   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:14:01,473-Speed 5462.43 samples/sec   Loss 11.9150   LearningRate 0.2823   Epoch: 1   Global Step: 16280   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:14:08,899-Speed 5516.48 samples/sec   Loss 11.8809   LearningRate 0.2822   Epoch: 1   Global Step: 16290   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:14:16,469-Speed 5411.49 samples/sec   Loss 11.8689   LearningRate 0.2822   Epoch: 1   Global Step: 16300   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:14:23,945-Speed 5478.95 samples/sec   Loss 11.8531   LearningRate 0.2822   Epoch: 1   Global Step: 16310   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:14:31,540-Speed 5394.16 samples/sec   Loss 12.0001   LearningRate 0.2821   Epoch: 1   Global Step: 16320   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:14:39,032-Speed 5467.45 samples/sec   Loss 11.9334   LearningRate 0.2821   Epoch: 1   Global Step: 16330   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:14:46,545-Speed 5453.09 samples/sec   Loss 11.9738   LearningRate 0.2821   Epoch: 1   Global Step: 16340   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:14:54,031-Speed 5472.08 samples/sec   Loss 11.8681   LearningRate 0.2821   Epoch: 1   Global Step: 16350   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:15:01,643-Speed 5381.72 samples/sec   Loss 11.8746   LearningRate 0.2820   Epoch: 1   Global Step: 16360   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:15:09,141-Speed 5463.98 samples/sec   Loss 11.9397   LearningRate 0.2820   Epoch: 1   Global Step: 16370   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:15:16,645-Speed 5458.89 samples/sec   Loss 11.9901   LearningRate 0.2820   Epoch: 1   Global Step: 16380   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:15:24,219-Speed 5408.54 samples/sec   Loss 11.8824   LearningRate 0.2819   Epoch: 1   Global Step: 16390   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:15:31,897-Speed 5335.76 samples/sec   Loss 11.8579   LearningRate 0.2819   Epoch: 1   Global Step: 16400   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:15:39,392-Speed 5465.62 samples/sec   Loss 11.9369   LearningRate 0.2819   Epoch: 1   Global Step: 16410   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:15:47,090-Speed 5321.62 samples/sec   Loss 11.8939   LearningRate 0.2819   Epoch: 1   Global Step: 16420   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:15:54,566-Speed 5479.91 samples/sec   Loss 11.9192   LearningRate 0.2818   Epoch: 1   Global Step: 16430   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:16:02,033-Speed 5485.91 samples/sec   Loss 11.9696   LearningRate 0.2818   Epoch: 1   Global Step: 16440   Fp16 Grad Scale: 131072   Required: 44 hours
Training: 2022-01-07 22:16:09,582-Speed 5426.34 samples/sec   Loss 11.8898   LearningRate 0.2818   Epoch: 1   Global Step: 16450   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:16:17,039-Speed 5493.71 samples/sec   Loss 11.9455   LearningRate 0.2817   Epoch: 1   Global Step: 16460   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:16:24,537-Speed 5463.82 samples/sec   Loss 11.9215   LearningRate 0.2817   Epoch: 1   Global Step: 16470   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:16:32,018-Speed 5476.14 samples/sec   Loss 11.9503   LearningRate 0.2817   Epoch: 1   Global Step: 16480   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:16:39,643-Speed 5372.24 samples/sec   Loss 11.7922   LearningRate 0.2816   Epoch: 1   Global Step: 16490   Fp16 Grad Scale: 65536   Required: 44 hours
Training: 2022-01-07 22:16:47,134-Speed 5468.79 samples/sec   Loss 11.9215   LearningRate 0.2816   Epoch: 1   Global Step: 16500   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:16:54,674-Speed 5432.86 samples/sec   Loss 11.8768   LearningRate 0.2816   Epoch: 1   Global Step: 16510   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:17:02,187-Speed 5453.17 samples/sec   Loss 11.9417   LearningRate 0.2816   Epoch: 1   Global Step: 16520   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:17:09,766-Speed 5405.13 samples/sec   Loss 11.9013   LearningRate 0.2815   Epoch: 1   Global Step: 16530   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:17:17,358-Speed 5396.26 samples/sec   Loss 11.8647   LearningRate 0.2815   Epoch: 1   Global Step: 16540   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:17:24,850-Speed 5467.70 samples/sec   Loss 11.8029   LearningRate 0.2815   Epoch: 1   Global Step: 16550   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:17:32,327-Speed 5478.98 samples/sec   Loss 11.9163   LearningRate 0.2814   Epoch: 1   Global Step: 16560   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:17:39,861-Speed 5437.48 samples/sec   Loss 11.9220   LearningRate 0.2814   Epoch: 1   Global Step: 16570   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:17:47,310-Speed 5499.24 samples/sec   Loss 11.8637   LearningRate 0.2814   Epoch: 1   Global Step: 16580   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:17:54,829-Speed 5448.60 samples/sec   Loss 11.8443   LearningRate 0.2814   Epoch: 1   Global Step: 16590   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:18:02,321-Speed 5467.60 samples/sec   Loss 11.9191   LearningRate 0.2813   Epoch: 1   Global Step: 16600   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:18:09,916-Speed 5393.68 samples/sec   Loss 11.9200   LearningRate 0.2813   Epoch: 1   Global Step: 16610   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:18:17,428-Speed 5453.99 samples/sec   Loss 11.9130   LearningRate 0.2813   Epoch: 1   Global Step: 16620   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:18:24,921-Speed 5467.03 samples/sec   Loss 11.8297   LearningRate 0.2812   Epoch: 1   Global Step: 16630   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:18:32,409-Speed 5470.63 samples/sec   Loss 11.8171   LearningRate 0.2812   Epoch: 1   Global Step: 16640   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:18:39,971-Speed 5417.28 samples/sec   Loss 11.8007   LearningRate 0.2812   Epoch: 1   Global Step: 16650   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:18:47,457-Speed 5472.34 samples/sec   Loss 11.8565   LearningRate 0.2811   Epoch: 1   Global Step: 16660   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:18:54,915-Speed 5493.02 samples/sec   Loss 11.8022   LearningRate 0.2811   Epoch: 1   Global Step: 16670   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:19:02,451-Speed 5435.92 samples/sec   Loss 11.8642   LearningRate 0.2811   Epoch: 1   Global Step: 16680   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:19:09,956-Speed 5458.24 samples/sec   Loss 11.8389   LearningRate 0.2811   Epoch: 1   Global Step: 16690   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:19:17,452-Speed 5465.43 samples/sec   Loss 11.8730   LearningRate 0.2810   Epoch: 1   Global Step: 16700   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:19:24,891-Speed 5506.59 samples/sec   Loss 11.9409   LearningRate 0.2810   Epoch: 1   Global Step: 16710   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:19:32,352-Speed 5490.90 samples/sec   Loss 11.8849   LearningRate 0.2810   Epoch: 1   Global Step: 16720   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:19:39,803-Speed 5497.69 samples/sec   Loss 11.8794   LearningRate 0.2809   Epoch: 1   Global Step: 16730   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:19:47,392-Speed 5398.40 samples/sec   Loss 11.8603   LearningRate 0.2809   Epoch: 1   Global Step: 16740   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:19:54,960-Speed 5412.62 samples/sec   Loss 11.8432   LearningRate 0.2809   Epoch: 1   Global Step: 16750   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:20:02,484-Speed 5445.15 samples/sec   Loss 11.8204   LearningRate 0.2809   Epoch: 1   Global Step: 16760   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:20:09,971-Speed 5472.04 samples/sec   Loss 11.8541   LearningRate 0.2808   Epoch: 1   Global Step: 16770   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:20:17,424-Speed 5496.43 samples/sec   Loss 11.7822   LearningRate 0.2808   Epoch: 1   Global Step: 16780   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:20:24,983-Speed 5419.25 samples/sec   Loss 11.8950   LearningRate 0.2808   Epoch: 1   Global Step: 16790   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:20:32,563-Speed 5405.25 samples/sec   Loss 11.8269   LearningRate 0.2807   Epoch: 1   Global Step: 16800   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:20:40,119-Speed 5421.64 samples/sec   Loss 11.8345   LearningRate 0.2807   Epoch: 1   Global Step: 16810   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:20:47,659-Speed 5432.91 samples/sec   Loss 11.8710   LearningRate 0.2807   Epoch: 1   Global Step: 16820   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:20:55,086-Speed 5515.62 samples/sec   Loss 11.8998   LearningRate 0.2806   Epoch: 1   Global Step: 16830   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:21:02,670-Speed 5401.91 samples/sec   Loss 11.8243   LearningRate 0.2806   Epoch: 1   Global Step: 16840   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:21:10,160-Speed 5469.34 samples/sec   Loss 11.8116   LearningRate 0.2806   Epoch: 1   Global Step: 16850   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:21:17,569-Speed 5528.69 samples/sec   Loss 11.8418   LearningRate 0.2806   Epoch: 1   Global Step: 16860   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:21:25,036-Speed 5486.52 samples/sec   Loss 11.8848   LearningRate 0.2805   Epoch: 1   Global Step: 16870   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:21:32,592-Speed 5421.64 samples/sec   Loss 11.7758   LearningRate 0.2805   Epoch: 1   Global Step: 16880   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:21:40,143-Speed 5425.20 samples/sec   Loss 11.8796   LearningRate 0.2805   Epoch: 1   Global Step: 16890   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:21:47,645-Speed 5460.54 samples/sec   Loss 11.8587   LearningRate 0.2804   Epoch: 1   Global Step: 16900   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:21:55,075-Speed 5512.82 samples/sec   Loss 11.8005   LearningRate 0.2804   Epoch: 1   Global Step: 16910   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:22:02,771-Speed 5323.29 samples/sec   Loss 11.7888   LearningRate 0.2804   Epoch: 1   Global Step: 16920   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:22:10,483-Speed 5311.83 samples/sec   Loss 11.9554   LearningRate 0.2804   Epoch: 1   Global Step: 16930   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:22:17,930-Speed 5501.09 samples/sec   Loss 11.7596   LearningRate 0.2803   Epoch: 1   Global Step: 16940   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:22:25,405-Speed 5480.33 samples/sec   Loss 11.8170   LearningRate 0.2803   Epoch: 1   Global Step: 16950   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:22:32,888-Speed 5474.12 samples/sec   Loss 11.7901   LearningRate 0.2803   Epoch: 1   Global Step: 16960   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:22:40,380-Speed 5467.95 samples/sec   Loss 11.8136   LearningRate 0.2802   Epoch: 1   Global Step: 16970   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:22:47,915-Speed 5436.67 samples/sec   Loss 11.7885   LearningRate 0.2802   Epoch: 1   Global Step: 16980   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:22:55,416-Speed 5461.21 samples/sec   Loss 11.8070   LearningRate 0.2802   Epoch: 1   Global Step: 16990   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:23:02,890-Speed 5480.84 samples/sec   Loss 11.7453   LearningRate 0.2801   Epoch: 1   Global Step: 17000   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:23:10,350-Speed 5491.27 samples/sec   Loss 11.8360   LearningRate 0.2801   Epoch: 1   Global Step: 17010   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:23:17,929-Speed 5405.06 samples/sec   Loss 11.7457   LearningRate 0.2801   Epoch: 1   Global Step: 17020   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:23:25,468-Speed 5433.98 samples/sec   Loss 11.8287   LearningRate 0.2801   Epoch: 1   Global Step: 17030   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:23:33,068-Speed 5390.39 samples/sec   Loss 11.8810   LearningRate 0.2800   Epoch: 1   Global Step: 17040   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:23:40,675-Speed 5385.04 samples/sec   Loss 11.8252   LearningRate 0.2800   Epoch: 1   Global Step: 17050   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:23:48,292-Speed 5378.49 samples/sec   Loss 11.7770   LearningRate 0.2800   Epoch: 1   Global Step: 17060   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:23:55,775-Speed 5474.33 samples/sec   Loss 11.7805   LearningRate 0.2799   Epoch: 1   Global Step: 17070   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:24:03,387-Speed 5381.56 samples/sec   Loss 11.7377   LearningRate 0.2799   Epoch: 1   Global Step: 17080   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:24:10,846-Speed 5492.04 samples/sec   Loss 11.8516   LearningRate 0.2799   Epoch: 1   Global Step: 17090   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:24:18,317-Speed 5484.01 samples/sec   Loss 11.7803   LearningRate 0.2799   Epoch: 1   Global Step: 17100   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:24:25,789-Speed 5482.06 samples/sec   Loss 11.7730   LearningRate 0.2798   Epoch: 1   Global Step: 17110   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:24:33,318-Speed 5440.58 samples/sec   Loss 11.7852   LearningRate 0.2798   Epoch: 1   Global Step: 17120   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:24:40,821-Speed 5460.86 samples/sec   Loss 11.7631   LearningRate 0.2798   Epoch: 1   Global Step: 17130   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:24:48,546-Speed 5303.10 samples/sec   Loss 11.7879   LearningRate 0.2797   Epoch: 1   Global Step: 17140   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:24:56,270-Speed 5302.75 samples/sec   Loss 11.8560   LearningRate 0.2797   Epoch: 1   Global Step: 17150   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:25:03,979-Speed 5314.55 samples/sec   Loss 11.8337   LearningRate 0.2797   Epoch: 1   Global Step: 17160   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:25:11,669-Speed 5327.17 samples/sec   Loss 11.7590   LearningRate 0.2796   Epoch: 1   Global Step: 17170   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:25:19,348-Speed 5334.55 samples/sec   Loss 11.7980   LearningRate 0.2796   Epoch: 1   Global Step: 17180   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:25:26,928-Speed 5404.54 samples/sec   Loss 11.7557   LearningRate 0.2796   Epoch: 1   Global Step: 17190   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:25:34,429-Speed 5461.18 samples/sec   Loss 11.7251   LearningRate 0.2796   Epoch: 1   Global Step: 17200   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:25:41,885-Speed 5494.25 samples/sec   Loss 11.7396   LearningRate 0.2795   Epoch: 1   Global Step: 17210   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:25:49,348-Speed 5489.67 samples/sec   Loss 11.7647   LearningRate 0.2795   Epoch: 1   Global Step: 17220   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:25:56,785-Speed 5507.96 samples/sec   Loss 11.7440   LearningRate 0.2795   Epoch: 1   Global Step: 17230   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:26:04,288-Speed 5460.22 samples/sec   Loss 11.6905   LearningRate 0.2794   Epoch: 1   Global Step: 17240   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:26:11,768-Speed 5476.57 samples/sec   Loss 11.7251   LearningRate 0.2794   Epoch: 1   Global Step: 17250   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:26:19,293-Speed 5444.51 samples/sec   Loss 11.7953   LearningRate 0.2794   Epoch: 1   Global Step: 17260   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:26:26,808-Speed 5450.64 samples/sec   Loss 11.7203   LearningRate 0.2794   Epoch: 1   Global Step: 17270   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:26:34,325-Speed 5449.57 samples/sec   Loss 11.6925   LearningRate 0.2793   Epoch: 1   Global Step: 17280   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:26:41,780-Speed 5495.61 samples/sec   Loss 11.6674   LearningRate 0.2793   Epoch: 1   Global Step: 17290   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:26:49,319-Speed 5433.24 samples/sec   Loss 11.8251   LearningRate 0.2793   Epoch: 1   Global Step: 17300   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:26:56,976-Speed 5350.33 samples/sec   Loss 11.7053   LearningRate 0.2792   Epoch: 1   Global Step: 17310   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:27:04,465-Speed 5469.96 samples/sec   Loss 11.7708   LearningRate 0.2792   Epoch: 1   Global Step: 17320   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:27:11,936-Speed 5483.64 samples/sec   Loss 11.7606   LearningRate 0.2792   Epoch: 1   Global Step: 17330   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:27:19,379-Speed 5503.55 samples/sec   Loss 11.7131   LearningRate 0.2791   Epoch: 1   Global Step: 17340   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:27:26,991-Speed 5381.48 samples/sec   Loss 11.8009   LearningRate 0.2791   Epoch: 1   Global Step: 17350   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:27:34,487-Speed 5465.82 samples/sec   Loss 11.7101   LearningRate 0.2791   Epoch: 1   Global Step: 17360   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:27:42,156-Speed 5341.79 samples/sec   Loss 11.7035   LearningRate 0.2791   Epoch: 1   Global Step: 17370   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:27:49,907-Speed 5285.26 samples/sec   Loss 11.7666   LearningRate 0.2790   Epoch: 1   Global Step: 17380   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:27:57,627-Speed 5305.76 samples/sec   Loss 11.6701   LearningRate 0.2790   Epoch: 1   Global Step: 17390   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:28:05,231-Speed 5387.94 samples/sec   Loss 11.7415   LearningRate 0.2790   Epoch: 1   Global Step: 17400   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:28:12,745-Speed 5451.94 samples/sec   Loss 11.8421   LearningRate 0.2789   Epoch: 1   Global Step: 17410   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:28:20,366-Speed 5375.23 samples/sec   Loss 11.7053   LearningRate 0.2789   Epoch: 1   Global Step: 17420   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:28:27,897-Speed 5439.01 samples/sec   Loss 11.5990   LearningRate 0.2789   Epoch: 1   Global Step: 17430   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:28:35,322-Speed 5517.99 samples/sec   Loss 11.7936   LearningRate 0.2789   Epoch: 1   Global Step: 17440   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:28:42,799-Speed 5478.28 samples/sec   Loss 11.6749   LearningRate 0.2788   Epoch: 1   Global Step: 17450   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:28:50,281-Speed 5475.19 samples/sec   Loss 11.7122   LearningRate 0.2788   Epoch: 1   Global Step: 17460   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:28:57,963-Speed 5333.01 samples/sec   Loss 11.6976   LearningRate 0.2788   Epoch: 1   Global Step: 17470   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:29:05,451-Speed 5470.50 samples/sec   Loss 11.7080   LearningRate 0.2787   Epoch: 1   Global Step: 17480   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:29:12,878-Speed 5515.84 samples/sec   Loss 11.7594   LearningRate 0.2787   Epoch: 1   Global Step: 17490   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:29:20,297-Speed 5521.35 samples/sec   Loss 11.7628   LearningRate 0.2787   Epoch: 1   Global Step: 17500   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:29:27,802-Speed 5458.66 samples/sec   Loss 11.7693   LearningRate 0.2786   Epoch: 1   Global Step: 17510   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:29:35,251-Speed 5498.99 samples/sec   Loss 11.8484   LearningRate 0.2786   Epoch: 1   Global Step: 17520   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:29:42,788-Speed 5435.57 samples/sec   Loss 11.7212   LearningRate 0.2786   Epoch: 1   Global Step: 17530   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:29:50,279-Speed 5469.06 samples/sec   Loss 11.6231   LearningRate 0.2786   Epoch: 1   Global Step: 17540   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:29:57,708-Speed 5514.22 samples/sec   Loss 11.7298   LearningRate 0.2785   Epoch: 1   Global Step: 17550   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:30:05,183-Speed 5479.75 samples/sec   Loss 11.6099   LearningRate 0.2785   Epoch: 1   Global Step: 17560   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:30:12,601-Speed 5522.61 samples/sec   Loss 11.7400   LearningRate 0.2785   Epoch: 1   Global Step: 17570   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:30:20,133-Speed 5439.16 samples/sec   Loss 11.7175   LearningRate 0.2784   Epoch: 1   Global Step: 17580   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:30:27,593-Speed 5491.35 samples/sec   Loss 11.7141   LearningRate 0.2784   Epoch: 1   Global Step: 17590   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:30:35,037-Speed 5503.14 samples/sec   Loss 11.6389   LearningRate 0.2784   Epoch: 1   Global Step: 17600   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:30:42,586-Speed 5426.94 samples/sec   Loss 11.6348   LearningRate 0.2784   Epoch: 1   Global Step: 17610   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:30:50,042-Speed 5493.85 samples/sec   Loss 11.7030   LearningRate 0.2783   Epoch: 1   Global Step: 17620   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:30:57,542-Speed 5462.18 samples/sec   Loss 11.6822   LearningRate 0.2783   Epoch: 1   Global Step: 17630   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:31:05,039-Speed 5464.01 samples/sec   Loss 11.7790   LearningRate 0.2783   Epoch: 1   Global Step: 17640   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:31:12,524-Speed 5472.98 samples/sec   Loss 11.7627   LearningRate 0.2782   Epoch: 1   Global Step: 17650   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:31:19,988-Speed 5489.03 samples/sec   Loss 11.7763   LearningRate 0.2782   Epoch: 1   Global Step: 17660   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:31:27,473-Speed 5472.58 samples/sec   Loss 11.7053   LearningRate 0.2782   Epoch: 1   Global Step: 17670   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:31:34,953-Speed 5476.82 samples/sec   Loss 11.8049   LearningRate 0.2781   Epoch: 1   Global Step: 17680   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:31:42,423-Speed 5484.18 samples/sec   Loss 11.7423   LearningRate 0.2781   Epoch: 1   Global Step: 17690   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:31:49,883-Speed 5491.77 samples/sec   Loss 11.6406   LearningRate 0.2781   Epoch: 1   Global Step: 17700   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:31:57,499-Speed 5378.65 samples/sec   Loss 11.6358   LearningRate 0.2781   Epoch: 1   Global Step: 17710   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:32:05,028-Speed 5441.24 samples/sec   Loss 11.7090   LearningRate 0.2780   Epoch: 1   Global Step: 17720   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:32:12,483-Speed 5495.14 samples/sec   Loss 11.7663   LearningRate 0.2780   Epoch: 1   Global Step: 17730   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:32:20,026-Speed 5431.07 samples/sec   Loss 11.6907   LearningRate 0.2780   Epoch: 1   Global Step: 17740   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:32:27,574-Speed 5427.36 samples/sec   Loss 11.7093   LearningRate 0.2779   Epoch: 1   Global Step: 17750   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:32:35,103-Speed 5440.92 samples/sec   Loss 11.5441   LearningRate 0.2779   Epoch: 1   Global Step: 17760   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:32:42,578-Speed 5480.80 samples/sec   Loss 11.6094   LearningRate 0.2779   Epoch: 1   Global Step: 17770   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:32:50,143-Speed 5415.11 samples/sec   Loss 11.6666   LearningRate 0.2779   Epoch: 1   Global Step: 17780   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:32:57,603-Speed 5491.11 samples/sec   Loss 11.6631   LearningRate 0.2778   Epoch: 1   Global Step: 17790   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:33:05,067-Speed 5488.43 samples/sec   Loss 11.6854   LearningRate 0.2778   Epoch: 1   Global Step: 17800   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:33:12,605-Speed 5434.67 samples/sec   Loss 11.7289   LearningRate 0.2778   Epoch: 1   Global Step: 17810   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:33:20,151-Speed 5428.67 samples/sec   Loss 11.7011   LearningRate 0.2777   Epoch: 1   Global Step: 17820   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:33:27,854-Speed 5318.32 samples/sec   Loss 11.6350   LearningRate 0.2777   Epoch: 1   Global Step: 17830   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:33:35,418-Speed 5415.00 samples/sec   Loss 11.7098   LearningRate 0.2777   Epoch: 1   Global Step: 17840   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:33:43,038-Speed 5376.66 samples/sec   Loss 11.7254   LearningRate 0.2776   Epoch: 1   Global Step: 17850   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:33:50,708-Speed 5341.43 samples/sec   Loss 11.5757   LearningRate 0.2776   Epoch: 1   Global Step: 17860   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:33:58,285-Speed 5405.95 samples/sec   Loss 11.7531   LearningRate 0.2776   Epoch: 1   Global Step: 17870   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:34:05,814-Speed 5441.01 samples/sec   Loss 11.6541   LearningRate 0.2776   Epoch: 1   Global Step: 17880   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:34:13,404-Speed 5397.81 samples/sec   Loss 11.6639   LearningRate 0.2775   Epoch: 1   Global Step: 17890   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:34:21,009-Speed 5386.12 samples/sec   Loss 11.7013   LearningRate 0.2775   Epoch: 1   Global Step: 17900   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:34:28,460-Speed 5498.68 samples/sec   Loss 11.6154   LearningRate 0.2775   Epoch: 1   Global Step: 17910   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:34:35,896-Speed 5508.19 samples/sec   Loss 11.6060   LearningRate 0.2774   Epoch: 1   Global Step: 17920   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:34:43,437-Speed 5432.77 samples/sec   Loss 11.6715   LearningRate 0.2774   Epoch: 1   Global Step: 17930   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:34:50,935-Speed 5463.68 samples/sec   Loss 11.6054   LearningRate 0.2774   Epoch: 1   Global Step: 17940   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:34:58,413-Speed 5478.49 samples/sec   Loss 11.6115   LearningRate 0.2774   Epoch: 1   Global Step: 17950   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:35:05,897-Speed 5473.35 samples/sec   Loss 11.6630   LearningRate 0.2773   Epoch: 1   Global Step: 17960   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:35:13,363-Speed 5487.35 samples/sec   Loss 11.5352   LearningRate 0.2773   Epoch: 1   Global Step: 17970   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:35:20,923-Speed 5418.74 samples/sec   Loss 11.6419   LearningRate 0.2773   Epoch: 1   Global Step: 17980   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:35:28,342-Speed 5521.41 samples/sec   Loss 11.5856   LearningRate 0.2772   Epoch: 1   Global Step: 17990   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:35:35,799-Speed 5493.57 samples/sec   Loss 11.6032   LearningRate 0.2772   Epoch: 1   Global Step: 18000   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:36:20,900-[lfw][18000]XNorm: 22.731087
Training: 2022-01-07 22:36:20,901-[lfw][18000]Accuracy-Flip: 0.99700+-0.00277
Training: 2022-01-07 22:36:20,902-[lfw][18000]Accuracy-Highest: 0.99700
Training: 2022-01-07 22:37:13,876-[cfp_fp][18000]XNorm: 20.796297
Training: 2022-01-07 22:37:13,887-[cfp_fp][18000]Accuracy-Flip: 0.97357+-0.00668
Training: 2022-01-07 22:37:13,888-[cfp_fp][18000]Accuracy-Highest: 0.97871
Training: 2022-01-07 22:37:59,194-[agedb_30][18000]XNorm: 22.436626
Training: 2022-01-07 22:37:59,196-[agedb_30][18000]Accuracy-Flip: 0.95167+-0.01080
Training: 2022-01-07 22:37:59,196-[agedb_30][18000]Accuracy-Highest: 0.96367
Training: 2022-01-07 22:38:06,735-Speed 271.38 samples/sec   Loss 11.6687   LearningRate 0.2772   Epoch: 1   Global Step: 18010   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:38:14,169-Speed 5510.91 samples/sec   Loss 11.5959   LearningRate 0.2772   Epoch: 1   Global Step: 18020   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:38:21,790-Speed 5376.08 samples/sec   Loss 11.5058   LearningRate 0.2771   Epoch: 1   Global Step: 18030   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:38:29,250-Speed 5492.74 samples/sec   Loss 11.6279   LearningRate 0.2771   Epoch: 1   Global Step: 18040   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:38:36,745-Speed 5465.58 samples/sec   Loss 11.6413   LearningRate 0.2771   Epoch: 1   Global Step: 18050   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:38:44,305-Speed 5418.91 samples/sec   Loss 11.7038   LearningRate 0.2770   Epoch: 1   Global Step: 18060   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:38:51,790-Speed 5473.45 samples/sec   Loss 11.6345   LearningRate 0.2770   Epoch: 1   Global Step: 18070   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:38:59,294-Speed 5458.79 samples/sec   Loss 11.7005   LearningRate 0.2770   Epoch: 1   Global Step: 18080   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:39:06,785-Speed 5468.74 samples/sec   Loss 11.6670   LearningRate 0.2769   Epoch: 1   Global Step: 18090   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:39:14,323-Speed 5434.67 samples/sec   Loss 11.5878   LearningRate 0.2769   Epoch: 1   Global Step: 18100   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:39:21,773-Speed 5499.02 samples/sec   Loss 11.5690   LearningRate 0.2769   Epoch: 1   Global Step: 18110   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:39:29,250-Speed 5478.21 samples/sec   Loss 11.6409   LearningRate 0.2769   Epoch: 1   Global Step: 18120   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:39:36,680-Speed 5513.90 samples/sec   Loss 11.6096   LearningRate 0.2768   Epoch: 1   Global Step: 18130   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:39:44,173-Speed 5467.34 samples/sec   Loss 11.6669   LearningRate 0.2768   Epoch: 1   Global Step: 18140   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:39:51,628-Speed 5495.18 samples/sec   Loss 11.5295   LearningRate 0.2768   Epoch: 1   Global Step: 18150   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:39:59,160-Speed 5438.33 samples/sec   Loss 11.6197   LearningRate 0.2767   Epoch: 1   Global Step: 18160   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:40:06,700-Speed 5433.26 samples/sec   Loss 11.5632   LearningRate 0.2767   Epoch: 1   Global Step: 18170   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:40:14,378-Speed 5335.66 samples/sec   Loss 11.6391   LearningRate 0.2767   Epoch: 1   Global Step: 18180   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:40:21,818-Speed 5506.42 samples/sec   Loss 11.6029   LearningRate 0.2767   Epoch: 1   Global Step: 18190   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:40:29,413-Speed 5393.51 samples/sec   Loss 11.6464   LearningRate 0.2766   Epoch: 1   Global Step: 18200   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:40:36,887-Speed 5480.76 samples/sec   Loss 11.5001   LearningRate 0.2766   Epoch: 1   Global Step: 18210   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:40:44,338-Speed 5498.07 samples/sec   Loss 11.5276   LearningRate 0.2766   Epoch: 1   Global Step: 18220   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:40:51,940-Speed 5389.39 samples/sec   Loss 11.6805   LearningRate 0.2765   Epoch: 1   Global Step: 18230   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:40:59,399-Speed 5491.49 samples/sec   Loss 11.5713   LearningRate 0.2765   Epoch: 1   Global Step: 18240   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:41:06,861-Speed 5490.19 samples/sec   Loss 11.5620   LearningRate 0.2765   Epoch: 1   Global Step: 18250   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:41:14,326-Speed 5487.57 samples/sec   Loss 11.5733   LearningRate 0.2764   Epoch: 1   Global Step: 18260   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:41:21,769-Speed 5504.23 samples/sec   Loss 11.7044   LearningRate 0.2764   Epoch: 1   Global Step: 18270   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:41:29,207-Speed 5507.57 samples/sec   Loss 11.6645   LearningRate 0.2764   Epoch: 1   Global Step: 18280   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:41:36,756-Speed 5426.60 samples/sec   Loss 11.5610   LearningRate 0.2764   Epoch: 1   Global Step: 18290   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:41:44,225-Speed 5484.89 samples/sec   Loss 11.5453   LearningRate 0.2763   Epoch: 1   Global Step: 18300   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:41:51,854-Speed 5369.95 samples/sec   Loss 11.6351   LearningRate 0.2763   Epoch: 1   Global Step: 18310   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:41:59,399-Speed 5429.33 samples/sec   Loss 11.4960   LearningRate 0.2763   Epoch: 1   Global Step: 18320   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:42:06,941-Speed 5431.74 samples/sec   Loss 11.5510   LearningRate 0.2762   Epoch: 1   Global Step: 18330   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:42:14,439-Speed 5462.97 samples/sec   Loss 11.6441   LearningRate 0.2762   Epoch: 1   Global Step: 18340   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:42:21,910-Speed 5484.08 samples/sec   Loss 11.4726   LearningRate 0.2762   Epoch: 1   Global Step: 18350   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:42:29,397-Speed 5471.19 samples/sec   Loss 11.5416   LearningRate 0.2762   Epoch: 1   Global Step: 18360   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:42:36,893-Speed 5464.53 samples/sec   Loss 11.6207   LearningRate 0.2761   Epoch: 1   Global Step: 18370   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:42:44,404-Speed 5453.99 samples/sec   Loss 11.6269   LearningRate 0.2761   Epoch: 1   Global Step: 18380   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:42:51,928-Speed 5444.84 samples/sec   Loss 11.6673   LearningRate 0.2761   Epoch: 1   Global Step: 18390   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:42:59,326-Speed 5537.94 samples/sec   Loss 11.6287   LearningRate 0.2760   Epoch: 1   Global Step: 18400   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:43:06,857-Speed 5439.41 samples/sec   Loss 11.5275   LearningRate 0.2760   Epoch: 1   Global Step: 18410   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:43:14,297-Speed 5506.22 samples/sec   Loss 11.4513   LearningRate 0.2760   Epoch: 1   Global Step: 18420   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:43:21,791-Speed 5466.16 samples/sec   Loss 11.5601   LearningRate 0.2760   Epoch: 1   Global Step: 18430   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:43:29,276-Speed 5473.51 samples/sec   Loss 11.5288   LearningRate 0.2759   Epoch: 1   Global Step: 18440   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:43:36,771-Speed 5465.24 samples/sec   Loss 11.6645   LearningRate 0.2759   Epoch: 1   Global Step: 18450   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:43:44,366-Speed 5393.54 samples/sec   Loss 11.5353   LearningRate 0.2759   Epoch: 1   Global Step: 18460   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:43:51,970-Speed 5387.71 samples/sec   Loss 11.5234   LearningRate 0.2758   Epoch: 1   Global Step: 18470   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:43:59,523-Speed 5424.04 samples/sec   Loss 11.6624   LearningRate 0.2758   Epoch: 1   Global Step: 18480   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:44:07,072-Speed 5425.93 samples/sec   Loss 11.5955   LearningRate 0.2758   Epoch: 1   Global Step: 18490   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:44:14,579-Speed 5457.10 samples/sec   Loss 11.5886   LearningRate 0.2757   Epoch: 1   Global Step: 18500   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:44:22,090-Speed 5454.91 samples/sec   Loss 11.5309   LearningRate 0.2757   Epoch: 1   Global Step: 18510   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:44:29,589-Speed 5462.81 samples/sec   Loss 11.5553   LearningRate 0.2757   Epoch: 1   Global Step: 18520   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:44:37,076-Speed 5470.83 samples/sec   Loss 11.5161   LearningRate 0.2757   Epoch: 1   Global Step: 18530   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:44:44,642-Speed 5415.32 samples/sec   Loss 11.6038   LearningRate 0.2756   Epoch: 1   Global Step: 18540   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:44:52,059-Speed 5522.88 samples/sec   Loss 11.5728   LearningRate 0.2756   Epoch: 1   Global Step: 18550   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:44:59,564-Speed 5459.26 samples/sec   Loss 11.5534   LearningRate 0.2756   Epoch: 1   Global Step: 18560   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:45:07,147-Speed 5401.97 samples/sec   Loss 11.5615   LearningRate 0.2755   Epoch: 1   Global Step: 18570   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:45:14,801-Speed 5352.44 samples/sec   Loss 11.5857   LearningRate 0.2755   Epoch: 1   Global Step: 18580   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:45:22,419-Speed 5377.36 samples/sec   Loss 11.5504   LearningRate 0.2755   Epoch: 1   Global Step: 18590   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:45:29,939-Speed 5447.23 samples/sec   Loss 11.5385   LearningRate 0.2755   Epoch: 1   Global Step: 18600   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:45:37,379-Speed 5506.40 samples/sec   Loss 11.5004   LearningRate 0.2754   Epoch: 1   Global Step: 18610   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:45:44,810-Speed 5513.04 samples/sec   Loss 11.5655   LearningRate 0.2754   Epoch: 1   Global Step: 18620   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:45:52,248-Speed 5507.22 samples/sec   Loss 11.4887   LearningRate 0.2754   Epoch: 1   Global Step: 18630   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:45:59,685-Speed 5508.15 samples/sec   Loss 11.5201   LearningRate 0.2753   Epoch: 1   Global Step: 18640   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:46:07,224-Speed 5434.73 samples/sec   Loss 11.4626   LearningRate 0.2753   Epoch: 1   Global Step: 18650   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:46:14,682-Speed 5492.50 samples/sec   Loss 11.5354   LearningRate 0.2753   Epoch: 1   Global Step: 18660   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:46:22,247-Speed 5414.94 samples/sec   Loss 11.5782   LearningRate 0.2753   Epoch: 1   Global Step: 18670   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:46:29,783-Speed 5436.23 samples/sec   Loss 11.4759   LearningRate 0.2752   Epoch: 1   Global Step: 18680   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:46:37,230-Speed 5501.08 samples/sec   Loss 11.5291   LearningRate 0.2752   Epoch: 1   Global Step: 18690   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:46:44,727-Speed 5464.10 samples/sec   Loss 11.5255   LearningRate 0.2752   Epoch: 1   Global Step: 18700   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:46:52,254-Speed 5442.54 samples/sec   Loss 11.4855   LearningRate 0.2751   Epoch: 1   Global Step: 18710   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:46:59,841-Speed 5399.42 samples/sec   Loss 11.4101   LearningRate 0.2751   Epoch: 1   Global Step: 18720   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:47:07,415-Speed 5409.15 samples/sec   Loss 11.5460   LearningRate 0.2751   Epoch: 1   Global Step: 18730   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:47:14,931-Speed 5449.65 samples/sec   Loss 11.4671   LearningRate 0.2750   Epoch: 1   Global Step: 18740   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:47:22,403-Speed 5483.16 samples/sec   Loss 11.5094   LearningRate 0.2750   Epoch: 1   Global Step: 18750   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:47:29,891-Speed 5470.83 samples/sec   Loss 11.5649   LearningRate 0.2750   Epoch: 1   Global Step: 18760   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:47:37,501-Speed 5382.87 samples/sec   Loss 11.5317   LearningRate 0.2750   Epoch: 1   Global Step: 18770   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:47:45,053-Speed 5424.22 samples/sec   Loss 11.5455   LearningRate 0.2749   Epoch: 1   Global Step: 18780   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:47:52,648-Speed 5393.98 samples/sec   Loss 11.4671   LearningRate 0.2749   Epoch: 1   Global Step: 18790   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:48:00,109-Speed 5490.50 samples/sec   Loss 11.4732   LearningRate 0.2749   Epoch: 1   Global Step: 18800   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:48:07,608-Speed 5463.46 samples/sec   Loss 11.6280   LearningRate 0.2748   Epoch: 1   Global Step: 18810   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:48:15,096-Speed 5470.50 samples/sec   Loss 11.4898   LearningRate 0.2748   Epoch: 1   Global Step: 18820   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:48:22,635-Speed 5433.45 samples/sec   Loss 11.4922   LearningRate 0.2748   Epoch: 1   Global Step: 18830   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:48:30,161-Speed 5442.87 samples/sec   Loss 11.6050   LearningRate 0.2748   Epoch: 1   Global Step: 18840   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:48:37,709-Speed 5427.62 samples/sec   Loss 11.4882   LearningRate 0.2747   Epoch: 1   Global Step: 18850   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:48:45,305-Speed 5392.47 samples/sec   Loss 11.4735   LearningRate 0.2747   Epoch: 1   Global Step: 18860   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:48:52,825-Speed 5447.95 samples/sec   Loss 11.4748   LearningRate 0.2747   Epoch: 1   Global Step: 18870   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:49:00,349-Speed 5444.63 samples/sec   Loss 11.5225   LearningRate 0.2746   Epoch: 1   Global Step: 18880   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:49:07,913-Speed 5415.78 samples/sec   Loss 11.5361   LearningRate 0.2746   Epoch: 1   Global Step: 18890   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:49:15,477-Speed 5415.98 samples/sec   Loss 11.4828   LearningRate 0.2746   Epoch: 1   Global Step: 18900   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:49:23,059-Speed 5402.13 samples/sec   Loss 11.4557   LearningRate 0.2746   Epoch: 1   Global Step: 18910   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:49:30,595-Speed 5436.34 samples/sec   Loss 11.5184   LearningRate 0.2745   Epoch: 1   Global Step: 18920   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:49:38,092-Speed 5464.56 samples/sec   Loss 11.3805   LearningRate 0.2745   Epoch: 1   Global Step: 18930   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:49:45,596-Speed 5459.15 samples/sec   Loss 11.4810   LearningRate 0.2745   Epoch: 1   Global Step: 18940   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:49:53,127-Speed 5439.04 samples/sec   Loss 11.5068   LearningRate 0.2744   Epoch: 1   Global Step: 18950   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:50:00,615-Speed 5471.42 samples/sec   Loss 11.4278   LearningRate 0.2744   Epoch: 1   Global Step: 18960   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:50:08,125-Speed 5454.69 samples/sec   Loss 11.5305   LearningRate 0.2744   Epoch: 1   Global Step: 18970   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:50:15,559-Speed 5510.76 samples/sec   Loss 11.4851   LearningRate 0.2743   Epoch: 1   Global Step: 18980   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:50:23,058-Speed 5462.29 samples/sec   Loss 11.4366   LearningRate 0.2743   Epoch: 1   Global Step: 18990   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:50:30,530-Speed 5482.92 samples/sec   Loss 11.4488   LearningRate 0.2743   Epoch: 1   Global Step: 19000   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:50:38,057-Speed 5441.94 samples/sec   Loss 11.4344   LearningRate 0.2743   Epoch: 1   Global Step: 19010   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:50:45,530-Speed 5482.32 samples/sec   Loss 11.4165   LearningRate 0.2742   Epoch: 1   Global Step: 19020   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:50:53,013-Speed 5474.33 samples/sec   Loss 11.5135   LearningRate 0.2742   Epoch: 1   Global Step: 19030   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:51:00,467-Speed 5496.16 samples/sec   Loss 11.4055   LearningRate 0.2742   Epoch: 1   Global Step: 19040   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:51:07,948-Speed 5475.59 samples/sec   Loss 11.4678   LearningRate 0.2741   Epoch: 1   Global Step: 19050   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:51:15,511-Speed 5416.56 samples/sec   Loss 11.4741   LearningRate 0.2741   Epoch: 1   Global Step: 19060   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:51:23,121-Speed 5383.64 samples/sec   Loss 11.4649   LearningRate 0.2741   Epoch: 1   Global Step: 19070   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:51:30,599-Speed 5477.51 samples/sec   Loss 11.4486   LearningRate 0.2741   Epoch: 1   Global Step: 19080   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:51:38,095-Speed 5465.01 samples/sec   Loss 11.4361   LearningRate 0.2740   Epoch: 1   Global Step: 19090   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:51:45,611-Speed 5450.79 samples/sec   Loss 11.4602   LearningRate 0.2740   Epoch: 1   Global Step: 19100   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:51:53,083-Speed 5482.69 samples/sec   Loss 11.4139   LearningRate 0.2740   Epoch: 1   Global Step: 19110   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:52:00,605-Speed 5446.40 samples/sec   Loss 11.4746   LearningRate 0.2739   Epoch: 1   Global Step: 19120   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:52:08,196-Speed 5396.48 samples/sec   Loss 11.5037   LearningRate 0.2739   Epoch: 1   Global Step: 19130   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:52:15,738-Speed 5431.40 samples/sec   Loss 11.4211   LearningRate 0.2739   Epoch: 1   Global Step: 19140   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:52:23,234-Speed 5465.82 samples/sec   Loss 11.4602   LearningRate 0.2739   Epoch: 1   Global Step: 19150   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:52:30,892-Speed 5348.86 samples/sec   Loss 11.4973   LearningRate 0.2738   Epoch: 1   Global Step: 19160   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:52:38,366-Speed 5480.96 samples/sec   Loss 11.5375   LearningRate 0.2738   Epoch: 1   Global Step: 19170   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:52:45,865-Speed 5462.53 samples/sec   Loss 11.4603   LearningRate 0.2738   Epoch: 1   Global Step: 19180   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:52:53,530-Speed 5344.77 samples/sec   Loss 11.4924   LearningRate 0.2737   Epoch: 1   Global Step: 19190   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:53:00,974-Speed 5503.12 samples/sec   Loss 11.3894   LearningRate 0.2737   Epoch: 1   Global Step: 19200   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:53:08,469-Speed 5465.44 samples/sec   Loss 11.4909   LearningRate 0.2737   Epoch: 1   Global Step: 19210   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:53:16,103-Speed 5366.02 samples/sec   Loss 11.4342   LearningRate 0.2736   Epoch: 1   Global Step: 19220   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:53:23,591-Speed 5470.95 samples/sec   Loss 11.4039   LearningRate 0.2736   Epoch: 1   Global Step: 19230   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:53:31,088-Speed 5464.73 samples/sec   Loss 11.4607   LearningRate 0.2736   Epoch: 1   Global Step: 19240   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:53:38,550-Speed 5489.75 samples/sec   Loss 11.4941   LearningRate 0.2736   Epoch: 1   Global Step: 19250   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:53:45,985-Speed 5508.92 samples/sec   Loss 11.3961   LearningRate 0.2735   Epoch: 1   Global Step: 19260   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:53:53,470-Speed 5473.28 samples/sec   Loss 11.4488   LearningRate 0.2735   Epoch: 1   Global Step: 19270   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:54:00,948-Speed 5478.50 samples/sec   Loss 11.3641   LearningRate 0.2735   Epoch: 1   Global Step: 19280   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:54:08,402-Speed 5495.87 samples/sec   Loss 11.4348   LearningRate 0.2734   Epoch: 1   Global Step: 19290   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:54:15,885-Speed 5474.12 samples/sec   Loss 11.3983   LearningRate 0.2734   Epoch: 1   Global Step: 19300   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:54:23,356-Speed 5483.53 samples/sec   Loss 11.4374   LearningRate 0.2734   Epoch: 1   Global Step: 19310   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:54:30,806-Speed 5498.90 samples/sec   Loss 11.3678   LearningRate 0.2734   Epoch: 1   Global Step: 19320   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:54:38,352-Speed 5428.89 samples/sec   Loss 11.4100   LearningRate 0.2733   Epoch: 1   Global Step: 19330   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:54:45,844-Speed 5467.74 samples/sec   Loss 11.4880   LearningRate 0.2733   Epoch: 1   Global Step: 19340   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:54:53,352-Speed 5455.93 samples/sec   Loss 11.4288   LearningRate 0.2733   Epoch: 1   Global Step: 19350   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:55:00,894-Speed 5431.74 samples/sec   Loss 11.3825   LearningRate 0.2732   Epoch: 1   Global Step: 19360   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:55:08,493-Speed 5391.49 samples/sec   Loss 11.3700   LearningRate 0.2732   Epoch: 1   Global Step: 19370   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:55:15,940-Speed 5500.67 samples/sec   Loss 11.4200   LearningRate 0.2732   Epoch: 1   Global Step: 19380   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:55:23,363-Speed 5518.34 samples/sec   Loss 11.4518   LearningRate 0.2732   Epoch: 1   Global Step: 19390   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 22:55:30,904-Speed 5432.44 samples/sec   Loss 11.4555   LearningRate 0.2731   Epoch: 1   Global Step: 19400   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 22:55:38,400-Speed 5465.40 samples/sec   Loss 11.3698   LearningRate 0.2731   Epoch: 1   Global Step: 19410   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 22:55:45,886-Speed 5471.81 samples/sec   Loss 11.4789   LearningRate 0.2731   Epoch: 1   Global Step: 19420   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 22:55:53,573-Speed 5329.30 samples/sec   Loss 11.3872   LearningRate 0.2730   Epoch: 1   Global Step: 19430   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 22:56:01,180-Speed 5385.26 samples/sec   Loss 11.4274   LearningRate 0.2730   Epoch: 1   Global Step: 19440   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 22:56:08,806-Speed 5372.53 samples/sec   Loss 11.3972   LearningRate 0.2730   Epoch: 1   Global Step: 19450   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 22:56:16,234-Speed 5514.58 samples/sec   Loss 11.3956   LearningRate 0.2730   Epoch: 1   Global Step: 19460   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 22:56:23,752-Speed 5448.81 samples/sec   Loss 11.4424   LearningRate 0.2729   Epoch: 1   Global Step: 19470   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 22:56:31,290-Speed 5434.50 samples/sec   Loss 11.3618   LearningRate 0.2729   Epoch: 1   Global Step: 19480   Fp16 Grad Scale: 32768   Required: 43 hours
Training: 2022-01-07 22:56:38,821-Speed 5439.46 samples/sec   Loss 11.4350   LearningRate 0.2729   Epoch: 1   Global Step: 19490   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:56:46,364-Speed 5431.33 samples/sec   Loss 11.4035   LearningRate 0.2728   Epoch: 1   Global Step: 19500   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:56:53,843-Speed 5477.02 samples/sec   Loss 11.3216   LearningRate 0.2728   Epoch: 1   Global Step: 19510   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:57:01,413-Speed 5411.72 samples/sec   Loss 11.3007   LearningRate 0.2728   Epoch: 1   Global Step: 19520   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:57:08,841-Speed 5514.67 samples/sec   Loss 11.3491   LearningRate 0.2727   Epoch: 1   Global Step: 19530   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:57:16,327-Speed 5472.72 samples/sec   Loss 11.3888   LearningRate 0.2727   Epoch: 1   Global Step: 19540   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:57:23,786-Speed 5491.77 samples/sec   Loss 11.3088   LearningRate 0.2727   Epoch: 1   Global Step: 19550   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:57:31,232-Speed 5501.27 samples/sec   Loss 11.3392   LearningRate 0.2727   Epoch: 1   Global Step: 19560   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:57:38,756-Speed 5444.75 samples/sec   Loss 11.4129   LearningRate 0.2726   Epoch: 1   Global Step: 19570   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:57:46,205-Speed 5500.01 samples/sec   Loss 11.4412   LearningRate 0.2726   Epoch: 1   Global Step: 19580   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 22:57:53,724-Speed 5448.04 samples/sec   Loss 11.3442   LearningRate 0.2726   Epoch: 1   Global Step: 19590   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:58:01,220-Speed 5464.89 samples/sec   Loss 11.3118   LearningRate 0.2725   Epoch: 1   Global Step: 19600   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:58:08,781-Speed 5418.50 samples/sec   Loss 11.4212   LearningRate 0.2725   Epoch: 1   Global Step: 19610   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:58:16,277-Speed 5464.89 samples/sec   Loss 11.2983   LearningRate 0.2725   Epoch: 1   Global Step: 19620   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:58:23,774-Speed 5464.34 samples/sec   Loss 11.4236   LearningRate 0.2725   Epoch: 1   Global Step: 19630   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:58:31,301-Speed 5442.04 samples/sec   Loss 11.3561   LearningRate 0.2724   Epoch: 1   Global Step: 19640   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:58:38,776-Speed 5480.62 samples/sec   Loss 11.3336   LearningRate 0.2724   Epoch: 1   Global Step: 19650   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:58:46,248-Speed 5482.43 samples/sec   Loss 11.4114   LearningRate 0.2724   Epoch: 1   Global Step: 19660   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:58:53,753-Speed 5458.93 samples/sec   Loss 11.3962   LearningRate 0.2723   Epoch: 1   Global Step: 19670   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:59:01,280-Speed 5442.38 samples/sec   Loss 11.3670   LearningRate 0.2723   Epoch: 1   Global Step: 19680   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:59:08,802-Speed 5445.58 samples/sec   Loss 11.4137   LearningRate 0.2723   Epoch: 1   Global Step: 19690   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:59:16,329-Speed 5442.56 samples/sec   Loss 11.3418   LearningRate 0.2723   Epoch: 1   Global Step: 19700   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:59:23,866-Speed 5435.65 samples/sec   Loss 11.3499   LearningRate 0.2722   Epoch: 1   Global Step: 19710   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 22:59:31,392-Speed 5442.87 samples/sec   Loss 11.3578   LearningRate 0.2722   Epoch: 1   Global Step: 19720   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:59:39,017-Speed 5372.39 samples/sec   Loss 11.3200   LearningRate 0.2722   Epoch: 1   Global Step: 19730   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:59:46,577-Speed 5418.98 samples/sec   Loss 11.4350   LearningRate 0.2721   Epoch: 1   Global Step: 19740   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 22:59:54,176-Speed 5390.77 samples/sec   Loss 11.3699   LearningRate 0.2721   Epoch: 1   Global Step: 19750   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:00:01,884-Speed 5314.41 samples/sec   Loss 11.4183   LearningRate 0.2721   Epoch: 1   Global Step: 19760   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:00:09,453-Speed 5412.05 samples/sec   Loss 11.3284   LearningRate 0.2721   Epoch: 1   Global Step: 19770   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:00:17,001-Speed 5428.04 samples/sec   Loss 11.3513   LearningRate 0.2720   Epoch: 1   Global Step: 19780   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:00:24,558-Speed 5420.75 samples/sec   Loss 11.3301   LearningRate 0.2720   Epoch: 1   Global Step: 19790   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:00:32,097-Speed 5433.06 samples/sec   Loss 11.3647   LearningRate 0.2720   Epoch: 1   Global Step: 19800   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:00:39,525-Speed 5515.23 samples/sec   Loss 11.4014   LearningRate 0.2719   Epoch: 1   Global Step: 19810   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:00:47,105-Speed 5404.72 samples/sec   Loss 11.3541   LearningRate 0.2719   Epoch: 1   Global Step: 19820   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:00:54,658-Speed 5423.47 samples/sec   Loss 11.3811   LearningRate 0.2719   Epoch: 1   Global Step: 19830   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:01:02,155-Speed 5463.90 samples/sec   Loss 11.4022   LearningRate 0.2718   Epoch: 1   Global Step: 19840   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:01:09,652-Speed 5464.92 samples/sec   Loss 11.2866   LearningRate 0.2718   Epoch: 1   Global Step: 19850   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:01:17,143-Speed 5468.87 samples/sec   Loss 11.3362   LearningRate 0.2718   Epoch: 1   Global Step: 19860   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:01:24,563-Speed 5520.79 samples/sec   Loss 11.3508   LearningRate 0.2718   Epoch: 1   Global Step: 19870   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:01:32,042-Speed 5476.63 samples/sec   Loss 11.2402   LearningRate 0.2717   Epoch: 1   Global Step: 19880   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:01:39,631-Speed 5398.12 samples/sec   Loss 11.2721   LearningRate 0.2717   Epoch: 1   Global Step: 19890   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:01:47,107-Speed 5479.88 samples/sec   Loss 11.3614   LearningRate 0.2717   Epoch: 1   Global Step: 19900   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:01:54,648-Speed 5432.52 samples/sec   Loss 11.3585   LearningRate 0.2716   Epoch: 1   Global Step: 19910   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:02:02,157-Speed 5454.96 samples/sec   Loss 11.3435   LearningRate 0.2716   Epoch: 1   Global Step: 19920   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:02:09,635-Speed 5479.43 samples/sec   Loss 11.4228   LearningRate 0.2716   Epoch: 1   Global Step: 19930   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:02:17,144-Speed 5455.13 samples/sec   Loss 11.3351   LearningRate 0.2716   Epoch: 1   Global Step: 19940   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:02:24,637-Speed 5467.37 samples/sec   Loss 11.3418   LearningRate 0.2715   Epoch: 1   Global Step: 19950   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:02:32,153-Speed 5449.73 samples/sec   Loss 11.3319   LearningRate 0.2715   Epoch: 1   Global Step: 19960   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:02:39,624-Speed 5484.09 samples/sec   Loss 11.3341   LearningRate 0.2715   Epoch: 1   Global Step: 19970   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:02:47,086-Speed 5490.03 samples/sec   Loss 11.3460   LearningRate 0.2714   Epoch: 1   Global Step: 19980   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:02:54,575-Speed 5469.83 samples/sec   Loss 11.3015   LearningRate 0.2714   Epoch: 1   Global Step: 19990   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:03:02,047-Speed 5481.58 samples/sec   Loss 11.3117   LearningRate 0.2714   Epoch: 1   Global Step: 20000   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:03:47,077-[lfw][20000]XNorm: 22.719998
Training: 2022-01-07 23:03:47,078-[lfw][20000]Accuracy-Flip: 0.99617+-0.00299
Training: 2022-01-07 23:03:47,078-[lfw][20000]Accuracy-Highest: 0.99700
Training: 2022-01-07 23:04:39,975-[cfp_fp][20000]XNorm: 20.118125
Training: 2022-01-07 23:04:39,976-[cfp_fp][20000]Accuracy-Flip: 0.97986+-0.00501
Training: 2022-01-07 23:04:39,977-[cfp_fp][20000]Accuracy-Highest: 0.97986
Training: 2022-01-07 23:05:25,332-[agedb_30][20000]XNorm: 22.496124
Training: 2022-01-07 23:05:25,333-[agedb_30][20000]Accuracy-Flip: 0.96583+-0.00761
Training: 2022-01-07 23:05:25,334-[agedb_30][20000]Accuracy-Highest: 0.96583
Training: 2022-01-07 23:05:32,906-Speed 271.52 samples/sec   Loss 11.3580   LearningRate 0.2714   Epoch: 1   Global Step: 20010   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:05:40,405-Speed 5464.26 samples/sec   Loss 11.3398   LearningRate 0.2713   Epoch: 1   Global Step: 20020   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:05:48,023-Speed 5377.30 samples/sec   Loss 11.3315   LearningRate 0.2713   Epoch: 1   Global Step: 20030   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:05:55,575-Speed 5425.39 samples/sec   Loss 11.2960   LearningRate 0.2713   Epoch: 1   Global Step: 20040   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:06:03,165-Speed 5397.27 samples/sec   Loss 11.2791   LearningRate 0.2712   Epoch: 1   Global Step: 20050   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:06:10,621-Speed 5495.30 samples/sec   Loss 11.2852   LearningRate 0.2712   Epoch: 1   Global Step: 20060   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:06:18,182-Speed 5418.45 samples/sec   Loss 11.2641   LearningRate 0.2712   Epoch: 1   Global Step: 20070   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:06:25,745-Speed 5416.47 samples/sec   Loss 11.3550   LearningRate 0.2712   Epoch: 1   Global Step: 20080   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:06:33,260-Speed 5451.72 samples/sec   Loss 11.2663   LearningRate 0.2711   Epoch: 1   Global Step: 20090   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:06:40,834-Speed 5408.95 samples/sec   Loss 11.3102   LearningRate 0.2711   Epoch: 1   Global Step: 20100   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:06:48,295-Speed 5491.72 samples/sec   Loss 11.2876   LearningRate 0.2711   Epoch: 1   Global Step: 20110   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:06:55,785-Speed 5469.26 samples/sec   Loss 11.3519   LearningRate 0.2710   Epoch: 1   Global Step: 20120   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:07:03,334-Speed 5426.91 samples/sec   Loss 11.2656   LearningRate 0.2710   Epoch: 1   Global Step: 20130   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:07:10,800-Speed 5487.15 samples/sec   Loss 11.3329   LearningRate 0.2710   Epoch: 1   Global Step: 20140   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:07:18,423-Speed 5374.77 samples/sec   Loss 11.2630   LearningRate 0.2710   Epoch: 1   Global Step: 20150   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:07:25,919-Speed 5465.43 samples/sec   Loss 11.3111   LearningRate 0.2709   Epoch: 1   Global Step: 20160   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:07:33,474-Speed 5422.83 samples/sec   Loss 11.3854   LearningRate 0.2709   Epoch: 1   Global Step: 20170   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 23:07:41,142-Speed 5342.97 samples/sec   Loss 11.2381   LearningRate 0.2709   Epoch: 1   Global Step: 20180   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:07:48,714-Speed 5410.12 samples/sec   Loss 11.3156   LearningRate 0.2708   Epoch: 1   Global Step: 20190   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:07:56,327-Speed 5381.31 samples/sec   Loss 11.3037   LearningRate 0.2708   Epoch: 1   Global Step: 20200   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:08:03,884-Speed 5421.21 samples/sec   Loss 11.3119   LearningRate 0.2708   Epoch: 1   Global Step: 20210   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:08:11,030-Speed 5733.07 samples/sec   Loss 11.2783   LearningRate 0.2707   Epoch: 1   Global Step: 20220   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:08:18,436-Speed 5532.37 samples/sec   Loss 11.2766   LearningRate 0.2707   Epoch: 1   Global Step: 20230   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:08:25,926-Speed 5469.05 samples/sec   Loss 11.3240   LearningRate 0.2707   Epoch: 1   Global Step: 20240   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:08:33,467-Speed 5432.89 samples/sec   Loss 11.3496   LearningRate 0.2707   Epoch: 1   Global Step: 20250   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:08:41,027-Speed 5418.70 samples/sec   Loss 11.3046   LearningRate 0.2706   Epoch: 1   Global Step: 20260   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:08:48,433-Speed 5532.17 samples/sec   Loss 11.3640   LearningRate 0.2706   Epoch: 1   Global Step: 20270   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:08:55,698-Speed 5639.59 samples/sec   Loss 11.2480   LearningRate 0.2706   Epoch: 1   Global Step: 20280   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 23:09:03,148-Speed 5499.32 samples/sec   Loss 11.2973   LearningRate 0.2705   Epoch: 1   Global Step: 20290   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:09:10,621-Speed 5481.72 samples/sec   Loss 11.2669   LearningRate 0.2705   Epoch: 1   Global Step: 20300   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:09:18,153-Speed 5439.45 samples/sec   Loss 11.2557   LearningRate 0.2705   Epoch: 1   Global Step: 20310   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:09:25,681-Speed 5442.87 samples/sec   Loss 11.2432   LearningRate 0.2705   Epoch: 1   Global Step: 20320   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:09:33,137-Speed 5494.26 samples/sec   Loss 11.2446   LearningRate 0.2704   Epoch: 1   Global Step: 20330   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:09:40,694-Speed 5421.70 samples/sec   Loss 11.2161   LearningRate 0.2704   Epoch: 1   Global Step: 20340   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:09:48,127-Speed 5511.01 samples/sec   Loss 11.2089   LearningRate 0.2704   Epoch: 1   Global Step: 20350   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:09:55,494-Speed 5561.70 samples/sec   Loss 11.2207   LearningRate 0.2703   Epoch: 1   Global Step: 20360   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:10:02,996-Speed 5460.78 samples/sec   Loss 11.2980   LearningRate 0.2703   Epoch: 1   Global Step: 20370   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:10:10,444-Speed 5500.95 samples/sec   Loss 11.2715   LearningRate 0.2703   Epoch: 1   Global Step: 20380   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:10:17,853-Speed 5529.62 samples/sec   Loss 11.3481   LearningRate 0.2703   Epoch: 1   Global Step: 20390   Fp16 Grad Scale: 262144   Required: 43 hours
Training: 2022-01-07 23:10:25,308-Speed 5495.78 samples/sec   Loss 11.3243   LearningRate 0.2702   Epoch: 1   Global Step: 20400   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:10:32,763-Speed 5495.94 samples/sec   Loss 11.2270   LearningRate 0.2702   Epoch: 1   Global Step: 20410   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:10:40,340-Speed 5406.62 samples/sec   Loss 11.2360   LearningRate 0.2702   Epoch: 1   Global Step: 20420   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:10:47,826-Speed 5472.54 samples/sec   Loss 11.2690   LearningRate 0.2701   Epoch: 1   Global Step: 20430   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:10:55,262-Speed 5509.57 samples/sec   Loss 11.3234   LearningRate 0.2701   Epoch: 1   Global Step: 20440   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:11:02,804-Speed 5432.24 samples/sec   Loss 11.2100   LearningRate 0.2701   Epoch: 1   Global Step: 20450   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:11:10,510-Speed 5316.33 samples/sec   Loss 11.2470   LearningRate 0.2701   Epoch: 1   Global Step: 20460   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:11:18,118-Speed 5384.80 samples/sec   Loss 11.2144   LearningRate 0.2700   Epoch: 1   Global Step: 20470   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:11:25,658-Speed 5433.32 samples/sec   Loss 11.2968   LearningRate 0.2700   Epoch: 1   Global Step: 20480   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:11:33,215-Speed 5421.32 samples/sec   Loss 11.2519   LearningRate 0.2700   Epoch: 1   Global Step: 20490   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:11:40,728-Speed 5453.40 samples/sec   Loss 11.2700   LearningRate 0.2699   Epoch: 1   Global Step: 20500   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:11:48,355-Speed 5371.22 samples/sec   Loss 11.3111   LearningRate 0.2699   Epoch: 1   Global Step: 20510   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:11:56,027-Speed 5339.93 samples/sec   Loss 11.2574   LearningRate 0.2699   Epoch: 1   Global Step: 20520   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:12:03,533-Speed 5458.14 samples/sec   Loss 11.2552   LearningRate 0.2699   Epoch: 1   Global Step: 20530   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:12:10,980-Speed 5501.73 samples/sec   Loss 11.2817   LearningRate 0.2698   Epoch: 1   Global Step: 20540   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:12:18,606-Speed 5372.25 samples/sec   Loss 11.1893   LearningRate 0.2698   Epoch: 1   Global Step: 20550   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:12:26,099-Speed 5467.41 samples/sec   Loss 11.2846   LearningRate 0.2698   Epoch: 1   Global Step: 20560   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:12:33,588-Speed 5470.68 samples/sec   Loss 11.2810   LearningRate 0.2697   Epoch: 1   Global Step: 20570   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:12:41,005-Speed 5523.31 samples/sec   Loss 11.3462   LearningRate 0.2697   Epoch: 1   Global Step: 20580   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:12:48,454-Speed 5500.21 samples/sec   Loss 11.1859   LearningRate 0.2697   Epoch: 1   Global Step: 20590   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:12:55,914-Speed 5491.91 samples/sec   Loss 11.2664   LearningRate 0.2697   Epoch: 1   Global Step: 20600   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:13:03,363-Speed 5499.14 samples/sec   Loss 11.2259   LearningRate 0.2696   Epoch: 1   Global Step: 20610   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:13:10,895-Speed 5439.65 samples/sec   Loss 11.2159   LearningRate 0.2696   Epoch: 1   Global Step: 20620   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:13:18,541-Speed 5358.23 samples/sec   Loss 11.2825   LearningRate 0.2696   Epoch: 1   Global Step: 20630   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:13:26,011-Speed 5484.11 samples/sec   Loss 11.1912   LearningRate 0.2695   Epoch: 1   Global Step: 20640   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:13:33,481-Speed 5484.42 samples/sec   Loss 11.1943   LearningRate 0.2695   Epoch: 1   Global Step: 20650   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:13:40,934-Speed 5497.00 samples/sec   Loss 11.2139   LearningRate 0.2695   Epoch: 1   Global Step: 20660   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:13:48,600-Speed 5344.66 samples/sec   Loss 11.2116   LearningRate 0.2694   Epoch: 1   Global Step: 20670   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:13:56,092-Speed 5468.14 samples/sec   Loss 11.1590   LearningRate 0.2694   Epoch: 1   Global Step: 20680   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:14:03,559-Speed 5486.86 samples/sec   Loss 11.2240   LearningRate 0.2694   Epoch: 1   Global Step: 20690   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:14:11,120-Speed 5418.30 samples/sec   Loss 11.1120   LearningRate 0.2694   Epoch: 1   Global Step: 20700   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:14:18,565-Speed 5503.06 samples/sec   Loss 11.1850   LearningRate 0.2693   Epoch: 1   Global Step: 20710   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:14:26,080-Speed 5451.24 samples/sec   Loss 11.1698   LearningRate 0.2693   Epoch: 1   Global Step: 20720   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:14:33,568-Speed 5471.19 samples/sec   Loss 11.2548   LearningRate 0.2693   Epoch: 1   Global Step: 20730   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:14:40,985-Speed 5523.92 samples/sec   Loss 11.1771   LearningRate 0.2692   Epoch: 1   Global Step: 20740   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:15:04,074-Speed 1774.14 samples/sec   Loss 11.2344   LearningRate 0.2692   Epoch: 2   Global Step: 20750   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:15:11,537-Speed 5490.17 samples/sec   Loss 11.1909   LearningRate 0.2692   Epoch: 2   Global Step: 20760   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:15:19,074-Speed 5435.66 samples/sec   Loss 11.2265   LearningRate 0.2692   Epoch: 2   Global Step: 20770   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:15:26,759-Speed 5331.02 samples/sec   Loss 11.2276   LearningRate 0.2691   Epoch: 2   Global Step: 20780   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:15:34,359-Speed 5390.40 samples/sec   Loss 11.2022   LearningRate 0.2691   Epoch: 2   Global Step: 20790   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 23:15:41,821-Speed 5490.39 samples/sec   Loss 11.1431   LearningRate 0.2691   Epoch: 2   Global Step: 20800   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:15:49,211-Speed 5543.47 samples/sec   Loss 11.2018   LearningRate 0.2690   Epoch: 2   Global Step: 20810   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:15:56,652-Speed 5506.02 samples/sec   Loss 11.2193   LearningRate 0.2690   Epoch: 2   Global Step: 20820   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:16:04,108-Speed 5495.58 samples/sec   Loss 11.1743   LearningRate 0.2690   Epoch: 2   Global Step: 20830   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:16:11,585-Speed 5479.37 samples/sec   Loss 11.1841   LearningRate 0.2690   Epoch: 2   Global Step: 20840   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:16:19,052-Speed 5486.06 samples/sec   Loss 11.1986   LearningRate 0.2689   Epoch: 2   Global Step: 20850   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:16:26,500-Speed 5499.93 samples/sec   Loss 11.1687   LearningRate 0.2689   Epoch: 2   Global Step: 20860   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:16:33,924-Speed 5517.90 samples/sec   Loss 11.1680   LearningRate 0.2689   Epoch: 2   Global Step: 20870   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:16:41,366-Speed 5504.88 samples/sec   Loss 11.1903   LearningRate 0.2688   Epoch: 2   Global Step: 20880   Fp16 Grad Scale: 131072   Required: 43 hours
Training: 2022-01-07 23:16:48,779-Speed 5526.28 samples/sec   Loss 11.1951   LearningRate 0.2688   Epoch: 2   Global Step: 20890   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:16:56,270-Speed 5468.66 samples/sec   Loss 11.1634   LearningRate 0.2688   Epoch: 2   Global Step: 20900   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:17:03,836-Speed 5414.32 samples/sec   Loss 11.1071   LearningRate 0.2688   Epoch: 2   Global Step: 20910   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:17:11,401-Speed 5415.75 samples/sec   Loss 11.1754   LearningRate 0.2687   Epoch: 2   Global Step: 20920   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:17:18,943-Speed 5431.37 samples/sec   Loss 11.2178   LearningRate 0.2687   Epoch: 2   Global Step: 20930   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:17:26,409-Speed 5486.86 samples/sec   Loss 11.1054   LearningRate 0.2687   Epoch: 2   Global Step: 20940   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:17:33,926-Speed 5449.98 samples/sec   Loss 11.1838   LearningRate 0.2686   Epoch: 2   Global Step: 20950   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:17:41,458-Speed 5438.63 samples/sec   Loss 11.2068   LearningRate 0.2686   Epoch: 2   Global Step: 20960   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:17:48,955-Speed 5464.18 samples/sec   Loss 11.1927   LearningRate 0.2686   Epoch: 2   Global Step: 20970   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:17:56,431-Speed 5479.50 samples/sec   Loss 11.2208   LearningRate 0.2686   Epoch: 2   Global Step: 20980   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:18:04,041-Speed 5383.24 samples/sec   Loss 11.1227   LearningRate 0.2685   Epoch: 2   Global Step: 20990   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:18:11,561-Speed 5447.41 samples/sec   Loss 11.2397   LearningRate 0.2685   Epoch: 2   Global Step: 21000   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:18:19,119-Speed 5420.61 samples/sec   Loss 11.1767   LearningRate 0.2685   Epoch: 2   Global Step: 21010   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:18:26,632-Speed 5452.78 samples/sec   Loss 11.2045   LearningRate 0.2684   Epoch: 2   Global Step: 21020   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:18:34,190-Speed 5419.94 samples/sec   Loss 11.2293   LearningRate 0.2684   Epoch: 2   Global Step: 21030   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:18:41,709-Speed 5448.62 samples/sec   Loss 11.2026   LearningRate 0.2684   Epoch: 2   Global Step: 21040   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:18:49,185-Speed 5479.43 samples/sec   Loss 11.1103   LearningRate 0.2684   Epoch: 2   Global Step: 21050   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:18:56,652-Speed 5486.31 samples/sec   Loss 11.2185   LearningRate 0.2683   Epoch: 2   Global Step: 21060   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:19:04,108-Speed 5494.54 samples/sec   Loss 11.1168   LearningRate 0.2683   Epoch: 2   Global Step: 21070   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:19:11,582-Speed 5480.84 samples/sec   Loss 11.1028   LearningRate 0.2683   Epoch: 2   Global Step: 21080   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:19:19,109-Speed 5442.65 samples/sec   Loss 11.1265   LearningRate 0.2682   Epoch: 2   Global Step: 21090   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:19:26,562-Speed 5496.16 samples/sec   Loss 11.1532   LearningRate 0.2682   Epoch: 2   Global Step: 21100   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:19:34,019-Speed 5493.88 samples/sec   Loss 11.0977   LearningRate 0.2682   Epoch: 2   Global Step: 21110   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:19:41,497-Speed 5477.77 samples/sec   Loss 11.0692   LearningRate 0.2682   Epoch: 2   Global Step: 21120   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:19:48,996-Speed 5462.78 samples/sec   Loss 11.0748   LearningRate 0.2681   Epoch: 2   Global Step: 21130   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:19:56,444-Speed 5500.67 samples/sec   Loss 11.1107   LearningRate 0.2681   Epoch: 2   Global Step: 21140   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:20:03,868-Speed 5517.53 samples/sec   Loss 11.1992   LearningRate 0.2681   Epoch: 2   Global Step: 21150   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:20:11,300-Speed 5512.45 samples/sec   Loss 11.0954   LearningRate 0.2680   Epoch: 2   Global Step: 21160   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:20:18,756-Speed 5494.18 samples/sec   Loss 11.1986   LearningRate 0.2680   Epoch: 2   Global Step: 21170   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:20:26,242-Speed 5473.02 samples/sec   Loss 11.1239   LearningRate 0.2680   Epoch: 2   Global Step: 21180   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:20:33,732-Speed 5468.84 samples/sec   Loss 11.1371   LearningRate 0.2679   Epoch: 2   Global Step: 21190   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:20:41,163-Speed 5512.81 samples/sec   Loss 11.1509   LearningRate 0.2679   Epoch: 2   Global Step: 21200   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:20:48,585-Speed 5519.91 samples/sec   Loss 11.1505   LearningRate 0.2679   Epoch: 2   Global Step: 21210   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:20:56,136-Speed 5425.03 samples/sec   Loss 11.1121   LearningRate 0.2679   Epoch: 2   Global Step: 21220   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:21:03,556-Speed 5521.03 samples/sec   Loss 11.1761   LearningRate 0.2678   Epoch: 2   Global Step: 21230   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:21:10,985-Speed 5514.14 samples/sec   Loss 11.1769   LearningRate 0.2678   Epoch: 2   Global Step: 21240   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:21:18,530-Speed 5430.04 samples/sec   Loss 11.1851   LearningRate 0.2678   Epoch: 2   Global Step: 21250   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:21:25,965-Speed 5509.58 samples/sec   Loss 11.1297   LearningRate 0.2677   Epoch: 2   Global Step: 21260   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:21:33,383-Speed 5522.42 samples/sec   Loss 11.1595   LearningRate 0.2677   Epoch: 2   Global Step: 21270   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:21:40,796-Speed 5525.83 samples/sec   Loss 11.1465   LearningRate 0.2677   Epoch: 2   Global Step: 21280   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:21:48,227-Speed 5513.37 samples/sec   Loss 11.1575   LearningRate 0.2677   Epoch: 2   Global Step: 21290   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:21:55,710-Speed 5474.64 samples/sec   Loss 11.1527   LearningRate 0.2676   Epoch: 2   Global Step: 21300   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:22:03,309-Speed 5390.07 samples/sec   Loss 11.1026   LearningRate 0.2676   Epoch: 2   Global Step: 21310   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:22:10,798-Speed 5470.32 samples/sec   Loss 11.1308   LearningRate 0.2676   Epoch: 2   Global Step: 21320   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:22:18,329-Speed 5439.74 samples/sec   Loss 11.0686   LearningRate 0.2675   Epoch: 2   Global Step: 21330   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:22:25,779-Speed 5498.74 samples/sec   Loss 11.1765   LearningRate 0.2675   Epoch: 2   Global Step: 21340   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:22:33,207-Speed 5514.98 samples/sec   Loss 11.0577   LearningRate 0.2675   Epoch: 2   Global Step: 21350   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:22:40,687-Speed 5476.30 samples/sec   Loss 11.0631   LearningRate 0.2675   Epoch: 2   Global Step: 21360   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:22:48,085-Speed 5537.54 samples/sec   Loss 11.1136   LearningRate 0.2674   Epoch: 2   Global Step: 21370   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:22:55,525-Speed 5506.17 samples/sec   Loss 11.1260   LearningRate 0.2674   Epoch: 2   Global Step: 21380   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:23:02,924-Speed 5536.69 samples/sec   Loss 11.0848   LearningRate 0.2674   Epoch: 2   Global Step: 21390   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:23:10,408-Speed 5473.19 samples/sec   Loss 11.1314   LearningRate 0.2673   Epoch: 2   Global Step: 21400   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:23:17,839-Speed 5513.54 samples/sec   Loss 11.1276   LearningRate 0.2673   Epoch: 2   Global Step: 21410   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:23:25,398-Speed 5419.62 samples/sec   Loss 11.1029   LearningRate 0.2673   Epoch: 2   Global Step: 21420   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:23:32,859-Speed 5490.45 samples/sec   Loss 11.1373   LearningRate 0.2673   Epoch: 2   Global Step: 21430   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:23:40,272-Speed 5525.82 samples/sec   Loss 11.1771   LearningRate 0.2672   Epoch: 2   Global Step: 21440   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:23:47,814-Speed 5431.91 samples/sec   Loss 11.0597   LearningRate 0.2672   Epoch: 2   Global Step: 21450   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:23:55,245-Speed 5513.01 samples/sec   Loss 11.1414   LearningRate 0.2672   Epoch: 2   Global Step: 21460   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:24:02,647-Speed 5534.03 samples/sec   Loss 11.0790   LearningRate 0.2671   Epoch: 2   Global Step: 21470   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:24:10,108-Speed 5490.58 samples/sec   Loss 11.0864   LearningRate 0.2671   Epoch: 2   Global Step: 21480   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:24:17,539-Speed 5513.00 samples/sec   Loss 11.1491   LearningRate 0.2671   Epoch: 2   Global Step: 21490   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:24:25,013-Speed 5481.67 samples/sec   Loss 11.1665   LearningRate 0.2671   Epoch: 2   Global Step: 21500   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:24:32,490-Speed 5478.32 samples/sec   Loss 11.1332   LearningRate 0.2670   Epoch: 2   Global Step: 21510   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:24:39,970-Speed 5476.78 samples/sec   Loss 11.0830   LearningRate 0.2670   Epoch: 2   Global Step: 21520   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:24:47,376-Speed 5531.39 samples/sec   Loss 11.1303   LearningRate 0.2670   Epoch: 2   Global Step: 21530   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:24:55,864-Speed 4826.44 samples/sec   Loss 11.1614   LearningRate 0.2669   Epoch: 2   Global Step: 21540   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:25:03,313-Speed 5499.20 samples/sec   Loss 11.1160   LearningRate 0.2669   Epoch: 2   Global Step: 21550   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:25:10,753-Speed 5506.53 samples/sec   Loss 11.0824   LearningRate 0.2669   Epoch: 2   Global Step: 21560   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:25:18,190-Speed 5508.31 samples/sec   Loss 11.1342   LearningRate 0.2669   Epoch: 2   Global Step: 21570   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:25:25,585-Speed 5539.37 samples/sec   Loss 11.0231   LearningRate 0.2668   Epoch: 2   Global Step: 21580   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:25:33,044-Speed 5492.52 samples/sec   Loss 11.1483   LearningRate 0.2668   Epoch: 2   Global Step: 21590   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:25:40,473-Speed 5513.82 samples/sec   Loss 11.0752   LearningRate 0.2668   Epoch: 2   Global Step: 21600   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:25:47,961-Speed 5471.14 samples/sec   Loss 11.1516   LearningRate 0.2667   Epoch: 2   Global Step: 21610   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:25:55,404-Speed 5503.52 samples/sec   Loss 11.1113   LearningRate 0.2667   Epoch: 2   Global Step: 21620   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:26:02,838-Speed 5510.54 samples/sec   Loss 11.1186   LearningRate 0.2667   Epoch: 2   Global Step: 21630   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:26:10,254-Speed 5524.70 samples/sec   Loss 11.1081   LearningRate 0.2667   Epoch: 2   Global Step: 21640   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:26:17,772-Speed 5448.96 samples/sec   Loss 11.1352   LearningRate 0.2666   Epoch: 2   Global Step: 21650   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:26:25,209-Speed 5507.98 samples/sec   Loss 11.1775   LearningRate 0.2666   Epoch: 2   Global Step: 21660   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:26:32,678-Speed 5484.87 samples/sec   Loss 11.1357   LearningRate 0.2666   Epoch: 2   Global Step: 21670   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:26:40,118-Speed 5506.64 samples/sec   Loss 11.1749   LearningRate 0.2665   Epoch: 2   Global Step: 21680   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:26:47,671-Speed 5423.21 samples/sec   Loss 11.0567   LearningRate 0.2665   Epoch: 2   Global Step: 21690   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:26:55,296-Speed 5372.75 samples/sec   Loss 11.1511   LearningRate 0.2665   Epoch: 2   Global Step: 21700   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:27:02,864-Speed 5412.85 samples/sec   Loss 11.1755   LearningRate 0.2665   Epoch: 2   Global Step: 21710   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:27:10,445-Speed 5404.64 samples/sec   Loss 11.1895   LearningRate 0.2664   Epoch: 2   Global Step: 21720   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:27:17,873-Speed 5514.43 samples/sec   Loss 11.0780   LearningRate 0.2664   Epoch: 2   Global Step: 21730   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:27:25,328-Speed 5494.85 samples/sec   Loss 11.0504   LearningRate 0.2664   Epoch: 2   Global Step: 21740   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:27:32,810-Speed 5475.32 samples/sec   Loss 11.0496   LearningRate 0.2663   Epoch: 2   Global Step: 21750   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:27:40,356-Speed 5429.26 samples/sec   Loss 11.0780   LearningRate 0.2663   Epoch: 2   Global Step: 21760   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:27:47,874-Speed 5448.75 samples/sec   Loss 11.1490   LearningRate 0.2663   Epoch: 2   Global Step: 21770   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:27:55,340-Speed 5486.92 samples/sec   Loss 11.0733   LearningRate 0.2663   Epoch: 2   Global Step: 21780   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:28:02,762-Speed 5519.28 samples/sec   Loss 11.1463   LearningRate 0.2662   Epoch: 2   Global Step: 21790   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:28:10,176-Speed 5526.22 samples/sec   Loss 11.0484   LearningRate 0.2662   Epoch: 2   Global Step: 21800   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:28:17,589-Speed 5525.79 samples/sec   Loss 11.0198   LearningRate 0.2662   Epoch: 2   Global Step: 21810   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:28:25,204-Speed 5379.74 samples/sec   Loss 11.0803   LearningRate 0.2661   Epoch: 2   Global Step: 21820   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:28:32,718-Speed 5451.25 samples/sec   Loss 11.0680   LearningRate 0.2661   Epoch: 2   Global Step: 21830   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:28:40,184-Speed 5487.42 samples/sec   Loss 11.1270   LearningRate 0.2661   Epoch: 2   Global Step: 21840   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:28:47,606-Speed 5519.32 samples/sec   Loss 11.1066   LearningRate 0.2661   Epoch: 2   Global Step: 21850   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:28:55,018-Speed 5527.33 samples/sec   Loss 11.0477   LearningRate 0.2660   Epoch: 2   Global Step: 21860   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:29:02,462-Speed 5502.68 samples/sec   Loss 11.0503   LearningRate 0.2660   Epoch: 2   Global Step: 21870   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:29:09,983-Speed 5447.76 samples/sec   Loss 11.0110   LearningRate 0.2660   Epoch: 2   Global Step: 21880   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:29:17,433-Speed 5498.04 samples/sec   Loss 11.0111   LearningRate 0.2659   Epoch: 2   Global Step: 21890   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:29:24,908-Speed 5480.59 samples/sec   Loss 11.0166   LearningRate 0.2659   Epoch: 2   Global Step: 21900   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:29:32,353-Speed 5502.14 samples/sec   Loss 11.0615   LearningRate 0.2659   Epoch: 2   Global Step: 21910   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:29:39,805-Speed 5498.00 samples/sec   Loss 11.0283   LearningRate 0.2659   Epoch: 2   Global Step: 21920   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:29:47,219-Speed 5525.37 samples/sec   Loss 11.0350   LearningRate 0.2658   Epoch: 2   Global Step: 21930   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:29:54,666-Speed 5500.41 samples/sec   Loss 11.0288   LearningRate 0.2658   Epoch: 2   Global Step: 21940   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:30:02,154-Speed 5471.57 samples/sec   Loss 11.0077   LearningRate 0.2658   Epoch: 2   Global Step: 21950   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:30:09,562-Speed 5529.84 samples/sec   Loss 11.0629   LearningRate 0.2657   Epoch: 2   Global Step: 21960   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:30:16,961-Speed 5536.65 samples/sec   Loss 11.0466   LearningRate 0.2657   Epoch: 2   Global Step: 21970   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:30:24,384-Speed 5518.71 samples/sec   Loss 10.9941   LearningRate 0.2657   Epoch: 2   Global Step: 21980   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:30:31,795-Speed 5527.30 samples/sec   Loss 11.0680   LearningRate 0.2657   Epoch: 2   Global Step: 21990   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:30:39,231-Speed 5510.05 samples/sec   Loss 11.1854   LearningRate 0.2656   Epoch: 2   Global Step: 22000   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:31:23,985-[lfw][22000]XNorm: 23.268626
Training: 2022-01-07 23:31:23,986-[lfw][22000]Accuracy-Flip: 0.99667+-0.00307
Training: 2022-01-07 23:31:23,987-[lfw][22000]Accuracy-Highest: 0.99700
Training: 2022-01-07 23:32:17,662-[cfp_fp][22000]XNorm: 20.725033
Training: 2022-01-07 23:32:17,663-[cfp_fp][22000]Accuracy-Flip: 0.97686+-0.01000
Training: 2022-01-07 23:32:17,664-[cfp_fp][22000]Accuracy-Highest: 0.97986
Training: 2022-01-07 23:33:03,455-[agedb_30][22000]XNorm: 22.970369
Training: 2022-01-07 23:33:03,456-[agedb_30][22000]Accuracy-Flip: 0.96883+-0.00658
Training: 2022-01-07 23:33:03,457-[agedb_30][22000]Accuracy-Highest: 0.96883
Training: 2022-01-07 23:33:10,883-Speed 270.09 samples/sec   Loss 10.9078   LearningRate 0.2656   Epoch: 2   Global Step: 22010   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:33:18,289-Speed 5532.68 samples/sec   Loss 11.0790   LearningRate 0.2656   Epoch: 2   Global Step: 22020   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:33:25,812-Speed 5446.27 samples/sec   Loss 11.0612   LearningRate 0.2655   Epoch: 2   Global Step: 22030   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:33:33,235-Speed 5519.43 samples/sec   Loss 11.0414   LearningRate 0.2655   Epoch: 2   Global Step: 22040   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:33:40,749-Speed 5452.34 samples/sec   Loss 11.0148   LearningRate 0.2655   Epoch: 2   Global Step: 22050   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:33:48,176-Speed 5516.06 samples/sec   Loss 11.0341   LearningRate 0.2655   Epoch: 2   Global Step: 22060   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:33:55,621-Speed 5503.04 samples/sec   Loss 11.0944   LearningRate 0.2654   Epoch: 2   Global Step: 22070   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:34:03,128-Speed 5457.06 samples/sec   Loss 11.0128   LearningRate 0.2654   Epoch: 2   Global Step: 22080   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:34:10,643-Speed 5451.15 samples/sec   Loss 10.9913   LearningRate 0.2654   Epoch: 2   Global Step: 22090   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:34:18,238-Speed 5393.15 samples/sec   Loss 10.9683   LearningRate 0.2653   Epoch: 2   Global Step: 22100   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:34:25,742-Speed 5459.45 samples/sec   Loss 10.9578   LearningRate 0.2653   Epoch: 2   Global Step: 22110   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:34:33,325-Speed 5402.11 samples/sec   Loss 10.9648   LearningRate 0.2653   Epoch: 2   Global Step: 22120   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:34:40,863-Speed 5434.57 samples/sec   Loss 10.9495   LearningRate 0.2653   Epoch: 2   Global Step: 22130   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:34:48,296-Speed 5511.98 samples/sec   Loss 11.0282   LearningRate 0.2652   Epoch: 2   Global Step: 22140   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:34:55,812-Speed 5450.36 samples/sec   Loss 11.0063   LearningRate 0.2652   Epoch: 2   Global Step: 22150   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:35:03,521-Speed 5314.15 samples/sec   Loss 11.0643   LearningRate 0.2652   Epoch: 2   Global Step: 22160   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:35:11,023-Speed 5460.53 samples/sec   Loss 11.0666   LearningRate 0.2651   Epoch: 2   Global Step: 22170   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:35:18,451-Speed 5514.58 samples/sec   Loss 11.0221   LearningRate 0.2651   Epoch: 2   Global Step: 22180   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:35:26,003-Speed 5424.61 samples/sec   Loss 11.0047   LearningRate 0.2651   Epoch: 2   Global Step: 22190   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:35:33,490-Speed 5471.51 samples/sec   Loss 11.0080   LearningRate 0.2651   Epoch: 2   Global Step: 22200   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:35:40,893-Speed 5534.15 samples/sec   Loss 10.9454   LearningRate 0.2650   Epoch: 2   Global Step: 22210   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:35:48,343-Speed 5498.08 samples/sec   Loss 10.9685   LearningRate 0.2650   Epoch: 2   Global Step: 22220   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:35:55,787-Speed 5503.78 samples/sec   Loss 10.9773   LearningRate 0.2650   Epoch: 2   Global Step: 22230   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:36:03,282-Speed 5465.30 samples/sec   Loss 11.0890   LearningRate 0.2649   Epoch: 2   Global Step: 22240   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:36:10,923-Speed 5361.19 samples/sec   Loss 11.1079   LearningRate 0.2649   Epoch: 2   Global Step: 22250   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:36:18,337-Speed 5525.52 samples/sec   Loss 11.0209   LearningRate 0.2649   Epoch: 2   Global Step: 22260   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:36:25,750-Speed 5527.06 samples/sec   Loss 11.0060   LearningRate 0.2649   Epoch: 2   Global Step: 22270   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:36:33,228-Speed 5477.42 samples/sec   Loss 10.9853   LearningRate 0.2648   Epoch: 2   Global Step: 22280   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:36:40,722-Speed 5466.33 samples/sec   Loss 11.0261   LearningRate 0.2648   Epoch: 2   Global Step: 22290   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:36:48,324-Speed 5389.48 samples/sec   Loss 10.9754   LearningRate 0.2648   Epoch: 2   Global Step: 22300   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:36:55,759-Speed 5509.63 samples/sec   Loss 11.0742   LearningRate 0.2647   Epoch: 2   Global Step: 22310   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:37:03,245-Speed 5472.25 samples/sec   Loss 11.0062   LearningRate 0.2647   Epoch: 2   Global Step: 22320   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:37:10,726-Speed 5475.58 samples/sec   Loss 11.0283   LearningRate 0.2647   Epoch: 2   Global Step: 22330   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:37:18,172-Speed 5501.98 samples/sec   Loss 11.0350   LearningRate 0.2646   Epoch: 2   Global Step: 22340   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:37:25,650-Speed 5478.17 samples/sec   Loss 11.0606   LearningRate 0.2646   Epoch: 2   Global Step: 22350   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:37:33,150-Speed 5461.78 samples/sec   Loss 11.0135   LearningRate 0.2646   Epoch: 2   Global Step: 22360   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:37:40,600-Speed 5499.35 samples/sec   Loss 11.0078   LearningRate 0.2646   Epoch: 2   Global Step: 22370   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:37:48,081-Speed 5475.70 samples/sec   Loss 10.9225   LearningRate 0.2645   Epoch: 2   Global Step: 22380   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:37:55,576-Speed 5465.99 samples/sec   Loss 10.9720   LearningRate 0.2645   Epoch: 2   Global Step: 22390   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:38:03,010-Speed 5510.59 samples/sec   Loss 11.0134   LearningRate 0.2645   Epoch: 2   Global Step: 22400   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:38:10,474-Speed 5488.31 samples/sec   Loss 10.9657   LearningRate 0.2644   Epoch: 2   Global Step: 22410   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:38:17,949-Speed 5480.12 samples/sec   Loss 11.0306   LearningRate 0.2644   Epoch: 2   Global Step: 22420   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:38:25,407-Speed 5492.97 samples/sec   Loss 11.0131   LearningRate 0.2644   Epoch: 2   Global Step: 22430   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:38:32,992-Speed 5401.04 samples/sec   Loss 11.0757   LearningRate 0.2644   Epoch: 2   Global Step: 22440   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:38:40,439-Speed 5501.06 samples/sec   Loss 10.9950   LearningRate 0.2643   Epoch: 2   Global Step: 22450   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:38:47,889-Speed 5498.95 samples/sec   Loss 10.9954   LearningRate 0.2643   Epoch: 2   Global Step: 22460   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:38:55,460-Speed 5410.90 samples/sec   Loss 11.0387   LearningRate 0.2643   Epoch: 2   Global Step: 22470   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:39:02,834-Speed 5554.82 samples/sec   Loss 11.0630   LearningRate 0.2642   Epoch: 2   Global Step: 22480   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:39:10,315-Speed 5476.25 samples/sec   Loss 11.0609   LearningRate 0.2642   Epoch: 2   Global Step: 22490   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:39:17,817-Speed 5460.79 samples/sec   Loss 11.0442   LearningRate 0.2642   Epoch: 2   Global Step: 22500   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:39:25,281-Speed 5488.10 samples/sec   Loss 11.0258   LearningRate 0.2642   Epoch: 2   Global Step: 22510   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:39:32,744-Speed 5489.53 samples/sec   Loss 11.0174   LearningRate 0.2641   Epoch: 2   Global Step: 22520   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:39:40,186-Speed 5504.34 samples/sec   Loss 11.0272   LearningRate 0.2641   Epoch: 2   Global Step: 22530   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:39:47,632-Speed 5501.76 samples/sec   Loss 11.0278   LearningRate 0.2641   Epoch: 2   Global Step: 22540   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:39:55,201-Speed 5412.37 samples/sec   Loss 11.0016   LearningRate 0.2640   Epoch: 2   Global Step: 22550   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:40:02,695-Speed 5466.56 samples/sec   Loss 11.0108   LearningRate 0.2640   Epoch: 2   Global Step: 22560   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:40:10,248-Speed 5423.48 samples/sec   Loss 10.9866   LearningRate 0.2640   Epoch: 2   Global Step: 22570   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:40:17,777-Speed 5441.36 samples/sec   Loss 10.9914   LearningRate 0.2640   Epoch: 2   Global Step: 22580   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:40:25,194-Speed 5522.97 samples/sec   Loss 11.0742   LearningRate 0.2639   Epoch: 2   Global Step: 22590   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:40:32,788-Speed 5394.63 samples/sec   Loss 10.9671   LearningRate 0.2639   Epoch: 2   Global Step: 22600   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:40:40,224-Speed 5508.56 samples/sec   Loss 10.9751   LearningRate 0.2639   Epoch: 2   Global Step: 22610   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:40:47,771-Speed 5428.04 samples/sec   Loss 10.9171   LearningRate 0.2638   Epoch: 2   Global Step: 22620   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:40:55,393-Speed 5375.30 samples/sec   Loss 11.0116   LearningRate 0.2638   Epoch: 2   Global Step: 22630   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:41:02,926-Speed 5437.90 samples/sec   Loss 10.8609   LearningRate 0.2638   Epoch: 2   Global Step: 22640   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:41:10,383-Speed 5493.24 samples/sec   Loss 10.9325   LearningRate 0.2638   Epoch: 2   Global Step: 22650   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:41:17,833-Speed 5499.32 samples/sec   Loss 10.9016   LearningRate 0.2637   Epoch: 2   Global Step: 22660   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:41:25,378-Speed 5429.83 samples/sec   Loss 10.9801   LearningRate 0.2637   Epoch: 2   Global Step: 22670   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:41:32,828-Speed 5498.41 samples/sec   Loss 10.9169   LearningRate 0.2637   Epoch: 2   Global Step: 22680   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:41:40,384-Speed 5421.22 samples/sec   Loss 10.9366   LearningRate 0.2636   Epoch: 2   Global Step: 22690   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:41:47,898-Speed 5452.43 samples/sec   Loss 10.9330   LearningRate 0.2636   Epoch: 2   Global Step: 22700   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:41:55,572-Speed 5338.49 samples/sec   Loss 10.9402   LearningRate 0.2636   Epoch: 2   Global Step: 22710   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:42:03,194-Speed 5374.89 samples/sec   Loss 11.0319   LearningRate 0.2636   Epoch: 2   Global Step: 22720   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:42:10,890-Speed 5322.04 samples/sec   Loss 10.9252   LearningRate 0.2635   Epoch: 2   Global Step: 22730   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:42:18,379-Speed 5470.74 samples/sec   Loss 10.9631   LearningRate 0.2635   Epoch: 2   Global Step: 22740   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:42:26,049-Speed 5340.66 samples/sec   Loss 10.9774   LearningRate 0.2635   Epoch: 2   Global Step: 22750   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:42:33,771-Speed 5305.17 samples/sec   Loss 11.0116   LearningRate 0.2634   Epoch: 2   Global Step: 22760   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:42:41,294-Speed 5445.06 samples/sec   Loss 10.9469   LearningRate 0.2634   Epoch: 2   Global Step: 22770   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:42:48,917-Speed 5373.85 samples/sec   Loss 10.9240   LearningRate 0.2634   Epoch: 2   Global Step: 22780   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:42:56,563-Speed 5357.71 samples/sec   Loss 10.9211   LearningRate 0.2634   Epoch: 2   Global Step: 22790   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:43:04,053-Speed 5469.39 samples/sec   Loss 11.0130   LearningRate 0.2633   Epoch: 2   Global Step: 22800   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:43:11,715-Speed 5346.44 samples/sec   Loss 10.9615   LearningRate 0.2633   Epoch: 2   Global Step: 22810   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:43:19,190-Speed 5480.19 samples/sec   Loss 10.9356   LearningRate 0.2633   Epoch: 2   Global Step: 22820   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:43:26,722-Speed 5438.69 samples/sec   Loss 10.9259   LearningRate 0.2633   Epoch: 2   Global Step: 22830   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:43:34,276-Speed 5423.57 samples/sec   Loss 10.9806   LearningRate 0.2632   Epoch: 2   Global Step: 22840   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:43:41,783-Speed 5457.04 samples/sec   Loss 10.9104   LearningRate 0.2632   Epoch: 2   Global Step: 22850   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:43:49,368-Speed 5400.36 samples/sec   Loss 10.9297   LearningRate 0.2632   Epoch: 2   Global Step: 22860   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:43:56,904-Speed 5435.81 samples/sec   Loss 10.9661   LearningRate 0.2631   Epoch: 2   Global Step: 22870   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:44:04,614-Speed 5313.26 samples/sec   Loss 10.9679   LearningRate 0.2631   Epoch: 2   Global Step: 22880   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:44:12,101-Speed 5471.46 samples/sec   Loss 10.9290   LearningRate 0.2631   Epoch: 2   Global Step: 22890   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:44:19,631-Speed 5440.63 samples/sec   Loss 10.9289   LearningRate 0.2631   Epoch: 2   Global Step: 22900   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:44:27,136-Speed 5457.88 samples/sec   Loss 10.8642   LearningRate 0.2630   Epoch: 2   Global Step: 22910   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:44:34,585-Speed 5499.68 samples/sec   Loss 10.8623   LearningRate 0.2630   Epoch: 2   Global Step: 22920   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:44:42,364-Speed 5265.94 samples/sec   Loss 10.8893   LearningRate 0.2630   Epoch: 2   Global Step: 22930   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:44:49,906-Speed 5432.04 samples/sec   Loss 10.9993   LearningRate 0.2629   Epoch: 2   Global Step: 22940   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:44:57,376-Speed 5483.77 samples/sec   Loss 10.8969   LearningRate 0.2629   Epoch: 2   Global Step: 22950   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:45:04,949-Speed 5409.45 samples/sec   Loss 10.9269   LearningRate 0.2629   Epoch: 2   Global Step: 22960   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:45:12,462-Speed 5452.36 samples/sec   Loss 10.9379   LearningRate 0.2629   Epoch: 2   Global Step: 22970   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:45:19,926-Speed 5488.09 samples/sec   Loss 10.9238   LearningRate 0.2628   Epoch: 2   Global Step: 22980   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:45:27,407-Speed 5476.04 samples/sec   Loss 10.9306   LearningRate 0.2628   Epoch: 2   Global Step: 22990   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:45:34,927-Speed 5447.20 samples/sec   Loss 10.8749   LearningRate 0.2628   Epoch: 2   Global Step: 23000   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:45:42,493-Speed 5414.66 samples/sec   Loss 10.9072   LearningRate 0.2627   Epoch: 2   Global Step: 23010   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:45:50,060-Speed 5413.65 samples/sec   Loss 10.9318   LearningRate 0.2627   Epoch: 2   Global Step: 23020   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:45:57,563-Speed 5459.68 samples/sec   Loss 10.9286   LearningRate 0.2627   Epoch: 2   Global Step: 23030   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:46:05,107-Speed 5430.12 samples/sec   Loss 10.9701   LearningRate 0.2627   Epoch: 2   Global Step: 23040   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:46:12,590-Speed 5474.88 samples/sec   Loss 10.9153   LearningRate 0.2626   Epoch: 2   Global Step: 23050   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:46:20,064-Speed 5481.07 samples/sec   Loss 10.8952   LearningRate 0.2626   Epoch: 2   Global Step: 23060   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:46:27,587-Speed 5445.18 samples/sec   Loss 10.8905   LearningRate 0.2626   Epoch: 2   Global Step: 23070   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:46:35,141-Speed 5422.80 samples/sec   Loss 10.8937   LearningRate 0.2625   Epoch: 2   Global Step: 23080   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:46:42,836-Speed 5324.21 samples/sec   Loss 10.8919   LearningRate 0.2625   Epoch: 2   Global Step: 23090   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:46:50,679-Speed 5222.85 samples/sec   Loss 10.9573   LearningRate 0.2625   Epoch: 2   Global Step: 23100   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:46:58,316-Speed 5363.92 samples/sec   Loss 10.9971   LearningRate 0.2625   Epoch: 2   Global Step: 23110   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:47:06,022-Speed 5315.55 samples/sec   Loss 10.8642   LearningRate 0.2624   Epoch: 2   Global Step: 23120   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:47:13,470-Speed 5500.59 samples/sec   Loss 10.9687   LearningRate 0.2624   Epoch: 2   Global Step: 23130   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:47:20,999-Speed 5440.80 samples/sec   Loss 10.8887   LearningRate 0.2624   Epoch: 2   Global Step: 23140   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:47:28,487-Speed 5470.61 samples/sec   Loss 10.9554   LearningRate 0.2623   Epoch: 2   Global Step: 23150   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:47:36,212-Speed 5305.72 samples/sec   Loss 10.9755   LearningRate 0.2623   Epoch: 2   Global Step: 23160   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:47:43,871-Speed 5348.97 samples/sec   Loss 10.9947   LearningRate 0.2623   Epoch: 2   Global Step: 23170   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:47:51,427-Speed 5421.59 samples/sec   Loss 10.9189   LearningRate 0.2623   Epoch: 2   Global Step: 23180   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:47:59,013-Speed 5400.13 samples/sec   Loss 10.8826   LearningRate 0.2622   Epoch: 2   Global Step: 23190   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:48:06,691-Speed 5335.17 samples/sec   Loss 10.9500   LearningRate 0.2622   Epoch: 2   Global Step: 23200   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:48:14,351-Speed 5348.30 samples/sec   Loss 10.9164   LearningRate 0.2622   Epoch: 2   Global Step: 23210   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:48:21,786-Speed 5510.14 samples/sec   Loss 10.9691   LearningRate 0.2621   Epoch: 2   Global Step: 23220   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:48:29,224-Speed 5507.07 samples/sec   Loss 10.9575   LearningRate 0.2621   Epoch: 2   Global Step: 23230   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:48:36,718-Speed 5467.03 samples/sec   Loss 10.9581   LearningRate 0.2621   Epoch: 2   Global Step: 23240   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:48:44,304-Speed 5399.63 samples/sec   Loss 10.8268   LearningRate 0.2621   Epoch: 2   Global Step: 23250   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:48:51,750-Speed 5502.25 samples/sec   Loss 10.9373   LearningRate 0.2620   Epoch: 2   Global Step: 23260   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:48:59,326-Speed 5407.38 samples/sec   Loss 10.8933   LearningRate 0.2620   Epoch: 2   Global Step: 23270   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:49:06,872-Speed 5428.27 samples/sec   Loss 10.8756   LearningRate 0.2620   Epoch: 2   Global Step: 23280   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:49:14,355-Speed 5475.05 samples/sec   Loss 10.9227   LearningRate 0.2619   Epoch: 2   Global Step: 23290   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:49:21,880-Speed 5443.38 samples/sec   Loss 10.9125   LearningRate 0.2619   Epoch: 2   Global Step: 23300   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:49:29,355-Speed 5480.46 samples/sec   Loss 10.8577   LearningRate 0.2619   Epoch: 2   Global Step: 23310   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:49:36,940-Speed 5400.69 samples/sec   Loss 10.8912   LearningRate 0.2619   Epoch: 2   Global Step: 23320   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:49:44,439-Speed 5463.21 samples/sec   Loss 10.9078   LearningRate 0.2618   Epoch: 2   Global Step: 23330   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:49:51,975-Speed 5435.51 samples/sec   Loss 10.8832   LearningRate 0.2618   Epoch: 2   Global Step: 23340   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:49:59,516-Speed 5433.03 samples/sec   Loss 10.9118   LearningRate 0.2618   Epoch: 2   Global Step: 23350   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:50:07,027-Speed 5453.86 samples/sec   Loss 10.8319   LearningRate 0.2617   Epoch: 2   Global Step: 23360   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:50:14,540-Speed 5452.89 samples/sec   Loss 10.9091   LearningRate 0.2617   Epoch: 2   Global Step: 23370   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:50:22,045-Speed 5458.53 samples/sec   Loss 10.8906   LearningRate 0.2617   Epoch: 2   Global Step: 23380   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:50:29,563-Speed 5449.00 samples/sec   Loss 10.9110   LearningRate 0.2617   Epoch: 2   Global Step: 23390   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:50:37,088-Speed 5444.10 samples/sec   Loss 10.8160   LearningRate 0.2616   Epoch: 2   Global Step: 23400   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:50:44,641-Speed 5423.85 samples/sec   Loss 10.8590   LearningRate 0.2616   Epoch: 2   Global Step: 23410   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:50:52,127-Speed 5472.44 samples/sec   Loss 10.8333   LearningRate 0.2616   Epoch: 2   Global Step: 23420   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:50:59,663-Speed 5435.60 samples/sec   Loss 10.8470   LearningRate 0.2615   Epoch: 2   Global Step: 23430   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:51:07,213-Speed 5426.03 samples/sec   Loss 10.8837   LearningRate 0.2615   Epoch: 2   Global Step: 23440   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:51:14,725-Speed 5453.17 samples/sec   Loss 10.8957   LearningRate 0.2615   Epoch: 2   Global Step: 23450   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:51:22,271-Speed 5429.32 samples/sec   Loss 10.8990   LearningRate 0.2615   Epoch: 2   Global Step: 23460   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:51:29,791-Speed 5447.73 samples/sec   Loss 10.8892   LearningRate 0.2614   Epoch: 2   Global Step: 23470   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:51:37,294-Speed 5460.07 samples/sec   Loss 10.8617   LearningRate 0.2614   Epoch: 2   Global Step: 23480   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:51:44,831-Speed 5434.70 samples/sec   Loss 10.9155   LearningRate 0.2614   Epoch: 2   Global Step: 23490   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:51:52,383-Speed 5424.30 samples/sec   Loss 10.9616   LearningRate 0.2613   Epoch: 2   Global Step: 23500   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:51:59,932-Speed 5427.24 samples/sec   Loss 10.9183   LearningRate 0.2613   Epoch: 2   Global Step: 23510   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:52:07,443-Speed 5453.73 samples/sec   Loss 10.9456   LearningRate 0.2613   Epoch: 2   Global Step: 23520   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:52:15,007-Speed 5415.50 samples/sec   Loss 10.9071   LearningRate 0.2613   Epoch: 2   Global Step: 23530   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:52:22,568-Speed 5418.10 samples/sec   Loss 10.9275   LearningRate 0.2612   Epoch: 2   Global Step: 23540   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:52:30,111-Speed 5430.98 samples/sec   Loss 10.8874   LearningRate 0.2612   Epoch: 2   Global Step: 23550   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:52:37,548-Speed 5508.76 samples/sec   Loss 10.9061   LearningRate 0.2612   Epoch: 2   Global Step: 23560   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:52:45,159-Speed 5382.24 samples/sec   Loss 10.9090   LearningRate 0.2611   Epoch: 2   Global Step: 23570   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:52:52,599-Speed 5506.34 samples/sec   Loss 10.9133   LearningRate 0.2611   Epoch: 2   Global Step: 23580   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:53:00,245-Speed 5357.76 samples/sec   Loss 10.9079   LearningRate 0.2611   Epoch: 2   Global Step: 23590   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:53:07,760-Speed 5450.96 samples/sec   Loss 10.8680   LearningRate 0.2611   Epoch: 2   Global Step: 23600   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:53:15,316-Speed 5421.84 samples/sec   Loss 10.9124   LearningRate 0.2610   Epoch: 2   Global Step: 23610   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:53:22,876-Speed 5418.87 samples/sec   Loss 10.8319   LearningRate 0.2610   Epoch: 2   Global Step: 23620   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:53:30,438-Speed 5417.22 samples/sec   Loss 10.8862   LearningRate 0.2610   Epoch: 2   Global Step: 23630   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:53:38,054-Speed 5378.92 samples/sec   Loss 10.8212   LearningRate 0.2609   Epoch: 2   Global Step: 23640   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:53:45,573-Speed 5448.38 samples/sec   Loss 10.7935   LearningRate 0.2609   Epoch: 2   Global Step: 23650   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:53:53,057-Speed 5473.69 samples/sec   Loss 10.8826   LearningRate 0.2609   Epoch: 2   Global Step: 23660   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:54:00,686-Speed 5369.82 samples/sec   Loss 10.8299   LearningRate 0.2609   Epoch: 2   Global Step: 23670   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:54:08,119-Speed 5511.83 samples/sec   Loss 10.8574   LearningRate 0.2608   Epoch: 2   Global Step: 23680   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:54:15,729-Speed 5382.97 samples/sec   Loss 10.8556   LearningRate 0.2608   Epoch: 2   Global Step: 23690   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:54:23,379-Speed 5354.35 samples/sec   Loss 10.8858   LearningRate 0.2608   Epoch: 2   Global Step: 23700   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:54:30,847-Speed 5486.08 samples/sec   Loss 10.8630   LearningRate 0.2607   Epoch: 2   Global Step: 23710   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:54:38,448-Speed 5389.53 samples/sec   Loss 10.9198   LearningRate 0.2607   Epoch: 2   Global Step: 23720   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:54:46,003-Speed 5422.27 samples/sec   Loss 10.9055   LearningRate 0.2607   Epoch: 2   Global Step: 23730   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:54:53,649-Speed 5357.66 samples/sec   Loss 10.8114   LearningRate 0.2607   Epoch: 2   Global Step: 23740   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:55:01,140-Speed 5469.51 samples/sec   Loss 10.9123   LearningRate 0.2606   Epoch: 2   Global Step: 23750   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:55:08,709-Speed 5412.30 samples/sec   Loss 10.8722   LearningRate 0.2606   Epoch: 2   Global Step: 23760   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:55:16,150-Speed 5505.44 samples/sec   Loss 10.8524   LearningRate 0.2606   Epoch: 2   Global Step: 23770   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:55:23,801-Speed 5353.97 samples/sec   Loss 10.8601   LearningRate 0.2605   Epoch: 2   Global Step: 23780   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:55:31,272-Speed 5483.67 samples/sec   Loss 10.9125   LearningRate 0.2605   Epoch: 2   Global Step: 23790   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:55:38,869-Speed 5392.56 samples/sec   Loss 10.8283   LearningRate 0.2605   Epoch: 2   Global Step: 23800   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:55:46,470-Speed 5389.24 samples/sec   Loss 10.8082   LearningRate 0.2605   Epoch: 2   Global Step: 23810   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-07 23:55:53,973-Speed 5459.43 samples/sec   Loss 10.7511   LearningRate 0.2604   Epoch: 2   Global Step: 23820   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:56:01,546-Speed 5409.69 samples/sec   Loss 10.8506   LearningRate 0.2604   Epoch: 2   Global Step: 23830   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:56:09,117-Speed 5411.21 samples/sec   Loss 10.8297   LearningRate 0.2604   Epoch: 2   Global Step: 23840   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:56:16,752-Speed 5365.87 samples/sec   Loss 10.8390   LearningRate 0.2603   Epoch: 2   Global Step: 23850   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:56:24,226-Speed 5480.26 samples/sec   Loss 10.7964   LearningRate 0.2603   Epoch: 2   Global Step: 23860   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:56:31,758-Speed 5439.52 samples/sec   Loss 10.8455   LearningRate 0.2603   Epoch: 2   Global Step: 23870   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:56:39,377-Speed 5376.70 samples/sec   Loss 10.7930   LearningRate 0.2603   Epoch: 2   Global Step: 23880   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:56:46,961-Speed 5401.34 samples/sec   Loss 10.7879   LearningRate 0.2602   Epoch: 2   Global Step: 23890   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:56:54,538-Speed 5406.22 samples/sec   Loss 10.7936   LearningRate 0.2602   Epoch: 2   Global Step: 23900   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:57:02,176-Speed 5363.73 samples/sec   Loss 10.7967   LearningRate 0.2602   Epoch: 2   Global Step: 23910   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:57:09,630-Speed 5495.69 samples/sec   Loss 10.8692   LearningRate 0.2601   Epoch: 2   Global Step: 23920   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:57:17,080-Speed 5498.83 samples/sec   Loss 10.7783   LearningRate 0.2601   Epoch: 2   Global Step: 23930   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:57:24,619-Speed 5433.59 samples/sec   Loss 10.8510   LearningRate 0.2601   Epoch: 2   Global Step: 23940   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:57:32,108-Speed 5470.34 samples/sec   Loss 10.8989   LearningRate 0.2601   Epoch: 2   Global Step: 23950   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 23:57:39,580-Speed 5482.38 samples/sec   Loss 10.8338   LearningRate 0.2600   Epoch: 2   Global Step: 23960   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:57:47,360-Speed 5265.78 samples/sec   Loss 10.8868   LearningRate 0.2600   Epoch: 2   Global Step: 23970   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:57:54,801-Speed 5505.02 samples/sec   Loss 10.8580   LearningRate 0.2600   Epoch: 2   Global Step: 23980   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:58:02,308-Speed 5457.15 samples/sec   Loss 10.8328   LearningRate 0.2600   Epoch: 2   Global Step: 23990   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:58:09,872-Speed 5415.64 samples/sec   Loss 10.8889   LearningRate 0.2599   Epoch: 2   Global Step: 24000   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-07 23:58:54,009-[lfw][24000]XNorm: 22.602196
Training: 2022-01-07 23:58:54,010-[lfw][24000]Accuracy-Flip: 0.99767+-0.00291
Training: 2022-01-07 23:58:54,010-[lfw][24000]Accuracy-Highest: 0.99767
Training: 2022-01-07 23:59:46,833-[cfp_fp][24000]XNorm: 20.366085
Training: 2022-01-07 23:59:46,834-[cfp_fp][24000]Accuracy-Flip: 0.97686+-0.00758
Training: 2022-01-07 23:59:46,835-[cfp_fp][24000]Accuracy-Highest: 0.97986
Training: 2022-01-08 00:00:32,440-[agedb_30][24000]XNorm: 22.227586
Training: 2022-01-08 00:00:32,442-[agedb_30][24000]Accuracy-Flip: 0.96317+-0.00899
Training: 2022-01-08 00:00:32,442-[agedb_30][24000]Accuracy-Highest: 0.96883
Training: 2022-01-08 00:00:40,114-Speed 272.63 samples/sec   Loss 11.0215   LearningRate 0.2599   Epoch: 2   Global Step: 24010   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:00:47,580-Speed 5488.21 samples/sec   Loss 10.8469   LearningRate 0.2599   Epoch: 2   Global Step: 24020   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:00:55,178-Speed 5392.04 samples/sec   Loss 10.8725   LearningRate 0.2598   Epoch: 2   Global Step: 24030   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:01:02,867-Speed 5327.87 samples/sec   Loss 10.8117   LearningRate 0.2598   Epoch: 2   Global Step: 24040   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:01:10,354-Speed 5472.29 samples/sec   Loss 10.8697   LearningRate 0.2598   Epoch: 2   Global Step: 24050   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:01:18,099-Speed 5289.30 samples/sec   Loss 10.8008   LearningRate 0.2598   Epoch: 2   Global Step: 24060   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:01:25,570-Speed 5483.07 samples/sec   Loss 10.8267   LearningRate 0.2597   Epoch: 2   Global Step: 24070   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:01:33,214-Speed 5359.05 samples/sec   Loss 10.7373   LearningRate 0.2597   Epoch: 2   Global Step: 24080   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:01:40,728-Speed 5452.08 samples/sec   Loss 10.8084   LearningRate 0.2597   Epoch: 2   Global Step: 24090   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:01:48,401-Speed 5338.65 samples/sec   Loss 10.8917   LearningRate 0.2596   Epoch: 2   Global Step: 24100   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:01:56,016-Speed 5379.92 samples/sec   Loss 10.7756   LearningRate 0.2596   Epoch: 2   Global Step: 24110   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:02:03,624-Speed 5384.28 samples/sec   Loss 10.7695   LearningRate 0.2596   Epoch: 2   Global Step: 24120   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:02:11,174-Speed 5425.96 samples/sec   Loss 10.7851   LearningRate 0.2596   Epoch: 2   Global Step: 24130   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:02:18,655-Speed 5475.67 samples/sec   Loss 10.9079   LearningRate 0.2595   Epoch: 2   Global Step: 24140   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:02:26,239-Speed 5401.93 samples/sec   Loss 10.7184   LearningRate 0.2595   Epoch: 2   Global Step: 24150   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:02:33,631-Speed 5541.90 samples/sec   Loss 10.7375   LearningRate 0.2595   Epoch: 2   Global Step: 24160   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-08 00:02:41,146-Speed 5450.97 samples/sec   Loss 10.7992   LearningRate 0.2594   Epoch: 2   Global Step: 24170   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-08 00:02:48,571-Speed 5516.94 samples/sec   Loss 10.8045   LearningRate 0.2594   Epoch: 2   Global Step: 24180   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-08 00:02:56,131-Speed 5418.45 samples/sec   Loss 10.6832   LearningRate 0.2594   Epoch: 2   Global Step: 24190   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-08 00:03:03,626-Speed 5465.91 samples/sec   Loss 10.7711   LearningRate 0.2594   Epoch: 2   Global Step: 24200   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:03:11,239-Speed 5380.89 samples/sec   Loss 10.8733   LearningRate 0.2593   Epoch: 2   Global Step: 24210   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:03:18,706-Speed 5486.01 samples/sec   Loss 10.7693   LearningRate 0.2593   Epoch: 2   Global Step: 24220   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:03:26,252-Speed 5428.71 samples/sec   Loss 10.7619   LearningRate 0.2593   Epoch: 2   Global Step: 24230   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:03:33,817-Speed 5415.21 samples/sec   Loss 10.7570   LearningRate 0.2592   Epoch: 2   Global Step: 24240   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:03:41,358-Speed 5432.12 samples/sec   Loss 10.7545   LearningRate 0.2592   Epoch: 2   Global Step: 24250   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:03:48,957-Speed 5391.10 samples/sec   Loss 10.8631   LearningRate 0.2592   Epoch: 2   Global Step: 24260   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:03:56,526-Speed 5412.64 samples/sec   Loss 10.7618   LearningRate 0.2592   Epoch: 2   Global Step: 24270   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:04:04,083-Speed 5420.67 samples/sec   Loss 10.7847   LearningRate 0.2591   Epoch: 2   Global Step: 24280   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:04:11,587-Speed 5459.22 samples/sec   Loss 10.7763   LearningRate 0.2591   Epoch: 2   Global Step: 24290   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:04:19,135-Speed 5427.25 samples/sec   Loss 10.6942   LearningRate 0.2591   Epoch: 2   Global Step: 24300   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-08 00:04:26,682-Speed 5428.39 samples/sec   Loss 10.8350   LearningRate 0.2590   Epoch: 2   Global Step: 24310   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-08 00:04:34,221-Speed 5433.92 samples/sec   Loss 10.7808   LearningRate 0.2590   Epoch: 2   Global Step: 24320   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-08 00:04:41,831-Speed 5383.22 samples/sec   Loss 10.7812   LearningRate 0.2590   Epoch: 2   Global Step: 24330   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-08 00:04:49,555-Speed 5303.71 samples/sec   Loss 10.7787   LearningRate 0.2590   Epoch: 2   Global Step: 24340   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-08 00:04:56,966-Speed 5527.60 samples/sec   Loss 10.7380   LearningRate 0.2589   Epoch: 2   Global Step: 24350   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:05:04,425-Speed 5492.63 samples/sec   Loss 10.7653   LearningRate 0.2589   Epoch: 2   Global Step: 24360   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:05:11,906-Speed 5475.15 samples/sec   Loss 10.8423   LearningRate 0.2589   Epoch: 2   Global Step: 24370   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:05:19,366-Speed 5491.64 samples/sec   Loss 10.8156   LearningRate 0.2588   Epoch: 2   Global Step: 24380   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:05:26,961-Speed 5394.32 samples/sec   Loss 10.8187   LearningRate 0.2588   Epoch: 2   Global Step: 24390   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:05:34,712-Speed 5285.38 samples/sec   Loss 10.7216   LearningRate 0.2588   Epoch: 2   Global Step: 24400   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:05:42,151-Speed 5506.64 samples/sec   Loss 10.8223   LearningRate 0.2588   Epoch: 2   Global Step: 24410   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:05:49,663-Speed 5453.10 samples/sec   Loss 10.8936   LearningRate 0.2587   Epoch: 2   Global Step: 24420   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:05:57,256-Speed 5395.51 samples/sec   Loss 10.7867   LearningRate 0.2587   Epoch: 2   Global Step: 24430   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:06:04,809-Speed 5423.42 samples/sec   Loss 10.7342   LearningRate 0.2587   Epoch: 2   Global Step: 24440   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:06:12,278-Speed 5484.79 samples/sec   Loss 10.8070   LearningRate 0.2586   Epoch: 2   Global Step: 24450   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-08 00:06:19,870-Speed 5395.72 samples/sec   Loss 10.7298   LearningRate 0.2586   Epoch: 2   Global Step: 24460   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-08 00:06:27,383-Speed 5453.34 samples/sec   Loss 10.8079   LearningRate 0.2586   Epoch: 2   Global Step: 24470   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-08 00:06:34,901-Speed 5448.79 samples/sec   Loss 10.7596   LearningRate 0.2586   Epoch: 2   Global Step: 24480   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-08 00:06:42,328-Speed 5515.12 samples/sec   Loss 10.7342   LearningRate 0.2585   Epoch: 2   Global Step: 24490   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-08 00:06:49,868-Speed 5433.69 samples/sec   Loss 10.7985   LearningRate 0.2585   Epoch: 2   Global Step: 24500   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-08 00:06:57,333-Speed 5487.78 samples/sec   Loss 10.7809   LearningRate 0.2585   Epoch: 2   Global Step: 24510   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-08 00:07:04,810-Speed 5478.65 samples/sec   Loss 10.7132   LearningRate 0.2585   Epoch: 2   Global Step: 24520   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:07:12,395-Speed 5400.87 samples/sec   Loss 10.7175   LearningRate 0.2584   Epoch: 2   Global Step: 24530   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:07:20,072-Speed 5336.45 samples/sec   Loss 10.7267   LearningRate 0.2584   Epoch: 2   Global Step: 24540   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:07:27,585-Speed 5452.42 samples/sec   Loss 10.7782   LearningRate 0.2584   Epoch: 2   Global Step: 24550   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:07:35,047-Speed 5489.78 samples/sec   Loss 10.8256   LearningRate 0.2583   Epoch: 2   Global Step: 24560   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:07:42,541-Speed 5466.48 samples/sec   Loss 10.7784   LearningRate 0.2583   Epoch: 2   Global Step: 24570   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:07:50,021-Speed 5476.85 samples/sec   Loss 10.7798   LearningRate 0.2583   Epoch: 2   Global Step: 24580   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:07:57,510-Speed 5470.36 samples/sec   Loss 10.8037   LearningRate 0.2583   Epoch: 2   Global Step: 24590   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:08:04,953-Speed 5503.72 samples/sec   Loss 10.7041   LearningRate 0.2582   Epoch: 2   Global Step: 24600   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:08:12,396-Speed 5504.03 samples/sec   Loss 10.8055   LearningRate 0.2582   Epoch: 2   Global Step: 24610   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:08:19,852-Speed 5494.07 samples/sec   Loss 10.7371   LearningRate 0.2582   Epoch: 2   Global Step: 24620   Fp16 Grad Scale: 262144   Required: 42 hours
Training: 2022-01-08 00:08:27,543-Speed 5326.84 samples/sec   Loss 10.7543   LearningRate 0.2581   Epoch: 2   Global Step: 24630   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:08:35,153-Speed 5383.50 samples/sec   Loss 10.7778   LearningRate 0.2581   Epoch: 2   Global Step: 24640   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:08:42,567-Speed 5525.20 samples/sec   Loss 10.7110   LearningRate 0.2581   Epoch: 2   Global Step: 24650   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:08:49,976-Speed 5529.13 samples/sec   Loss 10.8396   LearningRate 0.2581   Epoch: 2   Global Step: 24660   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:08:57,546-Speed 5411.63 samples/sec   Loss 10.6818   LearningRate 0.2580   Epoch: 2   Global Step: 24670   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:09:05,060-Speed 5451.73 samples/sec   Loss 10.7934   LearningRate 0.2580   Epoch: 2   Global Step: 24680   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:09:12,574-Speed 5451.46 samples/sec   Loss 10.7502   LearningRate 0.2580   Epoch: 2   Global Step: 24690   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:09:20,055-Speed 5476.49 samples/sec   Loss 10.7813   LearningRate 0.2579   Epoch: 2   Global Step: 24700   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:09:27,619-Speed 5415.92 samples/sec   Loss 10.7045   LearningRate 0.2579   Epoch: 2   Global Step: 24710   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:09:35,173-Speed 5422.79 samples/sec   Loss 10.6669   LearningRate 0.2579   Epoch: 2   Global Step: 24720   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:09:42,651-Speed 5477.69 samples/sec   Loss 10.7079   LearningRate 0.2579   Epoch: 2   Global Step: 24730   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:09:50,180-Speed 5441.84 samples/sec   Loss 10.7638   LearningRate 0.2578   Epoch: 2   Global Step: 24740   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:09:57,756-Speed 5406.96 samples/sec   Loss 10.7256   LearningRate 0.2578   Epoch: 2   Global Step: 24750   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:10:05,198-Speed 5505.28 samples/sec   Loss 10.7087   LearningRate 0.2578   Epoch: 2   Global Step: 24760   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:10:12,716-Speed 5448.61 samples/sec   Loss 10.7508   LearningRate 0.2577   Epoch: 2   Global Step: 24770   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:10:20,192-Speed 5480.16 samples/sec   Loss 10.6798   LearningRate 0.2577   Epoch: 2   Global Step: 24780   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:10:28,007-Speed 5241.80 samples/sec   Loss 10.7618   LearningRate 0.2577   Epoch: 2   Global Step: 24790   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:10:35,697-Speed 5326.96 samples/sec   Loss 10.7420   LearningRate 0.2577   Epoch: 2   Global Step: 24800   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:10:43,202-Speed 5458.72 samples/sec   Loss 10.8296   LearningRate 0.2576   Epoch: 2   Global Step: 24810   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:10:50,682-Speed 5476.63 samples/sec   Loss 10.7879   LearningRate 0.2576   Epoch: 2   Global Step: 24820   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:10:58,261-Speed 5404.92 samples/sec   Loss 10.8035   LearningRate 0.2576   Epoch: 2   Global Step: 24830   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:11:05,787-Speed 5443.55 samples/sec   Loss 10.7567   LearningRate 0.2575   Epoch: 2   Global Step: 24840   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:11:13,296-Speed 5454.96 samples/sec   Loss 10.7395   LearningRate 0.2575   Epoch: 2   Global Step: 24850   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:11:20,921-Speed 5372.87 samples/sec   Loss 10.7476   LearningRate 0.2575   Epoch: 2   Global Step: 24860   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:11:28,420-Speed 5462.68 samples/sec   Loss 10.7198   LearningRate 0.2575   Epoch: 2   Global Step: 24870   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:11:35,995-Speed 5407.95 samples/sec   Loss 10.7279   LearningRate 0.2574   Epoch: 2   Global Step: 24880   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:11:43,478-Speed 5474.16 samples/sec   Loss 10.7504   LearningRate 0.2574   Epoch: 2   Global Step: 24890   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:11:51,087-Speed 5384.38 samples/sec   Loss 10.7668   LearningRate 0.2574   Epoch: 2   Global Step: 24900   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:11:58,588-Speed 5460.64 samples/sec   Loss 10.6671   LearningRate 0.2573   Epoch: 2   Global Step: 24910   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:12:06,147-Speed 5419.57 samples/sec   Loss 10.7265   LearningRate 0.2573   Epoch: 2   Global Step: 24920   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:12:13,594-Speed 5501.45 samples/sec   Loss 10.7083   LearningRate 0.2573   Epoch: 2   Global Step: 24930   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:12:21,106-Speed 5452.88 samples/sec   Loss 10.5801   LearningRate 0.2573   Epoch: 2   Global Step: 24940   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:12:28,615-Speed 5455.54 samples/sec   Loss 10.7061   LearningRate 0.2572   Epoch: 2   Global Step: 24950   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:12:36,199-Speed 5401.61 samples/sec   Loss 10.6993   LearningRate 0.2572   Epoch: 2   Global Step: 24960   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:12:43,681-Speed 5475.39 samples/sec   Loss 10.7028   LearningRate 0.2572   Epoch: 2   Global Step: 24970   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:12:51,162-Speed 5475.65 samples/sec   Loss 10.7615   LearningRate 0.2572   Epoch: 2   Global Step: 24980   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:12:58,607-Speed 5502.82 samples/sec   Loss 10.6828   LearningRate 0.2571   Epoch: 2   Global Step: 24990   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:13:06,224-Speed 5377.54 samples/sec   Loss 10.6865   LearningRate 0.2571   Epoch: 2   Global Step: 25000   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:13:13,712-Speed 5471.48 samples/sec   Loss 10.7378   LearningRate 0.2571   Epoch: 2   Global Step: 25010   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:13:21,243-Speed 5439.91 samples/sec   Loss 10.7041   LearningRate 0.2570   Epoch: 2   Global Step: 25020   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:13:28,717-Speed 5480.52 samples/sec   Loss 10.7854   LearningRate 0.2570   Epoch: 2   Global Step: 25030   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:13:36,257-Speed 5433.19 samples/sec   Loss 10.7143   LearningRate 0.2570   Epoch: 2   Global Step: 25040   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:13:43,716-Speed 5491.92 samples/sec   Loss 10.6866   LearningRate 0.2570   Epoch: 2   Global Step: 25050   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:13:51,265-Speed 5427.29 samples/sec   Loss 10.7107   LearningRate 0.2569   Epoch: 2   Global Step: 25060   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:13:58,695-Speed 5512.78 samples/sec   Loss 10.7264   LearningRate 0.2569   Epoch: 2   Global Step: 25070   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:14:06,145-Speed 5499.01 samples/sec   Loss 10.6787   LearningRate 0.2569   Epoch: 2   Global Step: 25080   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:14:13,669-Speed 5444.65 samples/sec   Loss 10.7204   LearningRate 0.2568   Epoch: 2   Global Step: 25090   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:14:21,193-Speed 5444.88 samples/sec   Loss 10.7606   LearningRate 0.2568   Epoch: 2   Global Step: 25100   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:14:28,743-Speed 5425.72 samples/sec   Loss 10.7079   LearningRate 0.2568   Epoch: 2   Global Step: 25110   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:14:36,155-Speed 5526.80 samples/sec   Loss 10.6998   LearningRate 0.2568   Epoch: 2   Global Step: 25120   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:14:43,626-Speed 5483.39 samples/sec   Loss 10.6160   LearningRate 0.2567   Epoch: 2   Global Step: 25130   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:14:51,226-Speed 5390.82 samples/sec   Loss 10.6978   LearningRate 0.2567   Epoch: 2   Global Step: 25140   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:14:58,726-Speed 5462.05 samples/sec   Loss 10.6868   LearningRate 0.2567   Epoch: 2   Global Step: 25150   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:15:06,263-Speed 5434.93 samples/sec   Loss 10.7819   LearningRate 0.2566   Epoch: 2   Global Step: 25160   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:15:13,758-Speed 5466.34 samples/sec   Loss 10.6963   LearningRate 0.2566   Epoch: 2   Global Step: 25170   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:15:21,246-Speed 5470.74 samples/sec   Loss 10.7403   LearningRate 0.2566   Epoch: 2   Global Step: 25180   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-08 00:15:28,718-Speed 5482.01 samples/sec   Loss 10.7132   LearningRate 0.2566   Epoch: 2   Global Step: 25190   Fp16 Grad Scale: 131072   Required: 42 hours
Training: 2022-01-08 00:15:36,222-Speed 5459.27 samples/sec   Loss 10.6238   LearningRate 0.2565   Epoch: 2   Global Step: 25200   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:15:43,814-Speed 5396.36 samples/sec   Loss 10.7492   LearningRate 0.2565   Epoch: 2   Global Step: 25210   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:15:51,211-Speed 5537.88 samples/sec   Loss 10.7011   LearningRate 0.2565   Epoch: 2   Global Step: 25220   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:15:58,773-Speed 5417.42 samples/sec   Loss 10.6968   LearningRate 0.2564   Epoch: 2   Global Step: 25230   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:16:06,277-Speed 5458.91 samples/sec   Loss 10.6760   LearningRate 0.2564   Epoch: 2   Global Step: 25240   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:16:13,850-Speed 5410.06 samples/sec   Loss 10.6837   LearningRate 0.2564   Epoch: 2   Global Step: 25250   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:16:21,420-Speed 5411.71 samples/sec   Loss 10.7154   LearningRate 0.2564   Epoch: 2   Global Step: 25260   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:16:28,986-Speed 5414.23 samples/sec   Loss 10.6453   LearningRate 0.2563   Epoch: 2   Global Step: 25270   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:16:36,691-Speed 5316.87 samples/sec   Loss 10.6472   LearningRate 0.2563   Epoch: 2   Global Step: 25280   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:16:44,269-Speed 5405.67 samples/sec   Loss 10.6872   LearningRate 0.2563   Epoch: 2   Global Step: 25290   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:16:51,708-Speed 5507.05 samples/sec   Loss 10.7322   LearningRate 0.2563   Epoch: 2   Global Step: 25300   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:16:59,125-Speed 5522.88 samples/sec   Loss 10.6430   LearningRate 0.2562   Epoch: 2   Global Step: 25310   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:17:06,615-Speed 5469.69 samples/sec   Loss 10.6846   LearningRate 0.2562   Epoch: 2   Global Step: 25320   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:17:14,183-Speed 5413.24 samples/sec   Loss 10.7901   LearningRate 0.2562   Epoch: 2   Global Step: 25330   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:17:21,739-Speed 5421.70 samples/sec   Loss 10.7179   LearningRate 0.2561   Epoch: 2   Global Step: 25340   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:17:29,203-Speed 5488.04 samples/sec   Loss 10.6702   LearningRate 0.2561   Epoch: 2   Global Step: 25350   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:17:36,702-Speed 5462.86 samples/sec   Loss 10.6793   LearningRate 0.2561   Epoch: 2   Global Step: 25360   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:17:44,243-Speed 5432.49 samples/sec   Loss 10.6377   LearningRate 0.2561   Epoch: 2   Global Step: 25370   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:17:51,849-Speed 5385.98 samples/sec   Loss 10.7458   LearningRate 0.2560   Epoch: 2   Global Step: 25380   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:17:59,319-Speed 5484.14 samples/sec   Loss 10.6427   LearningRate 0.2560   Epoch: 2   Global Step: 25390   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:18:06,943-Speed 5372.84 samples/sec   Loss 10.7258   LearningRate 0.2560   Epoch: 2   Global Step: 25400   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:18:14,438-Speed 5466.60 samples/sec   Loss 10.6991   LearningRate 0.2559   Epoch: 2   Global Step: 25410   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:18:22,031-Speed 5394.77 samples/sec   Loss 10.6707   LearningRate 0.2559   Epoch: 2   Global Step: 25420   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:18:29,572-Speed 5432.42 samples/sec   Loss 10.6800   LearningRate 0.2559   Epoch: 2   Global Step: 25430   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:18:37,243-Speed 5340.07 samples/sec   Loss 10.7246   LearningRate 0.2559   Epoch: 2   Global Step: 25440   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:18:44,832-Speed 5398.72 samples/sec   Loss 10.7045   LearningRate 0.2558   Epoch: 2   Global Step: 25450   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:18:52,467-Speed 5365.31 samples/sec   Loss 10.6916   LearningRate 0.2558   Epoch: 2   Global Step: 25460   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:19:00,133-Speed 5343.41 samples/sec   Loss 10.6828   LearningRate 0.2558   Epoch: 2   Global Step: 25470   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:19:07,612-Speed 5477.73 samples/sec   Loss 10.6938   LearningRate 0.2557   Epoch: 2   Global Step: 25480   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:19:15,202-Speed 5397.00 samples/sec   Loss 10.7097   LearningRate 0.2557   Epoch: 2   Global Step: 25490   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:19:22,762-Speed 5418.69 samples/sec   Loss 10.6338   LearningRate 0.2557   Epoch: 2   Global Step: 25500   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:19:30,166-Speed 5532.93 samples/sec   Loss 10.6595   LearningRate 0.2557   Epoch: 2   Global Step: 25510   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:19:37,883-Speed 5308.20 samples/sec   Loss 10.6618   LearningRate 0.2556   Epoch: 2   Global Step: 25520   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:19:45,325-Speed 5505.24 samples/sec   Loss 10.6594   LearningRate 0.2556   Epoch: 2   Global Step: 25530   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:19:52,958-Speed 5366.85 samples/sec   Loss 10.6499   LearningRate 0.2556   Epoch: 2   Global Step: 25540   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:20:00,582-Speed 5373.51 samples/sec   Loss 10.7151   LearningRate 0.2555   Epoch: 2   Global Step: 25550   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:20:08,187-Speed 5386.03 samples/sec   Loss 10.6072   LearningRate 0.2555   Epoch: 2   Global Step: 25560   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:20:15,698-Speed 5454.51 samples/sec   Loss 10.6024   LearningRate 0.2555   Epoch: 2   Global Step: 25570   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:20:23,107-Speed 5528.62 samples/sec   Loss 10.6243   LearningRate 0.2555   Epoch: 2   Global Step: 25580   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:20:30,702-Speed 5393.96 samples/sec   Loss 10.6280   LearningRate 0.2554   Epoch: 2   Global Step: 25590   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:20:38,195-Speed 5467.20 samples/sec   Loss 10.7198   LearningRate 0.2554   Epoch: 2   Global Step: 25600   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:20:45,687-Speed 5467.61 samples/sec   Loss 10.6364   LearningRate 0.2554   Epoch: 2   Global Step: 25610   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:20:53,129-Speed 5505.14 samples/sec   Loss 10.6645   LearningRate 0.2554   Epoch: 2   Global Step: 25620   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:21:00,741-Speed 5381.67 samples/sec   Loss 10.6576   LearningRate 0.2553   Epoch: 2   Global Step: 25630   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:21:08,240-Speed 5462.63 samples/sec   Loss 10.6173   LearningRate 0.2553   Epoch: 2   Global Step: 25640   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:21:15,743-Speed 5459.33 samples/sec   Loss 10.7016   LearningRate 0.2553   Epoch: 2   Global Step: 25650   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:21:23,346-Speed 5388.60 samples/sec   Loss 10.7357   LearningRate 0.2552   Epoch: 2   Global Step: 25660   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:21:30,891-Speed 5428.90 samples/sec   Loss 10.6539   LearningRate 0.2552   Epoch: 2   Global Step: 25670   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:21:38,460-Speed 5412.78 samples/sec   Loss 10.5640   LearningRate 0.2552   Epoch: 2   Global Step: 25680   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:21:45,984-Speed 5444.57 samples/sec   Loss 10.6049   LearningRate 0.2552   Epoch: 2   Global Step: 25690   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:21:53,625-Speed 5360.86 samples/sec   Loss 10.6537   LearningRate 0.2551   Epoch: 2   Global Step: 25700   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:22:01,145-Speed 5447.40 samples/sec   Loss 10.7018   LearningRate 0.2551   Epoch: 2   Global Step: 25710   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:22:08,566-Speed 5520.48 samples/sec   Loss 10.6724   LearningRate 0.2551   Epoch: 2   Global Step: 25720   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:22:16,086-Speed 5447.66 samples/sec   Loss 10.6482   LearningRate 0.2550   Epoch: 2   Global Step: 25730   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:22:23,624-Speed 5434.85 samples/sec   Loss 10.7936   LearningRate 0.2550   Epoch: 2   Global Step: 25740   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:22:31,090-Speed 5486.98 samples/sec   Loss 10.5410   LearningRate 0.2550   Epoch: 2   Global Step: 25750   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:22:38,697-Speed 5384.79 samples/sec   Loss 10.6784   LearningRate 0.2550   Epoch: 2   Global Step: 25760   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:22:46,239-Speed 5431.65 samples/sec   Loss 10.6224   LearningRate 0.2549   Epoch: 2   Global Step: 25770   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:22:53,903-Speed 5345.27 samples/sec   Loss 10.6640   LearningRate 0.2549   Epoch: 2   Global Step: 25780   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:23:01,427-Speed 5444.71 samples/sec   Loss 10.6100   LearningRate 0.2549   Epoch: 2   Global Step: 25790   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:23:08,922-Speed 5465.95 samples/sec   Loss 10.7088   LearningRate 0.2548   Epoch: 2   Global Step: 25800   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:23:16,377-Speed 5494.83 samples/sec   Loss 10.6523   LearningRate 0.2548   Epoch: 2   Global Step: 25810   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:23:23,844-Speed 5486.60 samples/sec   Loss 10.6108   LearningRate 0.2548   Epoch: 2   Global Step: 25820   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:23:31,299-Speed 5495.13 samples/sec   Loss 10.6528   LearningRate 0.2548   Epoch: 2   Global Step: 25830   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-01-08 00:23:38,754-Speed 5494.83 samples/sec   Loss 10.5999   LearningRate 0.2547   Epoch: 2   Global Step: 25840   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-01-08 00:23:46,189-Speed 5509.80 samples/sec   Loss 10.5700   LearningRate 0.2547   Epoch: 2   Global Step: 25850   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-01-08 00:23:53,649-Speed 5491.45 samples/sec   Loss 10.6332   LearningRate 0.2547   Epoch: 2   Global Step: 25860   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-01-08 00:24:01,135-Speed 5472.51 samples/sec   Loss 10.5829   LearningRate 0.2546   Epoch: 2   Global Step: 25870   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-01-08 00:24:08,805-Speed 5340.98 samples/sec   Loss 10.5342   LearningRate 0.2546   Epoch: 2   Global Step: 25880   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-01-08 00:24:16,315-Speed 5454.66 samples/sec   Loss 10.6541   LearningRate 0.2546   Epoch: 2   Global Step: 25890   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-01-08 00:24:23,835-Speed 5447.14 samples/sec   Loss 10.7082   LearningRate 0.2546   Epoch: 2   Global Step: 25900   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-01-08 00:24:31,363-Speed 5442.02 samples/sec   Loss 10.6719   LearningRate 0.2545   Epoch: 2   Global Step: 25910   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-01-08 00:24:38,788-Speed 5517.14 samples/sec   Loss 10.5661   LearningRate 0.2545   Epoch: 2   Global Step: 25920   Fp16 Grad Scale: 32768   Required: 41 hours
Training: 2022-01-08 00:24:46,300-Speed 5453.49 samples/sec   Loss 10.7053   LearningRate 0.2545   Epoch: 2   Global Step: 25930   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:24:53,822-Speed 5446.43 samples/sec   Loss 10.6990   LearningRate 0.2545   Epoch: 2   Global Step: 25940   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:25:01,352-Speed 5440.90 samples/sec   Loss 10.6254   LearningRate 0.2544   Epoch: 2   Global Step: 25950   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:25:08,812-Speed 5490.81 samples/sec   Loss 10.6073   LearningRate 0.2544   Epoch: 2   Global Step: 25960   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:25:16,278-Speed 5487.48 samples/sec   Loss 10.6186   LearningRate 0.2544   Epoch: 2   Global Step: 25970   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:25:23,728-Speed 5498.15 samples/sec   Loss 10.6067   LearningRate 0.2543   Epoch: 2   Global Step: 25980   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:25:31,256-Speed 5442.35 samples/sec   Loss 10.5888   LearningRate 0.2543   Epoch: 2   Global Step: 25990   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:25:38,658-Speed 5533.80 samples/sec   Loss 10.6389   LearningRate 0.2543   Epoch: 2   Global Step: 26000   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:26:22,415-[lfw][26000]XNorm: 22.938673
Training: 2022-01-08 00:26:22,416-[lfw][26000]Accuracy-Flip: 0.99717+-0.00279
Training: 2022-01-08 00:26:22,416-[lfw][26000]Accuracy-Highest: 0.99767
Training: 2022-01-08 00:27:14,920-[cfp_fp][26000]XNorm: 20.785323
Training: 2022-01-08 00:27:14,921-[cfp_fp][26000]Accuracy-Flip: 0.98271+-0.00449
Training: 2022-01-08 00:27:14,922-[cfp_fp][26000]Accuracy-Highest: 0.98271
Training: 2022-01-08 00:28:00,104-[agedb_30][26000]XNorm: 22.997359
Training: 2022-01-08 00:28:00,105-[agedb_30][26000]Accuracy-Flip: 0.97167+-0.00707
Training: 2022-01-08 00:28:00,106-[agedb_30][26000]Accuracy-Highest: 0.97167
Training: 2022-01-08 00:28:07,601-Speed 275.01 samples/sec   Loss 10.6200   LearningRate 0.2543   Epoch: 2   Global Step: 26010   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:28:15,134-Speed 5440.20 samples/sec   Loss 10.6133   LearningRate 0.2542   Epoch: 2   Global Step: 26020   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:28:22,599-Speed 5488.36 samples/sec   Loss 10.4990   LearningRate 0.2542   Epoch: 2   Global Step: 26030   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:28:30,057-Speed 5493.27 samples/sec   Loss 10.6111   LearningRate 0.2542   Epoch: 2   Global Step: 26040   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:28:37,539-Speed 5476.13 samples/sec   Loss 10.5320   LearningRate 0.2541   Epoch: 2   Global Step: 26050   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:28:44,993-Speed 5495.81 samples/sec   Loss 10.6225   LearningRate 0.2541   Epoch: 2   Global Step: 26060   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:28:52,585-Speed 5396.42 samples/sec   Loss 10.6535   LearningRate 0.2541   Epoch: 2   Global Step: 26070   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:29:00,023-Speed 5507.56 samples/sec   Loss 10.5186   LearningRate 0.2541   Epoch: 2   Global Step: 26080   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:29:07,623-Speed 5390.26 samples/sec   Loss 10.6725   LearningRate 0.2540   Epoch: 2   Global Step: 26090   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:29:15,056-Speed 5510.99 samples/sec   Loss 10.6794   LearningRate 0.2540   Epoch: 2   Global Step: 26100   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:29:22,518-Speed 5490.01 samples/sec   Loss 10.6845   LearningRate 0.2540   Epoch: 2   Global Step: 26110   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:29:29,923-Speed 5532.21 samples/sec   Loss 10.6610   LearningRate 0.2539   Epoch: 2   Global Step: 26120   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:29:37,390-Speed 5486.23 samples/sec   Loss 10.5646   LearningRate 0.2539   Epoch: 2   Global Step: 26130   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:29:44,858-Speed 5486.03 samples/sec   Loss 10.5695   LearningRate 0.2539   Epoch: 2   Global Step: 26140   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:29:52,269-Speed 5527.87 samples/sec   Loss 10.6339   LearningRate 0.2539   Epoch: 2   Global Step: 26150   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:29:59,766-Speed 5463.82 samples/sec   Loss 10.5690   LearningRate 0.2538   Epoch: 2   Global Step: 26160   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:30:07,231-Speed 5488.14 samples/sec   Loss 10.6050   LearningRate 0.2538   Epoch: 2   Global Step: 26170   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:30:14,735-Speed 5459.23 samples/sec   Loss 10.6500   LearningRate 0.2538   Epoch: 2   Global Step: 26180   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:30:22,152-Speed 5523.22 samples/sec   Loss 10.5737   LearningRate 0.2538   Epoch: 2   Global Step: 26190   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:30:29,577-Speed 5517.10 samples/sec   Loss 10.6521   LearningRate 0.2537   Epoch: 2   Global Step: 26200   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:30:36,965-Speed 5545.01 samples/sec   Loss 10.5724   LearningRate 0.2537   Epoch: 2   Global Step: 26210   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:30:44,402-Speed 5508.05 samples/sec   Loss 10.5461   LearningRate 0.2537   Epoch: 2   Global Step: 26220   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:30:51,833-Speed 5513.63 samples/sec   Loss 10.5715   LearningRate 0.2536   Epoch: 2   Global Step: 26230   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:30:59,255-Speed 5518.85 samples/sec   Loss 10.6742   LearningRate 0.2536   Epoch: 2   Global Step: 26240   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:31:06,767-Speed 5453.71 samples/sec   Loss 10.5695   LearningRate 0.2536   Epoch: 2   Global Step: 26250   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:31:14,239-Speed 5482.43 samples/sec   Loss 10.5942   LearningRate 0.2536   Epoch: 2   Global Step: 26260   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:31:21,780-Speed 5432.95 samples/sec   Loss 10.6411   LearningRate 0.2535   Epoch: 2   Global Step: 26270   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:31:29,260-Speed 5475.99 samples/sec   Loss 10.5993   LearningRate 0.2535   Epoch: 2   Global Step: 26280   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:31:36,823-Speed 5416.86 samples/sec   Loss 10.6140   LearningRate 0.2535   Epoch: 2   Global Step: 26290   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:31:44,394-Speed 5410.65 samples/sec   Loss 10.5374   LearningRate 0.2534   Epoch: 2   Global Step: 26300   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:31:51,840-Speed 5501.77 samples/sec   Loss 10.6753   LearningRate 0.2534   Epoch: 2   Global Step: 26310   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:31:59,266-Speed 5516.69 samples/sec   Loss 10.5809   LearningRate 0.2534   Epoch: 2   Global Step: 26320   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:32:06,800-Speed 5437.36 samples/sec   Loss 10.5773   LearningRate 0.2534   Epoch: 2   Global Step: 26330   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:32:14,418-Speed 5377.49 samples/sec   Loss 10.5177   LearningRate 0.2533   Epoch: 2   Global Step: 26340   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:32:21,817-Speed 5536.87 samples/sec   Loss 10.6495   LearningRate 0.2533   Epoch: 2   Global Step: 26350   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:32:29,278-Speed 5490.22 samples/sec   Loss 10.6163   LearningRate 0.2533   Epoch: 2   Global Step: 26360   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:32:36,712-Speed 5510.97 samples/sec   Loss 10.6889   LearningRate 0.2532   Epoch: 2   Global Step: 26370   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:32:44,216-Speed 5458.94 samples/sec   Loss 10.5277   LearningRate 0.2532   Epoch: 2   Global Step: 26380   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:32:51,598-Speed 5549.36 samples/sec   Loss 10.5246   LearningRate 0.2532   Epoch: 2   Global Step: 26390   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:32:59,162-Speed 5416.32 samples/sec   Loss 10.5339   LearningRate 0.2532   Epoch: 2   Global Step: 26400   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:33:06,756-Speed 5393.84 samples/sec   Loss 10.6275   LearningRate 0.2531   Epoch: 2   Global Step: 26410   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:33:14,196-Speed 5506.40 samples/sec   Loss 10.7076   LearningRate 0.2531   Epoch: 2   Global Step: 26420   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:33:21,694-Speed 5464.31 samples/sec   Loss 10.5289   LearningRate 0.2531   Epoch: 2   Global Step: 26430   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:33:29,175-Speed 5475.64 samples/sec   Loss 10.5547   LearningRate 0.2531   Epoch: 2   Global Step: 26440   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:33:36,679-Speed 5459.10 samples/sec   Loss 10.5893   LearningRate 0.2530   Epoch: 2   Global Step: 26450   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:33:44,093-Speed 5525.28 samples/sec   Loss 10.5474   LearningRate 0.2530   Epoch: 2   Global Step: 26460   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:33:51,536-Speed 5504.29 samples/sec   Loss 10.6719   LearningRate 0.2530   Epoch: 2   Global Step: 26470   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:33:59,041-Speed 5458.35 samples/sec   Loss 10.5745   LearningRate 0.2529   Epoch: 2   Global Step: 26480   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:34:06,474-Speed 5511.36 samples/sec   Loss 10.4960   LearningRate 0.2529   Epoch: 2   Global Step: 26490   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:34:13,939-Speed 5487.34 samples/sec   Loss 10.5843   LearningRate 0.2529   Epoch: 2   Global Step: 26500   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:34:21,361-Speed 5519.89 samples/sec   Loss 10.5340   LearningRate 0.2529   Epoch: 2   Global Step: 26510   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:34:28,787-Speed 5516.46 samples/sec   Loss 10.5969   LearningRate 0.2528   Epoch: 2   Global Step: 26520   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:34:36,181-Speed 5540.69 samples/sec   Loss 10.5198   LearningRate 0.2528   Epoch: 2   Global Step: 26530   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:34:44,005-Speed 5236.05 samples/sec   Loss 10.5814   LearningRate 0.2528   Epoch: 2   Global Step: 26540   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:34:51,395-Speed 5543.58 samples/sec   Loss 10.6045   LearningRate 0.2527   Epoch: 2   Global Step: 26550   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:34:58,844-Speed 5499.17 samples/sec   Loss 10.6798   LearningRate 0.2527   Epoch: 2   Global Step: 26560   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:35:06,339-Speed 5465.92 samples/sec   Loss 10.5708   LearningRate 0.2527   Epoch: 2   Global Step: 26570   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:35:13,902-Speed 5416.54 samples/sec   Loss 10.6048   LearningRate 0.2527   Epoch: 2   Global Step: 26580   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:35:21,404-Speed 5460.26 samples/sec   Loss 10.5526   LearningRate 0.2526   Epoch: 2   Global Step: 26590   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:35:28,880-Speed 5479.75 samples/sec   Loss 10.5951   LearningRate 0.2526   Epoch: 2   Global Step: 26600   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:35:36,413-Speed 5438.42 samples/sec   Loss 10.5071   LearningRate 0.2526   Epoch: 2   Global Step: 26610   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:35:43,882-Speed 5484.52 samples/sec   Loss 10.6041   LearningRate 0.2525   Epoch: 2   Global Step: 26620   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:35:51,314-Speed 5511.94 samples/sec   Loss 10.6027   LearningRate 0.2525   Epoch: 2   Global Step: 26630   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:35:58,792-Speed 5478.57 samples/sec   Loss 10.5820   LearningRate 0.2525   Epoch: 2   Global Step: 26640   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:36:06,277-Speed 5473.13 samples/sec   Loss 10.5129   LearningRate 0.2525   Epoch: 2   Global Step: 26650   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:36:13,774-Speed 5464.38 samples/sec   Loss 10.5248   LearningRate 0.2524   Epoch: 2   Global Step: 26660   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:36:21,315-Speed 5432.47 samples/sec   Loss 10.5508   LearningRate 0.2524   Epoch: 2   Global Step: 26670   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:36:28,955-Speed 5361.87 samples/sec   Loss 10.5053   LearningRate 0.2524   Epoch: 2   Global Step: 26680   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:36:36,427-Speed 5483.22 samples/sec   Loss 10.5487   LearningRate 0.2524   Epoch: 2   Global Step: 26690   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:36:43,854-Speed 5515.71 samples/sec   Loss 10.5268   LearningRate 0.2523   Epoch: 2   Global Step: 26700   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:36:51,282-Speed 5514.94 samples/sec   Loss 10.5386   LearningRate 0.2523   Epoch: 2   Global Step: 26710   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:36:58,867-Speed 5400.58 samples/sec   Loss 10.5799   LearningRate 0.2523   Epoch: 2   Global Step: 26720   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:37:06,499-Speed 5367.94 samples/sec   Loss 10.5412   LearningRate 0.2522   Epoch: 2   Global Step: 26730   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:37:13,969-Speed 5484.07 samples/sec   Loss 10.5259   LearningRate 0.2522   Epoch: 2   Global Step: 26740   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:37:21,458-Speed 5470.23 samples/sec   Loss 10.4796   LearningRate 0.2522   Epoch: 2   Global Step: 26750   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:37:28,942-Speed 5473.58 samples/sec   Loss 10.5694   LearningRate 0.2522   Epoch: 2   Global Step: 26760   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:37:36,394-Speed 5497.42 samples/sec   Loss 10.5124   LearningRate 0.2521   Epoch: 2   Global Step: 26770   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:37:43,842-Speed 5499.80 samples/sec   Loss 10.4956   LearningRate 0.2521   Epoch: 2   Global Step: 26780   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:37:51,278-Speed 5509.41 samples/sec   Loss 10.5226   LearningRate 0.2521   Epoch: 2   Global Step: 26790   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:37:58,684-Speed 5531.79 samples/sec   Loss 10.5860   LearningRate 0.2520   Epoch: 2   Global Step: 26800   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:38:06,255-Speed 5410.77 samples/sec   Loss 10.4967   LearningRate 0.2520   Epoch: 2   Global Step: 26810   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:38:13,674-Speed 5521.94 samples/sec   Loss 10.5633   LearningRate 0.2520   Epoch: 2   Global Step: 26820   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:38:21,118-Speed 5503.20 samples/sec   Loss 10.5948   LearningRate 0.2520   Epoch: 2   Global Step: 26830   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:38:28,487-Speed 5558.27 samples/sec   Loss 10.5507   LearningRate 0.2519   Epoch: 2   Global Step: 26840   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:38:36,011-Speed 5445.19 samples/sec   Loss 10.5054   LearningRate 0.2519   Epoch: 2   Global Step: 26850   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:38:43,462-Speed 5498.05 samples/sec   Loss 10.5903   LearningRate 0.2519   Epoch: 2   Global Step: 26860   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:38:50,911-Speed 5498.94 samples/sec   Loss 10.5514   LearningRate 0.2519   Epoch: 2   Global Step: 26870   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:38:58,363-Speed 5497.43 samples/sec   Loss 10.5842   LearningRate 0.2518   Epoch: 2   Global Step: 26880   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:39:05,922-Speed 5420.02 samples/sec   Loss 10.4907   LearningRate 0.2518   Epoch: 2   Global Step: 26890   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:39:13,433-Speed 5453.93 samples/sec   Loss 10.5713   LearningRate 0.2518   Epoch: 2   Global Step: 26900   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:39:21,004-Speed 5410.22 samples/sec   Loss 10.5670   LearningRate 0.2517   Epoch: 2   Global Step: 26910   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:39:28,593-Speed 5398.33 samples/sec   Loss 10.4797   LearningRate 0.2517   Epoch: 2   Global Step: 26920   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:39:36,196-Speed 5388.34 samples/sec   Loss 10.5068   LearningRate 0.2517   Epoch: 2   Global Step: 26930   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:39:43,702-Speed 5458.01 samples/sec   Loss 10.4315   LearningRate 0.2517   Epoch: 2   Global Step: 26940   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:39:51,200-Speed 5462.99 samples/sec   Loss 10.5293   LearningRate 0.2516   Epoch: 2   Global Step: 26950   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:39:58,791-Speed 5397.02 samples/sec   Loss 10.5528   LearningRate 0.2516   Epoch: 2   Global Step: 26960   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:40:06,319-Speed 5441.39 samples/sec   Loss 10.5348   LearningRate 0.2516   Epoch: 2   Global Step: 26970   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:40:13,765-Speed 5502.36 samples/sec   Loss 10.5467   LearningRate 0.2515   Epoch: 2   Global Step: 26980   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:40:21,286-Speed 5446.23 samples/sec   Loss 10.5505   LearningRate 0.2515   Epoch: 2   Global Step: 26990   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:40:28,783-Speed 5464.60 samples/sec   Loss 10.5834   LearningRate 0.2515   Epoch: 2   Global Step: 27000   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:40:36,311-Speed 5441.85 samples/sec   Loss 10.4927   LearningRate 0.2515   Epoch: 2   Global Step: 27010   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:40:43,772-Speed 5490.64 samples/sec   Loss 10.4729   LearningRate 0.2514   Epoch: 2   Global Step: 27020   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:40:51,362-Speed 5396.68 samples/sec   Loss 10.6080   LearningRate 0.2514   Epoch: 2   Global Step: 27030   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:40:58,863-Speed 5462.24 samples/sec   Loss 10.5608   LearningRate 0.2514   Epoch: 2   Global Step: 27040   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:41:06,330-Speed 5486.36 samples/sec   Loss 10.5548   LearningRate 0.2513   Epoch: 2   Global Step: 27050   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:41:13,835-Speed 5458.05 samples/sec   Loss 10.5945   LearningRate 0.2513   Epoch: 2   Global Step: 27060   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:41:21,362-Speed 5441.84 samples/sec   Loss 10.5720   LearningRate 0.2513   Epoch: 2   Global Step: 27070   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:41:28,840-Speed 5478.48 samples/sec   Loss 10.4637   LearningRate 0.2513   Epoch: 2   Global Step: 27080   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:41:36,261-Speed 5520.52 samples/sec   Loss 10.4734   LearningRate 0.2512   Epoch: 2   Global Step: 27090   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:41:43,708-Speed 5500.98 samples/sec   Loss 10.4882   LearningRate 0.2512   Epoch: 2   Global Step: 27100   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:41:51,136-Speed 5514.68 samples/sec   Loss 10.4582   LearningRate 0.2512   Epoch: 2   Global Step: 27110   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:41:58,669-Speed 5438.44 samples/sec   Loss 10.4255   LearningRate 0.2512   Epoch: 2   Global Step: 27120   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:42:06,150-Speed 5475.66 samples/sec   Loss 10.4959   LearningRate 0.2511   Epoch: 2   Global Step: 27130   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:42:13,657-Speed 5457.55 samples/sec   Loss 10.4280   LearningRate 0.2511   Epoch: 2   Global Step: 27140   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:42:21,245-Speed 5398.25 samples/sec   Loss 10.4869   LearningRate 0.2511   Epoch: 2   Global Step: 27150   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:42:28,757-Speed 5453.44 samples/sec   Loss 10.5265   LearningRate 0.2510   Epoch: 2   Global Step: 27160   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:42:36,327-Speed 5412.20 samples/sec   Loss 10.5606   LearningRate 0.2510   Epoch: 2   Global Step: 27170   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:42:43,752-Speed 5517.19 samples/sec   Loss 10.4586   LearningRate 0.2510   Epoch: 2   Global Step: 27180   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:42:51,305-Speed 5423.47 samples/sec   Loss 10.5306   LearningRate 0.2510   Epoch: 2   Global Step: 27190   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:42:58,702-Speed 5538.14 samples/sec   Loss 10.4885   LearningRate 0.2509   Epoch: 2   Global Step: 27200   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:43:06,186-Speed 5474.26 samples/sec   Loss 10.5213   LearningRate 0.2509   Epoch: 2   Global Step: 27210   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:43:13,643-Speed 5493.57 samples/sec   Loss 10.5542   LearningRate 0.2509   Epoch: 2   Global Step: 27220   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:43:21,134-Speed 5468.46 samples/sec   Loss 10.4913   LearningRate 0.2508   Epoch: 2   Global Step: 27230   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:43:28,604-Speed 5483.74 samples/sec   Loss 10.5371   LearningRate 0.2508   Epoch: 2   Global Step: 27240   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:43:36,076-Speed 5483.23 samples/sec   Loss 10.4507   LearningRate 0.2508   Epoch: 2   Global Step: 27250   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:43:43,569-Speed 5466.64 samples/sec   Loss 10.4947   LearningRate 0.2508   Epoch: 2   Global Step: 27260   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:43:51,078-Speed 5455.73 samples/sec   Loss 10.4542   LearningRate 0.2507   Epoch: 2   Global Step: 27270   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:43:58,499-Speed 5519.92 samples/sec   Loss 10.4319   LearningRate 0.2507   Epoch: 2   Global Step: 27280   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:44:05,970-Speed 5483.86 samples/sec   Loss 10.4967   LearningRate 0.2507   Epoch: 2   Global Step: 27290   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:44:13,459-Speed 5469.53 samples/sec   Loss 10.4996   LearningRate 0.2507   Epoch: 2   Global Step: 27300   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:44:20,921-Speed 5490.23 samples/sec   Loss 10.4375   LearningRate 0.2506   Epoch: 2   Global Step: 27310   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:44:28,393-Speed 5482.34 samples/sec   Loss 10.5282   LearningRate 0.2506   Epoch: 2   Global Step: 27320   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:44:35,860-Speed 5486.50 samples/sec   Loss 10.4534   LearningRate 0.2506   Epoch: 2   Global Step: 27330   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:44:43,309-Speed 5499.13 samples/sec   Loss 10.5456   LearningRate 0.2505   Epoch: 2   Global Step: 27340   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:44:50,733-Speed 5518.00 samples/sec   Loss 10.5418   LearningRate 0.2505   Epoch: 2   Global Step: 27350   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:44:58,269-Speed 5436.42 samples/sec   Loss 10.4756   LearningRate 0.2505   Epoch: 2   Global Step: 27360   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:45:05,714-Speed 5502.15 samples/sec   Loss 10.5471   LearningRate 0.2505   Epoch: 2   Global Step: 27370   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:45:13,225-Speed 5454.31 samples/sec   Loss 10.4331   LearningRate 0.2504   Epoch: 2   Global Step: 27380   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:45:20,665-Speed 5505.78 samples/sec   Loss 10.4107   LearningRate 0.2504   Epoch: 2   Global Step: 27390   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:45:28,204-Speed 5434.19 samples/sec   Loss 10.4377   LearningRate 0.2504   Epoch: 2   Global Step: 27400   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:45:35,687-Speed 5474.78 samples/sec   Loss 10.4937   LearningRate 0.2503   Epoch: 2   Global Step: 27410   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:45:43,195-Speed 5455.94 samples/sec   Loss 10.4291   LearningRate 0.2503   Epoch: 2   Global Step: 27420   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:45:50,671-Speed 5479.95 samples/sec   Loss 10.4927   LearningRate 0.2503   Epoch: 2   Global Step: 27430   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:45:58,121-Speed 5498.41 samples/sec   Loss 10.4566   LearningRate 0.2503   Epoch: 2   Global Step: 27440   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:46:05,551-Speed 5513.30 samples/sec   Loss 10.4387   LearningRate 0.2502   Epoch: 2   Global Step: 27450   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:46:13,019-Speed 5485.88 samples/sec   Loss 10.4782   LearningRate 0.2502   Epoch: 2   Global Step: 27460   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:46:20,458-Speed 5507.21 samples/sec   Loss 10.4457   LearningRate 0.2502   Epoch: 2   Global Step: 27470   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:46:27,889-Speed 5512.09 samples/sec   Loss 10.4601   LearningRate 0.2502   Epoch: 2   Global Step: 27480   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:46:35,364-Speed 5480.78 samples/sec   Loss 10.4484   LearningRate 0.2501   Epoch: 2   Global Step: 27490   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:46:42,922-Speed 5419.98 samples/sec   Loss 10.5676   LearningRate 0.2501   Epoch: 2   Global Step: 27500   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:46:50,405-Speed 5474.32 samples/sec   Loss 10.5531   LearningRate 0.2501   Epoch: 2   Global Step: 27510   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:46:57,858-Speed 5497.00 samples/sec   Loss 10.4972   LearningRate 0.2500   Epoch: 2   Global Step: 27520   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:47:05,315-Speed 5493.46 samples/sec   Loss 10.4574   LearningRate 0.2500   Epoch: 2   Global Step: 27530   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:47:12,766-Speed 5497.79 samples/sec   Loss 10.4305   LearningRate 0.2500   Epoch: 2   Global Step: 27540   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:47:20,201-Speed 5510.39 samples/sec   Loss 10.4303   LearningRate 0.2500   Epoch: 2   Global Step: 27550   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:47:27,805-Speed 5386.94 samples/sec   Loss 10.4221   LearningRate 0.2499   Epoch: 2   Global Step: 27560   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:47:35,359-Speed 5423.42 samples/sec   Loss 10.4998   LearningRate 0.2499   Epoch: 2   Global Step: 27570   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:47:42,872-Speed 5452.28 samples/sec   Loss 10.4969   LearningRate 0.2499   Epoch: 2   Global Step: 27580   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:47:50,289-Speed 5523.31 samples/sec   Loss 10.4409   LearningRate 0.2498   Epoch: 2   Global Step: 27590   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:47:57,775-Speed 5472.61 samples/sec   Loss 10.3868   LearningRate 0.2498   Epoch: 2   Global Step: 27600   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:48:05,358-Speed 5401.87 samples/sec   Loss 10.4148   LearningRate 0.2498   Epoch: 2   Global Step: 27610   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:48:12,964-Speed 5386.30 samples/sec   Loss 10.3910   LearningRate 0.2498   Epoch: 2   Global Step: 27620   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:48:20,432-Speed 5485.53 samples/sec   Loss 10.4679   LearningRate 0.2497   Epoch: 2   Global Step: 27630   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:48:27,865-Speed 5510.80 samples/sec   Loss 10.4525   LearningRate 0.2497   Epoch: 2   Global Step: 27640   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:48:35,307-Speed 5505.27 samples/sec   Loss 10.5495   LearningRate 0.2497   Epoch: 2   Global Step: 27650   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:48:42,767-Speed 5491.11 samples/sec   Loss 10.3910   LearningRate 0.2497   Epoch: 2   Global Step: 27660   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:48:50,335-Speed 5413.06 samples/sec   Loss 10.4578   LearningRate 0.2496   Epoch: 2   Global Step: 27670   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:48:57,869-Speed 5437.81 samples/sec   Loss 10.4693   LearningRate 0.2496   Epoch: 2   Global Step: 27680   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:49:05,337-Speed 5485.31 samples/sec   Loss 10.4345   LearningRate 0.2496   Epoch: 2   Global Step: 27690   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:49:12,832-Speed 5465.72 samples/sec   Loss 10.4535   LearningRate 0.2495   Epoch: 2   Global Step: 27700   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:49:20,292-Speed 5491.79 samples/sec   Loss 10.5049   LearningRate 0.2495   Epoch: 2   Global Step: 27710   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:49:27,756-Speed 5487.98 samples/sec   Loss 10.3838   LearningRate 0.2495   Epoch: 2   Global Step: 27720   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:49:35,212-Speed 5494.51 samples/sec   Loss 10.4627   LearningRate 0.2495   Epoch: 2   Global Step: 27730   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:49:42,834-Speed 5375.11 samples/sec   Loss 10.4530   LearningRate 0.2494   Epoch: 2   Global Step: 27740   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:49:50,575-Speed 5292.65 samples/sec   Loss 10.4341   LearningRate 0.2494   Epoch: 2   Global Step: 27750   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:49:58,166-Speed 5396.80 samples/sec   Loss 10.3959   LearningRate 0.2494   Epoch: 2   Global Step: 27760   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:50:05,698-Speed 5438.81 samples/sec   Loss 10.4524   LearningRate 0.2493   Epoch: 2   Global Step: 27770   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:50:13,126-Speed 5514.84 samples/sec   Loss 10.4337   LearningRate 0.2493   Epoch: 2   Global Step: 27780   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:50:20,714-Speed 5398.95 samples/sec   Loss 10.4901   LearningRate 0.2493   Epoch: 2   Global Step: 27790   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:50:28,187-Speed 5481.73 samples/sec   Loss 10.3677   LearningRate 0.2493   Epoch: 2   Global Step: 27800   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:50:35,596-Speed 5529.41 samples/sec   Loss 10.4672   LearningRate 0.2492   Epoch: 2   Global Step: 27810   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:50:43,120-Speed 5444.63 samples/sec   Loss 10.4447   LearningRate 0.2492   Epoch: 2   Global Step: 27820   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:50:50,610-Speed 5468.98 samples/sec   Loss 10.4941   LearningRate 0.2492   Epoch: 2   Global Step: 27830   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:50:58,094-Speed 5473.61 samples/sec   Loss 10.4840   LearningRate 0.2492   Epoch: 2   Global Step: 27840   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:51:05,533-Speed 5507.23 samples/sec   Loss 10.4458   LearningRate 0.2491   Epoch: 2   Global Step: 27850   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:51:12,975-Speed 5504.07 samples/sec   Loss 10.4249   LearningRate 0.2491   Epoch: 2   Global Step: 27860   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:51:20,415-Speed 5506.25 samples/sec   Loss 10.3856   LearningRate 0.2491   Epoch: 2   Global Step: 27870   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:51:27,919-Speed 5459.05 samples/sec   Loss 10.4247   LearningRate 0.2490   Epoch: 2   Global Step: 27880   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:51:35,345-Speed 5516.92 samples/sec   Loss 10.4667   LearningRate 0.2490   Epoch: 2   Global Step: 27890   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:51:42,789-Speed 5502.79 samples/sec   Loss 10.4385   LearningRate 0.2490   Epoch: 2   Global Step: 27900   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:51:50,204-Speed 5525.15 samples/sec   Loss 10.4063   LearningRate 0.2490   Epoch: 2   Global Step: 27910   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:51:57,599-Speed 5539.46 samples/sec   Loss 10.4412   LearningRate 0.2489   Epoch: 2   Global Step: 27920   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:52:05,065-Speed 5487.28 samples/sec   Loss 10.5103   LearningRate 0.2489   Epoch: 2   Global Step: 27930   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:52:12,574-Speed 5455.30 samples/sec   Loss 10.4153   LearningRate 0.2489   Epoch: 2   Global Step: 27940   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:52:20,144-Speed 5411.24 samples/sec   Loss 10.4006   LearningRate 0.2488   Epoch: 2   Global Step: 27950   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:52:27,579-Speed 5510.32 samples/sec   Loss 10.3880   LearningRate 0.2488   Epoch: 2   Global Step: 27960   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:52:35,159-Speed 5403.95 samples/sec   Loss 10.4063   LearningRate 0.2488   Epoch: 2   Global Step: 27970   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:52:42,629-Speed 5484.45 samples/sec   Loss 10.4581   LearningRate 0.2488   Epoch: 2   Global Step: 27980   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:52:50,104-Speed 5480.70 samples/sec   Loss 10.4315   LearningRate 0.2487   Epoch: 2   Global Step: 27990   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:52:57,541-Speed 5508.12 samples/sec   Loss 10.3888   LearningRate 0.2487   Epoch: 2   Global Step: 28000   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:53:42,316-[lfw][28000]XNorm: 23.403026
Training: 2022-01-08 00:53:42,316-[lfw][28000]Accuracy-Flip: 0.99750+-0.00261
Training: 2022-01-08 00:53:42,317-[lfw][28000]Accuracy-Highest: 0.99767
Training: 2022-01-08 00:54:35,138-[cfp_fp][28000]XNorm: 21.135900
Training: 2022-01-08 00:54:35,139-[cfp_fp][28000]Accuracy-Flip: 0.98043+-0.00626
Training: 2022-01-08 00:54:35,140-[cfp_fp][28000]Accuracy-Highest: 0.98271
Training: 2022-01-08 00:55:21,509-[agedb_30][28000]XNorm: 23.027044
Training: 2022-01-08 00:55:21,510-[agedb_30][28000]Accuracy-Flip: 0.96683+-0.00858
Training: 2022-01-08 00:55:21,511-[agedb_30][28000]Accuracy-Highest: 0.97167
Training: 2022-01-08 00:55:29,080-Speed 270.30 samples/sec   Loss 10.5031   LearningRate 0.2487   Epoch: 2   Global Step: 28010   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:55:36,606-Speed 5443.70 samples/sec   Loss 10.5074   LearningRate 0.2487   Epoch: 2   Global Step: 28020   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:55:44,069-Speed 5489.82 samples/sec   Loss 10.3859   LearningRate 0.2486   Epoch: 2   Global Step: 28030   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:55:51,510-Speed 5506.41 samples/sec   Loss 10.4960   LearningRate 0.2486   Epoch: 2   Global Step: 28040   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:55:58,941-Speed 5513.54 samples/sec   Loss 10.3912   LearningRate 0.2486   Epoch: 2   Global Step: 28050   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:56:06,544-Speed 5388.82 samples/sec   Loss 10.4438   LearningRate 0.2485   Epoch: 2   Global Step: 28060   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:56:14,034-Speed 5468.61 samples/sec   Loss 10.4406   LearningRate 0.2485   Epoch: 2   Global Step: 28070   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:56:21,516-Speed 5475.75 samples/sec   Loss 10.4261   LearningRate 0.2485   Epoch: 2   Global Step: 28080   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:56:28,972-Speed 5494.68 samples/sec   Loss 10.3799   LearningRate 0.2485   Epoch: 2   Global Step: 28090   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:56:36,470-Speed 5463.32 samples/sec   Loss 10.3119   LearningRate 0.2484   Epoch: 2   Global Step: 28100   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:56:44,028-Speed 5420.01 samples/sec   Loss 10.4507   LearningRate 0.2484   Epoch: 2   Global Step: 28110   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:56:51,432-Speed 5533.04 samples/sec   Loss 10.5030   LearningRate 0.2484   Epoch: 2   Global Step: 28120   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:56:58,839-Speed 5530.61 samples/sec   Loss 10.5120   LearningRate 0.2483   Epoch: 2   Global Step: 28130   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:57:06,303-Speed 5487.89 samples/sec   Loss 10.4313   LearningRate 0.2483   Epoch: 2   Global Step: 28140   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:57:13,750-Speed 5501.17 samples/sec   Loss 10.4042   LearningRate 0.2483   Epoch: 2   Global Step: 28150   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:57:21,281-Speed 5439.57 samples/sec   Loss 10.4462   LearningRate 0.2483   Epoch: 2   Global Step: 28160   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:57:28,704-Speed 5518.68 samples/sec   Loss 10.3460   LearningRate 0.2482   Epoch: 2   Global Step: 28170   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:57:36,410-Speed 5315.66 samples/sec   Loss 10.3556   LearningRate 0.2482   Epoch: 2   Global Step: 28180   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:57:43,908-Speed 5463.79 samples/sec   Loss 10.4144   LearningRate 0.2482   Epoch: 2   Global Step: 28190   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 00:57:51,350-Speed 5504.69 samples/sec   Loss 10.4097   LearningRate 0.2482   Epoch: 2   Global Step: 28200   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:57:58,807-Speed 5494.17 samples/sec   Loss 10.4463   LearningRate 0.2481   Epoch: 2   Global Step: 28210   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:58:06,478-Speed 5339.47 samples/sec   Loss 10.4182   LearningRate 0.2481   Epoch: 2   Global Step: 28220   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:58:13,932-Speed 5496.10 samples/sec   Loss 10.4328   LearningRate 0.2481   Epoch: 2   Global Step: 28230   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:58:21,513-Speed 5404.06 samples/sec   Loss 10.4360   LearningRate 0.2480   Epoch: 2   Global Step: 28240   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:58:28,954-Speed 5504.68 samples/sec   Loss 10.3564   LearningRate 0.2480   Epoch: 2   Global Step: 28250   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:58:36,367-Speed 5526.00 samples/sec   Loss 10.4399   LearningRate 0.2480   Epoch: 2   Global Step: 28260   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:58:43,821-Speed 5496.24 samples/sec   Loss 10.3692   LearningRate 0.2480   Epoch: 2   Global Step: 28270   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:58:51,319-Speed 5463.45 samples/sec   Loss 10.3885   LearningRate 0.2479   Epoch: 2   Global Step: 28280   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:58:58,729-Speed 5528.51 samples/sec   Loss 10.3948   LearningRate 0.2479   Epoch: 2   Global Step: 28290   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 00:59:06,306-Speed 5406.11 samples/sec   Loss 10.3503   LearningRate 0.2479   Epoch: 2   Global Step: 28300   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:59:13,804-Speed 5463.49 samples/sec   Loss 10.4399   LearningRate 0.2478   Epoch: 2   Global Step: 28310   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:59:21,286-Speed 5486.02 samples/sec   Loss 10.4010   LearningRate 0.2478   Epoch: 2   Global Step: 28320   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:59:28,845-Speed 5418.85 samples/sec   Loss 10.2974   LearningRate 0.2478   Epoch: 2   Global Step: 28330   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:59:36,279-Speed 5510.82 samples/sec   Loss 10.3884   LearningRate 0.2478   Epoch: 2   Global Step: 28340   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:59:43,728-Speed 5499.57 samples/sec   Loss 10.3930   LearningRate 0.2477   Epoch: 2   Global Step: 28350   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:59:51,132-Speed 5533.15 samples/sec   Loss 10.4189   LearningRate 0.2477   Epoch: 2   Global Step: 28360   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 00:59:58,592-Speed 5491.24 samples/sec   Loss 10.3602   LearningRate 0.2477   Epoch: 2   Global Step: 28370   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:00:06,093-Speed 5461.65 samples/sec   Loss 10.3796   LearningRate 0.2477   Epoch: 2   Global Step: 28380   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:00:13,582-Speed 5469.70 samples/sec   Loss 10.4058   LearningRate 0.2476   Epoch: 2   Global Step: 28390   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:00:21,047-Speed 5488.33 samples/sec   Loss 10.4145   LearningRate 0.2476   Epoch: 2   Global Step: 28400   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:00:28,584-Speed 5435.22 samples/sec   Loss 10.3931   LearningRate 0.2476   Epoch: 2   Global Step: 28410   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:00:36,028-Speed 5502.41 samples/sec   Loss 10.3513   LearningRate 0.2475   Epoch: 2   Global Step: 28420   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:00:43,518-Speed 5469.76 samples/sec   Loss 10.3876   LearningRate 0.2475   Epoch: 2   Global Step: 28430   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:00:50,924-Speed 5531.77 samples/sec   Loss 10.4254   LearningRate 0.2475   Epoch: 2   Global Step: 28440   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:00:58,376-Speed 5496.35 samples/sec   Loss 10.3749   LearningRate 0.2475   Epoch: 2   Global Step: 28450   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:01:05,861-Speed 5473.32 samples/sec   Loss 10.3242   LearningRate 0.2474   Epoch: 2   Global Step: 28460   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:01:13,320-Speed 5492.67 samples/sec   Loss 10.3717   LearningRate 0.2474   Epoch: 2   Global Step: 28470   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:01:20,833-Speed 5452.30 samples/sec   Loss 10.3794   LearningRate 0.2474   Epoch: 2   Global Step: 28480   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:01:28,336-Speed 5460.08 samples/sec   Loss 10.4189   LearningRate 0.2474   Epoch: 2   Global Step: 28490   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:01:35,783-Speed 5500.97 samples/sec   Loss 10.2567   LearningRate 0.2473   Epoch: 2   Global Step: 28500   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:01:43,208-Speed 5517.48 samples/sec   Loss 10.3149   LearningRate 0.2473   Epoch: 2   Global Step: 28510   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:01:50,632-Speed 5517.79 samples/sec   Loss 10.3176   LearningRate 0.2473   Epoch: 2   Global Step: 28520   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:01:58,138-Speed 5457.61 samples/sec   Loss 10.3304   LearningRate 0.2472   Epoch: 2   Global Step: 28530   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:02:05,644-Speed 5457.98 samples/sec   Loss 10.3127   LearningRate 0.2472   Epoch: 2   Global Step: 28540   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:02:13,156-Speed 5453.41 samples/sec   Loss 10.3147   LearningRate 0.2472   Epoch: 2   Global Step: 28550   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:02:20,644-Speed 5470.36 samples/sec   Loss 10.5062   LearningRate 0.2472   Epoch: 2   Global Step: 28560   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:02:28,088-Speed 5503.59 samples/sec   Loss 10.3970   LearningRate 0.2471   Epoch: 2   Global Step: 28570   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:02:35,593-Speed 5458.32 samples/sec   Loss 10.2842   LearningRate 0.2471   Epoch: 2   Global Step: 28580   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:02:43,015-Speed 5519.69 samples/sec   Loss 10.3775   LearningRate 0.2471   Epoch: 2   Global Step: 28590   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:02:50,476-Speed 5490.14 samples/sec   Loss 10.3674   LearningRate 0.2470   Epoch: 2   Global Step: 28600   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:02:57,958-Speed 5475.94 samples/sec   Loss 10.3852   LearningRate 0.2470   Epoch: 2   Global Step: 28610   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:03:05,459-Speed 5460.93 samples/sec   Loss 10.3477   LearningRate 0.2470   Epoch: 2   Global Step: 28620   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:03:13,058-Speed 5391.16 samples/sec   Loss 10.3486   LearningRate 0.2470   Epoch: 2   Global Step: 28630   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:03:20,624-Speed 5414.15 samples/sec   Loss 10.4293   LearningRate 0.2469   Epoch: 2   Global Step: 28640   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:03:28,079-Speed 5495.26 samples/sec   Loss 10.4544   LearningRate 0.2469   Epoch: 2   Global Step: 28650   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:03:35,556-Speed 5479.13 samples/sec   Loss 10.3045   LearningRate 0.2469   Epoch: 2   Global Step: 28660   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:03:43,025-Speed 5485.14 samples/sec   Loss 10.3365   LearningRate 0.2469   Epoch: 2   Global Step: 28670   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:03:50,411-Speed 5545.43 samples/sec   Loss 10.2640   LearningRate 0.2468   Epoch: 2   Global Step: 28680   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:03:57,875-Speed 5488.78 samples/sec   Loss 10.2456   LearningRate 0.2468   Epoch: 2   Global Step: 28690   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:04:05,574-Speed 5321.22 samples/sec   Loss 10.3326   LearningRate 0.2468   Epoch: 2   Global Step: 28700   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:04:12,992-Speed 5522.50 samples/sec   Loss 10.3368   LearningRate 0.2467   Epoch: 2   Global Step: 28710   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:04:20,546-Speed 5422.74 samples/sec   Loss 10.4483   LearningRate 0.2467   Epoch: 2   Global Step: 28720   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:04:27,981-Speed 5509.50 samples/sec   Loss 10.3580   LearningRate 0.2467   Epoch: 2   Global Step: 28730   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:04:35,377-Speed 5538.97 samples/sec   Loss 10.4111   LearningRate 0.2467   Epoch: 2   Global Step: 28740   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:04:42,780-Speed 5533.80 samples/sec   Loss 10.4250   LearningRate 0.2466   Epoch: 2   Global Step: 28750   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:04:50,218-Speed 5507.72 samples/sec   Loss 10.3333   LearningRate 0.2466   Epoch: 2   Global Step: 28760   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:04:57,675-Speed 5493.69 samples/sec   Loss 10.3811   LearningRate 0.2466   Epoch: 2   Global Step: 28770   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:05:05,154-Speed 5477.05 samples/sec   Loss 10.3315   LearningRate 0.2465   Epoch: 2   Global Step: 28780   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:05:12,643-Speed 5470.47 samples/sec   Loss 10.3571   LearningRate 0.2465   Epoch: 2   Global Step: 28790   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:05:20,125-Speed 5475.27 samples/sec   Loss 10.3486   LearningRate 0.2465   Epoch: 2   Global Step: 28800   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:05:27,613-Speed 5470.80 samples/sec   Loss 10.3085   LearningRate 0.2465   Epoch: 2   Global Step: 28810   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:05:35,153-Speed 5432.97 samples/sec   Loss 10.3022   LearningRate 0.2464   Epoch: 2   Global Step: 28820   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:05:42,740-Speed 5400.02 samples/sec   Loss 10.3579   LearningRate 0.2464   Epoch: 2   Global Step: 28830   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:05:50,303-Speed 5416.45 samples/sec   Loss 10.2850   LearningRate 0.2464   Epoch: 2   Global Step: 28840   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:05:57,684-Speed 5549.80 samples/sec   Loss 10.4319   LearningRate 0.2464   Epoch: 2   Global Step: 28850   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:06:05,233-Speed 5426.96 samples/sec   Loss 10.4070   LearningRate 0.2463   Epoch: 2   Global Step: 28860   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 01:06:12,653-Speed 5520.46 samples/sec   Loss 10.3923   LearningRate 0.2463   Epoch: 2   Global Step: 28870   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 01:06:20,132-Speed 5477.38 samples/sec   Loss 10.3170   LearningRate 0.2463   Epoch: 2   Global Step: 28880   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 01:06:27,576-Speed 5503.54 samples/sec   Loss 10.3453   LearningRate 0.2462   Epoch: 2   Global Step: 28890   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 01:06:34,993-Speed 5522.92 samples/sec   Loss 10.3159   LearningRate 0.2462   Epoch: 2   Global Step: 28900   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:06:42,434-Speed 5505.73 samples/sec   Loss 10.3516   LearningRate 0.2462   Epoch: 2   Global Step: 28910   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:06:49,880-Speed 5501.52 samples/sec   Loss 10.3344   LearningRate 0.2462   Epoch: 2   Global Step: 28920   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:06:57,335-Speed 5495.06 samples/sec   Loss 10.3175   LearningRate 0.2461   Epoch: 2   Global Step: 28930   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:07:04,873-Speed 5434.71 samples/sec   Loss 10.3049   LearningRate 0.2461   Epoch: 2   Global Step: 28940   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:07:12,349-Speed 5480.21 samples/sec   Loss 10.3365   LearningRate 0.2461   Epoch: 2   Global Step: 28950   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:07:19,845-Speed 5464.63 samples/sec   Loss 10.3849   LearningRate 0.2461   Epoch: 2   Global Step: 28960   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:07:27,365-Speed 5447.75 samples/sec   Loss 10.3234   LearningRate 0.2460   Epoch: 2   Global Step: 28970   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:07:34,846-Speed 5475.86 samples/sec   Loss 10.3206   LearningRate 0.2460   Epoch: 2   Global Step: 28980   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:07:42,265-Speed 5522.13 samples/sec   Loss 10.3125   LearningRate 0.2460   Epoch: 2   Global Step: 28990   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:07:49,707-Speed 5504.43 samples/sec   Loss 10.3222   LearningRate 0.2459   Epoch: 2   Global Step: 29000   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 01:07:57,134-Speed 5515.29 samples/sec   Loss 10.3220   LearningRate 0.2459   Epoch: 2   Global Step: 29010   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 01:08:04,630-Speed 5465.31 samples/sec   Loss 10.2858   LearningRate 0.2459   Epoch: 2   Global Step: 29020   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 01:08:12,014-Speed 5548.02 samples/sec   Loss 10.3506   LearningRate 0.2459   Epoch: 2   Global Step: 29030   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:08:19,485-Speed 5483.02 samples/sec   Loss 10.3723   LearningRate 0.2458   Epoch: 2   Global Step: 29040   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:08:26,919-Speed 5510.60 samples/sec   Loss 10.3503   LearningRate 0.2458   Epoch: 2   Global Step: 29050   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:08:34,454-Speed 5437.18 samples/sec   Loss 10.3091   LearningRate 0.2458   Epoch: 2   Global Step: 29060   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:08:41,857-Speed 5533.19 samples/sec   Loss 10.2872   LearningRate 0.2457   Epoch: 2   Global Step: 29070   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:08:49,286-Speed 5514.45 samples/sec   Loss 10.3007   LearningRate 0.2457   Epoch: 2   Global Step: 29080   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:08:56,673-Speed 5545.77 samples/sec   Loss 10.3141   LearningRate 0.2457   Epoch: 2   Global Step: 29090   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:09:04,245-Speed 5410.11 samples/sec   Loss 10.3507   LearningRate 0.2457   Epoch: 2   Global Step: 29100   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:09:11,753-Speed 5456.41 samples/sec   Loss 10.2348   LearningRate 0.2456   Epoch: 2   Global Step: 29110   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:09:19,231-Speed 5478.18 samples/sec   Loss 10.2997   LearningRate 0.2456   Epoch: 2   Global Step: 29120   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:09:26,707-Speed 5479.80 samples/sec   Loss 10.3492   LearningRate 0.2456   Epoch: 2   Global Step: 29130   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 01:09:34,174-Speed 5487.56 samples/sec   Loss 10.3503   LearningRate 0.2456   Epoch: 2   Global Step: 29140   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:09:41,641-Speed 5485.51 samples/sec   Loss 10.2459   LearningRate 0.2455   Epoch: 2   Global Step: 29150   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:09:49,128-Speed 5471.97 samples/sec   Loss 10.3205   LearningRate 0.2455   Epoch: 2   Global Step: 29160   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:09:56,558-Speed 5513.83 samples/sec   Loss 10.3077   LearningRate 0.2455   Epoch: 2   Global Step: 29170   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:10:04,017-Speed 5492.19 samples/sec   Loss 10.3087   LearningRate 0.2454   Epoch: 2   Global Step: 29180   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:10:11,562-Speed 5429.55 samples/sec   Loss 10.3342   LearningRate 0.2454   Epoch: 2   Global Step: 29190   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:10:19,118-Speed 5421.61 samples/sec   Loss 10.2842   LearningRate 0.2454   Epoch: 2   Global Step: 29200   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:10:26,636-Speed 5449.43 samples/sec   Loss 10.2910   LearningRate 0.2454   Epoch: 2   Global Step: 29210   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:10:34,188-Speed 5423.76 samples/sec   Loss 10.3422   LearningRate 0.2453   Epoch: 2   Global Step: 29220   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:10:41,678-Speed 5469.83 samples/sec   Loss 10.3917   LearningRate 0.2453   Epoch: 2   Global Step: 29230   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:10:49,161-Speed 5474.54 samples/sec   Loss 10.3027   LearningRate 0.2453   Epoch: 2   Global Step: 29240   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 01:10:56,627-Speed 5487.32 samples/sec   Loss 10.2499   LearningRate 0.2453   Epoch: 2   Global Step: 29250   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 01:11:04,103-Speed 5479.03 samples/sec   Loss 10.3177   LearningRate 0.2452   Epoch: 2   Global Step: 29260   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:11:11,533-Speed 5513.81 samples/sec   Loss 10.2570   LearningRate 0.2452   Epoch: 2   Global Step: 29270   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:11:19,012-Speed 5477.54 samples/sec   Loss 10.2773   LearningRate 0.2452   Epoch: 2   Global Step: 29280   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:11:26,435-Speed 5518.98 samples/sec   Loss 10.3114   LearningRate 0.2451   Epoch: 2   Global Step: 29290   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:11:33,896-Speed 5490.77 samples/sec   Loss 10.2382   LearningRate 0.2451   Epoch: 2   Global Step: 29300   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:11:41,296-Speed 5535.92 samples/sec   Loss 10.2538   LearningRate 0.2451   Epoch: 2   Global Step: 29310   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:11:48,640-Speed 5577.49 samples/sec   Loss 10.3292   LearningRate 0.2451   Epoch: 2   Global Step: 29320   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:11:56,081-Speed 5506.11 samples/sec   Loss 10.2393   LearningRate 0.2450   Epoch: 2   Global Step: 29330   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:12:03,544-Speed 5488.54 samples/sec   Loss 10.3252   LearningRate 0.2450   Epoch: 2   Global Step: 29340   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:12:11,001-Speed 5494.06 samples/sec   Loss 10.3402   LearningRate 0.2450   Epoch: 2   Global Step: 29350   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:12:18,518-Speed 5449.82 samples/sec   Loss 10.4045   LearningRate 0.2450   Epoch: 2   Global Step: 29360   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:12:26,091-Speed 5409.03 samples/sec   Loss 10.2446   LearningRate 0.2449   Epoch: 2   Global Step: 29370   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:12:33,694-Speed 5388.19 samples/sec   Loss 10.3315   LearningRate 0.2449   Epoch: 2   Global Step: 29380   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:12:41,279-Speed 5400.61 samples/sec   Loss 10.3108   LearningRate 0.2449   Epoch: 2   Global Step: 29390   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:12:48,810-Speed 5439.38 samples/sec   Loss 10.3771   LearningRate 0.2448   Epoch: 2   Global Step: 29400   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:12:56,222-Speed 5527.14 samples/sec   Loss 10.2629   LearningRate 0.2448   Epoch: 2   Global Step: 29410   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:13:03,766-Speed 5430.78 samples/sec   Loss 10.3255   LearningRate 0.2448   Epoch: 2   Global Step: 29420   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:13:11,284-Speed 5448.49 samples/sec   Loss 10.2531   LearningRate 0.2448   Epoch: 2   Global Step: 29430   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:13:18,818-Speed 5437.49 samples/sec   Loss 10.2690   LearningRate 0.2447   Epoch: 2   Global Step: 29440   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:13:26,301-Speed 5474.31 samples/sec   Loss 10.2624   LearningRate 0.2447   Epoch: 2   Global Step: 29450   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:13:33,755-Speed 5495.88 samples/sec   Loss 10.2667   LearningRate 0.2447   Epoch: 2   Global Step: 29460   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-08 01:13:41,251-Speed 5464.83 samples/sec   Loss 10.2334   LearningRate 0.2446   Epoch: 2   Global Step: 29470   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:13:48,747-Speed 5464.76 samples/sec   Loss 10.3193   LearningRate 0.2446   Epoch: 2   Global Step: 29480   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:13:56,262-Speed 5451.73 samples/sec   Loss 10.2841   LearningRate 0.2446   Epoch: 2   Global Step: 29490   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:14:03,865-Speed 5387.84 samples/sec   Loss 10.2516   LearningRate 0.2446   Epoch: 2   Global Step: 29500   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:14:11,469-Speed 5387.16 samples/sec   Loss 10.2915   LearningRate 0.2445   Epoch: 2   Global Step: 29510   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:14:19,115-Speed 5357.87 samples/sec   Loss 10.2808   LearningRate 0.2445   Epoch: 2   Global Step: 29520   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:14:26,722-Speed 5384.94 samples/sec   Loss 10.3182   LearningRate 0.2445   Epoch: 2   Global Step: 29530   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:14:34,309-Speed 5399.99 samples/sec   Loss 10.2275   LearningRate 0.2445   Epoch: 2   Global Step: 29540   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:14:41,789-Speed 5476.38 samples/sec   Loss 10.2742   LearningRate 0.2444   Epoch: 2   Global Step: 29550   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:14:49,458-Speed 5341.24 samples/sec   Loss 10.2990   LearningRate 0.2444   Epoch: 2   Global Step: 29560   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:14:56,902-Speed 5503.77 samples/sec   Loss 10.3243   LearningRate 0.2444   Epoch: 2   Global Step: 29570   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:15:04,386-Speed 5473.74 samples/sec   Loss 10.3139   LearningRate 0.2443   Epoch: 2   Global Step: 29580   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:15:11,979-Speed 5394.45 samples/sec   Loss 10.2460   LearningRate 0.2443   Epoch: 2   Global Step: 29590   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:15:19,561-Speed 5403.08 samples/sec   Loss 10.3109   LearningRate 0.2443   Epoch: 2   Global Step: 29600   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:15:27,049-Speed 5471.27 samples/sec   Loss 10.2915   LearningRate 0.2443   Epoch: 2   Global Step: 29610   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:15:34,592-Speed 5431.04 samples/sec   Loss 10.3366   LearningRate 0.2442   Epoch: 2   Global Step: 29620   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:15:42,142-Speed 5425.25 samples/sec   Loss 10.3138   LearningRate 0.2442   Epoch: 2   Global Step: 29630   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:15:49,583-Speed 5504.89 samples/sec   Loss 10.2604   LearningRate 0.2442   Epoch: 2   Global Step: 29640   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:15:57,069-Speed 5472.96 samples/sec   Loss 10.2588   LearningRate 0.2442   Epoch: 2   Global Step: 29650   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:16:04,646-Speed 5406.91 samples/sec   Loss 10.3317   LearningRate 0.2441   Epoch: 2   Global Step: 29660   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:16:12,061-Speed 5524.24 samples/sec   Loss 10.2724   LearningRate 0.2441   Epoch: 2   Global Step: 29670   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:16:19,586-Speed 5443.63 samples/sec   Loss 10.1892   LearningRate 0.2441   Epoch: 2   Global Step: 29680   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:16:27,088-Speed 5460.51 samples/sec   Loss 10.2100   LearningRate 0.2440   Epoch: 2   Global Step: 29690   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:16:34,550-Speed 5490.48 samples/sec   Loss 10.2372   LearningRate 0.2440   Epoch: 2   Global Step: 29700   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:16:42,137-Speed 5398.73 samples/sec   Loss 10.2258   LearningRate 0.2440   Epoch: 2   Global Step: 29710   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:16:49,755-Speed 5377.27 samples/sec   Loss 10.2832   LearningRate 0.2440   Epoch: 2   Global Step: 29720   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:16:57,335-Speed 5405.03 samples/sec   Loss 10.3056   LearningRate 0.2439   Epoch: 2   Global Step: 29730   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:17:04,825-Speed 5469.54 samples/sec   Loss 10.2010   LearningRate 0.2439   Epoch: 2   Global Step: 29740   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:17:12,276-Speed 5497.38 samples/sec   Loss 10.3197   LearningRate 0.2439   Epoch: 2   Global Step: 29750   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:17:19,796-Speed 5447.52 samples/sec   Loss 10.2634   LearningRate 0.2439   Epoch: 2   Global Step: 29760   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:17:27,394-Speed 5391.60 samples/sec   Loss 10.2787   LearningRate 0.2438   Epoch: 2   Global Step: 29770   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:17:34,830-Speed 5509.92 samples/sec   Loss 10.3610   LearningRate 0.2438   Epoch: 2   Global Step: 29780   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:17:42,355-Speed 5443.24 samples/sec   Loss 10.3312   LearningRate 0.2438   Epoch: 2   Global Step: 29790   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:17:49,851-Speed 5465.02 samples/sec   Loss 10.3473   LearningRate 0.2437   Epoch: 2   Global Step: 29800   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:17:57,466-Speed 5379.94 samples/sec   Loss 10.2791   LearningRate 0.2437   Epoch: 2   Global Step: 29810   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:18:05,000-Speed 5437.44 samples/sec   Loss 10.2852   LearningRate 0.2437   Epoch: 2   Global Step: 29820   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:18:12,443-Speed 5503.50 samples/sec   Loss 10.2395   LearningRate 0.2437   Epoch: 2   Global Step: 29830   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:18:19,929-Speed 5471.86 samples/sec   Loss 10.2930   LearningRate 0.2436   Epoch: 2   Global Step: 29840   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:18:27,386-Speed 5494.66 samples/sec   Loss 10.2769   LearningRate 0.2436   Epoch: 2   Global Step: 29850   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:18:35,022-Speed 5364.26 samples/sec   Loss 10.3143   LearningRate 0.2436   Epoch: 2   Global Step: 29860   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:18:42,489-Speed 5486.45 samples/sec   Loss 10.2773   LearningRate 0.2435   Epoch: 2   Global Step: 29870   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:18:49,887-Speed 5536.88 samples/sec   Loss 10.2634   LearningRate 0.2435   Epoch: 2   Global Step: 29880   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:18:57,385-Speed 5464.24 samples/sec   Loss 10.3211   LearningRate 0.2435   Epoch: 2   Global Step: 29890   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:19:04,851-Speed 5486.81 samples/sec   Loss 10.2523   LearningRate 0.2435   Epoch: 2   Global Step: 29900   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:19:12,313-Speed 5489.80 samples/sec   Loss 10.2686   LearningRate 0.2434   Epoch: 2   Global Step: 29910   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:19:19,781-Speed 5485.42 samples/sec   Loss 10.2136   LearningRate 0.2434   Epoch: 2   Global Step: 29920   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:19:27,251-Speed 5484.78 samples/sec   Loss 10.2265   LearningRate 0.2434   Epoch: 2   Global Step: 29930   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:19:34,728-Speed 5479.08 samples/sec   Loss 10.2984   LearningRate 0.2434   Epoch: 2   Global Step: 29940   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:19:42,315-Speed 5399.39 samples/sec   Loss 10.2010   LearningRate 0.2433   Epoch: 2   Global Step: 29950   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:19:49,777-Speed 5489.03 samples/sec   Loss 10.2699   LearningRate 0.2433   Epoch: 2   Global Step: 29960   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:19:57,337-Speed 5419.71 samples/sec   Loss 10.2691   LearningRate 0.2433   Epoch: 2   Global Step: 29970   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:20:04,833-Speed 5464.45 samples/sec   Loss 10.2596   LearningRate 0.2432   Epoch: 2   Global Step: 29980   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:20:12,289-Speed 5494.88 samples/sec   Loss 10.2622   LearningRate 0.2432   Epoch: 2   Global Step: 29990   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:20:19,783-Speed 5466.01 samples/sec   Loss 10.1201   LearningRate 0.2432   Epoch: 2   Global Step: 30000   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:21:03,965-[lfw][30000]XNorm: 23.945636
Training: 2022-01-08 01:21:03,966-[lfw][30000]Accuracy-Flip: 0.99733+-0.00291
Training: 2022-01-08 01:21:03,966-[lfw][30000]Accuracy-Highest: 0.99767
Training: 2022-01-08 01:21:56,813-[cfp_fp][30000]XNorm: 21.433830
Training: 2022-01-08 01:21:56,814-[cfp_fp][30000]Accuracy-Flip: 0.98114+-0.00486
Training: 2022-01-08 01:21:56,815-[cfp_fp][30000]Accuracy-Highest: 0.98271
Training: 2022-01-08 01:22:42,750-[agedb_30][30000]XNorm: 23.247524
Training: 2022-01-08 01:22:42,751-[agedb_30][30000]Accuracy-Flip: 0.96900+-0.00821
Training: 2022-01-08 01:22:42,752-[agedb_30][30000]Accuracy-Highest: 0.97167
Training: 2022-01-08 01:22:50,397-Speed 271.96 samples/sec   Loss 10.2570   LearningRate 0.2432   Epoch: 2   Global Step: 30010   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:22:57,869-Speed 5484.01 samples/sec   Loss 10.2458   LearningRate 0.2431   Epoch: 2   Global Step: 30020   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:23:05,272-Speed 5533.96 samples/sec   Loss 10.2764   LearningRate 0.2431   Epoch: 2   Global Step: 30030   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:23:12,766-Speed 5467.11 samples/sec   Loss 10.2532   LearningRate 0.2431   Epoch: 2   Global Step: 30040   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:23:20,246-Speed 5477.20 samples/sec   Loss 10.1908   LearningRate 0.2431   Epoch: 2   Global Step: 30050   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:23:27,740-Speed 5466.85 samples/sec   Loss 10.2252   LearningRate 0.2430   Epoch: 2   Global Step: 30060   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:23:35,232-Speed 5467.56 samples/sec   Loss 10.2872   LearningRate 0.2430   Epoch: 2   Global Step: 30070   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:23:42,691-Speed 5492.05 samples/sec   Loss 10.2503   LearningRate 0.2430   Epoch: 2   Global Step: 30080   Fp16 Grad Scale: 262144   Required: 41 hours
Training: 2022-01-08 01:23:50,102-Speed 5527.71 samples/sec   Loss 10.2443   LearningRate 0.2429   Epoch: 2   Global Step: 30090   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:23:57,536-Speed 5510.89 samples/sec   Loss 10.2122   LearningRate 0.2429   Epoch: 2   Global Step: 30100   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:24:05,056-Speed 5447.88 samples/sec   Loss 10.2406   LearningRate 0.2429   Epoch: 2   Global Step: 30110   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:24:12,509-Speed 5495.85 samples/sec   Loss 10.2939   LearningRate 0.2429   Epoch: 2   Global Step: 30120   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:24:20,022-Speed 5452.87 samples/sec   Loss 10.2938   LearningRate 0.2428   Epoch: 2   Global Step: 30130   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:24:27,755-Speed 5297.53 samples/sec   Loss 10.1883   LearningRate 0.2428   Epoch: 2   Global Step: 30140   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:24:35,313-Speed 5420.77 samples/sec   Loss 10.2200   LearningRate 0.2428   Epoch: 2   Global Step: 30150   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-08 01:24:42,824-Speed 5453.49 samples/sec   Loss 10.2913   LearningRate 0.2428   Epoch: 2   Global Step: 30160   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:24:50,377-Speed 5424.67 samples/sec   Loss 10.1439   LearningRate 0.2427   Epoch: 2   Global Step: 30170   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:24:57,853-Speed 5479.03 samples/sec   Loss 10.2378   LearningRate 0.2427   Epoch: 2   Global Step: 30180   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:25:05,455-Speed 5388.86 samples/sec   Loss 10.2509   LearningRate 0.2427   Epoch: 2   Global Step: 30190   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 01:25:12,950-Speed 5465.33 samples/sec   Loss 10.2110   LearningRate 0.2426   Epoch: 2   Global Step: 30200   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 01:25:20,402-Speed 5497.58 samples/sec   Loss 10.2067   LearningRate 0.2426   Epoch: 2   Global Step: 30210   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:25:27,999-Speed 5392.83 samples/sec   Loss 10.2545   LearningRate 0.2426   Epoch: 2   Global Step: 30220   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:25:35,533-Speed 5437.43 samples/sec   Loss 10.2147   LearningRate 0.2426   Epoch: 2   Global Step: 30230   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:25:43,000-Speed 5485.86 samples/sec   Loss 10.1414   LearningRate 0.2425   Epoch: 2   Global Step: 30240   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:25:50,581-Speed 5404.32 samples/sec   Loss 10.2103   LearningRate 0.2425   Epoch: 2   Global Step: 30250   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:25:58,066-Speed 5472.85 samples/sec   Loss 10.1257   LearningRate 0.2425   Epoch: 2   Global Step: 30260   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:26:05,612-Speed 5429.01 samples/sec   Loss 10.2372   LearningRate 0.2425   Epoch: 2   Global Step: 30270   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:26:13,105-Speed 5466.76 samples/sec   Loss 10.2472   LearningRate 0.2424   Epoch: 2   Global Step: 30280   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:26:20,492-Speed 5545.51 samples/sec   Loss 10.2132   LearningRate 0.2424   Epoch: 2   Global Step: 30290   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:26:27,966-Speed 5480.94 samples/sec   Loss 10.2122   LearningRate 0.2424   Epoch: 2   Global Step: 30300   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:26:35,605-Speed 5362.80 samples/sec   Loss 10.2806   LearningRate 0.2423   Epoch: 2   Global Step: 30310   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:26:43,106-Speed 5461.49 samples/sec   Loss 10.2228   LearningRate 0.2423   Epoch: 2   Global Step: 30320   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:26:50,630-Speed 5444.41 samples/sec   Loss 10.2625   LearningRate 0.2423   Epoch: 2   Global Step: 30330   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:26:58,132-Speed 5460.81 samples/sec   Loss 10.2799   LearningRate 0.2423   Epoch: 2   Global Step: 30340   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:27:05,622-Speed 5469.03 samples/sec   Loss 10.2530   LearningRate 0.2422   Epoch: 2   Global Step: 30350   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:27:13,145-Speed 5445.50 samples/sec   Loss 10.1739   LearningRate 0.2422   Epoch: 2   Global Step: 30360   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:27:20,642-Speed 5464.29 samples/sec   Loss 10.2621   LearningRate 0.2422   Epoch: 2   Global Step: 30370   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:27:28,158-Speed 5450.47 samples/sec   Loss 10.2198   LearningRate 0.2422   Epoch: 2   Global Step: 30380   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:27:35,640-Speed 5475.29 samples/sec   Loss 10.1946   LearningRate 0.2421   Epoch: 2   Global Step: 30390   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:27:43,164-Speed 5444.86 samples/sec   Loss 10.2206   LearningRate 0.2421   Epoch: 2   Global Step: 30400   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:27:50,714-Speed 5425.86 samples/sec   Loss 10.1876   LearningRate 0.2421   Epoch: 2   Global Step: 30410   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 01:27:58,251-Speed 5435.38 samples/sec   Loss 10.2875   LearningRate 0.2420   Epoch: 2   Global Step: 30420   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:28:05,707-Speed 5494.45 samples/sec   Loss 10.2226   LearningRate 0.2420   Epoch: 2   Global Step: 30430   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:28:13,155-Speed 5500.36 samples/sec   Loss 10.2727   LearningRate 0.2420   Epoch: 2   Global Step: 30440   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:28:20,594-Speed 5507.19 samples/sec   Loss 10.3310   LearningRate 0.2420   Epoch: 2   Global Step: 30450   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:28:28,142-Speed 5427.08 samples/sec   Loss 10.1740   LearningRate 0.2419   Epoch: 2   Global Step: 30460   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:28:35,613-Speed 5483.46 samples/sec   Loss 10.2985   LearningRate 0.2419   Epoch: 2   Global Step: 30470   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:28:43,090-Speed 5478.85 samples/sec   Loss 10.2149   LearningRate 0.2419   Epoch: 2   Global Step: 30480   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:28:50,604-Speed 5452.19 samples/sec   Loss 10.2228   LearningRate 0.2419   Epoch: 2   Global Step: 30490   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:28:58,097-Speed 5467.24 samples/sec   Loss 10.2362   LearningRate 0.2418   Epoch: 2   Global Step: 30500   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:29:05,525-Speed 5514.45 samples/sec   Loss 10.2009   LearningRate 0.2418   Epoch: 2   Global Step: 30510   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:29:12,989-Speed 5488.95 samples/sec   Loss 10.2079   LearningRate 0.2418   Epoch: 2   Global Step: 30520   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:29:20,534-Speed 5429.16 samples/sec   Loss 10.1341   LearningRate 0.2417   Epoch: 2   Global Step: 30530   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:29:28,017-Speed 5474.54 samples/sec   Loss 10.2132   LearningRate 0.2417   Epoch: 2   Global Step: 30540   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:29:35,499-Speed 5475.25 samples/sec   Loss 10.1608   LearningRate 0.2417   Epoch: 2   Global Step: 30550   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:29:43,021-Speed 5446.80 samples/sec   Loss 10.1846   LearningRate 0.2417   Epoch: 2   Global Step: 30560   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:29:50,540-Speed 5448.00 samples/sec   Loss 10.1903   LearningRate 0.2416   Epoch: 2   Global Step: 30570   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:29:58,071-Speed 5439.60 samples/sec   Loss 10.1898   LearningRate 0.2416   Epoch: 2   Global Step: 30580   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:30:05,847-Speed 5267.87 samples/sec   Loss 10.2268   LearningRate 0.2416   Epoch: 2   Global Step: 30590   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:30:13,399-Speed 5425.27 samples/sec   Loss 10.2415   LearningRate 0.2415   Epoch: 2   Global Step: 30600   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:30:20,882-Speed 5474.21 samples/sec   Loss 10.2345   LearningRate 0.2415   Epoch: 2   Global Step: 30610   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:30:28,417-Speed 5436.31 samples/sec   Loss 10.2394   LearningRate 0.2415   Epoch: 2   Global Step: 30620   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:30:35,984-Speed 5414.08 samples/sec   Loss 10.1574   LearningRate 0.2415   Epoch: 2   Global Step: 30630   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:30:43,542-Speed 5420.52 samples/sec   Loss 10.1695   LearningRate 0.2414   Epoch: 2   Global Step: 30640   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:30:51,059-Speed 5449.42 samples/sec   Loss 10.1797   LearningRate 0.2414   Epoch: 2   Global Step: 30650   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 01:30:58,610-Speed 5425.67 samples/sec   Loss 10.2594   LearningRate 0.2414   Epoch: 2   Global Step: 30660   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 01:31:06,167-Speed 5420.76 samples/sec   Loss 10.2759   LearningRate 0.2414   Epoch: 2   Global Step: 30670   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 01:31:13,833-Speed 5344.16 samples/sec   Loss 10.2991   LearningRate 0.2413   Epoch: 2   Global Step: 30680   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:31:21,398-Speed 5414.35 samples/sec   Loss 10.1658   LearningRate 0.2413   Epoch: 2   Global Step: 30690   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:31:28,914-Speed 5450.65 samples/sec   Loss 10.1837   LearningRate 0.2413   Epoch: 2   Global Step: 30700   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:31:36,393-Speed 5477.20 samples/sec   Loss 10.2508   LearningRate 0.2412   Epoch: 2   Global Step: 30710   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:31:43,915-Speed 5446.74 samples/sec   Loss 10.1608   LearningRate 0.2412   Epoch: 2   Global Step: 30720   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:31:51,405-Speed 5468.65 samples/sec   Loss 10.1822   LearningRate 0.2412   Epoch: 2   Global Step: 30730   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:31:58,973-Speed 5412.92 samples/sec   Loss 10.2029   LearningRate 0.2412   Epoch: 2   Global Step: 30740   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:32:06,492-Speed 5448.49 samples/sec   Loss 10.1755   LearningRate 0.2411   Epoch: 2   Global Step: 30750   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:32:14,021-Speed 5441.69 samples/sec   Loss 10.1655   LearningRate 0.2411   Epoch: 2   Global Step: 30760   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:32:21,460-Speed 5506.90 samples/sec   Loss 10.2053   LearningRate 0.2411   Epoch: 2   Global Step: 30770   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:32:28,926-Speed 5486.54 samples/sec   Loss 10.2118   LearningRate 0.2411   Epoch: 2   Global Step: 30780   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:32:36,436-Speed 5455.42 samples/sec   Loss 10.1548   LearningRate 0.2410   Epoch: 2   Global Step: 30790   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:32:44,107-Speed 5340.02 samples/sec   Loss 10.1806   LearningRate 0.2410   Epoch: 2   Global Step: 30800   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:32:51,655-Speed 5427.49 samples/sec   Loss 10.1894   LearningRate 0.2410   Epoch: 2   Global Step: 30810   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:32:59,165-Speed 5454.75 samples/sec   Loss 10.2402   LearningRate 0.2409   Epoch: 2   Global Step: 30820   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:33:06,691-Speed 5443.04 samples/sec   Loss 10.2144   LearningRate 0.2409   Epoch: 2   Global Step: 30830   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:33:14,292-Speed 5390.28 samples/sec   Loss 10.1687   LearningRate 0.2409   Epoch: 2   Global Step: 30840   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:33:21,908-Speed 5378.83 samples/sec   Loss 10.1515   LearningRate 0.2409   Epoch: 2   Global Step: 30850   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:33:29,585-Speed 5335.72 samples/sec   Loss 10.2455   LearningRate 0.2408   Epoch: 2   Global Step: 30860   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:33:37,165-Speed 5404.54 samples/sec   Loss 10.1495   LearningRate 0.2408   Epoch: 2   Global Step: 30870   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:33:44,748-Speed 5402.58 samples/sec   Loss 10.2167   LearningRate 0.2408   Epoch: 2   Global Step: 30880   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:33:52,285-Speed 5435.27 samples/sec   Loss 10.1523   LearningRate 0.2408   Epoch: 2   Global Step: 30890   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:33:59,796-Speed 5453.56 samples/sec   Loss 10.1648   LearningRate 0.2407   Epoch: 2   Global Step: 30900   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:34:07,421-Speed 5372.76 samples/sec   Loss 10.1320   LearningRate 0.2407   Epoch: 2   Global Step: 30910   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:34:14,960-Speed 5433.93 samples/sec   Loss 10.2209   LearningRate 0.2407   Epoch: 2   Global Step: 30920   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:34:22,473-Speed 5452.99 samples/sec   Loss 10.2218   LearningRate 0.2406   Epoch: 2   Global Step: 30930   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 01:34:30,128-Speed 5351.18 samples/sec   Loss 10.1819   LearningRate 0.2406   Epoch: 2   Global Step: 30940   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:34:37,697-Speed 5412.43 samples/sec   Loss 10.1772   LearningRate 0.2406   Epoch: 2   Global Step: 30950   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:34:45,271-Speed 5408.65 samples/sec   Loss 10.1928   LearningRate 0.2406   Epoch: 2   Global Step: 30960   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:34:52,825-Speed 5423.05 samples/sec   Loss 10.1878   LearningRate 0.2405   Epoch: 2   Global Step: 30970   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:35:00,439-Speed 5380.44 samples/sec   Loss 10.1686   LearningRate 0.2405   Epoch: 2   Global Step: 30980   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:35:08,012-Speed 5409.53 samples/sec   Loss 10.1709   LearningRate 0.2405   Epoch: 2   Global Step: 30990   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:35:15,587-Speed 5408.19 samples/sec   Loss 10.2147   LearningRate 0.2405   Epoch: 2   Global Step: 31000   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:35:23,188-Speed 5389.61 samples/sec   Loss 10.1931   LearningRate 0.2404   Epoch: 2   Global Step: 31010   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:35:30,789-Speed 5388.75 samples/sec   Loss 10.2284   LearningRate 0.2404   Epoch: 2   Global Step: 31020   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:35:38,349-Speed 5419.26 samples/sec   Loss 10.2146   LearningRate 0.2404   Epoch: 2   Global Step: 31030   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:35:45,951-Speed 5388.44 samples/sec   Loss 10.1716   LearningRate 0.2403   Epoch: 2   Global Step: 31040   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:35:53,480-Speed 5441.14 samples/sec   Loss 10.1989   LearningRate 0.2403   Epoch: 2   Global Step: 31050   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:36:01,062-Speed 5403.56 samples/sec   Loss 10.1838   LearningRate 0.2403   Epoch: 2   Global Step: 31060   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:36:08,613-Speed 5424.83 samples/sec   Loss 10.2357   LearningRate 0.2403   Epoch: 2   Global Step: 31070   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:36:16,155-Speed 5431.50 samples/sec   Loss 10.1753   LearningRate 0.2402   Epoch: 2   Global Step: 31080   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:36:23,736-Speed 5403.62 samples/sec   Loss 10.2294   LearningRate 0.2402   Epoch: 2   Global Step: 31090   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:36:31,231-Speed 5465.67 samples/sec   Loss 10.1361   LearningRate 0.2402   Epoch: 2   Global Step: 31100   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:36:38,771-Speed 5433.09 samples/sec   Loss 10.2006   LearningRate 0.2402   Epoch: 2   Global Step: 31110   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:37:01,123-Speed 1832.58 samples/sec   Loss 10.2077   LearningRate 0.2401   Epoch: 3   Global Step: 31120   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:37:08,568-Speed 5502.39 samples/sec   Loss 10.1981   LearningRate 0.2401   Epoch: 3   Global Step: 31130   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:37:16,001-Speed 5511.30 samples/sec   Loss 10.0911   LearningRate 0.2401   Epoch: 3   Global Step: 31140   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:37:23,527-Speed 5442.99 samples/sec   Loss 10.2123   LearningRate 0.2400   Epoch: 3   Global Step: 31150   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:37:30,967-Speed 5506.28 samples/sec   Loss 10.0856   LearningRate 0.2400   Epoch: 3   Global Step: 31160   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:37:38,462-Speed 5465.94 samples/sec   Loss 10.1207   LearningRate 0.2400   Epoch: 3   Global Step: 31170   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:37:45,873-Speed 5527.62 samples/sec   Loss 10.1434   LearningRate 0.2400   Epoch: 3   Global Step: 31180   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:37:53,285-Speed 5526.63 samples/sec   Loss 10.1184   LearningRate 0.2399   Epoch: 3   Global Step: 31190   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 01:38:00,729-Speed 5503.10 samples/sec   Loss 10.1059   LearningRate 0.2399   Epoch: 3   Global Step: 31200   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 01:38:08,142-Speed 5525.87 samples/sec   Loss 10.1938   LearningRate 0.2399   Epoch: 3   Global Step: 31210   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 01:38:15,603-Speed 5490.94 samples/sec   Loss 10.1486   LearningRate 0.2399   Epoch: 3   Global Step: 31220   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 01:38:23,071-Speed 5485.84 samples/sec   Loss 10.1654   LearningRate 0.2398   Epoch: 3   Global Step: 31230   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 01:38:30,458-Speed 5545.69 samples/sec   Loss 10.1614   LearningRate 0.2398   Epoch: 3   Global Step: 31240   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:38:37,861-Speed 5533.16 samples/sec   Loss 10.1508   LearningRate 0.2398   Epoch: 3   Global Step: 31250   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:38:45,270-Speed 5530.00 samples/sec   Loss 10.1904   LearningRate 0.2397   Epoch: 3   Global Step: 31260   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:38:52,656-Speed 5546.25 samples/sec   Loss 10.2749   LearningRate 0.2397   Epoch: 3   Global Step: 31270   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:39:00,070-Speed 5525.57 samples/sec   Loss 10.1225   LearningRate 0.2397   Epoch: 3   Global Step: 31280   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:39:07,518-Speed 5499.90 samples/sec   Loss 10.2402   LearningRate 0.2397   Epoch: 3   Global Step: 31290   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:39:15,060-Speed 5431.49 samples/sec   Loss 10.1060   LearningRate 0.2396   Epoch: 3   Global Step: 31300   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:39:22,582-Speed 5446.36 samples/sec   Loss 10.1889   LearningRate 0.2396   Epoch: 3   Global Step: 31310   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:39:30,148-Speed 5414.68 samples/sec   Loss 10.1241   LearningRate 0.2396   Epoch: 3   Global Step: 31320   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:39:37,618-Speed 5483.52 samples/sec   Loss 10.1325   LearningRate 0.2396   Epoch: 3   Global Step: 31330   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:39:45,071-Speed 5497.21 samples/sec   Loss 10.0320   LearningRate 0.2395   Epoch: 3   Global Step: 31340   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 01:39:52,514-Speed 5503.91 samples/sec   Loss 10.1441   LearningRate 0.2395   Epoch: 3   Global Step: 31350   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:39:59,975-Speed 5490.62 samples/sec   Loss 10.0893   LearningRate 0.2395   Epoch: 3   Global Step: 31360   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:40:07,513-Speed 5434.25 samples/sec   Loss 10.1467   LearningRate 0.2395   Epoch: 3   Global Step: 31370   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:40:14,996-Speed 5474.51 samples/sec   Loss 10.2295   LearningRate 0.2394   Epoch: 3   Global Step: 31380   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:40:22,539-Speed 5431.08 samples/sec   Loss 10.1172   LearningRate 0.2394   Epoch: 3   Global Step: 31390   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:40:30,027-Speed 5471.08 samples/sec   Loss 10.0866   LearningRate 0.2394   Epoch: 3   Global Step: 31400   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:40:37,516-Speed 5469.70 samples/sec   Loss 10.0640   LearningRate 0.2393   Epoch: 3   Global Step: 31410   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:40:44,980-Speed 5488.72 samples/sec   Loss 10.1037   LearningRate 0.2393   Epoch: 3   Global Step: 31420   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:40:52,468-Speed 5470.66 samples/sec   Loss 10.0996   LearningRate 0.2393   Epoch: 3   Global Step: 31430   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:40:59,921-Speed 5496.45 samples/sec   Loss 10.0757   LearningRate 0.2393   Epoch: 3   Global Step: 31440   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:41:07,378-Speed 5493.88 samples/sec   Loss 10.1320   LearningRate 0.2392   Epoch: 3   Global Step: 31450   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:41:14,840-Speed 5489.88 samples/sec   Loss 10.1364   LearningRate 0.2392   Epoch: 3   Global Step: 31460   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:41:22,295-Speed 5495.04 samples/sec   Loss 10.1531   LearningRate 0.2392   Epoch: 3   Global Step: 31470   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:41:29,770-Speed 5480.36 samples/sec   Loss 10.0786   LearningRate 0.2392   Epoch: 3   Global Step: 31480   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:41:37,227-Speed 5493.39 samples/sec   Loss 10.0862   LearningRate 0.2391   Epoch: 3   Global Step: 31490   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:41:44,724-Speed 5464.00 samples/sec   Loss 10.1303   LearningRate 0.2391   Epoch: 3   Global Step: 31500   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:41:52,225-Speed 5461.79 samples/sec   Loss 10.1205   LearningRate 0.2391   Epoch: 3   Global Step: 31510   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:41:59,882-Speed 5350.43 samples/sec   Loss 10.2172   LearningRate 0.2390   Epoch: 3   Global Step: 31520   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:42:07,798-Speed 5175.16 samples/sec   Loss 10.1789   LearningRate 0.2390   Epoch: 3   Global Step: 31530   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:42:15,280-Speed 5474.81 samples/sec   Loss 10.1080   LearningRate 0.2390   Epoch: 3   Global Step: 31540   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:42:22,756-Speed 5479.22 samples/sec   Loss 10.2140   LearningRate 0.2390   Epoch: 3   Global Step: 31550   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 01:42:30,228-Speed 5482.57 samples/sec   Loss 10.1563   LearningRate 0.2389   Epoch: 3   Global Step: 31560   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:42:37,722-Speed 5466.97 samples/sec   Loss 10.0950   LearningRate 0.2389   Epoch: 3   Global Step: 31570   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:42:45,215-Speed 5466.72 samples/sec   Loss 10.1808   LearningRate 0.2389   Epoch: 3   Global Step: 31580   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:42:52,733-Speed 5448.73 samples/sec   Loss 10.1594   LearningRate 0.2389   Epoch: 3   Global Step: 31590   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:43:00,234-Speed 5461.37 samples/sec   Loss 10.1424   LearningRate 0.2388   Epoch: 3   Global Step: 31600   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:43:07,698-Speed 5489.06 samples/sec   Loss 10.0872   LearningRate 0.2388   Epoch: 3   Global Step: 31610   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:43:15,169-Speed 5482.53 samples/sec   Loss 10.0942   LearningRate 0.2388   Epoch: 3   Global Step: 31620   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:43:22,639-Speed 5484.76 samples/sec   Loss 10.1171   LearningRate 0.2387   Epoch: 3   Global Step: 31630   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:43:30,120-Speed 5476.13 samples/sec   Loss 10.1651   LearningRate 0.2387   Epoch: 3   Global Step: 31640   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:43:37,629-Speed 5455.41 samples/sec   Loss 10.1185   LearningRate 0.2387   Epoch: 3   Global Step: 31650   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:43:45,067-Speed 5506.81 samples/sec   Loss 10.1869   LearningRate 0.2387   Epoch: 3   Global Step: 31660   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 01:43:52,562-Speed 5466.13 samples/sec   Loss 10.1458   LearningRate 0.2386   Epoch: 3   Global Step: 31670   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:44:00,008-Speed 5502.09 samples/sec   Loss 10.1836   LearningRate 0.2386   Epoch: 3   Global Step: 31680   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:44:07,571-Speed 5416.54 samples/sec   Loss 10.1250   LearningRate 0.2386   Epoch: 3   Global Step: 31690   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:44:15,224-Speed 5352.61 samples/sec   Loss 10.2027   LearningRate 0.2386   Epoch: 3   Global Step: 31700   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:44:22,823-Speed 5391.53 samples/sec   Loss 10.1238   LearningRate 0.2385   Epoch: 3   Global Step: 31710   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:44:30,364-Speed 5432.44 samples/sec   Loss 10.1427   LearningRate 0.2385   Epoch: 3   Global Step: 31720   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:44:37,942-Speed 5405.80 samples/sec   Loss 10.1069   LearningRate 0.2385   Epoch: 3   Global Step: 31730   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:44:45,453-Speed 5453.38 samples/sec   Loss 10.2108   LearningRate 0.2384   Epoch: 3   Global Step: 31740   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:44:52,977-Speed 5445.42 samples/sec   Loss 10.1311   LearningRate 0.2384   Epoch: 3   Global Step: 31750   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:45:00,487-Speed 5454.05 samples/sec   Loss 10.1181   LearningRate 0.2384   Epoch: 3   Global Step: 31760   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:45:07,930-Speed 5504.13 samples/sec   Loss 10.1758   LearningRate 0.2384   Epoch: 3   Global Step: 31770   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:45:15,425-Speed 5466.01 samples/sec   Loss 10.1717   LearningRate 0.2383   Epoch: 3   Global Step: 31780   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:45:22,855-Speed 5513.58 samples/sec   Loss 10.1480   LearningRate 0.2383   Epoch: 3   Global Step: 31790   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:45:30,389-Speed 5437.12 samples/sec   Loss 10.1622   LearningRate 0.2383   Epoch: 3   Global Step: 31800   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:45:37,889-Speed 5462.74 samples/sec   Loss 10.1585   LearningRate 0.2383   Epoch: 3   Global Step: 31810   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:45:45,320-Speed 5512.65 samples/sec   Loss 10.1218   LearningRate 0.2382   Epoch: 3   Global Step: 31820   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:45:52,773-Speed 5496.24 samples/sec   Loss 10.1191   LearningRate 0.2382   Epoch: 3   Global Step: 31830   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:46:00,246-Speed 5482.16 samples/sec   Loss 10.1262   LearningRate 0.2382   Epoch: 3   Global Step: 31840   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:46:07,709-Speed 5488.69 samples/sec   Loss 10.1097   LearningRate 0.2381   Epoch: 3   Global Step: 31850   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:46:15,185-Speed 5480.21 samples/sec   Loss 10.0386   LearningRate 0.2381   Epoch: 3   Global Step: 31860   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:46:22,628-Speed 5503.11 samples/sec   Loss 10.1194   LearningRate 0.2381   Epoch: 3   Global Step: 31870   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:46:30,108-Speed 5476.97 samples/sec   Loss 10.1471   LearningRate 0.2381   Epoch: 3   Global Step: 31880   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:46:37,602-Speed 5466.48 samples/sec   Loss 10.1436   LearningRate 0.2380   Epoch: 3   Global Step: 31890   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:46:45,092-Speed 5469.76 samples/sec   Loss 10.0723   LearningRate 0.2380   Epoch: 3   Global Step: 31900   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:46:52,579-Speed 5471.25 samples/sec   Loss 10.1850   LearningRate 0.2380   Epoch: 3   Global Step: 31910   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:47:00,094-Speed 5451.19 samples/sec   Loss 10.1108   LearningRate 0.2380   Epoch: 3   Global Step: 31920   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:47:07,616-Speed 5446.57 samples/sec   Loss 10.0974   LearningRate 0.2379   Epoch: 3   Global Step: 31930   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:47:15,112-Speed 5465.22 samples/sec   Loss 10.0581   LearningRate 0.2379   Epoch: 3   Global Step: 31940   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:47:22,623-Speed 5453.31 samples/sec   Loss 10.0951   LearningRate 0.2379   Epoch: 3   Global Step: 31950   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:47:30,030-Speed 5531.44 samples/sec   Loss 10.0871   LearningRate 0.2378   Epoch: 3   Global Step: 31960   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:47:37,573-Speed 5430.73 samples/sec   Loss 10.0873   LearningRate 0.2378   Epoch: 3   Global Step: 31970   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:47:45,098-Speed 5443.94 samples/sec   Loss 10.1855   LearningRate 0.2378   Epoch: 3   Global Step: 31980   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:47:52,595-Speed 5463.59 samples/sec   Loss 10.1402   LearningRate 0.2378   Epoch: 3   Global Step: 31990   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:48:00,104-Speed 5455.93 samples/sec   Loss 10.0961   LearningRate 0.2377   Epoch: 3   Global Step: 32000   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:48:44,548-[lfw][32000]XNorm: 21.934491
Training: 2022-01-08 01:48:44,548-[lfw][32000]Accuracy-Flip: 0.99800+-0.00277
Training: 2022-01-08 01:48:44,549-[lfw][32000]Accuracy-Highest: 0.99800
Training: 2022-01-08 01:49:37,890-[cfp_fp][32000]XNorm: 20.134749
Training: 2022-01-08 01:49:37,891-[cfp_fp][32000]Accuracy-Flip: 0.98457+-0.00514
Training: 2022-01-08 01:49:37,892-[cfp_fp][32000]Accuracy-Highest: 0.98457
Training: 2022-01-08 01:50:23,995-[agedb_30][32000]XNorm: 21.555493
Training: 2022-01-08 01:50:23,997-[agedb_30][32000]Accuracy-Flip: 0.96650+-0.00973
Training: 2022-01-08 01:50:23,997-[agedb_30][32000]Accuracy-Highest: 0.97167
Training: 2022-01-08 01:50:31,505-Speed 270.54 samples/sec   Loss 10.1102   LearningRate 0.2377   Epoch: 3   Global Step: 32010   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:50:38,937-Speed 5513.67 samples/sec   Loss 10.1682   LearningRate 0.2377   Epoch: 3   Global Step: 32020   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:50:46,494-Speed 5421.55 samples/sec   Loss 10.1179   LearningRate 0.2377   Epoch: 3   Global Step: 32030   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:50:53,914-Speed 5521.65 samples/sec   Loss 10.1721   LearningRate 0.2376   Epoch: 3   Global Step: 32040   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:51:01,358-Speed 5503.54 samples/sec   Loss 10.2088   LearningRate 0.2376   Epoch: 3   Global Step: 32050   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:51:08,912-Speed 5423.73 samples/sec   Loss 10.1278   LearningRate 0.2376   Epoch: 3   Global Step: 32060   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:51:16,461-Speed 5426.20 samples/sec   Loss 10.1301   LearningRate 0.2375   Epoch: 3   Global Step: 32070   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:51:23,984-Speed 5446.23 samples/sec   Loss 10.0819   LearningRate 0.2375   Epoch: 3   Global Step: 32080   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:51:31,443-Speed 5492.98 samples/sec   Loss 10.0694   LearningRate 0.2375   Epoch: 3   Global Step: 32090   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:51:38,897-Speed 5496.30 samples/sec   Loss 10.1681   LearningRate 0.2375   Epoch: 3   Global Step: 32100   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:51:46,375-Speed 5478.28 samples/sec   Loss 10.1062   LearningRate 0.2374   Epoch: 3   Global Step: 32110   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:51:53,837-Speed 5490.21 samples/sec   Loss 10.0788   LearningRate 0.2374   Epoch: 3   Global Step: 32120   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:52:01,285-Speed 5500.56 samples/sec   Loss 10.1160   LearningRate 0.2374   Epoch: 3   Global Step: 32130   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 01:52:08,802-Speed 5449.55 samples/sec   Loss 10.1092   LearningRate 0.2374   Epoch: 3   Global Step: 32140   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 01:52:16,252-Speed 5498.63 samples/sec   Loss 10.1718   LearningRate 0.2373   Epoch: 3   Global Step: 32150   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 01:52:23,699-Speed 5500.82 samples/sec   Loss 10.0776   LearningRate 0.2373   Epoch: 3   Global Step: 32160   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 01:52:31,271-Speed 5410.65 samples/sec   Loss 10.0758   LearningRate 0.2373   Epoch: 3   Global Step: 32170   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 01:52:38,767-Speed 5464.74 samples/sec   Loss 10.0858   LearningRate 0.2373   Epoch: 3   Global Step: 32180   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 01:52:46,225-Speed 5492.54 samples/sec   Loss 10.0583   LearningRate 0.2372   Epoch: 3   Global Step: 32190   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 01:52:53,640-Speed 5525.07 samples/sec   Loss 10.0702   LearningRate 0.2372   Epoch: 3   Global Step: 32200   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 01:53:01,140-Speed 5462.21 samples/sec   Loss 10.0861   LearningRate 0.2372   Epoch: 3   Global Step: 32210   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 01:53:08,543-Speed 5533.17 samples/sec   Loss 10.0896   LearningRate 0.2371   Epoch: 3   Global Step: 32220   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 01:53:16,046-Speed 5460.10 samples/sec   Loss 10.1656   LearningRate 0.2371   Epoch: 3   Global Step: 32230   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:53:23,521-Speed 5480.17 samples/sec   Loss 10.1078   LearningRate 0.2371   Epoch: 3   Global Step: 32240   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:53:31,139-Speed 5377.51 samples/sec   Loss 10.0845   LearningRate 0.2371   Epoch: 3   Global Step: 32250   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:53:38,600-Speed 5490.47 samples/sec   Loss 10.1310   LearningRate 0.2370   Epoch: 3   Global Step: 32260   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:53:46,099-Speed 5462.29 samples/sec   Loss 10.0681   LearningRate 0.2370   Epoch: 3   Global Step: 32270   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:53:53,625-Speed 5443.51 samples/sec   Loss 10.0916   LearningRate 0.2370   Epoch: 3   Global Step: 32280   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:54:01,260-Speed 5365.73 samples/sec   Loss 10.1138   LearningRate 0.2370   Epoch: 3   Global Step: 32290   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:54:08,765-Speed 5458.41 samples/sec   Loss 10.0926   LearningRate 0.2369   Epoch: 3   Global Step: 32300   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:54:16,270-Speed 5458.45 samples/sec   Loss 10.0529   LearningRate 0.2369   Epoch: 3   Global Step: 32310   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:54:23,747-Speed 5478.35 samples/sec   Loss 10.0992   LearningRate 0.2369   Epoch: 3   Global Step: 32320   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:54:31,205-Speed 5493.54 samples/sec   Loss 9.9902   LearningRate 0.2368   Epoch: 3   Global Step: 32330   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:54:38,627-Speed 5519.50 samples/sec   Loss 10.0997   LearningRate 0.2368   Epoch: 3   Global Step: 32340   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:54:46,163-Speed 5435.66 samples/sec   Loss 10.0236   LearningRate 0.2368   Epoch: 3   Global Step: 32350   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:54:53,579-Speed 5523.80 samples/sec   Loss 10.1284   LearningRate 0.2368   Epoch: 3   Global Step: 32360   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 01:55:01,053-Speed 5480.85 samples/sec   Loss 10.0605   LearningRate 0.2367   Epoch: 3   Global Step: 32370   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 01:55:08,501-Speed 5500.71 samples/sec   Loss 10.1065   LearningRate 0.2367   Epoch: 3   Global Step: 32380   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 01:55:15,999-Speed 5462.86 samples/sec   Loss 10.0655   LearningRate 0.2367   Epoch: 3   Global Step: 32390   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 01:55:23,452-Speed 5496.00 samples/sec   Loss 10.1331   LearningRate 0.2367   Epoch: 3   Global Step: 32400   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 01:55:30,919-Speed 5486.83 samples/sec   Loss 10.1795   LearningRate 0.2366   Epoch: 3   Global Step: 32410   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 01:55:38,360-Speed 5505.78 samples/sec   Loss 10.1632   LearningRate 0.2366   Epoch: 3   Global Step: 32420   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 01:55:45,861-Speed 5460.72 samples/sec   Loss 10.0153   LearningRate 0.2366   Epoch: 3   Global Step: 32430   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 01:55:53,364-Speed 5459.84 samples/sec   Loss 10.0520   LearningRate 0.2365   Epoch: 3   Global Step: 32440   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 01:56:00,853-Speed 5469.76 samples/sec   Loss 10.0059   LearningRate 0.2365   Epoch: 3   Global Step: 32450   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 01:56:08,372-Speed 5448.59 samples/sec   Loss 10.0906   LearningRate 0.2365   Epoch: 3   Global Step: 32460   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:56:15,881-Speed 5455.44 samples/sec   Loss 10.1085   LearningRate 0.2365   Epoch: 3   Global Step: 32470   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:56:23,278-Speed 5537.80 samples/sec   Loss 10.0273   LearningRate 0.2364   Epoch: 3   Global Step: 32480   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:56:30,721-Speed 5504.31 samples/sec   Loss 10.0369   LearningRate 0.2364   Epoch: 3   Global Step: 32490   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:56:38,157-Speed 5509.32 samples/sec   Loss 10.0953   LearningRate 0.2364   Epoch: 3   Global Step: 32500   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:56:45,576-Speed 5521.65 samples/sec   Loss 10.0111   LearningRate 0.2364   Epoch: 3   Global Step: 32510   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:56:53,079-Speed 5459.63 samples/sec   Loss 10.0245   LearningRate 0.2363   Epoch: 3   Global Step: 32520   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:57:00,555-Speed 5479.29 samples/sec   Loss 10.0155   LearningRate 0.2363   Epoch: 3   Global Step: 32530   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:57:08,006-Speed 5498.44 samples/sec   Loss 10.2048   LearningRate 0.2363   Epoch: 3   Global Step: 32540   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:57:15,538-Speed 5438.48 samples/sec   Loss 10.0579   LearningRate 0.2363   Epoch: 3   Global Step: 32550   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 01:57:23,099-Speed 5418.16 samples/sec   Loss 10.0210   LearningRate 0.2362   Epoch: 3   Global Step: 32560   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:57:30,650-Speed 5425.02 samples/sec   Loss 10.0458   LearningRate 0.2362   Epoch: 3   Global Step: 32570   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:57:38,211-Speed 5418.43 samples/sec   Loss 10.0509   LearningRate 0.2362   Epoch: 3   Global Step: 32580   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:57:45,658-Speed 5500.36 samples/sec   Loss 10.1048   LearningRate 0.2361   Epoch: 3   Global Step: 32590   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:57:53,096-Speed 5507.43 samples/sec   Loss 10.0083   LearningRate 0.2361   Epoch: 3   Global Step: 32600   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:58:00,721-Speed 5372.55 samples/sec   Loss 10.0561   LearningRate 0.2361   Epoch: 3   Global Step: 32610   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:58:08,169-Speed 5500.83 samples/sec   Loss 10.1262   LearningRate 0.2361   Epoch: 3   Global Step: 32620   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:58:15,795-Speed 5371.49 samples/sec   Loss 10.0304   LearningRate 0.2360   Epoch: 3   Global Step: 32630   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:58:23,311-Speed 5450.17 samples/sec   Loss 10.0466   LearningRate 0.2360   Epoch: 3   Global Step: 32640   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:58:30,797-Speed 5472.26 samples/sec   Loss 10.0973   LearningRate 0.2360   Epoch: 3   Global Step: 32650   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:58:38,312-Speed 5451.27 samples/sec   Loss 9.9806   LearningRate 0.2360   Epoch: 3   Global Step: 32660   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 01:58:45,822-Speed 5455.03 samples/sec   Loss 10.0499   LearningRate 0.2359   Epoch: 3   Global Step: 32670   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 01:58:53,324-Speed 5460.38 samples/sec   Loss 10.1019   LearningRate 0.2359   Epoch: 3   Global Step: 32680   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:59:00,924-Speed 5390.39 samples/sec   Loss 10.1044   LearningRate 0.2359   Epoch: 3   Global Step: 32690   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:59:08,446-Speed 5445.87 samples/sec   Loss 9.9945   LearningRate 0.2358   Epoch: 3   Global Step: 32700   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:59:16,137-Speed 5326.25 samples/sec   Loss 10.0562   LearningRate 0.2358   Epoch: 3   Global Step: 32710   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:59:23,679-Speed 5431.80 samples/sec   Loss 10.0634   LearningRate 0.2358   Epoch: 3   Global Step: 32720   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:59:31,360-Speed 5333.03 samples/sec   Loss 10.1010   LearningRate 0.2358   Epoch: 3   Global Step: 32730   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:59:39,002-Speed 5361.21 samples/sec   Loss 10.0693   LearningRate 0.2357   Epoch: 3   Global Step: 32740   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:59:46,512-Speed 5454.31 samples/sec   Loss 10.0690   LearningRate 0.2357   Epoch: 3   Global Step: 32750   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 01:59:54,073-Speed 5418.27 samples/sec   Loss 10.0301   LearningRate 0.2357   Epoch: 3   Global Step: 32760   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:00:01,576-Speed 5459.59 samples/sec   Loss 10.0865   LearningRate 0.2357   Epoch: 3   Global Step: 32770   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:00:09,091-Speed 5451.18 samples/sec   Loss 10.0910   LearningRate 0.2356   Epoch: 3   Global Step: 32780   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 02:00:16,610-Speed 5448.42 samples/sec   Loss 10.0830   LearningRate 0.2356   Epoch: 3   Global Step: 32790   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 02:00:24,136-Speed 5442.95 samples/sec   Loss 10.0553   LearningRate 0.2356   Epoch: 3   Global Step: 32800   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:00:31,644-Speed 5456.71 samples/sec   Loss 10.0137   LearningRate 0.2355   Epoch: 3   Global Step: 32810   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:00:39,287-Speed 5359.68 samples/sec   Loss 10.1181   LearningRate 0.2355   Epoch: 3   Global Step: 32820   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:00:46,798-Speed 5454.23 samples/sec   Loss 10.0892   LearningRate 0.2355   Epoch: 3   Global Step: 32830   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:00:54,756-Speed 5147.61 samples/sec   Loss 10.0233   LearningRate 0.2355   Epoch: 3   Global Step: 32840   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:01:02,236-Speed 5476.79 samples/sec   Loss 10.0461   LearningRate 0.2354   Epoch: 3   Global Step: 32850   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:01:09,772-Speed 5435.94 samples/sec   Loss 10.0587   LearningRate 0.2354   Epoch: 3   Global Step: 32860   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:01:17,256-Speed 5474.14 samples/sec   Loss 9.9419   LearningRate 0.2354   Epoch: 3   Global Step: 32870   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:01:24,704-Speed 5499.78 samples/sec   Loss 10.0314   LearningRate 0.2354   Epoch: 3   Global Step: 32880   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 02:01:32,204-Speed 5462.00 samples/sec   Loss 10.1129   LearningRate 0.2353   Epoch: 3   Global Step: 32890   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 02:01:39,778-Speed 5408.90 samples/sec   Loss 10.0286   LearningRate 0.2353   Epoch: 3   Global Step: 32900   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 02:01:47,302-Speed 5444.86 samples/sec   Loss 10.0909   LearningRate 0.2353   Epoch: 3   Global Step: 32910   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 02:01:54,861-Speed 5419.19 samples/sec   Loss 10.0550   LearningRate 0.2353   Epoch: 3   Global Step: 32920   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 02:02:02,547-Speed 5329.88 samples/sec   Loss 10.0885   LearningRate 0.2352   Epoch: 3   Global Step: 32930   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 02:02:10,013-Speed 5487.34 samples/sec   Loss 10.0163   LearningRate 0.2352   Epoch: 3   Global Step: 32940   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 02:02:17,462-Speed 5499.14 samples/sec   Loss 10.1539   LearningRate 0.2352   Epoch: 3   Global Step: 32950   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 02:02:25,002-Speed 5433.70 samples/sec   Loss 10.1263   LearningRate 0.2351   Epoch: 3   Global Step: 32960   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 02:02:32,605-Speed 5387.88 samples/sec   Loss 9.9483   LearningRate 0.2351   Epoch: 3   Global Step: 32970   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 02:02:40,293-Speed 5328.22 samples/sec   Loss 10.0537   LearningRate 0.2351   Epoch: 3   Global Step: 32980   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:02:47,873-Speed 5404.65 samples/sec   Loss 10.0501   LearningRate 0.2351   Epoch: 3   Global Step: 32990   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:02:55,323-Speed 5498.76 samples/sec   Loss 10.0927   LearningRate 0.2350   Epoch: 3   Global Step: 33000   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:03:02,941-Speed 5377.30 samples/sec   Loss 10.0579   LearningRate 0.2350   Epoch: 3   Global Step: 33010   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:03:10,498-Speed 5421.04 samples/sec   Loss 10.0043   LearningRate 0.2350   Epoch: 3   Global Step: 33020   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:03:18,156-Speed 5349.51 samples/sec   Loss 9.9707   LearningRate 0.2350   Epoch: 3   Global Step: 33030   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:03:25,766-Speed 5383.08 samples/sec   Loss 9.9482   LearningRate 0.2349   Epoch: 3   Global Step: 33040   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:03:33,283-Speed 5450.14 samples/sec   Loss 9.9846   LearningRate 0.2349   Epoch: 3   Global Step: 33050   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:03:40,901-Speed 5377.46 samples/sec   Loss 10.1659   LearningRate 0.2349   Epoch: 3   Global Step: 33060   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:03:48,396-Speed 5465.43 samples/sec   Loss 10.0646   LearningRate 0.2348   Epoch: 3   Global Step: 33070   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:03:55,970-Speed 5408.67 samples/sec   Loss 9.9310   LearningRate 0.2348   Epoch: 3   Global Step: 33080   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:04:03,610-Speed 5361.53 samples/sec   Loss 10.0066   LearningRate 0.2348   Epoch: 3   Global Step: 33090   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:04:11,059-Speed 5499.72 samples/sec   Loss 10.0170   LearningRate 0.2348   Epoch: 3   Global Step: 33100   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:04:18,604-Speed 5430.01 samples/sec   Loss 10.0116   LearningRate 0.2347   Epoch: 3   Global Step: 33110   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:04:26,228-Speed 5373.28 samples/sec   Loss 9.9787   LearningRate 0.2347   Epoch: 3   Global Step: 33120   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:04:33,822-Speed 5394.31 samples/sec   Loss 9.9524   LearningRate 0.2347   Epoch: 3   Global Step: 33130   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:04:41,482-Speed 5348.20 samples/sec   Loss 9.9953   LearningRate 0.2347   Epoch: 3   Global Step: 33140   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:04:49,059-Speed 5406.45 samples/sec   Loss 10.0294   LearningRate 0.2346   Epoch: 3   Global Step: 33150   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:04:56,690-Speed 5368.44 samples/sec   Loss 10.0198   LearningRate 0.2346   Epoch: 3   Global Step: 33160   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:05:04,282-Speed 5395.51 samples/sec   Loss 10.0718   LearningRate 0.2346   Epoch: 3   Global Step: 33170   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:05:12,017-Speed 5295.97 samples/sec   Loss 10.0067   LearningRate 0.2346   Epoch: 3   Global Step: 33180   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 02:05:19,579-Speed 5417.79 samples/sec   Loss 10.0881   LearningRate 0.2345   Epoch: 3   Global Step: 33190   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:05:27,167-Speed 5398.56 samples/sec   Loss 9.9114   LearningRate 0.2345   Epoch: 3   Global Step: 33200   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:05:34,868-Speed 5318.69 samples/sec   Loss 10.0349   LearningRate 0.2345   Epoch: 3   Global Step: 33210   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:05:42,456-Speed 5399.32 samples/sec   Loss 10.0119   LearningRate 0.2344   Epoch: 3   Global Step: 33220   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:05:49,997-Speed 5432.49 samples/sec   Loss 9.9967   LearningRate 0.2344   Epoch: 3   Global Step: 33230   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:05:57,572-Speed 5407.59 samples/sec   Loss 10.0592   LearningRate 0.2344   Epoch: 3   Global Step: 33240   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:06:05,197-Speed 5372.36 samples/sec   Loss 10.0038   LearningRate 0.2344   Epoch: 3   Global Step: 33250   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:06:12,863-Speed 5344.07 samples/sec   Loss 10.0490   LearningRate 0.2343   Epoch: 3   Global Step: 33260   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:06:20,536-Speed 5338.71 samples/sec   Loss 10.0212   LearningRate 0.2343   Epoch: 3   Global Step: 33270   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:06:28,108-Speed 5410.35 samples/sec   Loss 9.9788   LearningRate 0.2343   Epoch: 3   Global Step: 33280   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:06:35,723-Speed 5379.35 samples/sec   Loss 10.0443   LearningRate 0.2343   Epoch: 3   Global Step: 33290   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:06:43,495-Speed 5270.42 samples/sec   Loss 10.0429   LearningRate 0.2342   Epoch: 3   Global Step: 33300   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:06:51,241-Speed 5289.06 samples/sec   Loss 10.0617   LearningRate 0.2342   Epoch: 3   Global Step: 33310   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:06:58,929-Speed 5327.90 samples/sec   Loss 9.9376   LearningRate 0.2342   Epoch: 3   Global Step: 33320   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:07:06,437-Speed 5456.91 samples/sec   Loss 9.9592   LearningRate 0.2341   Epoch: 3   Global Step: 33330   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:07:13,972-Speed 5436.62 samples/sec   Loss 10.0130   LearningRate 0.2341   Epoch: 3   Global Step: 33340   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:07:21,524-Speed 5424.90 samples/sec   Loss 10.0456   LearningRate 0.2341   Epoch: 3   Global Step: 33350   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:07:29,024-Speed 5462.06 samples/sec   Loss 10.0292   LearningRate 0.2341   Epoch: 3   Global Step: 33360   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:07:36,515-Speed 5468.42 samples/sec   Loss 10.0561   LearningRate 0.2340   Epoch: 3   Global Step: 33370   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:07:44,224-Speed 5313.71 samples/sec   Loss 10.0302   LearningRate 0.2340   Epoch: 3   Global Step: 33380   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:07:51,814-Speed 5397.78 samples/sec   Loss 10.0122   LearningRate 0.2340   Epoch: 3   Global Step: 33390   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:07:59,407-Speed 5395.25 samples/sec   Loss 10.0663   LearningRate 0.2340   Epoch: 3   Global Step: 33400   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:08:06,948-Speed 5432.49 samples/sec   Loss 9.9899   LearningRate 0.2339   Epoch: 3   Global Step: 33410   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:08:14,557-Speed 5383.16 samples/sec   Loss 10.0010   LearningRate 0.2339   Epoch: 3   Global Step: 33420   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:08:22,038-Speed 5475.84 samples/sec   Loss 10.0142   LearningRate 0.2339   Epoch: 3   Global Step: 33430   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:08:29,660-Speed 5374.83 samples/sec   Loss 9.9937   LearningRate 0.2339   Epoch: 3   Global Step: 33440   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:08:37,241-Speed 5403.97 samples/sec   Loss 10.0433   LearningRate 0.2338   Epoch: 3   Global Step: 33450   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:08:44,873-Speed 5366.90 samples/sec   Loss 10.0101   LearningRate 0.2338   Epoch: 3   Global Step: 33460   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:08:52,531-Speed 5349.96 samples/sec   Loss 9.9678   LearningRate 0.2338   Epoch: 3   Global Step: 33470   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:09:00,134-Speed 5388.07 samples/sec   Loss 9.9623   LearningRate 0.2337   Epoch: 3   Global Step: 33480   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:09:07,660-Speed 5442.86 samples/sec   Loss 10.0714   LearningRate 0.2337   Epoch: 3   Global Step: 33490   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:09:15,081-Speed 5520.47 samples/sec   Loss 10.0701   LearningRate 0.2337   Epoch: 3   Global Step: 33500   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:09:22,696-Speed 5379.92 samples/sec   Loss 10.0268   LearningRate 0.2337   Epoch: 3   Global Step: 33510   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:09:30,332-Speed 5365.03 samples/sec   Loss 9.9400   LearningRate 0.2336   Epoch: 3   Global Step: 33520   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:09:37,853-Speed 5445.93 samples/sec   Loss 9.9743   LearningRate 0.2336   Epoch: 3   Global Step: 33530   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:09:45,509-Speed 5350.90 samples/sec   Loss 9.9741   LearningRate 0.2336   Epoch: 3   Global Step: 33540   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:09:53,236-Speed 5314.52 samples/sec   Loss 10.0249   LearningRate 0.2336   Epoch: 3   Global Step: 33550   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:10:00,696-Speed 5491.51 samples/sec   Loss 9.9028   LearningRate 0.2335   Epoch: 3   Global Step: 33560   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:10:08,294-Speed 5391.12 samples/sec   Loss 9.9942   LearningRate 0.2335   Epoch: 3   Global Step: 33570   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:10:15,900-Speed 5386.31 samples/sec   Loss 9.9948   LearningRate 0.2335   Epoch: 3   Global Step: 33580   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 02:10:23,400-Speed 5462.37 samples/sec   Loss 9.9872   LearningRate 0.2334   Epoch: 3   Global Step: 33590   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 02:10:30,933-Speed 5438.35 samples/sec   Loss 9.9684   LearningRate 0.2334   Epoch: 3   Global Step: 33600   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 02:10:38,537-Speed 5386.79 samples/sec   Loss 9.9523   LearningRate 0.2334   Epoch: 3   Global Step: 33610   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 02:10:46,177-Speed 5361.90 samples/sec   Loss 10.0828   LearningRate 0.2334   Epoch: 3   Global Step: 33620   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 02:10:53,768-Speed 5397.07 samples/sec   Loss 10.0853   LearningRate 0.2333   Epoch: 3   Global Step: 33630   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 02:11:01,443-Speed 5337.75 samples/sec   Loss 10.0548   LearningRate 0.2333   Epoch: 3   Global Step: 33640   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:11:09,040-Speed 5391.61 samples/sec   Loss 9.9700   LearningRate 0.2333   Epoch: 3   Global Step: 33650   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:11:16,619-Speed 5405.31 samples/sec   Loss 9.9994   LearningRate 0.2333   Epoch: 3   Global Step: 33660   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:11:24,143-Speed 5444.78 samples/sec   Loss 10.0432   LearningRate 0.2332   Epoch: 3   Global Step: 33670   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:11:31,642-Speed 5462.83 samples/sec   Loss 9.9439   LearningRate 0.2332   Epoch: 3   Global Step: 33680   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:11:39,198-Speed 5421.70 samples/sec   Loss 10.0093   LearningRate 0.2332   Epoch: 3   Global Step: 33690   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:11:46,811-Speed 5380.55 samples/sec   Loss 9.9549   LearningRate 0.2332   Epoch: 3   Global Step: 33700   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:11:54,445-Speed 5366.82 samples/sec   Loss 9.9280   LearningRate 0.2331   Epoch: 3   Global Step: 33710   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:12:02,085-Speed 5362.15 samples/sec   Loss 9.9468   LearningRate 0.2331   Epoch: 3   Global Step: 33720   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:12:09,649-Speed 5415.64 samples/sec   Loss 9.9632   LearningRate 0.2331   Epoch: 3   Global Step: 33730   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:12:17,255-Speed 5386.03 samples/sec   Loss 9.9579   LearningRate 0.2330   Epoch: 3   Global Step: 33740   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 02:12:24,811-Speed 5421.46 samples/sec   Loss 9.9607   LearningRate 0.2330   Epoch: 3   Global Step: 33750   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:12:32,320-Speed 5455.37 samples/sec   Loss 9.9677   LearningRate 0.2330   Epoch: 3   Global Step: 33760   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:12:39,882-Speed 5417.45 samples/sec   Loss 10.1327   LearningRate 0.2330   Epoch: 3   Global Step: 33770   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:12:47,423-Speed 5432.68 samples/sec   Loss 10.0378   LearningRate 0.2329   Epoch: 3   Global Step: 33780   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:12:54,978-Speed 5422.26 samples/sec   Loss 10.0696   LearningRate 0.2329   Epoch: 3   Global Step: 33790   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:13:02,664-Speed 5329.53 samples/sec   Loss 9.9348   LearningRate 0.2329   Epoch: 3   Global Step: 33800   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:13:10,147-Speed 5474.01 samples/sec   Loss 9.9598   LearningRate 0.2329   Epoch: 3   Global Step: 33810   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:13:17,722-Speed 5408.26 samples/sec   Loss 9.9437   LearningRate 0.2328   Epoch: 3   Global Step: 33820   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:13:25,310-Speed 5398.49 samples/sec   Loss 9.9847   LearningRate 0.2328   Epoch: 3   Global Step: 33830   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:13:32,916-Speed 5385.92 samples/sec   Loss 10.0147   LearningRate 0.2328   Epoch: 3   Global Step: 33840   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:13:40,583-Speed 5343.35 samples/sec   Loss 9.9893   LearningRate 0.2327   Epoch: 3   Global Step: 33850   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 02:13:48,161-Speed 5405.92 samples/sec   Loss 9.9783   LearningRate 0.2327   Epoch: 3   Global Step: 33860   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 02:13:55,730-Speed 5412.20 samples/sec   Loss 9.9234   LearningRate 0.2327   Epoch: 3   Global Step: 33870   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:14:03,383-Speed 5352.33 samples/sec   Loss 9.9890   LearningRate 0.2327   Epoch: 3   Global Step: 33880   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:14:11,033-Speed 5354.91 samples/sec   Loss 9.9720   LearningRate 0.2326   Epoch: 3   Global Step: 33890   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:14:18,637-Speed 5387.70 samples/sec   Loss 9.9883   LearningRate 0.2326   Epoch: 3   Global Step: 33900   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:14:26,180-Speed 5430.46 samples/sec   Loss 9.9901   LearningRate 0.2326   Epoch: 3   Global Step: 33910   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:14:33,735-Speed 5422.36 samples/sec   Loss 10.0201   LearningRate 0.2326   Epoch: 3   Global Step: 33920   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:14:41,294-Speed 5419.80 samples/sec   Loss 10.0044   LearningRate 0.2325   Epoch: 3   Global Step: 33930   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:14:48,886-Speed 5396.09 samples/sec   Loss 9.9784   LearningRate 0.2325   Epoch: 3   Global Step: 33940   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:14:56,435-Speed 5426.48 samples/sec   Loss 10.0272   LearningRate 0.2325   Epoch: 3   Global Step: 33950   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:15:03,942-Speed 5457.06 samples/sec   Loss 9.9716   LearningRate 0.2325   Epoch: 3   Global Step: 33960   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:15:11,480-Speed 5434.31 samples/sec   Loss 9.9826   LearningRate 0.2324   Epoch: 3   Global Step: 33970   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:15:19,037-Speed 5420.68 samples/sec   Loss 9.9900   LearningRate 0.2324   Epoch: 3   Global Step: 33980   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:15:26,635-Speed 5391.79 samples/sec   Loss 9.9179   LearningRate 0.2324   Epoch: 3   Global Step: 33990   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:15:34,216-Speed 5404.03 samples/sec   Loss 9.9214   LearningRate 0.2323   Epoch: 3   Global Step: 34000   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:16:18,581-[lfw][34000]XNorm: 20.941988
Training: 2022-01-08 02:16:18,582-[lfw][34000]Accuracy-Flip: 0.99767+-0.00260
Training: 2022-01-08 02:16:18,582-[lfw][34000]Accuracy-Highest: 0.99800
Training: 2022-01-08 02:17:10,601-[cfp_fp][34000]XNorm: 18.630953
Training: 2022-01-08 02:17:10,603-[cfp_fp][34000]Accuracy-Flip: 0.98371+-0.00496
Training: 2022-01-08 02:17:10,603-[cfp_fp][34000]Accuracy-Highest: 0.98457
Training: 2022-01-08 02:17:56,228-[agedb_30][34000]XNorm: 21.064765
Training: 2022-01-08 02:17:56,229-[agedb_30][34000]Accuracy-Flip: 0.96700+-0.00945
Training: 2022-01-08 02:17:56,230-[agedb_30][34000]Accuracy-Highest: 0.97167
Training: 2022-01-08 02:18:03,765-Speed 273.89 samples/sec   Loss 9.9618   LearningRate 0.2323   Epoch: 3   Global Step: 34010   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:18:11,294-Speed 5441.36 samples/sec   Loss 9.9410   LearningRate 0.2323   Epoch: 3   Global Step: 34020   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:18:18,775-Speed 5476.55 samples/sec   Loss 9.9420   LearningRate 0.2323   Epoch: 3   Global Step: 34030   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 02:18:26,266-Speed 5469.29 samples/sec   Loss 10.0507   LearningRate 0.2322   Epoch: 3   Global Step: 34040   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 02:18:33,898-Speed 5368.00 samples/sec   Loss 9.9681   LearningRate 0.2322   Epoch: 3   Global Step: 34050   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 02:18:41,587-Speed 5328.02 samples/sec   Loss 9.9864   LearningRate 0.2322   Epoch: 3   Global Step: 34060   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 02:18:49,099-Speed 5453.70 samples/sec   Loss 9.9463   LearningRate 0.2322   Epoch: 3   Global Step: 34070   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 02:18:56,646-Speed 5428.03 samples/sec   Loss 9.9710   LearningRate 0.2321   Epoch: 3   Global Step: 34080   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 02:19:06,423-Speed 5478.85 samples/sec   Loss 9.9550   LearningRate 0.2321   Epoch: 3   Global Step: 34090   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 02:19:14,005-Speed 5402.41 samples/sec   Loss 9.9057   LearningRate 0.2321   Epoch: 3   Global Step: 34100   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 02:19:21,595-Speed 5397.99 samples/sec   Loss 10.0501   LearningRate 0.2321   Epoch: 3   Global Step: 34110   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 02:19:29,225-Speed 5368.65 samples/sec   Loss 9.9946   LearningRate 0.2320   Epoch: 3   Global Step: 34120   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-08 02:19:36,854-Speed 5369.40 samples/sec   Loss 9.9690   LearningRate 0.2320   Epoch: 3   Global Step: 34130   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:19:44,667-Speed 5243.97 samples/sec   Loss 9.9672   LearningRate 0.2320   Epoch: 3   Global Step: 34140   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:19:52,226-Speed 5419.52 samples/sec   Loss 9.9579   LearningRate 0.2319   Epoch: 3   Global Step: 34150   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:19:59,830-Speed 5387.55 samples/sec   Loss 9.9156   LearningRate 0.2319   Epoch: 3   Global Step: 34160   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:20:07,403-Speed 5408.71 samples/sec   Loss 9.9308   LearningRate 0.2319   Epoch: 3   Global Step: 34170   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:20:15,132-Speed 5300.51 samples/sec   Loss 9.9804   LearningRate 0.2319   Epoch: 3   Global Step: 34180   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:20:22,744-Speed 5381.77 samples/sec   Loss 9.9489   LearningRate 0.2318   Epoch: 3   Global Step: 34190   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:20:30,282-Speed 5434.99 samples/sec   Loss 9.9507   LearningRate 0.2318   Epoch: 3   Global Step: 34200   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:20:37,703-Speed 5519.72 samples/sec   Loss 10.0102   LearningRate 0.2318   Epoch: 3   Global Step: 34210   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:20:45,213-Speed 5455.19 samples/sec   Loss 9.8880   LearningRate 0.2318   Epoch: 3   Global Step: 34220   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-08 02:20:52,720-Speed 5457.58 samples/sec   Loss 9.9156   LearningRate 0.2317   Epoch: 3   Global Step: 34230   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:21:00,290-Speed 5411.50 samples/sec   Loss 9.9556   LearningRate 0.2317   Epoch: 3   Global Step: 34240   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:21:07,906-Speed 5378.39 samples/sec   Loss 9.9805   LearningRate 0.2317   Epoch: 3   Global Step: 34250   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:21:15,598-Speed 5325.95 samples/sec   Loss 9.9332   LearningRate 0.2317   Epoch: 3   Global Step: 34260   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:21:23,200-Speed 5388.91 samples/sec   Loss 9.9373   LearningRate 0.2316   Epoch: 3   Global Step: 34270   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:21:30,881-Speed 5333.55 samples/sec   Loss 9.9993   LearningRate 0.2316   Epoch: 3   Global Step: 34280   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:21:38,527-Speed 5357.79 samples/sec   Loss 9.8383   LearningRate 0.2316   Epoch: 3   Global Step: 34290   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:21:46,087-Speed 5418.51 samples/sec   Loss 9.9966   LearningRate 0.2315   Epoch: 3   Global Step: 34300   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:21:53,668-Speed 5403.71 samples/sec   Loss 9.9714   LearningRate 0.2315   Epoch: 3   Global Step: 34310   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:22:01,281-Speed 5381.17 samples/sec   Loss 9.9635   LearningRate 0.2315   Epoch: 3   Global Step: 34320   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:22:08,863-Speed 5402.64 samples/sec   Loss 9.8914   LearningRate 0.2315   Epoch: 3   Global Step: 34330   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:22:16,530-Speed 5343.23 samples/sec   Loss 9.9540   LearningRate 0.2314   Epoch: 3   Global Step: 34340   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:22:23,973-Speed 5504.76 samples/sec   Loss 9.9509   LearningRate 0.2314   Epoch: 3   Global Step: 34350   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:22:31,549-Speed 5407.57 samples/sec   Loss 9.9799   LearningRate 0.2314   Epoch: 3   Global Step: 34360   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:22:39,080-Speed 5439.22 samples/sec   Loss 9.9146   LearningRate 0.2314   Epoch: 3   Global Step: 34370   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:22:46,606-Speed 5443.38 samples/sec   Loss 9.9770   LearningRate 0.2313   Epoch: 3   Global Step: 34380   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:22:54,171-Speed 5414.61 samples/sec   Loss 9.9043   LearningRate 0.2313   Epoch: 3   Global Step: 34390   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:23:01,725-Speed 5423.18 samples/sec   Loss 9.9732   LearningRate 0.2313   Epoch: 3   Global Step: 34400   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:23:09,255-Speed 5440.23 samples/sec   Loss 9.9264   LearningRate 0.2313   Epoch: 3   Global Step: 34410   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:23:16,731-Speed 5479.56 samples/sec   Loss 9.9553   LearningRate 0.2312   Epoch: 3   Global Step: 34420   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:23:24,343-Speed 5381.57 samples/sec   Loss 9.8893   LearningRate 0.2312   Epoch: 3   Global Step: 34430   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-08 02:23:31,861-Speed 5448.97 samples/sec   Loss 9.8879   LearningRate 0.2312   Epoch: 3   Global Step: 34440   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:23:39,418-Speed 5420.61 samples/sec   Loss 9.8826   LearningRate 0.2311   Epoch: 3   Global Step: 34450   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:23:46,911-Speed 5467.47 samples/sec   Loss 10.0466   LearningRate 0.2311   Epoch: 3   Global Step: 34460   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:23:54,821-Speed 5178.70 samples/sec   Loss 10.0013   LearningRate 0.2311   Epoch: 3   Global Step: 34470   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:24:02,371-Speed 5426.06 samples/sec   Loss 10.0077   LearningRate 0.2311   Epoch: 3   Global Step: 34480   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:24:09,858-Speed 5471.37 samples/sec   Loss 9.8854   LearningRate 0.2310   Epoch: 3   Global Step: 34490   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:24:17,293-Speed 5510.05 samples/sec   Loss 9.9039   LearningRate 0.2310   Epoch: 3   Global Step: 34500   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:24:24,925-Speed 5367.82 samples/sec   Loss 9.9512   LearningRate 0.2310   Epoch: 3   Global Step: 34510   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:24:32,502-Speed 5406.14 samples/sec   Loss 9.9634   LearningRate 0.2310   Epoch: 3   Global Step: 34520   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:24:39,918-Speed 5524.21 samples/sec   Loss 9.9096   LearningRate 0.2309   Epoch: 3   Global Step: 34530   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-08 02:24:47,498-Speed 5404.08 samples/sec   Loss 9.9632   LearningRate 0.2309   Epoch: 3   Global Step: 34540   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:24:55,058-Speed 5418.83 samples/sec   Loss 9.8873   LearningRate 0.2309   Epoch: 3   Global Step: 34550   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:25:02,663-Speed 5386.99 samples/sec   Loss 9.8897   LearningRate 0.2308   Epoch: 3   Global Step: 34560   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:25:10,194-Speed 5439.15 samples/sec   Loss 9.9076   LearningRate 0.2308   Epoch: 3   Global Step: 34570   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:25:17,778-Speed 5401.73 samples/sec   Loss 9.9617   LearningRate 0.2308   Epoch: 3   Global Step: 34580   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:25:25,256-Speed 5477.91 samples/sec   Loss 9.9015   LearningRate 0.2308   Epoch: 3   Global Step: 34590   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:25:32,803-Speed 5428.32 samples/sec   Loss 9.8990   LearningRate 0.2307   Epoch: 3   Global Step: 34600   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:25:40,286-Speed 5474.23 samples/sec   Loss 9.9715   LearningRate 0.2307   Epoch: 3   Global Step: 34610   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:25:47,902-Speed 5379.51 samples/sec   Loss 9.9915   LearningRate 0.2307   Epoch: 3   Global Step: 34620   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:25:55,686-Speed 5262.22 samples/sec   Loss 9.8830   LearningRate 0.2307   Epoch: 3   Global Step: 34630   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:26:03,383-Speed 5322.88 samples/sec   Loss 9.9067   LearningRate 0.2306   Epoch: 3   Global Step: 34640   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:26:10,980-Speed 5391.98 samples/sec   Loss 9.8664   LearningRate 0.2306   Epoch: 3   Global Step: 34650   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:26:18,526-Speed 5428.71 samples/sec   Loss 9.8511   LearningRate 0.2306   Epoch: 3   Global Step: 34660   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:26:26,171-Speed 5358.60 samples/sec   Loss 10.0037   LearningRate 0.2306   Epoch: 3   Global Step: 34670   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:26:33,655-Speed 5473.71 samples/sec   Loss 9.9402   LearningRate 0.2305   Epoch: 3   Global Step: 34680   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:26:41,138-Speed 5474.14 samples/sec   Loss 9.9560   LearningRate 0.2305   Epoch: 3   Global Step: 34690   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:26:48,850-Speed 5312.27 samples/sec   Loss 9.9131   LearningRate 0.2305   Epoch: 3   Global Step: 34700   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:26:56,455-Speed 5386.43 samples/sec   Loss 9.9355   LearningRate 0.2304   Epoch: 3   Global Step: 34710   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:27:04,020-Speed 5415.39 samples/sec   Loss 9.8645   LearningRate 0.2304   Epoch: 3   Global Step: 34720   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:27:11,442-Speed 5519.00 samples/sec   Loss 9.8986   LearningRate 0.2304   Epoch: 3   Global Step: 34730   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:27:18,989-Speed 5427.92 samples/sec   Loss 9.9602   LearningRate 0.2304   Epoch: 3   Global Step: 34740   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:27:26,552-Speed 5417.00 samples/sec   Loss 9.9289   LearningRate 0.2303   Epoch: 3   Global Step: 34750   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:27:34,099-Speed 5427.84 samples/sec   Loss 9.8856   LearningRate 0.2303   Epoch: 3   Global Step: 34760   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:27:41,629-Speed 5439.97 samples/sec   Loss 9.9187   LearningRate 0.2303   Epoch: 3   Global Step: 34770   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:27:49,149-Speed 5448.18 samples/sec   Loss 9.9196   LearningRate 0.2303   Epoch: 3   Global Step: 34780   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:27:56,656-Speed 5457.16 samples/sec   Loss 9.8447   LearningRate 0.2302   Epoch: 3   Global Step: 34790   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:28:04,183-Speed 5442.62 samples/sec   Loss 9.9367   LearningRate 0.2302   Epoch: 3   Global Step: 34800   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:28:11,608-Speed 5516.87 samples/sec   Loss 9.9594   LearningRate 0.2302   Epoch: 3   Global Step: 34810   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:28:19,079-Speed 5483.29 samples/sec   Loss 9.8860   LearningRate 0.2302   Epoch: 3   Global Step: 34820   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:28:26,719-Speed 5362.18 samples/sec   Loss 9.8929   LearningRate 0.2301   Epoch: 3   Global Step: 34830   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:28:34,370-Speed 5354.59 samples/sec   Loss 9.8905   LearningRate 0.2301   Epoch: 3   Global Step: 34840   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:28:41,793-Speed 5519.04 samples/sec   Loss 9.9431   LearningRate 0.2301   Epoch: 3   Global Step: 34850   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:28:49,418-Speed 5372.12 samples/sec   Loss 9.8397   LearningRate 0.2300   Epoch: 3   Global Step: 34860   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:28:57,047-Speed 5370.17 samples/sec   Loss 9.9237   LearningRate 0.2300   Epoch: 3   Global Step: 34870   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:29:04,763-Speed 5309.34 samples/sec   Loss 9.9727   LearningRate 0.2300   Epoch: 3   Global Step: 34880   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:29:12,386-Speed 5373.95 samples/sec   Loss 9.9250   LearningRate 0.2300   Epoch: 3   Global Step: 34890   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:29:20,024-Speed 5363.25 samples/sec   Loss 9.8776   LearningRate 0.2299   Epoch: 3   Global Step: 34900   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:29:27,595-Speed 5410.93 samples/sec   Loss 9.9222   LearningRate 0.2299   Epoch: 3   Global Step: 34910   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:29:35,050-Speed 5495.02 samples/sec   Loss 9.9223   LearningRate 0.2299   Epoch: 3   Global Step: 34920   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:29:42,607-Speed 5421.37 samples/sec   Loss 9.9176   LearningRate 0.2299   Epoch: 3   Global Step: 34930   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:29:50,259-Speed 5353.22 samples/sec   Loss 9.8753   LearningRate 0.2298   Epoch: 3   Global Step: 34940   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:29:57,890-Speed 5367.94 samples/sec   Loss 9.8837   LearningRate 0.2298   Epoch: 3   Global Step: 34950   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:30:05,456-Speed 5415.21 samples/sec   Loss 9.9112   LearningRate 0.2298   Epoch: 3   Global Step: 34960   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:30:12,882-Speed 5516.90 samples/sec   Loss 9.8994   LearningRate 0.2298   Epoch: 3   Global Step: 34970   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:30:20,432-Speed 5425.94 samples/sec   Loss 9.8468   LearningRate 0.2297   Epoch: 3   Global Step: 34980   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:30:28,000-Speed 5412.44 samples/sec   Loss 9.8698   LearningRate 0.2297   Epoch: 3   Global Step: 34990   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:30:35,518-Speed 5449.26 samples/sec   Loss 9.8289   LearningRate 0.2297   Epoch: 3   Global Step: 35000   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:30:43,071-Speed 5423.95 samples/sec   Loss 9.9002   LearningRate 0.2296   Epoch: 3   Global Step: 35010   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:30:50,767-Speed 5322.10 samples/sec   Loss 9.8205   LearningRate 0.2296   Epoch: 3   Global Step: 35020   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:30:58,271-Speed 5458.88 samples/sec   Loss 9.8815   LearningRate 0.2296   Epoch: 3   Global Step: 35030   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:31:05,789-Speed 5449.38 samples/sec   Loss 9.8650   LearningRate 0.2296   Epoch: 3   Global Step: 35040   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:31:13,376-Speed 5400.13 samples/sec   Loss 9.9243   LearningRate 0.2295   Epoch: 3   Global Step: 35050   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:31:20,997-Speed 5374.86 samples/sec   Loss 9.8387   LearningRate 0.2295   Epoch: 3   Global Step: 35060   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:31:28,627-Speed 5368.80 samples/sec   Loss 9.8735   LearningRate 0.2295   Epoch: 3   Global Step: 35070   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:31:36,217-Speed 5397.10 samples/sec   Loss 9.8058   LearningRate 0.2295   Epoch: 3   Global Step: 35080   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:31:43,828-Speed 5383.31 samples/sec   Loss 9.8342   LearningRate 0.2294   Epoch: 3   Global Step: 35090   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:31:51,410-Speed 5402.15 samples/sec   Loss 9.8853   LearningRate 0.2294   Epoch: 3   Global Step: 35100   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:31:58,994-Speed 5401.92 samples/sec   Loss 9.8612   LearningRate 0.2294   Epoch: 3   Global Step: 35110   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:32:06,528-Speed 5437.61 samples/sec   Loss 9.9682   LearningRate 0.2294   Epoch: 3   Global Step: 35120   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:32:14,073-Speed 5429.32 samples/sec   Loss 9.9459   LearningRate 0.2293   Epoch: 3   Global Step: 35130   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:32:21,644-Speed 5411.05 samples/sec   Loss 9.8663   LearningRate 0.2293   Epoch: 3   Global Step: 35140   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:32:29,220-Speed 5407.12 samples/sec   Loss 9.9129   LearningRate 0.2293   Epoch: 3   Global Step: 35150   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:32:36,830-Speed 5382.93 samples/sec   Loss 9.9718   LearningRate 0.2292   Epoch: 3   Global Step: 35160   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:32:44,362-Speed 5439.00 samples/sec   Loss 9.8694   LearningRate 0.2292   Epoch: 3   Global Step: 35170   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:32:51,879-Speed 5449.53 samples/sec   Loss 9.8639   LearningRate 0.2292   Epoch: 3   Global Step: 35180   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:32:59,467-Speed 5399.02 samples/sec   Loss 9.9330   LearningRate 0.2292   Epoch: 3   Global Step: 35190   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:33:06,975-Speed 5458.83 samples/sec   Loss 9.9524   LearningRate 0.2291   Epoch: 3   Global Step: 35200   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:33:14,448-Speed 5482.25 samples/sec   Loss 9.8203   LearningRate 0.2291   Epoch: 3   Global Step: 35210   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:33:21,920-Speed 5482.26 samples/sec   Loss 9.8969   LearningRate 0.2291   Epoch: 3   Global Step: 35220   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:33:29,341-Speed 5520.31 samples/sec   Loss 9.8771   LearningRate 0.2291   Epoch: 3   Global Step: 35230   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:33:37,013-Speed 5339.81 samples/sec   Loss 9.8617   LearningRate 0.2290   Epoch: 3   Global Step: 35240   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:33:44,575-Speed 5417.54 samples/sec   Loss 9.8491   LearningRate 0.2290   Epoch: 3   Global Step: 35250   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:33:52,251-Speed 5336.17 samples/sec   Loss 9.8338   LearningRate 0.2290   Epoch: 3   Global Step: 35260   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:34:00,071-Speed 5238.59 samples/sec   Loss 9.9262   LearningRate 0.2290   Epoch: 3   Global Step: 35270   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:34:07,612-Speed 5432.18 samples/sec   Loss 9.7823   LearningRate 0.2289   Epoch: 3   Global Step: 35280   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:34:15,191-Speed 5405.51 samples/sec   Loss 9.8737   LearningRate 0.2289   Epoch: 3   Global Step: 35290   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:34:22,740-Speed 5426.32 samples/sec   Loss 9.8264   LearningRate 0.2289   Epoch: 3   Global Step: 35300   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:34:30,233-Speed 5466.94 samples/sec   Loss 9.8567   LearningRate 0.2288   Epoch: 3   Global Step: 35310   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:34:37,770-Speed 5435.76 samples/sec   Loss 9.8626   LearningRate 0.2288   Epoch: 3   Global Step: 35320   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:34:45,289-Speed 5447.95 samples/sec   Loss 9.8311   LearningRate 0.2288   Epoch: 3   Global Step: 35330   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:34:52,748-Speed 5491.89 samples/sec   Loss 9.8003   LearningRate 0.2288   Epoch: 3   Global Step: 35340   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:35:00,314-Speed 5414.17 samples/sec   Loss 9.8944   LearningRate 0.2287   Epoch: 3   Global Step: 35350   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:35:07,818-Speed 5459.30 samples/sec   Loss 9.8213   LearningRate 0.2287   Epoch: 3   Global Step: 35360   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:35:15,292-Speed 5481.46 samples/sec   Loss 9.8794   LearningRate 0.2287   Epoch: 3   Global Step: 35370   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:35:22,835-Speed 5430.79 samples/sec   Loss 9.8256   LearningRate 0.2287   Epoch: 3   Global Step: 35380   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:35:30,405-Speed 5411.61 samples/sec   Loss 9.8548   LearningRate 0.2286   Epoch: 3   Global Step: 35390   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:35:37,871-Speed 5486.89 samples/sec   Loss 9.8583   LearningRate 0.2286   Epoch: 3   Global Step: 35400   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:35:45,436-Speed 5415.26 samples/sec   Loss 9.9108   LearningRate 0.2286   Epoch: 3   Global Step: 35410   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:35:53,050-Speed 5380.53 samples/sec   Loss 9.8586   LearningRate 0.2286   Epoch: 3   Global Step: 35420   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:36:00,499-Speed 5499.33 samples/sec   Loss 9.8800   LearningRate 0.2285   Epoch: 3   Global Step: 35430   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:36:07,918-Speed 5521.88 samples/sec   Loss 9.8317   LearningRate 0.2285   Epoch: 3   Global Step: 35440   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:36:15,379-Speed 5490.73 samples/sec   Loss 9.8411   LearningRate 0.2285   Epoch: 3   Global Step: 35450   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:36:22,905-Speed 5442.54 samples/sec   Loss 9.9178   LearningRate 0.2285   Epoch: 3   Global Step: 35460   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:36:30,475-Speed 5411.94 samples/sec   Loss 9.8951   LearningRate 0.2284   Epoch: 3   Global Step: 35470   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:36:38,030-Speed 5422.35 samples/sec   Loss 9.8829   LearningRate 0.2284   Epoch: 3   Global Step: 35480   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:36:45,599-Speed 5412.03 samples/sec   Loss 9.7892   LearningRate 0.2284   Epoch: 3   Global Step: 35490   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:36:53,159-Speed 5418.59 samples/sec   Loss 9.8547   LearningRate 0.2283   Epoch: 3   Global Step: 35500   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:37:00,772-Speed 5381.33 samples/sec   Loss 9.9035   LearningRate 0.2283   Epoch: 3   Global Step: 35510   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:37:08,309-Speed 5435.69 samples/sec   Loss 9.8185   LearningRate 0.2283   Epoch: 3   Global Step: 35520   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:37:15,728-Speed 5521.32 samples/sec   Loss 9.8354   LearningRate 0.2283   Epoch: 3   Global Step: 35530   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:37:23,215-Speed 5471.61 samples/sec   Loss 9.9463   LearningRate 0.2282   Epoch: 3   Global Step: 35540   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:37:30,763-Speed 5427.51 samples/sec   Loss 9.8896   LearningRate 0.2282   Epoch: 3   Global Step: 35550   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:37:38,321-Speed 5420.06 samples/sec   Loss 9.9118   LearningRate 0.2282   Epoch: 3   Global Step: 35560   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:37:45,802-Speed 5475.86 samples/sec   Loss 9.8834   LearningRate 0.2282   Epoch: 3   Global Step: 35570   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:37:53,265-Speed 5488.92 samples/sec   Loss 9.8972   LearningRate 0.2281   Epoch: 3   Global Step: 35580   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:38:00,793-Speed 5442.14 samples/sec   Loss 9.8915   LearningRate 0.2281   Epoch: 3   Global Step: 35590   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:38:08,441-Speed 5356.65 samples/sec   Loss 9.8591   LearningRate 0.2281   Epoch: 3   Global Step: 35600   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:38:15,889-Speed 5499.78 samples/sec   Loss 9.8205   LearningRate 0.2281   Epoch: 3   Global Step: 35610   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:38:23,360-Speed 5483.24 samples/sec   Loss 9.8567   LearningRate 0.2280   Epoch: 3   Global Step: 35620   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:38:30,804-Speed 5503.91 samples/sec   Loss 9.9124   LearningRate 0.2280   Epoch: 3   Global Step: 35630   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:38:38,246-Speed 5503.69 samples/sec   Loss 9.8072   LearningRate 0.2280   Epoch: 3   Global Step: 35640   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:38:45,818-Speed 5410.18 samples/sec   Loss 9.8617   LearningRate 0.2279   Epoch: 3   Global Step: 35650   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:38:53,449-Speed 5369.06 samples/sec   Loss 9.7957   LearningRate 0.2279   Epoch: 3   Global Step: 35660   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:39:01,022-Speed 5409.50 samples/sec   Loss 9.8985   LearningRate 0.2279   Epoch: 3   Global Step: 35670   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:39:08,763-Speed 5292.05 samples/sec   Loss 9.8622   LearningRate 0.2279   Epoch: 3   Global Step: 35680   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:39:16,314-Speed 5424.55 samples/sec   Loss 9.7981   LearningRate 0.2278   Epoch: 3   Global Step: 35690   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:39:23,901-Speed 5400.20 samples/sec   Loss 9.8515   LearningRate 0.2278   Epoch: 3   Global Step: 35700   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:39:31,332-Speed 5512.52 samples/sec   Loss 9.8615   LearningRate 0.2278   Epoch: 3   Global Step: 35710   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:39:38,973-Speed 5360.78 samples/sec   Loss 9.8863   LearningRate 0.2278   Epoch: 3   Global Step: 35720   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:39:46,737-Speed 5276.45 samples/sec   Loss 9.8877   LearningRate 0.2277   Epoch: 3   Global Step: 35730   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:39:54,198-Speed 5490.45 samples/sec   Loss 9.8300   LearningRate 0.2277   Epoch: 3   Global Step: 35740   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:40:01,723-Speed 5444.42 samples/sec   Loss 9.8290   LearningRate 0.2277   Epoch: 3   Global Step: 35750   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-08 02:40:09,203-Speed 5475.94 samples/sec   Loss 9.7677   LearningRate 0.2277   Epoch: 3   Global Step: 35760   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-08 02:40:16,696-Speed 5467.39 samples/sec   Loss 9.8382   LearningRate 0.2276   Epoch: 3   Global Step: 35770   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-08 02:40:24,101-Speed 5532.64 samples/sec   Loss 9.8209   LearningRate 0.2276   Epoch: 3   Global Step: 35780   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-08 02:40:31,500-Speed 5536.05 samples/sec   Loss 9.9046   LearningRate 0.2276   Epoch: 3   Global Step: 35790   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-08 02:40:39,063-Speed 5416.66 samples/sec   Loss 9.8389   LearningRate 0.2275   Epoch: 3   Global Step: 35800   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-08 02:40:46,682-Speed 5377.10 samples/sec   Loss 9.8414   LearningRate 0.2275   Epoch: 3   Global Step: 35810   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-08 02:40:54,172-Speed 5469.18 samples/sec   Loss 9.8221   LearningRate 0.2275   Epoch: 3   Global Step: 35820   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-08 02:41:01,809-Speed 5364.09 samples/sec   Loss 9.9626   LearningRate 0.2275   Epoch: 3   Global Step: 35830   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-08 02:41:09,378-Speed 5412.22 samples/sec   Loss 9.8227   LearningRate 0.2274   Epoch: 3   Global Step: 35840   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-08 02:41:17,065-Speed 5329.23 samples/sec   Loss 9.8580   LearningRate 0.2274   Epoch: 3   Global Step: 35850   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:41:24,774-Speed 5314.50 samples/sec   Loss 9.8128   LearningRate 0.2274   Epoch: 3   Global Step: 35860   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:41:32,309-Speed 5436.16 samples/sec   Loss 9.8419   LearningRate 0.2274   Epoch: 3   Global Step: 35870   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:41:39,881-Speed 5409.79 samples/sec   Loss 9.8262   LearningRate 0.2273   Epoch: 3   Global Step: 35880   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:41:47,369-Speed 5471.19 samples/sec   Loss 9.8019   LearningRate 0.2273   Epoch: 3   Global Step: 35890   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:41:54,950-Speed 5403.57 samples/sec   Loss 9.8333   LearningRate 0.2273   Epoch: 3   Global Step: 35900   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:42:02,379-Speed 5514.05 samples/sec   Loss 9.9363   LearningRate 0.2273   Epoch: 3   Global Step: 35910   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:42:09,861-Speed 5475.36 samples/sec   Loss 9.8533   LearningRate 0.2272   Epoch: 3   Global Step: 35920   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:42:17,374-Speed 5452.65 samples/sec   Loss 9.8731   LearningRate 0.2272   Epoch: 3   Global Step: 35930   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:42:24,926-Speed 5424.17 samples/sec   Loss 9.8314   LearningRate 0.2272   Epoch: 3   Global Step: 35940   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:42:32,461-Speed 5436.80 samples/sec   Loss 9.9162   LearningRate 0.2272   Epoch: 3   Global Step: 35950   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:42:39,889-Speed 5514.58 samples/sec   Loss 9.8785   LearningRate 0.2271   Epoch: 3   Global Step: 35960   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:42:47,450-Speed 5417.84 samples/sec   Loss 9.8195   LearningRate 0.2271   Epoch: 3   Global Step: 35970   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:42:55,036-Speed 5400.46 samples/sec   Loss 9.7869   LearningRate 0.2271   Epoch: 3   Global Step: 35980   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:43:02,537-Speed 5461.67 samples/sec   Loss 9.7629   LearningRate 0.2270   Epoch: 3   Global Step: 35990   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:43:10,078-Speed 5431.58 samples/sec   Loss 9.7796   LearningRate 0.2270   Epoch: 3   Global Step: 36000   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:43:54,071-[lfw][36000]XNorm: 21.759819
Training: 2022-01-08 02:43:54,072-[lfw][36000]Accuracy-Flip: 0.99750+-0.00318
Training: 2022-01-08 02:43:54,073-[lfw][36000]Accuracy-Highest: 0.99800
Training: 2022-01-08 02:44:46,041-[cfp_fp][36000]XNorm: 19.605702
Training: 2022-01-08 02:44:46,042-[cfp_fp][36000]Accuracy-Flip: 0.97829+-0.00758
Training: 2022-01-08 02:44:46,043-[cfp_fp][36000]Accuracy-Highest: 0.98457
Training: 2022-01-08 02:45:31,662-[agedb_30][36000]XNorm: 21.594704
Training: 2022-01-08 02:45:31,663-[agedb_30][36000]Accuracy-Flip: 0.97250+-0.00772
Training: 2022-01-08 02:45:31,664-[agedb_30][36000]Accuracy-Highest: 0.97250
Training: 2022-01-08 02:45:39,279-Speed 274.53 samples/sec   Loss 9.8549   LearningRate 0.2270   Epoch: 3   Global Step: 36010   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:45:46,713-Speed 5511.85 samples/sec   Loss 9.7913   LearningRate 0.2270   Epoch: 3   Global Step: 36020   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:45:54,192-Speed 5478.45 samples/sec   Loss 9.8088   LearningRate 0.2269   Epoch: 3   Global Step: 36030   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:46:01,839-Speed 5357.30 samples/sec   Loss 9.8020   LearningRate 0.2269   Epoch: 3   Global Step: 36040   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:46:09,619-Speed 5266.10 samples/sec   Loss 9.8769   LearningRate 0.2269   Epoch: 3   Global Step: 36050   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:46:17,295-Speed 5337.03 samples/sec   Loss 9.8396   LearningRate 0.2269   Epoch: 3   Global Step: 36060   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:46:24,802-Speed 5457.31 samples/sec   Loss 9.8483   LearningRate 0.2268   Epoch: 3   Global Step: 36070   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:46:32,326-Speed 5444.94 samples/sec   Loss 9.8590   LearningRate 0.2268   Epoch: 3   Global Step: 36080   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:46:39,929-Speed 5388.35 samples/sec   Loss 9.8844   LearningRate 0.2268   Epoch: 3   Global Step: 36090   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:46:47,777-Speed 5220.68 samples/sec   Loss 9.8534   LearningRate 0.2268   Epoch: 3   Global Step: 36100   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:46:55,350-Speed 5409.68 samples/sec   Loss 9.7857   LearningRate 0.2267   Epoch: 3   Global Step: 36110   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:47:02,866-Speed 5451.09 samples/sec   Loss 9.8005   LearningRate 0.2267   Epoch: 3   Global Step: 36120   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:47:10,390-Speed 5444.86 samples/sec   Loss 9.9096   LearningRate 0.2267   Epoch: 3   Global Step: 36130   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:47:17,910-Speed 5447.92 samples/sec   Loss 9.8090   LearningRate 0.2266   Epoch: 3   Global Step: 36140   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:47:25,438-Speed 5443.44 samples/sec   Loss 9.7908   LearningRate 0.2266   Epoch: 3   Global Step: 36150   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:47:33,124-Speed 5330.38 samples/sec   Loss 9.7375   LearningRate 0.2266   Epoch: 3   Global Step: 36160   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:47:40,810-Speed 5329.94 samples/sec   Loss 9.8530   LearningRate 0.2266   Epoch: 3   Global Step: 36170   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:47:48,347-Speed 5435.79 samples/sec   Loss 9.8141   LearningRate 0.2265   Epoch: 3   Global Step: 36180   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:47:55,854-Speed 5457.78 samples/sec   Loss 9.7381   LearningRate 0.2265   Epoch: 3   Global Step: 36190   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:48:03,323-Speed 5485.16 samples/sec   Loss 9.7251   LearningRate 0.2265   Epoch: 3   Global Step: 36200   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:48:11,154-Speed 5231.83 samples/sec   Loss 9.8164   LearningRate 0.2265   Epoch: 3   Global Step: 36210   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:48:18,776-Speed 5374.88 samples/sec   Loss 9.8319   LearningRate 0.2264   Epoch: 3   Global Step: 36220   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:48:26,393-Speed 5378.95 samples/sec   Loss 9.7496   LearningRate 0.2264   Epoch: 3   Global Step: 36230   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:48:33,848-Speed 5496.00 samples/sec   Loss 9.8062   LearningRate 0.2264   Epoch: 3   Global Step: 36240   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:48:41,229-Speed 5550.44 samples/sec   Loss 9.8920   LearningRate 0.2264   Epoch: 3   Global Step: 36250   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:48:48,704-Speed 5481.01 samples/sec   Loss 9.8203   LearningRate 0.2263   Epoch: 3   Global Step: 36260   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:48:56,290-Speed 5400.19 samples/sec   Loss 9.7937   LearningRate 0.2263   Epoch: 3   Global Step: 36270   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:49:03,934-Speed 5359.60 samples/sec   Loss 9.9010   LearningRate 0.2263   Epoch: 3   Global Step: 36280   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:49:11,428-Speed 5466.92 samples/sec   Loss 9.8206   LearningRate 0.2263   Epoch: 3   Global Step: 36290   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:49:18,921-Speed 5467.31 samples/sec   Loss 9.8234   LearningRate 0.2262   Epoch: 3   Global Step: 36300   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:49:26,471-Speed 5426.50 samples/sec   Loss 9.7776   LearningRate 0.2262   Epoch: 3   Global Step: 36310   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:49:34,034-Speed 5416.07 samples/sec   Loss 9.8210   LearningRate 0.2262   Epoch: 3   Global Step: 36320   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:49:41,538-Speed 5459.11 samples/sec   Loss 9.8357   LearningRate 0.2261   Epoch: 3   Global Step: 36330   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:49:49,115-Speed 5406.55 samples/sec   Loss 9.8293   LearningRate 0.2261   Epoch: 3   Global Step: 36340   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:49:56,600-Speed 5472.92 samples/sec   Loss 9.7646   LearningRate 0.2261   Epoch: 3   Global Step: 36350   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:50:04,060-Speed 5491.14 samples/sec   Loss 9.7958   LearningRate 0.2261   Epoch: 3   Global Step: 36360   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:50:11,526-Speed 5487.44 samples/sec   Loss 9.7663   LearningRate 0.2260   Epoch: 3   Global Step: 36370   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:50:18,963-Speed 5508.35 samples/sec   Loss 9.8196   LearningRate 0.2260   Epoch: 3   Global Step: 36380   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:50:26,412-Speed 5499.86 samples/sec   Loss 9.7516   LearningRate 0.2260   Epoch: 3   Global Step: 36390   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:50:33,886-Speed 5480.50 samples/sec   Loss 9.8119   LearningRate 0.2260   Epoch: 3   Global Step: 36400   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:50:41,399-Speed 5452.47 samples/sec   Loss 9.7193   LearningRate 0.2259   Epoch: 3   Global Step: 36410   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:50:48,886-Speed 5472.11 samples/sec   Loss 9.8062   LearningRate 0.2259   Epoch: 3   Global Step: 36420   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:50:56,260-Speed 5554.94 samples/sec   Loss 9.7149   LearningRate 0.2259   Epoch: 3   Global Step: 36430   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:51:03,754-Speed 5466.88 samples/sec   Loss 9.7663   LearningRate 0.2259   Epoch: 3   Global Step: 36440   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:51:11,295-Speed 5432.68 samples/sec   Loss 9.7326   LearningRate 0.2258   Epoch: 3   Global Step: 36450   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:51:18,863-Speed 5413.10 samples/sec   Loss 9.8331   LearningRate 0.2258   Epoch: 3   Global Step: 36460   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:51:26,419-Speed 5421.71 samples/sec   Loss 9.7593   LearningRate 0.2258   Epoch: 3   Global Step: 36470   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:51:34,048-Speed 5369.62 samples/sec   Loss 9.7456   LearningRate 0.2257   Epoch: 3   Global Step: 36480   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:51:41,707-Speed 5348.85 samples/sec   Loss 9.7169   LearningRate 0.2257   Epoch: 3   Global Step: 36490   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:51:49,120-Speed 5526.49 samples/sec   Loss 9.8569   LearningRate 0.2257   Epoch: 3   Global Step: 36500   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:51:56,602-Speed 5475.26 samples/sec   Loss 9.8231   LearningRate 0.2257   Epoch: 3   Global Step: 36510   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:52:04,081-Speed 5477.05 samples/sec   Loss 9.7225   LearningRate 0.2256   Epoch: 3   Global Step: 36520   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:52:11,623-Speed 5431.64 samples/sec   Loss 9.8483   LearningRate 0.2256   Epoch: 3   Global Step: 36530   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:52:19,022-Speed 5537.13 samples/sec   Loss 9.8420   LearningRate 0.2256   Epoch: 3   Global Step: 36540   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:52:26,626-Speed 5387.26 samples/sec   Loss 9.7690   LearningRate 0.2256   Epoch: 3   Global Step: 36550   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:52:34,196-Speed 5412.02 samples/sec   Loss 9.7522   LearningRate 0.2255   Epoch: 3   Global Step: 36560   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:52:41,696-Speed 5462.06 samples/sec   Loss 9.8408   LearningRate 0.2255   Epoch: 3   Global Step: 36570   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:52:49,303-Speed 5385.49 samples/sec   Loss 9.7215   LearningRate 0.2255   Epoch: 3   Global Step: 36580   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:52:56,789-Speed 5472.22 samples/sec   Loss 9.7522   LearningRate 0.2255   Epoch: 3   Global Step: 36590   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:53:04,297-Speed 5455.96 samples/sec   Loss 9.8004   LearningRate 0.2254   Epoch: 3   Global Step: 36600   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:53:11,829-Speed 5439.04 samples/sec   Loss 9.7891   LearningRate 0.2254   Epoch: 3   Global Step: 36610   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:53:19,406-Speed 5406.73 samples/sec   Loss 9.8100   LearningRate 0.2254   Epoch: 3   Global Step: 36620   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:53:26,819-Speed 5526.44 samples/sec   Loss 9.8197   LearningRate 0.2254   Epoch: 3   Global Step: 36630   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:53:34,284-Speed 5487.38 samples/sec   Loss 9.7699   LearningRate 0.2253   Epoch: 3   Global Step: 36640   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:53:41,748-Speed 5488.04 samples/sec   Loss 9.8327   LearningRate 0.2253   Epoch: 3   Global Step: 36650   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:53:49,293-Speed 5429.52 samples/sec   Loss 9.7373   LearningRate 0.2253   Epoch: 3   Global Step: 36660   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:53:56,794-Speed 5461.83 samples/sec   Loss 9.7029   LearningRate 0.2252   Epoch: 3   Global Step: 36670   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:54:04,265-Speed 5482.96 samples/sec   Loss 9.8087   LearningRate 0.2252   Epoch: 3   Global Step: 36680   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:54:11,723-Speed 5493.12 samples/sec   Loss 9.6878   LearningRate 0.2252   Epoch: 3   Global Step: 36690   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:54:19,261-Speed 5434.25 samples/sec   Loss 9.7194   LearningRate 0.2252   Epoch: 3   Global Step: 36700   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:54:26,792-Speed 5440.29 samples/sec   Loss 9.7509   LearningRate 0.2251   Epoch: 3   Global Step: 36710   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:54:34,298-Speed 5457.33 samples/sec   Loss 9.7913   LearningRate 0.2251   Epoch: 3   Global Step: 36720   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:54:41,719-Speed 5520.69 samples/sec   Loss 9.7651   LearningRate 0.2251   Epoch: 3   Global Step: 36730   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:54:49,209-Speed 5469.09 samples/sec   Loss 9.7861   LearningRate 0.2251   Epoch: 3   Global Step: 36740   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:54:56,742-Speed 5438.26 samples/sec   Loss 9.7785   LearningRate 0.2250   Epoch: 3   Global Step: 36750   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:55:04,211-Speed 5484.93 samples/sec   Loss 9.7762   LearningRate 0.2250   Epoch: 3   Global Step: 36760   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:55:11,655-Speed 5503.11 samples/sec   Loss 9.7801   LearningRate 0.2250   Epoch: 3   Global Step: 36770   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:55:19,372-Speed 5308.21 samples/sec   Loss 9.8219   LearningRate 0.2250   Epoch: 3   Global Step: 36780   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:55:26,946-Speed 5409.22 samples/sec   Loss 9.8356   LearningRate 0.2249   Epoch: 3   Global Step: 36790   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:55:34,431-Speed 5472.44 samples/sec   Loss 9.7523   LearningRate 0.2249   Epoch: 3   Global Step: 36800   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:55:42,006-Speed 5408.40 samples/sec   Loss 9.8835   LearningRate 0.2249   Epoch: 3   Global Step: 36810   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:55:49,533-Speed 5442.46 samples/sec   Loss 9.6753   LearningRate 0.2249   Epoch: 3   Global Step: 36820   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:55:56,982-Speed 5499.45 samples/sec   Loss 9.7816   LearningRate 0.2248   Epoch: 3   Global Step: 36830   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:56:04,468-Speed 5472.23 samples/sec   Loss 9.7933   LearningRate 0.2248   Epoch: 3   Global Step: 36840   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:56:11,878-Speed 5528.87 samples/sec   Loss 9.7397   LearningRate 0.2248   Epoch: 3   Global Step: 36850   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:56:19,313-Speed 5509.58 samples/sec   Loss 9.7686   LearningRate 0.2247   Epoch: 3   Global Step: 36860   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:56:26,743-Speed 5513.33 samples/sec   Loss 9.7602   LearningRate 0.2247   Epoch: 3   Global Step: 36870   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:56:34,256-Speed 5452.78 samples/sec   Loss 9.6757   LearningRate 0.2247   Epoch: 3   Global Step: 36880   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:56:41,705-Speed 5499.60 samples/sec   Loss 9.6953   LearningRate 0.2247   Epoch: 3   Global Step: 36890   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:56:49,128-Speed 5518.65 samples/sec   Loss 9.7797   LearningRate 0.2246   Epoch: 3   Global Step: 36900   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:56:56,585-Speed 5493.66 samples/sec   Loss 9.7704   LearningRate 0.2246   Epoch: 3   Global Step: 36910   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 02:57:03,975-Speed 5543.62 samples/sec   Loss 9.8386   LearningRate 0.2246   Epoch: 3   Global Step: 36920   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:57:11,507-Speed 5439.25 samples/sec   Loss 9.7969   LearningRate 0.2246   Epoch: 3   Global Step: 36930   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:57:18,964-Speed 5493.34 samples/sec   Loss 9.7051   LearningRate 0.2245   Epoch: 3   Global Step: 36940   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:57:26,403-Speed 5507.00 samples/sec   Loss 9.7980   LearningRate 0.2245   Epoch: 3   Global Step: 36950   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:57:33,867-Speed 5488.75 samples/sec   Loss 9.7665   LearningRate 0.2245   Epoch: 3   Global Step: 36960   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:57:41,348-Speed 5475.59 samples/sec   Loss 9.8191   LearningRate 0.2245   Epoch: 3   Global Step: 36970   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:57:48,845-Speed 5464.19 samples/sec   Loss 9.7403   LearningRate 0.2244   Epoch: 3   Global Step: 36980   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:57:56,485-Speed 5361.75 samples/sec   Loss 9.7464   LearningRate 0.2244   Epoch: 3   Global Step: 36990   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:58:03,987-Speed 5461.06 samples/sec   Loss 9.6354   LearningRate 0.2244   Epoch: 3   Global Step: 37000   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:58:11,569-Speed 5402.84 samples/sec   Loss 9.7771   LearningRate 0.2244   Epoch: 3   Global Step: 37010   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:58:19,036-Speed 5486.16 samples/sec   Loss 9.8232   LearningRate 0.2243   Epoch: 3   Global Step: 37020   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:58:26,427-Speed 5542.62 samples/sec   Loss 9.7779   LearningRate 0.2243   Epoch: 3   Global Step: 37030   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:58:33,974-Speed 5428.69 samples/sec   Loss 9.8186   LearningRate 0.2243   Epoch: 3   Global Step: 37040   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:58:41,427-Speed 5496.52 samples/sec   Loss 9.8089   LearningRate 0.2242   Epoch: 3   Global Step: 37050   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:58:48,949-Speed 5445.65 samples/sec   Loss 9.7854   LearningRate 0.2242   Epoch: 3   Global Step: 37060   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:58:56,412-Speed 5488.89 samples/sec   Loss 9.7379   LearningRate 0.2242   Epoch: 3   Global Step: 37070   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:59:03,902-Speed 5469.95 samples/sec   Loss 9.7121   LearningRate 0.2242   Epoch: 3   Global Step: 37080   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:59:11,449-Speed 5428.66 samples/sec   Loss 9.7707   LearningRate 0.2241   Epoch: 3   Global Step: 37090   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:59:18,879-Speed 5513.21 samples/sec   Loss 9.7841   LearningRate 0.2241   Epoch: 3   Global Step: 37100   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:59:26,375-Speed 5464.37 samples/sec   Loss 9.7377   LearningRate 0.2241   Epoch: 3   Global Step: 37110   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:59:33,903-Speed 5442.29 samples/sec   Loss 9.7085   LearningRate 0.2241   Epoch: 3   Global Step: 37120   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 02:59:41,396-Speed 5467.36 samples/sec   Loss 9.7015   LearningRate 0.2240   Epoch: 3   Global Step: 37130   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:59:48,867-Speed 5482.72 samples/sec   Loss 9.7315   LearningRate 0.2240   Epoch: 3   Global Step: 37140   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 02:59:56,240-Speed 5556.42 samples/sec   Loss 9.7265   LearningRate 0.2240   Epoch: 3   Global Step: 37150   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:00:03,626-Speed 5546.77 samples/sec   Loss 9.6412   LearningRate 0.2240   Epoch: 3   Global Step: 37160   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:00:11,131-Speed 5458.72 samples/sec   Loss 9.7965   LearningRate 0.2239   Epoch: 3   Global Step: 37170   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:00:18,545-Speed 5525.08 samples/sec   Loss 9.6863   LearningRate 0.2239   Epoch: 3   Global Step: 37180   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:00:26,014-Speed 5484.59 samples/sec   Loss 9.7755   LearningRate 0.2239   Epoch: 3   Global Step: 37190   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:00:33,473-Speed 5492.15 samples/sec   Loss 9.8328   LearningRate 0.2239   Epoch: 3   Global Step: 37200   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:00:40,880-Speed 5530.84 samples/sec   Loss 9.7114   LearningRate 0.2238   Epoch: 3   Global Step: 37210   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:00:48,322-Speed 5505.01 samples/sec   Loss 9.7512   LearningRate 0.2238   Epoch: 3   Global Step: 37220   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:00:55,901-Speed 5405.18 samples/sec   Loss 9.7631   LearningRate 0.2238   Epoch: 3   Global Step: 37230   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:01:03,492-Speed 5396.14 samples/sec   Loss 9.7875   LearningRate 0.2237   Epoch: 3   Global Step: 37240   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:01:11,043-Speed 5425.54 samples/sec   Loss 9.7949   LearningRate 0.2237   Epoch: 3   Global Step: 37250   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:01:18,534-Speed 5468.88 samples/sec   Loss 9.7578   LearningRate 0.2237   Epoch: 3   Global Step: 37260   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:01:26,140-Speed 5385.26 samples/sec   Loss 9.6971   LearningRate 0.2237   Epoch: 3   Global Step: 37270   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:01:33,623-Speed 5474.58 samples/sec   Loss 9.7094   LearningRate 0.2236   Epoch: 3   Global Step: 37280   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:01:41,122-Speed 5463.10 samples/sec   Loss 9.6871   LearningRate 0.2236   Epoch: 3   Global Step: 37290   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:01:48,539-Speed 5522.75 samples/sec   Loss 9.6385   LearningRate 0.2236   Epoch: 3   Global Step: 37300   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:01:55,962-Speed 5519.08 samples/sec   Loss 9.6887   LearningRate 0.2236   Epoch: 3   Global Step: 37310   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:02:03,561-Speed 5390.79 samples/sec   Loss 9.7406   LearningRate 0.2235   Epoch: 3   Global Step: 37320   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:02:11,120-Speed 5419.56 samples/sec   Loss 9.7200   LearningRate 0.2235   Epoch: 3   Global Step: 37330   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:02:18,649-Speed 5441.08 samples/sec   Loss 9.7775   LearningRate 0.2235   Epoch: 3   Global Step: 37340   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:02:26,192-Speed 5430.86 samples/sec   Loss 9.6944   LearningRate 0.2235   Epoch: 3   Global Step: 37350   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:02:33,680-Speed 5471.04 samples/sec   Loss 9.6820   LearningRate 0.2234   Epoch: 3   Global Step: 37360   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:02:41,146-Speed 5486.84 samples/sec   Loss 9.7545   LearningRate 0.2234   Epoch: 3   Global Step: 37370   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:02:48,713-Speed 5413.71 samples/sec   Loss 9.6856   LearningRate 0.2234   Epoch: 3   Global Step: 37380   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:02:56,130-Speed 5522.88 samples/sec   Loss 9.7305   LearningRate 0.2234   Epoch: 3   Global Step: 37390   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:03:03,614-Speed 5473.76 samples/sec   Loss 9.7446   LearningRate 0.2233   Epoch: 3   Global Step: 37400   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:03:11,107-Speed 5467.91 samples/sec   Loss 9.7150   LearningRate 0.2233   Epoch: 3   Global Step: 37410   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:03:18,625-Speed 5448.81 samples/sec   Loss 9.6741   LearningRate 0.2233   Epoch: 3   Global Step: 37420   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:03:26,120-Speed 5465.24 samples/sec   Loss 9.7818   LearningRate 0.2232   Epoch: 3   Global Step: 37430   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:03:33,665-Speed 5430.08 samples/sec   Loss 9.7575   LearningRate 0.2232   Epoch: 3   Global Step: 37440   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:03:41,193-Speed 5442.01 samples/sec   Loss 9.7513   LearningRate 0.2232   Epoch: 3   Global Step: 37450   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:03:48,742-Speed 5426.34 samples/sec   Loss 9.7478   LearningRate 0.2232   Epoch: 3   Global Step: 37460   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:03:56,327-Speed 5400.45 samples/sec   Loss 9.7135   LearningRate 0.2231   Epoch: 3   Global Step: 37470   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:04:03,882-Speed 5422.31 samples/sec   Loss 9.7241   LearningRate 0.2231   Epoch: 3   Global Step: 37480   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:04:11,421-Speed 5434.11 samples/sec   Loss 9.6786   LearningRate 0.2231   Epoch: 3   Global Step: 37490   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 03:04:18,845-Speed 5518.35 samples/sec   Loss 9.7306   LearningRate 0.2231   Epoch: 3   Global Step: 37500   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 03:04:26,337-Speed 5467.43 samples/sec   Loss 9.6946   LearningRate 0.2230   Epoch: 3   Global Step: 37510   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 03:04:33,876-Speed 5433.63 samples/sec   Loss 9.7028   LearningRate 0.2230   Epoch: 3   Global Step: 37520   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 03:04:41,345-Speed 5484.46 samples/sec   Loss 9.7858   LearningRate 0.2230   Epoch: 3   Global Step: 37530   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 03:04:48,901-Speed 5421.93 samples/sec   Loss 9.6434   LearningRate 0.2230   Epoch: 3   Global Step: 37540   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 03:04:56,341-Speed 5506.14 samples/sec   Loss 9.7284   LearningRate 0.2229   Epoch: 3   Global Step: 37550   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:05:03,903-Speed 5417.13 samples/sec   Loss 9.7261   LearningRate 0.2229   Epoch: 3   Global Step: 37560   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:05:11,380-Speed 5478.63 samples/sec   Loss 9.7194   LearningRate 0.2229   Epoch: 3   Global Step: 37570   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:05:18,843-Speed 5489.14 samples/sec   Loss 9.7507   LearningRate 0.2229   Epoch: 3   Global Step: 37580   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:05:26,322-Speed 5477.72 samples/sec   Loss 9.7292   LearningRate 0.2228   Epoch: 3   Global Step: 37590   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:05:33,850-Speed 5441.57 samples/sec   Loss 9.6985   LearningRate 0.2228   Epoch: 3   Global Step: 37600   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:05:41,325-Speed 5480.68 samples/sec   Loss 9.6587   LearningRate 0.2228   Epoch: 3   Global Step: 37610   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:05:48,785-Speed 5491.75 samples/sec   Loss 9.6157   LearningRate 0.2227   Epoch: 3   Global Step: 37620   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:05:56,260-Speed 5479.88 samples/sec   Loss 9.6791   LearningRate 0.2227   Epoch: 3   Global Step: 37630   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:06:03,746-Speed 5472.52 samples/sec   Loss 9.7069   LearningRate 0.2227   Epoch: 3   Global Step: 37640   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:06:11,207-Speed 5490.55 samples/sec   Loss 9.6824   LearningRate 0.2227   Epoch: 3   Global Step: 37650   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 03:06:18,729-Speed 5446.12 samples/sec   Loss 9.7799   LearningRate 0.2226   Epoch: 3   Global Step: 37660   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:06:26,352-Speed 5374.55 samples/sec   Loss 9.7650   LearningRate 0.2226   Epoch: 3   Global Step: 37670   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:06:33,955-Speed 5387.74 samples/sec   Loss 9.7437   LearningRate 0.2226   Epoch: 3   Global Step: 37680   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:06:41,388-Speed 5511.02 samples/sec   Loss 9.6782   LearningRate 0.2226   Epoch: 3   Global Step: 37690   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:06:48,830-Speed 5504.61 samples/sec   Loss 9.7257   LearningRate 0.2225   Epoch: 3   Global Step: 37700   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:06:56,332-Speed 5460.54 samples/sec   Loss 9.7522   LearningRate 0.2225   Epoch: 3   Global Step: 37710   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:07:03,854-Speed 5446.63 samples/sec   Loss 9.7220   LearningRate 0.2225   Epoch: 3   Global Step: 37720   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:07:11,482-Speed 5370.42 samples/sec   Loss 9.6062   LearningRate 0.2225   Epoch: 3   Global Step: 37730   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:07:19,057-Speed 5407.64 samples/sec   Loss 9.6851   LearningRate 0.2224   Epoch: 3   Global Step: 37740   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:07:26,543-Speed 5472.56 samples/sec   Loss 9.6453   LearningRate 0.2224   Epoch: 3   Global Step: 37750   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:07:34,049-Speed 5457.54 samples/sec   Loss 9.6422   LearningRate 0.2224   Epoch: 3   Global Step: 37760   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 03:07:41,591-Speed 5431.24 samples/sec   Loss 9.7162   LearningRate 0.2224   Epoch: 3   Global Step: 37770   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 03:07:49,089-Speed 5464.45 samples/sec   Loss 9.6426   LearningRate 0.2223   Epoch: 3   Global Step: 37780   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 03:07:56,517-Speed 5515.11 samples/sec   Loss 9.6978   LearningRate 0.2223   Epoch: 3   Global Step: 37790   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:08:04,004-Speed 5470.91 samples/sec   Loss 9.6977   LearningRate 0.2223   Epoch: 3   Global Step: 37800   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:08:11,433-Speed 5514.27 samples/sec   Loss 9.6903   LearningRate 0.2222   Epoch: 3   Global Step: 37810   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:08:18,884-Speed 5498.07 samples/sec   Loss 9.6920   LearningRate 0.2222   Epoch: 3   Global Step: 37820   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:08:26,349-Speed 5487.89 samples/sec   Loss 9.7067   LearningRate 0.2222   Epoch: 3   Global Step: 37830   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:08:33,865-Speed 5450.68 samples/sec   Loss 9.6161   LearningRate 0.2222   Epoch: 3   Global Step: 37840   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:08:41,341-Speed 5479.49 samples/sec   Loss 9.6974   LearningRate 0.2221   Epoch: 3   Global Step: 37850   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:08:48,771-Speed 5513.34 samples/sec   Loss 9.7122   LearningRate 0.2221   Epoch: 3   Global Step: 37860   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:08:56,243-Speed 5482.49 samples/sec   Loss 9.6823   LearningRate 0.2221   Epoch: 3   Global Step: 37870   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:09:03,693-Speed 5498.86 samples/sec   Loss 9.7116   LearningRate 0.2221   Epoch: 3   Global Step: 37880   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:09:11,156-Speed 5489.28 samples/sec   Loss 9.6608   LearningRate 0.2220   Epoch: 3   Global Step: 37890   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 03:09:18,629-Speed 5481.47 samples/sec   Loss 9.7114   LearningRate 0.2220   Epoch: 3   Global Step: 37900   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:09:26,183-Speed 5423.81 samples/sec   Loss 9.7551   LearningRate 0.2220   Epoch: 3   Global Step: 37910   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:09:33,607-Speed 5518.01 samples/sec   Loss 9.6747   LearningRate 0.2220   Epoch: 3   Global Step: 37920   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:09:41,119-Speed 5453.43 samples/sec   Loss 9.7449   LearningRate 0.2219   Epoch: 3   Global Step: 37930   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:09:48,671-Speed 5424.10 samples/sec   Loss 9.6223   LearningRate 0.2219   Epoch: 3   Global Step: 37940   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:09:56,130-Speed 5492.78 samples/sec   Loss 9.6173   LearningRate 0.2219   Epoch: 3   Global Step: 37950   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:10:03,609-Speed 5476.93 samples/sec   Loss 9.6246   LearningRate 0.2219   Epoch: 3   Global Step: 37960   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:10:11,053-Speed 5503.60 samples/sec   Loss 9.6665   LearningRate 0.2218   Epoch: 3   Global Step: 37970   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:10:18,550-Speed 5463.33 samples/sec   Loss 9.7733   LearningRate 0.2218   Epoch: 3   Global Step: 37980   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:10:26,037-Speed 5471.81 samples/sec   Loss 9.6893   LearningRate 0.2218   Epoch: 3   Global Step: 37990   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:10:33,484-Speed 5501.15 samples/sec   Loss 9.6567   LearningRate 0.2218   Epoch: 3   Global Step: 38000   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:11:17,672-[lfw][38000]XNorm: 21.775363
Training: 2022-01-08 03:11:17,672-[lfw][38000]Accuracy-Flip: 0.99700+-0.00323
Training: 2022-01-08 03:11:17,673-[lfw][38000]Accuracy-Highest: 0.99800
Training: 2022-01-08 03:12:09,739-[cfp_fp][38000]XNorm: 19.744928
Training: 2022-01-08 03:12:09,740-[cfp_fp][38000]Accuracy-Flip: 0.98200+-0.00524
Training: 2022-01-08 03:12:09,740-[cfp_fp][38000]Accuracy-Highest: 0.98457
Training: 2022-01-08 03:12:55,301-[agedb_30][38000]XNorm: 21.892830
Training: 2022-01-08 03:12:55,302-[agedb_30][38000]Accuracy-Flip: 0.96900+-0.00646
Training: 2022-01-08 03:12:55,303-[agedb_30][38000]Accuracy-Highest: 0.97250
Training: 2022-01-08 03:13:02,859-Speed 274.21 samples/sec   Loss 9.6749   LearningRate 0.2217   Epoch: 3   Global Step: 38010   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:13:10,320-Speed 5493.54 samples/sec   Loss 9.6767   LearningRate 0.2217   Epoch: 3   Global Step: 38020   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:13:18,027-Speed 5315.96 samples/sec   Loss 9.7095   LearningRate 0.2217   Epoch: 3   Global Step: 38030   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:13:25,581-Speed 5423.85 samples/sec   Loss 9.5988   LearningRate 0.2216   Epoch: 3   Global Step: 38040   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:13:33,094-Speed 5452.89 samples/sec   Loss 9.6701   LearningRate 0.2216   Epoch: 3   Global Step: 38050   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:13:40,648-Speed 5424.16 samples/sec   Loss 9.6103   LearningRate 0.2216   Epoch: 3   Global Step: 38060   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:13:48,166-Speed 5449.52 samples/sec   Loss 9.6957   LearningRate 0.2216   Epoch: 3   Global Step: 38070   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:13:55,621-Speed 5494.70 samples/sec   Loss 9.5678   LearningRate 0.2215   Epoch: 3   Global Step: 38080   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:14:03,078-Speed 5493.23 samples/sec   Loss 9.6757   LearningRate 0.2215   Epoch: 3   Global Step: 38090   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:14:10,559-Speed 5476.14 samples/sec   Loss 9.6200   LearningRate 0.2215   Epoch: 3   Global Step: 38100   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-08 03:14:18,043-Speed 5473.99 samples/sec   Loss 9.6311   LearningRate 0.2215   Epoch: 3   Global Step: 38110   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:14:25,576-Speed 5437.97 samples/sec   Loss 9.6768   LearningRate 0.2214   Epoch: 3   Global Step: 38120   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:14:33,110-Speed 5437.59 samples/sec   Loss 9.6331   LearningRate 0.2214   Epoch: 3   Global Step: 38130   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:14:40,689-Speed 5405.14 samples/sec   Loss 9.7253   LearningRate 0.2214   Epoch: 3   Global Step: 38140   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:14:48,137-Speed 5499.98 samples/sec   Loss 9.7965   LearningRate 0.2214   Epoch: 3   Global Step: 38150   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:14:55,628-Speed 5468.28 samples/sec   Loss 9.6743   LearningRate 0.2213   Epoch: 3   Global Step: 38160   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:15:03,419-Speed 5258.03 samples/sec   Loss 9.6632   LearningRate 0.2213   Epoch: 3   Global Step: 38170   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:15:11,104-Speed 5331.12 samples/sec   Loss 9.5775   LearningRate 0.2213   Epoch: 3   Global Step: 38180   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:15:18,637-Speed 5437.69 samples/sec   Loss 9.6725   LearningRate 0.2213   Epoch: 3   Global Step: 38190   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:15:26,194-Speed 5421.00 samples/sec   Loss 9.6731   LearningRate 0.2212   Epoch: 3   Global Step: 38200   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:15:33,647-Speed 5496.77 samples/sec   Loss 9.6465   LearningRate 0.2212   Epoch: 3   Global Step: 38210   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:15:41,207-Speed 5418.40 samples/sec   Loss 9.7432   LearningRate 0.2212   Epoch: 3   Global Step: 38220   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:15:48,809-Speed 5388.99 samples/sec   Loss 9.5943   LearningRate 0.2211   Epoch: 3   Global Step: 38230   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:15:56,295-Speed 5472.23 samples/sec   Loss 9.6908   LearningRate 0.2211   Epoch: 3   Global Step: 38240   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:16:03,783-Speed 5470.49 samples/sec   Loss 9.6048   LearningRate 0.2211   Epoch: 3   Global Step: 38250   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:16:11,244-Speed 5490.42 samples/sec   Loss 9.5951   LearningRate 0.2211   Epoch: 3   Global Step: 38260   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:16:18,682-Speed 5508.10 samples/sec   Loss 9.6857   LearningRate 0.2210   Epoch: 3   Global Step: 38270   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:16:26,117-Speed 5509.16 samples/sec   Loss 9.6615   LearningRate 0.2210   Epoch: 3   Global Step: 38280   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:16:33,595-Speed 5478.36 samples/sec   Loss 9.6661   LearningRate 0.2210   Epoch: 3   Global Step: 38290   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:16:41,022-Speed 5515.96 samples/sec   Loss 9.6606   LearningRate 0.2210   Epoch: 3   Global Step: 38300   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:16:48,747-Speed 5302.96 samples/sec   Loss 9.6069   LearningRate 0.2209   Epoch: 3   Global Step: 38310   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:16:56,177-Speed 5513.44 samples/sec   Loss 9.6188   LearningRate 0.2209   Epoch: 3   Global Step: 38320   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:17:03,663-Speed 5472.43 samples/sec   Loss 9.6406   LearningRate 0.2209   Epoch: 3   Global Step: 38330   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:17:11,239-Speed 5407.76 samples/sec   Loss 9.6441   LearningRate 0.2209   Epoch: 3   Global Step: 38340   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:17:18,679-Speed 5506.38 samples/sec   Loss 9.7444   LearningRate 0.2208   Epoch: 3   Global Step: 38350   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:17:26,175-Speed 5464.80 samples/sec   Loss 9.6747   LearningRate 0.2208   Epoch: 3   Global Step: 38360   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:17:33,714-Speed 5433.50 samples/sec   Loss 9.6918   LearningRate 0.2208   Epoch: 3   Global Step: 38370   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:17:41,211-Speed 5464.94 samples/sec   Loss 9.6792   LearningRate 0.2208   Epoch: 3   Global Step: 38380   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:17:48,641-Speed 5513.50 samples/sec   Loss 9.6130   LearningRate 0.2207   Epoch: 3   Global Step: 38390   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:17:56,139-Speed 5463.20 samples/sec   Loss 9.6159   LearningRate 0.2207   Epoch: 3   Global Step: 38400   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:18:03,561-Speed 5520.00 samples/sec   Loss 9.6470   LearningRate 0.2207   Epoch: 3   Global Step: 38410   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:18:10,991-Speed 5513.55 samples/sec   Loss 9.6076   LearningRate 0.2207   Epoch: 3   Global Step: 38420   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:18:18,451-Speed 5490.86 samples/sec   Loss 9.7143   LearningRate 0.2206   Epoch: 3   Global Step: 38430   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:18:25,857-Speed 5531.87 samples/sec   Loss 9.7228   LearningRate 0.2206   Epoch: 3   Global Step: 38440   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:18:33,337-Speed 5476.64 samples/sec   Loss 9.6814   LearningRate 0.2206   Epoch: 3   Global Step: 38450   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:18:40,757-Speed 5520.95 samples/sec   Loss 9.6563   LearningRate 0.2205   Epoch: 3   Global Step: 38460   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:18:48,156-Speed 5536.41 samples/sec   Loss 9.6360   LearningRate 0.2205   Epoch: 3   Global Step: 38470   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:18:55,569-Speed 5526.73 samples/sec   Loss 9.6754   LearningRate 0.2205   Epoch: 3   Global Step: 38480   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:19:03,018-Speed 5499.31 samples/sec   Loss 9.5939   LearningRate 0.2205   Epoch: 3   Global Step: 38490   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:19:10,537-Speed 5448.66 samples/sec   Loss 9.6627   LearningRate 0.2204   Epoch: 3   Global Step: 38500   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:19:18,042-Speed 5458.33 samples/sec   Loss 9.5707   LearningRate 0.2204   Epoch: 3   Global Step: 38510   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:19:25,590-Speed 5427.50 samples/sec   Loss 9.6391   LearningRate 0.2204   Epoch: 3   Global Step: 38520   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:19:33,093-Speed 5459.88 samples/sec   Loss 9.5539   LearningRate 0.2204   Epoch: 3   Global Step: 38530   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:19:40,701-Speed 5385.12 samples/sec   Loss 9.5774   LearningRate 0.2203   Epoch: 3   Global Step: 38540   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:19:48,163-Speed 5489.62 samples/sec   Loss 9.7196   LearningRate 0.2203   Epoch: 3   Global Step: 38550   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:19:55,655-Speed 5467.76 samples/sec   Loss 9.6626   LearningRate 0.2203   Epoch: 3   Global Step: 38560   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:20:03,190-Speed 5436.43 samples/sec   Loss 9.5950   LearningRate 0.2203   Epoch: 3   Global Step: 38570   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:20:10,714-Speed 5444.91 samples/sec   Loss 9.5883   LearningRate 0.2202   Epoch: 3   Global Step: 38580   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:20:18,228-Speed 5451.83 samples/sec   Loss 9.6453   LearningRate 0.2202   Epoch: 3   Global Step: 38590   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:20:25,825-Speed 5392.63 samples/sec   Loss 9.5949   LearningRate 0.2202   Epoch: 3   Global Step: 38600   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:20:33,465-Speed 5361.76 samples/sec   Loss 9.6772   LearningRate 0.2202   Epoch: 3   Global Step: 38610   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:20:41,029-Speed 5416.52 samples/sec   Loss 9.7232   LearningRate 0.2201   Epoch: 3   Global Step: 38620   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:20:48,644-Speed 5379.41 samples/sec   Loss 9.6763   LearningRate 0.2201   Epoch: 3   Global Step: 38630   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:20:56,192-Speed 5427.45 samples/sec   Loss 9.6690   LearningRate 0.2201   Epoch: 3   Global Step: 38640   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-08 03:21:03,711-Speed 5448.63 samples/sec   Loss 9.5790   LearningRate 0.2201   Epoch: 3   Global Step: 38650   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-08 03:21:11,230-Speed 5448.28 samples/sec   Loss 9.6759   LearningRate 0.2200   Epoch: 3   Global Step: 38660   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-08 03:21:18,793-Speed 5416.79 samples/sec   Loss 9.5815   LearningRate 0.2200   Epoch: 3   Global Step: 38670   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-08 03:21:26,209-Speed 5524.41 samples/sec   Loss 9.6645   LearningRate 0.2200   Epoch: 3   Global Step: 38680   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-08 03:21:33,647-Speed 5506.81 samples/sec   Loss 9.6734   LearningRate 0.2199   Epoch: 3   Global Step: 38690   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-08 03:21:41,159-Speed 5473.77 samples/sec   Loss 9.5790   LearningRate 0.2199   Epoch: 3   Global Step: 38700   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-08 03:21:48,633-Speed 5480.95 samples/sec   Loss 9.4748   LearningRate 0.2199   Epoch: 3   Global Step: 38710   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-08 03:21:56,048-Speed 5525.01 samples/sec   Loss 9.6001   LearningRate 0.2199   Epoch: 3   Global Step: 38720   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-08 03:22:03,501-Speed 5496.18 samples/sec   Loss 9.5904   LearningRate 0.2198   Epoch: 3   Global Step: 38730   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-08 03:22:10,953-Speed 5497.48 samples/sec   Loss 9.6718   LearningRate 0.2198   Epoch: 3   Global Step: 38740   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-08 03:22:18,435-Speed 5475.87 samples/sec   Loss 9.5120   LearningRate 0.2198   Epoch: 3   Global Step: 38750   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:22:25,970-Speed 5436.59 samples/sec   Loss 9.5840   LearningRate 0.2198   Epoch: 3   Global Step: 38760   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:22:33,439-Speed 5484.59 samples/sec   Loss 9.6120   LearningRate 0.2197   Epoch: 3   Global Step: 38770   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:22:40,894-Speed 5495.02 samples/sec   Loss 9.5792   LearningRate 0.2197   Epoch: 3   Global Step: 38780   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:22:48,522-Speed 5370.76 samples/sec   Loss 9.6213   LearningRate 0.2197   Epoch: 3   Global Step: 38790   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:22:56,154-Speed 5367.80 samples/sec   Loss 9.5373   LearningRate 0.2197   Epoch: 3   Global Step: 38800   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:23:03,670-Speed 5450.73 samples/sec   Loss 9.5263   LearningRate 0.2196   Epoch: 3   Global Step: 38810   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-08 03:23:11,225-Speed 5421.79 samples/sec   Loss 9.6755   LearningRate 0.2196   Epoch: 3   Global Step: 38820   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:23:18,688-Speed 5489.08 samples/sec   Loss 9.6769   LearningRate 0.2196   Epoch: 3   Global Step: 38830   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:23:26,180-Speed 5468.42 samples/sec   Loss 9.5622   LearningRate 0.2196   Epoch: 3   Global Step: 38840   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:23:33,776-Speed 5393.34 samples/sec   Loss 9.5418   LearningRate 0.2195   Epoch: 3   Global Step: 38850   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:23:41,225-Speed 5498.80 samples/sec   Loss 9.5629   LearningRate 0.2195   Epoch: 3   Global Step: 38860   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:23:48,658-Speed 5511.67 samples/sec   Loss 9.6210   LearningRate 0.2195   Epoch: 3   Global Step: 38870   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:23:56,167-Speed 5455.37 samples/sec   Loss 9.5639   LearningRate 0.2195   Epoch: 3   Global Step: 38880   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:24:03,631-Speed 5488.60 samples/sec   Loss 9.5893   LearningRate 0.2194   Epoch: 3   Global Step: 38890   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:24:11,136-Speed 5458.35 samples/sec   Loss 9.5966   LearningRate 0.2194   Epoch: 3   Global Step: 38900   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:24:18,720-Speed 5401.77 samples/sec   Loss 9.6422   LearningRate 0.2194   Epoch: 3   Global Step: 38910   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:24:26,388-Speed 5342.39 samples/sec   Loss 9.6176   LearningRate 0.2193   Epoch: 3   Global Step: 38920   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:24:33,931-Speed 5431.28 samples/sec   Loss 9.4917   LearningRate 0.2193   Epoch: 3   Global Step: 38930   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:24:41,377-Speed 5501.41 samples/sec   Loss 9.5864   LearningRate 0.2193   Epoch: 3   Global Step: 38940   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:24:48,805-Speed 5515.07 samples/sec   Loss 9.5807   LearningRate 0.2193   Epoch: 3   Global Step: 38950   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:24:56,237-Speed 5512.49 samples/sec   Loss 9.5781   LearningRate 0.2192   Epoch: 3   Global Step: 38960   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:25:03,707-Speed 5483.66 samples/sec   Loss 9.6029   LearningRate 0.2192   Epoch: 3   Global Step: 38970   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:25:11,209-Speed 5460.85 samples/sec   Loss 9.5713   LearningRate 0.2192   Epoch: 3   Global Step: 38980   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:25:18,730-Speed 5447.04 samples/sec   Loss 9.5742   LearningRate 0.2192   Epoch: 3   Global Step: 38990   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:25:26,191-Speed 5490.37 samples/sec   Loss 9.6085   LearningRate 0.2191   Epoch: 3   Global Step: 39000   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:25:33,660-Speed 5485.37 samples/sec   Loss 9.6009   LearningRate 0.2191   Epoch: 3   Global Step: 39010   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:25:41,148-Speed 5470.35 samples/sec   Loss 9.6080   LearningRate 0.2191   Epoch: 3   Global Step: 39020   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:25:48,610-Speed 5489.62 samples/sec   Loss 9.5985   LearningRate 0.2191   Epoch: 3   Global Step: 39030   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:25:56,019-Speed 5529.90 samples/sec   Loss 9.5608   LearningRate 0.2190   Epoch: 3   Global Step: 39040   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:26:03,438-Speed 5521.67 samples/sec   Loss 9.5292   LearningRate 0.2190   Epoch: 3   Global Step: 39050   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 03:26:10,928-Speed 5469.07 samples/sec   Loss 9.6002   LearningRate 0.2190   Epoch: 3   Global Step: 39060   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 03:26:18,352-Speed 5518.40 samples/sec   Loss 9.6562   LearningRate 0.2190   Epoch: 3   Global Step: 39070   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 03:26:25,818-Speed 5486.69 samples/sec   Loss 9.5163   LearningRate 0.2189   Epoch: 3   Global Step: 39080   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:26:33,288-Speed 5484.35 samples/sec   Loss 9.6872   LearningRate 0.2189   Epoch: 3   Global Step: 39090   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:26:41,099-Speed 5244.61 samples/sec   Loss 9.7123   LearningRate 0.2189   Epoch: 3   Global Step: 39100   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:26:48,536-Speed 5508.33 samples/sec   Loss 9.5637   LearningRate 0.2189   Epoch: 3   Global Step: 39110   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:26:56,034-Speed 5463.98 samples/sec   Loss 9.5643   LearningRate 0.2188   Epoch: 3   Global Step: 39120   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:27:03,485-Speed 5497.99 samples/sec   Loss 9.6186   LearningRate 0.2188   Epoch: 3   Global Step: 39130   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:27:10,919-Speed 5510.26 samples/sec   Loss 9.5822   LearningRate 0.2188   Epoch: 3   Global Step: 39140   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:27:18,386-Speed 5485.99 samples/sec   Loss 9.5505   LearningRate 0.2187   Epoch: 3   Global Step: 39150   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:27:25,792-Speed 5540.90 samples/sec   Loss 9.5984   LearningRate 0.2187   Epoch: 3   Global Step: 39160   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:27:33,300-Speed 5456.22 samples/sec   Loss 9.5342   LearningRate 0.2187   Epoch: 3   Global Step: 39170   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:27:40,702-Speed 5534.06 samples/sec   Loss 9.5272   LearningRate 0.2187   Epoch: 3   Global Step: 39180   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:27:48,131-Speed 5514.15 samples/sec   Loss 9.5724   LearningRate 0.2186   Epoch: 3   Global Step: 39190   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:27:55,618-Speed 5471.77 samples/sec   Loss 9.5365   LearningRate 0.2186   Epoch: 3   Global Step: 39200   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:28:03,122-Speed 5459.36 samples/sec   Loss 9.6270   LearningRate 0.2186   Epoch: 3   Global Step: 39210   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:28:10,591-Speed 5484.22 samples/sec   Loss 9.6275   LearningRate 0.2186   Epoch: 3   Global Step: 39220   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:28:18,021-Speed 5513.65 samples/sec   Loss 9.6059   LearningRate 0.2185   Epoch: 3   Global Step: 39230   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:28:25,507-Speed 5472.44 samples/sec   Loss 9.5955   LearningRate 0.2185   Epoch: 3   Global Step: 39240   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:28:32,937-Speed 5513.59 samples/sec   Loss 9.5966   LearningRate 0.2185   Epoch: 3   Global Step: 39250   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:28:40,372-Speed 5509.87 samples/sec   Loss 9.5989   LearningRate 0.2185   Epoch: 3   Global Step: 39260   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:28:47,778-Speed 5531.35 samples/sec   Loss 9.5483   LearningRate 0.2184   Epoch: 3   Global Step: 39270   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:28:55,279-Speed 5461.14 samples/sec   Loss 9.6636   LearningRate 0.2184   Epoch: 3   Global Step: 39280   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:29:02,787-Speed 5456.88 samples/sec   Loss 9.6115   LearningRate 0.2184   Epoch: 3   Global Step: 39290   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:29:10,241-Speed 5495.41 samples/sec   Loss 9.5559   LearningRate 0.2184   Epoch: 3   Global Step: 39300   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:29:17,683-Speed 5504.19 samples/sec   Loss 9.6031   LearningRate 0.2183   Epoch: 3   Global Step: 39310   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:29:25,116-Speed 5511.71 samples/sec   Loss 9.6240   LearningRate 0.2183   Epoch: 3   Global Step: 39320   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:29:32,555-Speed 5506.92 samples/sec   Loss 9.5858   LearningRate 0.2183   Epoch: 3   Global Step: 39330   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:29:40,200-Speed 5358.59 samples/sec   Loss 9.5304   LearningRate 0.2183   Epoch: 3   Global Step: 39340   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:29:47,667-Speed 5486.23 samples/sec   Loss 9.5828   LearningRate 0.2182   Epoch: 3   Global Step: 39350   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:29:55,110-Speed 5503.44 samples/sec   Loss 9.5545   LearningRate 0.2182   Epoch: 3   Global Step: 39360   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:30:02,565-Speed 5495.50 samples/sec   Loss 9.4744   LearningRate 0.2182   Epoch: 3   Global Step: 39370   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:30:10,013-Speed 5500.22 samples/sec   Loss 9.6191   LearningRate 0.2182   Epoch: 3   Global Step: 39380   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 03:30:17,396-Speed 5548.25 samples/sec   Loss 9.5862   LearningRate 0.2181   Epoch: 3   Global Step: 39390   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:30:24,846-Speed 5499.09 samples/sec   Loss 9.5493   LearningRate 0.2181   Epoch: 3   Global Step: 39400   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:30:32,288-Speed 5504.96 samples/sec   Loss 9.5816   LearningRate 0.2181   Epoch: 3   Global Step: 39410   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:30:39,659-Speed 5557.72 samples/sec   Loss 9.5333   LearningRate 0.2180   Epoch: 3   Global Step: 39420   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:30:47,107-Speed 5499.53 samples/sec   Loss 9.5478   LearningRate 0.2180   Epoch: 3   Global Step: 39430   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:30:54,576-Speed 5484.86 samples/sec   Loss 9.5861   LearningRate 0.2180   Epoch: 3   Global Step: 39440   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:31:02,033-Speed 5494.06 samples/sec   Loss 9.5444   LearningRate 0.2180   Epoch: 3   Global Step: 39450   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:31:09,654-Speed 5375.53 samples/sec   Loss 9.5536   LearningRate 0.2179   Epoch: 3   Global Step: 39460   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:31:17,171-Speed 5449.27 samples/sec   Loss 9.5719   LearningRate 0.2179   Epoch: 3   Global Step: 39470   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:31:24,620-Speed 5499.63 samples/sec   Loss 9.5535   LearningRate 0.2179   Epoch: 3   Global Step: 39480   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:31:32,100-Speed 5476.20 samples/sec   Loss 9.5520   LearningRate 0.2179   Epoch: 3   Global Step: 39490   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:31:39,572-Speed 5483.02 samples/sec   Loss 9.5464   LearningRate 0.2178   Epoch: 3   Global Step: 39500   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:31:46,986-Speed 5525.30 samples/sec   Loss 9.5648   LearningRate 0.2178   Epoch: 3   Global Step: 39510   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:31:54,415-Speed 5514.30 samples/sec   Loss 9.4516   LearningRate 0.2178   Epoch: 3   Global Step: 39520   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:32:01,875-Speed 5491.31 samples/sec   Loss 9.4982   LearningRate 0.2178   Epoch: 3   Global Step: 39530   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:32:09,403-Speed 5442.01 samples/sec   Loss 9.5197   LearningRate 0.2177   Epoch: 3   Global Step: 39540   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:32:16,851-Speed 5500.41 samples/sec   Loss 9.5483   LearningRate 0.2177   Epoch: 3   Global Step: 39550   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:32:24,409-Speed 5419.45 samples/sec   Loss 9.5110   LearningRate 0.2177   Epoch: 3   Global Step: 39560   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:32:31,908-Speed 5463.18 samples/sec   Loss 9.5274   LearningRate 0.2177   Epoch: 3   Global Step: 39570   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:32:39,375-Speed 5486.20 samples/sec   Loss 9.4924   LearningRate 0.2176   Epoch: 3   Global Step: 39580   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:32:46,863-Speed 5471.16 samples/sec   Loss 9.5408   LearningRate 0.2176   Epoch: 3   Global Step: 39590   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:32:54,305-Speed 5504.09 samples/sec   Loss 9.4801   LearningRate 0.2176   Epoch: 3   Global Step: 39600   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:33:01,793-Speed 5470.46 samples/sec   Loss 9.5960   LearningRate 0.2176   Epoch: 3   Global Step: 39610   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:33:09,280-Speed 5472.43 samples/sec   Loss 9.5077   LearningRate 0.2175   Epoch: 3   Global Step: 39620   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:33:16,772-Speed 5467.41 samples/sec   Loss 9.5001   LearningRate 0.2175   Epoch: 3   Global Step: 39630   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:33:24,224-Speed 5497.36 samples/sec   Loss 9.6830   LearningRate 0.2175   Epoch: 3   Global Step: 39640   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:33:31,719-Speed 5465.48 samples/sec   Loss 9.5622   LearningRate 0.2175   Epoch: 3   Global Step: 39650   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:33:39,347-Speed 5371.31 samples/sec   Loss 9.5404   LearningRate 0.2174   Epoch: 3   Global Step: 39660   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:33:46,800-Speed 5496.08 samples/sec   Loss 9.5348   LearningRate 0.2174   Epoch: 3   Global Step: 39670   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:33:54,323-Speed 5445.46 samples/sec   Loss 9.5337   LearningRate 0.2174   Epoch: 3   Global Step: 39680   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:34:01,934-Speed 5382.26 samples/sec   Loss 9.5682   LearningRate 0.2173   Epoch: 3   Global Step: 39690   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:34:09,455-Speed 5447.00 samples/sec   Loss 9.4712   LearningRate 0.2173   Epoch: 3   Global Step: 39700   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:34:16,914-Speed 5492.46 samples/sec   Loss 9.5049   LearningRate 0.2173   Epoch: 3   Global Step: 39710   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:34:24,363-Speed 5499.54 samples/sec   Loss 9.5858   LearningRate 0.2173   Epoch: 3   Global Step: 39720   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:34:31,933-Speed 5411.24 samples/sec   Loss 9.6154   LearningRate 0.2172   Epoch: 3   Global Step: 39730   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:34:39,666-Speed 5297.66 samples/sec   Loss 9.5433   LearningRate 0.2172   Epoch: 3   Global Step: 39740   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:34:47,276-Speed 5382.97 samples/sec   Loss 9.5553   LearningRate 0.2172   Epoch: 3   Global Step: 39750   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:34:54,904-Speed 5370.61 samples/sec   Loss 9.5290   LearningRate 0.2172   Epoch: 3   Global Step: 39760   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:35:02,514-Speed 5383.06 samples/sec   Loss 9.3961   LearningRate 0.2171   Epoch: 3   Global Step: 39770   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 03:35:10,112-Speed 5391.36 samples/sec   Loss 9.4996   LearningRate 0.2171   Epoch: 3   Global Step: 39780   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:35:17,814-Speed 5319.07 samples/sec   Loss 9.4786   LearningRate 0.2171   Epoch: 3   Global Step: 39790   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:35:25,457-Speed 5359.88 samples/sec   Loss 9.5836   LearningRate 0.2171   Epoch: 3   Global Step: 39800   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:35:33,073-Speed 5379.29 samples/sec   Loss 9.5483   LearningRate 0.2170   Epoch: 3   Global Step: 39810   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:35:40,690-Speed 5377.97 samples/sec   Loss 9.5189   LearningRate 0.2170   Epoch: 3   Global Step: 39820   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:35:48,173-Speed 5474.86 samples/sec   Loss 9.6523   LearningRate 0.2170   Epoch: 3   Global Step: 39830   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:35:55,683-Speed 5454.50 samples/sec   Loss 9.5607   LearningRate 0.2170   Epoch: 3   Global Step: 39840   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:36:03,177-Speed 5466.14 samples/sec   Loss 9.4478   LearningRate 0.2169   Epoch: 3   Global Step: 39850   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:36:10,767-Speed 5397.26 samples/sec   Loss 9.5621   LearningRate 0.2169   Epoch: 3   Global Step: 39860   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-08 03:36:18,221-Speed 5495.97 samples/sec   Loss 9.4756   LearningRate 0.2169   Epoch: 3   Global Step: 39870   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-08 03:36:25,771-Speed 5426.56 samples/sec   Loss 9.5452   LearningRate 0.2169   Epoch: 3   Global Step: 39880   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-08 03:36:33,210-Speed 5506.43 samples/sec   Loss 9.4514   LearningRate 0.2168   Epoch: 3   Global Step: 39890   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-08 03:36:40,692-Speed 5475.14 samples/sec   Loss 9.5465   LearningRate 0.2168   Epoch: 3   Global Step: 39900   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-08 03:36:48,149-Speed 5493.66 samples/sec   Loss 9.5617   LearningRate 0.2168   Epoch: 3   Global Step: 39910   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-08 03:36:55,577-Speed 5514.75 samples/sec   Loss 9.5444   LearningRate 0.2168   Epoch: 3   Global Step: 39920   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-08 03:37:03,135-Speed 5420.40 samples/sec   Loss 9.5247   LearningRate 0.2167   Epoch: 3   Global Step: 39930   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-08 03:37:10,674-Speed 5433.73 samples/sec   Loss 9.5834   LearningRate 0.2167   Epoch: 3   Global Step: 39940   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-08 03:37:18,201-Speed 5442.71 samples/sec   Loss 9.5478   LearningRate 0.2167   Epoch: 3   Global Step: 39950   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-08 03:37:25,655-Speed 5496.30 samples/sec   Loss 9.4854   LearningRate 0.2166   Epoch: 3   Global Step: 39960   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:37:33,271-Speed 5378.36 samples/sec   Loss 9.5712   LearningRate 0.2166   Epoch: 3   Global Step: 39970   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:37:40,781-Speed 5455.66 samples/sec   Loss 9.4593   LearningRate 0.2166   Epoch: 3   Global Step: 39980   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:37:48,452-Speed 5340.18 samples/sec   Loss 9.4943   LearningRate 0.2166   Epoch: 3   Global Step: 39990   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:37:56,044-Speed 5395.01 samples/sec   Loss 9.5001   LearningRate 0.2165   Epoch: 3   Global Step: 40000   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:38:40,076-[lfw][40000]XNorm: 23.917961
Training: 2022-01-08 03:38:40,077-[lfw][40000]Accuracy-Flip: 0.99717+-0.00236
Training: 2022-01-08 03:38:40,078-[lfw][40000]Accuracy-Highest: 0.99800
Training: 2022-01-08 03:39:32,410-[cfp_fp][40000]XNorm: 21.545600
Training: 2022-01-08 03:39:32,411-[cfp_fp][40000]Accuracy-Flip: 0.98600+-0.00534
Training: 2022-01-08 03:39:32,412-[cfp_fp][40000]Accuracy-Highest: 0.98600
Training: 2022-01-08 03:40:17,829-[agedb_30][40000]XNorm: 24.016595
Training: 2022-01-08 03:40:17,831-[agedb_30][40000]Accuracy-Flip: 0.96867+-0.00609
Training: 2022-01-08 03:40:17,831-[agedb_30][40000]Accuracy-Highest: 0.97250
Training: 2022-01-08 03:40:25,573-Speed 273.93 samples/sec   Loss 9.5309   LearningRate 0.2165   Epoch: 3   Global Step: 40010   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:40:33,116-Speed 5432.05 samples/sec   Loss 9.5041   LearningRate 0.2165   Epoch: 3   Global Step: 40020   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:40:40,620-Speed 5459.38 samples/sec   Loss 9.5758   LearningRate 0.2165   Epoch: 3   Global Step: 40030   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:40:48,167-Speed 5428.45 samples/sec   Loss 9.4909   LearningRate 0.2164   Epoch: 3   Global Step: 40040   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:40:55,651-Speed 5473.43 samples/sec   Loss 9.5943   LearningRate 0.2164   Epoch: 3   Global Step: 40050   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:41:03,099-Speed 5501.72 samples/sec   Loss 9.4999   LearningRate 0.2164   Epoch: 3   Global Step: 40060   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:41:10,546-Speed 5501.12 samples/sec   Loss 9.4775   LearningRate 0.2164   Epoch: 3   Global Step: 40070   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:41:17,988-Speed 5504.19 samples/sec   Loss 9.4907   LearningRate 0.2163   Epoch: 3   Global Step: 40080   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:41:25,484-Speed 5465.33 samples/sec   Loss 9.5811   LearningRate 0.2163   Epoch: 3   Global Step: 40090   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:41:32,991-Speed 5456.97 samples/sec   Loss 9.5988   LearningRate 0.2163   Epoch: 3   Global Step: 40100   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:41:40,463-Speed 5482.55 samples/sec   Loss 9.5369   LearningRate 0.2163   Epoch: 3   Global Step: 40110   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:41:48,020-Speed 5420.60 samples/sec   Loss 9.5154   LearningRate 0.2162   Epoch: 3   Global Step: 40120   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:41:55,591-Speed 5410.66 samples/sec   Loss 9.4905   LearningRate 0.2162   Epoch: 3   Global Step: 40130   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:42:03,138-Speed 5428.33 samples/sec   Loss 9.4981   LearningRate 0.2162   Epoch: 3   Global Step: 40140   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:42:10,692-Speed 5423.27 samples/sec   Loss 9.4135   LearningRate 0.2162   Epoch: 3   Global Step: 40150   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:42:18,237-Speed 5428.89 samples/sec   Loss 9.4765   LearningRate 0.2161   Epoch: 3   Global Step: 40160   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:42:25,845-Speed 5384.80 samples/sec   Loss 9.4482   LearningRate 0.2161   Epoch: 3   Global Step: 40170   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:42:33,357-Speed 5453.07 samples/sec   Loss 9.4868   LearningRate 0.2161   Epoch: 3   Global Step: 40180   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:42:40,858-Speed 5461.40 samples/sec   Loss 9.5101   LearningRate 0.2161   Epoch: 3   Global Step: 40190   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:42:48,351-Speed 5467.37 samples/sec   Loss 9.4372   LearningRate 0.2160   Epoch: 3   Global Step: 40200   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:42:55,825-Speed 5480.96 samples/sec   Loss 9.4576   LearningRate 0.2160   Epoch: 3   Global Step: 40210   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:43:03,298-Speed 5481.57 samples/sec   Loss 9.4635   LearningRate 0.2160   Epoch: 3   Global Step: 40220   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:43:10,784-Speed 5472.82 samples/sec   Loss 9.4368   LearningRate 0.2159   Epoch: 3   Global Step: 40230   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:43:18,357-Speed 5409.06 samples/sec   Loss 9.5058   LearningRate 0.2159   Epoch: 3   Global Step: 40240   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:43:25,798-Speed 5505.05 samples/sec   Loss 9.5039   LearningRate 0.2159   Epoch: 3   Global Step: 40250   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:43:33,282-Speed 5474.19 samples/sec   Loss 9.5320   LearningRate 0.2159   Epoch: 3   Global Step: 40260   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 03:43:40,760-Speed 5477.84 samples/sec   Loss 9.5404   LearningRate 0.2158   Epoch: 3   Global Step: 40270   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 03:43:48,284-Speed 5444.71 samples/sec   Loss 9.5365   LearningRate 0.2158   Epoch: 3   Global Step: 40280   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:43:55,850-Speed 5414.29 samples/sec   Loss 9.4290   LearningRate 0.2158   Epoch: 3   Global Step: 40290   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:44:03,416-Speed 5414.28 samples/sec   Loss 9.4932   LearningRate 0.2158   Epoch: 3   Global Step: 40300   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:44:10,962-Speed 5428.85 samples/sec   Loss 9.5371   LearningRate 0.2157   Epoch: 3   Global Step: 40310   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:44:18,555-Speed 5395.42 samples/sec   Loss 9.4763   LearningRate 0.2157   Epoch: 3   Global Step: 40320   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:44:26,115-Speed 5418.21 samples/sec   Loss 9.6025   LearningRate 0.2157   Epoch: 3   Global Step: 40330   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:44:33,584-Speed 5485.47 samples/sec   Loss 9.4785   LearningRate 0.2157   Epoch: 3   Global Step: 40340   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:44:41,102-Speed 5449.34 samples/sec   Loss 9.4881   LearningRate 0.2156   Epoch: 3   Global Step: 40350   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:44:48,714-Speed 5381.53 samples/sec   Loss 9.4655   LearningRate 0.2156   Epoch: 3   Global Step: 40360   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:44:56,184-Speed 5483.56 samples/sec   Loss 9.5061   LearningRate 0.2156   Epoch: 3   Global Step: 40370   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:45:03,722-Speed 5434.86 samples/sec   Loss 9.4551   LearningRate 0.2156   Epoch: 3   Global Step: 40380   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:45:11,163-Speed 5505.50 samples/sec   Loss 9.4006   LearningRate 0.2155   Epoch: 3   Global Step: 40390   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:45:18,649-Speed 5472.16 samples/sec   Loss 9.5283   LearningRate 0.2155   Epoch: 3   Global Step: 40400   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:45:26,118-Speed 5484.55 samples/sec   Loss 9.5147   LearningRate 0.2155   Epoch: 3   Global Step: 40410   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:45:33,623-Speed 5458.24 samples/sec   Loss 9.5150   LearningRate 0.2155   Epoch: 3   Global Step: 40420   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:45:41,071-Speed 5500.71 samples/sec   Loss 9.4443   LearningRate 0.2154   Epoch: 3   Global Step: 40430   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:45:48,537-Speed 5486.72 samples/sec   Loss 9.5214   LearningRate 0.2154   Epoch: 3   Global Step: 40440   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:45:55,988-Speed 5498.31 samples/sec   Loss 9.5048   LearningRate 0.2154   Epoch: 3   Global Step: 40450   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:46:03,524-Speed 5436.16 samples/sec   Loss 9.5186   LearningRate 0.2154   Epoch: 3   Global Step: 40460   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:46:10,976-Speed 5496.65 samples/sec   Loss 9.5234   LearningRate 0.2153   Epoch: 3   Global Step: 40470   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:46:18,480-Speed 5459.81 samples/sec   Loss 9.4069   LearningRate 0.2153   Epoch: 3   Global Step: 40480   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:46:26,021-Speed 5433.35 samples/sec   Loss 9.5059   LearningRate 0.2153   Epoch: 3   Global Step: 40490   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:46:33,517-Speed 5464.99 samples/sec   Loss 9.5115   LearningRate 0.2153   Epoch: 3   Global Step: 40500   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:46:41,085-Speed 5413.05 samples/sec   Loss 9.5540   LearningRate 0.2152   Epoch: 3   Global Step: 40510   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:46:48,659-Speed 5409.07 samples/sec   Loss 9.4905   LearningRate 0.2152   Epoch: 3   Global Step: 40520   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:46:56,147-Speed 5470.62 samples/sec   Loss 9.4873   LearningRate 0.2152   Epoch: 3   Global Step: 40530   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:47:03,648-Speed 5461.43 samples/sec   Loss 9.4356   LearningRate 0.2151   Epoch: 3   Global Step: 40540   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:47:11,182-Speed 5437.50 samples/sec   Loss 9.5063   LearningRate 0.2151   Epoch: 3   Global Step: 40550   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:47:18,640-Speed 5492.55 samples/sec   Loss 9.4211   LearningRate 0.2151   Epoch: 3   Global Step: 40560   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:47:26,174-Speed 5437.55 samples/sec   Loss 9.4399   LearningRate 0.2151   Epoch: 3   Global Step: 40570   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:47:33,736-Speed 5417.11 samples/sec   Loss 9.4664   LearningRate 0.2150   Epoch: 3   Global Step: 40580   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:47:41,336-Speed 5390.38 samples/sec   Loss 9.4100   LearningRate 0.2150   Epoch: 3   Global Step: 40590   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:47:48,832-Speed 5465.56 samples/sec   Loss 9.4548   LearningRate 0.2150   Epoch: 3   Global Step: 40600   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:47:56,288-Speed 5494.14 samples/sec   Loss 9.4809   LearningRate 0.2150   Epoch: 3   Global Step: 40610   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:48:03,801-Speed 5452.78 samples/sec   Loss 9.4359   LearningRate 0.2149   Epoch: 3   Global Step: 40620   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:48:11,285-Speed 5473.33 samples/sec   Loss 9.4619   LearningRate 0.2149   Epoch: 3   Global Step: 40630   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:48:18,738-Speed 5496.97 samples/sec   Loss 9.4197   LearningRate 0.2149   Epoch: 3   Global Step: 40640   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:48:26,255-Speed 5449.52 samples/sec   Loss 9.5120   LearningRate 0.2149   Epoch: 3   Global Step: 40650   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:48:33,856-Speed 5389.38 samples/sec   Loss 9.5001   LearningRate 0.2148   Epoch: 3   Global Step: 40660   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:48:41,441-Speed 5400.79 samples/sec   Loss 9.4530   LearningRate 0.2148   Epoch: 3   Global Step: 40670   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:48:49,016-Speed 5407.90 samples/sec   Loss 9.4467   LearningRate 0.2148   Epoch: 3   Global Step: 40680   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:48:56,513-Speed 5464.38 samples/sec   Loss 9.4436   LearningRate 0.2148   Epoch: 3   Global Step: 40690   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:49:03,975-Speed 5489.99 samples/sec   Loss 9.5441   LearningRate 0.2147   Epoch: 3   Global Step: 40700   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:49:11,474-Speed 5463.13 samples/sec   Loss 9.4182   LearningRate 0.2147   Epoch: 3   Global Step: 40710   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:49:18,943-Speed 5484.44 samples/sec   Loss 9.4648   LearningRate 0.2147   Epoch: 3   Global Step: 40720   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:49:26,396-Speed 5496.73 samples/sec   Loss 9.4892   LearningRate 0.2147   Epoch: 3   Global Step: 40730   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:49:33,966-Speed 5411.36 samples/sec   Loss 9.5027   LearningRate 0.2146   Epoch: 3   Global Step: 40740   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:49:41,488-Speed 5445.86 samples/sec   Loss 9.3349   LearningRate 0.2146   Epoch: 3   Global Step: 40750   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:49:49,035-Speed 5428.50 samples/sec   Loss 9.3308   LearningRate 0.2146   Epoch: 3   Global Step: 40760   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:49:56,693-Speed 5348.95 samples/sec   Loss 9.4055   LearningRate 0.2146   Epoch: 3   Global Step: 40770   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:50:04,255-Speed 5417.71 samples/sec   Loss 9.4122   LearningRate 0.2145   Epoch: 3   Global Step: 40780   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:50:11,864-Speed 5383.34 samples/sec   Loss 9.4219   LearningRate 0.2145   Epoch: 3   Global Step: 40790   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:50:19,291-Speed 5515.85 samples/sec   Loss 9.4259   LearningRate 0.2145   Epoch: 3   Global Step: 40800   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:50:26,708-Speed 5523.18 samples/sec   Loss 9.4763   LearningRate 0.2145   Epoch: 3   Global Step: 40810   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:50:34,187-Speed 5477.57 samples/sec   Loss 9.4293   LearningRate 0.2144   Epoch: 3   Global Step: 40820   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:50:41,644-Speed 5493.39 samples/sec   Loss 9.5547   LearningRate 0.2144   Epoch: 3   Global Step: 40830   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:50:49,127-Speed 5474.47 samples/sec   Loss 9.4571   LearningRate 0.2144   Epoch: 3   Global Step: 40840   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:50:56,583-Speed 5494.20 samples/sec   Loss 9.4309   LearningRate 0.2144   Epoch: 3   Global Step: 40850   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:51:04,115-Speed 5438.52 samples/sec   Loss 9.4304   LearningRate 0.2143   Epoch: 3   Global Step: 40860   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:51:11,588-Speed 5482.47 samples/sec   Loss 9.4186   LearningRate 0.2143   Epoch: 3   Global Step: 40870   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:51:19,085-Speed 5464.31 samples/sec   Loss 9.4139   LearningRate 0.2143   Epoch: 3   Global Step: 40880   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:51:26,652-Speed 5413.28 samples/sec   Loss 9.3807   LearningRate 0.2142   Epoch: 3   Global Step: 40890   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:51:34,138-Speed 5472.43 samples/sec   Loss 9.5008   LearningRate 0.2142   Epoch: 3   Global Step: 40900   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 03:51:41,601-Speed 5489.46 samples/sec   Loss 9.4685   LearningRate 0.2142   Epoch: 3   Global Step: 40910   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:51:49,414-Speed 5243.06 samples/sec   Loss 9.4031   LearningRate 0.2142   Epoch: 3   Global Step: 40920   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:51:57,096-Speed 5333.24 samples/sec   Loss 9.4028   LearningRate 0.2141   Epoch: 3   Global Step: 40930   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:52:04,609-Speed 5452.05 samples/sec   Loss 9.4423   LearningRate 0.2141   Epoch: 3   Global Step: 40940   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:52:12,111-Speed 5460.99 samples/sec   Loss 9.4767   LearningRate 0.2141   Epoch: 3   Global Step: 40950   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:52:19,686-Speed 5408.06 samples/sec   Loss 9.4696   LearningRate 0.2141   Epoch: 3   Global Step: 40960   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:52:27,261-Speed 5408.14 samples/sec   Loss 9.3910   LearningRate 0.2140   Epoch: 3   Global Step: 40970   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:52:34,864-Speed 5387.79 samples/sec   Loss 9.4336   LearningRate 0.2140   Epoch: 3   Global Step: 40980   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:52:42,353-Speed 5469.99 samples/sec   Loss 9.3582   LearningRate 0.2140   Epoch: 3   Global Step: 40990   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:52:49,952-Speed 5390.98 samples/sec   Loss 9.5125   LearningRate 0.2140   Epoch: 3   Global Step: 41000   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:52:57,458-Speed 5457.57 samples/sec   Loss 9.5007   LearningRate 0.2139   Epoch: 3   Global Step: 41010   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:53:04,995-Speed 5435.23 samples/sec   Loss 9.3641   LearningRate 0.2139   Epoch: 3   Global Step: 41020   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:53:12,541-Speed 5428.78 samples/sec   Loss 9.5485   LearningRate 0.2139   Epoch: 3   Global Step: 41030   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:53:20,029-Speed 5470.83 samples/sec   Loss 9.4829   LearningRate 0.2139   Epoch: 3   Global Step: 41040   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:53:27,718-Speed 5328.15 samples/sec   Loss 9.4446   LearningRate 0.2138   Epoch: 3   Global Step: 41050   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:53:35,308-Speed 5397.04 samples/sec   Loss 9.3896   LearningRate 0.2138   Epoch: 3   Global Step: 41060   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:53:42,909-Speed 5389.35 samples/sec   Loss 9.3834   LearningRate 0.2138   Epoch: 3   Global Step: 41070   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:53:50,492-Speed 5402.48 samples/sec   Loss 9.4624   LearningRate 0.2138   Epoch: 3   Global Step: 41080   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:53:58,067-Speed 5408.10 samples/sec   Loss 9.4209   LearningRate 0.2137   Epoch: 3   Global Step: 41090   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:54:05,644-Speed 5406.88 samples/sec   Loss 9.3973   LearningRate 0.2137   Epoch: 3   Global Step: 41100   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:54:13,157-Speed 5452.37 samples/sec   Loss 9.4778   LearningRate 0.2137   Epoch: 3   Global Step: 41110   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:54:20,754-Speed 5392.50 samples/sec   Loss 9.4507   LearningRate 0.2137   Epoch: 3   Global Step: 41120   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:54:28,265-Speed 5454.03 samples/sec   Loss 9.4742   LearningRate 0.2136   Epoch: 3   Global Step: 41130   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:54:35,863-Speed 5391.47 samples/sec   Loss 9.4667   LearningRate 0.2136   Epoch: 3   Global Step: 41140   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:54:43,403-Speed 5433.18 samples/sec   Loss 9.4411   LearningRate 0.2136   Epoch: 3   Global Step: 41150   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:54:50,903-Speed 5462.01 samples/sec   Loss 9.3390   LearningRate 0.2136   Epoch: 3   Global Step: 41160   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:54:58,429-Speed 5443.16 samples/sec   Loss 9.4436   LearningRate 0.2135   Epoch: 3   Global Step: 41170   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:55:05,893-Speed 5488.96 samples/sec   Loss 9.4257   LearningRate 0.2135   Epoch: 3   Global Step: 41180   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:55:13,366-Speed 5481.60 samples/sec   Loss 9.4193   LearningRate 0.2135   Epoch: 3   Global Step: 41190   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:55:20,914-Speed 5427.65 samples/sec   Loss 9.4209   LearningRate 0.2135   Epoch: 3   Global Step: 41200   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:55:28,522-Speed 5384.28 samples/sec   Loss 9.4238   LearningRate 0.2134   Epoch: 3   Global Step: 41210   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:55:36,127-Speed 5387.15 samples/sec   Loss 9.4930   LearningRate 0.2134   Epoch: 3   Global Step: 41220   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:55:43,697-Speed 5411.62 samples/sec   Loss 9.4318   LearningRate 0.2134   Epoch: 3   Global Step: 41230   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:55:51,356-Speed 5348.51 samples/sec   Loss 9.4280   LearningRate 0.2133   Epoch: 3   Global Step: 41240   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:55:58,915-Speed 5419.26 samples/sec   Loss 9.3531   LearningRate 0.2133   Epoch: 3   Global Step: 41250   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:56:06,483-Speed 5413.24 samples/sec   Loss 9.3934   LearningRate 0.2133   Epoch: 3   Global Step: 41260   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:56:14,116-Speed 5366.70 samples/sec   Loss 9.3768   LearningRate 0.2133   Epoch: 3   Global Step: 41270   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:56:21,629-Speed 5453.12 samples/sec   Loss 9.4261   LearningRate 0.2132   Epoch: 3   Global Step: 41280   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:56:29,087-Speed 5492.40 samples/sec   Loss 9.4223   LearningRate 0.2132   Epoch: 3   Global Step: 41290   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:56:36,626-Speed 5433.75 samples/sec   Loss 9.4170   LearningRate 0.2132   Epoch: 3   Global Step: 41300   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:56:44,209-Speed 5402.38 samples/sec   Loss 9.3723   LearningRate 0.2132   Epoch: 3   Global Step: 41310   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:56:51,677-Speed 5485.91 samples/sec   Loss 9.4880   LearningRate 0.2131   Epoch: 3   Global Step: 41320   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:56:59,094-Speed 5523.02 samples/sec   Loss 9.4186   LearningRate 0.2131   Epoch: 3   Global Step: 41330   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:57:06,615-Speed 5446.56 samples/sec   Loss 9.3466   LearningRate 0.2131   Epoch: 3   Global Step: 41340   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:57:14,329-Speed 5310.61 samples/sec   Loss 9.5012   LearningRate 0.2131   Epoch: 3   Global Step: 41350   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:57:21,982-Speed 5353.22 samples/sec   Loss 9.4677   LearningRate 0.2130   Epoch: 3   Global Step: 41360   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:57:29,734-Speed 5284.77 samples/sec   Loss 9.5325   LearningRate 0.2130   Epoch: 3   Global Step: 41370   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:57:37,219-Speed 5472.47 samples/sec   Loss 9.4341   LearningRate 0.2130   Epoch: 3   Global Step: 41380   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:57:44,762-Speed 5430.60 samples/sec   Loss 9.4390   LearningRate 0.2130   Epoch: 3   Global Step: 41390   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:57:52,264-Speed 5460.77 samples/sec   Loss 9.4303   LearningRate 0.2129   Epoch: 3   Global Step: 41400   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:57:59,818-Speed 5423.52 samples/sec   Loss 9.4220   LearningRate 0.2129   Epoch: 3   Global Step: 41410   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:58:07,359-Speed 5432.27 samples/sec   Loss 9.3565   LearningRate 0.2129   Epoch: 3   Global Step: 41420   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 03:58:14,849-Speed 5468.72 samples/sec   Loss 9.4357   LearningRate 0.2129   Epoch: 3   Global Step: 41430   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:58:22,457-Speed 5384.92 samples/sec   Loss 9.4195   LearningRate 0.2128   Epoch: 3   Global Step: 41440   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:58:30,037-Speed 5404.63 samples/sec   Loss 9.3710   LearningRate 0.2128   Epoch: 3   Global Step: 41450   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:58:37,725-Speed 5328.71 samples/sec   Loss 9.3551   LearningRate 0.2128   Epoch: 3   Global Step: 41460   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:58:45,258-Speed 5437.52 samples/sec   Loss 9.3857   LearningRate 0.2128   Epoch: 3   Global Step: 41470   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:58:52,905-Speed 5357.84 samples/sec   Loss 9.4178   LearningRate 0.2127   Epoch: 3   Global Step: 41480   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:59:16,076-Speed 1767.79 samples/sec   Loss 9.3659   LearningRate 0.2127   Epoch: 4   Global Step: 41490   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:59:23,558-Speed 5474.96 samples/sec   Loss 9.4160   LearningRate 0.2127   Epoch: 4   Global Step: 41500   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:59:31,142-Speed 5401.87 samples/sec   Loss 9.3544   LearningRate 0.2127   Epoch: 4   Global Step: 41510   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:59:38,559-Speed 5522.98 samples/sec   Loss 9.3627   LearningRate 0.2126   Epoch: 4   Global Step: 41520   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 03:59:46,030-Speed 5483.24 samples/sec   Loss 9.3927   LearningRate 0.2126   Epoch: 4   Global Step: 41530   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 03:59:53,465-Speed 5510.13 samples/sec   Loss 9.3771   LearningRate 0.2126   Epoch: 4   Global Step: 41540   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:00:00,886-Speed 5520.15 samples/sec   Loss 9.3237   LearningRate 0.2126   Epoch: 4   Global Step: 41550   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:00:08,303-Speed 5523.90 samples/sec   Loss 9.4898   LearningRate 0.2125   Epoch: 4   Global Step: 41560   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:00:15,760-Speed 5493.24 samples/sec   Loss 9.3922   LearningRate 0.2125   Epoch: 4   Global Step: 41570   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:00:23,211-Speed 5498.03 samples/sec   Loss 9.4494   LearningRate 0.2125   Epoch: 4   Global Step: 41580   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:00:30,652-Speed 5505.25 samples/sec   Loss 9.4020   LearningRate 0.2125   Epoch: 4   Global Step: 41590   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:00:38,083-Speed 5512.93 samples/sec   Loss 9.3436   LearningRate 0.2124   Epoch: 4   Global Step: 41600   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:00:45,499-Speed 5524.14 samples/sec   Loss 9.3315   LearningRate 0.2124   Epoch: 4   Global Step: 41610   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:00:52,915-Speed 5523.59 samples/sec   Loss 9.4045   LearningRate 0.2124   Epoch: 4   Global Step: 41620   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:01:00,363-Speed 5499.63 samples/sec   Loss 9.3997   LearningRate 0.2123   Epoch: 4   Global Step: 41630   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:01:07,796-Speed 5511.94 samples/sec   Loss 9.3349   LearningRate 0.2123   Epoch: 4   Global Step: 41640   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:01:15,282-Speed 5472.15 samples/sec   Loss 9.3566   LearningRate 0.2123   Epoch: 4   Global Step: 41650   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:01:22,869-Speed 5399.52 samples/sec   Loss 9.4461   LearningRate 0.2123   Epoch: 4   Global Step: 41660   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:01:30,316-Speed 5500.51 samples/sec   Loss 9.3578   LearningRate 0.2122   Epoch: 4   Global Step: 41670   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:01:37,791-Speed 5480.56 samples/sec   Loss 9.3483   LearningRate 0.2122   Epoch: 4   Global Step: 41680   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:01:45,253-Speed 5489.96 samples/sec   Loss 9.3850   LearningRate 0.2122   Epoch: 4   Global Step: 41690   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:01:52,768-Speed 5450.85 samples/sec   Loss 9.3532   LearningRate 0.2122   Epoch: 4   Global Step: 41700   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:02:00,278-Speed 5454.66 samples/sec   Loss 9.4923   LearningRate 0.2121   Epoch: 4   Global Step: 41710   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:02:07,745-Speed 5486.20 samples/sec   Loss 9.3780   LearningRate 0.2121   Epoch: 4   Global Step: 41720   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:02:15,151-Speed 5531.55 samples/sec   Loss 9.3611   LearningRate 0.2121   Epoch: 4   Global Step: 41730   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:02:22,610-Speed 5491.90 samples/sec   Loss 9.3877   LearningRate 0.2121   Epoch: 4   Global Step: 41740   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:02:29,991-Speed 5549.51 samples/sec   Loss 9.4035   LearningRate 0.2120   Epoch: 4   Global Step: 41750   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-08 04:02:37,454-Speed 5489.61 samples/sec   Loss 9.3687   LearningRate 0.2120   Epoch: 4   Global Step: 41760   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-08 04:02:44,879-Speed 5517.71 samples/sec   Loss 9.3683   LearningRate 0.2120   Epoch: 4   Global Step: 41770   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-08 04:02:52,321-Speed 5503.90 samples/sec   Loss 9.2806   LearningRate 0.2120   Epoch: 4   Global Step: 41780   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-08 04:02:59,772-Speed 5498.20 samples/sec   Loss 9.3185   LearningRate 0.2119   Epoch: 4   Global Step: 41790   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-08 04:03:07,236-Speed 5488.77 samples/sec   Loss 9.3295   LearningRate 0.2119   Epoch: 4   Global Step: 41800   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-08 04:03:14,723-Speed 5471.29 samples/sec   Loss 9.3160   LearningRate 0.2119   Epoch: 4   Global Step: 41810   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-08 04:03:22,182-Speed 5491.92 samples/sec   Loss 9.3291   LearningRate 0.2119   Epoch: 4   Global Step: 41820   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-08 04:03:29,676-Speed 5466.16 samples/sec   Loss 9.2931   LearningRate 0.2118   Epoch: 4   Global Step: 41830   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-08 04:03:37,129-Speed 5497.20 samples/sec   Loss 9.3075   LearningRate 0.2118   Epoch: 4   Global Step: 41840   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-08 04:03:44,783-Speed 5352.01 samples/sec   Loss 9.3181   LearningRate 0.2118   Epoch: 4   Global Step: 41850   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:03:52,301-Speed 5448.48 samples/sec   Loss 9.3635   LearningRate 0.2118   Epoch: 4   Global Step: 41860   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:03:59,956-Speed 5351.16 samples/sec   Loss 9.3643   LearningRate 0.2117   Epoch: 4   Global Step: 41870   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:04:07,490-Speed 5437.33 samples/sec   Loss 9.3875   LearningRate 0.2117   Epoch: 4   Global Step: 41880   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:04:15,129-Speed 5363.64 samples/sec   Loss 9.3959   LearningRate 0.2117   Epoch: 4   Global Step: 41890   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:04:22,582-Speed 5496.27 samples/sec   Loss 9.3558   LearningRate 0.2117   Epoch: 4   Global Step: 41900   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:04:30,103-Speed 5446.63 samples/sec   Loss 9.3633   LearningRate 0.2116   Epoch: 4   Global Step: 41910   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:04:37,639-Speed 5435.36 samples/sec   Loss 9.3913   LearningRate 0.2116   Epoch: 4   Global Step: 41920   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:04:45,162-Speed 5445.95 samples/sec   Loss 9.3657   LearningRate 0.2116   Epoch: 4   Global Step: 41930   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:04:52,641-Speed 5477.46 samples/sec   Loss 9.3685   LearningRate 0.2116   Epoch: 4   Global Step: 41940   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:05:00,213-Speed 5409.73 samples/sec   Loss 9.3834   LearningRate 0.2115   Epoch: 4   Global Step: 41950   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:05:07,663-Speed 5498.70 samples/sec   Loss 9.4267   LearningRate 0.2115   Epoch: 4   Global Step: 41960   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:05:15,145-Speed 5476.09 samples/sec   Loss 9.2825   LearningRate 0.2115   Epoch: 4   Global Step: 41970   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:05:22,691-Speed 5428.82 samples/sec   Loss 9.3149   LearningRate 0.2115   Epoch: 4   Global Step: 41980   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:05:30,270-Speed 5404.83 samples/sec   Loss 9.3627   LearningRate 0.2114   Epoch: 4   Global Step: 41990   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:05:37,758-Speed 5470.55 samples/sec   Loss 9.2753   LearningRate 0.2114   Epoch: 4   Global Step: 42000   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:06:22,275-[lfw][42000]XNorm: 23.390701
Training: 2022-01-08 04:06:22,276-[lfw][42000]Accuracy-Flip: 0.99717+-0.00317
Training: 2022-01-08 04:06:22,276-[lfw][42000]Accuracy-Highest: 0.99800
Training: 2022-01-08 04:07:14,015-[cfp_fp][42000]XNorm: 21.024187
Training: 2022-01-08 04:07:14,016-[cfp_fp][42000]Accuracy-Flip: 0.98086+-0.00583
Training: 2022-01-08 04:07:14,017-[cfp_fp][42000]Accuracy-Highest: 0.98600
Training: 2022-01-08 04:07:59,728-[agedb_30][42000]XNorm: 23.162721
Training: 2022-01-08 04:07:59,729-[agedb_30][42000]Accuracy-Flip: 0.96750+-0.00880
Training: 2022-01-08 04:07:59,730-[agedb_30][42000]Accuracy-Highest: 0.97250
Training: 2022-01-08 04:08:07,536-Speed 273.47 samples/sec   Loss 9.3577   LearningRate 0.2114   Epoch: 4   Global Step: 42010   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:08:15,210-Speed 5339.04 samples/sec   Loss 9.4016   LearningRate 0.2113   Epoch: 4   Global Step: 42020   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:08:22,918-Speed 5315.14 samples/sec   Loss 9.3557   LearningRate 0.2113   Epoch: 4   Global Step: 42030   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:08:30,551-Speed 5367.80 samples/sec   Loss 9.3102   LearningRate 0.2113   Epoch: 4   Global Step: 42040   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:08:38,110-Speed 5420.01 samples/sec   Loss 9.3468   LearningRate 0.2113   Epoch: 4   Global Step: 42050   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:08:45,615-Speed 5457.82 samples/sec   Loss 9.3971   LearningRate 0.2112   Epoch: 4   Global Step: 42060   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:08:53,115-Speed 5462.36 samples/sec   Loss 9.3715   LearningRate 0.2112   Epoch: 4   Global Step: 42070   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:09:00,729-Speed 5380.11 samples/sec   Loss 9.3778   LearningRate 0.2112   Epoch: 4   Global Step: 42080   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:09:08,292-Speed 5417.05 samples/sec   Loss 9.3661   LearningRate 0.2112   Epoch: 4   Global Step: 42090   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:09:15,816-Speed 5444.48 samples/sec   Loss 9.4113   LearningRate 0.2111   Epoch: 4   Global Step: 42100   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:09:23,389-Speed 5409.66 samples/sec   Loss 9.3112   LearningRate 0.2111   Epoch: 4   Global Step: 42110   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:09:30,827-Speed 5507.21 samples/sec   Loss 9.3730   LearningRate 0.2111   Epoch: 4   Global Step: 42120   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:09:38,304-Speed 5479.17 samples/sec   Loss 9.4089   LearningRate 0.2111   Epoch: 4   Global Step: 42130   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:09:45,844-Speed 5432.71 samples/sec   Loss 9.3491   LearningRate 0.2110   Epoch: 4   Global Step: 42140   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:09:53,373-Speed 5441.42 samples/sec   Loss 9.3291   LearningRate 0.2110   Epoch: 4   Global Step: 42150   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:10:00,851-Speed 5478.23 samples/sec   Loss 9.3479   LearningRate 0.2110   Epoch: 4   Global Step: 42160   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:10:08,407-Speed 5421.68 samples/sec   Loss 9.3034   LearningRate 0.2110   Epoch: 4   Global Step: 42170   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:10:15,992-Speed 5400.76 samples/sec   Loss 9.3203   LearningRate 0.2109   Epoch: 4   Global Step: 42180   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:10:23,479-Speed 5471.55 samples/sec   Loss 9.2430   LearningRate 0.2109   Epoch: 4   Global Step: 42190   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:10:30,996-Speed 5449.75 samples/sec   Loss 9.3029   LearningRate 0.2109   Epoch: 4   Global Step: 42200   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:10:38,489-Speed 5466.72 samples/sec   Loss 9.3816   LearningRate 0.2109   Epoch: 4   Global Step: 42210   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:10:46,034-Speed 5429.31 samples/sec   Loss 9.3285   LearningRate 0.2108   Epoch: 4   Global Step: 42220   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:10:53,578-Speed 5430.77 samples/sec   Loss 9.3246   LearningRate 0.2108   Epoch: 4   Global Step: 42230   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:11:01,050-Speed 5482.06 samples/sec   Loss 9.3156   LearningRate 0.2108   Epoch: 4   Global Step: 42240   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:11:08,622-Speed 5410.01 samples/sec   Loss 9.4037   LearningRate 0.2108   Epoch: 4   Global Step: 42250   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:11:16,157-Speed 5437.10 samples/sec   Loss 9.4282   LearningRate 0.2107   Epoch: 4   Global Step: 42260   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:11:23,675-Speed 5448.75 samples/sec   Loss 9.3602   LearningRate 0.2107   Epoch: 4   Global Step: 42270   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:11:31,341-Speed 5344.05 samples/sec   Loss 9.3592   LearningRate 0.2107   Epoch: 4   Global Step: 42280   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:11:38,888-Speed 5428.02 samples/sec   Loss 9.3488   LearningRate 0.2107   Epoch: 4   Global Step: 42290   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:11:46,479-Speed 5395.92 samples/sec   Loss 9.3451   LearningRate 0.2106   Epoch: 4   Global Step: 42300   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:11:54,045-Speed 5414.81 samples/sec   Loss 9.3458   LearningRate 0.2106   Epoch: 4   Global Step: 42310   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:12:01,632-Speed 5399.49 samples/sec   Loss 9.3081   LearningRate 0.2106   Epoch: 4   Global Step: 42320   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:12:09,267-Speed 5366.96 samples/sec   Loss 9.3011   LearningRate 0.2106   Epoch: 4   Global Step: 42330   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:12:16,869-Speed 5388.69 samples/sec   Loss 9.3562   LearningRate 0.2105   Epoch: 4   Global Step: 42340   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:12:24,413-Speed 5430.07 samples/sec   Loss 9.3106   LearningRate 0.2105   Epoch: 4   Global Step: 42350   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:12:32,075-Speed 5347.33 samples/sec   Loss 9.3018   LearningRate 0.2105   Epoch: 4   Global Step: 42360   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:12:39,619-Speed 5430.28 samples/sec   Loss 9.3288   LearningRate 0.2105   Epoch: 4   Global Step: 42370   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:12:47,140-Speed 5446.99 samples/sec   Loss 9.2851   LearningRate 0.2104   Epoch: 4   Global Step: 42380   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:12:54,646-Speed 5457.96 samples/sec   Loss 9.3833   LearningRate 0.2104   Epoch: 4   Global Step: 42390   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:13:02,165-Speed 5447.72 samples/sec   Loss 9.3624   LearningRate 0.2104   Epoch: 4   Global Step: 42400   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:13:09,721-Speed 5421.12 samples/sec   Loss 9.2986   LearningRate 0.2104   Epoch: 4   Global Step: 42410   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:13:17,294-Speed 5409.53 samples/sec   Loss 9.3102   LearningRate 0.2103   Epoch: 4   Global Step: 42420   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:13:24,785-Speed 5469.29 samples/sec   Loss 9.3331   LearningRate 0.2103   Epoch: 4   Global Step: 42430   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:13:32,326-Speed 5432.05 samples/sec   Loss 9.3349   LearningRate 0.2103   Epoch: 4   Global Step: 42440   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:13:39,858-Speed 5438.84 samples/sec   Loss 9.3244   LearningRate 0.2103   Epoch: 4   Global Step: 42450   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:13:47,389-Speed 5439.59 samples/sec   Loss 9.3432   LearningRate 0.2102   Epoch: 4   Global Step: 42460   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:13:54,965-Speed 5408.46 samples/sec   Loss 9.4136   LearningRate 0.2102   Epoch: 4   Global Step: 42470   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:14:02,581-Speed 5378.31 samples/sec   Loss 9.3457   LearningRate 0.2102   Epoch: 4   Global Step: 42480   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:14:10,183-Speed 5389.01 samples/sec   Loss 9.3476   LearningRate 0.2101   Epoch: 4   Global Step: 42490   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:14:17,846-Speed 5345.69 samples/sec   Loss 9.3498   LearningRate 0.2101   Epoch: 4   Global Step: 42500   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:14:25,365-Speed 5449.00 samples/sec   Loss 9.3022   LearningRate 0.2101   Epoch: 4   Global Step: 42510   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:14:32,842-Speed 5478.71 samples/sec   Loss 9.3533   LearningRate 0.2101   Epoch: 4   Global Step: 42520   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:14:40,485-Speed 5359.78 samples/sec   Loss 9.2067   LearningRate 0.2100   Epoch: 4   Global Step: 42530   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:14:48,026-Speed 5431.89 samples/sec   Loss 9.3262   LearningRate 0.2100   Epoch: 4   Global Step: 42540   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:14:55,527-Speed 5461.81 samples/sec   Loss 9.2507   LearningRate 0.2100   Epoch: 4   Global Step: 42550   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:15:02,994-Speed 5486.45 samples/sec   Loss 9.3675   LearningRate 0.2100   Epoch: 4   Global Step: 42560   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:15:10,563-Speed 5411.91 samples/sec   Loss 9.2152   LearningRate 0.2099   Epoch: 4   Global Step: 42570   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:15:18,091-Speed 5442.23 samples/sec   Loss 9.3118   LearningRate 0.2099   Epoch: 4   Global Step: 42580   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:15:25,649-Speed 5420.53 samples/sec   Loss 9.3144   LearningRate 0.2099   Epoch: 4   Global Step: 42590   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:15:33,135-Speed 5471.98 samples/sec   Loss 9.2659   LearningRate 0.2099   Epoch: 4   Global Step: 42600   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:15:40,769-Speed 5366.47 samples/sec   Loss 9.3136   LearningRate 0.2098   Epoch: 4   Global Step: 42610   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:15:48,447-Speed 5335.52 samples/sec   Loss 9.3428   LearningRate 0.2098   Epoch: 4   Global Step: 42620   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:15:55,979-Speed 5438.70 samples/sec   Loss 9.3469   LearningRate 0.2098   Epoch: 4   Global Step: 42630   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:16:03,612-Speed 5367.12 samples/sec   Loss 9.3447   LearningRate 0.2098   Epoch: 4   Global Step: 42640   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:16:11,210-Speed 5391.75 samples/sec   Loss 9.2877   LearningRate 0.2097   Epoch: 4   Global Step: 42650   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:16:18,682-Speed 5482.64 samples/sec   Loss 9.2621   LearningRate 0.2097   Epoch: 4   Global Step: 42660   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:16:26,187-Speed 5458.24 samples/sec   Loss 9.2098   LearningRate 0.2097   Epoch: 4   Global Step: 42670   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:16:33,702-Speed 5450.88 samples/sec   Loss 9.2515   LearningRate 0.2097   Epoch: 4   Global Step: 42680   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:16:41,419-Speed 5309.22 samples/sec   Loss 9.3625   LearningRate 0.2096   Epoch: 4   Global Step: 42690   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:16:48,940-Speed 5446.92 samples/sec   Loss 9.2873   LearningRate 0.2096   Epoch: 4   Global Step: 42700   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:16:56,450-Speed 5454.56 samples/sec   Loss 9.3077   LearningRate 0.2096   Epoch: 4   Global Step: 42710   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:17:04,025-Speed 5408.38 samples/sec   Loss 9.2293   LearningRate 0.2096   Epoch: 4   Global Step: 42720   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:17:11,572-Speed 5428.03 samples/sec   Loss 9.3012   LearningRate 0.2095   Epoch: 4   Global Step: 42730   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:17:19,038-Speed 5486.76 samples/sec   Loss 9.3078   LearningRate 0.2095   Epoch: 4   Global Step: 42740   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:17:26,523-Speed 5472.96 samples/sec   Loss 9.2668   LearningRate 0.2095   Epoch: 4   Global Step: 42750   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:17:33,994-Speed 5483.72 samples/sec   Loss 9.3140   LearningRate 0.2095   Epoch: 4   Global Step: 42760   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:17:41,522-Speed 5441.94 samples/sec   Loss 9.3400   LearningRate 0.2094   Epoch: 4   Global Step: 42770   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:17:49,062-Speed 5432.93 samples/sec   Loss 9.2410   LearningRate 0.2094   Epoch: 4   Global Step: 42780   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:17:56,685-Speed 5373.80 samples/sec   Loss 9.2912   LearningRate 0.2094   Epoch: 4   Global Step: 42790   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:18:04,289-Speed 5387.45 samples/sec   Loss 9.2539   LearningRate 0.2094   Epoch: 4   Global Step: 42800   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:18:11,832-Speed 5431.24 samples/sec   Loss 9.2886   LearningRate 0.2093   Epoch: 4   Global Step: 42810   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:18:19,316-Speed 5473.59 samples/sec   Loss 9.2016   LearningRate 0.2093   Epoch: 4   Global Step: 42820   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:18:26,788-Speed 5482.52 samples/sec   Loss 9.2130   LearningRate 0.2093   Epoch: 4   Global Step: 42830   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:18:34,237-Speed 5499.71 samples/sec   Loss 9.3460   LearningRate 0.2093   Epoch: 4   Global Step: 42840   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:18:41,814-Speed 5406.67 samples/sec   Loss 9.3177   LearningRate 0.2092   Epoch: 4   Global Step: 42850   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:18:49,376-Speed 5416.93 samples/sec   Loss 9.3307   LearningRate 0.2092   Epoch: 4   Global Step: 42860   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:18:56,935-Speed 5419.46 samples/sec   Loss 9.3003   LearningRate 0.2092   Epoch: 4   Global Step: 42870   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:19:04,342-Speed 5530.46 samples/sec   Loss 9.2814   LearningRate 0.2092   Epoch: 4   Global Step: 42880   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:19:11,987-Speed 5358.60 samples/sec   Loss 9.2315   LearningRate 0.2091   Epoch: 4   Global Step: 42890   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:19:19,412-Speed 5517.81 samples/sec   Loss 9.3113   LearningRate 0.2091   Epoch: 4   Global Step: 42900   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:19:26,940-Speed 5441.09 samples/sec   Loss 9.3388   LearningRate 0.2091   Epoch: 4   Global Step: 42910   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:19:34,514-Speed 5409.09 samples/sec   Loss 9.2730   LearningRate 0.2091   Epoch: 4   Global Step: 42920   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:19:42,032-Speed 5448.97 samples/sec   Loss 9.2639   LearningRate 0.2090   Epoch: 4   Global Step: 42930   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:19:49,558-Speed 5442.89 samples/sec   Loss 9.3397   LearningRate 0.2090   Epoch: 4   Global Step: 42940   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:19:57,079-Speed 5446.87 samples/sec   Loss 9.2083   LearningRate 0.2090   Epoch: 4   Global Step: 42950   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:20:04,591-Speed 5453.37 samples/sec   Loss 9.2107   LearningRate 0.2090   Epoch: 4   Global Step: 42960   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:20:12,105-Speed 5452.16 samples/sec   Loss 9.2313   LearningRate 0.2089   Epoch: 4   Global Step: 42970   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:20:19,555-Speed 5498.86 samples/sec   Loss 9.3115   LearningRate 0.2089   Epoch: 4   Global Step: 42980   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:20:27,047-Speed 5467.47 samples/sec   Loss 9.2993   LearningRate 0.2089   Epoch: 4   Global Step: 42990   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:20:34,634-Speed 5399.26 samples/sec   Loss 9.2418   LearningRate 0.2089   Epoch: 4   Global Step: 43000   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:20:42,098-Speed 5488.43 samples/sec   Loss 9.2398   LearningRate 0.2088   Epoch: 4   Global Step: 43010   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:20:49,599-Speed 5461.71 samples/sec   Loss 9.1914   LearningRate 0.2088   Epoch: 4   Global Step: 43020   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:20:57,072-Speed 5482.19 samples/sec   Loss 9.2625   LearningRate 0.2088   Epoch: 4   Global Step: 43030   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:21:04,596-Speed 5444.12 samples/sec   Loss 9.1684   LearningRate 0.2088   Epoch: 4   Global Step: 43040   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:21:12,146-Speed 5426.22 samples/sec   Loss 9.2882   LearningRate 0.2087   Epoch: 4   Global Step: 43050   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:21:19,653-Speed 5457.63 samples/sec   Loss 9.3283   LearningRate 0.2087   Epoch: 4   Global Step: 43060   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:21:27,122-Speed 5484.63 samples/sec   Loss 9.2995   LearningRate 0.2087   Epoch: 4   Global Step: 43070   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:21:34,608-Speed 5472.10 samples/sec   Loss 9.2162   LearningRate 0.2086   Epoch: 4   Global Step: 43080   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:21:42,217-Speed 5383.96 samples/sec   Loss 9.2559   LearningRate 0.2086   Epoch: 4   Global Step: 43090   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:21:49,649-Speed 5511.86 samples/sec   Loss 9.3275   LearningRate 0.2086   Epoch: 4   Global Step: 43100   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:21:57,183-Speed 5437.50 samples/sec   Loss 9.2082   LearningRate 0.2086   Epoch: 4   Global Step: 43110   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:22:04,820-Speed 5364.45 samples/sec   Loss 9.2602   LearningRate 0.2085   Epoch: 4   Global Step: 43120   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:22:12,394-Speed 5408.45 samples/sec   Loss 9.3123   LearningRate 0.2085   Epoch: 4   Global Step: 43130   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:22:19,959-Speed 5415.14 samples/sec   Loss 9.2292   LearningRate 0.2085   Epoch: 4   Global Step: 43140   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-08 04:22:27,460-Speed 5461.31 samples/sec   Loss 9.3010   LearningRate 0.2085   Epoch: 4   Global Step: 43150   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-08 04:22:34,946-Speed 5472.75 samples/sec   Loss 9.2648   LearningRate 0.2084   Epoch: 4   Global Step: 43160   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:22:42,455-Speed 5455.27 samples/sec   Loss 9.2716   LearningRate 0.2084   Epoch: 4   Global Step: 43170   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-08 04:22:50,030-Speed 5407.55 samples/sec   Loss 9.2025   LearningRate 0.2084   Epoch: 4   Global Step: 43180   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:22:57,569-Speed 5434.07 samples/sec   Loss 9.2713   LearningRate 0.2084   Epoch: 4   Global Step: 43190   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:23:05,131-Speed 5416.94 samples/sec   Loss 9.2203   LearningRate 0.2083   Epoch: 4   Global Step: 43200   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:23:12,707-Speed 5407.87 samples/sec   Loss 9.2329   LearningRate 0.2083   Epoch: 4   Global Step: 43210   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:23:20,251-Speed 5430.01 samples/sec   Loss 9.3008   LearningRate 0.2083   Epoch: 4   Global Step: 43220   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:23:27,736-Speed 5472.43 samples/sec   Loss 9.2895   LearningRate 0.2083   Epoch: 4   Global Step: 43230   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:23:35,274-Speed 5434.57 samples/sec   Loss 9.2694   LearningRate 0.2082   Epoch: 4   Global Step: 43240   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:23:42,876-Speed 5389.00 samples/sec   Loss 9.2296   LearningRate 0.2082   Epoch: 4   Global Step: 43250   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:23:50,412-Speed 5435.59 samples/sec   Loss 9.2310   LearningRate 0.2082   Epoch: 4   Global Step: 43260   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:23:58,906-Speed 5514.76 samples/sec   Loss 9.1479   LearningRate 0.2082   Epoch: 4   Global Step: 43270   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:24:06,460-Speed 5422.82 samples/sec   Loss 9.2896   LearningRate 0.2081   Epoch: 4   Global Step: 43280   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:24:13,997-Speed 5435.31 samples/sec   Loss 9.3310   LearningRate 0.2081   Epoch: 4   Global Step: 43290   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:24:21,477-Speed 5477.32 samples/sec   Loss 9.2363   LearningRate 0.2081   Epoch: 4   Global Step: 43300   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:24:28,982-Speed 5458.01 samples/sec   Loss 9.2667   LearningRate 0.2081   Epoch: 4   Global Step: 43310   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:24:36,536-Speed 5423.15 samples/sec   Loss 9.2889   LearningRate 0.2080   Epoch: 4   Global Step: 43320   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:24:44,020-Speed 5473.68 samples/sec   Loss 9.1796   LearningRate 0.2080   Epoch: 4   Global Step: 43330   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:24:51,558-Speed 5434.85 samples/sec   Loss 9.2422   LearningRate 0.2080   Epoch: 4   Global Step: 43340   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:24:59,164-Speed 5386.22 samples/sec   Loss 9.2196   LearningRate 0.2080   Epoch: 4   Global Step: 43350   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:25:06,729-Speed 5414.64 samples/sec   Loss 9.2528   LearningRate 0.2079   Epoch: 4   Global Step: 43360   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:25:14,243-Speed 5451.81 samples/sec   Loss 9.2098   LearningRate 0.2079   Epoch: 4   Global Step: 43370   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:25:21,910-Speed 5344.13 samples/sec   Loss 9.3001   LearningRate 0.2079   Epoch: 4   Global Step: 43380   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:25:29,445-Speed 5436.31 samples/sec   Loss 9.3410   LearningRate 0.2079   Epoch: 4   Global Step: 43390   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:25:36,954-Speed 5455.21 samples/sec   Loss 9.2262   LearningRate 0.2078   Epoch: 4   Global Step: 43400   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:25:44,513-Speed 5419.09 samples/sec   Loss 9.1748   LearningRate 0.2078   Epoch: 4   Global Step: 43410   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:25:52,029-Speed 5450.97 samples/sec   Loss 9.1102   LearningRate 0.2078   Epoch: 4   Global Step: 43420   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:25:59,578-Speed 5426.40 samples/sec   Loss 9.2119   LearningRate 0.2078   Epoch: 4   Global Step: 43430   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:26:07,077-Speed 5462.59 samples/sec   Loss 9.2914   LearningRate 0.2077   Epoch: 4   Global Step: 43440   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:26:14,695-Speed 5377.50 samples/sec   Loss 9.2907   LearningRate 0.2077   Epoch: 4   Global Step: 43450   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:26:22,310-Speed 5379.82 samples/sec   Loss 9.2142   LearningRate 0.2077   Epoch: 4   Global Step: 43460   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:26:29,869-Speed 5419.28 samples/sec   Loss 9.2320   LearningRate 0.2077   Epoch: 4   Global Step: 43470   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:26:37,398-Speed 5440.91 samples/sec   Loss 9.2045   LearningRate 0.2076   Epoch: 4   Global Step: 43480   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:26:44,866-Speed 5485.63 samples/sec   Loss 9.1607   LearningRate 0.2076   Epoch: 4   Global Step: 43490   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:26:52,366-Speed 5462.43 samples/sec   Loss 9.2223   LearningRate 0.2076   Epoch: 4   Global Step: 43500   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:26:59,929-Speed 5416.29 samples/sec   Loss 9.2849   LearningRate 0.2076   Epoch: 4   Global Step: 43510   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:27:07,421-Speed 5467.95 samples/sec   Loss 9.2050   LearningRate 0.2075   Epoch: 4   Global Step: 43520   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:27:15,063-Speed 5360.65 samples/sec   Loss 9.2247   LearningRate 0.2075   Epoch: 4   Global Step: 43530   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:27:22,583-Speed 5447.66 samples/sec   Loss 9.2090   LearningRate 0.2075   Epoch: 4   Global Step: 43540   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:27:30,100-Speed 5449.63 samples/sec   Loss 9.2329   LearningRate 0.2075   Epoch: 4   Global Step: 43550   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:27:37,636-Speed 5435.58 samples/sec   Loss 9.2068   LearningRate 0.2074   Epoch: 4   Global Step: 43560   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:27:45,139-Speed 5460.26 samples/sec   Loss 9.1638   LearningRate 0.2074   Epoch: 4   Global Step: 43570   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:27:52,669-Speed 5440.65 samples/sec   Loss 9.2132   LearningRate 0.2074   Epoch: 4   Global Step: 43580   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:28:00,120-Speed 5497.60 samples/sec   Loss 9.2550   LearningRate 0.2074   Epoch: 4   Global Step: 43590   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:28:07,591-Speed 5483.52 samples/sec   Loss 9.2359   LearningRate 0.2073   Epoch: 4   Global Step: 43600   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:28:15,137-Speed 5428.62 samples/sec   Loss 9.1240   LearningRate 0.2073   Epoch: 4   Global Step: 43610   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:28:22,656-Speed 5448.43 samples/sec   Loss 9.2587   LearningRate 0.2073   Epoch: 4   Global Step: 43620   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:28:30,125-Speed 5485.19 samples/sec   Loss 9.1839   LearningRate 0.2073   Epoch: 4   Global Step: 43630   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:28:37,568-Speed 5503.11 samples/sec   Loss 9.2305   LearningRate 0.2072   Epoch: 4   Global Step: 43640   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:28:45,044-Speed 5479.47 samples/sec   Loss 9.1825   LearningRate 0.2072   Epoch: 4   Global Step: 43650   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:28:52,546-Speed 5461.51 samples/sec   Loss 9.2399   LearningRate 0.2072   Epoch: 4   Global Step: 43660   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:29:00,024-Speed 5478.13 samples/sec   Loss 9.1764   LearningRate 0.2072   Epoch: 4   Global Step: 43670   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:29:07,524-Speed 5461.69 samples/sec   Loss 9.2480   LearningRate 0.2071   Epoch: 4   Global Step: 43680   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:29:15,020-Speed 5464.73 samples/sec   Loss 9.1352   LearningRate 0.2071   Epoch: 4   Global Step: 43690   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:29:22,547-Speed 5443.01 samples/sec   Loss 9.1267   LearningRate 0.2071   Epoch: 4   Global Step: 43700   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:29:30,022-Speed 5480.37 samples/sec   Loss 9.2407   LearningRate 0.2071   Epoch: 4   Global Step: 43710   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:29:37,472-Speed 5498.35 samples/sec   Loss 9.2858   LearningRate 0.2070   Epoch: 4   Global Step: 43720   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:29:44,988-Speed 5450.52 samples/sec   Loss 9.1372   LearningRate 0.2070   Epoch: 4   Global Step: 43730   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:29:52,573-Speed 5400.89 samples/sec   Loss 9.2368   LearningRate 0.2070   Epoch: 4   Global Step: 43740   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:30:00,054-Speed 5476.29 samples/sec   Loss 9.2904   LearningRate 0.2070   Epoch: 4   Global Step: 43750   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:30:07,539-Speed 5473.19 samples/sec   Loss 9.2981   LearningRate 0.2069   Epoch: 4   Global Step: 43760   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:30:14,982-Speed 5503.59 samples/sec   Loss 9.1991   LearningRate 0.2069   Epoch: 4   Global Step: 43770   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:30:22,459-Speed 5479.30 samples/sec   Loss 9.2620   LearningRate 0.2069   Epoch: 4   Global Step: 43780   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:30:29,905-Speed 5501.34 samples/sec   Loss 9.2067   LearningRate 0.2068   Epoch: 4   Global Step: 43790   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:30:37,386-Speed 5476.13 samples/sec   Loss 9.2453   LearningRate 0.2068   Epoch: 4   Global Step: 43800   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:30:44,937-Speed 5424.86 samples/sec   Loss 9.2299   LearningRate 0.2068   Epoch: 4   Global Step: 43810   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:30:52,495-Speed 5420.41 samples/sec   Loss 9.1861   LearningRate 0.2068   Epoch: 4   Global Step: 43820   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:31:00,114-Speed 5376.63 samples/sec   Loss 9.2230   LearningRate 0.2067   Epoch: 4   Global Step: 43830   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:31:07,647-Speed 5438.12 samples/sec   Loss 9.2127   LearningRate 0.2067   Epoch: 4   Global Step: 43840   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:31:15,173-Speed 5443.15 samples/sec   Loss 9.2208   LearningRate 0.2067   Epoch: 4   Global Step: 43850   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:31:22,714-Speed 5432.56 samples/sec   Loss 9.2282   LearningRate 0.2067   Epoch: 4   Global Step: 43860   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:31:30,226-Speed 5453.09 samples/sec   Loss 9.2208   LearningRate 0.2066   Epoch: 4   Global Step: 43870   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:31:37,674-Speed 5500.05 samples/sec   Loss 9.1704   LearningRate 0.2066   Epoch: 4   Global Step: 43880   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:31:45,316-Speed 5360.79 samples/sec   Loss 9.2046   LearningRate 0.2066   Epoch: 4   Global Step: 43890   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:31:52,771-Speed 5495.20 samples/sec   Loss 9.1914   LearningRate 0.2066   Epoch: 4   Global Step: 43900   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:32:00,281-Speed 5455.25 samples/sec   Loss 9.1410   LearningRate 0.2065   Epoch: 4   Global Step: 43910   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:32:07,768-Speed 5471.33 samples/sec   Loss 9.1724   LearningRate 0.2065   Epoch: 4   Global Step: 43920   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:32:15,356-Speed 5398.86 samples/sec   Loss 9.2606   LearningRate 0.2065   Epoch: 4   Global Step: 43930   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:32:22,797-Speed 5504.76 samples/sec   Loss 9.1921   LearningRate 0.2065   Epoch: 4   Global Step: 43940   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:32:30,337-Speed 5433.40 samples/sec   Loss 9.1999   LearningRate 0.2064   Epoch: 4   Global Step: 43950   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:32:37,888-Speed 5425.16 samples/sec   Loss 9.2180   LearningRate 0.2064   Epoch: 4   Global Step: 43960   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:32:45,363-Speed 5480.35 samples/sec   Loss 9.1248   LearningRate 0.2064   Epoch: 4   Global Step: 43970   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:32:53,017-Speed 5352.75 samples/sec   Loss 9.2272   LearningRate 0.2064   Epoch: 4   Global Step: 43980   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:33:00,542-Speed 5443.66 samples/sec   Loss 9.2201   LearningRate 0.2063   Epoch: 4   Global Step: 43990   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:33:08,052-Speed 5455.05 samples/sec   Loss 9.2303   LearningRate 0.2063   Epoch: 4   Global Step: 44000   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:33:52,652-[lfw][44000]XNorm: 23.556365
Training: 2022-01-08 04:33:52,653-[lfw][44000]Accuracy-Flip: 0.99700+-0.00314
Training: 2022-01-08 04:33:52,653-[lfw][44000]Accuracy-Highest: 0.99800
Training: 2022-01-08 04:34:45,178-[cfp_fp][44000]XNorm: 21.248845
Training: 2022-01-08 04:34:45,179-[cfp_fp][44000]Accuracy-Flip: 0.98329+-0.00515
Training: 2022-01-08 04:34:45,180-[cfp_fp][44000]Accuracy-Highest: 0.98600
Training: 2022-01-08 04:35:30,701-[agedb_30][44000]XNorm: 23.516151
Training: 2022-01-08 04:35:30,701-[agedb_30][44000]Accuracy-Flip: 0.96950+-0.00723
Training: 2022-01-08 04:35:30,702-[agedb_30][44000]Accuracy-Highest: 0.97250
Training: 2022-01-08 04:35:38,319-Speed 272.58 samples/sec   Loss 9.1919   LearningRate 0.2063   Epoch: 4   Global Step: 44010   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:35:45,831-Speed 5454.26 samples/sec   Loss 9.1334   LearningRate 0.2063   Epoch: 4   Global Step: 44020   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:35:53,319-Speed 5471.62 samples/sec   Loss 9.1505   LearningRate 0.2062   Epoch: 4   Global Step: 44030   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:36:00,757-Speed 5508.37 samples/sec   Loss 9.1529   LearningRate 0.2062   Epoch: 4   Global Step: 44040   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:36:08,320-Speed 5416.34 samples/sec   Loss 9.2418   LearningRate 0.2062   Epoch: 4   Global Step: 44050   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:36:15,836-Speed 5450.77 samples/sec   Loss 9.2004   LearningRate 0.2062   Epoch: 4   Global Step: 44060   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:36:23,291-Speed 5495.16 samples/sec   Loss 9.1994   LearningRate 0.2061   Epoch: 4   Global Step: 44070   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:36:30,851-Speed 5418.41 samples/sec   Loss 9.1670   LearningRate 0.2061   Epoch: 4   Global Step: 44080   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:36:38,418-Speed 5413.73 samples/sec   Loss 9.2242   LearningRate 0.2061   Epoch: 4   Global Step: 44090   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:36:45,893-Speed 5480.62 samples/sec   Loss 9.2312   LearningRate 0.2061   Epoch: 4   Global Step: 44100   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:36:53,366-Speed 5482.07 samples/sec   Loss 9.2209   LearningRate 0.2060   Epoch: 4   Global Step: 44110   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:37:00,893-Speed 5442.34 samples/sec   Loss 9.1944   LearningRate 0.2060   Epoch: 4   Global Step: 44120   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:37:08,411-Speed 5448.99 samples/sec   Loss 9.1691   LearningRate 0.2060   Epoch: 4   Global Step: 44130   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:37:15,858-Speed 5500.43 samples/sec   Loss 9.1351   LearningRate 0.2060   Epoch: 4   Global Step: 44140   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:37:23,266-Speed 5530.05 samples/sec   Loss 9.1640   LearningRate 0.2059   Epoch: 4   Global Step: 44150   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:37:30,834-Speed 5412.81 samples/sec   Loss 9.1094   LearningRate 0.2059   Epoch: 4   Global Step: 44160   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:37:38,430-Speed 5393.04 samples/sec   Loss 9.2770   LearningRate 0.2059   Epoch: 4   Global Step: 44170   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:37:46,064-Speed 5366.51 samples/sec   Loss 9.2047   LearningRate 0.2059   Epoch: 4   Global Step: 44180   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:37:53,561-Speed 5464.70 samples/sec   Loss 9.1667   LearningRate 0.2058   Epoch: 4   Global Step: 44190   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:38:01,094-Speed 5437.60 samples/sec   Loss 9.1590   LearningRate 0.2058   Epoch: 4   Global Step: 44200   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:38:08,811-Speed 5308.63 samples/sec   Loss 9.1940   LearningRate 0.2058   Epoch: 4   Global Step: 44210   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:38:16,372-Speed 5418.16 samples/sec   Loss 9.1606   LearningRate 0.2058   Epoch: 4   Global Step: 44220   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:38:23,910-Speed 5434.14 samples/sec   Loss 9.1498   LearningRate 0.2057   Epoch: 4   Global Step: 44230   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:38:31,443-Speed 5438.35 samples/sec   Loss 9.2469   LearningRate 0.2057   Epoch: 4   Global Step: 44240   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:38:38,991-Speed 5426.94 samples/sec   Loss 9.1852   LearningRate 0.2057   Epoch: 4   Global Step: 44250   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:38:46,526-Speed 5436.82 samples/sec   Loss 9.2092   LearningRate 0.2057   Epoch: 4   Global Step: 44260   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:38:54,099-Speed 5409.32 samples/sec   Loss 9.1451   LearningRate 0.2056   Epoch: 4   Global Step: 44270   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:39:01,603-Speed 5459.34 samples/sec   Loss 9.2355   LearningRate 0.2056   Epoch: 4   Global Step: 44280   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:39:09,119-Speed 5450.66 samples/sec   Loss 9.2110   LearningRate 0.2056   Epoch: 4   Global Step: 44290   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:39:16,659-Speed 5433.17 samples/sec   Loss 9.1066   LearningRate 0.2056   Epoch: 4   Global Step: 44300   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:39:24,312-Speed 5352.68 samples/sec   Loss 9.1866   LearningRate 0.2055   Epoch: 4   Global Step: 44310   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:39:31,794-Speed 5475.00 samples/sec   Loss 9.2165   LearningRate 0.2055   Epoch: 4   Global Step: 44320   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:39:39,264-Speed 5483.82 samples/sec   Loss 9.1346   LearningRate 0.2055   Epoch: 4   Global Step: 44330   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:39:46,759-Speed 5465.56 samples/sec   Loss 9.1866   LearningRate 0.2055   Epoch: 4   Global Step: 44340   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:39:54,311-Speed 5424.71 samples/sec   Loss 9.1343   LearningRate 0.2054   Epoch: 4   Global Step: 44350   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:40:02,163-Speed 5217.27 samples/sec   Loss 9.2331   LearningRate 0.2054   Epoch: 4   Global Step: 44360   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:40:09,757-Speed 5394.33 samples/sec   Loss 9.1492   LearningRate 0.2054   Epoch: 4   Global Step: 44370   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:40:17,330-Speed 5409.31 samples/sec   Loss 9.1538   LearningRate 0.2054   Epoch: 4   Global Step: 44380   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:40:24,846-Speed 5450.08 samples/sec   Loss 9.0995   LearningRate 0.2053   Epoch: 4   Global Step: 44390   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:40:32,379-Speed 5438.72 samples/sec   Loss 9.1511   LearningRate 0.2053   Epoch: 4   Global Step: 44400   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:40:39,916-Speed 5435.37 samples/sec   Loss 9.1611   LearningRate 0.2053   Epoch: 4   Global Step: 44410   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:40:47,536-Speed 5375.78 samples/sec   Loss 9.1303   LearningRate 0.2053   Epoch: 4   Global Step: 44420   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:40:55,005-Speed 5484.49 samples/sec   Loss 9.1399   LearningRate 0.2052   Epoch: 4   Global Step: 44430   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:41:02,478-Speed 5482.30 samples/sec   Loss 9.1359   LearningRate 0.2052   Epoch: 4   Global Step: 44440   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:41:09,924-Speed 5501.88 samples/sec   Loss 9.1355   LearningRate 0.2052   Epoch: 4   Global Step: 44450   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:41:17,510-Speed 5399.74 samples/sec   Loss 9.1860   LearningRate 0.2052   Epoch: 4   Global Step: 44460   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:41:25,090-Speed 5404.59 samples/sec   Loss 9.1421   LearningRate 0.2051   Epoch: 4   Global Step: 44470   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:41:32,617-Speed 5442.18 samples/sec   Loss 9.1992   LearningRate 0.2051   Epoch: 4   Global Step: 44480   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:41:40,142-Speed 5444.27 samples/sec   Loss 9.2101   LearningRate 0.2051   Epoch: 4   Global Step: 44490   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:41:47,646-Speed 5459.44 samples/sec   Loss 9.1643   LearningRate 0.2051   Epoch: 4   Global Step: 44500   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:41:55,178-Speed 5438.87 samples/sec   Loss 9.1807   LearningRate 0.2050   Epoch: 4   Global Step: 44510   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:42:02,654-Speed 5479.32 samples/sec   Loss 9.1981   LearningRate 0.2050   Epoch: 4   Global Step: 44520   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:42:10,165-Speed 5453.89 samples/sec   Loss 9.0902   LearningRate 0.2050   Epoch: 4   Global Step: 44530   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:42:17,784-Speed 5377.18 samples/sec   Loss 9.1291   LearningRate 0.2050   Epoch: 4   Global Step: 44540   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:42:25,310-Speed 5442.84 samples/sec   Loss 9.1503   LearningRate 0.2049   Epoch: 4   Global Step: 44550   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:42:32,887-Speed 5407.00 samples/sec   Loss 9.0983   LearningRate 0.2049   Epoch: 4   Global Step: 44560   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:42:40,503-Speed 5378.70 samples/sec   Loss 9.1275   LearningRate 0.2049   Epoch: 4   Global Step: 44570   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:42:48,061-Speed 5420.44 samples/sec   Loss 9.1182   LearningRate 0.2049   Epoch: 4   Global Step: 44580   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:42:55,640-Speed 5404.95 samples/sec   Loss 9.2183   LearningRate 0.2048   Epoch: 4   Global Step: 44590   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:43:03,258-Speed 5377.51 samples/sec   Loss 9.2029   LearningRate 0.2048   Epoch: 4   Global Step: 44600   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:43:10,870-Speed 5381.35 samples/sec   Loss 9.1309   LearningRate 0.2048   Epoch: 4   Global Step: 44610   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:43:18,595-Speed 5302.91 samples/sec   Loss 9.0653   LearningRate 0.2048   Epoch: 4   Global Step: 44620   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:43:26,401-Speed 5248.30 samples/sec   Loss 9.1558   LearningRate 0.2047   Epoch: 4   Global Step: 44630   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:43:33,979-Speed 5405.16 samples/sec   Loss 9.1104   LearningRate 0.2047   Epoch: 4   Global Step: 44640   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:43:41,529-Speed 5426.47 samples/sec   Loss 9.0717   LearningRate 0.2047   Epoch: 4   Global Step: 44650   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:43:48,988-Speed 5491.68 samples/sec   Loss 9.1783   LearningRate 0.2047   Epoch: 4   Global Step: 44660   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:43:56,575-Speed 5399.90 samples/sec   Loss 9.1826   LearningRate 0.2046   Epoch: 4   Global Step: 44670   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:44:04,192-Speed 5377.71 samples/sec   Loss 9.1236   LearningRate 0.2046   Epoch: 4   Global Step: 44680   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:44:11,809-Speed 5378.32 samples/sec   Loss 9.1477   LearningRate 0.2046   Epoch: 4   Global Step: 44690   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:44:19,355-Speed 5428.83 samples/sec   Loss 9.1653   LearningRate 0.2046   Epoch: 4   Global Step: 44700   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:44:27,024-Speed 5341.94 samples/sec   Loss 9.1223   LearningRate 0.2045   Epoch: 4   Global Step: 44710   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:44:34,613-Speed 5397.64 samples/sec   Loss 9.1097   LearningRate 0.2045   Epoch: 4   Global Step: 44720   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:44:42,154-Speed 5432.03 samples/sec   Loss 9.2089   LearningRate 0.2045   Epoch: 4   Global Step: 44730   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:44:49,681-Speed 5442.86 samples/sec   Loss 9.2182   LearningRate 0.2045   Epoch: 4   Global Step: 44740   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:44:57,187-Speed 5457.98 samples/sec   Loss 9.1381   LearningRate 0.2044   Epoch: 4   Global Step: 44750   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:45:04,755-Speed 5412.69 samples/sec   Loss 9.2367   LearningRate 0.2044   Epoch: 4   Global Step: 44760   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:45:12,319-Speed 5415.76 samples/sec   Loss 9.1035   LearningRate 0.2044   Epoch: 4   Global Step: 44770   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:45:19,915-Speed 5393.09 samples/sec   Loss 9.1226   LearningRate 0.2044   Epoch: 4   Global Step: 44780   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:45:27,492-Speed 5406.24 samples/sec   Loss 9.0920   LearningRate 0.2043   Epoch: 4   Global Step: 44790   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:45:35,047-Speed 5422.21 samples/sec   Loss 9.1605   LearningRate 0.2043   Epoch: 4   Global Step: 44800   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:45:42,581-Speed 5437.43 samples/sec   Loss 9.1145   LearningRate 0.2043   Epoch: 4   Global Step: 44810   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:45:50,111-Speed 5440.15 samples/sec   Loss 9.0923   LearningRate 0.2043   Epoch: 4   Global Step: 44820   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:45:57,747-Speed 5364.74 samples/sec   Loss 9.0807   LearningRate 0.2042   Epoch: 4   Global Step: 44830   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:46:05,296-Speed 5426.86 samples/sec   Loss 9.1551   LearningRate 0.2042   Epoch: 4   Global Step: 44840   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:46:12,898-Speed 5388.49 samples/sec   Loss 9.2039   LearningRate 0.2042   Epoch: 4   Global Step: 44850   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:46:20,510-Speed 5381.77 samples/sec   Loss 9.0664   LearningRate 0.2042   Epoch: 4   Global Step: 44860   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:46:28,022-Speed 5453.81 samples/sec   Loss 9.1088   LearningRate 0.2041   Epoch: 4   Global Step: 44870   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:46:35,591-Speed 5411.94 samples/sec   Loss 9.0809   LearningRate 0.2041   Epoch: 4   Global Step: 44880   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:46:43,097-Speed 5457.15 samples/sec   Loss 9.0930   LearningRate 0.2041   Epoch: 4   Global Step: 44890   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:46:50,684-Speed 5399.64 samples/sec   Loss 9.1272   LearningRate 0.2041   Epoch: 4   Global Step: 44900   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:46:58,302-Speed 5377.83 samples/sec   Loss 9.0943   LearningRate 0.2040   Epoch: 4   Global Step: 44910   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:47:06,090-Speed 5260.11 samples/sec   Loss 9.0692   LearningRate 0.2040   Epoch: 4   Global Step: 44920   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:47:13,736-Speed 5357.13 samples/sec   Loss 9.1256   LearningRate 0.2040   Epoch: 4   Global Step: 44930   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:47:21,363-Speed 5371.71 samples/sec   Loss 9.0408   LearningRate 0.2040   Epoch: 4   Global Step: 44940   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:47:28,987-Speed 5373.51 samples/sec   Loss 9.1082   LearningRate 0.2039   Epoch: 4   Global Step: 44950   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:47:36,637-Speed 5354.24 samples/sec   Loss 9.1129   LearningRate 0.2039   Epoch: 4   Global Step: 44960   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:47:44,146-Speed 5455.50 samples/sec   Loss 9.0748   LearningRate 0.2039   Epoch: 4   Global Step: 44970   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:47:51,607-Speed 5490.80 samples/sec   Loss 9.0761   LearningRate 0.2039   Epoch: 4   Global Step: 44980   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:47:59,193-Speed 5400.68 samples/sec   Loss 9.1650   LearningRate 0.2038   Epoch: 4   Global Step: 44990   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:48:06,844-Speed 5354.10 samples/sec   Loss 9.0382   LearningRate 0.2038   Epoch: 4   Global Step: 45000   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:48:14,482-Speed 5363.26 samples/sec   Loss 9.0349   LearningRate 0.2038   Epoch: 4   Global Step: 45010   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:48:22,036-Speed 5423.03 samples/sec   Loss 9.1199   LearningRate 0.2038   Epoch: 4   Global Step: 45020   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:48:29,747-Speed 5313.34 samples/sec   Loss 9.0434   LearningRate 0.2037   Epoch: 4   Global Step: 45030   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:48:37,414-Speed 5342.71 samples/sec   Loss 9.1012   LearningRate 0.2037   Epoch: 4   Global Step: 45040   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:48:44,986-Speed 5410.00 samples/sec   Loss 9.1504   LearningRate 0.2037   Epoch: 4   Global Step: 45050   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:48:52,575-Speed 5397.84 samples/sec   Loss 9.2329   LearningRate 0.2036   Epoch: 4   Global Step: 45060   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:49:00,205-Speed 5369.39 samples/sec   Loss 9.1457   LearningRate 0.2036   Epoch: 4   Global Step: 45070   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:49:07,904-Speed 5320.54 samples/sec   Loss 9.0674   LearningRate 0.2036   Epoch: 4   Global Step: 45080   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:49:15,572-Speed 5342.32 samples/sec   Loss 9.1551   LearningRate 0.2036   Epoch: 4   Global Step: 45090   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:49:23,138-Speed 5414.92 samples/sec   Loss 9.0531   LearningRate 0.2035   Epoch: 4   Global Step: 45100   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:49:30,818-Speed 5334.14 samples/sec   Loss 9.1882   LearningRate 0.2035   Epoch: 4   Global Step: 45110   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:49:38,542-Speed 5303.65 samples/sec   Loss 9.0677   LearningRate 0.2035   Epoch: 4   Global Step: 45120   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:49:46,249-Speed 5314.82 samples/sec   Loss 9.1532   LearningRate 0.2035   Epoch: 4   Global Step: 45130   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:49:53,873-Speed 5373.56 samples/sec   Loss 9.1603   LearningRate 0.2034   Epoch: 4   Global Step: 45140   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:50:01,463-Speed 5397.50 samples/sec   Loss 9.0949   LearningRate 0.2034   Epoch: 4   Global Step: 45150   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:50:08,953-Speed 5469.36 samples/sec   Loss 9.2165   LearningRate 0.2034   Epoch: 4   Global Step: 45160   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:50:16,396-Speed 5503.22 samples/sec   Loss 9.1235   LearningRate 0.2034   Epoch: 4   Global Step: 45170   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:50:23,966-Speed 5411.92 samples/sec   Loss 8.9951   LearningRate 0.2033   Epoch: 4   Global Step: 45180   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:50:31,515-Speed 5427.00 samples/sec   Loss 9.1611   LearningRate 0.2033   Epoch: 4   Global Step: 45190   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:50:39,155-Speed 5361.94 samples/sec   Loss 9.1031   LearningRate 0.2033   Epoch: 4   Global Step: 45200   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:50:46,947-Speed 5257.12 samples/sec   Loss 9.0869   LearningRate 0.2033   Epoch: 4   Global Step: 45210   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:50:54,629-Speed 5332.88 samples/sec   Loss 9.1016   LearningRate 0.2032   Epoch: 4   Global Step: 45220   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:51:02,314-Speed 5330.82 samples/sec   Loss 9.0952   LearningRate 0.2032   Epoch: 4   Global Step: 45230   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:51:09,902-Speed 5398.14 samples/sec   Loss 9.1727   LearningRate 0.2032   Epoch: 4   Global Step: 45240   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:51:17,560-Speed 5349.62 samples/sec   Loss 9.0298   LearningRate 0.2032   Epoch: 4   Global Step: 45250   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:51:25,134-Speed 5408.52 samples/sec   Loss 9.1311   LearningRate 0.2031   Epoch: 4   Global Step: 45260   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:51:32,730-Speed 5392.95 samples/sec   Loss 9.0658   LearningRate 0.2031   Epoch: 4   Global Step: 45270   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:51:40,339-Speed 5383.75 samples/sec   Loss 9.2031   LearningRate 0.2031   Epoch: 4   Global Step: 45280   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:51:47,884-Speed 5429.00 samples/sec   Loss 9.0990   LearningRate 0.2031   Epoch: 4   Global Step: 45290   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:51:55,502-Speed 5378.28 samples/sec   Loss 9.0985   LearningRate 0.2030   Epoch: 4   Global Step: 45300   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:52:03,166-Speed 5345.10 samples/sec   Loss 9.0893   LearningRate 0.2030   Epoch: 4   Global Step: 45310   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:52:10,847-Speed 5333.04 samples/sec   Loss 9.0842   LearningRate 0.2030   Epoch: 4   Global Step: 45320   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:52:18,537-Speed 5327.11 samples/sec   Loss 9.0493   LearningRate 0.2030   Epoch: 4   Global Step: 45330   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:52:26,159-Speed 5374.55 samples/sec   Loss 9.0814   LearningRate 0.2029   Epoch: 4   Global Step: 45340   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:52:33,863-Speed 5317.66 samples/sec   Loss 9.1562   LearningRate 0.2029   Epoch: 4   Global Step: 45350   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:52:41,412-Speed 5426.18 samples/sec   Loss 9.1083   LearningRate 0.2029   Epoch: 4   Global Step: 45360   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:52:49,010-Speed 5391.36 samples/sec   Loss 9.0291   LearningRate 0.2029   Epoch: 4   Global Step: 45370   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:52:56,547-Speed 5436.50 samples/sec   Loss 9.0492   LearningRate 0.2028   Epoch: 4   Global Step: 45380   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:53:04,064-Speed 5449.50 samples/sec   Loss 9.0832   LearningRate 0.2028   Epoch: 4   Global Step: 45390   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:53:11,657-Speed 5395.43 samples/sec   Loss 9.1098   LearningRate 0.2028   Epoch: 4   Global Step: 45400   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:53:19,272-Speed 5378.77 samples/sec   Loss 9.0880   LearningRate 0.2028   Epoch: 4   Global Step: 45410   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:53:26,875-Speed 5388.70 samples/sec   Loss 9.1028   LearningRate 0.2027   Epoch: 4   Global Step: 45420   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:53:34,506-Speed 5368.12 samples/sec   Loss 9.0750   LearningRate 0.2027   Epoch: 4   Global Step: 45430   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 04:53:42,081-Speed 5408.34 samples/sec   Loss 9.0618   LearningRate 0.2027   Epoch: 4   Global Step: 45440   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:53:49,694-Speed 5380.33 samples/sec   Loss 9.0574   LearningRate 0.2027   Epoch: 4   Global Step: 45450   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:53:57,188-Speed 5467.11 samples/sec   Loss 9.1189   LearningRate 0.2026   Epoch: 4   Global Step: 45460   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:54:04,762-Speed 5408.68 samples/sec   Loss 9.1120   LearningRate 0.2026   Epoch: 4   Global Step: 45470   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:54:12,366-Speed 5386.90 samples/sec   Loss 9.1014   LearningRate 0.2026   Epoch: 4   Global Step: 45480   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:54:20,034-Speed 5342.50 samples/sec   Loss 9.0141   LearningRate 0.2026   Epoch: 4   Global Step: 45490   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:54:27,603-Speed 5412.65 samples/sec   Loss 9.0875   LearningRate 0.2025   Epoch: 4   Global Step: 45500   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:54:35,185-Speed 5402.96 samples/sec   Loss 9.0815   LearningRate 0.2025   Epoch: 4   Global Step: 45510   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:54:42,783-Speed 5391.30 samples/sec   Loss 9.0477   LearningRate 0.2025   Epoch: 4   Global Step: 45520   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:54:50,392-Speed 5383.82 samples/sec   Loss 9.0992   LearningRate 0.2025   Epoch: 4   Global Step: 45530   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:54:57,951-Speed 5419.67 samples/sec   Loss 9.0869   LearningRate 0.2024   Epoch: 4   Global Step: 45540   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:55:05,681-Speed 5299.47 samples/sec   Loss 9.0563   LearningRate 0.2024   Epoch: 4   Global Step: 45550   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:55:13,337-Speed 5351.13 samples/sec   Loss 8.9586   LearningRate 0.2024   Epoch: 4   Global Step: 45560   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:55:20,976-Speed 5362.17 samples/sec   Loss 9.0385   LearningRate 0.2024   Epoch: 4   Global Step: 45570   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:55:28,478-Speed 5460.15 samples/sec   Loss 9.0400   LearningRate 0.2023   Epoch: 4   Global Step: 45580   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:55:36,194-Speed 5309.81 samples/sec   Loss 9.0145   LearningRate 0.2023   Epoch: 4   Global Step: 45590   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:55:43,751-Speed 5420.83 samples/sec   Loss 9.1040   LearningRate 0.2023   Epoch: 4   Global Step: 45600   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:55:51,316-Speed 5414.62 samples/sec   Loss 9.0446   LearningRate 0.2023   Epoch: 4   Global Step: 45610   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:55:58,877-Speed 5417.87 samples/sec   Loss 9.1034   LearningRate 0.2022   Epoch: 4   Global Step: 45620   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:56:06,404-Speed 5442.24 samples/sec   Loss 9.0270   LearningRate 0.2022   Epoch: 4   Global Step: 45630   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:56:13,997-Speed 5395.48 samples/sec   Loss 8.9813   LearningRate 0.2022   Epoch: 4   Global Step: 45640   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:56:21,526-Speed 5441.05 samples/sec   Loss 9.0291   LearningRate 0.2022   Epoch: 4   Global Step: 45650   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:56:29,127-Speed 5389.42 samples/sec   Loss 9.0210   LearningRate 0.2021   Epoch: 4   Global Step: 45660   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:56:36,708-Speed 5403.39 samples/sec   Loss 9.0721   LearningRate 0.2021   Epoch: 4   Global Step: 45670   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:56:44,286-Speed 5406.19 samples/sec   Loss 9.0194   LearningRate 0.2021   Epoch: 4   Global Step: 45680   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:56:51,933-Speed 5356.99 samples/sec   Loss 9.0608   LearningRate 0.2021   Epoch: 4   Global Step: 45690   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:56:59,525-Speed 5395.75 samples/sec   Loss 9.0508   LearningRate 0.2020   Epoch: 4   Global Step: 45700   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:57:07,271-Speed 5288.75 samples/sec   Loss 9.0129   LearningRate 0.2020   Epoch: 4   Global Step: 45710   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:57:14,863-Speed 5395.67 samples/sec   Loss 9.0631   LearningRate 0.2020   Epoch: 4   Global Step: 45720   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:57:22,421-Speed 5419.97 samples/sec   Loss 9.1064   LearningRate 0.2020   Epoch: 4   Global Step: 45730   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:57:29,904-Speed 5474.39 samples/sec   Loss 9.0210   LearningRate 0.2019   Epoch: 4   Global Step: 45740   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:57:37,501-Speed 5392.21 samples/sec   Loss 9.0894   LearningRate 0.2019   Epoch: 4   Global Step: 45750   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:57:45,012-Speed 5454.50 samples/sec   Loss 9.0906   LearningRate 0.2019   Epoch: 4   Global Step: 45760   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:57:52,564-Speed 5424.42 samples/sec   Loss 9.0327   LearningRate 0.2019   Epoch: 4   Global Step: 45770   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:58:00,101-Speed 5434.96 samples/sec   Loss 9.0164   LearningRate 0.2018   Epoch: 4   Global Step: 45780   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:58:07,637-Speed 5436.11 samples/sec   Loss 8.9897   LearningRate 0.2018   Epoch: 4   Global Step: 45790   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:58:15,238-Speed 5389.24 samples/sec   Loss 9.0828   LearningRate 0.2018   Epoch: 4   Global Step: 45800   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:58:22,757-Speed 5448.24 samples/sec   Loss 9.0358   LearningRate 0.2018   Epoch: 4   Global Step: 45810   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:58:30,366-Speed 5383.95 samples/sec   Loss 9.0962   LearningRate 0.2017   Epoch: 4   Global Step: 45820   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:58:37,905-Speed 5433.74 samples/sec   Loss 9.0303   LearningRate 0.2017   Epoch: 4   Global Step: 45830   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:58:45,422-Speed 5449.63 samples/sec   Loss 9.0441   LearningRate 0.2017   Epoch: 4   Global Step: 45840   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:58:53,092-Speed 5341.12 samples/sec   Loss 9.0731   LearningRate 0.2017   Epoch: 4   Global Step: 45850   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:59:00,597-Speed 5458.45 samples/sec   Loss 9.1010   LearningRate 0.2016   Epoch: 4   Global Step: 45860   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:59:08,248-Speed 5354.06 samples/sec   Loss 9.0678   LearningRate 0.2016   Epoch: 4   Global Step: 45870   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:59:16,005-Speed 5281.18 samples/sec   Loss 9.0874   LearningRate 0.2016   Epoch: 4   Global Step: 45880   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:59:23,599-Speed 5394.07 samples/sec   Loss 9.1006   LearningRate 0.2016   Epoch: 4   Global Step: 45890   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:59:31,204-Speed 5387.40 samples/sec   Loss 9.0273   LearningRate 0.2015   Epoch: 4   Global Step: 45900   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:59:38,757-Speed 5423.46 samples/sec   Loss 9.1019   LearningRate 0.2015   Epoch: 4   Global Step: 45910   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 04:59:46,268-Speed 5453.92 samples/sec   Loss 9.0680   LearningRate 0.2015   Epoch: 4   Global Step: 45920   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 04:59:53,832-Speed 5416.09 samples/sec   Loss 9.0567   LearningRate 0.2015   Epoch: 4   Global Step: 45930   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 05:00:01,373-Speed 5432.64 samples/sec   Loss 9.1061   LearningRate 0.2014   Epoch: 4   Global Step: 45940   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 05:00:08,846-Speed 5481.44 samples/sec   Loss 9.0480   LearningRate 0.2014   Epoch: 4   Global Step: 45950   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:00:16,315-Speed 5484.77 samples/sec   Loss 9.0779   LearningRate 0.2014   Epoch: 4   Global Step: 45960   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:00:23,931-Speed 5379.06 samples/sec   Loss 9.0107   LearningRate 0.2014   Epoch: 4   Global Step: 45970   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:00:31,518-Speed 5399.33 samples/sec   Loss 9.0615   LearningRate 0.2013   Epoch: 4   Global Step: 45980   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:00:39,005-Speed 5471.74 samples/sec   Loss 9.0759   LearningRate 0.2013   Epoch: 4   Global Step: 45990   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:00:46,514-Speed 5455.28 samples/sec   Loss 8.9841   LearningRate 0.2013   Epoch: 4   Global Step: 46000   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:01:31,136-[lfw][46000]XNorm: 23.927140
Training: 2022-01-08 05:01:31,136-[lfw][46000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-01-08 05:01:31,137-[lfw][46000]Accuracy-Highest: 0.99817
Training: 2022-01-08 05:02:23,702-[cfp_fp][46000]XNorm: 21.717181
Training: 2022-01-08 05:02:23,703-[cfp_fp][46000]Accuracy-Flip: 0.98486+-0.00543
Training: 2022-01-08 05:02:23,704-[cfp_fp][46000]Accuracy-Highest: 0.98600
Training: 2022-01-08 05:03:09,184-[agedb_30][46000]XNorm: 23.814232
Training: 2022-01-08 05:03:09,186-[agedb_30][46000]Accuracy-Flip: 0.97050+-0.00796
Training: 2022-01-08 05:03:09,186-[agedb_30][46000]Accuracy-Highest: 0.97250
Training: 2022-01-08 05:03:16,968-Speed 272.25 samples/sec   Loss 9.0464   LearningRate 0.2013   Epoch: 4   Global Step: 46010   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:03:24,575-Speed 5386.89 samples/sec   Loss 9.0081   LearningRate 0.2012   Epoch: 4   Global Step: 46020   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:03:32,070-Speed 5466.26 samples/sec   Loss 9.0485   LearningRate 0.2012   Epoch: 4   Global Step: 46030   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:03:39,563-Speed 5468.21 samples/sec   Loss 9.0250   LearningRate 0.2012   Epoch: 4   Global Step: 46040   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:03:47,163-Speed 5389.69 samples/sec   Loss 9.0825   LearningRate 0.2012   Epoch: 4   Global Step: 46050   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:03:54,789-Speed 5372.79 samples/sec   Loss 9.0459   LearningRate 0.2011   Epoch: 4   Global Step: 46060   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:04:02,394-Speed 5386.96 samples/sec   Loss 9.0480   LearningRate 0.2011   Epoch: 4   Global Step: 46070   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:04:09,993-Speed 5390.89 samples/sec   Loss 9.0591   LearningRate 0.2011   Epoch: 4   Global Step: 46080   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:04:17,552-Speed 5419.49 samples/sec   Loss 9.0684   LearningRate 0.2011   Epoch: 4   Global Step: 46090   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:04:25,097-Speed 5429.33 samples/sec   Loss 9.0891   LearningRate 0.2010   Epoch: 4   Global Step: 46100   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:04:32,649-Speed 5424.52 samples/sec   Loss 9.0707   LearningRate 0.2010   Epoch: 4   Global Step: 46110   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:04:40,298-Speed 5355.37 samples/sec   Loss 9.0578   LearningRate 0.2010   Epoch: 4   Global Step: 46120   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:04:47,794-Speed 5465.06 samples/sec   Loss 8.9750   LearningRate 0.2010   Epoch: 4   Global Step: 46130   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:04:55,464-Speed 5341.24 samples/sec   Loss 8.9829   LearningRate 0.2009   Epoch: 4   Global Step: 46140   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:05:03,020-Speed 5421.65 samples/sec   Loss 9.0374   LearningRate 0.2009   Epoch: 4   Global Step: 46150   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 05:05:10,535-Speed 5450.84 samples/sec   Loss 9.0948   LearningRate 0.2009   Epoch: 4   Global Step: 46160   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 05:05:18,020-Speed 5472.97 samples/sec   Loss 9.0068   LearningRate 0.2009   Epoch: 4   Global Step: 46170   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 05:05:25,506-Speed 5471.78 samples/sec   Loss 9.1317   LearningRate 0.2008   Epoch: 4   Global Step: 46180   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 05:05:33,058-Speed 5424.88 samples/sec   Loss 9.1094   LearningRate 0.2008   Epoch: 4   Global Step: 46190   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 05:05:40,522-Speed 5488.20 samples/sec   Loss 9.0522   LearningRate 0.2008   Epoch: 4   Global Step: 46200   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 05:05:48,026-Speed 5459.19 samples/sec   Loss 8.9966   LearningRate 0.2008   Epoch: 4   Global Step: 46210   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 05:05:55,515-Speed 5470.00 samples/sec   Loss 9.0423   LearningRate 0.2007   Epoch: 4   Global Step: 46220   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 05:06:03,155-Speed 5361.86 samples/sec   Loss 9.0410   LearningRate 0.2007   Epoch: 4   Global Step: 46230   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:06:10,674-Speed 5448.50 samples/sec   Loss 9.0409   LearningRate 0.2007   Epoch: 4   Global Step: 46240   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:06:18,269-Speed 5393.51 samples/sec   Loss 9.0412   LearningRate 0.2007   Epoch: 4   Global Step: 46250   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:06:25,906-Speed 5364.30 samples/sec   Loss 9.0507   LearningRate 0.2007   Epoch: 4   Global Step: 46260   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:06:33,519-Speed 5380.86 samples/sec   Loss 9.0364   LearningRate 0.2006   Epoch: 4   Global Step: 46270   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:06:41,032-Speed 5452.64 samples/sec   Loss 9.0022   LearningRate 0.2006   Epoch: 4   Global Step: 46280   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:06:48,547-Speed 5451.50 samples/sec   Loss 9.0554   LearningRate 0.2006   Epoch: 4   Global Step: 46290   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:06:56,206-Speed 5348.28 samples/sec   Loss 9.0333   LearningRate 0.2006   Epoch: 4   Global Step: 46300   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:07:03,789-Speed 5402.77 samples/sec   Loss 8.9991   LearningRate 0.2005   Epoch: 4   Global Step: 46310   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:07:11,264-Speed 5480.20 samples/sec   Loss 9.0520   LearningRate 0.2005   Epoch: 4   Global Step: 46320   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:07:18,920-Speed 5350.63 samples/sec   Loss 8.9687   LearningRate 0.2005   Epoch: 4   Global Step: 46330   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 05:07:26,482-Speed 5417.17 samples/sec   Loss 9.0527   LearningRate 0.2005   Epoch: 4   Global Step: 46340   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 05:07:33,932-Speed 5499.08 samples/sec   Loss 9.0815   LearningRate 0.2004   Epoch: 4   Global Step: 46350   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:07:41,585-Speed 5353.30 samples/sec   Loss 9.0111   LearningRate 0.2004   Epoch: 4   Global Step: 46360   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:07:49,121-Speed 5435.57 samples/sec   Loss 9.0825   LearningRate 0.2004   Epoch: 4   Global Step: 46370   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:07:56,640-Speed 5448.54 samples/sec   Loss 8.9198   LearningRate 0.2004   Epoch: 4   Global Step: 46380   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:08:04,226-Speed 5399.88 samples/sec   Loss 9.0405   LearningRate 0.2003   Epoch: 4   Global Step: 46390   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:08:11,744-Speed 5448.80 samples/sec   Loss 9.0135   LearningRate 0.2003   Epoch: 4   Global Step: 46400   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:08:19,238-Speed 5466.92 samples/sec   Loss 9.0875   LearningRate 0.2003   Epoch: 4   Global Step: 46410   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:08:26,857-Speed 5376.59 samples/sec   Loss 8.8704   LearningRate 0.2003   Epoch: 4   Global Step: 46420   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:08:34,389-Speed 5438.44 samples/sec   Loss 9.0454   LearningRate 0.2002   Epoch: 4   Global Step: 46430   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:08:42,019-Speed 5369.11 samples/sec   Loss 9.0960   LearningRate 0.2002   Epoch: 4   Global Step: 46440   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:08:49,575-Speed 5421.90 samples/sec   Loss 9.0438   LearningRate 0.2002   Epoch: 4   Global Step: 46450   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:08:57,154-Speed 5404.77 samples/sec   Loss 9.0096   LearningRate 0.2002   Epoch: 4   Global Step: 46460   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:09:04,689-Speed 5436.82 samples/sec   Loss 9.0156   LearningRate 0.2001   Epoch: 4   Global Step: 46470   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:09:12,176-Speed 5471.67 samples/sec   Loss 9.0256   LearningRate 0.2001   Epoch: 4   Global Step: 46480   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:09:19,738-Speed 5417.35 samples/sec   Loss 8.9972   LearningRate 0.2001   Epoch: 4   Global Step: 46490   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:09:27,382-Speed 5358.86 samples/sec   Loss 8.9799   LearningRate 0.2001   Epoch: 4   Global Step: 46500   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:09:34,924-Speed 5431.75 samples/sec   Loss 9.0023   LearningRate 0.2000   Epoch: 4   Global Step: 46510   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:09:42,503-Speed 5404.85 samples/sec   Loss 9.0306   LearningRate 0.2000   Epoch: 4   Global Step: 46520   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:09:50,065-Speed 5417.45 samples/sec   Loss 9.0290   LearningRate 0.2000   Epoch: 4   Global Step: 46530   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:09:57,691-Speed 5371.83 samples/sec   Loss 8.9931   LearningRate 0.2000   Epoch: 4   Global Step: 46540   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:10:05,176-Speed 5472.80 samples/sec   Loss 9.0103   LearningRate 0.1999   Epoch: 4   Global Step: 46550   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:10:12,724-Speed 5427.86 samples/sec   Loss 9.0193   LearningRate 0.1999   Epoch: 4   Global Step: 46560   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:10:20,239-Speed 5450.83 samples/sec   Loss 9.0336   LearningRate 0.1999   Epoch: 4   Global Step: 46570   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:10:27,827-Speed 5398.85 samples/sec   Loss 9.0603   LearningRate 0.1999   Epoch: 4   Global Step: 46580   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:10:35,449-Speed 5374.51 samples/sec   Loss 9.0949   LearningRate 0.1998   Epoch: 4   Global Step: 46590   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:10:43,011-Speed 5417.55 samples/sec   Loss 9.0929   LearningRate 0.1998   Epoch: 4   Global Step: 46600   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:10:50,577-Speed 5414.52 samples/sec   Loss 8.9771   LearningRate 0.1998   Epoch: 4   Global Step: 46610   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:10:58,157-Speed 5404.24 samples/sec   Loss 8.9737   LearningRate 0.1998   Epoch: 4   Global Step: 46620   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:11:05,753-Speed 5392.23 samples/sec   Loss 9.0470   LearningRate 0.1997   Epoch: 4   Global Step: 46630   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:11:13,302-Speed 5427.03 samples/sec   Loss 9.0576   LearningRate 0.1997   Epoch: 4   Global Step: 46640   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:11:20,845-Speed 5431.10 samples/sec   Loss 9.0163   LearningRate 0.1997   Epoch: 4   Global Step: 46650   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:11:28,395-Speed 5425.85 samples/sec   Loss 8.9904   LearningRate 0.1997   Epoch: 4   Global Step: 46660   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:11:35,956-Speed 5417.57 samples/sec   Loss 8.9671   LearningRate 0.1996   Epoch: 4   Global Step: 46670   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:11:43,568-Speed 5382.12 samples/sec   Loss 9.0210   LearningRate 0.1996   Epoch: 4   Global Step: 46680   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:11:51,125-Speed 5420.76 samples/sec   Loss 9.0493   LearningRate 0.1996   Epoch: 4   Global Step: 46690   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:11:58,683-Speed 5420.22 samples/sec   Loss 9.0377   LearningRate 0.1996   Epoch: 4   Global Step: 46700   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:12:06,187-Speed 5458.55 samples/sec   Loss 9.0787   LearningRate 0.1995   Epoch: 4   Global Step: 46710   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:12:13,713-Speed 5443.43 samples/sec   Loss 8.9013   LearningRate 0.1995   Epoch: 4   Global Step: 46720   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:12:21,212-Speed 5462.77 samples/sec   Loss 8.9643   LearningRate 0.1995   Epoch: 4   Global Step: 46730   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:12:28,972-Speed 5279.27 samples/sec   Loss 8.8744   LearningRate 0.1995   Epoch: 4   Global Step: 46740   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:12:36,591-Speed 5376.46 samples/sec   Loss 8.9383   LearningRate 0.1994   Epoch: 4   Global Step: 46750   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:12:44,041-Speed 5498.99 samples/sec   Loss 9.0176   LearningRate 0.1994   Epoch: 4   Global Step: 46760   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:12:51,673-Speed 5367.45 samples/sec   Loss 9.0427   LearningRate 0.1994   Epoch: 4   Global Step: 46770   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:12:59,127-Speed 5495.60 samples/sec   Loss 9.0241   LearningRate 0.1994   Epoch: 4   Global Step: 46780   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:13:06,646-Speed 5448.35 samples/sec   Loss 8.9399   LearningRate 0.1993   Epoch: 4   Global Step: 46790   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:13:14,157-Speed 5454.35 samples/sec   Loss 8.9490   LearningRate 0.1993   Epoch: 4   Global Step: 46800   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:13:21,717-Speed 5418.19 samples/sec   Loss 9.0139   LearningRate 0.1993   Epoch: 4   Global Step: 46810   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:13:29,229-Speed 5453.46 samples/sec   Loss 8.9575   LearningRate 0.1993   Epoch: 4   Global Step: 46820   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:13:36,764-Speed 5436.56 samples/sec   Loss 8.9445   LearningRate 0.1992   Epoch: 4   Global Step: 46830   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:13:44,218-Speed 5496.36 samples/sec   Loss 8.9455   LearningRate 0.1992   Epoch: 4   Global Step: 46840   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:13:51,758-Speed 5433.40 samples/sec   Loss 8.9465   LearningRate 0.1992   Epoch: 4   Global Step: 46850   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:13:59,363-Speed 5386.69 samples/sec   Loss 8.9903   LearningRate 0.1992   Epoch: 4   Global Step: 46860   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:14:06,881-Speed 5448.76 samples/sec   Loss 9.0370   LearningRate 0.1991   Epoch: 4   Global Step: 46870   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:14:14,355-Speed 5481.41 samples/sec   Loss 8.9750   LearningRate 0.1991   Epoch: 4   Global Step: 46880   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:14:21,867-Speed 5452.74 samples/sec   Loss 8.9696   LearningRate 0.1991   Epoch: 4   Global Step: 46890   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:14:29,435-Speed 5413.47 samples/sec   Loss 9.0501   LearningRate 0.1991   Epoch: 4   Global Step: 46900   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:14:36,944-Speed 5454.80 samples/sec   Loss 9.0077   LearningRate 0.1990   Epoch: 4   Global Step: 46910   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:14:44,504-Speed 5419.00 samples/sec   Loss 8.9246   LearningRate 0.1990   Epoch: 4   Global Step: 46920   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:14:52,194-Speed 5327.21 samples/sec   Loss 8.9717   LearningRate 0.1990   Epoch: 4   Global Step: 46930   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:14:59,716-Speed 5446.31 samples/sec   Loss 8.9523   LearningRate 0.1990   Epoch: 4   Global Step: 46940   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:15:07,350-Speed 5365.68 samples/sec   Loss 9.0144   LearningRate 0.1989   Epoch: 4   Global Step: 46950   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:15:15,009-Speed 5348.83 samples/sec   Loss 8.9080   LearningRate 0.1989   Epoch: 4   Global Step: 46960   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:15:22,682-Speed 5339.39 samples/sec   Loss 9.0187   LearningRate 0.1989   Epoch: 4   Global Step: 46970   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:15:30,141-Speed 5491.85 samples/sec   Loss 8.9512   LearningRate 0.1989   Epoch: 4   Global Step: 46980   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:15:37,628-Speed 5471.16 samples/sec   Loss 9.0442   LearningRate 0.1988   Epoch: 4   Global Step: 46990   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:15:45,040-Speed 5526.91 samples/sec   Loss 8.9943   LearningRate 0.1988   Epoch: 4   Global Step: 47000   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:15:52,575-Speed 5437.10 samples/sec   Loss 9.0814   LearningRate 0.1988   Epoch: 4   Global Step: 47010   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:16:00,055-Speed 5476.84 samples/sec   Loss 8.9394   LearningRate 0.1988   Epoch: 4   Global Step: 47020   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:16:07,666-Speed 5381.82 samples/sec   Loss 9.0237   LearningRate 0.1987   Epoch: 4   Global Step: 47030   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:16:15,167-Speed 5461.61 samples/sec   Loss 8.9350   LearningRate 0.1987   Epoch: 4   Global Step: 47040   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:16:22,724-Speed 5421.09 samples/sec   Loss 8.9998   LearningRate 0.1987   Epoch: 4   Global Step: 47050   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:16:30,291-Speed 5413.55 samples/sec   Loss 9.0071   LearningRate 0.1987   Epoch: 4   Global Step: 47060   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:16:37,789-Speed 5463.57 samples/sec   Loss 9.0116   LearningRate 0.1986   Epoch: 4   Global Step: 47070   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:16:45,279-Speed 5468.98 samples/sec   Loss 8.9489   LearningRate 0.1986   Epoch: 4   Global Step: 47080   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:16:52,843-Speed 5415.55 samples/sec   Loss 8.9482   LearningRate 0.1986   Epoch: 4   Global Step: 47090   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:17:00,392-Speed 5426.89 samples/sec   Loss 9.0427   LearningRate 0.1986   Epoch: 4   Global Step: 47100   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:17:08,108-Speed 5309.17 samples/sec   Loss 8.9716   LearningRate 0.1985   Epoch: 4   Global Step: 47110   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:17:15,556-Speed 5500.45 samples/sec   Loss 9.0411   LearningRate 0.1985   Epoch: 4   Global Step: 47120   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 05:17:23,021-Speed 5487.47 samples/sec   Loss 8.9308   LearningRate 0.1985   Epoch: 4   Global Step: 47130   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 05:17:30,556-Speed 5436.43 samples/sec   Loss 9.0270   LearningRate 0.1985   Epoch: 4   Global Step: 47140   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:17:38,046-Speed 5469.45 samples/sec   Loss 9.0179   LearningRate 0.1984   Epoch: 4   Global Step: 47150   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:17:45,527-Speed 5476.24 samples/sec   Loss 9.0492   LearningRate 0.1984   Epoch: 4   Global Step: 47160   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:17:53,080-Speed 5424.04 samples/sec   Loss 8.9572   LearningRate 0.1984   Epoch: 4   Global Step: 47170   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:18:00,637-Speed 5420.76 samples/sec   Loss 9.0055   LearningRate 0.1984   Epoch: 4   Global Step: 47180   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:18:08,160-Speed 5445.54 samples/sec   Loss 8.9965   LearningRate 0.1983   Epoch: 4   Global Step: 47190   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:18:15,683-Speed 5445.15 samples/sec   Loss 8.9170   LearningRate 0.1983   Epoch: 4   Global Step: 47200   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:18:23,176-Speed 5467.29 samples/sec   Loss 8.9288   LearningRate 0.1983   Epoch: 4   Global Step: 47210   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:18:30,672-Speed 5464.81 samples/sec   Loss 8.8984   LearningRate 0.1983   Epoch: 4   Global Step: 47220   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:18:38,276-Speed 5387.15 samples/sec   Loss 8.8877   LearningRate 0.1982   Epoch: 4   Global Step: 47230   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:18:45,747-Speed 5483.72 samples/sec   Loss 8.9775   LearningRate 0.1982   Epoch: 4   Global Step: 47240   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:18:53,256-Speed 5455.56 samples/sec   Loss 9.0123   LearningRate 0.1982   Epoch: 4   Global Step: 47250   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:19:00,759-Speed 5459.31 samples/sec   Loss 8.9450   LearningRate 0.1982   Epoch: 4   Global Step: 47260   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:19:08,275-Speed 5450.49 samples/sec   Loss 8.9730   LearningRate 0.1981   Epoch: 4   Global Step: 47270   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:19:15,781-Speed 5458.39 samples/sec   Loss 8.9012   LearningRate 0.1981   Epoch: 4   Global Step: 47280   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:19:23,332-Speed 5424.44 samples/sec   Loss 8.9105   LearningRate 0.1981   Epoch: 4   Global Step: 47290   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:19:30,913-Speed 5403.54 samples/sec   Loss 9.0185   LearningRate 0.1981   Epoch: 4   Global Step: 47300   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:19:38,433-Speed 5448.17 samples/sec   Loss 8.9451   LearningRate 0.1980   Epoch: 4   Global Step: 47310   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:19:45,973-Speed 5433.04 samples/sec   Loss 8.9448   LearningRate 0.1980   Epoch: 4   Global Step: 47320   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:19:53,502-Speed 5440.54 samples/sec   Loss 8.9343   LearningRate 0.1980   Epoch: 4   Global Step: 47330   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:20:01,095-Speed 5395.17 samples/sec   Loss 8.9091   LearningRate 0.1980   Epoch: 4   Global Step: 47340   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:20:08,653-Speed 5420.35 samples/sec   Loss 8.9773   LearningRate 0.1979   Epoch: 4   Global Step: 47350   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:20:16,157-Speed 5458.68 samples/sec   Loss 8.9901   LearningRate 0.1979   Epoch: 4   Global Step: 47360   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:20:23,708-Speed 5425.84 samples/sec   Loss 8.9749   LearningRate 0.1979   Epoch: 4   Global Step: 47370   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:20:31,325-Speed 5377.90 samples/sec   Loss 9.0412   LearningRate 0.1979   Epoch: 4   Global Step: 47380   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:20:38,945-Speed 5375.58 samples/sec   Loss 8.9371   LearningRate 0.1978   Epoch: 4   Global Step: 47390   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:20:46,427-Speed 5475.24 samples/sec   Loss 8.9575   LearningRate 0.1978   Epoch: 4   Global Step: 47400   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:20:53,997-Speed 5411.68 samples/sec   Loss 8.9476   LearningRate 0.1978   Epoch: 4   Global Step: 47410   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:21:01,600-Speed 5388.10 samples/sec   Loss 8.9813   LearningRate 0.1978   Epoch: 4   Global Step: 47420   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:21:09,083-Speed 5474.61 samples/sec   Loss 9.0075   LearningRate 0.1977   Epoch: 4   Global Step: 47430   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:21:16,596-Speed 5452.73 samples/sec   Loss 8.9479   LearningRate 0.1977   Epoch: 4   Global Step: 47440   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:21:24,250-Speed 5351.64 samples/sec   Loss 8.9516   LearningRate 0.1977   Epoch: 4   Global Step: 47450   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:21:31,777-Speed 5442.81 samples/sec   Loss 8.9824   LearningRate 0.1977   Epoch: 4   Global Step: 47460   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:21:39,305-Speed 5441.61 samples/sec   Loss 8.9345   LearningRate 0.1976   Epoch: 4   Global Step: 47470   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:21:46,782-Speed 5478.56 samples/sec   Loss 8.9829   LearningRate 0.1976   Epoch: 4   Global Step: 47480   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:21:54,288-Speed 5458.10 samples/sec   Loss 8.9750   LearningRate 0.1976   Epoch: 4   Global Step: 47490   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:22:01,837-Speed 5426.78 samples/sec   Loss 8.8747   LearningRate 0.1976   Epoch: 4   Global Step: 47500   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:22:09,350-Speed 5452.41 samples/sec   Loss 8.9245   LearningRate 0.1975   Epoch: 4   Global Step: 47510   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:22:16,808-Speed 5493.07 samples/sec   Loss 8.9845   LearningRate 0.1975   Epoch: 4   Global Step: 47520   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 05:22:24,310-Speed 5460.89 samples/sec   Loss 8.9408   LearningRate 0.1975   Epoch: 4   Global Step: 47530   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 05:22:31,795-Speed 5472.20 samples/sec   Loss 9.0423   LearningRate 0.1975   Epoch: 4   Global Step: 47540   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:22:39,373-Speed 5406.54 samples/sec   Loss 8.9239   LearningRate 0.1974   Epoch: 4   Global Step: 47550   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:22:46,945-Speed 5409.63 samples/sec   Loss 8.9720   LearningRate 0.1974   Epoch: 4   Global Step: 47560   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:22:54,605-Speed 5348.59 samples/sec   Loss 8.9693   LearningRate 0.1974   Epoch: 4   Global Step: 47570   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:23:02,161-Speed 5420.93 samples/sec   Loss 8.9131   LearningRate 0.1974   Epoch: 4   Global Step: 47580   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:23:09,739-Speed 5405.93 samples/sec   Loss 8.9277   LearningRate 0.1974   Epoch: 4   Global Step: 47590   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:23:17,314-Speed 5408.40 samples/sec   Loss 8.9443   LearningRate 0.1973   Epoch: 4   Global Step: 47600   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:23:24,847-Speed 5438.21 samples/sec   Loss 8.9640   LearningRate 0.1973   Epoch: 4   Global Step: 47610   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:23:32,540-Speed 5324.94 samples/sec   Loss 8.9561   LearningRate 0.1973   Epoch: 4   Global Step: 47620   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:23:40,102-Speed 5417.51 samples/sec   Loss 8.9782   LearningRate 0.1973   Epoch: 4   Global Step: 47630   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:23:47,746-Speed 5359.05 samples/sec   Loss 8.9701   LearningRate 0.1972   Epoch: 4   Global Step: 47640   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:23:55,392-Speed 5357.92 samples/sec   Loss 8.9755   LearningRate 0.1972   Epoch: 4   Global Step: 47650   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:24:03,077-Speed 5330.54 samples/sec   Loss 8.9887   LearningRate 0.1972   Epoch: 4   Global Step: 47660   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:24:10,679-Speed 5388.51 samples/sec   Loss 9.0099   LearningRate 0.1972   Epoch: 4   Global Step: 47670   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:24:18,265-Speed 5400.03 samples/sec   Loss 8.9916   LearningRate 0.1971   Epoch: 4   Global Step: 47680   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:24:25,741-Speed 5480.42 samples/sec   Loss 8.9323   LearningRate 0.1971   Epoch: 4   Global Step: 47690   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:24:33,334-Speed 5395.08 samples/sec   Loss 8.9506   LearningRate 0.1971   Epoch: 4   Global Step: 47700   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:24:40,960-Speed 5371.44 samples/sec   Loss 8.9143   LearningRate 0.1971   Epoch: 4   Global Step: 47710   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:24:48,567-Speed 5385.13 samples/sec   Loss 8.9950   LearningRate 0.1970   Epoch: 4   Global Step: 47720   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:24:56,125-Speed 5420.81 samples/sec   Loss 8.8957   LearningRate 0.1970   Epoch: 4   Global Step: 47730   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:25:03,704-Speed 5404.60 samples/sec   Loss 8.9279   LearningRate 0.1970   Epoch: 4   Global Step: 47740   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:25:11,290-Speed 5400.26 samples/sec   Loss 8.9634   LearningRate 0.1970   Epoch: 4   Global Step: 47750   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:25:18,860-Speed 5411.82 samples/sec   Loss 8.9156   LearningRate 0.1969   Epoch: 4   Global Step: 47760   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:25:26,450-Speed 5397.42 samples/sec   Loss 8.8856   LearningRate 0.1969   Epoch: 4   Global Step: 47770   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:25:34,085-Speed 5365.55 samples/sec   Loss 8.9353   LearningRate 0.1969   Epoch: 4   Global Step: 47780   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:25:41,647-Speed 5416.69 samples/sec   Loss 8.9335   LearningRate 0.1969   Epoch: 4   Global Step: 47790   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:25:49,486-Speed 5226.08 samples/sec   Loss 8.9433   LearningRate 0.1968   Epoch: 4   Global Step: 47800   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:25:57,005-Speed 5448.60 samples/sec   Loss 8.9185   LearningRate 0.1968   Epoch: 4   Global Step: 47810   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:26:04,788-Speed 5263.26 samples/sec   Loss 8.8747   LearningRate 0.1968   Epoch: 4   Global Step: 47820   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:26:12,268-Speed 5476.62 samples/sec   Loss 8.9112   LearningRate 0.1968   Epoch: 4   Global Step: 47830   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:26:19,846-Speed 5405.74 samples/sec   Loss 8.9429   LearningRate 0.1967   Epoch: 4   Global Step: 47840   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:26:27,392-Speed 5429.07 samples/sec   Loss 8.8934   LearningRate 0.1967   Epoch: 4   Global Step: 47850   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:26:34,958-Speed 5414.09 samples/sec   Loss 8.9171   LearningRate 0.1967   Epoch: 4   Global Step: 47860   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:26:42,454-Speed 5464.95 samples/sec   Loss 8.9192   LearningRate 0.1967   Epoch: 4   Global Step: 47870   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:26:50,029-Speed 5407.81 samples/sec   Loss 8.8783   LearningRate 0.1966   Epoch: 4   Global Step: 47880   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:26:57,586-Speed 5421.38 samples/sec   Loss 8.8980   LearningRate 0.1966   Epoch: 4   Global Step: 47890   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:27:05,105-Speed 5447.86 samples/sec   Loss 9.0074   LearningRate 0.1966   Epoch: 4   Global Step: 47900   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 05:27:12,660-Speed 5422.44 samples/sec   Loss 8.8966   LearningRate 0.1966   Epoch: 4   Global Step: 47910   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:27:20,222-Speed 5416.62 samples/sec   Loss 8.9073   LearningRate 0.1965   Epoch: 4   Global Step: 47920   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:27:27,908-Speed 5330.22 samples/sec   Loss 8.8762   LearningRate 0.1965   Epoch: 4   Global Step: 47930   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:27:35,568-Speed 5348.13 samples/sec   Loss 8.9293   LearningRate 0.1965   Epoch: 4   Global Step: 47940   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:27:43,136-Speed 5412.93 samples/sec   Loss 8.9184   LearningRate 0.1965   Epoch: 4   Global Step: 47950   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:27:50,629-Speed 5466.90 samples/sec   Loss 8.8575   LearningRate 0.1964   Epoch: 4   Global Step: 47960   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:27:58,188-Speed 5420.34 samples/sec   Loss 8.8723   LearningRate 0.1964   Epoch: 4   Global Step: 47970   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:28:05,675-Speed 5471.68 samples/sec   Loss 8.9812   LearningRate 0.1964   Epoch: 4   Global Step: 47980   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:28:13,162-Speed 5470.64 samples/sec   Loss 8.9927   LearningRate 0.1964   Epoch: 4   Global Step: 47990   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:28:20,775-Speed 5380.94 samples/sec   Loss 8.9173   LearningRate 0.1963   Epoch: 4   Global Step: 48000   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:29:04,874-[lfw][48000]XNorm: 23.765317
Training: 2022-01-08 05:29:04,875-[lfw][48000]Accuracy-Flip: 0.99717+-0.00279
Training: 2022-01-08 05:29:04,875-[lfw][48000]Accuracy-Highest: 0.99817
Training: 2022-01-08 05:29:56,633-[cfp_fp][48000]XNorm: 21.289137
Training: 2022-01-08 05:29:56,634-[cfp_fp][48000]Accuracy-Flip: 0.98186+-0.00535
Training: 2022-01-08 05:29:56,635-[cfp_fp][48000]Accuracy-Highest: 0.98600
Training: 2022-01-08 05:30:42,416-[agedb_30][48000]XNorm: 23.444254
Training: 2022-01-08 05:30:42,418-[agedb_30][48000]Accuracy-Flip: 0.96933+-0.00978
Training: 2022-01-08 05:30:42,418-[agedb_30][48000]Accuracy-Highest: 0.97250
Training: 2022-01-08 05:30:50,115-Speed 274.28 samples/sec   Loss 8.9716   LearningRate 0.1963   Epoch: 4   Global Step: 48010   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:30:57,684-Speed 5412.83 samples/sec   Loss 8.8830   LearningRate 0.1963   Epoch: 4   Global Step: 48020   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:31:05,326-Speed 5361.23 samples/sec   Loss 8.9810   LearningRate 0.1963   Epoch: 4   Global Step: 48030   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:31:12,859-Speed 5438.69 samples/sec   Loss 8.8828   LearningRate 0.1962   Epoch: 4   Global Step: 48040   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:31:20,574-Speed 5310.25 samples/sec   Loss 8.9359   LearningRate 0.1962   Epoch: 4   Global Step: 48050   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:31:28,225-Speed 5355.15 samples/sec   Loss 8.8929   LearningRate 0.1962   Epoch: 4   Global Step: 48060   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:31:35,853-Speed 5371.00 samples/sec   Loss 8.8715   LearningRate 0.1962   Epoch: 4   Global Step: 48070   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:31:43,397-Speed 5430.08 samples/sec   Loss 8.8165   LearningRate 0.1961   Epoch: 4   Global Step: 48080   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:31:50,900-Speed 5460.16 samples/sec   Loss 8.9555   LearningRate 0.1961   Epoch: 4   Global Step: 48090   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:31:58,522-Speed 5374.17 samples/sec   Loss 8.8946   LearningRate 0.1961   Epoch: 4   Global Step: 48100   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:32:06,148-Speed 5371.68 samples/sec   Loss 8.8763   LearningRate 0.1961   Epoch: 4   Global Step: 48110   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 05:32:13,843-Speed 5324.25 samples/sec   Loss 8.8600   LearningRate 0.1960   Epoch: 4   Global Step: 48120   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 05:32:21,325-Speed 5475.45 samples/sec   Loss 8.9130   LearningRate 0.1960   Epoch: 4   Global Step: 48130   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:32:28,877-Speed 5424.53 samples/sec   Loss 8.8417   LearningRate 0.1960   Epoch: 4   Global Step: 48140   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:32:36,391-Speed 5451.41 samples/sec   Loss 8.9436   LearningRate 0.1960   Epoch: 4   Global Step: 48150   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:32:43,872-Speed 5475.93 samples/sec   Loss 8.8817   LearningRate 0.1959   Epoch: 4   Global Step: 48160   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 05:32:51,362-Speed 5469.95 samples/sec   Loss 8.9198   LearningRate 0.1959   Epoch: 4   Global Step: 48170   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 05:32:58,879-Speed 5449.46 samples/sec   Loss 8.9624   LearningRate 0.1959   Epoch: 4   Global Step: 48180   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 05:33:06,430-Speed 5424.98 samples/sec   Loss 8.8484   LearningRate 0.1959   Epoch: 4   Global Step: 48190   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 05:33:13,965-Speed 5436.72 samples/sec   Loss 8.9669   LearningRate 0.1958   Epoch: 4   Global Step: 48200   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 05:33:21,511-Speed 5429.39 samples/sec   Loss 8.8924   LearningRate 0.1958   Epoch: 4   Global Step: 48210   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 05:33:28,987-Speed 5479.65 samples/sec   Loss 8.9331   LearningRate 0.1958   Epoch: 4   Global Step: 48220   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 05:33:36,465-Speed 5477.96 samples/sec   Loss 8.9187   LearningRate 0.1958   Epoch: 4   Global Step: 48230   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 05:33:43,997-Speed 5438.97 samples/sec   Loss 8.8658   LearningRate 0.1957   Epoch: 4   Global Step: 48240   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 05:33:51,526-Speed 5440.69 samples/sec   Loss 8.9288   LearningRate 0.1957   Epoch: 4   Global Step: 48250   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 05:33:59,042-Speed 5450.78 samples/sec   Loss 8.8641   LearningRate 0.1957   Epoch: 4   Global Step: 48260   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:34:06,598-Speed 5421.23 samples/sec   Loss 8.8589   LearningRate 0.1957   Epoch: 4   Global Step: 48270   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:34:14,185-Speed 5399.60 samples/sec   Loss 8.8182   LearningRate 0.1957   Epoch: 4   Global Step: 48280   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:34:21,679-Speed 5466.54 samples/sec   Loss 8.9036   LearningRate 0.1956   Epoch: 4   Global Step: 48290   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:34:29,157-Speed 5477.72 samples/sec   Loss 8.9461   LearningRate 0.1956   Epoch: 4   Global Step: 48300   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:34:36,674-Speed 5449.43 samples/sec   Loss 8.9503   LearningRate 0.1956   Epoch: 4   Global Step: 48310   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:34:44,160-Speed 5472.24 samples/sec   Loss 8.8857   LearningRate 0.1956   Epoch: 4   Global Step: 48320   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:34:51,675-Speed 5451.61 samples/sec   Loss 8.8858   LearningRate 0.1955   Epoch: 4   Global Step: 48330   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:34:59,223-Speed 5427.40 samples/sec   Loss 8.8248   LearningRate 0.1955   Epoch: 4   Global Step: 48340   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:35:06,696-Speed 5481.58 samples/sec   Loss 8.8817   LearningRate 0.1955   Epoch: 4   Global Step: 48350   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:35:14,231-Speed 5436.55 samples/sec   Loss 8.8902   LearningRate 0.1955   Epoch: 4   Global Step: 48360   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:35:21,755-Speed 5445.09 samples/sec   Loss 8.8277   LearningRate 0.1954   Epoch: 4   Global Step: 48370   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:35:29,348-Speed 5395.50 samples/sec   Loss 8.8554   LearningRate 0.1954   Epoch: 4   Global Step: 48380   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:35:37,055-Speed 5315.05 samples/sec   Loss 8.8291   LearningRate 0.1954   Epoch: 4   Global Step: 48390   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:35:44,539-Speed 5473.49 samples/sec   Loss 8.9170   LearningRate 0.1954   Epoch: 4   Global Step: 48400   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:35:52,406-Speed 5207.74 samples/sec   Loss 8.8559   LearningRate 0.1953   Epoch: 4   Global Step: 48410   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:35:59,947-Speed 5432.59 samples/sec   Loss 8.8609   LearningRate 0.1953   Epoch: 4   Global Step: 48420   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:36:07,457-Speed 5454.72 samples/sec   Loss 8.8786   LearningRate 0.1953   Epoch: 4   Global Step: 48430   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:36:14,998-Speed 5431.83 samples/sec   Loss 8.9135   LearningRate 0.1953   Epoch: 4   Global Step: 48440   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:36:22,523-Speed 5444.48 samples/sec   Loss 8.9038   LearningRate 0.1952   Epoch: 4   Global Step: 48450   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:36:30,114-Speed 5396.91 samples/sec   Loss 8.8671   LearningRate 0.1952   Epoch: 4   Global Step: 48460   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 05:36:37,555-Speed 5505.63 samples/sec   Loss 8.8539   LearningRate 0.1952   Epoch: 4   Global Step: 48470   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:36:45,024-Speed 5484.51 samples/sec   Loss 8.8442   LearningRate 0.1952   Epoch: 4   Global Step: 48480   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:36:52,492-Speed 5485.56 samples/sec   Loss 8.8270   LearningRate 0.1951   Epoch: 4   Global Step: 48490   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:36:59,986-Speed 5466.39 samples/sec   Loss 8.8244   LearningRate 0.1951   Epoch: 4   Global Step: 48500   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:37:07,674-Speed 5329.01 samples/sec   Loss 8.8991   LearningRate 0.1951   Epoch: 4   Global Step: 48510   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:37:15,274-Speed 5390.05 samples/sec   Loss 8.9630   LearningRate 0.1951   Epoch: 4   Global Step: 48520   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:37:22,905-Speed 5367.92 samples/sec   Loss 8.8246   LearningRate 0.1950   Epoch: 4   Global Step: 48530   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:37:30,433-Speed 5442.63 samples/sec   Loss 8.8610   LearningRate 0.1950   Epoch: 4   Global Step: 48540   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:37:37,942-Speed 5455.29 samples/sec   Loss 8.8279   LearningRate 0.1950   Epoch: 4   Global Step: 48550   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:37:45,607-Speed 5344.63 samples/sec   Loss 8.8663   LearningRate 0.1950   Epoch: 4   Global Step: 48560   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:37:53,251-Speed 5359.15 samples/sec   Loss 8.8732   LearningRate 0.1949   Epoch: 4   Global Step: 48570   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:38:00,912-Speed 5346.95 samples/sec   Loss 8.8312   LearningRate 0.1949   Epoch: 4   Global Step: 48580   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:38:09,025-Speed 5049.52 samples/sec   Loss 8.9319   LearningRate 0.1949   Epoch: 4   Global Step: 48590   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:38:16,579-Speed 5422.80 samples/sec   Loss 8.9260   LearningRate 0.1949   Epoch: 4   Global Step: 48600   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:38:24,108-Speed 5440.98 samples/sec   Loss 8.8625   LearningRate 0.1948   Epoch: 4   Global Step: 48610   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:38:31,678-Speed 5411.61 samples/sec   Loss 8.8987   LearningRate 0.1948   Epoch: 4   Global Step: 48620   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:38:39,169-Speed 5468.53 samples/sec   Loss 8.9397   LearningRate 0.1948   Epoch: 4   Global Step: 48630   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:38:46,690-Speed 5446.57 samples/sec   Loss 8.8521   LearningRate 0.1948   Epoch: 4   Global Step: 48640   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:38:54,229-Speed 5433.88 samples/sec   Loss 8.8504   LearningRate 0.1947   Epoch: 4   Global Step: 48650   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:39:01,678-Speed 5499.68 samples/sec   Loss 8.8730   LearningRate 0.1947   Epoch: 4   Global Step: 48660   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:39:09,161-Speed 5474.50 samples/sec   Loss 8.8791   LearningRate 0.1947   Epoch: 4   Global Step: 48670   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 05:39:16,668-Speed 5456.96 samples/sec   Loss 8.7828   LearningRate 0.1947   Epoch: 4   Global Step: 48680   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 05:39:24,259-Speed 5396.68 samples/sec   Loss 8.8606   LearningRate 0.1946   Epoch: 4   Global Step: 48690   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:39:31,839-Speed 5404.89 samples/sec   Loss 8.9162   LearningRate 0.1946   Epoch: 4   Global Step: 48700   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:39:39,454-Speed 5379.33 samples/sec   Loss 8.8479   LearningRate 0.1946   Epoch: 4   Global Step: 48710   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:39:47,031-Speed 5406.42 samples/sec   Loss 8.8637   LearningRate 0.1946   Epoch: 4   Global Step: 48720   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:39:54,664-Speed 5367.00 samples/sec   Loss 8.7631   LearningRate 0.1945   Epoch: 4   Global Step: 48730   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:40:02,272-Speed 5385.29 samples/sec   Loss 8.8450   LearningRate 0.1945   Epoch: 4   Global Step: 48740   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:40:09,943-Speed 5339.94 samples/sec   Loss 8.8685   LearningRate 0.1945   Epoch: 4   Global Step: 48750   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:40:17,517-Speed 5408.31 samples/sec   Loss 8.9201   LearningRate 0.1945   Epoch: 4   Global Step: 48760   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:40:25,048-Speed 5439.63 samples/sec   Loss 8.8928   LearningRate 0.1944   Epoch: 4   Global Step: 48770   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:40:32,593-Speed 5429.92 samples/sec   Loss 8.8915   LearningRate 0.1944   Epoch: 4   Global Step: 48780   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:40:40,100-Speed 5457.29 samples/sec   Loss 8.9232   LearningRate 0.1944   Epoch: 4   Global Step: 48790   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:40:47,557-Speed 5492.73 samples/sec   Loss 8.8146   LearningRate 0.1944   Epoch: 4   Global Step: 48800   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:40:55,042-Speed 5472.76 samples/sec   Loss 8.8668   LearningRate 0.1943   Epoch: 4   Global Step: 48810   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:41:02,578-Speed 5436.78 samples/sec   Loss 8.8275   LearningRate 0.1943   Epoch: 4   Global Step: 48820   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:41:10,039-Speed 5490.61 samples/sec   Loss 8.9367   LearningRate 0.1943   Epoch: 4   Global Step: 48830   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:41:17,523-Speed 5473.15 samples/sec   Loss 8.8870   LearningRate 0.1943   Epoch: 4   Global Step: 48840   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:41:25,144-Speed 5375.50 samples/sec   Loss 8.7917   LearningRate 0.1943   Epoch: 4   Global Step: 48850   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:41:32,672-Speed 5441.68 samples/sec   Loss 8.8524   LearningRate 0.1942   Epoch: 4   Global Step: 48860   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:41:40,258-Speed 5400.50 samples/sec   Loss 8.9470   LearningRate 0.1942   Epoch: 4   Global Step: 48870   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:41:47,788-Speed 5439.63 samples/sec   Loss 8.8301   LearningRate 0.1942   Epoch: 4   Global Step: 48880   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:41:55,290-Speed 5461.05 samples/sec   Loss 8.8204   LearningRate 0.1942   Epoch: 4   Global Step: 48890   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:42:02,793-Speed 5460.42 samples/sec   Loss 8.8507   LearningRate 0.1941   Epoch: 4   Global Step: 48900   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:42:10,303-Speed 5455.39 samples/sec   Loss 8.8523   LearningRate 0.1941   Epoch: 4   Global Step: 48910   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:42:17,726-Speed 5518.45 samples/sec   Loss 8.9080   LearningRate 0.1941   Epoch: 4   Global Step: 48920   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:42:25,286-Speed 5418.64 samples/sec   Loss 8.8148   LearningRate 0.1941   Epoch: 4   Global Step: 48930   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:42:32,872-Speed 5399.98 samples/sec   Loss 8.8422   LearningRate 0.1940   Epoch: 4   Global Step: 48940   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:42:40,367-Speed 5465.93 samples/sec   Loss 8.8665   LearningRate 0.1940   Epoch: 4   Global Step: 48950   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:42:47,863-Speed 5465.02 samples/sec   Loss 8.8105   LearningRate 0.1940   Epoch: 4   Global Step: 48960   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:42:55,420-Speed 5420.82 samples/sec   Loss 8.7766   LearningRate 0.1940   Epoch: 4   Global Step: 48970   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:43:02,971-Speed 5424.74 samples/sec   Loss 8.8420   LearningRate 0.1939   Epoch: 4   Global Step: 48980   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:43:10,436-Speed 5487.76 samples/sec   Loss 8.9069   LearningRate 0.1939   Epoch: 4   Global Step: 48990   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:43:17,974-Speed 5435.02 samples/sec   Loss 8.8545   LearningRate 0.1939   Epoch: 4   Global Step: 49000   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:43:25,540-Speed 5413.99 samples/sec   Loss 8.8742   LearningRate 0.1939   Epoch: 4   Global Step: 49010   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:43:33,066-Speed 5442.93 samples/sec   Loss 8.9356   LearningRate 0.1938   Epoch: 4   Global Step: 49020   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:43:44,215-Speed 3674.24 samples/sec   Loss 8.7994   LearningRate 0.1938   Epoch: 4   Global Step: 49030   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:43:51,766-Speed 5432.05 samples/sec   Loss 8.8206   LearningRate 0.1938   Epoch: 4   Global Step: 49040   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:43:59,292-Speed 5442.59 samples/sec   Loss 8.8505   LearningRate 0.1938   Epoch: 4   Global Step: 49050   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:44:06,722-Speed 5514.02 samples/sec   Loss 8.8292   LearningRate 0.1937   Epoch: 4   Global Step: 49060   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:44:14,170-Speed 5500.43 samples/sec   Loss 8.8382   LearningRate 0.1937   Epoch: 4   Global Step: 49070   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:44:21,628-Speed 5492.50 samples/sec   Loss 8.8495   LearningRate 0.1937   Epoch: 4   Global Step: 49080   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:44:29,131-Speed 5459.80 samples/sec   Loss 8.7434   LearningRate 0.1937   Epoch: 4   Global Step: 49090   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:44:36,633-Speed 5461.32 samples/sec   Loss 8.8121   LearningRate 0.1936   Epoch: 4   Global Step: 49100   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:44:44,148-Speed 5451.32 samples/sec   Loss 8.7646   LearningRate 0.1936   Epoch: 4   Global Step: 49110   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:44:51,710-Speed 5416.66 samples/sec   Loss 8.7819   LearningRate 0.1936   Epoch: 4   Global Step: 49120   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:44:59,252-Speed 5432.06 samples/sec   Loss 8.8342   LearningRate 0.1936   Epoch: 4   Global Step: 49130   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:45:06,819-Speed 5413.43 samples/sec   Loss 8.8146   LearningRate 0.1935   Epoch: 4   Global Step: 49140   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:45:14,351-Speed 5438.99 samples/sec   Loss 8.7487   LearningRate 0.1935   Epoch: 4   Global Step: 49150   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:45:21,880-Speed 5441.46 samples/sec   Loss 8.8986   LearningRate 0.1935   Epoch: 4   Global Step: 49160   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:45:29,389-Speed 5454.86 samples/sec   Loss 8.8265   LearningRate 0.1935   Epoch: 4   Global Step: 49170   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 05:45:36,831-Speed 5504.93 samples/sec   Loss 8.8584   LearningRate 0.1934   Epoch: 4   Global Step: 49180   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:45:44,304-Speed 5482.52 samples/sec   Loss 8.7457   LearningRate 0.1934   Epoch: 4   Global Step: 49190   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:45:51,953-Speed 5355.10 samples/sec   Loss 8.8037   LearningRate 0.1934   Epoch: 4   Global Step: 49200   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:45:59,450-Speed 5464.24 samples/sec   Loss 8.8136   LearningRate 0.1934   Epoch: 4   Global Step: 49210   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:46:07,049-Speed 5391.48 samples/sec   Loss 8.7730   LearningRate 0.1933   Epoch: 4   Global Step: 49220   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:46:14,666-Speed 5377.75 samples/sec   Loss 8.8015   LearningRate 0.1933   Epoch: 4   Global Step: 49230   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:46:22,222-Speed 5421.45 samples/sec   Loss 8.7771   LearningRate 0.1933   Epoch: 4   Global Step: 49240   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:46:29,936-Speed 5310.98 samples/sec   Loss 8.8015   LearningRate 0.1933   Epoch: 4   Global Step: 49250   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:46:37,456-Speed 5447.91 samples/sec   Loss 8.9008   LearningRate 0.1932   Epoch: 4   Global Step: 49260   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:46:45,072-Speed 5378.55 samples/sec   Loss 8.8486   LearningRate 0.1932   Epoch: 4   Global Step: 49270   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:46:52,736-Speed 5345.53 samples/sec   Loss 8.8139   LearningRate 0.1932   Epoch: 4   Global Step: 49280   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 05:47:00,256-Speed 5447.14 samples/sec   Loss 8.8719   LearningRate 0.1932   Epoch: 4   Global Step: 49290   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 05:47:07,896-Speed 5362.20 samples/sec   Loss 8.7366   LearningRate 0.1931   Epoch: 4   Global Step: 49300   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:47:15,398-Speed 5461.01 samples/sec   Loss 8.8251   LearningRate 0.1931   Epoch: 4   Global Step: 49310   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:47:22,932-Speed 5437.41 samples/sec   Loss 8.8211   LearningRate 0.1931   Epoch: 4   Global Step: 49320   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:47:30,408-Speed 5479.73 samples/sec   Loss 8.8550   LearningRate 0.1931   Epoch: 4   Global Step: 49330   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:47:37,886-Speed 5477.78 samples/sec   Loss 8.7970   LearningRate 0.1931   Epoch: 4   Global Step: 49340   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:47:45,458-Speed 5410.18 samples/sec   Loss 8.9414   LearningRate 0.1930   Epoch: 4   Global Step: 49350   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:47:53,008-Speed 5426.08 samples/sec   Loss 8.7937   LearningRate 0.1930   Epoch: 4   Global Step: 49360   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:48:00,608-Speed 5390.21 samples/sec   Loss 8.7576   LearningRate 0.1930   Epoch: 4   Global Step: 49370   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:48:08,143-Speed 5436.33 samples/sec   Loss 8.8083   LearningRate 0.1930   Epoch: 4   Global Step: 49380   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:48:15,660-Speed 5450.15 samples/sec   Loss 8.7965   LearningRate 0.1929   Epoch: 4   Global Step: 49390   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:48:23,189-Speed 5441.37 samples/sec   Loss 8.8791   LearningRate 0.1929   Epoch: 4   Global Step: 49400   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 05:48:30,776-Speed 5398.85 samples/sec   Loss 8.8572   LearningRate 0.1929   Epoch: 4   Global Step: 49410   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:48:38,250-Speed 5481.47 samples/sec   Loss 8.8660   LearningRate 0.1929   Epoch: 4   Global Step: 49420   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:48:45,778-Speed 5441.83 samples/sec   Loss 8.8712   LearningRate 0.1928   Epoch: 4   Global Step: 49430   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:48:53,272-Speed 5466.32 samples/sec   Loss 8.8682   LearningRate 0.1928   Epoch: 4   Global Step: 49440   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:49:00,784-Speed 5453.19 samples/sec   Loss 8.8309   LearningRate 0.1928   Epoch: 4   Global Step: 49450   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:49:08,307-Speed 5445.69 samples/sec   Loss 8.8744   LearningRate 0.1928   Epoch: 4   Global Step: 49460   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:49:15,768-Speed 5490.41 samples/sec   Loss 8.8392   LearningRate 0.1927   Epoch: 4   Global Step: 49470   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:49:23,323-Speed 5422.91 samples/sec   Loss 8.7890   LearningRate 0.1927   Epoch: 4   Global Step: 49480   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:49:30,858-Speed 5436.39 samples/sec   Loss 8.7407   LearningRate 0.1927   Epoch: 4   Global Step: 49490   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:49:38,409-Speed 5424.88 samples/sec   Loss 8.8112   LearningRate 0.1927   Epoch: 4   Global Step: 49500   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:49:45,975-Speed 5414.48 samples/sec   Loss 8.7811   LearningRate 0.1926   Epoch: 4   Global Step: 49510   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 05:49:53,466-Speed 5469.11 samples/sec   Loss 8.8016   LearningRate 0.1926   Epoch: 4   Global Step: 49520   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 05:50:00,951-Speed 5473.19 samples/sec   Loss 8.7982   LearningRate 0.1926   Epoch: 4   Global Step: 49530   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:50:08,440-Speed 5470.08 samples/sec   Loss 8.8404   LearningRate 0.1926   Epoch: 4   Global Step: 49540   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:50:15,851-Speed 5527.64 samples/sec   Loss 8.8049   LearningRate 0.1925   Epoch: 4   Global Step: 49550   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:50:23,344-Speed 5467.37 samples/sec   Loss 8.8109   LearningRate 0.1925   Epoch: 4   Global Step: 49560   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:50:30,866-Speed 5445.93 samples/sec   Loss 8.7420   LearningRate 0.1925   Epoch: 4   Global Step: 49570   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:50:38,376-Speed 5454.68 samples/sec   Loss 8.7355   LearningRate 0.1925   Epoch: 4   Global Step: 49580   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:50:45,869-Speed 5467.21 samples/sec   Loss 8.7861   LearningRate 0.1924   Epoch: 4   Global Step: 49590   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:50:53,364-Speed 5466.09 samples/sec   Loss 8.8631   LearningRate 0.1924   Epoch: 4   Global Step: 49600   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:51:00,889-Speed 5443.62 samples/sec   Loss 8.8041   LearningRate 0.1924   Epoch: 4   Global Step: 49610   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:51:08,447-Speed 5420.62 samples/sec   Loss 8.8158   LearningRate 0.1924   Epoch: 4   Global Step: 49620   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:51:16,383-Speed 5162.05 samples/sec   Loss 8.8361   LearningRate 0.1923   Epoch: 4   Global Step: 49630   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 05:51:24,115-Speed 5298.33 samples/sec   Loss 8.8009   LearningRate 0.1923   Epoch: 4   Global Step: 49640   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 05:51:31,635-Speed 5446.92 samples/sec   Loss 8.7941   LearningRate 0.1923   Epoch: 4   Global Step: 49650   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 05:51:39,110-Speed 5480.57 samples/sec   Loss 8.7455   LearningRate 0.1923   Epoch: 4   Global Step: 49660   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 05:51:46,634-Speed 5444.22 samples/sec   Loss 8.8058   LearningRate 0.1922   Epoch: 4   Global Step: 49670   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 05:51:54,215-Speed 5404.24 samples/sec   Loss 8.7427   LearningRate 0.1922   Epoch: 4   Global Step: 49680   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:52:01,735-Speed 5447.76 samples/sec   Loss 8.7664   LearningRate 0.1922   Epoch: 4   Global Step: 49690   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:52:09,272-Speed 5434.76 samples/sec   Loss 8.7485   LearningRate 0.1922   Epoch: 4   Global Step: 49700   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:52:16,831-Speed 5419.15 samples/sec   Loss 8.7485   LearningRate 0.1921   Epoch: 4   Global Step: 49710   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:52:24,351-Speed 5448.46 samples/sec   Loss 8.8098   LearningRate 0.1921   Epoch: 4   Global Step: 49720   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:52:31,794-Speed 5503.22 samples/sec   Loss 8.7571   LearningRate 0.1921   Epoch: 4   Global Step: 49730   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:52:39,311-Speed 5449.29 samples/sec   Loss 8.8158   LearningRate 0.1921   Epoch: 4   Global Step: 49740   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:52:46,841-Speed 5440.25 samples/sec   Loss 8.8008   LearningRate 0.1921   Epoch: 4   Global Step: 49750   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:52:54,425-Speed 5402.31 samples/sec   Loss 8.8965   LearningRate 0.1920   Epoch: 4   Global Step: 49760   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:53:01,943-Speed 5448.89 samples/sec   Loss 8.8386   LearningRate 0.1920   Epoch: 4   Global Step: 49770   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:53:09,529-Speed 5400.14 samples/sec   Loss 8.8246   LearningRate 0.1920   Epoch: 4   Global Step: 49780   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:53:17,122-Speed 5394.94 samples/sec   Loss 8.7891   LearningRate 0.1920   Epoch: 4   Global Step: 49790   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:53:24,657-Speed 5436.86 samples/sec   Loss 8.7256   LearningRate 0.1919   Epoch: 4   Global Step: 49800   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:53:32,167-Speed 5454.91 samples/sec   Loss 8.7625   LearningRate 0.1919   Epoch: 4   Global Step: 49810   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:53:39,680-Speed 5452.22 samples/sec   Loss 8.7749   LearningRate 0.1919   Epoch: 4   Global Step: 49820   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:53:47,126-Speed 5502.11 samples/sec   Loss 8.7610   LearningRate 0.1919   Epoch: 4   Global Step: 49830   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:53:54,673-Speed 5427.69 samples/sec   Loss 8.8460   LearningRate 0.1918   Epoch: 4   Global Step: 49840   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:54:02,194-Speed 5447.40 samples/sec   Loss 8.8041   LearningRate 0.1918   Epoch: 4   Global Step: 49850   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:54:09,717-Speed 5445.43 samples/sec   Loss 8.7335   LearningRate 0.1918   Epoch: 4   Global Step: 49860   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:54:17,189-Speed 5482.33 samples/sec   Loss 8.7881   LearningRate 0.1918   Epoch: 4   Global Step: 49870   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:54:24,769-Speed 5404.36 samples/sec   Loss 8.7637   LearningRate 0.1917   Epoch: 4   Global Step: 49880   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:54:32,369-Speed 5390.48 samples/sec   Loss 8.7158   LearningRate 0.1917   Epoch: 4   Global Step: 49890   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 05:54:39,917-Speed 5427.31 samples/sec   Loss 8.8255   LearningRate 0.1917   Epoch: 4   Global Step: 49900   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:54:47,509-Speed 5395.82 samples/sec   Loss 8.7053   LearningRate 0.1917   Epoch: 4   Global Step: 49910   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:54:55,060-Speed 5424.94 samples/sec   Loss 8.7490   LearningRate 0.1916   Epoch: 4   Global Step: 49920   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:55:02,566-Speed 5457.45 samples/sec   Loss 8.7831   LearningRate 0.1916   Epoch: 4   Global Step: 49930   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:55:10,047-Speed 5476.64 samples/sec   Loss 8.8007   LearningRate 0.1916   Epoch: 4   Global Step: 49940   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:55:17,573-Speed 5442.75 samples/sec   Loss 8.7293   LearningRate 0.1916   Epoch: 4   Global Step: 49950   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:55:25,077-Speed 5458.85 samples/sec   Loss 8.7395   LearningRate 0.1915   Epoch: 4   Global Step: 49960   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:55:32,676-Speed 5390.80 samples/sec   Loss 8.8211   LearningRate 0.1915   Epoch: 4   Global Step: 49970   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:55:40,221-Speed 5430.08 samples/sec   Loss 8.8497   LearningRate 0.1915   Epoch: 4   Global Step: 49980   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:55:47,749-Speed 5441.14 samples/sec   Loss 8.7937   LearningRate 0.1915   Epoch: 4   Global Step: 49990   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:55:55,329-Speed 5403.83 samples/sec   Loss 8.7585   LearningRate 0.1914   Epoch: 4   Global Step: 50000   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 05:56:39,506-[lfw][50000]XNorm: 23.231188
Training: 2022-01-08 05:56:39,507-[lfw][50000]Accuracy-Flip: 0.99700+-0.00245
Training: 2022-01-08 05:56:39,507-[lfw][50000]Accuracy-Highest: 0.99817
Training: 2022-01-08 05:57:31,014-[cfp_fp][50000]XNorm: 21.256862
Training: 2022-01-08 05:57:31,015-[cfp_fp][50000]Accuracy-Flip: 0.98386+-0.00691
Training: 2022-01-08 05:57:31,016-[cfp_fp][50000]Accuracy-Highest: 0.98600
Training: 2022-01-08 05:58:17,299-[agedb_30][50000]XNorm: 23.094199
Training: 2022-01-08 05:58:17,300-[agedb_30][50000]Accuracy-Flip: 0.97083+-0.00970
Training: 2022-01-08 05:58:17,301-[agedb_30][50000]Accuracy-Highest: 0.97250
Training: 2022-01-08 05:58:24,853-Speed 273.94 samples/sec   Loss 8.8490   LearningRate 0.1914   Epoch: 4   Global Step: 50010   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 05:58:32,324-Speed 5484.50 samples/sec   Loss 8.8449   LearningRate 0.1914   Epoch: 4   Global Step: 50020   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:58:39,901-Speed 5407.34 samples/sec   Loss 8.7510   LearningRate 0.1914   Epoch: 4   Global Step: 50030   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:58:47,418-Speed 5450.69 samples/sec   Loss 8.8425   LearningRate 0.1913   Epoch: 4   Global Step: 50040   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:58:54,828-Speed 5528.59 samples/sec   Loss 8.7903   LearningRate 0.1913   Epoch: 4   Global Step: 50050   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:59:02,402-Speed 5408.84 samples/sec   Loss 8.7460   LearningRate 0.1913   Epoch: 4   Global Step: 50060   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:59:09,965-Speed 5415.99 samples/sec   Loss 8.7330   LearningRate 0.1913   Epoch: 4   Global Step: 50070   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:59:17,575-Speed 5383.14 samples/sec   Loss 8.7792   LearningRate 0.1912   Epoch: 4   Global Step: 50080   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:59:25,200-Speed 5373.37 samples/sec   Loss 8.7140   LearningRate 0.1912   Epoch: 4   Global Step: 50090   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:59:32,750-Speed 5425.22 samples/sec   Loss 8.7879   LearningRate 0.1912   Epoch: 4   Global Step: 50100   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:59:40,417-Speed 5342.92 samples/sec   Loss 8.7856   LearningRate 0.1912   Epoch: 4   Global Step: 50110   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 05:59:47,934-Speed 5450.22 samples/sec   Loss 8.7950   LearningRate 0.1912   Epoch: 4   Global Step: 50120   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 05:59:55,524-Speed 5397.77 samples/sec   Loss 8.7721   LearningRate 0.1911   Epoch: 4   Global Step: 50130   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:00:03,182-Speed 5349.11 samples/sec   Loss 8.7201   LearningRate 0.1911   Epoch: 4   Global Step: 50140   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:00:10,740-Speed 5420.33 samples/sec   Loss 8.7950   LearningRate 0.1911   Epoch: 4   Global Step: 50150   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:00:18,368-Speed 5370.30 samples/sec   Loss 8.7641   LearningRate 0.1911   Epoch: 4   Global Step: 50160   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:00:25,773-Speed 5532.19 samples/sec   Loss 8.7260   LearningRate 0.1910   Epoch: 4   Global Step: 50170   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:00:33,335-Speed 5417.65 samples/sec   Loss 8.6841   LearningRate 0.1910   Epoch: 4   Global Step: 50180   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:00:40,947-Speed 5381.28 samples/sec   Loss 8.7562   LearningRate 0.1910   Epoch: 4   Global Step: 50190   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:00:48,457-Speed 5455.06 samples/sec   Loss 8.8039   LearningRate 0.1910   Epoch: 4   Global Step: 50200   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:00:55,926-Speed 5484.84 samples/sec   Loss 8.6479   LearningRate 0.1909   Epoch: 4   Global Step: 50210   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:01:03,433-Speed 5456.58 samples/sec   Loss 8.7570   LearningRate 0.1909   Epoch: 4   Global Step: 50220   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:01:10,962-Speed 5440.78 samples/sec   Loss 8.8095   LearningRate 0.1909   Epoch: 4   Global Step: 50230   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:01:18,462-Speed 5462.38 samples/sec   Loss 8.7606   LearningRate 0.1909   Epoch: 4   Global Step: 50240   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:01:25,929-Speed 5486.04 samples/sec   Loss 8.8096   LearningRate 0.1908   Epoch: 4   Global Step: 50250   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:01:33,388-Speed 5492.20 samples/sec   Loss 8.7786   LearningRate 0.1908   Epoch: 4   Global Step: 50260   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:01:40,826-Speed 5507.49 samples/sec   Loss 8.7845   LearningRate 0.1908   Epoch: 4   Global Step: 50270   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:01:48,398-Speed 5410.33 samples/sec   Loss 8.7485   LearningRate 0.1908   Epoch: 4   Global Step: 50280   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:01:55,929-Speed 5439.70 samples/sec   Loss 8.7962   LearningRate 0.1907   Epoch: 4   Global Step: 50290   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:02:03,444-Speed 5451.27 samples/sec   Loss 8.7901   LearningRate 0.1907   Epoch: 4   Global Step: 50300   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:02:11,038-Speed 5394.32 samples/sec   Loss 8.7668   LearningRate 0.1907   Epoch: 4   Global Step: 50310   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:02:18,491-Speed 5496.50 samples/sec   Loss 8.7550   LearningRate 0.1907   Epoch: 4   Global Step: 50320   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:02:26,037-Speed 5428.77 samples/sec   Loss 8.7211   LearningRate 0.1906   Epoch: 4   Global Step: 50330   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:02:33,609-Speed 5410.50 samples/sec   Loss 8.7237   LearningRate 0.1906   Epoch: 4   Global Step: 50340   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:02:41,153-Speed 5429.91 samples/sec   Loss 8.7948   LearningRate 0.1906   Epoch: 4   Global Step: 50350   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:02:48,633-Speed 5477.07 samples/sec   Loss 8.8668   LearningRate 0.1906   Epoch: 4   Global Step: 50360   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:02:56,128-Speed 5465.87 samples/sec   Loss 8.7629   LearningRate 0.1905   Epoch: 4   Global Step: 50370   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 06:03:03,675-Speed 5427.71 samples/sec   Loss 8.7529   LearningRate 0.1905   Epoch: 4   Global Step: 50380   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 06:03:11,133-Speed 5493.23 samples/sec   Loss 8.7470   LearningRate 0.1905   Epoch: 4   Global Step: 50390   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 06:03:18,598-Speed 5487.46 samples/sec   Loss 8.7069   LearningRate 0.1905   Epoch: 4   Global Step: 50400   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:03:26,149-Speed 5425.19 samples/sec   Loss 8.7718   LearningRate 0.1904   Epoch: 4   Global Step: 50410   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:03:33,762-Speed 5380.88 samples/sec   Loss 8.7689   LearningRate 0.1904   Epoch: 4   Global Step: 50420   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:03:41,366-Speed 5387.83 samples/sec   Loss 8.7728   LearningRate 0.1904   Epoch: 4   Global Step: 50430   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:03:48,962-Speed 5392.17 samples/sec   Loss 8.7363   LearningRate 0.1904   Epoch: 4   Global Step: 50440   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:03:56,521-Speed 5419.63 samples/sec   Loss 8.7157   LearningRate 0.1903   Epoch: 4   Global Step: 50450   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:04:04,062-Speed 5432.19 samples/sec   Loss 8.7899   LearningRate 0.1903   Epoch: 4   Global Step: 50460   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:04:11,567-Speed 5458.63 samples/sec   Loss 8.7049   LearningRate 0.1903   Epoch: 4   Global Step: 50470   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:04:19,129-Speed 5417.00 samples/sec   Loss 8.7506   LearningRate 0.1903   Epoch: 4   Global Step: 50480   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:04:26,734-Speed 5386.59 samples/sec   Loss 8.7740   LearningRate 0.1903   Epoch: 4   Global Step: 50490   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:04:34,166-Speed 5512.31 samples/sec   Loss 8.7771   LearningRate 0.1902   Epoch: 4   Global Step: 50500   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:04:41,707-Speed 5432.14 samples/sec   Loss 8.7041   LearningRate 0.1902   Epoch: 4   Global Step: 50510   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:04:49,175-Speed 5485.43 samples/sec   Loss 8.7347   LearningRate 0.1902   Epoch: 4   Global Step: 50520   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:04:56,697-Speed 5446.13 samples/sec   Loss 8.7579   LearningRate 0.1902   Epoch: 4   Global Step: 50530   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:05:04,235-Speed 5434.83 samples/sec   Loss 8.7440   LearningRate 0.1901   Epoch: 4   Global Step: 50540   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:05:11,685-Speed 5498.59 samples/sec   Loss 8.7267   LearningRate 0.1901   Epoch: 4   Global Step: 50550   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:05:19,130-Speed 5502.45 samples/sec   Loss 8.7230   LearningRate 0.1901   Epoch: 4   Global Step: 50560   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:05:26,699-Speed 5412.87 samples/sec   Loss 8.8120   LearningRate 0.1901   Epoch: 4   Global Step: 50570   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:05:34,423-Speed 5303.57 samples/sec   Loss 8.7136   LearningRate 0.1900   Epoch: 4   Global Step: 50580   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:05:42,016-Speed 5394.90 samples/sec   Loss 8.7611   LearningRate 0.1900   Epoch: 4   Global Step: 50590   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:05:49,617-Speed 5389.35 samples/sec   Loss 8.7124   LearningRate 0.1900   Epoch: 4   Global Step: 50600   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 06:05:57,159-Speed 5432.21 samples/sec   Loss 8.8066   LearningRate 0.1900   Epoch: 4   Global Step: 50610   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:06:04,645-Speed 5472.00 samples/sec   Loss 8.6961   LearningRate 0.1899   Epoch: 4   Global Step: 50620   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:06:12,190-Speed 5429.58 samples/sec   Loss 8.6222   LearningRate 0.1899   Epoch: 4   Global Step: 50630   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:06:19,731-Speed 5432.47 samples/sec   Loss 8.7404   LearningRate 0.1899   Epoch: 4   Global Step: 50640   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:06:27,262-Speed 5439.58 samples/sec   Loss 8.7268   LearningRate 0.1899   Epoch: 4   Global Step: 50650   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:06:34,739-Speed 5479.21 samples/sec   Loss 8.8124   LearningRate 0.1898   Epoch: 4   Global Step: 50660   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:06:42,217-Speed 5478.07 samples/sec   Loss 8.7750   LearningRate 0.1898   Epoch: 4   Global Step: 50670   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:06:49,779-Speed 5417.23 samples/sec   Loss 8.7143   LearningRate 0.1898   Epoch: 4   Global Step: 50680   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:06:57,394-Speed 5379.36 samples/sec   Loss 8.7084   LearningRate 0.1898   Epoch: 4   Global Step: 50690   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:07:04,901-Speed 5457.20 samples/sec   Loss 8.7180   LearningRate 0.1897   Epoch: 4   Global Step: 50700   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:07:12,438-Speed 5434.91 samples/sec   Loss 8.7079   LearningRate 0.1897   Epoch: 4   Global Step: 50710   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 06:07:19,949-Speed 5454.56 samples/sec   Loss 8.7653   LearningRate 0.1897   Epoch: 4   Global Step: 50720   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 06:07:27,402-Speed 5496.44 samples/sec   Loss 8.7181   LearningRate 0.1897   Epoch: 4   Global Step: 50730   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 06:07:34,890-Speed 5470.74 samples/sec   Loss 8.7420   LearningRate 0.1896   Epoch: 4   Global Step: 50740   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:07:42,375-Speed 5473.53 samples/sec   Loss 8.7355   LearningRate 0.1896   Epoch: 4   Global Step: 50750   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:07:49,910-Speed 5436.16 samples/sec   Loss 8.6780   LearningRate 0.1896   Epoch: 4   Global Step: 50760   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:07:57,421-Speed 5454.27 samples/sec   Loss 8.7278   LearningRate 0.1896   Epoch: 4   Global Step: 50770   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:08:04,982-Speed 5418.13 samples/sec   Loss 8.7381   LearningRate 0.1896   Epoch: 4   Global Step: 50780   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:08:12,480-Speed 5463.25 samples/sec   Loss 8.8019   LearningRate 0.1895   Epoch: 4   Global Step: 50790   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:08:20,045-Speed 5415.39 samples/sec   Loss 8.7429   LearningRate 0.1895   Epoch: 4   Global Step: 50800   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:08:27,542-Speed 5463.84 samples/sec   Loss 8.6934   LearningRate 0.1895   Epoch: 4   Global Step: 50810   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:08:35,085-Speed 5431.24 samples/sec   Loss 8.7548   LearningRate 0.1895   Epoch: 4   Global Step: 50820   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:08:42,569-Speed 5473.95 samples/sec   Loss 8.6658   LearningRate 0.1894   Epoch: 4   Global Step: 50830   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:08:50,058-Speed 5470.29 samples/sec   Loss 8.7566   LearningRate 0.1894   Epoch: 4   Global Step: 50840   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:08:57,576-Speed 5448.23 samples/sec   Loss 8.7324   LearningRate 0.1894   Epoch: 4   Global Step: 50850   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:09:05,067-Speed 5469.50 samples/sec   Loss 8.6903   LearningRate 0.1894   Epoch: 4   Global Step: 50860   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:09:12,556-Speed 5469.60 samples/sec   Loss 8.7419   LearningRate 0.1893   Epoch: 4   Global Step: 50870   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:09:20,058-Speed 5460.99 samples/sec   Loss 8.6822   LearningRate 0.1893   Epoch: 4   Global Step: 50880   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:09:27,547-Speed 5469.76 samples/sec   Loss 8.7522   LearningRate 0.1893   Epoch: 4   Global Step: 50890   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:09:35,026-Speed 5477.55 samples/sec   Loss 8.7153   LearningRate 0.1893   Epoch: 4   Global Step: 50900   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:09:42,585-Speed 5419.48 samples/sec   Loss 8.7466   LearningRate 0.1892   Epoch: 4   Global Step: 50910   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:09:50,220-Speed 5365.74 samples/sec   Loss 8.7373   LearningRate 0.1892   Epoch: 4   Global Step: 50920   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:09:57,731-Speed 5453.91 samples/sec   Loss 8.6797   LearningRate 0.1892   Epoch: 4   Global Step: 50930   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:10:05,260-Speed 5440.96 samples/sec   Loss 8.7594   LearningRate 0.1892   Epoch: 4   Global Step: 50940   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:10:12,980-Speed 5306.47 samples/sec   Loss 8.7681   LearningRate 0.1891   Epoch: 4   Global Step: 50950   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:10:20,441-Speed 5490.93 samples/sec   Loss 8.6885   LearningRate 0.1891   Epoch: 4   Global Step: 50960   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:10:28,000-Speed 5418.80 samples/sec   Loss 8.7030   LearningRate 0.1891   Epoch: 4   Global Step: 50970   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:10:35,422-Speed 5520.24 samples/sec   Loss 8.6954   LearningRate 0.1891   Epoch: 4   Global Step: 50980   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:10:42,892-Speed 5483.75 samples/sec   Loss 8.7142   LearningRate 0.1890   Epoch: 4   Global Step: 50990   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:10:50,404-Speed 5453.02 samples/sec   Loss 8.7039   LearningRate 0.1890   Epoch: 4   Global Step: 51000   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:10:57,884-Speed 5476.75 samples/sec   Loss 8.7182   LearningRate 0.1890   Epoch: 4   Global Step: 51010   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:11:05,384-Speed 5462.10 samples/sec   Loss 8.7184   LearningRate 0.1890   Epoch: 4   Global Step: 51020   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:11:12,880-Speed 5465.03 samples/sec   Loss 8.7118   LearningRate 0.1889   Epoch: 4   Global Step: 51030   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:11:20,467-Speed 5399.37 samples/sec   Loss 8.8364   LearningRate 0.1889   Epoch: 4   Global Step: 51040   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:11:28,040-Speed 5409.38 samples/sec   Loss 8.6613   LearningRate 0.1889   Epoch: 4   Global Step: 51050   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:11:35,496-Speed 5494.61 samples/sec   Loss 8.6615   LearningRate 0.1889   Epoch: 4   Global Step: 51060   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:11:42,986-Speed 5468.69 samples/sec   Loss 8.7769   LearningRate 0.1888   Epoch: 4   Global Step: 51070   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:11:50,544-Speed 5420.79 samples/sec   Loss 8.7344   LearningRate 0.1888   Epoch: 4   Global Step: 51080   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 06:11:58,070-Speed 5442.64 samples/sec   Loss 8.6173   LearningRate 0.1888   Epoch: 4   Global Step: 51090   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 06:12:05,555-Speed 5473.25 samples/sec   Loss 8.6905   LearningRate 0.1888   Epoch: 4   Global Step: 51100   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 06:12:13,104-Speed 5426.58 samples/sec   Loss 8.7427   LearningRate 0.1888   Epoch: 4   Global Step: 51110   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 06:12:20,667-Speed 5416.19 samples/sec   Loss 8.7063   LearningRate 0.1887   Epoch: 4   Global Step: 51120   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 06:12:28,228-Speed 5418.05 samples/sec   Loss 8.7300   LearningRate 0.1887   Epoch: 4   Global Step: 51130   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 06:12:35,782-Speed 5423.33 samples/sec   Loss 8.6813   LearningRate 0.1887   Epoch: 4   Global Step: 51140   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 06:12:43,281-Speed 5462.67 samples/sec   Loss 8.7641   LearningRate 0.1887   Epoch: 4   Global Step: 51150   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 06:12:50,862-Speed 5403.42 samples/sec   Loss 8.7423   LearningRate 0.1886   Epoch: 4   Global Step: 51160   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 06:12:58,367-Speed 5458.57 samples/sec   Loss 8.6851   LearningRate 0.1886   Epoch: 4   Global Step: 51170   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 06:13:05,935-Speed 5413.02 samples/sec   Loss 8.6860   LearningRate 0.1886   Epoch: 4   Global Step: 51180   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:13:13,447-Speed 5453.18 samples/sec   Loss 8.7385   LearningRate 0.1886   Epoch: 4   Global Step: 51190   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:13:21,004-Speed 5420.92 samples/sec   Loss 8.7175   LearningRate 0.1885   Epoch: 4   Global Step: 51200   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:13:28,530-Speed 5443.30 samples/sec   Loss 8.6703   LearningRate 0.1885   Epoch: 4   Global Step: 51210   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:13:36,066-Speed 5436.21 samples/sec   Loss 8.7125   LearningRate 0.1885   Epoch: 4   Global Step: 51220   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:13:43,560-Speed 5466.06 samples/sec   Loss 8.6925   LearningRate 0.1885   Epoch: 4   Global Step: 51230   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:13:51,055-Speed 5465.11 samples/sec   Loss 8.6500   LearningRate 0.1884   Epoch: 4   Global Step: 51240   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:13:58,529-Speed 5481.10 samples/sec   Loss 8.7238   LearningRate 0.1884   Epoch: 4   Global Step: 51250   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:14:06,111-Speed 5403.81 samples/sec   Loss 8.7025   LearningRate 0.1884   Epoch: 4   Global Step: 51260   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:14:13,638-Speed 5442.01 samples/sec   Loss 8.6912   LearningRate 0.1884   Epoch: 4   Global Step: 51270   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:14:21,126-Speed 5470.93 samples/sec   Loss 8.6354   LearningRate 0.1883   Epoch: 4   Global Step: 51280   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:14:28,695-Speed 5412.07 samples/sec   Loss 8.7119   LearningRate 0.1883   Epoch: 4   Global Step: 51290   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:14:36,174-Speed 5477.70 samples/sec   Loss 8.6544   LearningRate 0.1883   Epoch: 4   Global Step: 51300   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:14:43,652-Speed 5477.93 samples/sec   Loss 8.7056   LearningRate 0.1883   Epoch: 4   Global Step: 51310   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:14:51,102-Speed 5498.28 samples/sec   Loss 8.7550   LearningRate 0.1882   Epoch: 4   Global Step: 51320   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:14:58,697-Speed 5394.16 samples/sec   Loss 8.7411   LearningRate 0.1882   Epoch: 4   Global Step: 51330   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:15:06,194-Speed 5464.34 samples/sec   Loss 8.6707   LearningRate 0.1882   Epoch: 4   Global Step: 51340   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:15:13,711-Speed 5449.82 samples/sec   Loss 8.6696   LearningRate 0.1882   Epoch: 4   Global Step: 51350   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:15:21,247-Speed 5435.12 samples/sec   Loss 8.7262   LearningRate 0.1881   Epoch: 4   Global Step: 51360   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:15:28,796-Speed 5427.03 samples/sec   Loss 8.7770   LearningRate 0.1881   Epoch: 4   Global Step: 51370   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:15:36,216-Speed 5521.23 samples/sec   Loss 8.7030   LearningRate 0.1881   Epoch: 4   Global Step: 51380   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:15:43,715-Speed 5462.64 samples/sec   Loss 8.6659   LearningRate 0.1881   Epoch: 4   Global Step: 51390   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:15:51,238-Speed 5445.52 samples/sec   Loss 8.6905   LearningRate 0.1881   Epoch: 4   Global Step: 51400   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:15:58,747-Speed 5455.08 samples/sec   Loss 8.6723   LearningRate 0.1880   Epoch: 4   Global Step: 51410   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:16:06,377-Speed 5369.13 samples/sec   Loss 8.7027   LearningRate 0.1880   Epoch: 4   Global Step: 51420   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:16:13,941-Speed 5415.65 samples/sec   Loss 8.6628   LearningRate 0.1880   Epoch: 4   Global Step: 51430   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:16:21,468-Speed 5442.99 samples/sec   Loss 8.6551   LearningRate 0.1880   Epoch: 4   Global Step: 51440   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:16:29,052-Speed 5401.45 samples/sec   Loss 8.7079   LearningRate 0.1879   Epoch: 4   Global Step: 51450   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:16:36,575-Speed 5445.21 samples/sec   Loss 8.6628   LearningRate 0.1879   Epoch: 4   Global Step: 51460   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:16:44,040-Speed 5487.49 samples/sec   Loss 8.6983   LearningRate 0.1879   Epoch: 4   Global Step: 51470   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:16:51,524-Speed 5473.74 samples/sec   Loss 8.6829   LearningRate 0.1879   Epoch: 4   Global Step: 51480   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:16:59,071-Speed 5428.65 samples/sec   Loss 8.6903   LearningRate 0.1878   Epoch: 4   Global Step: 51490   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:17:06,554-Speed 5473.47 samples/sec   Loss 8.7497   LearningRate 0.1878   Epoch: 4   Global Step: 51500   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:17:14,073-Speed 5448.52 samples/sec   Loss 8.6988   LearningRate 0.1878   Epoch: 4   Global Step: 51510   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:17:21,636-Speed 5417.36 samples/sec   Loss 8.6142   LearningRate 0.1878   Epoch: 4   Global Step: 51520   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:17:29,246-Speed 5382.67 samples/sec   Loss 8.6896   LearningRate 0.1877   Epoch: 4   Global Step: 51530   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:17:36,807-Speed 5418.12 samples/sec   Loss 8.7418   LearningRate 0.1877   Epoch: 4   Global Step: 51540   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:17:44,318-Speed 5454.50 samples/sec   Loss 8.7139   LearningRate 0.1877   Epoch: 4   Global Step: 51550   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:17:51,922-Speed 5387.30 samples/sec   Loss 8.6786   LearningRate 0.1877   Epoch: 4   Global Step: 51560   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:18:00,243-Speed 4922.97 samples/sec   Loss 8.7090   LearningRate 0.1876   Epoch: 4   Global Step: 51570   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:18:07,806-Speed 5416.59 samples/sec   Loss 8.6983   LearningRate 0.1876   Epoch: 4   Global Step: 51580   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 06:18:15,317-Speed 5454.24 samples/sec   Loss 8.7065   LearningRate 0.1876   Epoch: 4   Global Step: 51590   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:18:22,960-Speed 5360.21 samples/sec   Loss 8.6609   LearningRate 0.1876   Epoch: 4   Global Step: 51600   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:18:30,507-Speed 5427.87 samples/sec   Loss 8.7079   LearningRate 0.1875   Epoch: 4   Global Step: 51610   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:18:38,110-Speed 5387.70 samples/sec   Loss 8.7099   LearningRate 0.1875   Epoch: 4   Global Step: 51620   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:18:45,688-Speed 5406.49 samples/sec   Loss 8.6923   LearningRate 0.1875   Epoch: 4   Global Step: 51630   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:18:53,307-Speed 5376.58 samples/sec   Loss 8.6734   LearningRate 0.1875   Epoch: 4   Global Step: 51640   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:19:00,918-Speed 5382.46 samples/sec   Loss 8.6757   LearningRate 0.1874   Epoch: 4   Global Step: 51650   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:19:08,472-Speed 5423.28 samples/sec   Loss 8.6721   LearningRate 0.1874   Epoch: 4   Global Step: 51660   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:19:16,266-Speed 5256.04 samples/sec   Loss 8.6073   LearningRate 0.1874   Epoch: 4   Global Step: 51670   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:19:23,859-Speed 5395.02 samples/sec   Loss 8.7486   LearningRate 0.1874   Epoch: 4   Global Step: 51680   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:19:31,320-Speed 5489.75 samples/sec   Loss 8.7284   LearningRate 0.1874   Epoch: 4   Global Step: 51690   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 06:19:38,955-Speed 5365.55 samples/sec   Loss 8.6745   LearningRate 0.1873   Epoch: 4   Global Step: 51700   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 06:19:46,597-Speed 5360.88 samples/sec   Loss 8.7385   LearningRate 0.1873   Epoch: 4   Global Step: 51710   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 06:19:54,085-Speed 5470.75 samples/sec   Loss 8.6854   LearningRate 0.1873   Epoch: 4   Global Step: 51720   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 06:20:01,672-Speed 5399.48 samples/sec   Loss 8.7246   LearningRate 0.1873   Epoch: 4   Global Step: 51730   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 06:20:09,184-Speed 5453.38 samples/sec   Loss 8.6434   LearningRate 0.1872   Epoch: 4   Global Step: 51740   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 06:20:16,776-Speed 5396.04 samples/sec   Loss 8.6804   LearningRate 0.1872   Epoch: 4   Global Step: 51750   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 06:20:24,271-Speed 5465.59 samples/sec   Loss 8.7661   LearningRate 0.1872   Epoch: 4   Global Step: 51760   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 06:20:31,896-Speed 5372.82 samples/sec   Loss 8.6631   LearningRate 0.1872   Epoch: 4   Global Step: 51770   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 06:20:39,473-Speed 5406.22 samples/sec   Loss 8.6095   LearningRate 0.1871   Epoch: 4   Global Step: 51780   Fp16 Grad Scale: 32768   Required: 36 hours
Training: 2022-01-08 06:20:46,937-Speed 5488.65 samples/sec   Loss 8.7057   LearningRate 0.1871   Epoch: 4   Global Step: 51790   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:20:54,456-Speed 5448.42 samples/sec   Loss 8.6817   LearningRate 0.1871   Epoch: 4   Global Step: 51800   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:21:02,000-Speed 5430.25 samples/sec   Loss 8.6297   LearningRate 0.1871   Epoch: 4   Global Step: 51810   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:21:09,420-Speed 5520.53 samples/sec   Loss 8.6845   LearningRate 0.1870   Epoch: 4   Global Step: 51820   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:21:17,006-Speed 5400.14 samples/sec   Loss 8.6698   LearningRate 0.1870   Epoch: 4   Global Step: 51830   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:21:24,641-Speed 5365.62 samples/sec   Loss 8.7294   LearningRate 0.1870   Epoch: 4   Global Step: 51840   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:21:46,897-Speed 1840.52 samples/sec   Loss 8.6155   LearningRate 0.1870   Epoch: 5   Global Step: 51850   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:21:54,394-Speed 5464.43 samples/sec   Loss 8.6472   LearningRate 0.1869   Epoch: 5   Global Step: 51860   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:22:01,833-Speed 5506.93 samples/sec   Loss 8.6924   LearningRate 0.1869   Epoch: 5   Global Step: 51870   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:22:09,429-Speed 5393.20 samples/sec   Loss 8.6392   LearningRate 0.1869   Epoch: 5   Global Step: 51880   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:22:17,116-Speed 5328.51 samples/sec   Loss 8.5739   LearningRate 0.1869   Epoch: 5   Global Step: 51890   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:22:24,518-Speed 5534.42 samples/sec   Loss 8.5394   LearningRate 0.1868   Epoch: 5   Global Step: 51900   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:22:31,931-Speed 5526.46 samples/sec   Loss 8.6105   LearningRate 0.1868   Epoch: 5   Global Step: 51910   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:22:39,375-Speed 5503.97 samples/sec   Loss 8.6703   LearningRate 0.1868   Epoch: 5   Global Step: 51920   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:22:46,947-Speed 5409.72 samples/sec   Loss 8.6522   LearningRate 0.1868   Epoch: 5   Global Step: 51930   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:22:54,439-Speed 5468.09 samples/sec   Loss 8.6649   LearningRate 0.1868   Epoch: 5   Global Step: 51940   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:23:01,841-Speed 5533.70 samples/sec   Loss 8.7116   LearningRate 0.1867   Epoch: 5   Global Step: 51950   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:23:09,350-Speed 5456.03 samples/sec   Loss 8.6589   LearningRate 0.1867   Epoch: 5   Global Step: 51960   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:23:16,828-Speed 5478.41 samples/sec   Loss 8.6580   LearningRate 0.1867   Epoch: 5   Global Step: 51970   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:23:24,294-Speed 5486.26 samples/sec   Loss 8.7402   LearningRate 0.1867   Epoch: 5   Global Step: 51980   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:23:31,762-Speed 5485.43 samples/sec   Loss 8.6899   LearningRate 0.1866   Epoch: 5   Global Step: 51990   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:23:39,206-Speed 5503.21 samples/sec   Loss 8.6634   LearningRate 0.1866   Epoch: 5   Global Step: 52000   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:24:24,102-[lfw][52000]XNorm: 23.197142
Training: 2022-01-08 06:24:24,102-[lfw][52000]Accuracy-Flip: 0.99750+-0.00261
Training: 2022-01-08 06:24:24,103-[lfw][52000]Accuracy-Highest: 0.99817
Training: 2022-01-08 06:25:16,178-[cfp_fp][52000]XNorm: 20.822825
Training: 2022-01-08 06:25:16,179-[cfp_fp][52000]Accuracy-Flip: 0.98429+-0.00383
Training: 2022-01-08 06:25:16,179-[cfp_fp][52000]Accuracy-Highest: 0.98600
Training: 2022-01-08 06:26:02,076-[agedb_30][52000]XNorm: 22.821595
Training: 2022-01-08 06:26:02,077-[agedb_30][52000]Accuracy-Flip: 0.97233+-0.00790
Training: 2022-01-08 06:26:02,078-[agedb_30][52000]Accuracy-Highest: 0.97250
Training: 2022-01-08 06:26:09,646-Speed 272.27 samples/sec   Loss 8.6605   LearningRate 0.1866   Epoch: 5   Global Step: 52010   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:26:17,096-Speed 5501.96 samples/sec   Loss 8.6896   LearningRate 0.1866   Epoch: 5   Global Step: 52020   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:26:24,651-Speed 5422.08 samples/sec   Loss 8.6153   LearningRate 0.1865   Epoch: 5   Global Step: 52030   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:26:32,116-Speed 5488.60 samples/sec   Loss 8.6518   LearningRate 0.1865   Epoch: 5   Global Step: 52040   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:26:39,607-Speed 5468.60 samples/sec   Loss 8.6280   LearningRate 0.1865   Epoch: 5   Global Step: 52050   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:26:47,028-Speed 5520.78 samples/sec   Loss 8.6180   LearningRate 0.1865   Epoch: 5   Global Step: 52060   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:26:54,503-Speed 5481.01 samples/sec   Loss 8.7284   LearningRate 0.1864   Epoch: 5   Global Step: 52070   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:27:01,915-Speed 5527.16 samples/sec   Loss 8.6294   LearningRate 0.1864   Epoch: 5   Global Step: 52080   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:27:09,203-Speed 5621.70 samples/sec   Loss 8.5798   LearningRate 0.1864   Epoch: 5   Global Step: 52090   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:27:16,661-Speed 5493.60 samples/sec   Loss 8.6836   LearningRate 0.1864   Epoch: 5   Global Step: 52100   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:27:24,175-Speed 5452.46 samples/sec   Loss 8.5745   LearningRate 0.1863   Epoch: 5   Global Step: 52110   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:27:31,746-Speed 5411.82 samples/sec   Loss 8.6242   LearningRate 0.1863   Epoch: 5   Global Step: 52120   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:27:39,315-Speed 5412.85 samples/sec   Loss 8.5843   LearningRate 0.1863   Epoch: 5   Global Step: 52130   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:27:46,967-Speed 5353.90 samples/sec   Loss 8.6036   LearningRate 0.1863   Epoch: 5   Global Step: 52140   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:27:54,696-Speed 5300.77 samples/sec   Loss 8.6901   LearningRate 0.1862   Epoch: 5   Global Step: 52150   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:28:02,357-Speed 5347.49 samples/sec   Loss 8.6681   LearningRate 0.1862   Epoch: 5   Global Step: 52160   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:28:10,039-Speed 5332.82 samples/sec   Loss 8.6425   LearningRate 0.1862   Epoch: 5   Global Step: 52170   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:28:17,837-Speed 5253.61 samples/sec   Loss 8.6976   LearningRate 0.1862   Epoch: 5   Global Step: 52180   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:28:25,552-Speed 5310.21 samples/sec   Loss 8.6170   LearningRate 0.1862   Epoch: 5   Global Step: 52190   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:28:33,162-Speed 5383.74 samples/sec   Loss 8.6419   LearningRate 0.1861   Epoch: 5   Global Step: 52200   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:28:40,889-Speed 5301.19 samples/sec   Loss 8.6050   LearningRate 0.1861   Epoch: 5   Global Step: 52210   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:28:48,560-Speed 5340.41 samples/sec   Loss 8.6741   LearningRate 0.1861   Epoch: 5   Global Step: 52220   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:28:56,365-Speed 5249.30 samples/sec   Loss 8.5536   LearningRate 0.1861   Epoch: 5   Global Step: 52230   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:29:04,190-Speed 5234.70 samples/sec   Loss 8.6886   LearningRate 0.1860   Epoch: 5   Global Step: 52240   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:29:11,898-Speed 5314.84 samples/sec   Loss 8.6562   LearningRate 0.1860   Epoch: 5   Global Step: 52250   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:29:19,584-Speed 5330.10 samples/sec   Loss 8.6425   LearningRate 0.1860   Epoch: 5   Global Step: 52260   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:29:27,172-Speed 5399.14 samples/sec   Loss 8.6102   LearningRate 0.1860   Epoch: 5   Global Step: 52270   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:29:34,767-Speed 5394.27 samples/sec   Loss 8.6208   LearningRate 0.1859   Epoch: 5   Global Step: 52280   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:29:42,467-Speed 5319.61 samples/sec   Loss 8.6683   LearningRate 0.1859   Epoch: 5   Global Step: 52290   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:29:50,032-Speed 5415.19 samples/sec   Loss 8.6153   LearningRate 0.1859   Epoch: 5   Global Step: 52300   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:29:57,510-Speed 5477.98 samples/sec   Loss 8.6279   LearningRate 0.1859   Epoch: 5   Global Step: 52310   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:30:05,151-Speed 5361.58 samples/sec   Loss 8.6337   LearningRate 0.1858   Epoch: 5   Global Step: 52320   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:30:12,619-Speed 5485.45 samples/sec   Loss 8.6390   LearningRate 0.1858   Epoch: 5   Global Step: 52330   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:30:20,278-Speed 5348.23 samples/sec   Loss 8.6059   LearningRate 0.1858   Epoch: 5   Global Step: 52340   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 06:30:27,820-Speed 5431.72 samples/sec   Loss 8.6833   LearningRate 0.1858   Epoch: 5   Global Step: 52350   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 06:30:35,514-Speed 5324.74 samples/sec   Loss 8.6408   LearningRate 0.1857   Epoch: 5   Global Step: 52360   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:30:43,042-Speed 5441.75 samples/sec   Loss 8.6670   LearningRate 0.1857   Epoch: 5   Global Step: 52370   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:30:50,505-Speed 5488.21 samples/sec   Loss 8.6737   LearningRate 0.1857   Epoch: 5   Global Step: 52380   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:30:57,984-Speed 5477.62 samples/sec   Loss 8.6600   LearningRate 0.1857   Epoch: 5   Global Step: 52390   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:31:05,419-Speed 5509.78 samples/sec   Loss 8.6564   LearningRate 0.1856   Epoch: 5   Global Step: 52400   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:31:12,873-Speed 5496.03 samples/sec   Loss 8.6893   LearningRate 0.1856   Epoch: 5   Global Step: 52410   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:31:20,648-Speed 5268.47 samples/sec   Loss 8.6348   LearningRate 0.1856   Epoch: 5   Global Step: 52420   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:31:28,285-Speed 5364.79 samples/sec   Loss 8.6251   LearningRate 0.1856   Epoch: 5   Global Step: 52430   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:31:35,828-Speed 5431.07 samples/sec   Loss 8.6420   LearningRate 0.1856   Epoch: 5   Global Step: 52440   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:31:43,311-Speed 5473.77 samples/sec   Loss 8.5771   LearningRate 0.1855   Epoch: 5   Global Step: 52450   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:31:51,016-Speed 5316.66 samples/sec   Loss 8.6341   LearningRate 0.1855   Epoch: 5   Global Step: 52460   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:31:58,589-Speed 5410.26 samples/sec   Loss 8.6086   LearningRate 0.1855   Epoch: 5   Global Step: 52470   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:32:06,032-Speed 5503.90 samples/sec   Loss 8.6696   LearningRate 0.1855   Epoch: 5   Global Step: 52480   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:32:13,645-Speed 5380.78 samples/sec   Loss 8.5892   LearningRate 0.1854   Epoch: 5   Global Step: 52490   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:32:21,105-Speed 5491.42 samples/sec   Loss 8.6602   LearningRate 0.1854   Epoch: 5   Global Step: 52500   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:32:28,620-Speed 5451.21 samples/sec   Loss 8.6121   LearningRate 0.1854   Epoch: 5   Global Step: 52510   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:32:36,355-Speed 5296.90 samples/sec   Loss 8.5790   LearningRate 0.1854   Epoch: 5   Global Step: 52520   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:32:43,867-Speed 5452.71 samples/sec   Loss 8.6412   LearningRate 0.1853   Epoch: 5   Global Step: 52530   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:32:51,358-Speed 5469.00 samples/sec   Loss 8.7030   LearningRate 0.1853   Epoch: 5   Global Step: 52540   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:32:58,891-Speed 5438.32 samples/sec   Loss 8.7113   LearningRate 0.1853   Epoch: 5   Global Step: 52550   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 06:33:06,397-Speed 5457.61 samples/sec   Loss 8.6388   LearningRate 0.1853   Epoch: 5   Global Step: 52560   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 06:33:13,848-Speed 5497.28 samples/sec   Loss 8.5376   LearningRate 0.1852   Epoch: 5   Global Step: 52570   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:33:21,394-Speed 5429.18 samples/sec   Loss 8.6302   LearningRate 0.1852   Epoch: 5   Global Step: 52580   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:33:28,956-Speed 5418.32 samples/sec   Loss 8.6023   LearningRate 0.1852   Epoch: 5   Global Step: 52590   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:33:36,421-Speed 5489.17 samples/sec   Loss 8.5618   LearningRate 0.1852   Epoch: 5   Global Step: 52600   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:33:43,932-Speed 5453.60 samples/sec   Loss 8.6396   LearningRate 0.1851   Epoch: 5   Global Step: 52610   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:33:51,328-Speed 5538.82 samples/sec   Loss 8.5524   LearningRate 0.1851   Epoch: 5   Global Step: 52620   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:33:58,731-Speed 5534.13 samples/sec   Loss 8.5730   LearningRate 0.1851   Epoch: 5   Global Step: 52630   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:34:06,240-Speed 5455.26 samples/sec   Loss 8.5697   LearningRate 0.1851   Epoch: 5   Global Step: 52640   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:34:13,606-Speed 5561.59 samples/sec   Loss 8.6948   LearningRate 0.1851   Epoch: 5   Global Step: 52650   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:34:21,019-Speed 5525.95 samples/sec   Loss 8.5826   LearningRate 0.1850   Epoch: 5   Global Step: 52660   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:34:28,762-Speed 5291.08 samples/sec   Loss 8.6813   LearningRate 0.1850   Epoch: 5   Global Step: 52670   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:34:36,507-Speed 5289.27 samples/sec   Loss 8.7350   LearningRate 0.1850   Epoch: 5   Global Step: 52680   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:34:44,095-Speed 5398.63 samples/sec   Loss 8.6211   LearningRate 0.1850   Epoch: 5   Global Step: 52690   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:34:51,551-Speed 5494.30 samples/sec   Loss 8.7005   LearningRate 0.1849   Epoch: 5   Global Step: 52700   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:34:58,944-Speed 5540.99 samples/sec   Loss 8.5595   LearningRate 0.1849   Epoch: 5   Global Step: 52710   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:35:06,420-Speed 5479.86 samples/sec   Loss 8.6333   LearningRate 0.1849   Epoch: 5   Global Step: 52720   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:35:14,000-Speed 5404.08 samples/sec   Loss 8.6557   LearningRate 0.1849   Epoch: 5   Global Step: 52730   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:35:21,422-Speed 5519.62 samples/sec   Loss 8.6576   LearningRate 0.1848   Epoch: 5   Global Step: 52740   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:35:28,942-Speed 5447.62 samples/sec   Loss 8.6263   LearningRate 0.1848   Epoch: 5   Global Step: 52750   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:35:36,350-Speed 5529.88 samples/sec   Loss 8.6463   LearningRate 0.1848   Epoch: 5   Global Step: 52760   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:35:43,915-Speed 5415.13 samples/sec   Loss 8.6394   LearningRate 0.1848   Epoch: 5   Global Step: 52770   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:35:51,433-Speed 5449.25 samples/sec   Loss 8.5686   LearningRate 0.1847   Epoch: 5   Global Step: 52780   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:35:58,866-Speed 5511.35 samples/sec   Loss 8.5881   LearningRate 0.1847   Epoch: 5   Global Step: 52790   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:36:06,382-Speed 5450.17 samples/sec   Loss 8.6274   LearningRate 0.1847   Epoch: 5   Global Step: 52800   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:36:13,902-Speed 5448.17 samples/sec   Loss 8.6130   LearningRate 0.1847   Epoch: 5   Global Step: 52810   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:36:21,313-Speed 5526.70 samples/sec   Loss 8.5488   LearningRate 0.1846   Epoch: 5   Global Step: 52820   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:36:28,758-Speed 5502.99 samples/sec   Loss 8.5855   LearningRate 0.1846   Epoch: 5   Global Step: 52830   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:36:36,269-Speed 5454.46 samples/sec   Loss 8.6360   LearningRate 0.1846   Epoch: 5   Global Step: 52840   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:36:43,859-Speed 5397.65 samples/sec   Loss 8.6313   LearningRate 0.1846   Epoch: 5   Global Step: 52850   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:36:51,490-Speed 5367.87 samples/sec   Loss 8.6579   LearningRate 0.1845   Epoch: 5   Global Step: 52860   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:36:59,147-Speed 5350.44 samples/sec   Loss 8.6751   LearningRate 0.1845   Epoch: 5   Global Step: 52870   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:37:06,677-Speed 5440.47 samples/sec   Loss 8.6157   LearningRate 0.1845   Epoch: 5   Global Step: 52880   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:37:14,197-Speed 5447.46 samples/sec   Loss 8.5926   LearningRate 0.1845   Epoch: 5   Global Step: 52890   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:37:21,706-Speed 5455.61 samples/sec   Loss 8.6267   LearningRate 0.1845   Epoch: 5   Global Step: 52900   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:37:29,125-Speed 5521.33 samples/sec   Loss 8.5817   LearningRate 0.1844   Epoch: 5   Global Step: 52910   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:37:36,618-Speed 5467.88 samples/sec   Loss 8.6621   LearningRate 0.1844   Epoch: 5   Global Step: 52920   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:37:44,180-Speed 5417.15 samples/sec   Loss 8.5894   LearningRate 0.1844   Epoch: 5   Global Step: 52930   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:37:51,670-Speed 5469.27 samples/sec   Loss 8.6025   LearningRate 0.1844   Epoch: 5   Global Step: 52940   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:37:59,137-Speed 5485.94 samples/sec   Loss 8.5895   LearningRate 0.1843   Epoch: 5   Global Step: 52950   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:38:06,695-Speed 5420.41 samples/sec   Loss 8.5946   LearningRate 0.1843   Epoch: 5   Global Step: 52960   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:38:14,195-Speed 5462.32 samples/sec   Loss 8.5577   LearningRate 0.1843   Epoch: 5   Global Step: 52970   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:38:21,734-Speed 5433.73 samples/sec   Loss 8.6094   LearningRate 0.1843   Epoch: 5   Global Step: 52980   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:38:29,370-Speed 5365.07 samples/sec   Loss 8.6482   LearningRate 0.1842   Epoch: 5   Global Step: 52990   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:38:36,872-Speed 5460.81 samples/sec   Loss 8.6023   LearningRate 0.1842   Epoch: 5   Global Step: 53000   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:38:44,361-Speed 5470.00 samples/sec   Loss 8.6099   LearningRate 0.1842   Epoch: 5   Global Step: 53010   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:38:51,875-Speed 5451.56 samples/sec   Loss 8.6973   LearningRate 0.1842   Epoch: 5   Global Step: 53020   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:38:59,332-Speed 5493.41 samples/sec   Loss 8.6226   LearningRate 0.1841   Epoch: 5   Global Step: 53030   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 06:39:06,887-Speed 5422.25 samples/sec   Loss 8.5931   LearningRate 0.1841   Epoch: 5   Global Step: 53040   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:39:14,388-Speed 5461.97 samples/sec   Loss 8.5375   LearningRate 0.1841   Epoch: 5   Global Step: 53050   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:39:21,949-Speed 5417.93 samples/sec   Loss 8.5943   LearningRate 0.1841   Epoch: 5   Global Step: 53060   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:39:29,554-Speed 5386.50 samples/sec   Loss 8.5790   LearningRate 0.1840   Epoch: 5   Global Step: 53070   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:39:37,251-Speed 5322.29 samples/sec   Loss 8.5986   LearningRate 0.1840   Epoch: 5   Global Step: 53080   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:39:44,819-Speed 5413.56 samples/sec   Loss 8.6220   LearningRate 0.1840   Epoch: 5   Global Step: 53090   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:39:52,261-Speed 5504.40 samples/sec   Loss 8.6721   LearningRate 0.1840   Epoch: 5   Global Step: 53100   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:39:59,984-Speed 5304.58 samples/sec   Loss 8.5576   LearningRate 0.1840   Epoch: 5   Global Step: 53110   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:40:07,694-Speed 5312.87 samples/sec   Loss 8.6460   LearningRate 0.1839   Epoch: 5   Global Step: 53120   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:40:15,198-Speed 5459.91 samples/sec   Loss 8.5934   LearningRate 0.1839   Epoch: 5   Global Step: 53130   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:40:22,665-Speed 5485.42 samples/sec   Loss 8.6087   LearningRate 0.1839   Epoch: 5   Global Step: 53140   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:40:30,255-Speed 5397.50 samples/sec   Loss 8.5413   LearningRate 0.1839   Epoch: 5   Global Step: 53150   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:40:37,740-Speed 5472.93 samples/sec   Loss 8.6040   LearningRate 0.1838   Epoch: 5   Global Step: 53160   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:40:45,240-Speed 5462.41 samples/sec   Loss 8.5765   LearningRate 0.1838   Epoch: 5   Global Step: 53170   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:40:52,819-Speed 5405.30 samples/sec   Loss 8.6205   LearningRate 0.1838   Epoch: 5   Global Step: 53180   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:41:00,364-Speed 5429.49 samples/sec   Loss 8.5322   LearningRate 0.1838   Epoch: 5   Global Step: 53190   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:41:07,978-Speed 5380.15 samples/sec   Loss 8.6434   LearningRate 0.1837   Epoch: 5   Global Step: 53200   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:41:15,586-Speed 5384.98 samples/sec   Loss 8.6248   LearningRate 0.1837   Epoch: 5   Global Step: 53210   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:41:23,088-Speed 5460.78 samples/sec   Loss 8.5457   LearningRate 0.1837   Epoch: 5   Global Step: 53220   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:41:30,605-Speed 5449.18 samples/sec   Loss 8.5230   LearningRate 0.1837   Epoch: 5   Global Step: 53230   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:41:38,078-Speed 5482.34 samples/sec   Loss 8.5390   LearningRate 0.1836   Epoch: 5   Global Step: 53240   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:41:45,490-Speed 5527.23 samples/sec   Loss 8.6180   LearningRate 0.1836   Epoch: 5   Global Step: 53250   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:41:52,940-Speed 5498.37 samples/sec   Loss 8.5851   LearningRate 0.1836   Epoch: 5   Global Step: 53260   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:42:00,437-Speed 5464.60 samples/sec   Loss 8.6393   LearningRate 0.1836   Epoch: 5   Global Step: 53270   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:42:07,952-Speed 5450.30 samples/sec   Loss 8.6006   LearningRate 0.1835   Epoch: 5   Global Step: 53280   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:42:15,548-Speed 5393.63 samples/sec   Loss 8.6337   LearningRate 0.1835   Epoch: 5   Global Step: 53290   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:42:23,185-Speed 5364.06 samples/sec   Loss 8.6356   LearningRate 0.1835   Epoch: 5   Global Step: 53300   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:42:30,714-Speed 5440.99 samples/sec   Loss 8.6090   LearningRate 0.1835   Epoch: 5   Global Step: 53310   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:42:38,256-Speed 5431.31 samples/sec   Loss 8.5692   LearningRate 0.1835   Epoch: 5   Global Step: 53320   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:42:45,788-Speed 5439.11 samples/sec   Loss 8.5795   LearningRate 0.1834   Epoch: 5   Global Step: 53330   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:42:53,357-Speed 5412.57 samples/sec   Loss 8.5605   LearningRate 0.1834   Epoch: 5   Global Step: 53340   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:43:00,786-Speed 5513.50 samples/sec   Loss 8.5700   LearningRate 0.1834   Epoch: 5   Global Step: 53350   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:43:08,339-Speed 5423.94 samples/sec   Loss 8.5855   LearningRate 0.1834   Epoch: 5   Global Step: 53360   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:43:15,871-Speed 5438.36 samples/sec   Loss 8.6891   LearningRate 0.1833   Epoch: 5   Global Step: 53370   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:43:23,422-Speed 5425.90 samples/sec   Loss 8.5454   LearningRate 0.1833   Epoch: 5   Global Step: 53380   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:43:30,891-Speed 5484.44 samples/sec   Loss 8.5616   LearningRate 0.1833   Epoch: 5   Global Step: 53390   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 06:43:38,361-Speed 5484.29 samples/sec   Loss 8.5726   LearningRate 0.1833   Epoch: 5   Global Step: 53400   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:43:45,883-Speed 5445.97 samples/sec   Loss 8.5860   LearningRate 0.1832   Epoch: 5   Global Step: 53410   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:43:53,363-Speed 5476.76 samples/sec   Loss 8.5691   LearningRate 0.1832   Epoch: 5   Global Step: 53420   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:44:00,750-Speed 5545.53 samples/sec   Loss 8.5822   LearningRate 0.1832   Epoch: 5   Global Step: 53430   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:44:08,251-Speed 5460.94 samples/sec   Loss 8.5398   LearningRate 0.1832   Epoch: 5   Global Step: 53440   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:44:15,686-Speed 5510.29 samples/sec   Loss 8.5547   LearningRate 0.1831   Epoch: 5   Global Step: 53450   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:44:23,145-Speed 5492.57 samples/sec   Loss 8.6284   LearningRate 0.1831   Epoch: 5   Global Step: 53460   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:44:30,635-Speed 5468.92 samples/sec   Loss 8.5280   LearningRate 0.1831   Epoch: 5   Global Step: 53470   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:44:38,312-Speed 5336.14 samples/sec   Loss 8.6121   LearningRate 0.1831   Epoch: 5   Global Step: 53480   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:44:45,772-Speed 5491.13 samples/sec   Loss 8.5498   LearningRate 0.1830   Epoch: 5   Global Step: 53490   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:44:53,300-Speed 5441.83 samples/sec   Loss 8.5918   LearningRate 0.1830   Epoch: 5   Global Step: 53500   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:45:00,720-Speed 5520.90 samples/sec   Loss 8.5827   LearningRate 0.1830   Epoch: 5   Global Step: 53510   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:45:08,192-Speed 5482.58 samples/sec   Loss 8.6374   LearningRate 0.1830   Epoch: 5   Global Step: 53520   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:45:15,690-Speed 5463.83 samples/sec   Loss 8.5665   LearningRate 0.1830   Epoch: 5   Global Step: 53530   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:45:23,176-Speed 5472.26 samples/sec   Loss 8.5832   LearningRate 0.1829   Epoch: 5   Global Step: 53540   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:45:30,656-Speed 5476.77 samples/sec   Loss 8.5388   LearningRate 0.1829   Epoch: 5   Global Step: 53550   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:45:38,070-Speed 5525.50 samples/sec   Loss 8.5050   LearningRate 0.1829   Epoch: 5   Global Step: 53560   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:45:45,538-Speed 5485.18 samples/sec   Loss 8.5779   LearningRate 0.1829   Epoch: 5   Global Step: 53570   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:45:52,955-Speed 5523.86 samples/sec   Loss 8.5267   LearningRate 0.1828   Epoch: 5   Global Step: 53580   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:46:00,461-Speed 5457.36 samples/sec   Loss 8.5703   LearningRate 0.1828   Epoch: 5   Global Step: 53590   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:46:07,847-Speed 5546.06 samples/sec   Loss 8.5110   LearningRate 0.1828   Epoch: 5   Global Step: 53600   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:46:15,493-Speed 5357.83 samples/sec   Loss 8.5987   LearningRate 0.1828   Epoch: 5   Global Step: 53610   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:46:22,894-Speed 5536.04 samples/sec   Loss 8.6403   LearningRate 0.1827   Epoch: 5   Global Step: 53620   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:46:30,383-Speed 5470.25 samples/sec   Loss 8.5151   LearningRate 0.1827   Epoch: 5   Global Step: 53630   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:46:37,843-Speed 5490.97 samples/sec   Loss 8.5180   LearningRate 0.1827   Epoch: 5   Global Step: 53640   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:46:45,335-Speed 5467.78 samples/sec   Loss 8.5956   LearningRate 0.1827   Epoch: 5   Global Step: 53650   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:46:52,854-Speed 5448.66 samples/sec   Loss 8.5470   LearningRate 0.1826   Epoch: 5   Global Step: 53660   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:47:00,395-Speed 5433.10 samples/sec   Loss 8.5105   LearningRate 0.1826   Epoch: 5   Global Step: 53670   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:47:07,919-Speed 5444.03 samples/sec   Loss 8.5426   LearningRate 0.1826   Epoch: 5   Global Step: 53680   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:47:15,382-Speed 5489.31 samples/sec   Loss 8.5941   LearningRate 0.1826   Epoch: 5   Global Step: 53690   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:47:22,968-Speed 5400.61 samples/sec   Loss 8.5560   LearningRate 0.1825   Epoch: 5   Global Step: 53700   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:47:30,529-Speed 5418.15 samples/sec   Loss 8.5658   LearningRate 0.1825   Epoch: 5   Global Step: 53710   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:47:38,006-Speed 5478.58 samples/sec   Loss 8.5796   LearningRate 0.1825   Epoch: 5   Global Step: 53720   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:47:45,558-Speed 5424.10 samples/sec   Loss 8.5325   LearningRate 0.1825   Epoch: 5   Global Step: 53730   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:47:52,993-Speed 5510.20 samples/sec   Loss 8.5903   LearningRate 0.1825   Epoch: 5   Global Step: 53740   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:48:00,452-Speed 5492.54 samples/sec   Loss 8.5981   LearningRate 0.1824   Epoch: 5   Global Step: 53750   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:48:08,005-Speed 5423.38 samples/sec   Loss 8.5512   LearningRate 0.1824   Epoch: 5   Global Step: 53760   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:48:15,531-Speed 5443.53 samples/sec   Loss 8.5343   LearningRate 0.1824   Epoch: 5   Global Step: 53770   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:48:22,978-Speed 5500.24 samples/sec   Loss 8.5303   LearningRate 0.1824   Epoch: 5   Global Step: 53780   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:48:30,452-Speed 5481.90 samples/sec   Loss 8.5496   LearningRate 0.1823   Epoch: 5   Global Step: 53790   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:48:37,861-Speed 5528.93 samples/sec   Loss 8.5137   LearningRate 0.1823   Epoch: 5   Global Step: 53800   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:48:45,469-Speed 5384.25 samples/sec   Loss 8.5177   LearningRate 0.1823   Epoch: 5   Global Step: 53810   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:48:52,998-Speed 5441.08 samples/sec   Loss 8.6238   LearningRate 0.1823   Epoch: 5   Global Step: 53820   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:49:00,520-Speed 5446.38 samples/sec   Loss 8.5692   LearningRate 0.1822   Epoch: 5   Global Step: 53830   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:49:08,099-Speed 5405.60 samples/sec   Loss 8.4993   LearningRate 0.1822   Epoch: 5   Global Step: 53840   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:49:15,683-Speed 5401.54 samples/sec   Loss 8.5277   LearningRate 0.1822   Epoch: 5   Global Step: 53850   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:49:23,131-Speed 5499.93 samples/sec   Loss 8.5801   LearningRate 0.1822   Epoch: 5   Global Step: 53860   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:49:30,587-Speed 5494.54 samples/sec   Loss 8.5424   LearningRate 0.1821   Epoch: 5   Global Step: 53870   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:49:38,107-Speed 5447.84 samples/sec   Loss 8.5350   LearningRate 0.1821   Epoch: 5   Global Step: 53880   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:49:45,528-Speed 5519.64 samples/sec   Loss 8.5575   LearningRate 0.1821   Epoch: 5   Global Step: 53890   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:49:53,030-Speed 5460.56 samples/sec   Loss 8.5146   LearningRate 0.1821   Epoch: 5   Global Step: 53900   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:50:00,525-Speed 5466.46 samples/sec   Loss 8.5087   LearningRate 0.1820   Epoch: 5   Global Step: 53910   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:50:08,070-Speed 5429.20 samples/sec   Loss 8.5831   LearningRate 0.1820   Epoch: 5   Global Step: 53920   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:50:15,687-Speed 5378.48 samples/sec   Loss 8.5909   LearningRate 0.1820   Epoch: 5   Global Step: 53930   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:50:23,318-Speed 5367.61 samples/sec   Loss 8.5239   LearningRate 0.1820   Epoch: 5   Global Step: 53940   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:50:30,853-Speed 5437.46 samples/sec   Loss 8.4530   LearningRate 0.1820   Epoch: 5   Global Step: 53950   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:50:38,310-Speed 5493.84 samples/sec   Loss 8.6534   LearningRate 0.1819   Epoch: 5   Global Step: 53960   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:50:45,777-Speed 5485.55 samples/sec   Loss 8.5073   LearningRate 0.1819   Epoch: 5   Global Step: 53970   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:50:53,242-Speed 5487.69 samples/sec   Loss 8.5767   LearningRate 0.1819   Epoch: 5   Global Step: 53980   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:51:00,682-Speed 5506.99 samples/sec   Loss 8.5711   LearningRate 0.1819   Epoch: 5   Global Step: 53990   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:51:08,166-Speed 5473.59 samples/sec   Loss 8.5232   LearningRate 0.1818   Epoch: 5   Global Step: 54000   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:52:03,476-[lfw][54000]XNorm: 21.966289
Training: 2022-01-08 06:52:03,477-[lfw][54000]Accuracy-Flip: 0.99700+-0.00277
Training: 2022-01-08 06:52:03,478-[lfw][54000]Accuracy-Highest: 0.99817
Training: 2022-01-08 06:53:03,484-[cfp_fp][54000]XNorm: 19.873822
Training: 2022-01-08 06:53:03,485-[cfp_fp][54000]Accuracy-Flip: 0.98371+-0.00617
Training: 2022-01-08 06:53:03,486-[cfp_fp][54000]Accuracy-Highest: 0.98600
Training: 2022-01-08 06:53:48,872-[agedb_30][54000]XNorm: 21.688754
Training: 2022-01-08 06:53:48,874-[agedb_30][54000]Accuracy-Flip: 0.97333+-0.00837
Training: 2022-01-08 06:53:48,874-[agedb_30][54000]Accuracy-Highest: 0.97333
Training: 2022-01-08 06:53:56,130-Speed 243.86 samples/sec   Loss 8.4834   LearningRate 0.1818   Epoch: 5   Global Step: 54010   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:54:03,449-Speed 5598.61 samples/sec   Loss 8.5494   LearningRate 0.1818   Epoch: 5   Global Step: 54020   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:54:10,803-Speed 5570.09 samples/sec   Loss 8.5273   LearningRate 0.1818   Epoch: 5   Global Step: 54030   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:54:18,355-Speed 5425.73 samples/sec   Loss 8.4617   LearningRate 0.1817   Epoch: 5   Global Step: 54040   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:54:25,905-Speed 5426.22 samples/sec   Loss 8.5223   LearningRate 0.1817   Epoch: 5   Global Step: 54050   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:54:33,471-Speed 5415.13 samples/sec   Loss 8.5562   LearningRate 0.1817   Epoch: 5   Global Step: 54060   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:54:40,994-Speed 5445.64 samples/sec   Loss 8.5713   LearningRate 0.1817   Epoch: 5   Global Step: 54070   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:54:48,551-Speed 5420.75 samples/sec   Loss 8.6158   LearningRate 0.1816   Epoch: 5   Global Step: 54080   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:54:56,087-Speed 5436.74 samples/sec   Loss 8.6020   LearningRate 0.1816   Epoch: 5   Global Step: 54090   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:55:03,664-Speed 5406.38 samples/sec   Loss 8.5671   LearningRate 0.1816   Epoch: 5   Global Step: 54100   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 06:55:11,151-Speed 5471.60 samples/sec   Loss 8.5608   LearningRate 0.1816   Epoch: 5   Global Step: 54110   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:55:18,680-Speed 5441.27 samples/sec   Loss 8.5260   LearningRate 0.1816   Epoch: 5   Global Step: 54120   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:55:26,276-Speed 5399.97 samples/sec   Loss 8.5416   LearningRate 0.1815   Epoch: 5   Global Step: 54130   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:55:33,900-Speed 5373.58 samples/sec   Loss 8.5485   LearningRate 0.1815   Epoch: 5   Global Step: 54140   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:55:41,462-Speed 5417.23 samples/sec   Loss 8.5016   LearningRate 0.1815   Epoch: 5   Global Step: 54150   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:55:48,990-Speed 5441.28 samples/sec   Loss 8.5914   LearningRate 0.1815   Epoch: 5   Global Step: 54160   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 06:55:56,464-Speed 5481.66 samples/sec   Loss 8.5923   LearningRate 0.1814   Epoch: 5   Global Step: 54170   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:56:04,001-Speed 5435.83 samples/sec   Loss 8.5912   LearningRate 0.1814   Epoch: 5   Global Step: 54180   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:56:11,573-Speed 5410.15 samples/sec   Loss 8.4886   LearningRate 0.1814   Epoch: 5   Global Step: 54190   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:56:19,118-Speed 5428.90 samples/sec   Loss 8.5611   LearningRate 0.1814   Epoch: 5   Global Step: 54200   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:56:26,671-Speed 5424.29 samples/sec   Loss 8.5279   LearningRate 0.1813   Epoch: 5   Global Step: 54210   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:56:34,233-Speed 5416.57 samples/sec   Loss 8.5104   LearningRate 0.1813   Epoch: 5   Global Step: 54220   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:56:41,777-Speed 5430.94 samples/sec   Loss 8.5319   LearningRate 0.1813   Epoch: 5   Global Step: 54230   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:56:49,374-Speed 5391.65 samples/sec   Loss 8.5919   LearningRate 0.1813   Epoch: 5   Global Step: 54240   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:56:56,947-Speed 5409.44 samples/sec   Loss 8.5072   LearningRate 0.1812   Epoch: 5   Global Step: 54250   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:57:04,575-Speed 5370.39 samples/sec   Loss 8.5782   LearningRate 0.1812   Epoch: 5   Global Step: 54260   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-01-08 06:57:12,192-Speed 5378.60 samples/sec   Loss 8.5449   LearningRate 0.1812   Epoch: 5   Global Step: 54270   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-01-08 06:57:19,779-Speed 5399.02 samples/sec   Loss 8.5002   LearningRate 0.1812   Epoch: 5   Global Step: 54280   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-01-08 06:57:27,334-Speed 5422.35 samples/sec   Loss 8.5511   LearningRate 0.1811   Epoch: 5   Global Step: 54290   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-01-08 06:57:34,930-Speed 5393.59 samples/sec   Loss 8.5037   LearningRate 0.1811   Epoch: 5   Global Step: 54300   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-01-08 06:57:42,415-Speed 5472.84 samples/sec   Loss 8.4574   LearningRate 0.1811   Epoch: 5   Global Step: 54310   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-01-08 06:57:50,022-Speed 5385.24 samples/sec   Loss 8.5253   LearningRate 0.1811   Epoch: 5   Global Step: 54320   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-01-08 06:57:57,548-Speed 5442.68 samples/sec   Loss 8.5405   LearningRate 0.1811   Epoch: 5   Global Step: 54330   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-01-08 06:58:05,038-Speed 5470.65 samples/sec   Loss 8.5741   LearningRate 0.1810   Epoch: 5   Global Step: 54340   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-01-08 06:58:12,576-Speed 5434.60 samples/sec   Loss 8.5764   LearningRate 0.1810   Epoch: 5   Global Step: 54350   Fp16 Grad Scale: 32768   Required: 35 hours
Training: 2022-01-08 06:58:20,108-Speed 5439.23 samples/sec   Loss 8.5936   LearningRate 0.1810   Epoch: 5   Global Step: 54360   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:58:27,635-Speed 5442.36 samples/sec   Loss 8.5240   LearningRate 0.1810   Epoch: 5   Global Step: 54370   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:58:35,168-Speed 5438.09 samples/sec   Loss 8.5420   LearningRate 0.1809   Epoch: 5   Global Step: 54380   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:58:42,774-Speed 5385.79 samples/sec   Loss 8.5282   LearningRate 0.1809   Epoch: 5   Global Step: 54390   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:58:50,339-Speed 5415.24 samples/sec   Loss 8.5470   LearningRate 0.1809   Epoch: 5   Global Step: 54400   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:58:58,028-Speed 5328.34 samples/sec   Loss 8.5533   LearningRate 0.1809   Epoch: 5   Global Step: 54410   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:59:05,511-Speed 5473.95 samples/sec   Loss 8.4759   LearningRate 0.1808   Epoch: 5   Global Step: 54420   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:59:13,051-Speed 5433.10 samples/sec   Loss 8.5345   LearningRate 0.1808   Epoch: 5   Global Step: 54430   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:59:20,616-Speed 5414.97 samples/sec   Loss 8.4941   LearningRate 0.1808   Epoch: 5   Global Step: 54440   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:59:28,229-Speed 5381.40 samples/sec   Loss 8.5458   LearningRate 0.1808   Epoch: 5   Global Step: 54450   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:59:35,746-Speed 5449.55 samples/sec   Loss 8.4521   LearningRate 0.1807   Epoch: 5   Global Step: 54460   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:59:43,450-Speed 5317.26 samples/sec   Loss 8.5476   LearningRate 0.1807   Epoch: 5   Global Step: 54470   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:59:50,986-Speed 5436.37 samples/sec   Loss 8.5428   LearningRate 0.1807   Epoch: 5   Global Step: 54480   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 06:59:58,608-Speed 5374.71 samples/sec   Loss 8.4817   LearningRate 0.1807   Epoch: 5   Global Step: 54490   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:00:06,256-Speed 5355.88 samples/sec   Loss 8.4747   LearningRate 0.1807   Epoch: 5   Global Step: 54500   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:00:13,794-Speed 5434.26 samples/sec   Loss 8.4386   LearningRate 0.1806   Epoch: 5   Global Step: 54510   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:00:21,363-Speed 5412.74 samples/sec   Loss 8.5638   LearningRate 0.1806   Epoch: 5   Global Step: 54520   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:00:29,001-Speed 5363.45 samples/sec   Loss 8.5368   LearningRate 0.1806   Epoch: 5   Global Step: 54530   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:00:36,548-Speed 5427.85 samples/sec   Loss 8.5353   LearningRate 0.1806   Epoch: 5   Global Step: 54540   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:00:44,091-Speed 5430.63 samples/sec   Loss 8.5153   LearningRate 0.1805   Epoch: 5   Global Step: 54550   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:00:51,701-Speed 5383.02 samples/sec   Loss 8.5433   LearningRate 0.1805   Epoch: 5   Global Step: 54560   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:00:59,217-Speed 5450.73 samples/sec   Loss 8.4791   LearningRate 0.1805   Epoch: 5   Global Step: 54570   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:01:06,924-Speed 5315.17 samples/sec   Loss 8.5352   LearningRate 0.1805   Epoch: 5   Global Step: 54580   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:01:14,484-Speed 5418.89 samples/sec   Loss 8.5400   LearningRate 0.1804   Epoch: 5   Global Step: 54590   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:01:22,121-Speed 5363.81 samples/sec   Loss 8.5178   LearningRate 0.1804   Epoch: 5   Global Step: 54600   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:01:29,612-Speed 5469.21 samples/sec   Loss 8.5157   LearningRate 0.1804   Epoch: 5   Global Step: 54610   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:01:37,104-Speed 5467.72 samples/sec   Loss 8.5521   LearningRate 0.1804   Epoch: 5   Global Step: 54620   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:01:44,589-Speed 5472.77 samples/sec   Loss 8.5231   LearningRate 0.1803   Epoch: 5   Global Step: 54630   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:01:52,124-Speed 5436.69 samples/sec   Loss 8.4390   LearningRate 0.1803   Epoch: 5   Global Step: 54640   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:01:59,655-Speed 5440.39 samples/sec   Loss 8.4696   LearningRate 0.1803   Epoch: 5   Global Step: 54650   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:02:07,163-Speed 5456.48 samples/sec   Loss 8.4871   LearningRate 0.1803   Epoch: 5   Global Step: 54660   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:02:14,670-Speed 5456.18 samples/sec   Loss 8.4967   LearningRate 0.1802   Epoch: 5   Global Step: 54670   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:02:22,234-Speed 5416.30 samples/sec   Loss 8.4615   LearningRate 0.1802   Epoch: 5   Global Step: 54680   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:02:29,702-Speed 5485.17 samples/sec   Loss 8.4914   LearningRate 0.1802   Epoch: 5   Global Step: 54690   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:02:37,224-Speed 5446.42 samples/sec   Loss 8.5172   LearningRate 0.1802   Epoch: 5   Global Step: 54700   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:02:44,756-Speed 5438.82 samples/sec   Loss 8.5099   LearningRate 0.1802   Epoch: 5   Global Step: 54710   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:02:52,346-Speed 5397.05 samples/sec   Loss 8.4628   LearningRate 0.1801   Epoch: 5   Global Step: 54720   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:02:59,913-Speed 5414.41 samples/sec   Loss 8.4599   LearningRate 0.1801   Epoch: 5   Global Step: 54730   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:03:07,502-Speed 5397.36 samples/sec   Loss 8.4812   LearningRate 0.1801   Epoch: 5   Global Step: 54740   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:03:14,964-Speed 5489.97 samples/sec   Loss 8.4792   LearningRate 0.1801   Epoch: 5   Global Step: 54750   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:03:22,573-Speed 5383.96 samples/sec   Loss 8.5009   LearningRate 0.1800   Epoch: 5   Global Step: 54760   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:03:30,088-Speed 5451.60 samples/sec   Loss 8.4044   LearningRate 0.1800   Epoch: 5   Global Step: 54770   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:03:37,657-Speed 5411.97 samples/sec   Loss 8.5053   LearningRate 0.1800   Epoch: 5   Global Step: 54780   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:03:45,279-Speed 5374.31 samples/sec   Loss 8.4553   LearningRate 0.1800   Epoch: 5   Global Step: 54790   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:03:52,906-Speed 5371.53 samples/sec   Loss 8.5574   LearningRate 0.1799   Epoch: 5   Global Step: 54800   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:04:00,357-Speed 5497.66 samples/sec   Loss 8.5424   LearningRate 0.1799   Epoch: 5   Global Step: 54810   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:04:07,923-Speed 5414.98 samples/sec   Loss 8.6175   LearningRate 0.1799   Epoch: 5   Global Step: 54820   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:04:15,513-Speed 5397.11 samples/sec   Loss 8.5569   LearningRate 0.1799   Epoch: 5   Global Step: 54830   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:04:23,028-Speed 5451.15 samples/sec   Loss 8.5564   LearningRate 0.1798   Epoch: 5   Global Step: 54840   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:04:30,626-Speed 5392.06 samples/sec   Loss 8.4857   LearningRate 0.1798   Epoch: 5   Global Step: 54850   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:04:38,125-Speed 5463.17 samples/sec   Loss 8.5024   LearningRate 0.1798   Epoch: 5   Global Step: 54860   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:04:45,726-Speed 5388.88 samples/sec   Loss 8.5366   LearningRate 0.1798   Epoch: 5   Global Step: 54870   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:04:53,280-Speed 5423.13 samples/sec   Loss 8.4756   LearningRate 0.1798   Epoch: 5   Global Step: 54880   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:05:00,782-Speed 5460.11 samples/sec   Loss 8.5535   LearningRate 0.1797   Epoch: 5   Global Step: 54890   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:05:08,358-Speed 5407.51 samples/sec   Loss 8.5348   LearningRate 0.1797   Epoch: 5   Global Step: 54900   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:05:15,807-Speed 5499.46 samples/sec   Loss 8.4780   LearningRate 0.1797   Epoch: 5   Global Step: 54910   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:05:23,310-Speed 5459.72 samples/sec   Loss 8.4863   LearningRate 0.1797   Epoch: 5   Global Step: 54920   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:05:30,933-Speed 5374.04 samples/sec   Loss 8.4754   LearningRate 0.1796   Epoch: 5   Global Step: 54930   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:05:38,490-Speed 5421.14 samples/sec   Loss 8.4250   LearningRate 0.1796   Epoch: 5   Global Step: 54940   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:05:46,047-Speed 5420.96 samples/sec   Loss 8.4917   LearningRate 0.1796   Epoch: 5   Global Step: 54950   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:05:53,571-Speed 5444.40 samples/sec   Loss 8.4770   LearningRate 0.1796   Epoch: 5   Global Step: 54960   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:06:01,207-Speed 5364.06 samples/sec   Loss 8.4853   LearningRate 0.1795   Epoch: 5   Global Step: 54970   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:06:08,805-Speed 5392.09 samples/sec   Loss 8.4851   LearningRate 0.1795   Epoch: 5   Global Step: 54980   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:06:16,303-Speed 5463.57 samples/sec   Loss 8.4940   LearningRate 0.1795   Epoch: 5   Global Step: 54990   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:06:23,851-Speed 5427.29 samples/sec   Loss 8.4590   LearningRate 0.1795   Epoch: 5   Global Step: 55000   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:06:31,487-Speed 5364.84 samples/sec   Loss 8.4797   LearningRate 0.1794   Epoch: 5   Global Step: 55010   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:06:39,020-Speed 5438.10 samples/sec   Loss 8.5409   LearningRate 0.1794   Epoch: 5   Global Step: 55020   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:06:46,686-Speed 5344.46 samples/sec   Loss 8.4669   LearningRate 0.1794   Epoch: 5   Global Step: 55030   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:06:54,188-Speed 5460.09 samples/sec   Loss 8.5151   LearningRate 0.1794   Epoch: 5   Global Step: 55040   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:07:01,902-Speed 5310.74 samples/sec   Loss 8.4695   LearningRate 0.1794   Epoch: 5   Global Step: 55050   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:07:09,486-Speed 5401.48 samples/sec   Loss 8.4245   LearningRate 0.1793   Epoch: 5   Global Step: 55060   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:07:17,017-Speed 5439.18 samples/sec   Loss 8.3990   LearningRate 0.1793   Epoch: 5   Global Step: 55070   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:07:24,574-Speed 5421.04 samples/sec   Loss 8.4494   LearningRate 0.1793   Epoch: 5   Global Step: 55080   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:07:32,147-Speed 5409.52 samples/sec   Loss 8.4403   LearningRate 0.1793   Epoch: 5   Global Step: 55090   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:07:39,688-Speed 5432.09 samples/sec   Loss 8.5467   LearningRate 0.1792   Epoch: 5   Global Step: 55100   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:07:47,211-Speed 5445.47 samples/sec   Loss 8.4630   LearningRate 0.1792   Epoch: 5   Global Step: 55110   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:07:54,832-Speed 5375.44 samples/sec   Loss 8.5110   LearningRate 0.1792   Epoch: 5   Global Step: 55120   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:08:02,397-Speed 5415.00 samples/sec   Loss 8.4808   LearningRate 0.1792   Epoch: 5   Global Step: 55130   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:08:09,901-Speed 5459.13 samples/sec   Loss 8.4363   LearningRate 0.1791   Epoch: 5   Global Step: 55140   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:08:17,394-Speed 5467.40 samples/sec   Loss 8.5237   LearningRate 0.1791   Epoch: 5   Global Step: 55150   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:08:24,911-Speed 5449.94 samples/sec   Loss 8.4632   LearningRate 0.1791   Epoch: 5   Global Step: 55160   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:08:32,512-Speed 5388.86 samples/sec   Loss 8.4696   LearningRate 0.1791   Epoch: 5   Global Step: 55170   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:08:40,103-Speed 5396.96 samples/sec   Loss 8.5199   LearningRate 0.1790   Epoch: 5   Global Step: 55180   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:08:47,667-Speed 5416.11 samples/sec   Loss 8.4486   LearningRate 0.1790   Epoch: 5   Global Step: 55190   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:08:55,213-Speed 5428.55 samples/sec   Loss 8.4426   LearningRate 0.1790   Epoch: 5   Global Step: 55200   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:09:02,815-Speed 5388.20 samples/sec   Loss 8.4684   LearningRate 0.1790   Epoch: 5   Global Step: 55210   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:09:10,355-Speed 5433.11 samples/sec   Loss 8.5186   LearningRate 0.1790   Epoch: 5   Global Step: 55220   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:09:17,872-Speed 5450.08 samples/sec   Loss 8.4777   LearningRate 0.1789   Epoch: 5   Global Step: 55230   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:09:25,366-Speed 5466.23 samples/sec   Loss 8.3978   LearningRate 0.1789   Epoch: 5   Global Step: 55240   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:09:32,871-Speed 5457.93 samples/sec   Loss 8.4512   LearningRate 0.1789   Epoch: 5   Global Step: 55250   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:09:40,384-Speed 5452.98 samples/sec   Loss 8.4761   LearningRate 0.1789   Epoch: 5   Global Step: 55260   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:09:47,954-Speed 5411.40 samples/sec   Loss 8.4867   LearningRate 0.1788   Epoch: 5   Global Step: 55270   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:09:55,527-Speed 5409.61 samples/sec   Loss 8.4246   LearningRate 0.1788   Epoch: 5   Global Step: 55280   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:10:03,042-Speed 5451.15 samples/sec   Loss 8.3840   LearningRate 0.1788   Epoch: 5   Global Step: 55290   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:10:10,628-Speed 5400.17 samples/sec   Loss 8.3914   LearningRate 0.1788   Epoch: 5   Global Step: 55300   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:10:18,169-Speed 5432.96 samples/sec   Loss 8.4632   LearningRate 0.1787   Epoch: 5   Global Step: 55310   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:10:25,739-Speed 5411.26 samples/sec   Loss 8.4954   LearningRate 0.1787   Epoch: 5   Global Step: 55320   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:10:33,229-Speed 5469.50 samples/sec   Loss 8.5016   LearningRate 0.1787   Epoch: 5   Global Step: 55330   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:10:40,713-Speed 5473.10 samples/sec   Loss 8.4022   LearningRate 0.1787   Epoch: 5   Global Step: 55340   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:10:48,195-Speed 5475.58 samples/sec   Loss 8.4264   LearningRate 0.1786   Epoch: 5   Global Step: 55350   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:10:55,784-Speed 5398.21 samples/sec   Loss 8.4212   LearningRate 0.1786   Epoch: 5   Global Step: 55360   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:11:06,840-Speed 3705.05 samples/sec   Loss 8.4967   LearningRate 0.1786   Epoch: 5   Global Step: 55370   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:11:14,598-Speed 5280.59 samples/sec   Loss 8.3991   LearningRate 0.1786   Epoch: 5   Global Step: 55380   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:11:22,201-Speed 5388.26 samples/sec   Loss 8.5254   LearningRate 0.1786   Epoch: 5   Global Step: 55390   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:11:29,820-Speed 5376.50 samples/sec   Loss 8.4312   LearningRate 0.1785   Epoch: 5   Global Step: 55400   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:11:37,355-Speed 5436.96 samples/sec   Loss 8.4587   LearningRate 0.1785   Epoch: 5   Global Step: 55410   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:11:44,863-Speed 5456.61 samples/sec   Loss 8.4826   LearningRate 0.1785   Epoch: 5   Global Step: 55420   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:11:52,374-Speed 5453.69 samples/sec   Loss 8.4559   LearningRate 0.1785   Epoch: 5   Global Step: 55430   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:11:59,878-Speed 5459.24 samples/sec   Loss 8.4834   LearningRate 0.1784   Epoch: 5   Global Step: 55440   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:12:07,381-Speed 5459.80 samples/sec   Loss 8.5056   LearningRate 0.1784   Epoch: 5   Global Step: 55450   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:12:14,902-Speed 5446.98 samples/sec   Loss 8.4469   LearningRate 0.1784   Epoch: 5   Global Step: 55460   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:12:22,484-Speed 5403.01 samples/sec   Loss 8.4530   LearningRate 0.1784   Epoch: 5   Global Step: 55470   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:12:30,017-Speed 5438.07 samples/sec   Loss 8.4563   LearningRate 0.1783   Epoch: 5   Global Step: 55480   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:12:37,628-Speed 5382.95 samples/sec   Loss 8.5332   LearningRate 0.1783   Epoch: 5   Global Step: 55490   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:12:45,193-Speed 5415.00 samples/sec   Loss 8.4801   LearningRate 0.1783   Epoch: 5   Global Step: 55500   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:12:52,824-Speed 5368.26 samples/sec   Loss 8.4002   LearningRate 0.1783   Epoch: 5   Global Step: 55510   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:13:00,354-Speed 5440.29 samples/sec   Loss 8.4493   LearningRate 0.1782   Epoch: 5   Global Step: 55520   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:13:08,028-Speed 5338.55 samples/sec   Loss 8.4523   LearningRate 0.1782   Epoch: 5   Global Step: 55530   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:13:15,637-Speed 5383.85 samples/sec   Loss 8.4862   LearningRate 0.1782   Epoch: 5   Global Step: 55540   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:13:23,183-Speed 5428.93 samples/sec   Loss 8.4244   LearningRate 0.1782   Epoch: 5   Global Step: 55550   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:13:30,860-Speed 5335.66 samples/sec   Loss 8.4983   LearningRate 0.1782   Epoch: 5   Global Step: 55560   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:13:38,400-Speed 5433.37 samples/sec   Loss 8.4995   LearningRate 0.1781   Epoch: 5   Global Step: 55570   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:13:45,878-Speed 5477.99 samples/sec   Loss 8.4047   LearningRate 0.1781   Epoch: 5   Global Step: 55580   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:13:53,501-Speed 5374.49 samples/sec   Loss 8.4227   LearningRate 0.1781   Epoch: 5   Global Step: 55590   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:14:01,119-Speed 5377.34 samples/sec   Loss 8.4392   LearningRate 0.1781   Epoch: 5   Global Step: 55600   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:14:08,773-Speed 5352.36 samples/sec   Loss 8.3942   LearningRate 0.1780   Epoch: 5   Global Step: 55610   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:14:16,338-Speed 5414.93 samples/sec   Loss 8.4272   LearningRate 0.1780   Epoch: 5   Global Step: 55620   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:14:23,924-Speed 5400.27 samples/sec   Loss 8.4048   LearningRate 0.1780   Epoch: 5   Global Step: 55630   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:14:31,476-Speed 5424.72 samples/sec   Loss 8.4462   LearningRate 0.1780   Epoch: 5   Global Step: 55640   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:14:38,993-Speed 5449.47 samples/sec   Loss 8.4629   LearningRate 0.1779   Epoch: 5   Global Step: 55650   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:14:46,577-Speed 5401.53 samples/sec   Loss 8.4590   LearningRate 0.1779   Epoch: 5   Global Step: 55660   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:14:54,165-Speed 5398.97 samples/sec   Loss 8.3907   LearningRate 0.1779   Epoch: 5   Global Step: 55670   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:15:01,675-Speed 5455.46 samples/sec   Loss 8.4472   LearningRate 0.1779   Epoch: 5   Global Step: 55680   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:15:09,222-Speed 5427.97 samples/sec   Loss 8.3655   LearningRate 0.1779   Epoch: 5   Global Step: 55690   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:15:16,791-Speed 5411.93 samples/sec   Loss 8.4008   LearningRate 0.1778   Epoch: 5   Global Step: 55700   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:15:24,429-Speed 5363.18 samples/sec   Loss 8.4625   LearningRate 0.1778   Epoch: 5   Global Step: 55710   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:15:32,086-Speed 5350.78 samples/sec   Loss 8.4220   LearningRate 0.1778   Epoch: 5   Global Step: 55720   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:15:39,636-Speed 5425.24 samples/sec   Loss 8.4383   LearningRate 0.1778   Epoch: 5   Global Step: 55730   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:15:47,307-Speed 5340.41 samples/sec   Loss 8.3717   LearningRate 0.1777   Epoch: 5   Global Step: 55740   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:15:54,911-Speed 5387.59 samples/sec   Loss 8.4353   LearningRate 0.1777   Epoch: 5   Global Step: 55750   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:16:02,466-Speed 5422.74 samples/sec   Loss 8.4624   LearningRate 0.1777   Epoch: 5   Global Step: 55760   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:16:10,115-Speed 5355.38 samples/sec   Loss 8.4404   LearningRate 0.1777   Epoch: 5   Global Step: 55770   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:16:17,622-Speed 5456.38 samples/sec   Loss 8.4363   LearningRate 0.1776   Epoch: 5   Global Step: 55780   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:16:25,160-Speed 5435.18 samples/sec   Loss 8.3706   LearningRate 0.1776   Epoch: 5   Global Step: 55790   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:16:32,712-Speed 5424.37 samples/sec   Loss 8.4331   LearningRate 0.1776   Epoch: 5   Global Step: 55800   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:16:40,285-Speed 5409.62 samples/sec   Loss 8.4446   LearningRate 0.1776   Epoch: 5   Global Step: 55810   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:16:47,775-Speed 5468.80 samples/sec   Loss 8.3930   LearningRate 0.1775   Epoch: 5   Global Step: 55820   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:16:55,248-Speed 5482.26 samples/sec   Loss 8.4244   LearningRate 0.1775   Epoch: 5   Global Step: 55830   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:17:02,813-Speed 5415.25 samples/sec   Loss 8.4356   LearningRate 0.1775   Epoch: 5   Global Step: 55840   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:17:10,418-Speed 5386.75 samples/sec   Loss 8.4013   LearningRate 0.1775   Epoch: 5   Global Step: 55850   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:17:18,034-Speed 5378.57 samples/sec   Loss 8.4355   LearningRate 0.1775   Epoch: 5   Global Step: 55860   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:17:25,552-Speed 5449.33 samples/sec   Loss 8.4405   LearningRate 0.1774   Epoch: 5   Global Step: 55870   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:17:33,179-Speed 5371.03 samples/sec   Loss 8.4555   LearningRate 0.1774   Epoch: 5   Global Step: 55880   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:17:40,785-Speed 5385.92 samples/sec   Loss 8.4493   LearningRate 0.1774   Epoch: 5   Global Step: 55890   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:17:48,315-Speed 5440.79 samples/sec   Loss 8.3878   LearningRate 0.1774   Epoch: 5   Global Step: 55900   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:17:55,884-Speed 5411.94 samples/sec   Loss 8.4700   LearningRate 0.1773   Epoch: 5   Global Step: 55910   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:18:03,507-Speed 5374.11 samples/sec   Loss 8.4745   LearningRate 0.1773   Epoch: 5   Global Step: 55920   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:18:11,080-Speed 5408.97 samples/sec   Loss 8.4060   LearningRate 0.1773   Epoch: 5   Global Step: 55930   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:18:18,631-Speed 5425.67 samples/sec   Loss 8.4027   LearningRate 0.1773   Epoch: 5   Global Step: 55940   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:18:26,158-Speed 5441.99 samples/sec   Loss 8.4103   LearningRate 0.1772   Epoch: 5   Global Step: 55950   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:18:33,694-Speed 5436.21 samples/sec   Loss 8.4139   LearningRate 0.1772   Epoch: 5   Global Step: 55960   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:18:41,221-Speed 5442.35 samples/sec   Loss 8.4415   LearningRate 0.1772   Epoch: 5   Global Step: 55970   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:18:48,797-Speed 5407.09 samples/sec   Loss 8.4058   LearningRate 0.1772   Epoch: 5   Global Step: 55980   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:18:56,302-Speed 5458.31 samples/sec   Loss 8.4285   LearningRate 0.1771   Epoch: 5   Global Step: 55990   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:19:03,893-Speed 5396.99 samples/sec   Loss 8.4121   LearningRate 0.1771   Epoch: 5   Global Step: 56000   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:19:48,144-[lfw][56000]XNorm: 22.057266
Training: 2022-01-08 07:19:48,145-[lfw][56000]Accuracy-Flip: 0.99767+-0.00238
Training: 2022-01-08 07:19:48,146-[lfw][56000]Accuracy-Highest: 0.99817
Training: 2022-01-08 07:20:40,414-[cfp_fp][56000]XNorm: 19.616236
Training: 2022-01-08 07:20:40,416-[cfp_fp][56000]Accuracy-Flip: 0.98471+-0.00461
Training: 2022-01-08 07:20:40,417-[cfp_fp][56000]Accuracy-Highest: 0.98600
Training: 2022-01-08 07:21:26,438-[agedb_30][56000]XNorm: 21.866216
Training: 2022-01-08 07:21:26,440-[agedb_30][56000]Accuracy-Flip: 0.97517+-0.00693
Training: 2022-01-08 07:21:26,441-[agedb_30][56000]Accuracy-Highest: 0.97517
Training: 2022-01-08 07:21:34,145-Speed 272.61 samples/sec   Loss 8.4133   LearningRate 0.1771   Epoch: 5   Global Step: 56010   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:21:41,801-Speed 5351.79 samples/sec   Loss 8.3815   LearningRate 0.1771   Epoch: 5   Global Step: 56020   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:21:49,460-Speed 5348.61 samples/sec   Loss 8.4853   LearningRate 0.1771   Epoch: 5   Global Step: 56030   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:21:56,940-Speed 5478.02 samples/sec   Loss 8.4431   LearningRate 0.1770   Epoch: 5   Global Step: 56040   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:22:04,510-Speed 5411.57 samples/sec   Loss 8.3664   LearningRate 0.1770   Epoch: 5   Global Step: 56050   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:22:12,064-Speed 5422.72 samples/sec   Loss 8.4304   LearningRate 0.1770   Epoch: 5   Global Step: 56060   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:22:19,576-Speed 5453.95 samples/sec   Loss 8.4394   LearningRate 0.1770   Epoch: 5   Global Step: 56070   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:22:27,107-Speed 5439.37 samples/sec   Loss 8.3604   LearningRate 0.1769   Epoch: 5   Global Step: 56080   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:22:34,663-Speed 5421.74 samples/sec   Loss 8.4127   LearningRate 0.1769   Epoch: 5   Global Step: 56090   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:22:42,272-Speed 5383.65 samples/sec   Loss 8.3654   LearningRate 0.1769   Epoch: 5   Global Step: 56100   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:22:49,880-Speed 5384.32 samples/sec   Loss 8.4416   LearningRate 0.1769   Epoch: 5   Global Step: 56110   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 07:22:57,497-Speed 5378.29 samples/sec   Loss 8.3666   LearningRate 0.1768   Epoch: 5   Global Step: 56120   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:23:05,042-Speed 5429.20 samples/sec   Loss 8.3414   LearningRate 0.1768   Epoch: 5   Global Step: 56130   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:23:12,548-Speed 5458.22 samples/sec   Loss 8.2854   LearningRate 0.1768   Epoch: 5   Global Step: 56140   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:23:20,051-Speed 5460.21 samples/sec   Loss 8.4335   LearningRate 0.1768   Epoch: 5   Global Step: 56150   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:23:27,646-Speed 5393.51 samples/sec   Loss 8.3739   LearningRate 0.1767   Epoch: 5   Global Step: 56160   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:23:35,224-Speed 5405.82 samples/sec   Loss 8.4770   LearningRate 0.1767   Epoch: 5   Global Step: 56170   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:23:42,717-Speed 5467.50 samples/sec   Loss 8.4421   LearningRate 0.1767   Epoch: 5   Global Step: 56180   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:23:50,226-Speed 5455.23 samples/sec   Loss 8.3745   LearningRate 0.1767   Epoch: 5   Global Step: 56190   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:23:57,819-Speed 5395.17 samples/sec   Loss 8.4348   LearningRate 0.1767   Epoch: 5   Global Step: 56200   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:24:05,428-Speed 5383.79 samples/sec   Loss 8.3788   LearningRate 0.1766   Epoch: 5   Global Step: 56210   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:24:12,958-Speed 5440.65 samples/sec   Loss 8.4244   LearningRate 0.1766   Epoch: 5   Global Step: 56220   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:24:20,574-Speed 5379.11 samples/sec   Loss 8.4196   LearningRate 0.1766   Epoch: 5   Global Step: 56230   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:24:28,097-Speed 5445.12 samples/sec   Loss 8.4522   LearningRate 0.1766   Epoch: 5   Global Step: 56240   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:24:35,572-Speed 5480.51 samples/sec   Loss 8.3965   LearningRate 0.1765   Epoch: 5   Global Step: 56250   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:24:43,141-Speed 5411.90 samples/sec   Loss 8.4062   LearningRate 0.1765   Epoch: 5   Global Step: 56260   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:24:50,734-Speed 5395.40 samples/sec   Loss 8.3560   LearningRate 0.1765   Epoch: 5   Global Step: 56270   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:24:58,286-Speed 5424.55 samples/sec   Loss 8.3638   LearningRate 0.1765   Epoch: 5   Global Step: 56280   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:25:05,806-Speed 5446.80 samples/sec   Loss 8.4818   LearningRate 0.1764   Epoch: 5   Global Step: 56290   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:25:13,398-Speed 5396.60 samples/sec   Loss 8.4607   LearningRate 0.1764   Epoch: 5   Global Step: 56300   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:25:20,980-Speed 5402.82 samples/sec   Loss 8.4134   LearningRate 0.1764   Epoch: 5   Global Step: 56310   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:25:28,487-Speed 5456.73 samples/sec   Loss 8.3897   LearningRate 0.1764   Epoch: 5   Global Step: 56320   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:25:36,132-Speed 5358.20 samples/sec   Loss 8.3996   LearningRate 0.1764   Epoch: 5   Global Step: 56330   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:25:43,704-Speed 5410.41 samples/sec   Loss 8.3704   LearningRate 0.1763   Epoch: 5   Global Step: 56340   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:25:51,197-Speed 5467.76 samples/sec   Loss 8.3852   LearningRate 0.1763   Epoch: 5   Global Step: 56350   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:25:58,856-Speed 5348.15 samples/sec   Loss 8.3537   LearningRate 0.1763   Epoch: 5   Global Step: 56360   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:26:06,383-Speed 5442.10 samples/sec   Loss 8.4181   LearningRate 0.1763   Epoch: 5   Global Step: 56370   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:26:13,942-Speed 5420.19 samples/sec   Loss 8.4339   LearningRate 0.1762   Epoch: 5   Global Step: 56380   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:26:21,547-Speed 5386.38 samples/sec   Loss 8.3698   LearningRate 0.1762   Epoch: 5   Global Step: 56390   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:26:29,057-Speed 5454.85 samples/sec   Loss 8.4388   LearningRate 0.1762   Epoch: 5   Global Step: 56400   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:26:36,581-Speed 5444.39 samples/sec   Loss 8.4350   LearningRate 0.1762   Epoch: 5   Global Step: 56410   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:26:44,128-Speed 5428.40 samples/sec   Loss 8.4388   LearningRate 0.1761   Epoch: 5   Global Step: 56420   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:26:51,666-Speed 5434.23 samples/sec   Loss 8.4169   LearningRate 0.1761   Epoch: 5   Global Step: 56430   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:26:59,193-Speed 5442.12 samples/sec   Loss 8.3931   LearningRate 0.1761   Epoch: 5   Global Step: 56440   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:27:06,752-Speed 5419.10 samples/sec   Loss 8.3714   LearningRate 0.1761   Epoch: 5   Global Step: 56450   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:27:14,334-Speed 5403.87 samples/sec   Loss 8.4167   LearningRate 0.1760   Epoch: 5   Global Step: 56460   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:27:21,881-Speed 5427.64 samples/sec   Loss 8.4307   LearningRate 0.1760   Epoch: 5   Global Step: 56470   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:27:29,443-Speed 5417.03 samples/sec   Loss 8.3477   LearningRate 0.1760   Epoch: 5   Global Step: 56480   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:27:37,059-Speed 5379.17 samples/sec   Loss 8.4620   LearningRate 0.1760   Epoch: 5   Global Step: 56490   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:27:44,584-Speed 5443.88 samples/sec   Loss 8.4238   LearningRate 0.1760   Epoch: 5   Global Step: 56500   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:27:52,324-Speed 5292.96 samples/sec   Loss 8.3748   LearningRate 0.1759   Epoch: 5   Global Step: 56510   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:27:59,918-Speed 5394.82 samples/sec   Loss 8.3824   LearningRate 0.1759   Epoch: 5   Global Step: 56520   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:28:07,531-Speed 5380.29 samples/sec   Loss 8.3850   LearningRate 0.1759   Epoch: 5   Global Step: 56530   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:28:15,085-Speed 5423.11 samples/sec   Loss 8.3767   LearningRate 0.1759   Epoch: 5   Global Step: 56540   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:28:22,716-Speed 5368.62 samples/sec   Loss 8.4522   LearningRate 0.1758   Epoch: 5   Global Step: 56550   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:28:30,256-Speed 5433.04 samples/sec   Loss 8.3817   LearningRate 0.1758   Epoch: 5   Global Step: 56560   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:28:37,840-Speed 5401.34 samples/sec   Loss 8.3539   LearningRate 0.1758   Epoch: 5   Global Step: 56570   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:28:45,463-Speed 5374.24 samples/sec   Loss 8.3801   LearningRate 0.1758   Epoch: 5   Global Step: 56580   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:28:53,074-Speed 5382.62 samples/sec   Loss 8.3962   LearningRate 0.1757   Epoch: 5   Global Step: 56590   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:29:00,654-Speed 5404.68 samples/sec   Loss 8.2978   LearningRate 0.1757   Epoch: 5   Global Step: 56600   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:29:08,196-Speed 5430.91 samples/sec   Loss 8.3832   LearningRate 0.1757   Epoch: 5   Global Step: 56610   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:29:15,833-Speed 5364.19 samples/sec   Loss 8.4122   LearningRate 0.1757   Epoch: 5   Global Step: 56620   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:29:23,368-Speed 5437.14 samples/sec   Loss 8.3575   LearningRate 0.1757   Epoch: 5   Global Step: 56630   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:29:31,038-Speed 5340.87 samples/sec   Loss 8.3447   LearningRate 0.1756   Epoch: 5   Global Step: 56640   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:29:38,589-Speed 5425.17 samples/sec   Loss 8.3684   LearningRate 0.1756   Epoch: 5   Global Step: 56650   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:29:46,108-Speed 5448.41 samples/sec   Loss 8.4025   LearningRate 0.1756   Epoch: 5   Global Step: 56660   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:29:53,667-Speed 5419.22 samples/sec   Loss 8.3216   LearningRate 0.1756   Epoch: 5   Global Step: 56670   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:30:01,294-Speed 5370.74 samples/sec   Loss 8.3824   LearningRate 0.1755   Epoch: 5   Global Step: 56680   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:30:08,944-Speed 5355.19 samples/sec   Loss 8.4011   LearningRate 0.1755   Epoch: 5   Global Step: 56690   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 07:30:16,583-Speed 5362.56 samples/sec   Loss 8.4246   LearningRate 0.1755   Epoch: 5   Global Step: 56700   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:30:24,175-Speed 5395.84 samples/sec   Loss 8.4076   LearningRate 0.1755   Epoch: 5   Global Step: 56710   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:30:31,708-Speed 5438.39 samples/sec   Loss 8.4733   LearningRate 0.1754   Epoch: 5   Global Step: 56720   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 07:30:39,219-Speed 5453.50 samples/sec   Loss 8.3747   LearningRate 0.1754   Epoch: 5   Global Step: 56730   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:30:46,850-Speed 5368.63 samples/sec   Loss 8.4037   LearningRate 0.1754   Epoch: 5   Global Step: 56740   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:30:54,481-Speed 5368.38 samples/sec   Loss 8.3530   LearningRate 0.1754   Epoch: 5   Global Step: 56750   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:31:02,104-Speed 5373.74 samples/sec   Loss 8.4207   LearningRate 0.1753   Epoch: 5   Global Step: 56760   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:31:09,706-Speed 5388.75 samples/sec   Loss 8.3388   LearningRate 0.1753   Epoch: 5   Global Step: 56770   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:31:17,286-Speed 5404.18 samples/sec   Loss 8.4017   LearningRate 0.1753   Epoch: 5   Global Step: 56780   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:31:24,836-Speed 5426.13 samples/sec   Loss 8.3871   LearningRate 0.1753   Epoch: 5   Global Step: 56790   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:31:32,495-Speed 5348.54 samples/sec   Loss 8.4116   LearningRate 0.1753   Epoch: 5   Global Step: 56800   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:31:40,080-Speed 5400.92 samples/sec   Loss 8.4149   LearningRate 0.1752   Epoch: 5   Global Step: 56810   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:31:47,670-Speed 5397.42 samples/sec   Loss 8.3631   LearningRate 0.1752   Epoch: 5   Global Step: 56820   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:31:55,213-Speed 5430.70 samples/sec   Loss 8.3507   LearningRate 0.1752   Epoch: 5   Global Step: 56830   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:32:02,749-Speed 5436.46 samples/sec   Loss 8.3611   LearningRate 0.1752   Epoch: 5   Global Step: 56840   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:32:10,312-Speed 5416.30 samples/sec   Loss 8.3156   LearningRate 0.1751   Epoch: 5   Global Step: 56850   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:32:17,967-Speed 5351.68 samples/sec   Loss 8.3661   LearningRate 0.1751   Epoch: 5   Global Step: 56860   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:32:25,626-Speed 5348.49 samples/sec   Loss 8.3847   LearningRate 0.1751   Epoch: 5   Global Step: 56870   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:32:33,172-Speed 5428.75 samples/sec   Loss 8.4121   LearningRate 0.1751   Epoch: 5   Global Step: 56880   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:32:40,755-Speed 5402.55 samples/sec   Loss 8.3608   LearningRate 0.1750   Epoch: 5   Global Step: 56890   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:32:48,409-Speed 5352.30 samples/sec   Loss 8.3454   LearningRate 0.1750   Epoch: 5   Global Step: 56900   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:32:56,096-Speed 5329.12 samples/sec   Loss 8.3255   LearningRate 0.1750   Epoch: 5   Global Step: 56910   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:33:03,707-Speed 5382.13 samples/sec   Loss 8.4231   LearningRate 0.1750   Epoch: 5   Global Step: 56920   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:33:11,357-Speed 5355.12 samples/sec   Loss 8.3072   LearningRate 0.1750   Epoch: 5   Global Step: 56930   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:33:18,957-Speed 5389.91 samples/sec   Loss 8.3779   LearningRate 0.1749   Epoch: 5   Global Step: 56940   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:33:26,598-Speed 5361.37 samples/sec   Loss 8.3940   LearningRate 0.1749   Epoch: 5   Global Step: 56950   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:33:34,258-Speed 5348.26 samples/sec   Loss 8.4495   LearningRate 0.1749   Epoch: 5   Global Step: 56960   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:33:41,931-Speed 5338.85 samples/sec   Loss 8.3916   LearningRate 0.1749   Epoch: 5   Global Step: 56970   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:33:49,603-Speed 5339.37 samples/sec   Loss 8.4104   LearningRate 0.1748   Epoch: 5   Global Step: 56980   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:33:57,139-Speed 5435.80 samples/sec   Loss 8.3514   LearningRate 0.1748   Epoch: 5   Global Step: 56990   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:34:04,770-Speed 5368.32 samples/sec   Loss 8.3244   LearningRate 0.1748   Epoch: 5   Global Step: 57000   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:34:12,379-Speed 5383.97 samples/sec   Loss 8.3708   LearningRate 0.1748   Epoch: 5   Global Step: 57010   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:34:20,008-Speed 5369.73 samples/sec   Loss 8.3691   LearningRate 0.1747   Epoch: 5   Global Step: 57020   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:34:27,639-Speed 5368.44 samples/sec   Loss 8.3882   LearningRate 0.1747   Epoch: 5   Global Step: 57030   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:34:35,300-Speed 5347.27 samples/sec   Loss 8.2623   LearningRate 0.1747   Epoch: 5   Global Step: 57040   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:34:42,795-Speed 5465.61 samples/sec   Loss 8.3488   LearningRate 0.1747   Epoch: 5   Global Step: 57050   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:34:50,415-Speed 5375.95 samples/sec   Loss 8.3175   LearningRate 0.1747   Epoch: 5   Global Step: 57060   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:34:57,947-Speed 5438.61 samples/sec   Loss 8.3363   LearningRate 0.1746   Epoch: 5   Global Step: 57070   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:35:05,614-Speed 5343.05 samples/sec   Loss 8.3624   LearningRate 0.1746   Epoch: 5   Global Step: 57080   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:35:13,105-Speed 5468.36 samples/sec   Loss 8.3631   LearningRate 0.1746   Epoch: 5   Global Step: 57090   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:35:20,721-Speed 5379.38 samples/sec   Loss 8.3530   LearningRate 0.1746   Epoch: 5   Global Step: 57100   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:35:28,162-Speed 5505.64 samples/sec   Loss 8.3495   LearningRate 0.1745   Epoch: 5   Global Step: 57110   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:35:35,692-Speed 5439.70 samples/sec   Loss 8.4158   LearningRate 0.1745   Epoch: 5   Global Step: 57120   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:35:43,201-Speed 5455.90 samples/sec   Loss 8.3915   LearningRate 0.1745   Epoch: 5   Global Step: 57130   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:35:50,700-Speed 5462.51 samples/sec   Loss 8.3478   LearningRate 0.1745   Epoch: 5   Global Step: 57140   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:35:58,185-Speed 5472.87 samples/sec   Loss 8.4023   LearningRate 0.1744   Epoch: 5   Global Step: 57150   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:36:05,763-Speed 5406.02 samples/sec   Loss 8.3754   LearningRate 0.1744   Epoch: 5   Global Step: 57160   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:36:13,332-Speed 5412.60 samples/sec   Loss 8.3646   LearningRate 0.1744   Epoch: 5   Global Step: 57170   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:36:20,903-Speed 5410.55 samples/sec   Loss 8.3510   LearningRate 0.1744   Epoch: 5   Global Step: 57180   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:36:28,473-Speed 5412.00 samples/sec   Loss 8.3558   LearningRate 0.1744   Epoch: 5   Global Step: 57190   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:36:36,048-Speed 5407.69 samples/sec   Loss 8.3656   LearningRate 0.1743   Epoch: 5   Global Step: 57200   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:36:43,708-Speed 5347.70 samples/sec   Loss 8.3318   LearningRate 0.1743   Epoch: 5   Global Step: 57210   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:36:51,360-Speed 5353.67 samples/sec   Loss 8.3200   LearningRate 0.1743   Epoch: 5   Global Step: 57220   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:36:59,062-Speed 5319.71 samples/sec   Loss 8.3295   LearningRate 0.1743   Epoch: 5   Global Step: 57230   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:37:06,607-Speed 5429.10 samples/sec   Loss 8.3856   LearningRate 0.1742   Epoch: 5   Global Step: 57240   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:37:14,163-Speed 5421.04 samples/sec   Loss 8.2866   LearningRate 0.1742   Epoch: 5   Global Step: 57250   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:37:21,735-Speed 5410.57 samples/sec   Loss 8.3791   LearningRate 0.1742   Epoch: 5   Global Step: 57260   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:37:29,271-Speed 5435.91 samples/sec   Loss 8.3378   LearningRate 0.1742   Epoch: 5   Global Step: 57270   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:37:36,867-Speed 5392.80 samples/sec   Loss 8.4209   LearningRate 0.1741   Epoch: 5   Global Step: 57280   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:37:44,374-Speed 5456.75 samples/sec   Loss 8.3179   LearningRate 0.1741   Epoch: 5   Global Step: 57290   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:37:51,917-Speed 5431.45 samples/sec   Loss 8.3490   LearningRate 0.1741   Epoch: 5   Global Step: 57300   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:37:59,428-Speed 5453.83 samples/sec   Loss 8.2967   LearningRate 0.1741   Epoch: 5   Global Step: 57310   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:38:07,052-Speed 5373.29 samples/sec   Loss 8.3473   LearningRate 0.1740   Epoch: 5   Global Step: 57320   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:38:14,712-Speed 5348.05 samples/sec   Loss 8.3967   LearningRate 0.1740   Epoch: 5   Global Step: 57330   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:38:22,653-Speed 5158.78 samples/sec   Loss 8.2561   LearningRate 0.1740   Epoch: 5   Global Step: 57340   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:38:30,181-Speed 5442.06 samples/sec   Loss 8.3486   LearningRate 0.1740   Epoch: 5   Global Step: 57350   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:38:37,809-Speed 5370.02 samples/sec   Loss 8.2804   LearningRate 0.1740   Epoch: 5   Global Step: 57360   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:38:45,344-Speed 5436.43 samples/sec   Loss 8.3428   LearningRate 0.1739   Epoch: 5   Global Step: 57370   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:38:52,800-Speed 5494.46 samples/sec   Loss 8.4760   LearningRate 0.1739   Epoch: 5   Global Step: 57380   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:39:00,456-Speed 5350.42 samples/sec   Loss 8.3370   LearningRate 0.1739   Epoch: 5   Global Step: 57390   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:39:08,067-Speed 5382.84 samples/sec   Loss 8.2640   LearningRate 0.1739   Epoch: 5   Global Step: 57400   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:39:15,656-Speed 5397.24 samples/sec   Loss 8.3024   LearningRate 0.1738   Epoch: 5   Global Step: 57410   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:39:23,214-Speed 5420.14 samples/sec   Loss 8.3268   LearningRate 0.1738   Epoch: 5   Global Step: 57420   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:39:30,654-Speed 5506.66 samples/sec   Loss 8.3083   LearningRate 0.1738   Epoch: 5   Global Step: 57430   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:39:38,131-Speed 5478.86 samples/sec   Loss 8.3517   LearningRate 0.1738   Epoch: 5   Global Step: 57440   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:39:45,648-Speed 5449.74 samples/sec   Loss 8.2479   LearningRate 0.1737   Epoch: 5   Global Step: 57450   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:39:53,208-Speed 5418.50 samples/sec   Loss 8.3260   LearningRate 0.1737   Epoch: 5   Global Step: 57460   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:40:00,747-Speed 5434.17 samples/sec   Loss 8.3558   LearningRate 0.1737   Epoch: 5   Global Step: 57470   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:40:08,439-Speed 5325.60 samples/sec   Loss 8.2823   LearningRate 0.1737   Epoch: 5   Global Step: 57480   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:40:16,028-Speed 5398.12 samples/sec   Loss 8.4155   LearningRate 0.1737   Epoch: 5   Global Step: 57490   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:40:23,558-Speed 5440.40 samples/sec   Loss 8.3227   LearningRate 0.1736   Epoch: 5   Global Step: 57500   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:40:31,118-Speed 5418.35 samples/sec   Loss 8.3072   LearningRate 0.1736   Epoch: 5   Global Step: 57510   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:40:38,666-Speed 5427.86 samples/sec   Loss 8.3605   LearningRate 0.1736   Epoch: 5   Global Step: 57520   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:40:46,157-Speed 5468.26 samples/sec   Loss 8.3695   LearningRate 0.1736   Epoch: 5   Global Step: 57530   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:40:53,711-Speed 5423.41 samples/sec   Loss 8.4198   LearningRate 0.1735   Epoch: 5   Global Step: 57540   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:41:01,376-Speed 5344.43 samples/sec   Loss 8.3398   LearningRate 0.1735   Epoch: 5   Global Step: 57550   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:41:08,960-Speed 5402.01 samples/sec   Loss 8.2510   LearningRate 0.1735   Epoch: 5   Global Step: 57560   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:41:16,458-Speed 5462.67 samples/sec   Loss 8.3324   LearningRate 0.1735   Epoch: 5   Global Step: 57570   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:41:23,957-Speed 5463.31 samples/sec   Loss 8.3267   LearningRate 0.1734   Epoch: 5   Global Step: 57580   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:41:31,486-Speed 5440.81 samples/sec   Loss 8.3711   LearningRate 0.1734   Epoch: 5   Global Step: 57590   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:41:39,013-Speed 5442.88 samples/sec   Loss 8.2772   LearningRate 0.1734   Epoch: 5   Global Step: 57600   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:41:46,555-Speed 5431.77 samples/sec   Loss 8.3403   LearningRate 0.1734   Epoch: 5   Global Step: 57610   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:41:54,142-Speed 5399.10 samples/sec   Loss 8.2820   LearningRate 0.1734   Epoch: 5   Global Step: 57620   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:42:01,687-Speed 5429.55 samples/sec   Loss 8.2871   LearningRate 0.1733   Epoch: 5   Global Step: 57630   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:42:09,276-Speed 5398.11 samples/sec   Loss 8.3423   LearningRate 0.1733   Epoch: 5   Global Step: 57640   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:42:16,736-Speed 5490.95 samples/sec   Loss 8.3088   LearningRate 0.1733   Epoch: 5   Global Step: 57650   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:42:24,196-Speed 5491.70 samples/sec   Loss 8.3169   LearningRate 0.1733   Epoch: 5   Global Step: 57660   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:42:31,844-Speed 5356.14 samples/sec   Loss 8.3655   LearningRate 0.1732   Epoch: 5   Global Step: 57670   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:42:39,625-Speed 5264.87 samples/sec   Loss 8.3182   LearningRate 0.1732   Epoch: 5   Global Step: 57680   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:42:47,213-Speed 5398.88 samples/sec   Loss 8.3154   LearningRate 0.1732   Epoch: 5   Global Step: 57690   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:42:54,716-Speed 5459.75 samples/sec   Loss 8.3693   LearningRate 0.1732   Epoch: 5   Global Step: 57700   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:43:02,321-Speed 5386.09 samples/sec   Loss 8.2315   LearningRate 0.1731   Epoch: 5   Global Step: 57710   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:43:09,874-Speed 5424.39 samples/sec   Loss 8.3319   LearningRate 0.1731   Epoch: 5   Global Step: 57720   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:43:17,406-Speed 5439.40 samples/sec   Loss 8.3204   LearningRate 0.1731   Epoch: 5   Global Step: 57730   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:43:24,985-Speed 5405.09 samples/sec   Loss 8.2522   LearningRate 0.1731   Epoch: 5   Global Step: 57740   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:43:32,567-Speed 5403.04 samples/sec   Loss 8.2920   LearningRate 0.1731   Epoch: 5   Global Step: 57750   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:43:40,120-Speed 5423.38 samples/sec   Loss 8.3453   LearningRate 0.1730   Epoch: 5   Global Step: 57760   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:43:47,625-Speed 5458.93 samples/sec   Loss 8.3396   LearningRate 0.1730   Epoch: 5   Global Step: 57770   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:43:55,150-Speed 5443.53 samples/sec   Loss 8.3796   LearningRate 0.1730   Epoch: 5   Global Step: 57780   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:44:02,744-Speed 5393.90 samples/sec   Loss 8.3593   LearningRate 0.1730   Epoch: 5   Global Step: 57790   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:44:10,378-Speed 5366.92 samples/sec   Loss 8.2761   LearningRate 0.1729   Epoch: 5   Global Step: 57800   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:44:18,069-Speed 5326.42 samples/sec   Loss 8.3639   LearningRate 0.1729   Epoch: 5   Global Step: 57810   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:44:25,600-Speed 5439.10 samples/sec   Loss 8.4109   LearningRate 0.1729   Epoch: 5   Global Step: 57820   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:44:33,143-Speed 5430.82 samples/sec   Loss 8.2767   LearningRate 0.1729   Epoch: 5   Global Step: 57830   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:44:40,693-Speed 5426.64 samples/sec   Loss 8.3524   LearningRate 0.1728   Epoch: 5   Global Step: 57840   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:44:48,207-Speed 5451.51 samples/sec   Loss 8.3598   LearningRate 0.1728   Epoch: 5   Global Step: 57850   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:44:55,738-Speed 5439.74 samples/sec   Loss 8.3293   LearningRate 0.1728   Epoch: 5   Global Step: 57860   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:45:03,279-Speed 5431.88 samples/sec   Loss 8.2595   LearningRate 0.1728   Epoch: 5   Global Step: 57870   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:45:10,804-Speed 5444.83 samples/sec   Loss 8.2629   LearningRate 0.1728   Epoch: 5   Global Step: 57880   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:45:18,440-Speed 5364.58 samples/sec   Loss 8.3176   LearningRate 0.1727   Epoch: 5   Global Step: 57890   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:45:25,954-Speed 5451.68 samples/sec   Loss 8.2642   LearningRate 0.1727   Epoch: 5   Global Step: 57900   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:45:33,552-Speed 5391.61 samples/sec   Loss 8.3670   LearningRate 0.1727   Epoch: 5   Global Step: 57910   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:45:41,123-Speed 5411.19 samples/sec   Loss 8.2551   LearningRate 0.1727   Epoch: 5   Global Step: 57920   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:45:48,623-Speed 5461.99 samples/sec   Loss 8.2960   LearningRate 0.1726   Epoch: 5   Global Step: 57930   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:45:56,228-Speed 5386.56 samples/sec   Loss 8.2649   LearningRate 0.1726   Epoch: 5   Global Step: 57940   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:46:03,799-Speed 5411.03 samples/sec   Loss 8.3328   LearningRate 0.1726   Epoch: 5   Global Step: 57950   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:46:11,336-Speed 5434.60 samples/sec   Loss 8.3909   LearningRate 0.1726   Epoch: 5   Global Step: 57960   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:46:18,831-Speed 5465.77 samples/sec   Loss 8.3493   LearningRate 0.1725   Epoch: 5   Global Step: 57970   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:46:26,304-Speed 5482.01 samples/sec   Loss 8.3266   LearningRate 0.1725   Epoch: 5   Global Step: 57980   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:46:33,797-Speed 5467.16 samples/sec   Loss 8.3620   LearningRate 0.1725   Epoch: 5   Global Step: 57990   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:46:41,294-Speed 5464.10 samples/sec   Loss 8.3587   LearningRate 0.1725   Epoch: 5   Global Step: 58000   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:47:25,305-[lfw][58000]XNorm: 22.065349
Training: 2022-01-08 07:47:25,305-[lfw][58000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-01-08 07:47:25,306-[lfw][58000]Accuracy-Highest: 0.99817
Training: 2022-01-08 07:48:16,844-[cfp_fp][58000]XNorm: 20.194377
Training: 2022-01-08 07:48:16,845-[cfp_fp][58000]Accuracy-Flip: 0.98271+-0.00659
Training: 2022-01-08 07:48:16,846-[cfp_fp][58000]Accuracy-Highest: 0.98600
Training: 2022-01-08 07:49:02,402-[agedb_30][58000]XNorm: 22.030144
Training: 2022-01-08 07:49:02,403-[agedb_30][58000]Accuracy-Flip: 0.97467+-0.00666
Training: 2022-01-08 07:49:02,404-[agedb_30][58000]Accuracy-Highest: 0.97517
Training: 2022-01-08 07:49:09,964-Speed 275.51 samples/sec   Loss 8.2991   LearningRate 0.1725   Epoch: 5   Global Step: 58010   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:49:17,463-Speed 5463.14 samples/sec   Loss 8.3650   LearningRate 0.1724   Epoch: 5   Global Step: 58020   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:49:25,004-Speed 5433.11 samples/sec   Loss 8.4459   LearningRate 0.1724   Epoch: 5   Global Step: 58030   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:49:32,434-Speed 5513.09 samples/sec   Loss 8.2671   LearningRate 0.1724   Epoch: 5   Global Step: 58040   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:49:39,955-Speed 5447.71 samples/sec   Loss 8.2814   LearningRate 0.1724   Epoch: 5   Global Step: 58050   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:49:47,516-Speed 5417.43 samples/sec   Loss 8.2755   LearningRate 0.1723   Epoch: 5   Global Step: 58060   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:49:55,083-Speed 5413.62 samples/sec   Loss 8.2687   LearningRate 0.1723   Epoch: 5   Global Step: 58070   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:50:02,493-Speed 5528.76 samples/sec   Loss 8.2356   LearningRate 0.1723   Epoch: 5   Global Step: 58080   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:50:09,953-Speed 5491.35 samples/sec   Loss 8.2489   LearningRate 0.1723   Epoch: 5   Global Step: 58090   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:50:17,549-Speed 5392.78 samples/sec   Loss 8.2454   LearningRate 0.1722   Epoch: 5   Global Step: 58100   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:50:25,086-Speed 5435.29 samples/sec   Loss 8.2946   LearningRate 0.1722   Epoch: 5   Global Step: 58110   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:50:32,498-Speed 5526.78 samples/sec   Loss 8.3606   LearningRate 0.1722   Epoch: 5   Global Step: 58120   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:50:40,004-Speed 5458.47 samples/sec   Loss 8.3000   LearningRate 0.1722   Epoch: 5   Global Step: 58130   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:50:47,504-Speed 5461.68 samples/sec   Loss 8.2519   LearningRate 0.1722   Epoch: 5   Global Step: 58140   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:50:55,012-Speed 5455.75 samples/sec   Loss 8.3205   LearningRate 0.1721   Epoch: 5   Global Step: 58150   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:51:02,563-Speed 5425.08 samples/sec   Loss 8.2736   LearningRate 0.1721   Epoch: 5   Global Step: 58160   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:51:10,034-Speed 5483.54 samples/sec   Loss 8.3094   LearningRate 0.1721   Epoch: 5   Global Step: 58170   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:51:17,626-Speed 5396.28 samples/sec   Loss 8.2585   LearningRate 0.1721   Epoch: 5   Global Step: 58180   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:51:25,340-Speed 5310.27 samples/sec   Loss 8.3294   LearningRate 0.1720   Epoch: 5   Global Step: 58190   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:51:32,894-Speed 5422.55 samples/sec   Loss 8.2244   LearningRate 0.1720   Epoch: 5   Global Step: 58200   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:51:40,446-Speed 5425.04 samples/sec   Loss 8.2879   LearningRate 0.1720   Epoch: 5   Global Step: 58210   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:51:48,059-Speed 5380.90 samples/sec   Loss 8.2051   LearningRate 0.1720   Epoch: 5   Global Step: 58220   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:51:55,576-Speed 5449.72 samples/sec   Loss 8.3429   LearningRate 0.1719   Epoch: 5   Global Step: 58230   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:52:03,190-Speed 5380.59 samples/sec   Loss 8.2679   LearningRate 0.1719   Epoch: 5   Global Step: 58240   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:52:10,759-Speed 5412.35 samples/sec   Loss 8.2023   LearningRate 0.1719   Epoch: 5   Global Step: 58250   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:52:18,362-Speed 5387.94 samples/sec   Loss 8.2957   LearningRate 0.1719   Epoch: 5   Global Step: 58260   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:52:25,956-Speed 5393.76 samples/sec   Loss 8.3692   LearningRate 0.1719   Epoch: 5   Global Step: 58270   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:52:33,481-Speed 5444.35 samples/sec   Loss 8.2907   LearningRate 0.1718   Epoch: 5   Global Step: 58280   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:52:40,996-Speed 5451.47 samples/sec   Loss 8.3255   LearningRate 0.1718   Epoch: 5   Global Step: 58290   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:52:48,507-Speed 5453.86 samples/sec   Loss 8.2497   LearningRate 0.1718   Epoch: 5   Global Step: 58300   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:52:56,037-Speed 5439.80 samples/sec   Loss 8.1794   LearningRate 0.1718   Epoch: 5   Global Step: 58310   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:53:03,639-Speed 5388.81 samples/sec   Loss 8.2873   LearningRate 0.1717   Epoch: 5   Global Step: 58320   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:53:11,168-Speed 5441.69 samples/sec   Loss 8.2861   LearningRate 0.1717   Epoch: 5   Global Step: 58330   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:53:18,635-Speed 5485.41 samples/sec   Loss 8.2178   LearningRate 0.1717   Epoch: 5   Global Step: 58340   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:53:26,204-Speed 5412.36 samples/sec   Loss 8.2771   LearningRate 0.1717   Epoch: 5   Global Step: 58350   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:53:33,791-Speed 5400.11 samples/sec   Loss 8.3191   LearningRate 0.1716   Epoch: 5   Global Step: 58360   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:53:41,532-Speed 5292.00 samples/sec   Loss 8.1974   LearningRate 0.1716   Epoch: 5   Global Step: 58370   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:53:49,088-Speed 5421.59 samples/sec   Loss 8.3433   LearningRate 0.1716   Epoch: 5   Global Step: 58380   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:53:56,551-Speed 5489.42 samples/sec   Loss 8.2822   LearningRate 0.1716   Epoch: 5   Global Step: 58390   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:54:04,117-Speed 5414.12 samples/sec   Loss 8.2716   LearningRate 0.1716   Epoch: 5   Global Step: 58400   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:54:11,602-Speed 5473.86 samples/sec   Loss 8.2969   LearningRate 0.1715   Epoch: 5   Global Step: 58410   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:54:19,157-Speed 5421.97 samples/sec   Loss 8.2923   LearningRate 0.1715   Epoch: 5   Global Step: 58420   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:54:26,895-Speed 5294.17 samples/sec   Loss 8.2087   LearningRate 0.1715   Epoch: 5   Global Step: 58430   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:54:34,495-Speed 5389.93 samples/sec   Loss 8.2877   LearningRate 0.1715   Epoch: 5   Global Step: 58440   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:54:42,057-Speed 5417.43 samples/sec   Loss 8.3265   LearningRate 0.1714   Epoch: 5   Global Step: 58450   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:54:49,573-Speed 5449.99 samples/sec   Loss 8.2728   LearningRate 0.1714   Epoch: 5   Global Step: 58460   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:54:57,126-Speed 5424.05 samples/sec   Loss 8.2792   LearningRate 0.1714   Epoch: 5   Global Step: 58470   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:55:04,653-Speed 5442.80 samples/sec   Loss 8.1768   LearningRate 0.1714   Epoch: 5   Global Step: 58480   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:55:12,176-Speed 5445.49 samples/sec   Loss 8.2819   LearningRate 0.1713   Epoch: 5   Global Step: 58490   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:55:19,805-Speed 5369.13 samples/sec   Loss 8.2939   LearningRate 0.1713   Epoch: 5   Global Step: 58500   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:55:27,384-Speed 5405.65 samples/sec   Loss 8.2383   LearningRate 0.1713   Epoch: 5   Global Step: 58510   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:55:34,938-Speed 5423.15 samples/sec   Loss 8.2743   LearningRate 0.1713   Epoch: 5   Global Step: 58520   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:55:42,485-Speed 5427.93 samples/sec   Loss 8.2331   LearningRate 0.1713   Epoch: 5   Global Step: 58530   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 07:55:49,990-Speed 5458.05 samples/sec   Loss 8.3248   LearningRate 0.1712   Epoch: 5   Global Step: 58540   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:55:57,743-Speed 5283.86 samples/sec   Loss 8.2779   LearningRate 0.1712   Epoch: 5   Global Step: 58550   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:56:05,426-Speed 5332.68 samples/sec   Loss 8.2417   LearningRate 0.1712   Epoch: 5   Global Step: 58560   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:56:12,924-Speed 5463.68 samples/sec   Loss 8.2633   LearningRate 0.1712   Epoch: 5   Global Step: 58570   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:56:20,424-Speed 5461.91 samples/sec   Loss 8.2158   LearningRate 0.1711   Epoch: 5   Global Step: 58580   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:56:27,932-Speed 5456.04 samples/sec   Loss 8.2297   LearningRate 0.1711   Epoch: 5   Global Step: 58590   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:56:35,562-Speed 5369.37 samples/sec   Loss 8.2505   LearningRate 0.1711   Epoch: 5   Global Step: 58600   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:56:43,027-Speed 5488.41 samples/sec   Loss 8.2573   LearningRate 0.1711   Epoch: 5   Global Step: 58610   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:56:50,483-Speed 5493.45 samples/sec   Loss 8.2752   LearningRate 0.1710   Epoch: 5   Global Step: 58620   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:56:58,013-Speed 5440.51 samples/sec   Loss 8.3037   LearningRate 0.1710   Epoch: 5   Global Step: 58630   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:57:05,551-Speed 5434.58 samples/sec   Loss 8.1876   LearningRate 0.1710   Epoch: 5   Global Step: 58640   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:57:13,044-Speed 5467.03 samples/sec   Loss 8.2258   LearningRate 0.1710   Epoch: 5   Global Step: 58650   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:57:20,659-Speed 5379.88 samples/sec   Loss 8.2196   LearningRate 0.1710   Epoch: 5   Global Step: 58660   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 07:57:28,195-Speed 5435.85 samples/sec   Loss 8.3163   LearningRate 0.1709   Epoch: 5   Global Step: 58670   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 07:57:35,690-Speed 5465.33 samples/sec   Loss 8.2357   LearningRate 0.1709   Epoch: 5   Global Step: 58680   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 07:57:43,206-Speed 5451.28 samples/sec   Loss 8.2439   LearningRate 0.1709   Epoch: 5   Global Step: 58690   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 07:57:50,783-Speed 5405.96 samples/sec   Loss 8.1838   LearningRate 0.1709   Epoch: 5   Global Step: 58700   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 07:57:58,348-Speed 5415.17 samples/sec   Loss 8.2829   LearningRate 0.1708   Epoch: 5   Global Step: 58710   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 07:58:05,912-Speed 5416.14 samples/sec   Loss 8.2134   LearningRate 0.1708   Epoch: 5   Global Step: 58720   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 07:58:13,458-Speed 5428.94 samples/sec   Loss 8.1924   LearningRate 0.1708   Epoch: 5   Global Step: 58730   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 07:58:21,063-Speed 5386.98 samples/sec   Loss 8.2525   LearningRate 0.1708   Epoch: 5   Global Step: 58740   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 07:58:28,691-Speed 5370.03 samples/sec   Loss 8.3135   LearningRate 0.1707   Epoch: 5   Global Step: 58750   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 07:58:36,182-Speed 5468.75 samples/sec   Loss 8.3638   LearningRate 0.1707   Epoch: 5   Global Step: 58760   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:58:43,780-Speed 5392.22 samples/sec   Loss 8.2681   LearningRate 0.1707   Epoch: 5   Global Step: 58770   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:58:51,301-Speed 5446.50 samples/sec   Loss 8.2884   LearningRate 0.1707   Epoch: 5   Global Step: 58780   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:58:58,943-Speed 5360.78 samples/sec   Loss 8.3020   LearningRate 0.1707   Epoch: 5   Global Step: 58790   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:59:06,472-Speed 5440.64 samples/sec   Loss 8.2363   LearningRate 0.1706   Epoch: 5   Global Step: 58800   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:59:14,079-Speed 5385.51 samples/sec   Loss 8.2285   LearningRate 0.1706   Epoch: 5   Global Step: 58810   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:59:21,598-Speed 5448.41 samples/sec   Loss 8.1943   LearningRate 0.1706   Epoch: 5   Global Step: 58820   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:59:29,041-Speed 5503.37 samples/sec   Loss 8.2698   LearningRate 0.1706   Epoch: 5   Global Step: 58830   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:59:36,600-Speed 5419.51 samples/sec   Loss 8.2688   LearningRate 0.1705   Epoch: 5   Global Step: 58840   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:59:44,053-Speed 5496.49 samples/sec   Loss 8.2117   LearningRate 0.1705   Epoch: 5   Global Step: 58850   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 07:59:51,504-Speed 5498.53 samples/sec   Loss 8.2267   LearningRate 0.1705   Epoch: 5   Global Step: 58860   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 07:59:58,987-Speed 5474.15 samples/sec   Loss 8.1725   LearningRate 0.1705   Epoch: 5   Global Step: 58870   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:00:06,559-Speed 5409.51 samples/sec   Loss 8.2160   LearningRate 0.1704   Epoch: 5   Global Step: 58880   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:00:14,095-Speed 5436.01 samples/sec   Loss 8.2200   LearningRate 0.1704   Epoch: 5   Global Step: 58890   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:00:21,554-Speed 5492.51 samples/sec   Loss 8.1942   LearningRate 0.1704   Epoch: 5   Global Step: 58900   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:00:29,144-Speed 5396.73 samples/sec   Loss 8.3163   LearningRate 0.1704   Epoch: 5   Global Step: 58910   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:00:36,729-Speed 5401.19 samples/sec   Loss 8.2268   LearningRate 0.1704   Epoch: 5   Global Step: 58920   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:00:44,257-Speed 5441.41 samples/sec   Loss 8.2204   LearningRate 0.1703   Epoch: 5   Global Step: 58930   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:00:51,799-Speed 5432.05 samples/sec   Loss 8.3089   LearningRate 0.1703   Epoch: 5   Global Step: 58940   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:00:59,238-Speed 5506.38 samples/sec   Loss 8.1966   LearningRate 0.1703   Epoch: 5   Global Step: 58950   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:01:06,809-Speed 5410.62 samples/sec   Loss 8.2301   LearningRate 0.1703   Epoch: 5   Global Step: 58960   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:01:14,312-Speed 5460.15 samples/sec   Loss 8.3121   LearningRate 0.1702   Epoch: 5   Global Step: 58970   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:01:21,776-Speed 5488.61 samples/sec   Loss 8.2793   LearningRate 0.1702   Epoch: 5   Global Step: 58980   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:01:29,314-Speed 5434.40 samples/sec   Loss 8.1847   LearningRate 0.1702   Epoch: 5   Global Step: 58990   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:01:36,891-Speed 5406.66 samples/sec   Loss 8.3039   LearningRate 0.1702   Epoch: 5   Global Step: 59000   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:01:44,453-Speed 5416.86 samples/sec   Loss 8.2046   LearningRate 0.1702   Epoch: 5   Global Step: 59010   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:01:52,006-Speed 5424.27 samples/sec   Loss 8.2714   LearningRate 0.1701   Epoch: 5   Global Step: 59020   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:01:59,638-Speed 5367.61 samples/sec   Loss 8.2299   LearningRate 0.1701   Epoch: 5   Global Step: 59030   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:02:07,216-Speed 5405.23 samples/sec   Loss 8.2238   LearningRate 0.1701   Epoch: 5   Global Step: 59040   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:02:14,707-Speed 5469.08 samples/sec   Loss 8.2596   LearningRate 0.1701   Epoch: 5   Global Step: 59050   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:02:22,220-Speed 5452.80 samples/sec   Loss 8.2234   LearningRate 0.1700   Epoch: 5   Global Step: 59060   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:02:29,704-Speed 5474.04 samples/sec   Loss 8.2531   LearningRate 0.1700   Epoch: 5   Global Step: 59070   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:02:37,171-Speed 5485.66 samples/sec   Loss 8.2288   LearningRate 0.1700   Epoch: 5   Global Step: 59080   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 08:02:44,712-Speed 5432.15 samples/sec   Loss 8.1853   LearningRate 0.1700   Epoch: 5   Global Step: 59090   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 08:02:52,349-Speed 5364.40 samples/sec   Loss 8.2661   LearningRate 0.1699   Epoch: 5   Global Step: 59100   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 08:02:59,902-Speed 5423.90 samples/sec   Loss 8.2482   LearningRate 0.1699   Epoch: 5   Global Step: 59110   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 08:03:07,743-Speed 5224.25 samples/sec   Loss 8.3109   LearningRate 0.1699   Epoch: 5   Global Step: 59120   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 08:03:15,250-Speed 5456.85 samples/sec   Loss 8.2458   LearningRate 0.1699   Epoch: 5   Global Step: 59130   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 08:03:22,749-Speed 5463.72 samples/sec   Loss 8.1583   LearningRate 0.1699   Epoch: 5   Global Step: 59140   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 08:03:30,266-Speed 5449.97 samples/sec   Loss 8.2297   LearningRate 0.1698   Epoch: 5   Global Step: 59150   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 08:03:37,898-Speed 5366.71 samples/sec   Loss 8.2834   LearningRate 0.1698   Epoch: 5   Global Step: 59160   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 08:03:45,431-Speed 5438.77 samples/sec   Loss 8.2270   LearningRate 0.1698   Epoch: 5   Global Step: 59170   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 08:03:53,016-Speed 5400.51 samples/sec   Loss 8.2079   LearningRate 0.1698   Epoch: 5   Global Step: 59180   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:04:00,508-Speed 5468.51 samples/sec   Loss 8.1828   LearningRate 0.1697   Epoch: 5   Global Step: 59190   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:04:07,991-Speed 5474.03 samples/sec   Loss 8.2204   LearningRate 0.1697   Epoch: 5   Global Step: 59200   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:04:15,546-Speed 5422.43 samples/sec   Loss 8.2180   LearningRate 0.1697   Epoch: 5   Global Step: 59210   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:04:23,205-Speed 5348.66 samples/sec   Loss 8.1920   LearningRate 0.1697   Epoch: 5   Global Step: 59220   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:04:30,748-Speed 5431.03 samples/sec   Loss 8.2193   LearningRate 0.1696   Epoch: 5   Global Step: 59230   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:04:38,344-Speed 5392.69 samples/sec   Loss 8.2045   LearningRate 0.1696   Epoch: 5   Global Step: 59240   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:04:45,890-Speed 5429.16 samples/sec   Loss 8.3024   LearningRate 0.1696   Epoch: 5   Global Step: 59250   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:04:53,550-Speed 5347.92 samples/sec   Loss 8.1682   LearningRate 0.1696   Epoch: 5   Global Step: 59260   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:05:01,107-Speed 5420.93 samples/sec   Loss 8.2228   LearningRate 0.1696   Epoch: 5   Global Step: 59270   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:05:08,618-Speed 5454.19 samples/sec   Loss 8.2186   LearningRate 0.1695   Epoch: 5   Global Step: 59280   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:05:16,322-Speed 5317.38 samples/sec   Loss 8.2100   LearningRate 0.1695   Epoch: 5   Global Step: 59290   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:05:23,856-Speed 5437.86 samples/sec   Loss 8.1318   LearningRate 0.1695   Epoch: 5   Global Step: 59300   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:05:31,442-Speed 5400.21 samples/sec   Loss 8.1613   LearningRate 0.1695   Epoch: 5   Global Step: 59310   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:05:39,077-Speed 5364.81 samples/sec   Loss 8.2120   LearningRate 0.1694   Epoch: 5   Global Step: 59320   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:05:46,534-Speed 5504.80 samples/sec   Loss 8.1714   LearningRate 0.1694   Epoch: 5   Global Step: 59330   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:05:54,167-Speed 5366.85 samples/sec   Loss 8.2808   LearningRate 0.1694   Epoch: 5   Global Step: 59340   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:06:01,792-Speed 5372.94 samples/sec   Loss 8.2041   LearningRate 0.1694   Epoch: 5   Global Step: 59350   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:06:09,310-Speed 5448.42 samples/sec   Loss 8.3106   LearningRate 0.1693   Epoch: 5   Global Step: 59360   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:06:16,870-Speed 5418.72 samples/sec   Loss 8.2014   LearningRate 0.1693   Epoch: 5   Global Step: 59370   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:06:24,512-Speed 5360.49 samples/sec   Loss 8.2453   LearningRate 0.1693   Epoch: 5   Global Step: 59380   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 08:06:32,024-Speed 5453.50 samples/sec   Loss 8.1694   LearningRate 0.1693   Epoch: 5   Global Step: 59390   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 08:06:39,576-Speed 5424.85 samples/sec   Loss 8.1207   LearningRate 0.1693   Epoch: 5   Global Step: 59400   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 08:06:47,079-Speed 5459.07 samples/sec   Loss 8.1819   LearningRate 0.1692   Epoch: 5   Global Step: 59410   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 08:06:54,586-Speed 5457.07 samples/sec   Loss 8.2028   LearningRate 0.1692   Epoch: 5   Global Step: 59420   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 08:07:02,153-Speed 5413.89 samples/sec   Loss 8.1677   LearningRate 0.1692   Epoch: 5   Global Step: 59430   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:07:09,631-Speed 5478.20 samples/sec   Loss 8.1891   LearningRate 0.1692   Epoch: 5   Global Step: 59440   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:07:17,177-Speed 5428.33 samples/sec   Loss 8.1949   LearningRate 0.1691   Epoch: 5   Global Step: 59450   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:07:24,758-Speed 5404.08 samples/sec   Loss 8.1758   LearningRate 0.1691   Epoch: 5   Global Step: 59460   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:07:32,323-Speed 5415.35 samples/sec   Loss 8.1760   LearningRate 0.1691   Epoch: 5   Global Step: 59470   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:07:39,787-Speed 5488.46 samples/sec   Loss 8.1489   LearningRate 0.1691   Epoch: 5   Global Step: 59480   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:07:47,288-Speed 5460.58 samples/sec   Loss 8.1996   LearningRate 0.1691   Epoch: 5   Global Step: 59490   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:07:54,813-Speed 5444.08 samples/sec   Loss 8.1992   LearningRate 0.1690   Epoch: 5   Global Step: 59500   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:08:02,432-Speed 5377.14 samples/sec   Loss 8.2864   LearningRate 0.1690   Epoch: 5   Global Step: 59510   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:08:09,955-Speed 5445.31 samples/sec   Loss 8.1891   LearningRate 0.1690   Epoch: 5   Global Step: 59520   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:08:17,583-Speed 5370.34 samples/sec   Loss 8.2352   LearningRate 0.1690   Epoch: 5   Global Step: 59530   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 08:08:25,064-Speed 5475.58 samples/sec   Loss 8.2031   LearningRate 0.1689   Epoch: 5   Global Step: 59540   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 08:08:32,568-Speed 5459.32 samples/sec   Loss 8.2039   LearningRate 0.1689   Epoch: 5   Global Step: 59550   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:08:40,126-Speed 5420.42 samples/sec   Loss 8.2453   LearningRate 0.1689   Epoch: 5   Global Step: 59560   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:08:47,673-Speed 5427.55 samples/sec   Loss 8.1889   LearningRate 0.1689   Epoch: 5   Global Step: 59570   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:08:55,214-Speed 5432.42 samples/sec   Loss 8.1712   LearningRate 0.1688   Epoch: 5   Global Step: 59580   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:09:02,826-Speed 5382.24 samples/sec   Loss 8.1591   LearningRate 0.1688   Epoch: 5   Global Step: 59590   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:09:10,325-Speed 5462.64 samples/sec   Loss 8.1830   LearningRate 0.1688   Epoch: 5   Global Step: 59600   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:09:17,900-Speed 5408.06 samples/sec   Loss 8.2250   LearningRate 0.1688   Epoch: 5   Global Step: 59610   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:09:25,714-Speed 5242.74 samples/sec   Loss 8.1654   LearningRate 0.1688   Epoch: 5   Global Step: 59620   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:09:33,391-Speed 5336.51 samples/sec   Loss 8.1669   LearningRate 0.1687   Epoch: 5   Global Step: 59630   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:09:40,984-Speed 5395.16 samples/sec   Loss 8.2291   LearningRate 0.1687   Epoch: 5   Global Step: 59640   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:09:48,505-Speed 5446.39 samples/sec   Loss 8.1241   LearningRate 0.1687   Epoch: 5   Global Step: 59650   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 08:09:56,035-Speed 5440.52 samples/sec   Loss 8.2051   LearningRate 0.1687   Epoch: 5   Global Step: 59660   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 08:10:03,556-Speed 5446.69 samples/sec   Loss 8.2103   LearningRate 0.1686   Epoch: 5   Global Step: 59670   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 08:10:11,196-Speed 5361.78 samples/sec   Loss 8.1846   LearningRate 0.1686   Epoch: 5   Global Step: 59680   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:10:18,908-Speed 5312.14 samples/sec   Loss 8.1715   LearningRate 0.1686   Epoch: 5   Global Step: 59690   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:10:26,551-Speed 5359.99 samples/sec   Loss 8.1447   LearningRate 0.1686   Epoch: 5   Global Step: 59700   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:10:34,138-Speed 5399.10 samples/sec   Loss 8.2399   LearningRate 0.1685   Epoch: 5   Global Step: 59710   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:10:41,676-Speed 5434.94 samples/sec   Loss 8.1556   LearningRate 0.1685   Epoch: 5   Global Step: 59720   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:10:49,174-Speed 5463.37 samples/sec   Loss 8.1468   LearningRate 0.1685   Epoch: 5   Global Step: 59730   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:10:56,666-Speed 5467.84 samples/sec   Loss 8.1640   LearningRate 0.1685   Epoch: 5   Global Step: 59740   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:11:04,223-Speed 5420.67 samples/sec   Loss 8.2503   LearningRate 0.1685   Epoch: 5   Global Step: 59750   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:11:11,784-Speed 5417.93 samples/sec   Loss 8.2148   LearningRate 0.1684   Epoch: 5   Global Step: 59760   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:11:19,363-Speed 5405.02 samples/sec   Loss 8.2024   LearningRate 0.1684   Epoch: 5   Global Step: 59770   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:11:26,904-Speed 5432.40 samples/sec   Loss 8.2156   LearningRate 0.1684   Epoch: 5   Global Step: 59780   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:11:34,414-Speed 5454.82 samples/sec   Loss 8.0896   LearningRate 0.1684   Epoch: 5   Global Step: 59790   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:11:42,020-Speed 5386.12 samples/sec   Loss 8.2134   LearningRate 0.1683   Epoch: 5   Global Step: 59800   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:11:49,636-Speed 5378.64 samples/sec   Loss 8.1235   LearningRate 0.1683   Epoch: 5   Global Step: 59810   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:11:57,131-Speed 5465.72 samples/sec   Loss 8.1751   LearningRate 0.1683   Epoch: 5   Global Step: 59820   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:12:04,632-Speed 5460.94 samples/sec   Loss 8.1848   LearningRate 0.1683   Epoch: 5   Global Step: 59830   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:12:12,143-Speed 5453.88 samples/sec   Loss 8.2049   LearningRate 0.1683   Epoch: 5   Global Step: 59840   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:12:19,801-Speed 5350.01 samples/sec   Loss 8.2422   LearningRate 0.1682   Epoch: 5   Global Step: 59850   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:12:27,315-Speed 5451.70 samples/sec   Loss 8.2261   LearningRate 0.1682   Epoch: 5   Global Step: 59860   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:12:34,913-Speed 5391.47 samples/sec   Loss 8.1935   LearningRate 0.1682   Epoch: 5   Global Step: 59870   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:12:42,357-Speed 5502.99 samples/sec   Loss 8.1900   LearningRate 0.1682   Epoch: 5   Global Step: 59880   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:12:50,005-Speed 5356.61 samples/sec   Loss 8.1480   LearningRate 0.1681   Epoch: 5   Global Step: 59890   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:12:57,748-Speed 5290.38 samples/sec   Loss 8.2237   LearningRate 0.1681   Epoch: 5   Global Step: 59900   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:13:05,401-Speed 5353.12 samples/sec   Loss 8.2065   LearningRate 0.1681   Epoch: 5   Global Step: 59910   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:13:12,965-Speed 5415.82 samples/sec   Loss 8.1957   LearningRate 0.1681   Epoch: 5   Global Step: 59920   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:13:20,571-Speed 5385.81 samples/sec   Loss 8.1321   LearningRate 0.1680   Epoch: 5   Global Step: 59930   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:13:28,152-Speed 5403.52 samples/sec   Loss 8.1787   LearningRate 0.1680   Epoch: 5   Global Step: 59940   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:13:35,730-Speed 5406.54 samples/sec   Loss 8.2058   LearningRate 0.1680   Epoch: 5   Global Step: 59950   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:13:43,195-Speed 5487.43 samples/sec   Loss 8.2352   LearningRate 0.1680   Epoch: 5   Global Step: 59960   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:13:50,657-Speed 5490.09 samples/sec   Loss 8.1810   LearningRate 0.1680   Epoch: 5   Global Step: 59970   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:13:58,226-Speed 5411.82 samples/sec   Loss 8.2205   LearningRate 0.1679   Epoch: 5   Global Step: 59980   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:14:05,929-Speed 5318.52 samples/sec   Loss 8.1258   LearningRate 0.1679   Epoch: 5   Global Step: 59990   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:14:13,411-Speed 5475.31 samples/sec   Loss 8.1819   LearningRate 0.1679   Epoch: 5   Global Step: 60000   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:14:57,714-[lfw][60000]XNorm: 22.505832
Training: 2022-01-08 08:14:57,715-[lfw][60000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-01-08 08:14:57,715-[lfw][60000]Accuracy-Highest: 0.99817
Training: 2022-01-08 08:15:50,005-[cfp_fp][60000]XNorm: 20.086574
Training: 2022-01-08 08:15:50,006-[cfp_fp][60000]Accuracy-Flip: 0.98529+-0.00542
Training: 2022-01-08 08:15:50,007-[cfp_fp][60000]Accuracy-Highest: 0.98600
Training: 2022-01-08 08:16:35,870-[agedb_30][60000]XNorm: 22.146841
Training: 2022-01-08 08:16:35,871-[agedb_30][60000]Accuracy-Flip: 0.97283+-0.00610
Training: 2022-01-08 08:16:35,872-[agedb_30][60000]Accuracy-Highest: 0.97517
Training: 2022-01-08 08:16:43,484-Speed 272.94 samples/sec   Loss 8.1454   LearningRate 0.1679   Epoch: 5   Global Step: 60010   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:16:50,945-Speed 5491.11 samples/sec   Loss 8.1256   LearningRate 0.1678   Epoch: 5   Global Step: 60020   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:16:58,384-Speed 5507.17 samples/sec   Loss 8.1467   LearningRate 0.1678   Epoch: 5   Global Step: 60030   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:17:05,896-Speed 5453.93 samples/sec   Loss 8.1353   LearningRate 0.1678   Epoch: 5   Global Step: 60040   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:17:13,400-Speed 5459.57 samples/sec   Loss 8.1883   LearningRate 0.1678   Epoch: 5   Global Step: 60050   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:17:20,915-Speed 5451.48 samples/sec   Loss 8.1904   LearningRate 0.1678   Epoch: 5   Global Step: 60060   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:17:28,451-Speed 5436.37 samples/sec   Loss 8.1595   LearningRate 0.1677   Epoch: 5   Global Step: 60070   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:17:36,049-Speed 5391.59 samples/sec   Loss 8.1432   LearningRate 0.1677   Epoch: 5   Global Step: 60080   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:17:43,625-Speed 5407.04 samples/sec   Loss 8.1306   LearningRate 0.1677   Epoch: 5   Global Step: 60090   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:17:51,257-Speed 5367.73 samples/sec   Loss 8.1409   LearningRate 0.1677   Epoch: 5   Global Step: 60100   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:17:58,806-Speed 5427.01 samples/sec   Loss 8.1694   LearningRate 0.1676   Epoch: 5   Global Step: 60110   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:18:06,270-Speed 5488.15 samples/sec   Loss 8.1378   LearningRate 0.1676   Epoch: 5   Global Step: 60120   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:18:13,809-Speed 5433.45 samples/sec   Loss 8.1164   LearningRate 0.1676   Epoch: 5   Global Step: 60130   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:18:21,269-Speed 5491.20 samples/sec   Loss 8.1470   LearningRate 0.1676   Epoch: 5   Global Step: 60140   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:18:28,786-Speed 5450.39 samples/sec   Loss 8.1344   LearningRate 0.1675   Epoch: 5   Global Step: 60150   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:18:36,540-Speed 5283.68 samples/sec   Loss 8.0945   LearningRate 0.1675   Epoch: 5   Global Step: 60160   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 08:18:44,020-Speed 5475.91 samples/sec   Loss 8.1691   LearningRate 0.1675   Epoch: 5   Global Step: 60170   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:18:51,560-Speed 5433.52 samples/sec   Loss 8.1625   LearningRate 0.1675   Epoch: 5   Global Step: 60180   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:18:59,055-Speed 5465.75 samples/sec   Loss 8.1746   LearningRate 0.1675   Epoch: 5   Global Step: 60190   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:19:06,635-Speed 5404.05 samples/sec   Loss 8.2445   LearningRate 0.1674   Epoch: 5   Global Step: 60200   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:19:14,111-Speed 5479.59 samples/sec   Loss 8.1538   LearningRate 0.1674   Epoch: 5   Global Step: 60210   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:19:21,676-Speed 5415.44 samples/sec   Loss 8.1359   LearningRate 0.1674   Epoch: 5   Global Step: 60220   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:19:29,175-Speed 5462.91 samples/sec   Loss 8.1550   LearningRate 0.1674   Epoch: 5   Global Step: 60230   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:19:36,666-Speed 5468.79 samples/sec   Loss 8.0802   LearningRate 0.1673   Epoch: 5   Global Step: 60240   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:19:44,195-Speed 5440.56 samples/sec   Loss 8.0311   LearningRate 0.1673   Epoch: 5   Global Step: 60250   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:19:51,867-Speed 5339.50 samples/sec   Loss 8.1106   LearningRate 0.1673   Epoch: 5   Global Step: 60260   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:19:59,445-Speed 5406.44 samples/sec   Loss 8.0523   LearningRate 0.1673   Epoch: 5   Global Step: 60270   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:20:06,989-Speed 5429.81 samples/sec   Loss 8.1022   LearningRate 0.1672   Epoch: 5   Global Step: 60280   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:20:14,523-Speed 5437.72 samples/sec   Loss 8.1085   LearningRate 0.1672   Epoch: 5   Global Step: 60290   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:20:22,005-Speed 5475.50 samples/sec   Loss 8.1178   LearningRate 0.1672   Epoch: 5   Global Step: 60300   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:20:29,631-Speed 5371.74 samples/sec   Loss 8.1657   LearningRate 0.1672   Epoch: 5   Global Step: 60310   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:20:37,135-Speed 5458.99 samples/sec   Loss 8.1280   LearningRate 0.1672   Epoch: 5   Global Step: 60320   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:20:44,581-Speed 5501.68 samples/sec   Loss 8.0691   LearningRate 0.1671   Epoch: 5   Global Step: 60330   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:20:52,209-Speed 5370.29 samples/sec   Loss 8.1642   LearningRate 0.1671   Epoch: 5   Global Step: 60340   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:20:59,784-Speed 5407.86 samples/sec   Loss 8.1399   LearningRate 0.1671   Epoch: 5   Global Step: 60350   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:21:07,261-Speed 5479.37 samples/sec   Loss 8.1298   LearningRate 0.1671   Epoch: 5   Global Step: 60360   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:21:14,805-Speed 5429.69 samples/sec   Loss 8.1391   LearningRate 0.1670   Epoch: 5   Global Step: 60370   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 08:21:22,332-Speed 5442.95 samples/sec   Loss 8.1051   LearningRate 0.1670   Epoch: 5   Global Step: 60380   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:21:29,785-Speed 5496.17 samples/sec   Loss 8.1845   LearningRate 0.1670   Epoch: 5   Global Step: 60390   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:21:37,331-Speed 5428.53 samples/sec   Loss 8.0716   LearningRate 0.1670   Epoch: 5   Global Step: 60400   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:21:44,846-Speed 5451.54 samples/sec   Loss 8.1460   LearningRate 0.1670   Epoch: 5   Global Step: 60410   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:21:52,339-Speed 5466.82 samples/sec   Loss 8.1985   LearningRate 0.1669   Epoch: 5   Global Step: 60420   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:21:59,859-Speed 5447.17 samples/sec   Loss 8.1387   LearningRate 0.1669   Epoch: 5   Global Step: 60430   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:22:07,363-Speed 5459.83 samples/sec   Loss 8.1020   LearningRate 0.1669   Epoch: 5   Global Step: 60440   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:22:14,995-Speed 5367.64 samples/sec   Loss 8.1063   LearningRate 0.1669   Epoch: 5   Global Step: 60450   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:22:22,505-Speed 5454.43 samples/sec   Loss 8.0980   LearningRate 0.1668   Epoch: 5   Global Step: 60460   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:22:30,000-Speed 5465.42 samples/sec   Loss 8.1191   LearningRate 0.1668   Epoch: 5   Global Step: 60470   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:22:37,516-Speed 5450.36 samples/sec   Loss 8.1031   LearningRate 0.1668   Epoch: 5   Global Step: 60480   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:22:44,980-Speed 5488.60 samples/sec   Loss 8.1310   LearningRate 0.1668   Epoch: 5   Global Step: 60490   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 08:22:52,592-Speed 5381.62 samples/sec   Loss 8.1875   LearningRate 0.1667   Epoch: 5   Global Step: 60500   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 08:23:00,149-Speed 5421.08 samples/sec   Loss 8.1755   LearningRate 0.1667   Epoch: 5   Global Step: 60510   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 08:23:07,667-Speed 5448.99 samples/sec   Loss 8.1826   LearningRate 0.1667   Epoch: 5   Global Step: 60520   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 08:23:15,288-Speed 5375.23 samples/sec   Loss 8.1560   LearningRate 0.1667   Epoch: 5   Global Step: 60530   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 08:23:22,857-Speed 5412.14 samples/sec   Loss 8.1456   LearningRate 0.1667   Epoch: 5   Global Step: 60540   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 08:23:30,266-Speed 5529.43 samples/sec   Loss 8.0564   LearningRate 0.1666   Epoch: 5   Global Step: 60550   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 08:23:37,741-Speed 5480.14 samples/sec   Loss 8.1470   LearningRate 0.1666   Epoch: 5   Global Step: 60560   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 08:23:45,277-Speed 5435.35 samples/sec   Loss 8.1679   LearningRate 0.1666   Epoch: 5   Global Step: 60570   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 08:23:52,782-Speed 5458.64 samples/sec   Loss 8.1121   LearningRate 0.1666   Epoch: 5   Global Step: 60580   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-01-08 08:24:00,289-Speed 5457.22 samples/sec   Loss 8.1763   LearningRate 0.1665   Epoch: 5   Global Step: 60590   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:24:07,812-Speed 5444.96 samples/sec   Loss 8.1373   LearningRate 0.1665   Epoch: 5   Global Step: 60600   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:24:15,334-Speed 5446.08 samples/sec   Loss 8.0861   LearningRate 0.1665   Epoch: 5   Global Step: 60610   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:24:22,889-Speed 5422.82 samples/sec   Loss 8.1681   LearningRate 0.1665   Epoch: 5   Global Step: 60620   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:24:30,410-Speed 5446.53 samples/sec   Loss 8.1918   LearningRate 0.1665   Epoch: 5   Global Step: 60630   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:24:37,927-Speed 5450.06 samples/sec   Loss 8.1845   LearningRate 0.1664   Epoch: 5   Global Step: 60640   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:24:45,511-Speed 5401.08 samples/sec   Loss 8.0795   LearningRate 0.1664   Epoch: 5   Global Step: 60650   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:24:53,030-Speed 5448.48 samples/sec   Loss 8.1613   LearningRate 0.1664   Epoch: 5   Global Step: 60660   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:25:00,543-Speed 5452.92 samples/sec   Loss 8.0886   LearningRate 0.1664   Epoch: 5   Global Step: 60670   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:25:08,074-Speed 5439.29 samples/sec   Loss 8.1143   LearningRate 0.1663   Epoch: 5   Global Step: 60680   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:25:15,606-Speed 5438.83 samples/sec   Loss 8.1017   LearningRate 0.1663   Epoch: 5   Global Step: 60690   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:25:23,074-Speed 5485.19 samples/sec   Loss 8.1614   LearningRate 0.1663   Epoch: 5   Global Step: 60700   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:25:30,597-Speed 5445.64 samples/sec   Loss 8.1321   LearningRate 0.1663   Epoch: 5   Global Step: 60710   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:25:38,070-Speed 5481.93 samples/sec   Loss 8.1385   LearningRate 0.1663   Epoch: 5   Global Step: 60720   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:25:45,627-Speed 5420.77 samples/sec   Loss 8.1354   LearningRate 0.1662   Epoch: 5   Global Step: 60730   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:25:53,106-Speed 5477.25 samples/sec   Loss 8.1232   LearningRate 0.1662   Epoch: 5   Global Step: 60740   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:26:00,629-Speed 5445.29 samples/sec   Loss 8.1514   LearningRate 0.1662   Epoch: 5   Global Step: 60750   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:26:08,136-Speed 5457.51 samples/sec   Loss 8.1303   LearningRate 0.1662   Epoch: 5   Global Step: 60760   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:26:15,622-Speed 5471.39 samples/sec   Loss 8.0806   LearningRate 0.1661   Epoch: 5   Global Step: 60770   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:26:23,164-Speed 5431.75 samples/sec   Loss 8.1215   LearningRate 0.1661   Epoch: 5   Global Step: 60780   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:26:30,746-Speed 5403.66 samples/sec   Loss 8.0694   LearningRate 0.1661   Epoch: 5   Global Step: 60790   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 08:26:38,393-Speed 5356.41 samples/sec   Loss 8.1364   LearningRate 0.1661   Epoch: 5   Global Step: 60800   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:26:45,876-Speed 5474.64 samples/sec   Loss 8.2141   LearningRate 0.1660   Epoch: 5   Global Step: 60810   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:26:53,374-Speed 5463.08 samples/sec   Loss 8.1088   LearningRate 0.1660   Epoch: 5   Global Step: 60820   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:27:00,880-Speed 5457.96 samples/sec   Loss 8.1072   LearningRate 0.1660   Epoch: 5   Global Step: 60830   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:27:08,391-Speed 5454.35 samples/sec   Loss 8.1021   LearningRate 0.1660   Epoch: 5   Global Step: 60840   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:27:15,892-Speed 5460.85 samples/sec   Loss 8.0506   LearningRate 0.1660   Epoch: 5   Global Step: 60850   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:27:23,454-Speed 5417.19 samples/sec   Loss 8.0966   LearningRate 0.1659   Epoch: 5   Global Step: 60860   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:27:30,965-Speed 5453.96 samples/sec   Loss 8.1392   LearningRate 0.1659   Epoch: 5   Global Step: 60870   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:27:38,501-Speed 5436.12 samples/sec   Loss 8.0806   LearningRate 0.1659   Epoch: 5   Global Step: 60880   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:27:46,248-Speed 5288.18 samples/sec   Loss 8.1228   LearningRate 0.1659   Epoch: 5   Global Step: 60890   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:27:53,821-Speed 5408.85 samples/sec   Loss 8.1391   LearningRate 0.1658   Epoch: 5   Global Step: 60900   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:28:01,448-Speed 5426.82 samples/sec   Loss 8.0664   LearningRate 0.1658   Epoch: 5   Global Step: 60910   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:28:09,118-Speed 5340.97 samples/sec   Loss 8.1119   LearningRate 0.1658   Epoch: 5   Global Step: 60920   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:28:16,660-Speed 5431.72 samples/sec   Loss 8.0645   LearningRate 0.1658   Epoch: 5   Global Step: 60930   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:28:24,263-Speed 5387.74 samples/sec   Loss 8.0227   LearningRate 0.1658   Epoch: 5   Global Step: 60940   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:28:31,721-Speed 5493.40 samples/sec   Loss 8.0822   LearningRate 0.1657   Epoch: 5   Global Step: 60950   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 08:28:39,258-Speed 5434.89 samples/sec   Loss 8.1734   LearningRate 0.1657   Epoch: 5   Global Step: 60960   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:28:46,783-Speed 5443.94 samples/sec   Loss 8.0786   LearningRate 0.1657   Epoch: 5   Global Step: 60970   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:28:54,250-Speed 5486.45 samples/sec   Loss 8.1126   LearningRate 0.1657   Epoch: 5   Global Step: 60980   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:29:01,757-Speed 5457.01 samples/sec   Loss 8.0822   LearningRate 0.1656   Epoch: 5   Global Step: 60990   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:29:09,234-Speed 5478.30 samples/sec   Loss 8.1295   LearningRate 0.1656   Epoch: 5   Global Step: 61000   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:29:16,801-Speed 5414.11 samples/sec   Loss 8.0912   LearningRate 0.1656   Epoch: 5   Global Step: 61010   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:29:24,342-Speed 5432.43 samples/sec   Loss 8.1334   LearningRate 0.1656   Epoch: 5   Global Step: 61020   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:29:31,935-Speed 5394.61 samples/sec   Loss 8.0809   LearningRate 0.1655   Epoch: 5   Global Step: 61030   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:29:39,585-Speed 5355.47 samples/sec   Loss 8.0682   LearningRate 0.1655   Epoch: 5   Global Step: 61040   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:29:47,150-Speed 5415.06 samples/sec   Loss 8.0319   LearningRate 0.1655   Epoch: 5   Global Step: 61050   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 08:29:54,744-Speed 5394.37 samples/sec   Loss 8.1323   LearningRate 0.1655   Epoch: 5   Global Step: 61060   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 08:30:02,275-Speed 5439.50 samples/sec   Loss 8.1313   LearningRate 0.1655   Epoch: 5   Global Step: 61070   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:30:09,916-Speed 5361.47 samples/sec   Loss 8.1097   LearningRate 0.1654   Epoch: 5   Global Step: 61080   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:30:17,394-Speed 5477.51 samples/sec   Loss 8.1100   LearningRate 0.1654   Epoch: 5   Global Step: 61090   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:30:24,898-Speed 5459.50 samples/sec   Loss 8.0726   LearningRate 0.1654   Epoch: 5   Global Step: 61100   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:30:32,473-Speed 5407.92 samples/sec   Loss 8.0840   LearningRate 0.1654   Epoch: 5   Global Step: 61110   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:30:39,961-Speed 5470.91 samples/sec   Loss 8.0491   LearningRate 0.1653   Epoch: 5   Global Step: 61120   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:30:47,443-Speed 5474.84 samples/sec   Loss 8.0934   LearningRate 0.1653   Epoch: 5   Global Step: 61130   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:30:55,849-Speed 4873.59 samples/sec   Loss 8.0362   LearningRate 0.1653   Epoch: 5   Global Step: 61140   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:31:03,429-Speed 5404.35 samples/sec   Loss 8.1402   LearningRate 0.1653   Epoch: 5   Global Step: 61150   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:31:10,958-Speed 5440.58 samples/sec   Loss 8.1657   LearningRate 0.1653   Epoch: 5   Global Step: 61160   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:31:18,497-Speed 5434.33 samples/sec   Loss 8.0275   LearningRate 0.1652   Epoch: 5   Global Step: 61170   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:31:26,072-Speed 5407.88 samples/sec   Loss 8.0749   LearningRate 0.1652   Epoch: 5   Global Step: 61180   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:31:33,538-Speed 5487.30 samples/sec   Loss 8.1400   LearningRate 0.1652   Epoch: 5   Global Step: 61190   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:31:41,108-Speed 5411.09 samples/sec   Loss 8.0610   LearningRate 0.1652   Epoch: 5   Global Step: 61200   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:31:48,641-Speed 5438.40 samples/sec   Loss 8.0638   LearningRate 0.1651   Epoch: 5   Global Step: 61210   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:31:56,129-Speed 5471.12 samples/sec   Loss 8.0079   LearningRate 0.1651   Epoch: 5   Global Step: 61220   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:32:03,691-Speed 5416.97 samples/sec   Loss 8.0916   LearningRate 0.1651   Epoch: 5   Global Step: 61230   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:32:11,240-Speed 5426.27 samples/sec   Loss 8.1046   LearningRate 0.1651   Epoch: 5   Global Step: 61240   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:32:18,757-Speed 5450.16 samples/sec   Loss 8.1418   LearningRate 0.1651   Epoch: 5   Global Step: 61250   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:32:26,300-Speed 5431.16 samples/sec   Loss 8.1486   LearningRate 0.1650   Epoch: 5   Global Step: 61260   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:32:34,008-Speed 5314.89 samples/sec   Loss 8.0444   LearningRate 0.1650   Epoch: 5   Global Step: 61270   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:32:41,570-Speed 5416.78 samples/sec   Loss 8.0861   LearningRate 0.1650   Epoch: 5   Global Step: 61280   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:32:49,055-Speed 5473.75 samples/sec   Loss 8.0529   LearningRate 0.1650   Epoch: 5   Global Step: 61290   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:32:56,617-Speed 5417.39 samples/sec   Loss 8.0719   LearningRate 0.1649   Epoch: 5   Global Step: 61300   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:33:04,171-Speed 5422.81 samples/sec   Loss 8.0521   LearningRate 0.1649   Epoch: 5   Global Step: 61310   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:33:11,782-Speed 5382.41 samples/sec   Loss 8.1336   LearningRate 0.1649   Epoch: 5   Global Step: 61320   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:33:19,444-Speed 5346.56 samples/sec   Loss 7.9907   LearningRate 0.1649   Epoch: 5   Global Step: 61330   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:33:27,063-Speed 5377.17 samples/sec   Loss 8.0450   LearningRate 0.1648   Epoch: 5   Global Step: 61340   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 08:33:34,532-Speed 5484.37 samples/sec   Loss 8.0725   LearningRate 0.1648   Epoch: 5   Global Step: 61350   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 08:33:42,284-Speed 5284.56 samples/sec   Loss 8.1586   LearningRate 0.1648   Epoch: 5   Global Step: 61360   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:33:49,954-Speed 5341.28 samples/sec   Loss 8.0755   LearningRate 0.1648   Epoch: 5   Global Step: 61370   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:33:57,493-Speed 5433.65 samples/sec   Loss 8.0593   LearningRate 0.1648   Epoch: 5   Global Step: 61380   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:34:05,102-Speed 5384.16 samples/sec   Loss 8.0245   LearningRate 0.1647   Epoch: 5   Global Step: 61390   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:34:12,577-Speed 5479.65 samples/sec   Loss 8.1010   LearningRate 0.1647   Epoch: 5   Global Step: 61400   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:34:20,133-Speed 5421.80 samples/sec   Loss 8.1171   LearningRate 0.1647   Epoch: 5   Global Step: 61410   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:34:27,826-Speed 5325.00 samples/sec   Loss 8.0964   LearningRate 0.1647   Epoch: 5   Global Step: 61420   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:34:35,382-Speed 5421.58 samples/sec   Loss 8.0924   LearningRate 0.1646   Epoch: 5   Global Step: 61430   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:34:42,909-Speed 5442.52 samples/sec   Loss 8.0889   LearningRate 0.1646   Epoch: 5   Global Step: 61440   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:34:50,464-Speed 5422.63 samples/sec   Loss 8.0924   LearningRate 0.1646   Epoch: 5   Global Step: 61450   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:34:58,040-Speed 5406.84 samples/sec   Loss 8.1138   LearningRate 0.1646   Epoch: 5   Global Step: 61460   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:35:05,678-Speed 5363.51 samples/sec   Loss 8.0862   LearningRate 0.1646   Epoch: 5   Global Step: 61470   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:35:13,321-Speed 5359.87 samples/sec   Loss 8.0228   LearningRate 0.1645   Epoch: 5   Global Step: 61480   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:35:20,856-Speed 5436.69 samples/sec   Loss 7.9843   LearningRate 0.1645   Epoch: 5   Global Step: 61490   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:35:28,363-Speed 5456.63 samples/sec   Loss 8.0939   LearningRate 0.1645   Epoch: 5   Global Step: 61500   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:35:35,927-Speed 5415.95 samples/sec   Loss 8.0303   LearningRate 0.1645   Epoch: 5   Global Step: 61510   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:35:43,489-Speed 5416.97 samples/sec   Loss 8.0156   LearningRate 0.1644   Epoch: 5   Global Step: 61520   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:35:51,025-Speed 5436.29 samples/sec   Loss 8.0358   LearningRate 0.1644   Epoch: 5   Global Step: 61530   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:35:58,572-Speed 5427.74 samples/sec   Loss 8.0446   LearningRate 0.1644   Epoch: 5   Global Step: 61540   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:36:06,096-Speed 5444.47 samples/sec   Loss 8.0336   LearningRate 0.1644   Epoch: 5   Global Step: 61550   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:36:13,675-Speed 5405.21 samples/sec   Loss 8.0423   LearningRate 0.1644   Epoch: 5   Global Step: 61560   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:36:21,288-Speed 5381.46 samples/sec   Loss 8.0959   LearningRate 0.1643   Epoch: 5   Global Step: 61570   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:36:28,924-Speed 5364.19 samples/sec   Loss 8.1593   LearningRate 0.1643   Epoch: 5   Global Step: 61580   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:36:36,437-Speed 5453.08 samples/sec   Loss 8.0761   LearningRate 0.1643   Epoch: 5   Global Step: 61590   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:36:43,962-Speed 5444.11 samples/sec   Loss 8.0476   LearningRate 0.1643   Epoch: 5   Global Step: 61600   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:36:51,516-Speed 5423.09 samples/sec   Loss 8.0500   LearningRate 0.1642   Epoch: 5   Global Step: 61610   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:36:59,089-Speed 5409.26 samples/sec   Loss 8.1129   LearningRate 0.1642   Epoch: 5   Global Step: 61620   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:37:06,596-Speed 5456.65 samples/sec   Loss 8.0375   LearningRate 0.1642   Epoch: 5   Global Step: 61630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:37:14,105-Speed 5455.86 samples/sec   Loss 8.1298   LearningRate 0.1642   Epoch: 5   Global Step: 61640   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:37:21,770-Speed 5344.48 samples/sec   Loss 8.0204   LearningRate 0.1641   Epoch: 5   Global Step: 61650   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:37:29,342-Speed 5409.92 samples/sec   Loss 7.9301   LearningRate 0.1641   Epoch: 5   Global Step: 61660   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:37:36,881-Speed 5434.09 samples/sec   Loss 8.0844   LearningRate 0.1641   Epoch: 5   Global Step: 61670   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:37:44,386-Speed 5458.38 samples/sec   Loss 8.0640   LearningRate 0.1641   Epoch: 5   Global Step: 61680   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:37:51,916-Speed 5440.68 samples/sec   Loss 7.9866   LearningRate 0.1641   Epoch: 5   Global Step: 61690   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:37:59,493-Speed 5406.18 samples/sec   Loss 8.0525   LearningRate 0.1640   Epoch: 5   Global Step: 61700   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:38:07,003-Speed 5454.56 samples/sec   Loss 8.0191   LearningRate 0.1640   Epoch: 5   Global Step: 61710   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:38:14,551-Speed 5427.83 samples/sec   Loss 7.9881   LearningRate 0.1640   Epoch: 5   Global Step: 61720   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:38:22,055-Speed 5458.84 samples/sec   Loss 8.0506   LearningRate 0.1640   Epoch: 5   Global Step: 61730   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:38:29,583-Speed 5441.50 samples/sec   Loss 8.0342   LearningRate 0.1639   Epoch: 5   Global Step: 61740   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:38:37,121-Speed 5434.71 samples/sec   Loss 8.0634   LearningRate 0.1639   Epoch: 5   Global Step: 61750   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:38:44,683-Speed 5416.88 samples/sec   Loss 8.1133   LearningRate 0.1639   Epoch: 5   Global Step: 61760   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:38:52,273-Speed 5397.26 samples/sec   Loss 8.0269   LearningRate 0.1639   Epoch: 5   Global Step: 61770   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:38:59,816-Speed 5431.77 samples/sec   Loss 8.0266   LearningRate 0.1639   Epoch: 5   Global Step: 61780   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:39:07,407-Speed 5396.37 samples/sec   Loss 8.0203   LearningRate 0.1638   Epoch: 5   Global Step: 61790   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:39:14,980-Speed 5408.73 samples/sec   Loss 8.0987   LearningRate 0.1638   Epoch: 5   Global Step: 61800   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:39:22,464-Speed 5474.24 samples/sec   Loss 8.0462   LearningRate 0.1638   Epoch: 5   Global Step: 61810   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:39:29,990-Speed 5443.00 samples/sec   Loss 8.1082   LearningRate 0.1638   Epoch: 5   Global Step: 61820   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:39:37,580-Speed 5397.49 samples/sec   Loss 8.1144   LearningRate 0.1637   Epoch: 5   Global Step: 61830   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:39:45,226-Speed 5357.75 samples/sec   Loss 8.0419   LearningRate 0.1637   Epoch: 5   Global Step: 61840   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:39:52,739-Speed 5452.47 samples/sec   Loss 8.1166   LearningRate 0.1637   Epoch: 5   Global Step: 61850   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:40:00,213-Speed 5481.33 samples/sec   Loss 8.0611   LearningRate 0.1637   Epoch: 5   Global Step: 61860   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:40:07,757-Speed 5429.80 samples/sec   Loss 8.1206   LearningRate 0.1637   Epoch: 5   Global Step: 61870   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:40:15,324-Speed 5413.41 samples/sec   Loss 7.9959   LearningRate 0.1636   Epoch: 5   Global Step: 61880   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:40:23,140-Speed 5241.96 samples/sec   Loss 8.0373   LearningRate 0.1636   Epoch: 5   Global Step: 61890   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:40:30,780-Speed 5361.61 samples/sec   Loss 7.9053   LearningRate 0.1636   Epoch: 5   Global Step: 61900   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:40:38,335-Speed 5422.52 samples/sec   Loss 8.0029   LearningRate 0.1636   Epoch: 5   Global Step: 61910   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:40:45,898-Speed 5416.13 samples/sec   Loss 8.0371   LearningRate 0.1635   Epoch: 5   Global Step: 61920   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:40:53,598-Speed 5320.09 samples/sec   Loss 8.0015   LearningRate 0.1635   Epoch: 5   Global Step: 61930   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:41:01,129-Speed 5440.12 samples/sec   Loss 8.1628   LearningRate 0.1635   Epoch: 5   Global Step: 61940   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:41:08,906-Speed 5267.40 samples/sec   Loss 8.0040   LearningRate 0.1635   Epoch: 5   Global Step: 61950   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:41:16,458-Speed 5424.27 samples/sec   Loss 8.0692   LearningRate 0.1635   Epoch: 5   Global Step: 61960   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:41:24,015-Speed 5422.28 samples/sec   Loss 8.0321   LearningRate 0.1634   Epoch: 5   Global Step: 61970   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:41:31,664-Speed 5355.53 samples/sec   Loss 8.0531   LearningRate 0.1634   Epoch: 5   Global Step: 61980   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:41:39,276-Speed 5381.63 samples/sec   Loss 8.0506   LearningRate 0.1634   Epoch: 5   Global Step: 61990   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:41:46,792-Speed 5450.27 samples/sec   Loss 7.9653   LearningRate 0.1634   Epoch: 5   Global Step: 62000   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:42:30,440-[lfw][62000]XNorm: 24.483062
Training: 2022-01-08 08:42:30,440-[lfw][62000]Accuracy-Flip: 0.99750+-0.00271
Training: 2022-01-08 08:42:30,441-[lfw][62000]Accuracy-Highest: 0.99817
Training: 2022-01-08 08:43:21,313-[cfp_fp][62000]XNorm: 21.841480
Training: 2022-01-08 08:43:21,314-[cfp_fp][62000]Accuracy-Flip: 0.98414+-0.00591
Training: 2022-01-08 08:43:21,314-[cfp_fp][62000]Accuracy-Highest: 0.98600
Training: 2022-01-08 08:44:07,315-[agedb_30][62000]XNorm: 23.860036
Training: 2022-01-08 08:44:07,316-[agedb_30][62000]Accuracy-Flip: 0.97667+-0.00632
Training: 2022-01-08 08:44:07,317-[agedb_30][62000]Accuracy-Highest: 0.97667
Training: 2022-01-08 08:44:14,871-Speed 276.61 samples/sec   Loss 8.0580   LearningRate 0.1633   Epoch: 5   Global Step: 62010   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:44:22,494-Speed 5374.90 samples/sec   Loss 7.9895   LearningRate 0.1633   Epoch: 5   Global Step: 62020   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:44:30,167-Speed 5339.08 samples/sec   Loss 8.0346   LearningRate 0.1633   Epoch: 5   Global Step: 62030   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:44:37,696-Speed 5441.68 samples/sec   Loss 8.0234   LearningRate 0.1633   Epoch: 5   Global Step: 62040   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:44:45,237-Speed 5432.14 samples/sec   Loss 8.0448   LearningRate 0.1632   Epoch: 5   Global Step: 62050   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:44:52,770-Speed 5438.44 samples/sec   Loss 8.0530   LearningRate 0.1632   Epoch: 5   Global Step: 62060   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:45:00,290-Speed 5448.06 samples/sec   Loss 8.0814   LearningRate 0.1632   Epoch: 5   Global Step: 62070   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 08:45:07,792-Speed 5460.20 samples/sec   Loss 8.0471   LearningRate 0.1632   Epoch: 5   Global Step: 62080   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:45:15,317-Speed 5443.96 samples/sec   Loss 8.0694   LearningRate 0.1632   Epoch: 5   Global Step: 62090   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:45:22,779-Speed 5489.86 samples/sec   Loss 8.0410   LearningRate 0.1631   Epoch: 5   Global Step: 62100   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:45:30,435-Speed 5350.94 samples/sec   Loss 8.0090   LearningRate 0.1631   Epoch: 5   Global Step: 62110   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:45:38,034-Speed 5390.81 samples/sec   Loss 8.0557   LearningRate 0.1631   Epoch: 5   Global Step: 62120   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:45:45,524-Speed 5469.10 samples/sec   Loss 8.0338   LearningRate 0.1631   Epoch: 5   Global Step: 62130   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:45:53,191-Speed 5343.25 samples/sec   Loss 7.9804   LearningRate 0.1630   Epoch: 5   Global Step: 62140   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:46:00,743-Speed 5424.83 samples/sec   Loss 8.0015   LearningRate 0.1630   Epoch: 5   Global Step: 62150   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:46:08,319-Speed 5406.97 samples/sec   Loss 8.0124   LearningRate 0.1630   Epoch: 5   Global Step: 62160   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:46:15,888-Speed 5412.14 samples/sec   Loss 8.0546   LearningRate 0.1630   Epoch: 5   Global Step: 62170   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:46:23,355-Speed 5486.04 samples/sec   Loss 8.0771   LearningRate 0.1630   Epoch: 5   Global Step: 62180   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 08:46:30,878-Speed 5445.70 samples/sec   Loss 8.0594   LearningRate 0.1629   Epoch: 5   Global Step: 62190   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 08:46:38,481-Speed 5387.49 samples/sec   Loss 8.0860   LearningRate 0.1629   Epoch: 5   Global Step: 62200   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 08:46:46,057-Speed 5407.70 samples/sec   Loss 8.0537   LearningRate 0.1629   Epoch: 5   Global Step: 62210   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 08:47:08,463-Speed 1828.29 samples/sec   Loss 7.9910   LearningRate 0.1629   Epoch: 6   Global Step: 62220   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 08:47:15,994-Speed 5438.92 samples/sec   Loss 8.0278   LearningRate 0.1628   Epoch: 6   Global Step: 62230   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 08:47:23,423-Speed 5514.65 samples/sec   Loss 8.0598   LearningRate 0.1628   Epoch: 6   Global Step: 62240   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 08:47:31,050-Speed 5371.01 samples/sec   Loss 7.9564   LearningRate 0.1628   Epoch: 6   Global Step: 62250   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 08:47:38,621-Speed 5410.80 samples/sec   Loss 7.9926   LearningRate 0.1628   Epoch: 6   Global Step: 62260   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 08:47:46,242-Speed 5375.64 samples/sec   Loss 8.0076   LearningRate 0.1628   Epoch: 6   Global Step: 62270   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 08:47:53,673-Speed 5512.88 samples/sec   Loss 7.9362   LearningRate 0.1627   Epoch: 6   Global Step: 62280   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:48:01,116-Speed 5503.56 samples/sec   Loss 8.0098   LearningRate 0.1627   Epoch: 6   Global Step: 62290   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:48:08,547-Speed 5512.98 samples/sec   Loss 8.0955   LearningRate 0.1627   Epoch: 6   Global Step: 62300   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 08:48:15,987-Speed 5506.08 samples/sec   Loss 7.9582   LearningRate 0.1627   Epoch: 6   Global Step: 62310   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 08:48:23,478-Speed 5468.63 samples/sec   Loss 7.9825   LearningRate 0.1626   Epoch: 6   Global Step: 62320   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 08:48:30,943-Speed 5487.74 samples/sec   Loss 7.9392   LearningRate 0.1626   Epoch: 6   Global Step: 62330   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 08:48:38,374-Speed 5512.77 samples/sec   Loss 7.9924   LearningRate 0.1626   Epoch: 6   Global Step: 62340   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 08:48:45,918-Speed 5429.59 samples/sec   Loss 8.0278   LearningRate 0.1626   Epoch: 6   Global Step: 62350   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 08:48:53,407-Speed 5470.20 samples/sec   Loss 8.0790   LearningRate 0.1626   Epoch: 6   Global Step: 62360   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 08:49:00,894-Speed 5471.55 samples/sec   Loss 8.0444   LearningRate 0.1625   Epoch: 6   Global Step: 62370   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 08:49:08,351-Speed 5494.14 samples/sec   Loss 7.9558   LearningRate 0.1625   Epoch: 6   Global Step: 62380   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 08:49:15,843-Speed 5467.13 samples/sec   Loss 7.9707   LearningRate 0.1625   Epoch: 6   Global Step: 62390   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 08:49:23,541-Speed 5321.96 samples/sec   Loss 7.9307   LearningRate 0.1625   Epoch: 6   Global Step: 62400   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:49:31,011-Speed 5483.62 samples/sec   Loss 8.0293   LearningRate 0.1624   Epoch: 6   Global Step: 62410   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:49:38,513-Speed 5460.96 samples/sec   Loss 7.9691   LearningRate 0.1624   Epoch: 6   Global Step: 62420   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:49:45,992-Speed 5476.92 samples/sec   Loss 7.9912   LearningRate 0.1624   Epoch: 6   Global Step: 62430   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:49:53,460-Speed 5485.80 samples/sec   Loss 7.9504   LearningRate 0.1624   Epoch: 6   Global Step: 62440   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:50:01,036-Speed 5407.16 samples/sec   Loss 7.9482   LearningRate 0.1624   Epoch: 6   Global Step: 62450   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:50:08,680-Speed 5359.01 samples/sec   Loss 7.9886   LearningRate 0.1623   Epoch: 6   Global Step: 62460   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:50:16,371-Speed 5326.74 samples/sec   Loss 7.9773   LearningRate 0.1623   Epoch: 6   Global Step: 62470   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:50:24,055-Speed 5331.24 samples/sec   Loss 7.9676   LearningRate 0.1623   Epoch: 6   Global Step: 62480   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:50:31,764-Speed 5314.17 samples/sec   Loss 8.0044   LearningRate 0.1623   Epoch: 6   Global Step: 62490   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:50:39,456-Speed 5325.75 samples/sec   Loss 7.9613   LearningRate 0.1622   Epoch: 6   Global Step: 62500   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:50:47,155-Speed 5320.50 samples/sec   Loss 7.9043   LearningRate 0.1622   Epoch: 6   Global Step: 62510   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:50:54,842-Speed 5329.27 samples/sec   Loss 8.0517   LearningRate 0.1622   Epoch: 6   Global Step: 62520   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:51:02,490-Speed 5356.28 samples/sec   Loss 8.0281   LearningRate 0.1622   Epoch: 6   Global Step: 62530   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:51:10,187-Speed 5323.10 samples/sec   Loss 7.9361   LearningRate 0.1622   Epoch: 6   Global Step: 62540   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:51:18,026-Speed 5225.17 samples/sec   Loss 8.0105   LearningRate 0.1621   Epoch: 6   Global Step: 62550   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:51:25,684-Speed 5349.29 samples/sec   Loss 7.9661   LearningRate 0.1621   Epoch: 6   Global Step: 62560   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:51:33,221-Speed 5435.06 samples/sec   Loss 7.9804   LearningRate 0.1621   Epoch: 6   Global Step: 62570   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:51:40,828-Speed 5386.14 samples/sec   Loss 7.9644   LearningRate 0.1621   Epoch: 6   Global Step: 62580   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:51:48,349-Speed 5446.37 samples/sec   Loss 8.0450   LearningRate 0.1620   Epoch: 6   Global Step: 62590   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:51:55,805-Speed 5494.31 samples/sec   Loss 7.9831   LearningRate 0.1620   Epoch: 6   Global Step: 62600   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:52:03,448-Speed 5359.64 samples/sec   Loss 7.9759   LearningRate 0.1620   Epoch: 6   Global Step: 62610   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:52:11,030-Speed 5403.64 samples/sec   Loss 8.0159   LearningRate 0.1620   Epoch: 6   Global Step: 62620   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:52:18,873-Speed 5222.90 samples/sec   Loss 7.9703   LearningRate 0.1619   Epoch: 6   Global Step: 62630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:52:26,397-Speed 5444.26 samples/sec   Loss 7.9540   LearningRate 0.1619   Epoch: 6   Global Step: 62640   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:52:33,961-Speed 5415.62 samples/sec   Loss 8.0556   LearningRate 0.1619   Epoch: 6   Global Step: 62650   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:52:41,606-Speed 5358.98 samples/sec   Loss 8.0517   LearningRate 0.1619   Epoch: 6   Global Step: 62660   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:52:49,371-Speed 5275.31 samples/sec   Loss 7.9630   LearningRate 0.1619   Epoch: 6   Global Step: 62670   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:52:56,915-Speed 5429.90 samples/sec   Loss 7.9642   LearningRate 0.1618   Epoch: 6   Global Step: 62680   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:53:04,585-Speed 5341.33 samples/sec   Loss 8.0214   LearningRate 0.1618   Epoch: 6   Global Step: 62690   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:53:12,152-Speed 5414.06 samples/sec   Loss 7.8985   LearningRate 0.1618   Epoch: 6   Global Step: 62700   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 08:53:19,775-Speed 5373.38 samples/sec   Loss 8.0019   LearningRate 0.1618   Epoch: 6   Global Step: 62710   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 08:53:27,323-Speed 5427.30 samples/sec   Loss 8.0022   LearningRate 0.1617   Epoch: 6   Global Step: 62720   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 08:53:34,915-Speed 5395.95 samples/sec   Loss 8.0522   LearningRate 0.1617   Epoch: 6   Global Step: 62730   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:53:42,581-Speed 5344.03 samples/sec   Loss 8.0436   LearningRate 0.1617   Epoch: 6   Global Step: 62740   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:53:50,134-Speed 5423.93 samples/sec   Loss 8.0418   LearningRate 0.1617   Epoch: 6   Global Step: 62750   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:53:57,649-Speed 5450.77 samples/sec   Loss 7.9683   LearningRate 0.1617   Epoch: 6   Global Step: 62760   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:54:05,199-Speed 5425.87 samples/sec   Loss 7.9815   LearningRate 0.1616   Epoch: 6   Global Step: 62770   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:54:12,779-Speed 5404.41 samples/sec   Loss 7.9820   LearningRate 0.1616   Epoch: 6   Global Step: 62780   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:54:20,319-Speed 5433.32 samples/sec   Loss 7.9325   LearningRate 0.1616   Epoch: 6   Global Step: 62790   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:54:27,879-Speed 5418.54 samples/sec   Loss 8.0155   LearningRate 0.1616   Epoch: 6   Global Step: 62800   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:54:35,376-Speed 5463.82 samples/sec   Loss 7.9935   LearningRate 0.1615   Epoch: 6   Global Step: 62810   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:54:42,931-Speed 5423.14 samples/sec   Loss 8.0063   LearningRate 0.1615   Epoch: 6   Global Step: 62820   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:54:50,485-Speed 5422.61 samples/sec   Loss 7.9380   LearningRate 0.1615   Epoch: 6   Global Step: 62830   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:54:58,008-Speed 5444.81 samples/sec   Loss 7.9694   LearningRate 0.1615   Epoch: 6   Global Step: 62840   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:55:05,479-Speed 5483.46 samples/sec   Loss 8.0081   LearningRate 0.1615   Epoch: 6   Global Step: 62850   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:55:12,910-Speed 5513.37 samples/sec   Loss 7.9412   LearningRate 0.1614   Epoch: 6   Global Step: 62860   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:55:20,524-Speed 5380.29 samples/sec   Loss 7.9320   LearningRate 0.1614   Epoch: 6   Global Step: 62870   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:55:28,108-Speed 5401.04 samples/sec   Loss 7.9794   LearningRate 0.1614   Epoch: 6   Global Step: 62880   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:55:35,676-Speed 5413.37 samples/sec   Loss 7.9338   LearningRate 0.1614   Epoch: 6   Global Step: 62890   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:55:43,259-Speed 5402.41 samples/sec   Loss 7.9065   LearningRate 0.1613   Epoch: 6   Global Step: 62900   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:55:50,895-Speed 5364.45 samples/sec   Loss 7.9784   LearningRate 0.1613   Epoch: 6   Global Step: 62910   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:55:58,528-Speed 5366.70 samples/sec   Loss 8.0184   LearningRate 0.1613   Epoch: 6   Global Step: 62920   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:56:06,062-Speed 5437.86 samples/sec   Loss 7.9323   LearningRate 0.1613   Epoch: 6   Global Step: 62930   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:56:13,680-Speed 5377.43 samples/sec   Loss 7.9943   LearningRate 0.1613   Epoch: 6   Global Step: 62940   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:56:21,172-Speed 5467.77 samples/sec   Loss 7.8688   LearningRate 0.1612   Epoch: 6   Global Step: 62950   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:56:28,741-Speed 5412.33 samples/sec   Loss 7.9707   LearningRate 0.1612   Epoch: 6   Global Step: 62960   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:56:36,264-Speed 5445.30 samples/sec   Loss 7.9751   LearningRate 0.1612   Epoch: 6   Global Step: 62970   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:56:43,816-Speed 5425.07 samples/sec   Loss 7.9690   LearningRate 0.1612   Epoch: 6   Global Step: 62980   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:56:51,346-Speed 5439.83 samples/sec   Loss 7.9764   LearningRate 0.1611   Epoch: 6   Global Step: 62990   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:56:58,837-Speed 5468.67 samples/sec   Loss 8.0145   LearningRate 0.1611   Epoch: 6   Global Step: 63000   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:57:06,493-Speed 5350.71 samples/sec   Loss 8.0237   LearningRate 0.1611   Epoch: 6   Global Step: 63010   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:57:14,015-Speed 5446.00 samples/sec   Loss 8.0259   LearningRate 0.1611   Epoch: 6   Global Step: 63020   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:57:21,641-Speed 5372.43 samples/sec   Loss 7.9670   LearningRate 0.1611   Epoch: 6   Global Step: 63030   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:57:29,391-Speed 5285.00 samples/sec   Loss 7.9599   LearningRate 0.1610   Epoch: 6   Global Step: 63040   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:57:36,923-Speed 5439.24 samples/sec   Loss 7.9396   LearningRate 0.1610   Epoch: 6   Global Step: 63050   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:57:44,447-Speed 5444.53 samples/sec   Loss 7.8847   LearningRate 0.1610   Epoch: 6   Global Step: 63060   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:57:52,011-Speed 5416.36 samples/sec   Loss 7.9715   LearningRate 0.1610   Epoch: 6   Global Step: 63070   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:57:59,574-Speed 5416.22 samples/sec   Loss 8.0298   LearningRate 0.1609   Epoch: 6   Global Step: 63080   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:58:07,075-Speed 5460.76 samples/sec   Loss 7.9558   LearningRate 0.1609   Epoch: 6   Global Step: 63090   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:58:14,637-Speed 5417.46 samples/sec   Loss 7.9512   LearningRate 0.1609   Epoch: 6   Global Step: 63100   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:58:22,299-Speed 5347.10 samples/sec   Loss 7.9965   LearningRate 0.1609   Epoch: 6   Global Step: 63110   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:58:29,819-Speed 5446.80 samples/sec   Loss 7.9443   LearningRate 0.1609   Epoch: 6   Global Step: 63120   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 08:58:37,374-Speed 5422.06 samples/sec   Loss 7.9537   LearningRate 0.1608   Epoch: 6   Global Step: 63130   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:58:45,011-Speed 5364.65 samples/sec   Loss 7.9034   LearningRate 0.1608   Epoch: 6   Global Step: 63140   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:58:52,702-Speed 5326.33 samples/sec   Loss 7.9729   LearningRate 0.1608   Epoch: 6   Global Step: 63150   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:59:00,364-Speed 5346.84 samples/sec   Loss 7.9600   LearningRate 0.1608   Epoch: 6   Global Step: 63160   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:59:07,998-Speed 5365.49 samples/sec   Loss 7.9481   LearningRate 0.1607   Epoch: 6   Global Step: 63170   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:59:15,557-Speed 5419.91 samples/sec   Loss 8.0207   LearningRate 0.1607   Epoch: 6   Global Step: 63180   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:59:23,087-Speed 5440.24 samples/sec   Loss 7.9620   LearningRate 0.1607   Epoch: 6   Global Step: 63190   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:59:30,581-Speed 5466.34 samples/sec   Loss 7.8850   LearningRate 0.1607   Epoch: 6   Global Step: 63200   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:59:38,211-Speed 5368.73 samples/sec   Loss 7.9178   LearningRate 0.1607   Epoch: 6   Global Step: 63210   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:59:45,834-Speed 5374.34 samples/sec   Loss 7.9129   LearningRate 0.1606   Epoch: 6   Global Step: 63220   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 08:59:53,459-Speed 5372.51 samples/sec   Loss 7.9898   LearningRate 0.1606   Epoch: 6   Global Step: 63230   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:00:00,927-Speed 5484.79 samples/sec   Loss 7.9074   LearningRate 0.1606   Epoch: 6   Global Step: 63240   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:00:08,503-Speed 5407.25 samples/sec   Loss 7.8931   LearningRate 0.1606   Epoch: 6   Global Step: 63250   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:00:16,006-Speed 5460.03 samples/sec   Loss 7.9560   LearningRate 0.1605   Epoch: 6   Global Step: 63260   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:00:23,553-Speed 5428.26 samples/sec   Loss 8.0110   LearningRate 0.1605   Epoch: 6   Global Step: 63270   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:00:31,259-Speed 5315.83 samples/sec   Loss 7.9051   LearningRate 0.1605   Epoch: 6   Global Step: 63280   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:00:38,956-Speed 5322.22 samples/sec   Loss 7.9462   LearningRate 0.1605   Epoch: 6   Global Step: 63290   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:00:46,737-Speed 5265.11 samples/sec   Loss 7.9444   LearningRate 0.1605   Epoch: 6   Global Step: 63300   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:00:54,403-Speed 5344.29 samples/sec   Loss 8.0283   LearningRate 0.1604   Epoch: 6   Global Step: 63310   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:01:01,973-Speed 5411.11 samples/sec   Loss 7.9174   LearningRate 0.1604   Epoch: 6   Global Step: 63320   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:01:09,537-Speed 5415.66 samples/sec   Loss 7.9557   LearningRate 0.1604   Epoch: 6   Global Step: 63330   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:01:17,164-Speed 5371.63 samples/sec   Loss 7.9317   LearningRate 0.1604   Epoch: 6   Global Step: 63340   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:01:24,660-Speed 5464.60 samples/sec   Loss 7.9385   LearningRate 0.1603   Epoch: 6   Global Step: 63350   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:01:32,262-Speed 5388.52 samples/sec   Loss 7.9422   LearningRate 0.1603   Epoch: 6   Global Step: 63360   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:01:39,833-Speed 5410.80 samples/sec   Loss 7.9337   LearningRate 0.1603   Epoch: 6   Global Step: 63370   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:01:47,543-Speed 5313.70 samples/sec   Loss 7.8840   LearningRate 0.1603   Epoch: 6   Global Step: 63380   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:01:55,031-Speed 5471.28 samples/sec   Loss 7.9896   LearningRate 0.1603   Epoch: 6   Global Step: 63390   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:02:02,575-Speed 5429.93 samples/sec   Loss 7.9528   LearningRate 0.1602   Epoch: 6   Global Step: 63400   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:02:10,296-Speed 5305.56 samples/sec   Loss 7.9336   LearningRate 0.1602   Epoch: 6   Global Step: 63410   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:02:17,836-Speed 5433.35 samples/sec   Loss 7.9733   LearningRate 0.1602   Epoch: 6   Global Step: 63420   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:02:25,355-Speed 5448.82 samples/sec   Loss 7.9440   LearningRate 0.1602   Epoch: 6   Global Step: 63430   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:02:32,877-Speed 5445.46 samples/sec   Loss 7.9645   LearningRate 0.1601   Epoch: 6   Global Step: 63440   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:02:40,373-Speed 5464.87 samples/sec   Loss 7.8278   LearningRate 0.1601   Epoch: 6   Global Step: 63450   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:02:47,917-Speed 5430.51 samples/sec   Loss 7.8615   LearningRate 0.1601   Epoch: 6   Global Step: 63460   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:02:55,365-Speed 5499.86 samples/sec   Loss 7.9437   LearningRate 0.1601   Epoch: 6   Global Step: 63470   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:03:02,968-Speed 5388.49 samples/sec   Loss 7.9251   LearningRate 0.1601   Epoch: 6   Global Step: 63480   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:03:10,462-Speed 5465.92 samples/sec   Loss 7.8854   LearningRate 0.1600   Epoch: 6   Global Step: 63490   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:03:18,094-Speed 5367.59 samples/sec   Loss 7.9681   LearningRate 0.1600   Epoch: 6   Global Step: 63500   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:03:25,623-Speed 5441.28 samples/sec   Loss 7.9319   LearningRate 0.1600   Epoch: 6   Global Step: 63510   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:03:33,247-Speed 5373.08 samples/sec   Loss 7.8544   LearningRate 0.1600   Epoch: 6   Global Step: 63520   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:03:40,879-Speed 5367.51 samples/sec   Loss 7.9317   LearningRate 0.1599   Epoch: 6   Global Step: 63530   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:03:48,538-Speed 5348.80 samples/sec   Loss 7.9503   LearningRate 0.1599   Epoch: 6   Global Step: 63540   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:03:56,045-Speed 5457.58 samples/sec   Loss 7.9409   LearningRate 0.1599   Epoch: 6   Global Step: 63550   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:04:03,601-Speed 5421.17 samples/sec   Loss 7.9205   LearningRate 0.1599   Epoch: 6   Global Step: 63560   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:04:11,106-Speed 5458.83 samples/sec   Loss 7.9319   LearningRate 0.1599   Epoch: 6   Global Step: 63570   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:04:18,715-Speed 5383.50 samples/sec   Loss 7.9499   LearningRate 0.1598   Epoch: 6   Global Step: 63580   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:04:26,201-Speed 5472.00 samples/sec   Loss 7.9130   LearningRate 0.1598   Epoch: 6   Global Step: 63590   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:04:33,760-Speed 5419.67 samples/sec   Loss 7.9158   LearningRate 0.1598   Epoch: 6   Global Step: 63600   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:04:41,363-Speed 5388.08 samples/sec   Loss 7.9437   LearningRate 0.1598   Epoch: 6   Global Step: 63610   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:04:48,836-Speed 5481.49 samples/sec   Loss 7.8568   LearningRate 0.1597   Epoch: 6   Global Step: 63620   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:04:56,335-Speed 5462.84 samples/sec   Loss 7.8892   LearningRate 0.1597   Epoch: 6   Global Step: 63630   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:05:03,894-Speed 5419.90 samples/sec   Loss 7.9160   LearningRate 0.1597   Epoch: 6   Global Step: 63640   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:05:11,624-Speed 5299.21 samples/sec   Loss 7.8562   LearningRate 0.1597   Epoch: 6   Global Step: 63650   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:05:19,160-Speed 5435.41 samples/sec   Loss 7.9260   LearningRate 0.1597   Epoch: 6   Global Step: 63660   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:05:26,671-Speed 5454.24 samples/sec   Loss 7.8859   LearningRate 0.1596   Epoch: 6   Global Step: 63670   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:05:34,165-Speed 5466.95 samples/sec   Loss 7.9505   LearningRate 0.1596   Epoch: 6   Global Step: 63680   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:05:41,694-Speed 5440.14 samples/sec   Loss 7.9417   LearningRate 0.1596   Epoch: 6   Global Step: 63690   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:05:49,174-Speed 5477.11 samples/sec   Loss 7.9206   LearningRate 0.1596   Epoch: 6   Global Step: 63700   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:05:56,725-Speed 5424.76 samples/sec   Loss 7.9586   LearningRate 0.1595   Epoch: 6   Global Step: 63710   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:06:04,210-Speed 5473.32 samples/sec   Loss 7.8600   LearningRate 0.1595   Epoch: 6   Global Step: 63720   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:06:11,688-Speed 5477.60 samples/sec   Loss 7.8494   LearningRate 0.1595   Epoch: 6   Global Step: 63730   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:06:19,210-Speed 5446.41 samples/sec   Loss 7.9340   LearningRate 0.1595   Epoch: 6   Global Step: 63740   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:06:26,684-Speed 5481.03 samples/sec   Loss 7.8777   LearningRate 0.1595   Epoch: 6   Global Step: 63750   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:06:34,193-Speed 5455.94 samples/sec   Loss 7.9396   LearningRate 0.1594   Epoch: 6   Global Step: 63760   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:06:41,671-Speed 5478.18 samples/sec   Loss 8.0015   LearningRate 0.1594   Epoch: 6   Global Step: 63770   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:06:49,123-Speed 5497.03 samples/sec   Loss 7.9174   LearningRate 0.1594   Epoch: 6   Global Step: 63780   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:06:56,688-Speed 5415.55 samples/sec   Loss 7.9512   LearningRate 0.1594   Epoch: 6   Global Step: 63790   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:07:04,294-Speed 5385.56 samples/sec   Loss 7.9618   LearningRate 0.1593   Epoch: 6   Global Step: 63800   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:07:11,786-Speed 5468.13 samples/sec   Loss 7.8101   LearningRate 0.1593   Epoch: 6   Global Step: 63810   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:07:19,246-Speed 5491.02 samples/sec   Loss 7.9624   LearningRate 0.1593   Epoch: 6   Global Step: 63820   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:07:26,788-Speed 5431.99 samples/sec   Loss 7.8902   LearningRate 0.1593   Epoch: 6   Global Step: 63830   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:07:34,386-Speed 5391.59 samples/sec   Loss 7.9228   LearningRate 0.1593   Epoch: 6   Global Step: 63840   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:07:41,953-Speed 5413.58 samples/sec   Loss 7.9330   LearningRate 0.1592   Epoch: 6   Global Step: 63850   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:07:49,419-Speed 5487.10 samples/sec   Loss 7.9339   LearningRate 0.1592   Epoch: 6   Global Step: 63860   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:07:56,927-Speed 5455.77 samples/sec   Loss 7.9521   LearningRate 0.1592   Epoch: 6   Global Step: 63870   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:08:04,457-Speed 5440.65 samples/sec   Loss 7.8947   LearningRate 0.1592   Epoch: 6   Global Step: 63880   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:08:11,941-Speed 5473.31 samples/sec   Loss 7.8299   LearningRate 0.1591   Epoch: 6   Global Step: 63890   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:08:19,398-Speed 5493.94 samples/sec   Loss 7.8627   LearningRate 0.1591   Epoch: 6   Global Step: 63900   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:08:26,887-Speed 5470.04 samples/sec   Loss 7.9118   LearningRate 0.1591   Epoch: 6   Global Step: 63910   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:08:34,395-Speed 5456.27 samples/sec   Loss 7.9751   LearningRate 0.1591   Epoch: 6   Global Step: 63920   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 09:08:42,043-Speed 5356.25 samples/sec   Loss 7.9560   LearningRate 0.1591   Epoch: 6   Global Step: 63930   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:08:49,539-Speed 5464.95 samples/sec   Loss 7.9086   LearningRate 0.1590   Epoch: 6   Global Step: 63940   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:08:56,992-Speed 5496.28 samples/sec   Loss 7.8947   LearningRate 0.1590   Epoch: 6   Global Step: 63950   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:09:04,484-Speed 5467.99 samples/sec   Loss 7.9095   LearningRate 0.1590   Epoch: 6   Global Step: 63960   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:09:11,912-Speed 5515.31 samples/sec   Loss 7.9123   LearningRate 0.1590   Epoch: 6   Global Step: 63970   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:09:19,377-Speed 5487.84 samples/sec   Loss 7.8530   LearningRate 0.1589   Epoch: 6   Global Step: 63980   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:09:26,821-Speed 5502.65 samples/sec   Loss 7.8435   LearningRate 0.1589   Epoch: 6   Global Step: 63990   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:09:34,303-Speed 5475.58 samples/sec   Loss 7.8751   LearningRate 0.1589   Epoch: 6   Global Step: 64000   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:10:18,232-[lfw][64000]XNorm: 23.364512
Training: 2022-01-08 09:10:18,233-[lfw][64000]Accuracy-Flip: 0.99783+-0.00279
Training: 2022-01-08 09:10:18,233-[lfw][64000]Accuracy-Highest: 0.99817
Training: 2022-01-08 09:11:09,983-[cfp_fp][64000]XNorm: 21.319322
Training: 2022-01-08 09:11:09,984-[cfp_fp][64000]Accuracy-Flip: 0.98457+-0.00541
Training: 2022-01-08 09:11:09,985-[cfp_fp][64000]Accuracy-Highest: 0.98600
Training: 2022-01-08 09:11:56,151-[agedb_30][64000]XNorm: 23.179053
Training: 2022-01-08 09:11:56,153-[agedb_30][64000]Accuracy-Flip: 0.97333+-0.00601
Training: 2022-01-08 09:11:56,153-[agedb_30][64000]Accuracy-Highest: 0.97667
Training: 2022-01-08 09:12:03,341-Speed 274.83 samples/sec   Loss 7.8360   LearningRate 0.1589   Epoch: 6   Global Step: 64010   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:12:11,041-Speed 5321.07 samples/sec   Loss 7.9691   LearningRate 0.1589   Epoch: 6   Global Step: 64020   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:12:18,647-Speed 5386.20 samples/sec   Loss 7.8796   LearningRate 0.1588   Epoch: 6   Global Step: 64030   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 09:12:26,319-Speed 5339.83 samples/sec   Loss 7.8392   LearningRate 0.1588   Epoch: 6   Global Step: 64040   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:12:33,840-Speed 5447.35 samples/sec   Loss 7.8844   LearningRate 0.1588   Epoch: 6   Global Step: 64050   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:12:41,344-Speed 5459.52 samples/sec   Loss 7.9158   LearningRate 0.1588   Epoch: 6   Global Step: 64060   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:12:48,856-Speed 5453.14 samples/sec   Loss 7.8561   LearningRate 0.1587   Epoch: 6   Global Step: 64070   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:12:56,374-Speed 5448.84 samples/sec   Loss 7.8801   LearningRate 0.1587   Epoch: 6   Global Step: 64080   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:13:03,871-Speed 5464.27 samples/sec   Loss 7.9044   LearningRate 0.1587   Epoch: 6   Global Step: 64090   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:13:11,386-Speed 5451.23 samples/sec   Loss 7.8989   LearningRate 0.1587   Epoch: 6   Global Step: 64100   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:13:18,921-Speed 5436.60 samples/sec   Loss 7.8612   LearningRate 0.1587   Epoch: 6   Global Step: 64110   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:13:26,419-Speed 5463.32 samples/sec   Loss 7.8331   LearningRate 0.1586   Epoch: 6   Global Step: 64120   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:13:33,970-Speed 5425.10 samples/sec   Loss 7.8552   LearningRate 0.1586   Epoch: 6   Global Step: 64130   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:13:41,469-Speed 5463.03 samples/sec   Loss 7.8818   LearningRate 0.1586   Epoch: 6   Global Step: 64140   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:13:48,930-Speed 5490.89 samples/sec   Loss 7.8828   LearningRate 0.1586   Epoch: 6   Global Step: 64150   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:13:56,362-Speed 5511.52 samples/sec   Loss 7.8992   LearningRate 0.1585   Epoch: 6   Global Step: 64160   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:14:03,956-Speed 5394.47 samples/sec   Loss 7.8316   LearningRate 0.1585   Epoch: 6   Global Step: 64170   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:14:11,444-Speed 5470.65 samples/sec   Loss 7.8954   LearningRate 0.1585   Epoch: 6   Global Step: 64180   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:14:18,984-Speed 5433.83 samples/sec   Loss 7.8399   LearningRate 0.1585   Epoch: 6   Global Step: 64190   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:14:26,538-Speed 5422.43 samples/sec   Loss 7.8445   LearningRate 0.1585   Epoch: 6   Global Step: 64200   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:14:33,993-Speed 5494.71 samples/sec   Loss 7.8848   LearningRate 0.1584   Epoch: 6   Global Step: 64210   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:14:41,434-Speed 5505.43 samples/sec   Loss 7.8509   LearningRate 0.1584   Epoch: 6   Global Step: 64220   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:14:48,908-Speed 5481.30 samples/sec   Loss 7.8058   LearningRate 0.1584   Epoch: 6   Global Step: 64230   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:14:56,409-Speed 5461.17 samples/sec   Loss 7.8192   LearningRate 0.1584   Epoch: 6   Global Step: 64240   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:15:03,911-Speed 5460.68 samples/sec   Loss 7.8525   LearningRate 0.1583   Epoch: 6   Global Step: 64250   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:15:11,614-Speed 5318.11 samples/sec   Loss 7.8949   LearningRate 0.1583   Epoch: 6   Global Step: 64260   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:15:19,226-Speed 5381.66 samples/sec   Loss 7.8631   LearningRate 0.1583   Epoch: 6   Global Step: 64270   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:15:26,862-Speed 5365.39 samples/sec   Loss 7.8742   LearningRate 0.1583   Epoch: 6   Global Step: 64280   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:15:34,431-Speed 5411.72 samples/sec   Loss 7.9409   LearningRate 0.1583   Epoch: 6   Global Step: 64290   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:15:42,027-Speed 5392.97 samples/sec   Loss 7.9143   LearningRate 0.1582   Epoch: 6   Global Step: 64300   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:15:49,685-Speed 5349.30 samples/sec   Loss 7.9281   LearningRate 0.1582   Epoch: 6   Global Step: 64310   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:15:57,376-Speed 5326.50 samples/sec   Loss 7.9257   LearningRate 0.1582   Epoch: 6   Global Step: 64320   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:16:05,136-Speed 5279.13 samples/sec   Loss 7.8392   LearningRate 0.1582   Epoch: 6   Global Step: 64330   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:16:12,630-Speed 5466.43 samples/sec   Loss 7.8867   LearningRate 0.1581   Epoch: 6   Global Step: 64340   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:16:20,071-Speed 5505.08 samples/sec   Loss 7.8830   LearningRate 0.1581   Epoch: 6   Global Step: 64350   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:16:27,619-Speed 5427.60 samples/sec   Loss 7.8057   LearningRate 0.1581   Epoch: 6   Global Step: 64360   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:16:35,162-Speed 5431.39 samples/sec   Loss 7.8186   LearningRate 0.1581   Epoch: 6   Global Step: 64370   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:16:42,789-Speed 5370.66 samples/sec   Loss 7.8356   LearningRate 0.1581   Epoch: 6   Global Step: 64380   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:16:50,280-Speed 5469.11 samples/sec   Loss 7.8502   LearningRate 0.1580   Epoch: 6   Global Step: 64390   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:16:57,767-Speed 5471.68 samples/sec   Loss 7.8872   LearningRate 0.1580   Epoch: 6   Global Step: 64400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:17:05,277-Speed 5454.41 samples/sec   Loss 7.8995   LearningRate 0.1580   Epoch: 6   Global Step: 64410   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:17:12,942-Speed 5344.51 samples/sec   Loss 7.9055   LearningRate 0.1580   Epoch: 6   Global Step: 64420   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:17:20,444-Speed 5461.26 samples/sec   Loss 7.8624   LearningRate 0.1579   Epoch: 6   Global Step: 64430   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:17:27,958-Speed 5451.30 samples/sec   Loss 7.8994   LearningRate 0.1579   Epoch: 6   Global Step: 64440   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:17:35,519-Speed 5418.24 samples/sec   Loss 7.8888   LearningRate 0.1579   Epoch: 6   Global Step: 64450   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:17:43,061-Speed 5431.54 samples/sec   Loss 7.9135   LearningRate 0.1579   Epoch: 6   Global Step: 64460   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:17:50,587-Speed 5443.30 samples/sec   Loss 7.8697   LearningRate 0.1579   Epoch: 6   Global Step: 64470   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:17:58,079-Speed 5468.16 samples/sec   Loss 7.8913   LearningRate 0.1578   Epoch: 6   Global Step: 64480   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:18:05,545-Speed 5487.15 samples/sec   Loss 7.8912   LearningRate 0.1578   Epoch: 6   Global Step: 64490   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:18:12,986-Speed 5505.41 samples/sec   Loss 7.8797   LearningRate 0.1578   Epoch: 6   Global Step: 64500   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 09:18:20,534-Speed 5427.10 samples/sec   Loss 7.8755   LearningRate 0.1578   Epoch: 6   Global Step: 64510   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 09:18:28,023-Speed 5470.35 samples/sec   Loss 7.9428   LearningRate 0.1577   Epoch: 6   Global Step: 64520   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 09:18:35,558-Speed 5436.36 samples/sec   Loss 7.9629   LearningRate 0.1577   Epoch: 6   Global Step: 64530   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 09:18:43,055-Speed 5464.22 samples/sec   Loss 7.9058   LearningRate 0.1577   Epoch: 6   Global Step: 64540   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 09:18:50,620-Speed 5415.47 samples/sec   Loss 7.8896   LearningRate 0.1577   Epoch: 6   Global Step: 64550   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 09:18:58,131-Speed 5453.80 samples/sec   Loss 7.8678   LearningRate 0.1577   Epoch: 6   Global Step: 64560   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 09:19:05,658-Speed 5442.53 samples/sec   Loss 7.8774   LearningRate 0.1576   Epoch: 6   Global Step: 64570   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 09:19:13,138-Speed 5476.76 samples/sec   Loss 7.7909   LearningRate 0.1576   Epoch: 6   Global Step: 64580   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 09:19:20,638-Speed 5461.93 samples/sec   Loss 7.8952   LearningRate 0.1576   Epoch: 6   Global Step: 64590   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-01-08 09:19:28,121-Speed 5474.51 samples/sec   Loss 7.9196   LearningRate 0.1576   Epoch: 6   Global Step: 64600   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:19:35,757-Speed 5365.36 samples/sec   Loss 7.8012   LearningRate 0.1575   Epoch: 6   Global Step: 64610   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:19:43,309-Speed 5423.83 samples/sec   Loss 7.8805   LearningRate 0.1575   Epoch: 6   Global Step: 64620   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:19:50,772-Speed 5489.11 samples/sec   Loss 7.9330   LearningRate 0.1575   Epoch: 6   Global Step: 64630   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:19:58,206-Speed 5510.71 samples/sec   Loss 7.8776   LearningRate 0.1575   Epoch: 6   Global Step: 64640   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:20:05,704-Speed 5463.96 samples/sec   Loss 7.8303   LearningRate 0.1575   Epoch: 6   Global Step: 64650   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:20:13,137-Speed 5511.25 samples/sec   Loss 7.8588   LearningRate 0.1574   Epoch: 6   Global Step: 64660   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:20:20,775-Speed 5363.59 samples/sec   Loss 7.8889   LearningRate 0.1574   Epoch: 6   Global Step: 64670   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:20:28,279-Speed 5459.09 samples/sec   Loss 7.8312   LearningRate 0.1574   Epoch: 6   Global Step: 64680   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:20:35,846-Speed 5413.72 samples/sec   Loss 7.9434   LearningRate 0.1574   Epoch: 6   Global Step: 64690   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:20:43,419-Speed 5409.53 samples/sec   Loss 7.8588   LearningRate 0.1573   Epoch: 6   Global Step: 64700   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:20:50,924-Speed 5457.97 samples/sec   Loss 7.8553   LearningRate 0.1573   Epoch: 6   Global Step: 64710   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:20:58,508-Speed 5401.51 samples/sec   Loss 7.8471   LearningRate 0.1573   Epoch: 6   Global Step: 64720   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:21:06,167-Speed 5349.19 samples/sec   Loss 7.8927   LearningRate 0.1573   Epoch: 6   Global Step: 64730   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:21:13,849-Speed 5332.49 samples/sec   Loss 7.8217   LearningRate 0.1573   Epoch: 6   Global Step: 64740   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:21:21,405-Speed 5421.75 samples/sec   Loss 7.8354   LearningRate 0.1572   Epoch: 6   Global Step: 64750   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:21:28,973-Speed 5412.70 samples/sec   Loss 7.7464   LearningRate 0.1572   Epoch: 6   Global Step: 64760   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:21:36,548-Speed 5408.41 samples/sec   Loss 7.8506   LearningRate 0.1572   Epoch: 6   Global Step: 64770   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:21:44,065-Speed 5449.69 samples/sec   Loss 7.8081   LearningRate 0.1572   Epoch: 6   Global Step: 64780   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:21:51,680-Speed 5379.58 samples/sec   Loss 7.8178   LearningRate 0.1572   Epoch: 6   Global Step: 64790   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:21:59,172-Speed 5468.27 samples/sec   Loss 7.8927   LearningRate 0.1571   Epoch: 6   Global Step: 64800   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:22:06,692-Speed 5447.09 samples/sec   Loss 7.9223   LearningRate 0.1571   Epoch: 6   Global Step: 64810   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:22:14,215-Speed 5445.59 samples/sec   Loss 7.8071   LearningRate 0.1571   Epoch: 6   Global Step: 64820   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:22:21,700-Speed 5472.91 samples/sec   Loss 7.8191   LearningRate 0.1571   Epoch: 6   Global Step: 64830   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:22:29,156-Speed 5494.15 samples/sec   Loss 7.7915   LearningRate 0.1570   Epoch: 6   Global Step: 64840   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:22:36,687-Speed 5439.35 samples/sec   Loss 7.7724   LearningRate 0.1570   Epoch: 6   Global Step: 64850   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:22:44,296-Speed 5384.11 samples/sec   Loss 7.9066   LearningRate 0.1570   Epoch: 6   Global Step: 64860   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:22:51,798-Speed 5460.47 samples/sec   Loss 7.8049   LearningRate 0.1570   Epoch: 6   Global Step: 64870   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:22:59,436-Speed 5363.27 samples/sec   Loss 7.8313   LearningRate 0.1570   Epoch: 6   Global Step: 64880   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:23:06,947-Speed 5454.26 samples/sec   Loss 7.8530   LearningRate 0.1569   Epoch: 6   Global Step: 64890   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:23:14,503-Speed 5421.76 samples/sec   Loss 7.8626   LearningRate 0.1569   Epoch: 6   Global Step: 64900   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:23:21,992-Speed 5469.29 samples/sec   Loss 7.8820   LearningRate 0.1569   Epoch: 6   Global Step: 64910   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:23:29,495-Speed 5460.47 samples/sec   Loss 7.9108   LearningRate 0.1569   Epoch: 6   Global Step: 64920   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:23:36,915-Speed 5520.83 samples/sec   Loss 7.8413   LearningRate 0.1568   Epoch: 6   Global Step: 64930   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:23:44,445-Speed 5440.53 samples/sec   Loss 7.8003   LearningRate 0.1568   Epoch: 6   Global Step: 64940   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:23:51,960-Speed 5450.43 samples/sec   Loss 7.8409   LearningRate 0.1568   Epoch: 6   Global Step: 64950   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:23:59,523-Speed 5416.59 samples/sec   Loss 7.8362   LearningRate 0.1568   Epoch: 6   Global Step: 64960   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:24:06,999-Speed 5479.90 samples/sec   Loss 7.8652   LearningRate 0.1568   Epoch: 6   Global Step: 64970   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:24:14,583-Speed 5401.37 samples/sec   Loss 7.8015   LearningRate 0.1567   Epoch: 6   Global Step: 64980   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:24:22,049-Speed 5486.57 samples/sec   Loss 7.8038   LearningRate 0.1567   Epoch: 6   Global Step: 64990   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:24:29,663-Speed 5380.46 samples/sec   Loss 7.8410   LearningRate 0.1567   Epoch: 6   Global Step: 65000   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:24:37,311-Speed 5356.37 samples/sec   Loss 7.8284   LearningRate 0.1567   Epoch: 6   Global Step: 65010   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:24:44,951-Speed 5362.04 samples/sec   Loss 7.8515   LearningRate 0.1566   Epoch: 6   Global Step: 65020   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:24:52,624-Speed 5338.79 samples/sec   Loss 7.8236   LearningRate 0.1566   Epoch: 6   Global Step: 65030   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:25:00,128-Speed 5458.88 samples/sec   Loss 7.9028   LearningRate 0.1566   Epoch: 6   Global Step: 65040   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:25:07,900-Speed 5271.66 samples/sec   Loss 7.8071   LearningRate 0.1566   Epoch: 6   Global Step: 65050   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:25:15,495-Speed 5393.78 samples/sec   Loss 7.7859   LearningRate 0.1566   Epoch: 6   Global Step: 65060   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:25:23,103-Speed 5383.72 samples/sec   Loss 7.7810   LearningRate 0.1565   Epoch: 6   Global Step: 65070   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:25:30,693-Speed 5397.91 samples/sec   Loss 7.9008   LearningRate 0.1565   Epoch: 6   Global Step: 65080   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:25:38,221-Speed 5441.61 samples/sec   Loss 7.7970   LearningRate 0.1565   Epoch: 6   Global Step: 65090   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:25:45,677-Speed 5494.78 samples/sec   Loss 7.7788   LearningRate 0.1565   Epoch: 6   Global Step: 65100   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:25:53,233-Speed 5421.21 samples/sec   Loss 7.7873   LearningRate 0.1564   Epoch: 6   Global Step: 65110   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:26:00,749-Speed 5450.50 samples/sec   Loss 7.7574   LearningRate 0.1564   Epoch: 6   Global Step: 65120   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:26:08,294-Speed 5429.58 samples/sec   Loss 7.7661   LearningRate 0.1564   Epoch: 6   Global Step: 65130   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:26:15,942-Speed 5356.30 samples/sec   Loss 7.7833   LearningRate 0.1564   Epoch: 6   Global Step: 65140   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:26:23,483-Speed 5431.93 samples/sec   Loss 7.8060   LearningRate 0.1564   Epoch: 6   Global Step: 65150   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:26:30,924-Speed 5505.32 samples/sec   Loss 7.8223   LearningRate 0.1563   Epoch: 6   Global Step: 65160   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:26:38,443-Speed 5448.90 samples/sec   Loss 7.8853   LearningRate 0.1563   Epoch: 6   Global Step: 65170   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:26:45,954-Speed 5453.85 samples/sec   Loss 7.8414   LearningRate 0.1563   Epoch: 6   Global Step: 65180   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:26:53,439-Speed 5472.47 samples/sec   Loss 7.8692   LearningRate 0.1563   Epoch: 6   Global Step: 65190   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:27:00,915-Speed 5479.72 samples/sec   Loss 7.8562   LearningRate 0.1562   Epoch: 6   Global Step: 65200   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:27:08,505-Speed 5397.38 samples/sec   Loss 7.7906   LearningRate 0.1562   Epoch: 6   Global Step: 65210   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:27:16,032-Speed 5442.74 samples/sec   Loss 7.8994   LearningRate 0.1562   Epoch: 6   Global Step: 65220   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:27:23,647-Speed 5379.29 samples/sec   Loss 7.8528   LearningRate 0.1562   Epoch: 6   Global Step: 65230   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 09:27:31,315-Speed 5342.16 samples/sec   Loss 7.7697   LearningRate 0.1562   Epoch: 6   Global Step: 65240   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:27:38,895-Speed 5404.57 samples/sec   Loss 7.8052   LearningRate 0.1561   Epoch: 6   Global Step: 65250   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:27:46,458-Speed 5416.56 samples/sec   Loss 7.7999   LearningRate 0.1561   Epoch: 6   Global Step: 65260   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:27:54,034-Speed 5407.06 samples/sec   Loss 7.6978   LearningRate 0.1561   Epoch: 6   Global Step: 65270   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:28:01,568-Speed 5437.80 samples/sec   Loss 7.8548   LearningRate 0.1561   Epoch: 6   Global Step: 65280   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:28:09,113-Speed 5429.78 samples/sec   Loss 7.7998   LearningRate 0.1561   Epoch: 6   Global Step: 65290   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:28:16,712-Speed 5390.90 samples/sec   Loss 7.7996   LearningRate 0.1560   Epoch: 6   Global Step: 65300   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:28:24,213-Speed 5461.61 samples/sec   Loss 7.8146   LearningRate 0.1560   Epoch: 6   Global Step: 65310   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:28:31,958-Speed 5289.23 samples/sec   Loss 7.8285   LearningRate 0.1560   Epoch: 6   Global Step: 65320   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:28:39,401-Speed 5503.87 samples/sec   Loss 7.7754   LearningRate 0.1560   Epoch: 6   Global Step: 65330   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:28:46,996-Speed 5393.34 samples/sec   Loss 7.8213   LearningRate 0.1559   Epoch: 6   Global Step: 65340   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:28:54,537-Speed 5432.58 samples/sec   Loss 7.8992   LearningRate 0.1559   Epoch: 6   Global Step: 65350   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:29:02,241-Speed 5317.95 samples/sec   Loss 7.8560   LearningRate 0.1559   Epoch: 6   Global Step: 65360   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:29:09,767-Speed 5443.09 samples/sec   Loss 7.8503   LearningRate 0.1559   Epoch: 6   Global Step: 65370   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:29:17,354-Speed 5398.96 samples/sec   Loss 7.8628   LearningRate 0.1559   Epoch: 6   Global Step: 65380   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:29:24,905-Speed 5425.19 samples/sec   Loss 7.8150   LearningRate 0.1558   Epoch: 6   Global Step: 65390   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:29:32,385-Speed 5477.19 samples/sec   Loss 7.7939   LearningRate 0.1558   Epoch: 6   Global Step: 65400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:29:39,908-Speed 5445.24 samples/sec   Loss 7.8071   LearningRate 0.1558   Epoch: 6   Global Step: 65410   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 09:29:47,413-Speed 5458.23 samples/sec   Loss 7.8201   LearningRate 0.1558   Epoch: 6   Global Step: 65420   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:29:54,932-Speed 5448.28 samples/sec   Loss 7.7877   LearningRate 0.1557   Epoch: 6   Global Step: 65430   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:30:02,413-Speed 5476.26 samples/sec   Loss 7.7767   LearningRate 0.1557   Epoch: 6   Global Step: 65440   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 09:30:09,824-Speed 5527.84 samples/sec   Loss 7.7595   LearningRate 0.1557   Epoch: 6   Global Step: 65450   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:30:17,284-Speed 5491.16 samples/sec   Loss 7.8122   LearningRate 0.1557   Epoch: 6   Global Step: 65460   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:30:24,734-Speed 5499.04 samples/sec   Loss 7.7934   LearningRate 0.1557   Epoch: 6   Global Step: 65470   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:30:32,284-Speed 5425.23 samples/sec   Loss 7.8492   LearningRate 0.1556   Epoch: 6   Global Step: 65480   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:30:39,841-Speed 5421.31 samples/sec   Loss 7.8227   LearningRate 0.1556   Epoch: 6   Global Step: 65490   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:30:47,392-Speed 5425.15 samples/sec   Loss 7.7401   LearningRate 0.1556   Epoch: 6   Global Step: 65500   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:30:54,907-Speed 5451.24 samples/sec   Loss 7.7240   LearningRate 0.1556   Epoch: 6   Global Step: 65510   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:31:02,503-Speed 5392.78 samples/sec   Loss 7.8094   LearningRate 0.1555   Epoch: 6   Global Step: 65520   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:31:09,980-Speed 5479.20 samples/sec   Loss 7.7503   LearningRate 0.1555   Epoch: 6   Global Step: 65530   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:31:17,512-Speed 5438.53 samples/sec   Loss 7.7317   LearningRate 0.1555   Epoch: 6   Global Step: 65540   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:31:24,975-Speed 5489.16 samples/sec   Loss 7.8345   LearningRate 0.1555   Epoch: 6   Global Step: 65550   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:31:32,433-Speed 5493.09 samples/sec   Loss 7.7275   LearningRate 0.1555   Epoch: 6   Global Step: 65560   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:31:39,985-Speed 5424.27 samples/sec   Loss 7.7964   LearningRate 0.1554   Epoch: 6   Global Step: 65570   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:31:47,492-Speed 5457.43 samples/sec   Loss 7.7767   LearningRate 0.1554   Epoch: 6   Global Step: 65580   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:31:55,064-Speed 5409.65 samples/sec   Loss 7.8027   LearningRate 0.1554   Epoch: 6   Global Step: 65590   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:32:02,497-Speed 5511.76 samples/sec   Loss 7.8045   LearningRate 0.1554   Epoch: 6   Global Step: 65600   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:32:09,947-Speed 5498.30 samples/sec   Loss 7.8082   LearningRate 0.1553   Epoch: 6   Global Step: 65610   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:32:17,426-Speed 5477.26 samples/sec   Loss 7.7177   LearningRate 0.1553   Epoch: 6   Global Step: 65620   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:32:24,862-Speed 5509.44 samples/sec   Loss 7.8017   LearningRate 0.1553   Epoch: 6   Global Step: 65630   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:32:32,275-Speed 5525.82 samples/sec   Loss 7.8107   LearningRate 0.1553   Epoch: 6   Global Step: 65640   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:32:39,763-Speed 5470.84 samples/sec   Loss 7.7096   LearningRate 0.1553   Epoch: 6   Global Step: 65650   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:32:47,237-Speed 5481.27 samples/sec   Loss 7.8199   LearningRate 0.1552   Epoch: 6   Global Step: 65660   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:32:54,658-Speed 5520.56 samples/sec   Loss 7.8179   LearningRate 0.1552   Epoch: 6   Global Step: 65670   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:33:02,195-Speed 5435.48 samples/sec   Loss 7.7722   LearningRate 0.1552   Epoch: 6   Global Step: 65680   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:33:09,633-Speed 5506.95 samples/sec   Loss 7.6942   LearningRate 0.1552   Epoch: 6   Global Step: 65690   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:33:17,199-Speed 5414.47 samples/sec   Loss 7.7528   LearningRate 0.1552   Epoch: 6   Global Step: 65700   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:33:24,682-Speed 5474.81 samples/sec   Loss 7.7665   LearningRate 0.1551   Epoch: 6   Global Step: 65710   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:33:32,183-Speed 5461.45 samples/sec   Loss 7.7200   LearningRate 0.1551   Epoch: 6   Global Step: 65720   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:33:39,636-Speed 5496.55 samples/sec   Loss 7.7211   LearningRate 0.1551   Epoch: 6   Global Step: 65730   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:33:47,075-Speed 5507.07 samples/sec   Loss 7.7395   LearningRate 0.1551   Epoch: 6   Global Step: 65740   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:33:54,509-Speed 5510.52 samples/sec   Loss 7.7485   LearningRate 0.1550   Epoch: 6   Global Step: 65750   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:34:02,004-Speed 5466.02 samples/sec   Loss 7.7377   LearningRate 0.1550   Epoch: 6   Global Step: 65760   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:34:09,511-Speed 5456.68 samples/sec   Loss 7.7735   LearningRate 0.1550   Epoch: 6   Global Step: 65770   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:34:16,958-Speed 5500.88 samples/sec   Loss 7.7578   LearningRate 0.1550   Epoch: 6   Global Step: 65780   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:34:24,430-Speed 5483.21 samples/sec   Loss 7.6833   LearningRate 0.1550   Epoch: 6   Global Step: 65790   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:34:32,005-Speed 5407.64 samples/sec   Loss 7.7424   LearningRate 0.1549   Epoch: 6   Global Step: 65800   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:34:39,439-Speed 5510.41 samples/sec   Loss 7.8149   LearningRate 0.1549   Epoch: 6   Global Step: 65810   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:34:46,961-Speed 5446.15 samples/sec   Loss 7.8853   LearningRate 0.1549   Epoch: 6   Global Step: 65820   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:34:54,420-Speed 5492.60 samples/sec   Loss 7.7828   LearningRate 0.1549   Epoch: 6   Global Step: 65830   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:35:01,883-Speed 5489.08 samples/sec   Loss 7.7614   LearningRate 0.1548   Epoch: 6   Global Step: 65840   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:35:09,377-Speed 5466.24 samples/sec   Loss 7.8589   LearningRate 0.1548   Epoch: 6   Global Step: 65850   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:35:17,137-Speed 5279.56 samples/sec   Loss 7.7755   LearningRate 0.1548   Epoch: 6   Global Step: 65860   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:35:24,734-Speed 5391.98 samples/sec   Loss 7.8055   LearningRate 0.1548   Epoch: 6   Global Step: 65870   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:35:32,349-Speed 5379.80 samples/sec   Loss 7.7866   LearningRate 0.1548   Epoch: 6   Global Step: 65880   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:35:39,871-Speed 5446.36 samples/sec   Loss 7.8506   LearningRate 0.1547   Epoch: 6   Global Step: 65890   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:35:47,459-Speed 5398.78 samples/sec   Loss 7.7309   LearningRate 0.1547   Epoch: 6   Global Step: 65900   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:35:55,044-Speed 5400.83 samples/sec   Loss 7.7185   LearningRate 0.1547   Epoch: 6   Global Step: 65910   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:36:02,628-Speed 5401.60 samples/sec   Loss 7.7136   LearningRate 0.1547   Epoch: 6   Global Step: 65920   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:36:10,190-Speed 5416.98 samples/sec   Loss 7.7691   LearningRate 0.1546   Epoch: 6   Global Step: 65930   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:36:17,838-Speed 5356.35 samples/sec   Loss 7.7045   LearningRate 0.1546   Epoch: 6   Global Step: 65940   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:36:25,429-Speed 5396.54 samples/sec   Loss 7.8010   LearningRate 0.1546   Epoch: 6   Global Step: 65950   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:36:33,224-Speed 5255.81 samples/sec   Loss 7.7581   LearningRate 0.1546   Epoch: 6   Global Step: 65960   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:36:40,744-Speed 5446.80 samples/sec   Loss 7.8232   LearningRate 0.1546   Epoch: 6   Global Step: 65970   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:36:48,276-Speed 5439.17 samples/sec   Loss 7.7529   LearningRate 0.1545   Epoch: 6   Global Step: 65980   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:36:55,744-Speed 5486.06 samples/sec   Loss 7.8048   LearningRate 0.1545   Epoch: 6   Global Step: 65990   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:37:03,261-Speed 5449.57 samples/sec   Loss 7.7845   LearningRate 0.1545   Epoch: 6   Global Step: 66000   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:38:01,667-[lfw][66000]XNorm: 23.302244
Training: 2022-01-08 09:38:01,667-[lfw][66000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-01-08 09:38:01,668-[lfw][66000]Accuracy-Highest: 0.99817
Training: 2022-01-08 09:39:07,024-[cfp_fp][66000]XNorm: 21.406650
Training: 2022-01-08 09:39:07,025-[cfp_fp][66000]Accuracy-Flip: 0.98229+-0.00434
Training: 2022-01-08 09:39:07,026-[cfp_fp][66000]Accuracy-Highest: 0.98600
Training: 2022-01-08 09:39:52,822-[agedb_30][66000]XNorm: 23.238662
Training: 2022-01-08 09:39:52,823-[agedb_30][66000]Accuracy-Flip: 0.97367+-0.00741
Training: 2022-01-08 09:39:52,824-[agedb_30][66000]Accuracy-Highest: 0.97667
Training: 2022-01-08 09:40:00,427-Speed 231.20 samples/sec   Loss 7.7415   LearningRate 0.1545   Epoch: 6   Global Step: 66010   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:40:08,034-Speed 5385.66 samples/sec   Loss 7.8329   LearningRate 0.1545   Epoch: 6   Global Step: 66020   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:40:15,569-Speed 5437.81 samples/sec   Loss 7.7801   LearningRate 0.1544   Epoch: 6   Global Step: 66030   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:40:23,173-Speed 5387.87 samples/sec   Loss 7.7460   LearningRate 0.1544   Epoch: 6   Global Step: 66040   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:40:30,684-Speed 5454.69 samples/sec   Loss 7.6985   LearningRate 0.1544   Epoch: 6   Global Step: 66050   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:40:38,362-Speed 5335.42 samples/sec   Loss 7.7827   LearningRate 0.1544   Epoch: 6   Global Step: 66060   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:40:45,909-Speed 5428.66 samples/sec   Loss 7.7371   LearningRate 0.1543   Epoch: 6   Global Step: 66070   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:40:53,581-Speed 5339.27 samples/sec   Loss 7.7144   LearningRate 0.1543   Epoch: 6   Global Step: 66080   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:41:01,055-Speed 5481.46 samples/sec   Loss 7.6908   LearningRate 0.1543   Epoch: 6   Global Step: 66090   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:41:08,531-Speed 5479.07 samples/sec   Loss 7.7230   LearningRate 0.1543   Epoch: 6   Global Step: 66100   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:41:15,980-Speed 5499.42 samples/sec   Loss 7.7602   LearningRate 0.1543   Epoch: 6   Global Step: 66110   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:41:23,487-Speed 5457.12 samples/sec   Loss 7.8028   LearningRate 0.1542   Epoch: 6   Global Step: 66120   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:41:30,999-Speed 5453.70 samples/sec   Loss 7.7764   LearningRate 0.1542   Epoch: 6   Global Step: 66130   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:41:38,508-Speed 5455.03 samples/sec   Loss 7.7725   LearningRate 0.1542   Epoch: 6   Global Step: 66140   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:41:45,930-Speed 5519.42 samples/sec   Loss 7.7913   LearningRate 0.1542   Epoch: 6   Global Step: 66150   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:41:53,362-Speed 5512.05 samples/sec   Loss 7.7476   LearningRate 0.1541   Epoch: 6   Global Step: 66160   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:42:00,777-Speed 5524.81 samples/sec   Loss 7.7600   LearningRate 0.1541   Epoch: 6   Global Step: 66170   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:42:08,296-Speed 5448.19 samples/sec   Loss 7.7138   LearningRate 0.1541   Epoch: 6   Global Step: 66180   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:42:15,807-Speed 5453.77 samples/sec   Loss 7.7142   LearningRate 0.1541   Epoch: 6   Global Step: 66190   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:42:23,278-Speed 5483.87 samples/sec   Loss 7.7695   LearningRate 0.1541   Epoch: 6   Global Step: 66200   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:42:30,826-Speed 5427.11 samples/sec   Loss 7.7436   LearningRate 0.1540   Epoch: 6   Global Step: 66210   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:42:38,329-Speed 5459.76 samples/sec   Loss 7.7536   LearningRate 0.1540   Epoch: 6   Global Step: 66220   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:42:45,827-Speed 5463.30 samples/sec   Loss 7.7636   LearningRate 0.1540   Epoch: 6   Global Step: 66230   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:42:53,339-Speed 5453.68 samples/sec   Loss 7.7963   LearningRate 0.1540   Epoch: 6   Global Step: 66240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:43:00,918-Speed 5405.23 samples/sec   Loss 7.7694   LearningRate 0.1539   Epoch: 6   Global Step: 66250   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:43:08,498-Speed 5404.56 samples/sec   Loss 7.7561   LearningRate 0.1539   Epoch: 6   Global Step: 66260   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:43:16,173-Speed 5336.89 samples/sec   Loss 7.7158   LearningRate 0.1539   Epoch: 6   Global Step: 66270   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:43:23,794-Speed 5376.01 samples/sec   Loss 7.7390   LearningRate 0.1539   Epoch: 6   Global Step: 66280   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:43:31,368-Speed 5408.48 samples/sec   Loss 7.7708   LearningRate 0.1539   Epoch: 6   Global Step: 66290   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:43:38,927-Speed 5419.15 samples/sec   Loss 7.7382   LearningRate 0.1538   Epoch: 6   Global Step: 66300   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:43:46,496-Speed 5412.29 samples/sec   Loss 7.7765   LearningRate 0.1538   Epoch: 6   Global Step: 66310   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:43:54,074-Speed 5405.82 samples/sec   Loss 7.7676   LearningRate 0.1538   Epoch: 6   Global Step: 66320   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:44:01,590-Speed 5450.56 samples/sec   Loss 7.7704   LearningRate 0.1538   Epoch: 6   Global Step: 66330   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:44:09,146-Speed 5421.17 samples/sec   Loss 7.7482   LearningRate 0.1538   Epoch: 6   Global Step: 66340   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:44:16,610-Speed 5489.09 samples/sec   Loss 7.7471   LearningRate 0.1537   Epoch: 6   Global Step: 66350   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:44:24,181-Speed 5410.52 samples/sec   Loss 7.7693   LearningRate 0.1537   Epoch: 6   Global Step: 66360   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:44:31,784-Speed 5388.42 samples/sec   Loss 7.7640   LearningRate 0.1537   Epoch: 6   Global Step: 66370   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:44:39,468-Speed 5331.42 samples/sec   Loss 7.7963   LearningRate 0.1537   Epoch: 6   Global Step: 66380   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:44:46,949-Speed 5475.39 samples/sec   Loss 7.7047   LearningRate 0.1536   Epoch: 6   Global Step: 66390   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:44:54,545-Speed 5393.39 samples/sec   Loss 7.7182   LearningRate 0.1536   Epoch: 6   Global Step: 66400   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:45:02,112-Speed 5414.18 samples/sec   Loss 7.8166   LearningRate 0.1536   Epoch: 6   Global Step: 66410   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:45:09,646-Speed 5437.33 samples/sec   Loss 7.7345   LearningRate 0.1536   Epoch: 6   Global Step: 66420   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:45:17,222-Speed 5406.93 samples/sec   Loss 7.7226   LearningRate 0.1536   Epoch: 6   Global Step: 66430   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:45:24,723-Speed 5461.85 samples/sec   Loss 7.7192   LearningRate 0.1535   Epoch: 6   Global Step: 66440   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:45:32,206-Speed 5474.34 samples/sec   Loss 7.7093   LearningRate 0.1535   Epoch: 6   Global Step: 66450   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:45:39,639-Speed 5510.93 samples/sec   Loss 7.7317   LearningRate 0.1535   Epoch: 6   Global Step: 66460   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:45:47,132-Speed 5467.28 samples/sec   Loss 7.7041   LearningRate 0.1535   Epoch: 6   Global Step: 66470   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:45:54,593-Speed 5490.79 samples/sec   Loss 7.6654   LearningRate 0.1534   Epoch: 6   Global Step: 66480   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 09:46:02,035-Speed 5504.90 samples/sec   Loss 7.6917   LearningRate 0.1534   Epoch: 6   Global Step: 66490   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:46:09,551-Speed 5450.38 samples/sec   Loss 7.7446   LearningRate 0.1534   Epoch: 6   Global Step: 66500   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:46:17,112-Speed 5418.14 samples/sec   Loss 7.7107   LearningRate 0.1534   Epoch: 6   Global Step: 66510   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:46:24,674-Speed 5417.15 samples/sec   Loss 7.7326   LearningRate 0.1534   Epoch: 6   Global Step: 66520   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:46:32,174-Speed 5462.28 samples/sec   Loss 7.7901   LearningRate 0.1533   Epoch: 6   Global Step: 66530   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:46:39,760-Speed 5399.91 samples/sec   Loss 7.6926   LearningRate 0.1533   Epoch: 6   Global Step: 66540   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:46:47,224-Speed 5489.12 samples/sec   Loss 7.6364   LearningRate 0.1533   Epoch: 6   Global Step: 66550   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:46:54,789-Speed 5414.75 samples/sec   Loss 7.7513   LearningRate 0.1533   Epoch: 6   Global Step: 66560   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:47:02,344-Speed 5422.30 samples/sec   Loss 7.7716   LearningRate 0.1533   Epoch: 6   Global Step: 66570   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:47:09,825-Speed 5476.31 samples/sec   Loss 7.7221   LearningRate 0.1532   Epoch: 6   Global Step: 66580   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:47:17,373-Speed 5426.90 samples/sec   Loss 7.7091   LearningRate 0.1532   Epoch: 6   Global Step: 66590   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:47:24,942-Speed 5411.99 samples/sec   Loss 7.7799   LearningRate 0.1532   Epoch: 6   Global Step: 66600   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:47:32,477-Speed 5436.87 samples/sec   Loss 7.6920   LearningRate 0.1532   Epoch: 6   Global Step: 66610   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:47:40,033-Speed 5421.82 samples/sec   Loss 7.7584   LearningRate 0.1531   Epoch: 6   Global Step: 66620   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:47:47,582-Speed 5426.42 samples/sec   Loss 7.7467   LearningRate 0.1531   Epoch: 6   Global Step: 66630   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:47:55,230-Speed 5356.24 samples/sec   Loss 7.7325   LearningRate 0.1531   Epoch: 6   Global Step: 66640   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:48:02,809-Speed 5405.20 samples/sec   Loss 7.7073   LearningRate 0.1531   Epoch: 6   Global Step: 66650   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:48:10,359-Speed 5425.80 samples/sec   Loss 7.7299   LearningRate 0.1531   Epoch: 6   Global Step: 66660   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:48:17,937-Speed 5405.41 samples/sec   Loss 7.7219   LearningRate 0.1530   Epoch: 6   Global Step: 66670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:48:25,397-Speed 5491.39 samples/sec   Loss 7.7144   LearningRate 0.1530   Epoch: 6   Global Step: 66680   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:48:32,970-Speed 5409.82 samples/sec   Loss 7.7393   LearningRate 0.1530   Epoch: 6   Global Step: 66690   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:48:40,546-Speed 5407.35 samples/sec   Loss 7.7509   LearningRate 0.1530   Epoch: 6   Global Step: 66700   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:48:48,071-Speed 5444.03 samples/sec   Loss 7.6848   LearningRate 0.1529   Epoch: 6   Global Step: 66710   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 09:48:55,759-Speed 5328.51 samples/sec   Loss 7.7250   LearningRate 0.1529   Epoch: 6   Global Step: 66720   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:49:03,409-Speed 5354.47 samples/sec   Loss 7.7423   LearningRate 0.1529   Epoch: 6   Global Step: 66730   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:49:10,967-Speed 5420.81 samples/sec   Loss 7.7471   LearningRate 0.1529   Epoch: 6   Global Step: 66740   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:49:18,455-Speed 5470.77 samples/sec   Loss 7.7834   LearningRate 0.1529   Epoch: 6   Global Step: 66750   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:49:25,939-Speed 5473.37 samples/sec   Loss 7.7515   LearningRate 0.1528   Epoch: 6   Global Step: 66760   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:49:33,522-Speed 5402.35 samples/sec   Loss 7.7381   LearningRate 0.1528   Epoch: 6   Global Step: 66770   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:49:41,056-Speed 5437.47 samples/sec   Loss 7.7163   LearningRate 0.1528   Epoch: 6   Global Step: 66780   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:49:48,773-Speed 5308.29 samples/sec   Loss 7.7829   LearningRate 0.1528   Epoch: 6   Global Step: 66790   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:49:56,315-Speed 5431.75 samples/sec   Loss 7.7619   LearningRate 0.1528   Epoch: 6   Global Step: 66800   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:50:03,829-Speed 5451.97 samples/sec   Loss 7.6746   LearningRate 0.1527   Epoch: 6   Global Step: 66810   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:50:11,355-Speed 5443.48 samples/sec   Loss 7.7468   LearningRate 0.1527   Epoch: 6   Global Step: 66820   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:50:18,902-Speed 5427.58 samples/sec   Loss 7.7341   LearningRate 0.1527   Epoch: 6   Global Step: 66830   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:50:26,509-Speed 5385.00 samples/sec   Loss 7.7033   LearningRate 0.1527   Epoch: 6   Global Step: 66840   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:50:34,088-Speed 5405.84 samples/sec   Loss 7.6481   LearningRate 0.1526   Epoch: 6   Global Step: 66850   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:50:41,810-Speed 5304.74 samples/sec   Loss 7.6913   LearningRate 0.1526   Epoch: 6   Global Step: 66860   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:50:49,292-Speed 5475.26 samples/sec   Loss 7.7153   LearningRate 0.1526   Epoch: 6   Global Step: 66870   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:50:56,814-Speed 5445.95 samples/sec   Loss 7.7096   LearningRate 0.1526   Epoch: 6   Global Step: 66880   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:51:04,314-Speed 5461.72 samples/sec   Loss 7.6957   LearningRate 0.1526   Epoch: 6   Global Step: 66890   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:51:11,848-Speed 5437.81 samples/sec   Loss 7.7234   LearningRate 0.1525   Epoch: 6   Global Step: 66900   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:51:19,350-Speed 5460.02 samples/sec   Loss 7.7147   LearningRate 0.1525   Epoch: 6   Global Step: 66910   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:51:26,762-Speed 5526.85 samples/sec   Loss 7.7084   LearningRate 0.1525   Epoch: 6   Global Step: 66920   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:51:34,337-Speed 5408.57 samples/sec   Loss 7.7073   LearningRate 0.1525   Epoch: 6   Global Step: 66930   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:51:41,832-Speed 5465.48 samples/sec   Loss 7.7416   LearningRate 0.1524   Epoch: 6   Global Step: 66940   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:51:49,271-Speed 5506.30 samples/sec   Loss 7.5818   LearningRate 0.1524   Epoch: 6   Global Step: 66950   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:51:56,751-Speed 5477.25 samples/sec   Loss 7.7247   LearningRate 0.1524   Epoch: 6   Global Step: 66960   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:52:04,278-Speed 5442.25 samples/sec   Loss 7.6513   LearningRate 0.1524   Epoch: 6   Global Step: 66970   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:52:11,810-Speed 5439.04 samples/sec   Loss 7.7795   LearningRate 0.1524   Epoch: 6   Global Step: 66980   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:52:19,350-Speed 5432.37 samples/sec   Loss 7.7025   LearningRate 0.1523   Epoch: 6   Global Step: 66990   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:52:26,950-Speed 5390.71 samples/sec   Loss 7.7415   LearningRate 0.1523   Epoch: 6   Global Step: 67000   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:52:34,512-Speed 5416.92 samples/sec   Loss 7.7245   LearningRate 0.1523   Epoch: 6   Global Step: 67010   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:52:42,058-Speed 5428.88 samples/sec   Loss 7.8421   LearningRate 0.1523   Epoch: 6   Global Step: 67020   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:52:49,514-Speed 5493.86 samples/sec   Loss 7.6670   LearningRate 0.1523   Epoch: 6   Global Step: 67030   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:52:57,051-Speed 5435.77 samples/sec   Loss 7.7119   LearningRate 0.1522   Epoch: 6   Global Step: 67040   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:53:04,675-Speed 5373.13 samples/sec   Loss 7.7313   LearningRate 0.1522   Epoch: 6   Global Step: 67050   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:53:12,281-Speed 5385.94 samples/sec   Loss 7.6921   LearningRate 0.1522   Epoch: 6   Global Step: 67060   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:53:19,775-Speed 5465.98 samples/sec   Loss 7.7245   LearningRate 0.1522   Epoch: 6   Global Step: 67070   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:53:27,230-Speed 5495.45 samples/sec   Loss 7.7332   LearningRate 0.1521   Epoch: 6   Global Step: 67080   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:53:34,791-Speed 5417.82 samples/sec   Loss 7.7118   LearningRate 0.1521   Epoch: 6   Global Step: 67090   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:53:42,298-Speed 5457.31 samples/sec   Loss 7.7158   LearningRate 0.1521   Epoch: 6   Global Step: 67100   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:53:49,930-Speed 5367.51 samples/sec   Loss 7.6841   LearningRate 0.1521   Epoch: 6   Global Step: 67110   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:53:57,537-Speed 5385.41 samples/sec   Loss 7.7001   LearningRate 0.1521   Epoch: 6   Global Step: 67120   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:54:05,090-Speed 5423.63 samples/sec   Loss 7.6912   LearningRate 0.1520   Epoch: 6   Global Step: 67130   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:54:12,527-Speed 5508.89 samples/sec   Loss 7.6489   LearningRate 0.1520   Epoch: 6   Global Step: 67140   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:54:20,005-Speed 5477.69 samples/sec   Loss 7.7301   LearningRate 0.1520   Epoch: 6   Global Step: 67150   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:54:27,476-Speed 5483.50 samples/sec   Loss 7.7162   LearningRate 0.1520   Epoch: 6   Global Step: 67160   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:54:34,956-Speed 5476.43 samples/sec   Loss 7.7414   LearningRate 0.1519   Epoch: 6   Global Step: 67170   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:54:42,527-Speed 5410.68 samples/sec   Loss 7.6942   LearningRate 0.1519   Epoch: 6   Global Step: 67180   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:54:50,308-Speed 5264.72 samples/sec   Loss 7.6907   LearningRate 0.1519   Epoch: 6   Global Step: 67190   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:54:57,802-Speed 5466.92 samples/sec   Loss 7.7393   LearningRate 0.1519   Epoch: 6   Global Step: 67200   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:55:05,367-Speed 5414.98 samples/sec   Loss 7.7025   LearningRate 0.1519   Epoch: 6   Global Step: 67210   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:55:12,908-Speed 5432.73 samples/sec   Loss 7.7412   LearningRate 0.1518   Epoch: 6   Global Step: 67220   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:55:20,460-Speed 5424.44 samples/sec   Loss 7.7062   LearningRate 0.1518   Epoch: 6   Global Step: 67230   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:55:27,950-Speed 5469.32 samples/sec   Loss 7.6522   LearningRate 0.1518   Epoch: 6   Global Step: 67240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:55:35,541-Speed 5396.37 samples/sec   Loss 7.6836   LearningRate 0.1518   Epoch: 6   Global Step: 67250   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:55:43,272-Speed 5298.98 samples/sec   Loss 7.6784   LearningRate 0.1518   Epoch: 6   Global Step: 67260   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:55:50,770-Speed 5463.89 samples/sec   Loss 7.7201   LearningRate 0.1517   Epoch: 6   Global Step: 67270   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:55:58,271-Speed 5461.01 samples/sec   Loss 7.7329   LearningRate 0.1517   Epoch: 6   Global Step: 67280   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:56:05,826-Speed 5422.70 samples/sec   Loss 7.7358   LearningRate 0.1517   Epoch: 6   Global Step: 67290   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:56:13,417-Speed 5396.01 samples/sec   Loss 7.7358   LearningRate 0.1517   Epoch: 6   Global Step: 67300   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:56:21,010-Speed 5395.59 samples/sec   Loss 7.7452   LearningRate 0.1516   Epoch: 6   Global Step: 67310   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:56:28,527-Speed 5449.97 samples/sec   Loss 7.7421   LearningRate 0.1516   Epoch: 6   Global Step: 67320   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:56:35,985-Speed 5492.57 samples/sec   Loss 7.7223   LearningRate 0.1516   Epoch: 6   Global Step: 67330   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:56:43,485-Speed 5462.68 samples/sec   Loss 7.6236   LearningRate 0.1516   Epoch: 6   Global Step: 67340   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:56:51,065-Speed 5404.02 samples/sec   Loss 7.7511   LearningRate 0.1516   Epoch: 6   Global Step: 67350   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:56:58,503-Speed 5507.51 samples/sec   Loss 7.7388   LearningRate 0.1515   Epoch: 6   Global Step: 67360   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:57:05,957-Speed 5495.69 samples/sec   Loss 7.6900   LearningRate 0.1515   Epoch: 6   Global Step: 67370   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:57:13,485-Speed 5442.65 samples/sec   Loss 7.6767   LearningRate 0.1515   Epoch: 6   Global Step: 67380   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:57:21,243-Speed 5279.86 samples/sec   Loss 7.6359   LearningRate 0.1515   Epoch: 6   Global Step: 67390   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:57:29,014-Speed 5271.84 samples/sec   Loss 7.6627   LearningRate 0.1515   Epoch: 6   Global Step: 67400   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:57:36,586-Speed 5410.29 samples/sec   Loss 7.6691   LearningRate 0.1514   Epoch: 6   Global Step: 67410   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:57:44,097-Speed 5454.81 samples/sec   Loss 7.6915   LearningRate 0.1514   Epoch: 6   Global Step: 67420   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:57:51,631-Speed 5437.02 samples/sec   Loss 7.6423   LearningRate 0.1514   Epoch: 6   Global Step: 67430   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:57:59,153-Speed 5445.68 samples/sec   Loss 7.7244   LearningRate 0.1514   Epoch: 6   Global Step: 67440   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:58:06,677-Speed 5444.71 samples/sec   Loss 7.7379   LearningRate 0.1513   Epoch: 6   Global Step: 67450   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:58:14,147-Speed 5484.28 samples/sec   Loss 7.7236   LearningRate 0.1513   Epoch: 6   Global Step: 67460   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:58:21,603-Speed 5494.01 samples/sec   Loss 7.6912   LearningRate 0.1513   Epoch: 6   Global Step: 67470   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:58:29,153-Speed 5426.19 samples/sec   Loss 7.6434   LearningRate 0.1513   Epoch: 6   Global Step: 67480   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:58:36,695-Speed 5431.45 samples/sec   Loss 7.6499   LearningRate 0.1513   Epoch: 6   Global Step: 67490   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:58:44,127-Speed 5512.47 samples/sec   Loss 7.6599   LearningRate 0.1512   Epoch: 6   Global Step: 67500   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:58:51,635-Speed 5456.42 samples/sec   Loss 7.6490   LearningRate 0.1512   Epoch: 6   Global Step: 67510   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:58:59,319-Speed 5330.73 samples/sec   Loss 7.6867   LearningRate 0.1512   Epoch: 6   Global Step: 67520   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:59:06,819-Speed 5462.29 samples/sec   Loss 7.7161   LearningRate 0.1512   Epoch: 6   Global Step: 67530   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 09:59:14,312-Speed 5467.21 samples/sec   Loss 7.6868   LearningRate 0.1511   Epoch: 6   Global Step: 67540   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:59:21,788-Speed 5479.37 samples/sec   Loss 7.7082   LearningRate 0.1511   Epoch: 6   Global Step: 67550   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:59:29,320-Speed 5439.40 samples/sec   Loss 7.6989   LearningRate 0.1511   Epoch: 6   Global Step: 67560   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:59:36,779-Speed 5491.86 samples/sec   Loss 7.6838   LearningRate 0.1511   Epoch: 6   Global Step: 67570   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:59:44,365-Speed 5399.92 samples/sec   Loss 7.6439   LearningRate 0.1511   Epoch: 6   Global Step: 67580   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:59:51,799-Speed 5510.61 samples/sec   Loss 7.6800   LearningRate 0.1510   Epoch: 6   Global Step: 67590   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 09:59:59,347-Speed 5427.42 samples/sec   Loss 7.7289   LearningRate 0.1510   Epoch: 6   Global Step: 67600   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:00:06,931-Speed 5401.21 samples/sec   Loss 7.6702   LearningRate 0.1510   Epoch: 6   Global Step: 67610   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:00:14,398-Speed 5486.38 samples/sec   Loss 7.6241   LearningRate 0.1510   Epoch: 6   Global Step: 67620   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:00:21,983-Speed 5400.84 samples/sec   Loss 7.6175   LearningRate 0.1510   Epoch: 6   Global Step: 67630   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:00:29,457-Speed 5481.04 samples/sec   Loss 7.6389   LearningRate 0.1509   Epoch: 6   Global Step: 67640   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:00:36,978-Speed 5446.97 samples/sec   Loss 7.5974   LearningRate 0.1509   Epoch: 6   Global Step: 67650   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:00:44,442-Speed 5488.78 samples/sec   Loss 7.6121   LearningRate 0.1509   Epoch: 6   Global Step: 67660   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:00:52,100-Speed 5349.21 samples/sec   Loss 7.6615   LearningRate 0.1509   Epoch: 6   Global Step: 67670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:00:59,614-Speed 5451.98 samples/sec   Loss 7.6419   LearningRate 0.1508   Epoch: 6   Global Step: 67680   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:01:07,068-Speed 5495.10 samples/sec   Loss 7.7262   LearningRate 0.1508   Epoch: 6   Global Step: 67690   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:01:14,528-Speed 5491.36 samples/sec   Loss 7.6796   LearningRate 0.1508   Epoch: 6   Global Step: 67700   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:01:22,063-Speed 5437.61 samples/sec   Loss 7.6778   LearningRate 0.1508   Epoch: 6   Global Step: 67710   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:01:29,704-Speed 5360.73 samples/sec   Loss 7.6423   LearningRate 0.1508   Epoch: 6   Global Step: 67720   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:01:37,287-Speed 5401.84 samples/sec   Loss 7.7165   LearningRate 0.1507   Epoch: 6   Global Step: 67730   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:01:44,850-Speed 5416.77 samples/sec   Loss 7.6535   LearningRate 0.1507   Epoch: 6   Global Step: 67740   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:01:52,386-Speed 5436.32 samples/sec   Loss 7.7501   LearningRate 0.1507   Epoch: 6   Global Step: 67750   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:01:59,921-Speed 5436.37 samples/sec   Loss 7.6729   LearningRate 0.1507   Epoch: 6   Global Step: 67760   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:02:07,437-Speed 5450.63 samples/sec   Loss 7.6073   LearningRate 0.1507   Epoch: 6   Global Step: 67770   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:02:14,981-Speed 5429.44 samples/sec   Loss 7.6233   LearningRate 0.1506   Epoch: 6   Global Step: 67780   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:02:22,481-Speed 5462.97 samples/sec   Loss 7.6233   LearningRate 0.1506   Epoch: 6   Global Step: 67790   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:02:29,997-Speed 5450.47 samples/sec   Loss 7.7003   LearningRate 0.1506   Epoch: 6   Global Step: 67800   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:02:37,586-Speed 5397.08 samples/sec   Loss 7.7044   LearningRate 0.1506   Epoch: 6   Global Step: 67810   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:02:45,074-Speed 5471.37 samples/sec   Loss 7.7337   LearningRate 0.1505   Epoch: 6   Global Step: 67820   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:02:52,757-Speed 5331.92 samples/sec   Loss 7.6861   LearningRate 0.1505   Epoch: 6   Global Step: 67830   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:03:00,381-Speed 5373.68 samples/sec   Loss 7.6756   LearningRate 0.1505   Epoch: 6   Global Step: 67840   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:03:08,149-Speed 5273.36 samples/sec   Loss 7.6642   LearningRate 0.1505   Epoch: 6   Global Step: 67850   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:03:15,769-Speed 5375.92 samples/sec   Loss 7.6592   LearningRate 0.1505   Epoch: 6   Global Step: 67860   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:03:23,387-Speed 5377.44 samples/sec   Loss 7.6856   LearningRate 0.1504   Epoch: 6   Global Step: 67870   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:03:31,082-Speed 5323.59 samples/sec   Loss 7.6354   LearningRate 0.1504   Epoch: 6   Global Step: 67880   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:03:38,703-Speed 5375.78 samples/sec   Loss 7.6888   LearningRate 0.1504   Epoch: 6   Global Step: 67890   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:03:46,191-Speed 5470.62 samples/sec   Loss 7.6533   LearningRate 0.1504   Epoch: 6   Global Step: 67900   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:03:53,651-Speed 5491.47 samples/sec   Loss 7.6330   LearningRate 0.1503   Epoch: 6   Global Step: 67910   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:04:01,186-Speed 5437.24 samples/sec   Loss 7.6387   LearningRate 0.1503   Epoch: 6   Global Step: 67920   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:04:08,677-Speed 5468.04 samples/sec   Loss 7.6395   LearningRate 0.1503   Epoch: 6   Global Step: 67930   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:04:16,195-Speed 5448.87 samples/sec   Loss 7.7330   LearningRate 0.1503   Epoch: 6   Global Step: 67940   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:04:23,803-Speed 5384.88 samples/sec   Loss 7.6588   LearningRate 0.1503   Epoch: 6   Global Step: 67950   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:04:31,372-Speed 5412.76 samples/sec   Loss 7.6920   LearningRate 0.1502   Epoch: 6   Global Step: 67960   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:04:38,844-Speed 5481.77 samples/sec   Loss 7.6785   LearningRate 0.1502   Epoch: 6   Global Step: 67970   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:04:46,408-Speed 5416.32 samples/sec   Loss 7.6326   LearningRate 0.1502   Epoch: 6   Global Step: 67980   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:04:53,907-Speed 5464.43 samples/sec   Loss 7.6793   LearningRate 0.1502   Epoch: 6   Global Step: 67990   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:05:01,435-Speed 5442.22 samples/sec   Loss 7.6874   LearningRate 0.1502   Epoch: 6   Global Step: 68000   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:05:45,468-[lfw][68000]XNorm: 24.109080
Training: 2022-01-08 10:05:45,468-[lfw][68000]Accuracy-Flip: 0.99750+-0.00261
Training: 2022-01-08 10:05:45,469-[lfw][68000]Accuracy-Highest: 0.99817
Training: 2022-01-08 10:06:37,483-[cfp_fp][68000]XNorm: 22.016478
Training: 2022-01-08 10:06:37,484-[cfp_fp][68000]Accuracy-Flip: 0.98571+-0.00515
Training: 2022-01-08 10:06:37,485-[cfp_fp][68000]Accuracy-Highest: 0.98600
Training: 2022-01-08 10:07:24,082-[agedb_30][68000]XNorm: 24.030862
Training: 2022-01-08 10:07:24,084-[agedb_30][68000]Accuracy-Flip: 0.97500+-0.00632
Training: 2022-01-08 10:07:24,084-[agedb_30][68000]Accuracy-Highest: 0.97667
Training: 2022-01-08 10:07:31,600-Speed 272.77 samples/sec   Loss 7.6134   LearningRate 0.1501   Epoch: 6   Global Step: 68010   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:07:39,147-Speed 5429.29 samples/sec   Loss 7.6032   LearningRate 0.1501   Epoch: 6   Global Step: 68020   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:07:46,753-Speed 5386.03 samples/sec   Loss 7.7205   LearningRate 0.1501   Epoch: 6   Global Step: 68030   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:07:54,403-Speed 5356.38 samples/sec   Loss 7.6617   LearningRate 0.1501   Epoch: 6   Global Step: 68040   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:08:02,051-Speed 5356.73 samples/sec   Loss 7.6328   LearningRate 0.1500   Epoch: 6   Global Step: 68050   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:08:09,615-Speed 5415.97 samples/sec   Loss 7.6074   LearningRate 0.1500   Epoch: 6   Global Step: 68060   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:08:17,069-Speed 5496.02 samples/sec   Loss 7.6357   LearningRate 0.1500   Epoch: 6   Global Step: 68070   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:08:24,644-Speed 5407.55 samples/sec   Loss 7.6218   LearningRate 0.1500   Epoch: 6   Global Step: 68080   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:08:32,233-Speed 5397.99 samples/sec   Loss 7.6123   LearningRate 0.1500   Epoch: 6   Global Step: 68090   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:08:39,787-Speed 5423.24 samples/sec   Loss 7.5991   LearningRate 0.1499   Epoch: 6   Global Step: 68100   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:08:47,353-Speed 5415.01 samples/sec   Loss 7.6384   LearningRate 0.1499   Epoch: 6   Global Step: 68110   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:08:54,968-Speed 5379.38 samples/sec   Loss 7.6592   LearningRate 0.1499   Epoch: 6   Global Step: 68120   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 10:09:02,447-Speed 5477.33 samples/sec   Loss 7.7046   LearningRate 0.1499   Epoch: 6   Global Step: 68130   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:09:09,946-Speed 5462.53 samples/sec   Loss 7.6019   LearningRate 0.1499   Epoch: 6   Global Step: 68140   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:09:17,577-Speed 5368.38 samples/sec   Loss 7.6221   LearningRate 0.1498   Epoch: 6   Global Step: 68150   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:09:25,107-Speed 5440.09 samples/sec   Loss 7.6643   LearningRate 0.1498   Epoch: 6   Global Step: 68160   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:09:32,715-Speed 5384.36 samples/sec   Loss 7.6352   LearningRate 0.1498   Epoch: 6   Global Step: 68170   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:09:40,195-Speed 5476.81 samples/sec   Loss 7.6777   LearningRate 0.1498   Epoch: 6   Global Step: 68180   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:09:47,815-Speed 5376.22 samples/sec   Loss 7.6524   LearningRate 0.1497   Epoch: 6   Global Step: 68190   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:09:55,409-Speed 5394.43 samples/sec   Loss 7.6551   LearningRate 0.1497   Epoch: 6   Global Step: 68200   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:10:02,948-Speed 5433.56 samples/sec   Loss 7.6794   LearningRate 0.1497   Epoch: 6   Global Step: 68210   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:10:10,547-Speed 5390.86 samples/sec   Loss 7.6745   LearningRate 0.1497   Epoch: 6   Global Step: 68220   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:10:18,016-Speed 5485.23 samples/sec   Loss 7.6507   LearningRate 0.1497   Epoch: 6   Global Step: 68230   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:10:25,445-Speed 5514.09 samples/sec   Loss 7.6335   LearningRate 0.1496   Epoch: 6   Global Step: 68240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:10:32,979-Speed 5436.85 samples/sec   Loss 7.6264   LearningRate 0.1496   Epoch: 6   Global Step: 68250   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:10:40,436-Speed 5493.74 samples/sec   Loss 7.6547   LearningRate 0.1496   Epoch: 6   Global Step: 68260   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:10:48,130-Speed 5324.47 samples/sec   Loss 7.6174   LearningRate 0.1496   Epoch: 6   Global Step: 68270   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:10:55,690-Speed 5418.79 samples/sec   Loss 7.6054   LearningRate 0.1496   Epoch: 6   Global Step: 68280   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:11:03,295-Speed 5386.48 samples/sec   Loss 7.6436   LearningRate 0.1495   Epoch: 6   Global Step: 68290   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:11:10,812-Speed 5449.69 samples/sec   Loss 7.5445   LearningRate 0.1495   Epoch: 6   Global Step: 68300   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:11:18,412-Speed 5389.99 samples/sec   Loss 7.6803   LearningRate 0.1495   Epoch: 6   Global Step: 68310   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:11:25,955-Speed 5431.07 samples/sec   Loss 7.6722   LearningRate 0.1495   Epoch: 6   Global Step: 68320   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:11:33,754-Speed 5252.15 samples/sec   Loss 7.6542   LearningRate 0.1494   Epoch: 6   Global Step: 68330   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:11:41,701-Speed 5154.96 samples/sec   Loss 7.6189   LearningRate 0.1494   Epoch: 6   Global Step: 68340   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:11:49,519-Speed 5240.08 samples/sec   Loss 7.6291   LearningRate 0.1494   Epoch: 6   Global Step: 68350   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:11:57,179-Speed 5347.56 samples/sec   Loss 7.6066   LearningRate 0.1494   Epoch: 6   Global Step: 68360   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:12:05,019-Speed 5225.33 samples/sec   Loss 7.7339   LearningRate 0.1494   Epoch: 6   Global Step: 68370   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:12:12,863-Speed 5222.26 samples/sec   Loss 7.6609   LearningRate 0.1493   Epoch: 6   Global Step: 68380   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:12:20,886-Speed 5106.02 samples/sec   Loss 7.6132   LearningRate 0.1493   Epoch: 6   Global Step: 68390   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:12:28,580-Speed 5324.27 samples/sec   Loss 7.6209   LearningRate 0.1493   Epoch: 6   Global Step: 68400   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:12:36,174-Speed 5394.71 samples/sec   Loss 7.6158   LearningRate 0.1493   Epoch: 6   Global Step: 68410   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:12:43,796-Speed 5374.93 samples/sec   Loss 7.6185   LearningRate 0.1493   Epoch: 6   Global Step: 68420   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:12:51,432-Speed 5364.57 samples/sec   Loss 7.5738   LearningRate 0.1492   Epoch: 6   Global Step: 68430   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:12:58,959-Speed 5442.16 samples/sec   Loss 7.6245   LearningRate 0.1492   Epoch: 6   Global Step: 68440   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:13:06,473-Speed 5451.88 samples/sec   Loss 7.6187   LearningRate 0.1492   Epoch: 6   Global Step: 68450   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:13:13,966-Speed 5467.77 samples/sec   Loss 7.7309   LearningRate 0.1492   Epoch: 6   Global Step: 68460   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:13:21,497-Speed 5439.95 samples/sec   Loss 7.6646   LearningRate 0.1491   Epoch: 6   Global Step: 68470   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:13:29,046-Speed 5426.09 samples/sec   Loss 7.6324   LearningRate 0.1491   Epoch: 6   Global Step: 68480   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:13:36,668-Speed 5374.56 samples/sec   Loss 7.5993   LearningRate 0.1491   Epoch: 6   Global Step: 68490   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:13:44,184-Speed 5450.94 samples/sec   Loss 7.7222   LearningRate 0.1491   Epoch: 6   Global Step: 68500   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:13:51,754-Speed 5411.37 samples/sec   Loss 7.6082   LearningRate 0.1491   Epoch: 6   Global Step: 68510   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 10:13:59,415-Speed 5347.62 samples/sec   Loss 7.5890   LearningRate 0.1490   Epoch: 6   Global Step: 68520   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 10:14:07,078-Speed 5345.78 samples/sec   Loss 7.5997   LearningRate 0.1490   Epoch: 6   Global Step: 68530   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:14:14,729-Speed 5354.30 samples/sec   Loss 7.6680   LearningRate 0.1490   Epoch: 6   Global Step: 68540   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:14:22,334-Speed 5387.29 samples/sec   Loss 7.6778   LearningRate 0.1490   Epoch: 6   Global Step: 68550   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:14:29,985-Speed 5353.92 samples/sec   Loss 7.6176   LearningRate 0.1490   Epoch: 6   Global Step: 68560   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:14:37,684-Speed 5320.12 samples/sec   Loss 7.5955   LearningRate 0.1489   Epoch: 6   Global Step: 68570   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:14:45,250-Speed 5414.68 samples/sec   Loss 7.5659   LearningRate 0.1489   Epoch: 6   Global Step: 68580   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:14:52,716-Speed 5487.29 samples/sec   Loss 7.6193   LearningRate 0.1489   Epoch: 6   Global Step: 68590   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:15:00,387-Speed 5340.44 samples/sec   Loss 7.6226   LearningRate 0.1489   Epoch: 6   Global Step: 68600   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:15:07,854-Speed 5485.71 samples/sec   Loss 7.5797   LearningRate 0.1488   Epoch: 6   Global Step: 68610   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:15:15,410-Speed 5421.69 samples/sec   Loss 7.5659   LearningRate 0.1488   Epoch: 6   Global Step: 68620   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:15:22,918-Speed 5456.43 samples/sec   Loss 7.5750   LearningRate 0.1488   Epoch: 6   Global Step: 68630   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 10:15:30,711-Speed 5256.75 samples/sec   Loss 7.7017   LearningRate 0.1488   Epoch: 6   Global Step: 68640   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 10:15:38,318-Speed 5384.40 samples/sec   Loss 7.6668   LearningRate 0.1488   Epoch: 6   Global Step: 68650   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 10:15:45,895-Speed 5407.02 samples/sec   Loss 7.6123   LearningRate 0.1487   Epoch: 6   Global Step: 68660   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:15:53,509-Speed 5380.47 samples/sec   Loss 7.6097   LearningRate 0.1487   Epoch: 6   Global Step: 68670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:16:01,155-Speed 5357.78 samples/sec   Loss 7.5984   LearningRate 0.1487   Epoch: 6   Global Step: 68680   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:16:08,856-Speed 5319.03 samples/sec   Loss 7.6189   LearningRate 0.1487   Epoch: 6   Global Step: 68690   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:16:16,398-Speed 5431.70 samples/sec   Loss 7.6716   LearningRate 0.1487   Epoch: 6   Global Step: 68700   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:16:23,988-Speed 5397.91 samples/sec   Loss 7.5513   LearningRate 0.1486   Epoch: 6   Global Step: 68710   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:16:31,511-Speed 5444.93 samples/sec   Loss 7.6239   LearningRate 0.1486   Epoch: 6   Global Step: 68720   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:16:39,075-Speed 5415.54 samples/sec   Loss 7.6361   LearningRate 0.1486   Epoch: 6   Global Step: 68730   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:16:46,763-Speed 5329.03 samples/sec   Loss 7.6285   LearningRate 0.1486   Epoch: 6   Global Step: 68740   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:16:54,483-Speed 5306.47 samples/sec   Loss 7.6353   LearningRate 0.1485   Epoch: 6   Global Step: 68750   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:17:02,097-Speed 5379.76 samples/sec   Loss 7.6193   LearningRate 0.1485   Epoch: 6   Global Step: 68760   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:17:09,611-Speed 5451.37 samples/sec   Loss 7.5687   LearningRate 0.1485   Epoch: 6   Global Step: 68770   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:17:17,175-Speed 5416.20 samples/sec   Loss 7.5762   LearningRate 0.1485   Epoch: 6   Global Step: 68780   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:17:24,836-Speed 5347.23 samples/sec   Loss 7.5846   LearningRate 0.1485   Epoch: 6   Global Step: 68790   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:17:32,406-Speed 5411.25 samples/sec   Loss 7.5008   LearningRate 0.1484   Epoch: 6   Global Step: 68800   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:17:39,967-Speed 5417.70 samples/sec   Loss 7.5925   LearningRate 0.1484   Epoch: 6   Global Step: 68810   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:17:47,503-Speed 5436.42 samples/sec   Loss 7.5926   LearningRate 0.1484   Epoch: 6   Global Step: 68820   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:17:55,098-Speed 5393.77 samples/sec   Loss 7.6443   LearningRate 0.1484   Epoch: 6   Global Step: 68830   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:18:02,640-Speed 5431.42 samples/sec   Loss 7.5787   LearningRate 0.1484   Epoch: 6   Global Step: 68840   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:18:10,210-Speed 5411.72 samples/sec   Loss 7.5985   LearningRate 0.1483   Epoch: 6   Global Step: 68850   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:18:17,856-Speed 5357.98 samples/sec   Loss 7.5673   LearningRate 0.1483   Epoch: 6   Global Step: 68860   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:18:25,374-Speed 5449.13 samples/sec   Loss 7.5272   LearningRate 0.1483   Epoch: 6   Global Step: 68870   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:18:32,961-Speed 5398.84 samples/sec   Loss 7.6372   LearningRate 0.1483   Epoch: 6   Global Step: 68880   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:18:40,569-Speed 5384.69 samples/sec   Loss 7.5228   LearningRate 0.1482   Epoch: 6   Global Step: 68890   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:18:48,099-Speed 5440.68 samples/sec   Loss 7.5887   LearningRate 0.1482   Epoch: 6   Global Step: 68900   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:18:55,756-Speed 5349.72 samples/sec   Loss 7.5967   LearningRate 0.1482   Epoch: 6   Global Step: 68910   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:19:03,443-Speed 5329.09 samples/sec   Loss 7.5614   LearningRate 0.1482   Epoch: 6   Global Step: 68920   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:19:11,037-Speed 5394.70 samples/sec   Loss 7.6004   LearningRate 0.1482   Epoch: 6   Global Step: 68930   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:19:18,638-Speed 5389.24 samples/sec   Loss 7.5920   LearningRate 0.1481   Epoch: 6   Global Step: 68940   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:19:26,256-Speed 5377.85 samples/sec   Loss 7.5919   LearningRate 0.1481   Epoch: 6   Global Step: 68950   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:19:33,780-Speed 5443.91 samples/sec   Loss 7.5686   LearningRate 0.1481   Epoch: 6   Global Step: 68960   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:19:41,355-Speed 5408.62 samples/sec   Loss 7.5858   LearningRate 0.1481   Epoch: 6   Global Step: 68970   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:19:49,028-Speed 5339.25 samples/sec   Loss 7.5719   LearningRate 0.1481   Epoch: 6   Global Step: 68980   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:19:56,781-Speed 5283.81 samples/sec   Loss 7.6165   LearningRate 0.1480   Epoch: 6   Global Step: 68990   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:20:04,453-Speed 5339.33 samples/sec   Loss 7.6535   LearningRate 0.1480   Epoch: 6   Global Step: 69000   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:20:12,044-Speed 5396.10 samples/sec   Loss 7.5734   LearningRate 0.1480   Epoch: 6   Global Step: 69010   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:20:19,641-Speed 5392.98 samples/sec   Loss 7.6491   LearningRate 0.1480   Epoch: 6   Global Step: 69020   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:20:27,193-Speed 5424.41 samples/sec   Loss 7.5385   LearningRate 0.1479   Epoch: 6   Global Step: 69030   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:20:34,791-Speed 5390.99 samples/sec   Loss 7.5647   LearningRate 0.1479   Epoch: 6   Global Step: 69040   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:20:42,331-Speed 5433.73 samples/sec   Loss 7.5917   LearningRate 0.1479   Epoch: 6   Global Step: 69050   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:20:49,841-Speed 5454.90 samples/sec   Loss 7.5750   LearningRate 0.1479   Epoch: 6   Global Step: 69060   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:20:57,389-Speed 5427.40 samples/sec   Loss 7.5636   LearningRate 0.1479   Epoch: 6   Global Step: 69070   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:21:04,986-Speed 5392.37 samples/sec   Loss 7.6024   LearningRate 0.1478   Epoch: 6   Global Step: 69080   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:21:12,520-Speed 5436.97 samples/sec   Loss 7.6161   LearningRate 0.1478   Epoch: 6   Global Step: 69090   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:21:20,149-Speed 5369.94 samples/sec   Loss 7.5876   LearningRate 0.1478   Epoch: 6   Global Step: 69100   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:21:27,723-Speed 5408.98 samples/sec   Loss 7.5342   LearningRate 0.1478   Epoch: 6   Global Step: 69110   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:21:35,400-Speed 5335.91 samples/sec   Loss 7.5884   LearningRate 0.1478   Epoch: 6   Global Step: 69120   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:21:43,005-Speed 5386.47 samples/sec   Loss 7.5926   LearningRate 0.1477   Epoch: 6   Global Step: 69130   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:21:50,554-Speed 5427.03 samples/sec   Loss 7.6241   LearningRate 0.1477   Epoch: 6   Global Step: 69140   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:21:58,113-Speed 5419.56 samples/sec   Loss 7.5662   LearningRate 0.1477   Epoch: 6   Global Step: 69150   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:22:05,700-Speed 5398.83 samples/sec   Loss 7.5585   LearningRate 0.1477   Epoch: 6   Global Step: 69160   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:22:13,337-Speed 5364.03 samples/sec   Loss 7.5528   LearningRate 0.1476   Epoch: 6   Global Step: 69170   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:22:21,001-Speed 5345.27 samples/sec   Loss 7.6314   LearningRate 0.1476   Epoch: 6   Global Step: 69180   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:22:28,529-Speed 5441.72 samples/sec   Loss 7.5757   LearningRate 0.1476   Epoch: 6   Global Step: 69190   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:22:36,164-Speed 5365.24 samples/sec   Loss 7.6204   LearningRate 0.1476   Epoch: 6   Global Step: 69200   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:22:43,730-Speed 5414.45 samples/sec   Loss 7.6119   LearningRate 0.1476   Epoch: 6   Global Step: 69210   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:22:51,360-Speed 5368.87 samples/sec   Loss 7.5985   LearningRate 0.1475   Epoch: 6   Global Step: 69220   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:22:59,453-Speed 5062.49 samples/sec   Loss 7.5469   LearningRate 0.1475   Epoch: 6   Global Step: 69230   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:23:07,019-Speed 5413.84 samples/sec   Loss 7.5464   LearningRate 0.1475   Epoch: 6   Global Step: 69240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:23:14,657-Speed 5363.10 samples/sec   Loss 7.5991   LearningRate 0.1475   Epoch: 6   Global Step: 69250   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:23:22,247-Speed 5397.14 samples/sec   Loss 7.5358   LearningRate 0.1475   Epoch: 6   Global Step: 69260   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:23:29,813-Speed 5414.83 samples/sec   Loss 7.5316   LearningRate 0.1474   Epoch: 6   Global Step: 69270   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:23:37,364-Speed 5425.45 samples/sec   Loss 7.5718   LearningRate 0.1474   Epoch: 6   Global Step: 69280   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:23:44,900-Speed 5435.71 samples/sec   Loss 7.5434   LearningRate 0.1474   Epoch: 6   Global Step: 69290   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:23:52,488-Speed 5399.04 samples/sec   Loss 7.5578   LearningRate 0.1474   Epoch: 6   Global Step: 69300   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:24:00,005-Speed 5449.30 samples/sec   Loss 7.6373   LearningRate 0.1473   Epoch: 6   Global Step: 69310   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:24:07,643-Speed 5363.15 samples/sec   Loss 7.5745   LearningRate 0.1473   Epoch: 6   Global Step: 69320   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:24:15,233-Speed 5397.10 samples/sec   Loss 7.6714   LearningRate 0.1473   Epoch: 6   Global Step: 69330   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:24:22,864-Speed 5368.84 samples/sec   Loss 7.5078   LearningRate 0.1473   Epoch: 6   Global Step: 69340   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:24:30,581-Speed 5308.41 samples/sec   Loss 7.5939   LearningRate 0.1473   Epoch: 6   Global Step: 69350   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:24:38,079-Speed 5463.79 samples/sec   Loss 7.5463   LearningRate 0.1472   Epoch: 6   Global Step: 69360   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:24:45,631-Speed 5424.03 samples/sec   Loss 7.5742   LearningRate 0.1472   Epoch: 6   Global Step: 69370   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:24:53,212-Speed 5403.53 samples/sec   Loss 7.5015   LearningRate 0.1472   Epoch: 6   Global Step: 69380   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:25:00,830-Speed 5377.68 samples/sec   Loss 7.5967   LearningRate 0.1472   Epoch: 6   Global Step: 69390   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:25:08,430-Speed 5389.79 samples/sec   Loss 7.5829   LearningRate 0.1472   Epoch: 6   Global Step: 69400   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:25:16,008-Speed 5405.81 samples/sec   Loss 7.5381   LearningRate 0.1471   Epoch: 6   Global Step: 69410   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:25:23,530-Speed 5446.48 samples/sec   Loss 7.6010   LearningRate 0.1471   Epoch: 6   Global Step: 69420   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:25:31,042-Speed 5453.60 samples/sec   Loss 7.5794   LearningRate 0.1471   Epoch: 6   Global Step: 69430   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:25:38,622-Speed 5404.03 samples/sec   Loss 7.5245   LearningRate 0.1471   Epoch: 6   Global Step: 69440   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:25:46,166-Speed 5430.45 samples/sec   Loss 7.6042   LearningRate 0.1470   Epoch: 6   Global Step: 69450   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:25:53,663-Speed 5464.15 samples/sec   Loss 7.5524   LearningRate 0.1470   Epoch: 6   Global Step: 69460   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:26:01,344-Speed 5333.85 samples/sec   Loss 7.5681   LearningRate 0.1470   Epoch: 6   Global Step: 69470   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:26:08,922-Speed 5405.45 samples/sec   Loss 7.6015   LearningRate 0.1470   Epoch: 6   Global Step: 69480   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:26:16,538-Speed 5378.84 samples/sec   Loss 7.5671   LearningRate 0.1470   Epoch: 6   Global Step: 69490   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:26:24,272-Speed 5296.38 samples/sec   Loss 7.5051   LearningRate 0.1469   Epoch: 6   Global Step: 69500   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:26:31,864-Speed 5396.39 samples/sec   Loss 7.5528   LearningRate 0.1469   Epoch: 6   Global Step: 69510   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:26:39,521-Speed 5349.63 samples/sec   Loss 7.5101   LearningRate 0.1469   Epoch: 6   Global Step: 69520   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:26:47,361-Speed 5225.44 samples/sec   Loss 7.5689   LearningRate 0.1469   Epoch: 6   Global Step: 69530   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:26:54,925-Speed 5415.43 samples/sec   Loss 7.5975   LearningRate 0.1469   Epoch: 6   Global Step: 69540   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:27:02,408-Speed 5475.16 samples/sec   Loss 7.5955   LearningRate 0.1468   Epoch: 6   Global Step: 69550   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:27:09,940-Speed 5438.67 samples/sec   Loss 7.5536   LearningRate 0.1468   Epoch: 6   Global Step: 69560   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:27:17,423-Speed 5473.95 samples/sec   Loss 7.6078   LearningRate 0.1468   Epoch: 6   Global Step: 69570   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:27:24,914-Speed 5469.10 samples/sec   Loss 7.5529   LearningRate 0.1468   Epoch: 6   Global Step: 69580   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:27:32,614-Speed 5319.98 samples/sec   Loss 7.4436   LearningRate 0.1467   Epoch: 6   Global Step: 69590   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:27:40,346-Speed 5298.05 samples/sec   Loss 7.5839   LearningRate 0.1467   Epoch: 6   Global Step: 69600   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:27:47,908-Speed 5417.06 samples/sec   Loss 7.5705   LearningRate 0.1467   Epoch: 6   Global Step: 69610   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:27:55,527-Speed 5377.17 samples/sec   Loss 7.5869   LearningRate 0.1467   Epoch: 6   Global Step: 69620   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:28:03,174-Speed 5357.10 samples/sec   Loss 7.5468   LearningRate 0.1467   Epoch: 6   Global Step: 69630   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:28:10,745-Speed 5410.44 samples/sec   Loss 7.5084   LearningRate 0.1466   Epoch: 6   Global Step: 69640   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:28:18,312-Speed 5413.51 samples/sec   Loss 7.5744   LearningRate 0.1466   Epoch: 6   Global Step: 69650   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:28:25,990-Speed 5335.81 samples/sec   Loss 7.5228   LearningRate 0.1466   Epoch: 6   Global Step: 69660   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:28:33,488-Speed 5463.54 samples/sec   Loss 7.6154   LearningRate 0.1466   Epoch: 6   Global Step: 69670   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:28:41,141-Speed 5352.98 samples/sec   Loss 7.6155   LearningRate 0.1466   Epoch: 6   Global Step: 69680   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:28:48,707-Speed 5413.69 samples/sec   Loss 7.5580   LearningRate 0.1465   Epoch: 6   Global Step: 69690   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:28:56,275-Speed 5413.72 samples/sec   Loss 7.5237   LearningRate 0.1465   Epoch: 6   Global Step: 69700   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:29:03,964-Speed 5328.03 samples/sec   Loss 7.5436   LearningRate 0.1465   Epoch: 6   Global Step: 69710   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:29:11,597-Speed 5366.36 samples/sec   Loss 7.5445   LearningRate 0.1465   Epoch: 6   Global Step: 69720   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:29:19,219-Speed 5374.97 samples/sec   Loss 7.5116   LearningRate 0.1465   Epoch: 6   Global Step: 69730   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:29:26,987-Speed 5273.46 samples/sec   Loss 7.5745   LearningRate 0.1464   Epoch: 6   Global Step: 69740   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:29:34,631-Speed 5359.22 samples/sec   Loss 7.5153   LearningRate 0.1464   Epoch: 6   Global Step: 69750   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:29:42,270-Speed 5362.51 samples/sec   Loss 7.4908   LearningRate 0.1464   Epoch: 6   Global Step: 69760   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:29:49,907-Speed 5363.56 samples/sec   Loss 7.4854   LearningRate 0.1464   Epoch: 6   Global Step: 69770   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:29:57,513-Speed 5386.32 samples/sec   Loss 7.5707   LearningRate 0.1463   Epoch: 6   Global Step: 69780   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:30:04,992-Speed 5477.54 samples/sec   Loss 7.4819   LearningRate 0.1463   Epoch: 6   Global Step: 69790   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:30:12,682-Speed 5327.00 samples/sec   Loss 7.5315   LearningRate 0.1463   Epoch: 6   Global Step: 69800   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:30:20,302-Speed 5375.70 samples/sec   Loss 7.4904   LearningRate 0.1463   Epoch: 6   Global Step: 69810   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:30:27,884-Speed 5403.48 samples/sec   Loss 7.4865   LearningRate 0.1463   Epoch: 6   Global Step: 69820   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:30:35,389-Speed 5458.94 samples/sec   Loss 7.4932   LearningRate 0.1462   Epoch: 6   Global Step: 69830   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:30:42,984-Speed 5393.14 samples/sec   Loss 7.5347   LearningRate 0.1462   Epoch: 6   Global Step: 69840   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:30:50,482-Speed 5463.54 samples/sec   Loss 7.5042   LearningRate 0.1462   Epoch: 6   Global Step: 69850   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:30:58,099-Speed 5378.33 samples/sec   Loss 7.4928   LearningRate 0.1462   Epoch: 6   Global Step: 69860   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:31:05,739-Speed 5362.62 samples/sec   Loss 7.5234   LearningRate 0.1462   Epoch: 6   Global Step: 69870   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:31:13,352-Speed 5380.46 samples/sec   Loss 7.6225   LearningRate 0.1461   Epoch: 6   Global Step: 69880   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:31:20,828-Speed 5479.94 samples/sec   Loss 7.5666   LearningRate 0.1461   Epoch: 6   Global Step: 69890   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:31:28,322-Speed 5466.32 samples/sec   Loss 7.4901   LearningRate 0.1461   Epoch: 6   Global Step: 69900   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:31:35,994-Speed 5340.15 samples/sec   Loss 7.5363   LearningRate 0.1461   Epoch: 6   Global Step: 69910   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:31:43,670-Speed 5336.54 samples/sec   Loss 7.5850   LearningRate 0.1460   Epoch: 6   Global Step: 69920   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:31:51,239-Speed 5412.41 samples/sec   Loss 7.6363   LearningRate 0.1460   Epoch: 6   Global Step: 69930   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:31:58,865-Speed 5372.24 samples/sec   Loss 7.6133   LearningRate 0.1460   Epoch: 6   Global Step: 69940   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:32:06,509-Speed 5358.66 samples/sec   Loss 7.5516   LearningRate 0.1460   Epoch: 6   Global Step: 69950   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:32:14,036-Speed 5442.29 samples/sec   Loss 7.5099   LearningRate 0.1460   Epoch: 6   Global Step: 69960   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:32:21,686-Speed 5355.52 samples/sec   Loss 7.4933   LearningRate 0.1459   Epoch: 6   Global Step: 69970   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:32:29,389-Speed 5318.10 samples/sec   Loss 7.5691   LearningRate 0.1459   Epoch: 6   Global Step: 69980   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:32:37,002-Speed 5381.09 samples/sec   Loss 7.5447   LearningRate 0.1459   Epoch: 6   Global Step: 69990   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:32:44,498-Speed 5464.51 samples/sec   Loss 7.5572   LearningRate 0.1459   Epoch: 6   Global Step: 70000   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:33:28,705-[lfw][70000]XNorm: 22.685974
Training: 2022-01-08 10:33:28,706-[lfw][70000]Accuracy-Flip: 0.99683+-0.00273
Training: 2022-01-08 10:33:28,706-[lfw][70000]Accuracy-Highest: 0.99817
Training: 2022-01-08 10:34:20,186-[cfp_fp][70000]XNorm: 20.707267
Training: 2022-01-08 10:34:20,187-[cfp_fp][70000]Accuracy-Flip: 0.98771+-0.00410
Training: 2022-01-08 10:34:20,188-[cfp_fp][70000]Accuracy-Highest: 0.98771
Training: 2022-01-08 10:35:05,934-[agedb_30][70000]XNorm: 22.748361
Training: 2022-01-08 10:35:05,936-[agedb_30][70000]Accuracy-Flip: 0.97350+-0.00474
Training: 2022-01-08 10:35:05,936-[agedb_30][70000]Accuracy-Highest: 0.97667
Training: 2022-01-08 10:35:13,584-Speed 274.74 samples/sec   Loss 7.5159   LearningRate 0.1459   Epoch: 6   Global Step: 70010   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:35:21,110-Speed 5444.03 samples/sec   Loss 7.5007   LearningRate 0.1458   Epoch: 6   Global Step: 70020   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:35:28,613-Speed 5460.62 samples/sec   Loss 7.5405   LearningRate 0.1458   Epoch: 6   Global Step: 70030   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:35:36,246-Speed 5366.62 samples/sec   Loss 7.5770   LearningRate 0.1458   Epoch: 6   Global Step: 70040   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:35:43,825-Speed 5405.40 samples/sec   Loss 7.5591   LearningRate 0.1458   Epoch: 6   Global Step: 70050   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:35:51,377-Speed 5424.85 samples/sec   Loss 7.5308   LearningRate 0.1457   Epoch: 6   Global Step: 70060   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:35:58,932-Speed 5421.96 samples/sec   Loss 7.4917   LearningRate 0.1457   Epoch: 6   Global Step: 70070   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:36:06,519-Speed 5399.67 samples/sec   Loss 7.4810   LearningRate 0.1457   Epoch: 6   Global Step: 70080   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 10:36:14,118-Speed 5390.86 samples/sec   Loss 7.5318   LearningRate 0.1457   Epoch: 6   Global Step: 70090   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:36:21,541-Speed 5518.34 samples/sec   Loss 7.5203   LearningRate 0.1457   Epoch: 6   Global Step: 70100   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 10:36:28,984-Speed 5504.36 samples/sec   Loss 7.4702   LearningRate 0.1456   Epoch: 6   Global Step: 70110   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 10:36:36,518-Speed 5437.53 samples/sec   Loss 7.5178   LearningRate 0.1456   Epoch: 6   Global Step: 70120   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 10:36:44,228-Speed 5312.58 samples/sec   Loss 7.5514   LearningRate 0.1456   Epoch: 6   Global Step: 70130   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 10:36:51,755-Speed 5442.59 samples/sec   Loss 7.5088   LearningRate 0.1456   Epoch: 6   Global Step: 70140   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 10:36:59,282-Speed 5442.27 samples/sec   Loss 7.5018   LearningRate 0.1456   Epoch: 6   Global Step: 70150   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 10:37:06,736-Speed 5495.97 samples/sec   Loss 7.5108   LearningRate 0.1455   Epoch: 6   Global Step: 70160   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 10:37:14,295-Speed 5419.59 samples/sec   Loss 7.4523   LearningRate 0.1455   Epoch: 6   Global Step: 70170   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 10:37:21,849-Speed 5422.46 samples/sec   Loss 7.5413   LearningRate 0.1455   Epoch: 6   Global Step: 70180   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 10:37:29,558-Speed 5314.24 samples/sec   Loss 7.4496   LearningRate 0.1455   Epoch: 6   Global Step: 70190   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 10:37:37,244-Speed 5329.93 samples/sec   Loss 7.5022   LearningRate 0.1455   Epoch: 6   Global Step: 70200   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 10:37:44,758-Speed 5452.05 samples/sec   Loss 7.5053   LearningRate 0.1454   Epoch: 6   Global Step: 70210   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:37:52,498-Speed 5292.36 samples/sec   Loss 7.4945   LearningRate 0.1454   Epoch: 6   Global Step: 70220   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:37:59,936-Speed 5507.78 samples/sec   Loss 7.5561   LearningRate 0.1454   Epoch: 6   Global Step: 70230   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:38:07,555-Speed 5376.09 samples/sec   Loss 7.5543   LearningRate 0.1454   Epoch: 6   Global Step: 70240   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:38:15,154-Speed 5391.45 samples/sec   Loss 7.4986   LearningRate 0.1453   Epoch: 6   Global Step: 70250   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:38:22,623-Speed 5484.56 samples/sec   Loss 7.4916   LearningRate 0.1453   Epoch: 6   Global Step: 70260   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:38:30,057-Speed 5510.84 samples/sec   Loss 7.5030   LearningRate 0.1453   Epoch: 6   Global Step: 70270   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:38:37,618-Speed 5417.10 samples/sec   Loss 7.5434   LearningRate 0.1453   Epoch: 6   Global Step: 70280   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:38:45,262-Speed 5359.52 samples/sec   Loss 7.5861   LearningRate 0.1453   Epoch: 6   Global Step: 70290   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:38:52,823-Speed 5418.69 samples/sec   Loss 7.4767   LearningRate 0.1452   Epoch: 6   Global Step: 70300   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:39:00,445-Speed 5374.09 samples/sec   Loss 7.5718   LearningRate 0.1452   Epoch: 6   Global Step: 70310   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:39:07,865-Speed 5521.33 samples/sec   Loss 7.5039   LearningRate 0.1452   Epoch: 6   Global Step: 70320   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:39:15,396-Speed 5439.65 samples/sec   Loss 7.4726   LearningRate 0.1452   Epoch: 6   Global Step: 70330   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:39:22,978-Speed 5402.71 samples/sec   Loss 7.5124   LearningRate 0.1452   Epoch: 6   Global Step: 70340   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:39:30,553-Speed 5408.36 samples/sec   Loss 7.4626   LearningRate 0.1451   Epoch: 6   Global Step: 70350   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:39:38,144-Speed 5396.17 samples/sec   Loss 7.4657   LearningRate 0.1451   Epoch: 6   Global Step: 70360   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:39:45,728-Speed 5401.95 samples/sec   Loss 7.5295   LearningRate 0.1451   Epoch: 6   Global Step: 70370   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:39:53,261-Speed 5437.96 samples/sec   Loss 7.4710   LearningRate 0.1451   Epoch: 6   Global Step: 70380   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:40:00,820-Speed 5419.30 samples/sec   Loss 7.5070   LearningRate 0.1451   Epoch: 6   Global Step: 70390   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:40:08,420-Speed 5390.25 samples/sec   Loss 7.4445   LearningRate 0.1450   Epoch: 6   Global Step: 70400   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:40:15,957-Speed 5435.55 samples/sec   Loss 7.4515   LearningRate 0.1450   Epoch: 6   Global Step: 70410   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:40:23,451-Speed 5466.43 samples/sec   Loss 7.4883   LearningRate 0.1450   Epoch: 6   Global Step: 70420   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:40:30,969-Speed 5448.88 samples/sec   Loss 7.4614   LearningRate 0.1450   Epoch: 6   Global Step: 70430   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:40:38,580-Speed 5382.89 samples/sec   Loss 7.4958   LearningRate 0.1449   Epoch: 6   Global Step: 70440   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:40:46,192-Speed 5381.23 samples/sec   Loss 7.5069   LearningRate 0.1449   Epoch: 6   Global Step: 70450   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:40:53,726-Speed 5437.74 samples/sec   Loss 7.5059   LearningRate 0.1449   Epoch: 6   Global Step: 70460   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:41:01,225-Speed 5462.86 samples/sec   Loss 7.5052   LearningRate 0.1449   Epoch: 6   Global Step: 70470   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:41:08,737-Speed 5453.49 samples/sec   Loss 7.5026   LearningRate 0.1449   Epoch: 6   Global Step: 70480   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:41:16,426-Speed 5327.29 samples/sec   Loss 7.5046   LearningRate 0.1448   Epoch: 6   Global Step: 70490   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:41:24,006-Speed 5404.47 samples/sec   Loss 7.4788   LearningRate 0.1448   Epoch: 6   Global Step: 70500   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:41:31,632-Speed 5372.40 samples/sec   Loss 7.5064   LearningRate 0.1448   Epoch: 6   Global Step: 70510   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:41:39,236-Speed 5387.39 samples/sec   Loss 7.5064   LearningRate 0.1448   Epoch: 6   Global Step: 70520   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:41:47,044-Speed 5246.29 samples/sec   Loss 7.5263   LearningRate 0.1448   Epoch: 6   Global Step: 70530   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:41:54,626-Speed 5402.87 samples/sec   Loss 7.5286   LearningRate 0.1447   Epoch: 6   Global Step: 70540   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:42:02,185-Speed 5419.25 samples/sec   Loss 7.4694   LearningRate 0.1447   Epoch: 6   Global Step: 70550   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:42:09,803-Speed 5377.72 samples/sec   Loss 7.4490   LearningRate 0.1447   Epoch: 6   Global Step: 70560   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:42:17,515-Speed 5311.07 samples/sec   Loss 7.4398   LearningRate 0.1447   Epoch: 6   Global Step: 70570   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:42:25,070-Speed 5422.44 samples/sec   Loss 7.4829   LearningRate 0.1446   Epoch: 6   Global Step: 70580   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:42:32,737-Speed 5343.04 samples/sec   Loss 7.4439   LearningRate 0.1446   Epoch: 6   Global Step: 70590   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:42:40,297-Speed 5419.24 samples/sec   Loss 7.4756   LearningRate 0.1446   Epoch: 6   Global Step: 70600   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:42:47,855-Speed 5419.47 samples/sec   Loss 7.4730   LearningRate 0.1446   Epoch: 6   Global Step: 70610   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:42:55,361-Speed 5458.22 samples/sec   Loss 7.4698   LearningRate 0.1446   Epoch: 6   Global Step: 70620   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:43:02,836-Speed 5480.45 samples/sec   Loss 7.4681   LearningRate 0.1445   Epoch: 6   Global Step: 70630   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:43:10,361-Speed 5443.84 samples/sec   Loss 7.5003   LearningRate 0.1445   Epoch: 6   Global Step: 70640   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:43:17,881-Speed 5447.15 samples/sec   Loss 7.4986   LearningRate 0.1445   Epoch: 6   Global Step: 70650   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:43:25,345-Speed 5488.66 samples/sec   Loss 7.5045   LearningRate 0.1445   Epoch: 6   Global Step: 70660   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:43:32,865-Speed 5447.29 samples/sec   Loss 7.4201   LearningRate 0.1445   Epoch: 6   Global Step: 70670   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:43:40,416-Speed 5425.02 samples/sec   Loss 7.5006   LearningRate 0.1444   Epoch: 6   Global Step: 70680   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:43:47,888-Speed 5482.58 samples/sec   Loss 7.4802   LearningRate 0.1444   Epoch: 6   Global Step: 70690   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:43:55,443-Speed 5422.35 samples/sec   Loss 7.4736   LearningRate 0.1444   Epoch: 6   Global Step: 70700   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:44:02,995-Speed 5424.61 samples/sec   Loss 7.5206   LearningRate 0.1444   Epoch: 6   Global Step: 70710   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:44:10,619-Speed 5372.94 samples/sec   Loss 7.4962   LearningRate 0.1444   Epoch: 6   Global Step: 70720   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:44:18,119-Speed 5462.45 samples/sec   Loss 7.4831   LearningRate 0.1443   Epoch: 6   Global Step: 70730   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:44:25,754-Speed 5364.99 samples/sec   Loss 7.5364   LearningRate 0.1443   Epoch: 6   Global Step: 70740   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:44:33,215-Speed 5490.95 samples/sec   Loss 7.4779   LearningRate 0.1443   Epoch: 6   Global Step: 70750   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:44:40,789-Speed 5408.99 samples/sec   Loss 7.4856   LearningRate 0.1443   Epoch: 6   Global Step: 70760   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:44:48,390-Speed 5388.95 samples/sec   Loss 7.4608   LearningRate 0.1442   Epoch: 6   Global Step: 70770   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:44:55,936-Speed 5429.41 samples/sec   Loss 7.3830   LearningRate 0.1442   Epoch: 6   Global Step: 70780   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:45:03,494-Speed 5419.73 samples/sec   Loss 7.5276   LearningRate 0.1442   Epoch: 6   Global Step: 70790   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:45:11,008-Speed 5451.97 samples/sec   Loss 7.4247   LearningRate 0.1442   Epoch: 6   Global Step: 70800   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:45:18,630-Speed 5375.02 samples/sec   Loss 7.4811   LearningRate 0.1442   Epoch: 6   Global Step: 70810   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:45:26,078-Speed 5500.12 samples/sec   Loss 7.4730   LearningRate 0.1441   Epoch: 6   Global Step: 70820   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:45:33,573-Speed 5466.03 samples/sec   Loss 7.4266   LearningRate 0.1441   Epoch: 6   Global Step: 70830   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:45:41,171-Speed 5391.37 samples/sec   Loss 7.4929   LearningRate 0.1441   Epoch: 6   Global Step: 70840   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:45:48,713-Speed 5431.87 samples/sec   Loss 7.5140   LearningRate 0.1441   Epoch: 6   Global Step: 70850   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:45:56,167-Speed 5495.86 samples/sec   Loss 7.4682   LearningRate 0.1441   Epoch: 6   Global Step: 70860   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:46:03,752-Speed 5400.88 samples/sec   Loss 7.3882   LearningRate 0.1440   Epoch: 6   Global Step: 70870   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:46:11,274-Speed 5446.12 samples/sec   Loss 7.4707   LearningRate 0.1440   Epoch: 6   Global Step: 70880   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:46:18,786-Speed 5453.34 samples/sec   Loss 7.4240   LearningRate 0.1440   Epoch: 6   Global Step: 70890   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:46:26,235-Speed 5499.65 samples/sec   Loss 7.4899   LearningRate 0.1440   Epoch: 6   Global Step: 70900   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:46:33,800-Speed 5415.15 samples/sec   Loss 7.5402   LearningRate 0.1440   Epoch: 6   Global Step: 70910   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:46:41,361-Speed 5417.89 samples/sec   Loss 7.4633   LearningRate 0.1439   Epoch: 6   Global Step: 70920   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:46:48,880-Speed 5448.37 samples/sec   Loss 7.4939   LearningRate 0.1439   Epoch: 6   Global Step: 70930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:46:56,420-Speed 5433.10 samples/sec   Loss 7.4777   LearningRate 0.1439   Epoch: 6   Global Step: 70940   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:47:03,974-Speed 5423.32 samples/sec   Loss 7.4957   LearningRate 0.1439   Epoch: 6   Global Step: 70950   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:47:11,588-Speed 5380.36 samples/sec   Loss 7.4500   LearningRate 0.1438   Epoch: 6   Global Step: 70960   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:47:19,081-Speed 5467.27 samples/sec   Loss 7.5083   LearningRate 0.1438   Epoch: 6   Global Step: 70970   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:47:26,680-Speed 5390.51 samples/sec   Loss 7.4498   LearningRate 0.1438   Epoch: 6   Global Step: 70980   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:47:34,194-Speed 5452.15 samples/sec   Loss 7.5199   LearningRate 0.1438   Epoch: 6   Global Step: 70990   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:47:41,753-Speed 5419.24 samples/sec   Loss 7.5099   LearningRate 0.1438   Epoch: 6   Global Step: 71000   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:47:49,443-Speed 5327.12 samples/sec   Loss 7.5110   LearningRate 0.1437   Epoch: 6   Global Step: 71010   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:47:57,179-Speed 5295.24 samples/sec   Loss 7.4378   LearningRate 0.1437   Epoch: 6   Global Step: 71020   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:48:04,966-Speed 5260.85 samples/sec   Loss 7.4728   LearningRate 0.1437   Epoch: 6   Global Step: 71030   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:48:12,555-Speed 5398.03 samples/sec   Loss 7.4604   LearningRate 0.1437   Epoch: 6   Global Step: 71040   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:48:20,130-Speed 5408.04 samples/sec   Loss 7.3995   LearningRate 0.1437   Epoch: 6   Global Step: 71050   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:48:27,585-Speed 5495.01 samples/sec   Loss 7.4596   LearningRate 0.1436   Epoch: 6   Global Step: 71060   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:48:35,106-Speed 5446.90 samples/sec   Loss 7.4961   LearningRate 0.1436   Epoch: 6   Global Step: 71070   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:48:42,665-Speed 5419.28 samples/sec   Loss 7.3540   LearningRate 0.1436   Epoch: 6   Global Step: 71080   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:48:50,248-Speed 5402.04 samples/sec   Loss 7.4985   LearningRate 0.1436   Epoch: 6   Global Step: 71090   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:48:57,822-Speed 5409.02 samples/sec   Loss 7.4279   LearningRate 0.1436   Epoch: 6   Global Step: 71100   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:49:05,420-Speed 5391.64 samples/sec   Loss 7.5049   LearningRate 0.1435   Epoch: 6   Global Step: 71110   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:49:12,920-Speed 5461.65 samples/sec   Loss 7.4990   LearningRate 0.1435   Epoch: 6   Global Step: 71120   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:49:20,525-Speed 5387.23 samples/sec   Loss 7.4684   LearningRate 0.1435   Epoch: 6   Global Step: 71130   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:49:28,058-Speed 5437.49 samples/sec   Loss 7.5474   LearningRate 0.1435   Epoch: 6   Global Step: 71140   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:49:35,612-Speed 5423.33 samples/sec   Loss 7.4485   LearningRate 0.1434   Epoch: 6   Global Step: 71150   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:49:43,104-Speed 5467.74 samples/sec   Loss 7.4820   LearningRate 0.1434   Epoch: 6   Global Step: 71160   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:49:50,658-Speed 5423.11 samples/sec   Loss 7.5076   LearningRate 0.1434   Epoch: 6   Global Step: 71170   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:49:58,168-Speed 5455.12 samples/sec   Loss 7.5024   LearningRate 0.1434   Epoch: 6   Global Step: 71180   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:50:05,775-Speed 5384.93 samples/sec   Loss 7.4402   LearningRate 0.1434   Epoch: 6   Global Step: 71190   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:50:13,278-Speed 5460.10 samples/sec   Loss 7.4990   LearningRate 0.1433   Epoch: 6   Global Step: 71200   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:50:20,712-Speed 5510.54 samples/sec   Loss 7.4926   LearningRate 0.1433   Epoch: 6   Global Step: 71210   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:50:28,329-Speed 5377.72 samples/sec   Loss 7.4777   LearningRate 0.1433   Epoch: 6   Global Step: 71220   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 10:50:35,822-Speed 5467.52 samples/sec   Loss 7.4363   LearningRate 0.1433   Epoch: 6   Global Step: 71230   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:50:43,361-Speed 5433.70 samples/sec   Loss 7.4169   LearningRate 0.1433   Epoch: 6   Global Step: 71240   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:50:50,830-Speed 5484.85 samples/sec   Loss 7.4291   LearningRate 0.1432   Epoch: 6   Global Step: 71250   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:50:58,294-Speed 5487.82 samples/sec   Loss 7.4819   LearningRate 0.1432   Epoch: 6   Global Step: 71260   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:51:05,871-Speed 5407.55 samples/sec   Loss 7.4359   LearningRate 0.1432   Epoch: 6   Global Step: 71270   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:51:13,350-Speed 5476.48 samples/sec   Loss 7.3935   LearningRate 0.1432   Epoch: 6   Global Step: 71280   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:51:20,979-Speed 5369.87 samples/sec   Loss 7.4458   LearningRate 0.1432   Epoch: 6   Global Step: 71290   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:51:28,460-Speed 5475.90 samples/sec   Loss 7.4367   LearningRate 0.1431   Epoch: 6   Global Step: 71300   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:51:36,007-Speed 5428.13 samples/sec   Loss 7.4578   LearningRate 0.1431   Epoch: 6   Global Step: 71310   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:51:43,548-Speed 5432.50 samples/sec   Loss 7.3777   LearningRate 0.1431   Epoch: 6   Global Step: 71320   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:51:51,094-Speed 5428.05 samples/sec   Loss 7.4434   LearningRate 0.1431   Epoch: 6   Global Step: 71330   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:51:58,571-Speed 5479.41 samples/sec   Loss 7.5113   LearningRate 0.1430   Epoch: 6   Global Step: 71340   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:52:06,001-Speed 5513.76 samples/sec   Loss 7.3980   LearningRate 0.1430   Epoch: 6   Global Step: 71350   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:52:13,558-Speed 5420.26 samples/sec   Loss 7.4505   LearningRate 0.1430   Epoch: 6   Global Step: 71360   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:52:21,073-Speed 5450.77 samples/sec   Loss 7.4268   LearningRate 0.1430   Epoch: 6   Global Step: 71370   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:52:28,676-Speed 5388.09 samples/sec   Loss 7.4198   LearningRate 0.1430   Epoch: 6   Global Step: 71380   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:52:36,288-Speed 5382.38 samples/sec   Loss 7.4046   LearningRate 0.1429   Epoch: 6   Global Step: 71390   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:52:43,879-Speed 5396.35 samples/sec   Loss 7.4754   LearningRate 0.1429   Epoch: 6   Global Step: 71400   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:52:51,389-Speed 5454.81 samples/sec   Loss 7.4481   LearningRate 0.1429   Epoch: 6   Global Step: 71410   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:52:58,912-Speed 5444.98 samples/sec   Loss 7.4190   LearningRate 0.1429   Epoch: 6   Global Step: 71420   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:53:06,443-Speed 5439.38 samples/sec   Loss 7.4824   LearningRate 0.1429   Epoch: 6   Global Step: 71430   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:53:14,012-Speed 5412.29 samples/sec   Loss 7.4327   LearningRate 0.1428   Epoch: 6   Global Step: 71440   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:53:21,577-Speed 5415.43 samples/sec   Loss 7.4365   LearningRate 0.1428   Epoch: 6   Global Step: 71450   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 10:53:29,085-Speed 5455.94 samples/sec   Loss 7.3979   LearningRate 0.1428   Epoch: 6   Global Step: 71460   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:53:36,561-Speed 5480.46 samples/sec   Loss 7.4536   LearningRate 0.1428   Epoch: 6   Global Step: 71470   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:53:44,090-Speed 5440.82 samples/sec   Loss 7.5242   LearningRate 0.1428   Epoch: 6   Global Step: 71480   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:53:51,582-Speed 5467.72 samples/sec   Loss 7.5098   LearningRate 0.1427   Epoch: 6   Global Step: 71490   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:53:59,078-Speed 5464.99 samples/sec   Loss 7.4149   LearningRate 0.1427   Epoch: 6   Global Step: 71500   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:54:06,571-Speed 5467.27 samples/sec   Loss 7.4780   LearningRate 0.1427   Epoch: 6   Global Step: 71510   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:54:14,062-Speed 5469.01 samples/sec   Loss 7.3932   LearningRate 0.1427   Epoch: 6   Global Step: 71520   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:54:21,533-Speed 5482.50 samples/sec   Loss 7.4834   LearningRate 0.1426   Epoch: 6   Global Step: 71530   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:54:29,160-Speed 5371.11 samples/sec   Loss 7.4093   LearningRate 0.1426   Epoch: 6   Global Step: 71540   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:54:36,689-Speed 5441.58 samples/sec   Loss 7.3593   LearningRate 0.1426   Epoch: 6   Global Step: 71550   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:54:44,199-Speed 5455.14 samples/sec   Loss 7.4467   LearningRate 0.1426   Epoch: 6   Global Step: 71560   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:54:51,769-Speed 5410.91 samples/sec   Loss 7.4220   LearningRate 0.1426   Epoch: 6   Global Step: 71570   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:54:59,287-Speed 5449.40 samples/sec   Loss 7.4117   LearningRate 0.1425   Epoch: 6   Global Step: 71580   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:55:06,830-Speed 5430.77 samples/sec   Loss 7.3680   LearningRate 0.1425   Epoch: 6   Global Step: 71590   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:55:14,388-Speed 5420.70 samples/sec   Loss 7.3595   LearningRate 0.1425   Epoch: 6   Global Step: 71600   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:55:21,908-Speed 5447.35 samples/sec   Loss 7.4397   LearningRate 0.1425   Epoch: 6   Global Step: 71610   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:55:29,494-Speed 5399.61 samples/sec   Loss 7.4228   LearningRate 0.1425   Epoch: 6   Global Step: 71620   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:55:36,972-Speed 5478.88 samples/sec   Loss 7.4515   LearningRate 0.1424   Epoch: 6   Global Step: 71630   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:55:44,546-Speed 5408.78 samples/sec   Loss 7.5209   LearningRate 0.1424   Epoch: 6   Global Step: 71640   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:55:52,057-Speed 5453.43 samples/sec   Loss 7.4539   LearningRate 0.1424   Epoch: 6   Global Step: 71650   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:55:59,617-Speed 5418.73 samples/sec   Loss 7.3995   LearningRate 0.1424   Epoch: 6   Global Step: 71660   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:56:07,103-Speed 5471.99 samples/sec   Loss 7.4403   LearningRate 0.1424   Epoch: 6   Global Step: 71670   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:56:14,615-Speed 5453.95 samples/sec   Loss 7.3534   LearningRate 0.1423   Epoch: 6   Global Step: 71680   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:56:22,246-Speed 5368.15 samples/sec   Loss 7.3883   LearningRate 0.1423   Epoch: 6   Global Step: 71690   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:56:29,757-Speed 5454.14 samples/sec   Loss 7.4127   LearningRate 0.1423   Epoch: 6   Global Step: 71700   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:56:37,204-Speed 5500.98 samples/sec   Loss 7.4176   LearningRate 0.1423   Epoch: 6   Global Step: 71710   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:56:44,805-Speed 5389.86 samples/sec   Loss 7.4714   LearningRate 0.1422   Epoch: 6   Global Step: 71720   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:56:52,337-Speed 5438.22 samples/sec   Loss 7.4024   LearningRate 0.1422   Epoch: 6   Global Step: 71730   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:56:59,796-Speed 5492.74 samples/sec   Loss 7.4279   LearningRate 0.1422   Epoch: 6   Global Step: 71740   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:57:07,218-Speed 5518.92 samples/sec   Loss 7.4049   LearningRate 0.1422   Epoch: 6   Global Step: 71750   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 10:57:14,726-Speed 5456.49 samples/sec   Loss 7.3998   LearningRate 0.1422   Epoch: 6   Global Step: 71760   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:57:22,219-Speed 5467.41 samples/sec   Loss 7.3561   LearningRate 0.1421   Epoch: 6   Global Step: 71770   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:57:29,681-Speed 5489.82 samples/sec   Loss 7.4330   LearningRate 0.1421   Epoch: 6   Global Step: 71780   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:57:37,154-Speed 5481.94 samples/sec   Loss 7.4127   LearningRate 0.1421   Epoch: 6   Global Step: 71790   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:57:44,665-Speed 5454.30 samples/sec   Loss 7.4639   LearningRate 0.1421   Epoch: 6   Global Step: 71800   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:57:52,133-Speed 5485.55 samples/sec   Loss 7.3996   LearningRate 0.1421   Epoch: 6   Global Step: 71810   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:57:59,624-Speed 5468.26 samples/sec   Loss 7.4481   LearningRate 0.1420   Epoch: 6   Global Step: 71820   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:58:07,181-Speed 5421.31 samples/sec   Loss 7.3912   LearningRate 0.1420   Epoch: 6   Global Step: 71830   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:58:14,661-Speed 5476.48 samples/sec   Loss 7.4082   LearningRate 0.1420   Epoch: 6   Global Step: 71840   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:58:22,242-Speed 5403.50 samples/sec   Loss 7.4316   LearningRate 0.1420   Epoch: 6   Global Step: 71850   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:58:29,796-Speed 5423.70 samples/sec   Loss 7.4206   LearningRate 0.1420   Epoch: 6   Global Step: 71860   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:58:37,228-Speed 5511.67 samples/sec   Loss 7.4156   LearningRate 0.1419   Epoch: 6   Global Step: 71870   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 10:58:44,725-Speed 5464.45 samples/sec   Loss 7.3799   LearningRate 0.1419   Epoch: 6   Global Step: 71880   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:58:52,273-Speed 5427.55 samples/sec   Loss 7.4069   LearningRate 0.1419   Epoch: 6   Global Step: 71890   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:58:59,745-Speed 5482.39 samples/sec   Loss 7.3830   LearningRate 0.1419   Epoch: 6   Global Step: 71900   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:59:07,214-Speed 5485.09 samples/sec   Loss 7.3807   LearningRate 0.1418   Epoch: 6   Global Step: 71910   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:59:14,806-Speed 5395.92 samples/sec   Loss 7.4755   LearningRate 0.1418   Epoch: 6   Global Step: 71920   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:59:22,249-Speed 5503.47 samples/sec   Loss 7.4375   LearningRate 0.1418   Epoch: 6   Global Step: 71930   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:59:29,738-Speed 5470.65 samples/sec   Loss 7.4008   LearningRate 0.1418   Epoch: 6   Global Step: 71940   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:59:37,164-Speed 5516.00 samples/sec   Loss 7.4299   LearningRate 0.1418   Epoch: 6   Global Step: 71950   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:59:44,625-Speed 5490.41 samples/sec   Loss 7.4283   LearningRate 0.1417   Epoch: 6   Global Step: 71960   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:59:52,107-Speed 5475.44 samples/sec   Loss 7.3817   LearningRate 0.1417   Epoch: 6   Global Step: 71970   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 10:59:59,558-Speed 5497.72 samples/sec   Loss 7.4711   LearningRate 0.1417   Epoch: 6   Global Step: 71980   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:00:07,002-Speed 5503.58 samples/sec   Loss 7.4469   LearningRate 0.1417   Epoch: 6   Global Step: 71990   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:00:14,448-Speed 5501.77 samples/sec   Loss 7.4139   LearningRate 0.1417   Epoch: 6   Global Step: 72000   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:00:58,263-[lfw][72000]XNorm: 23.137435
Training: 2022-01-08 11:00:58,264-[lfw][72000]Accuracy-Flip: 0.99800+-0.00245
Training: 2022-01-08 11:00:58,264-[lfw][72000]Accuracy-Highest: 0.99817
Training: 2022-01-08 11:01:50,014-[cfp_fp][72000]XNorm: 20.903826
Training: 2022-01-08 11:01:50,015-[cfp_fp][72000]Accuracy-Flip: 0.98514+-0.00607
Training: 2022-01-08 11:01:50,015-[cfp_fp][72000]Accuracy-Highest: 0.98771
Training: 2022-01-08 11:02:35,556-[agedb_30][72000]XNorm: 23.205493
Training: 2022-01-08 11:02:35,557-[agedb_30][72000]Accuracy-Flip: 0.97267+-0.00629
Training: 2022-01-08 11:02:35,558-[agedb_30][72000]Accuracy-Highest: 0.97667
Training: 2022-01-08 11:02:43,195-Speed 275.37 samples/sec   Loss 7.4430   LearningRate 0.1416   Epoch: 6   Global Step: 72010   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:02:50,950-Speed 5283.04 samples/sec   Loss 7.4033   LearningRate 0.1416   Epoch: 6   Global Step: 72020   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:02:58,530-Speed 5404.97 samples/sec   Loss 7.4265   LearningRate 0.1416   Epoch: 6   Global Step: 72030   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:03:06,085-Speed 5423.24 samples/sec   Loss 7.3904   LearningRate 0.1416   Epoch: 6   Global Step: 72040   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:03:13,570-Speed 5473.36 samples/sec   Loss 7.4322   LearningRate 0.1416   Epoch: 6   Global Step: 72050   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:03:21,169-Speed 5390.96 samples/sec   Loss 7.4442   LearningRate 0.1415   Epoch: 6   Global Step: 72060   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:03:28,716-Speed 5428.00 samples/sec   Loss 7.3863   LearningRate 0.1415   Epoch: 6   Global Step: 72070   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:03:36,273-Speed 5420.70 samples/sec   Loss 7.5002   LearningRate 0.1415   Epoch: 6   Global Step: 72080   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:03:43,752-Speed 5477.61 samples/sec   Loss 7.3720   LearningRate 0.1415   Epoch: 6   Global Step: 72090   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:03:51,221-Speed 5484.31 samples/sec   Loss 7.4472   LearningRate 0.1415   Epoch: 6   Global Step: 72100   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:03:58,850-Speed 5369.81 samples/sec   Loss 7.4218   LearningRate 0.1414   Epoch: 6   Global Step: 72110   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:04:06,458-Speed 5384.19 samples/sec   Loss 7.3658   LearningRate 0.1414   Epoch: 6   Global Step: 72120   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:04:13,937-Speed 5478.09 samples/sec   Loss 7.4780   LearningRate 0.1414   Epoch: 6   Global Step: 72130   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:04:21,384-Speed 5500.53 samples/sec   Loss 7.4093   LearningRate 0.1414   Epoch: 6   Global Step: 72140   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:04:28,844-Speed 5491.32 samples/sec   Loss 7.4041   LearningRate 0.1413   Epoch: 6   Global Step: 72150   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:04:36,280-Speed 5508.82 samples/sec   Loss 7.4560   LearningRate 0.1413   Epoch: 6   Global Step: 72160   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:04:43,755-Speed 5480.38 samples/sec   Loss 7.3314   LearningRate 0.1413   Epoch: 6   Global Step: 72170   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:04:51,275-Speed 5447.66 samples/sec   Loss 7.4699   LearningRate 0.1413   Epoch: 6   Global Step: 72180   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:04:58,781-Speed 5457.64 samples/sec   Loss 7.4015   LearningRate 0.1413   Epoch: 6   Global Step: 72190   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:05:06,212-Speed 5512.60 samples/sec   Loss 7.4158   LearningRate 0.1412   Epoch: 6   Global Step: 72200   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:05:13,798-Speed 5400.14 samples/sec   Loss 7.4080   LearningRate 0.1412   Epoch: 6   Global Step: 72210   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:05:21,311-Speed 5452.80 samples/sec   Loss 7.3354   LearningRate 0.1412   Epoch: 6   Global Step: 72220   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:05:28,968-Speed 5350.06 samples/sec   Loss 7.4161   LearningRate 0.1412   Epoch: 6   Global Step: 72230   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:05:36,492-Speed 5444.58 samples/sec   Loss 7.3941   LearningRate 0.1412   Epoch: 6   Global Step: 72240   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:05:44,006-Speed 5451.91 samples/sec   Loss 7.3494   LearningRate 0.1411   Epoch: 6   Global Step: 72250   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:05:51,519-Speed 5452.56 samples/sec   Loss 7.4295   LearningRate 0.1411   Epoch: 6   Global Step: 72260   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:05:59,084-Speed 5415.51 samples/sec   Loss 7.4611   LearningRate 0.1411   Epoch: 6   Global Step: 72270   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:06:06,555-Speed 5483.10 samples/sec   Loss 7.3578   LearningRate 0.1411   Epoch: 6   Global Step: 72280   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:06:14,063-Speed 5456.11 samples/sec   Loss 7.3727   LearningRate 0.1411   Epoch: 6   Global Step: 72290   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:06:21,635-Speed 5409.88 samples/sec   Loss 7.3952   LearningRate 0.1410   Epoch: 6   Global Step: 72300   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:06:29,132-Speed 5464.21 samples/sec   Loss 7.5059   LearningRate 0.1410   Epoch: 6   Global Step: 72310   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:06:36,613-Speed 5476.01 samples/sec   Loss 7.4242   LearningRate 0.1410   Epoch: 6   Global Step: 72320   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:06:44,114-Speed 5461.51 samples/sec   Loss 7.4308   LearningRate 0.1410   Epoch: 6   Global Step: 72330   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:06:51,820-Speed 5315.66 samples/sec   Loss 7.4409   LearningRate 0.1410   Epoch: 6   Global Step: 72340   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:06:59,313-Speed 5467.42 samples/sec   Loss 7.3954   LearningRate 0.1409   Epoch: 6   Global Step: 72350   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:07:06,809-Speed 5465.66 samples/sec   Loss 7.4327   LearningRate 0.1409   Epoch: 6   Global Step: 72360   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:07:14,301-Speed 5468.00 samples/sec   Loss 7.3711   LearningRate 0.1409   Epoch: 6   Global Step: 72370   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:07:21,825-Speed 5444.67 samples/sec   Loss 7.3712   LearningRate 0.1409   Epoch: 6   Global Step: 72380   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:07:29,265-Speed 5505.17 samples/sec   Loss 7.3687   LearningRate 0.1408   Epoch: 6   Global Step: 72390   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:07:36,721-Speed 5494.71 samples/sec   Loss 7.3326   LearningRate 0.1408   Epoch: 6   Global Step: 72400   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:07:44,179-Speed 5493.58 samples/sec   Loss 7.4062   LearningRate 0.1408   Epoch: 6   Global Step: 72410   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:07:51,718-Speed 5433.22 samples/sec   Loss 7.4148   LearningRate 0.1408   Epoch: 6   Global Step: 72420   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:07:59,271-Speed 5423.73 samples/sec   Loss 7.3567   LearningRate 0.1408   Epoch: 6   Global Step: 72430   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:08:06,767-Speed 5465.50 samples/sec   Loss 7.3531   LearningRate 0.1407   Epoch: 6   Global Step: 72440   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:08:14,202-Speed 5509.59 samples/sec   Loss 7.3616   LearningRate 0.1407   Epoch: 6   Global Step: 72450   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:08:21,656-Speed 5495.51 samples/sec   Loss 7.3958   LearningRate 0.1407   Epoch: 6   Global Step: 72460   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:08:29,154-Speed 5463.88 samples/sec   Loss 7.3744   LearningRate 0.1407   Epoch: 6   Global Step: 72470   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:08:36,751-Speed 5392.26 samples/sec   Loss 7.3677   LearningRate 0.1407   Epoch: 6   Global Step: 72480   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:08:44,169-Speed 5522.60 samples/sec   Loss 7.4281   LearningRate 0.1406   Epoch: 6   Global Step: 72490   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:08:51,713-Speed 5430.00 samples/sec   Loss 7.4764   LearningRate 0.1406   Epoch: 6   Global Step: 72500   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:08:59,274-Speed 5417.58 samples/sec   Loss 7.4622   LearningRate 0.1406   Epoch: 6   Global Step: 72510   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:09:06,761-Speed 5471.88 samples/sec   Loss 7.3537   LearningRate 0.1406   Epoch: 6   Global Step: 72520   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:09:14,273-Speed 5453.79 samples/sec   Loss 7.3594   LearningRate 0.1406   Epoch: 6   Global Step: 72530   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:09:21,717-Speed 5503.10 samples/sec   Loss 7.4027   LearningRate 0.1405   Epoch: 6   Global Step: 72540   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:09:29,238-Speed 5446.69 samples/sec   Loss 7.3355   LearningRate 0.1405   Epoch: 6   Global Step: 72550   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:09:36,815-Speed 5406.38 samples/sec   Loss 7.3569   LearningRate 0.1405   Epoch: 6   Global Step: 72560   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:09:44,528-Speed 5311.39 samples/sec   Loss 7.4289   LearningRate 0.1405   Epoch: 6   Global Step: 72570   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:09:52,119-Speed 5396.44 samples/sec   Loss 7.3657   LearningRate 0.1404   Epoch: 6   Global Step: 72580   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:10:16,693-Speed 1666.88 samples/sec   Loss 7.4607   LearningRate 0.1404   Epoch: 7   Global Step: 72590   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:10:24,127-Speed 5511.03 samples/sec   Loss 7.4436   LearningRate 0.1404   Epoch: 7   Global Step: 72600   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:10:31,610-Speed 5474.71 samples/sec   Loss 7.4297   LearningRate 0.1404   Epoch: 7   Global Step: 72610   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:10:39,123-Speed 5452.35 samples/sec   Loss 7.3824   LearningRate 0.1404   Epoch: 7   Global Step: 72620   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:10:46,528-Speed 5531.86 samples/sec   Loss 7.3430   LearningRate 0.1403   Epoch: 7   Global Step: 72630   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:10:53,964-Speed 5509.84 samples/sec   Loss 7.3928   LearningRate 0.1403   Epoch: 7   Global Step: 72640   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:11:01,511-Speed 5427.51 samples/sec   Loss 7.4474   LearningRate 0.1403   Epoch: 7   Global Step: 72650   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:11:08,947-Speed 5509.17 samples/sec   Loss 7.3735   LearningRate 0.1403   Epoch: 7   Global Step: 72660   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:11:16,407-Speed 5491.60 samples/sec   Loss 7.2623   LearningRate 0.1403   Epoch: 7   Global Step: 72670   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:11:23,899-Speed 5467.69 samples/sec   Loss 7.3554   LearningRate 0.1402   Epoch: 7   Global Step: 72680   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:11:31,370-Speed 5483.54 samples/sec   Loss 7.3888   LearningRate 0.1402   Epoch: 7   Global Step: 72690   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:11:38,833-Speed 5489.22 samples/sec   Loss 7.3765   LearningRate 0.1402   Epoch: 7   Global Step: 72700   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:11:46,387-Speed 5422.89 samples/sec   Loss 7.3532   LearningRate 0.1402   Epoch: 7   Global Step: 72710   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:11:53,921-Speed 5437.52 samples/sec   Loss 7.3988   LearningRate 0.1402   Epoch: 7   Global Step: 72720   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:12:01,342-Speed 5520.17 samples/sec   Loss 7.3422   LearningRate 0.1401   Epoch: 7   Global Step: 72730   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:12:08,793-Speed 5497.82 samples/sec   Loss 7.3905   LearningRate 0.1401   Epoch: 7   Global Step: 72740   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:12:16,309-Speed 5450.17 samples/sec   Loss 7.3322   LearningRate 0.1401   Epoch: 7   Global Step: 72750   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:12:23,737-Speed 5515.44 samples/sec   Loss 7.3718   LearningRate 0.1401   Epoch: 7   Global Step: 72760   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:12:31,193-Speed 5494.29 samples/sec   Loss 7.3610   LearningRate 0.1401   Epoch: 7   Global Step: 72770   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:12:38,785-Speed 5395.62 samples/sec   Loss 7.3989   LearningRate 0.1400   Epoch: 7   Global Step: 72780   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:12:46,243-Speed 5492.87 samples/sec   Loss 7.3857   LearningRate 0.1400   Epoch: 7   Global Step: 72790   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:12:53,735-Speed 5468.06 samples/sec   Loss 7.2923   LearningRate 0.1400   Epoch: 7   Global Step: 72800   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 11:13:01,288-Speed 5423.97 samples/sec   Loss 7.3039   LearningRate 0.1400   Epoch: 7   Global Step: 72810   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:13:08,752-Speed 5488.29 samples/sec   Loss 7.3165   LearningRate 0.1399   Epoch: 7   Global Step: 72820   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:13:16,209-Speed 5493.60 samples/sec   Loss 7.3271   LearningRate 0.1399   Epoch: 7   Global Step: 72830   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:13:23,680-Speed 5483.04 samples/sec   Loss 7.4296   LearningRate 0.1399   Epoch: 7   Global Step: 72840   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:13:31,234-Speed 5423.00 samples/sec   Loss 7.3405   LearningRate 0.1399   Epoch: 7   Global Step: 72850   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:13:38,770-Speed 5436.28 samples/sec   Loss 7.3098   LearningRate 0.1399   Epoch: 7   Global Step: 72860   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:13:46,506-Speed 5294.93 samples/sec   Loss 7.3240   LearningRate 0.1398   Epoch: 7   Global Step: 72870   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:13:54,312-Speed 5248.02 samples/sec   Loss 7.3266   LearningRate 0.1398   Epoch: 7   Global Step: 72880   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:14:02,156-Speed 5222.68 samples/sec   Loss 7.3851   LearningRate 0.1398   Epoch: 7   Global Step: 72890   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:14:09,927-Speed 5271.68 samples/sec   Loss 7.2824   LearningRate 0.1398   Epoch: 7   Global Step: 72900   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:14:17,649-Speed 5304.46 samples/sec   Loss 7.3049   LearningRate 0.1398   Epoch: 7   Global Step: 72910   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:14:25,339-Speed 5326.99 samples/sec   Loss 7.3128   LearningRate 0.1397   Epoch: 7   Global Step: 72920   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:14:33,157-Speed 5240.40 samples/sec   Loss 7.3335   LearningRate 0.1397   Epoch: 7   Global Step: 72930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:14:40,964-Speed 5247.06 samples/sec   Loss 7.4153   LearningRate 0.1397   Epoch: 7   Global Step: 72940   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:14:48,859-Speed 5188.87 samples/sec   Loss 7.3566   LearningRate 0.1397   Epoch: 7   Global Step: 72950   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:14:56,740-Speed 5197.91 samples/sec   Loss 7.4146   LearningRate 0.1397   Epoch: 7   Global Step: 72960   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:15:04,388-Speed 5356.52 samples/sec   Loss 7.3226   LearningRate 0.1396   Epoch: 7   Global Step: 72970   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:15:11,912-Speed 5444.15 samples/sec   Loss 7.3995   LearningRate 0.1396   Epoch: 7   Global Step: 72980   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:15:19,473-Speed 5417.86 samples/sec   Loss 7.3450   LearningRate 0.1396   Epoch: 7   Global Step: 72990   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:15:27,215-Speed 5291.09 samples/sec   Loss 7.3084   LearningRate 0.1396   Epoch: 7   Global Step: 73000   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:15:34,773-Speed 5420.78 samples/sec   Loss 7.3541   LearningRate 0.1396   Epoch: 7   Global Step: 73010   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:15:42,384-Speed 5381.97 samples/sec   Loss 7.3621   LearningRate 0.1395   Epoch: 7   Global Step: 73020   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:15:49,871-Speed 5471.50 samples/sec   Loss 7.3385   LearningRate 0.1395   Epoch: 7   Global Step: 73030   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:15:57,344-Speed 5481.87 samples/sec   Loss 7.4053   LearningRate 0.1395   Epoch: 7   Global Step: 73040   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:16:04,859-Speed 5451.36 samples/sec   Loss 7.3751   LearningRate 0.1395   Epoch: 7   Global Step: 73050   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:16:12,342-Speed 5474.32 samples/sec   Loss 7.3477   LearningRate 0.1395   Epoch: 7   Global Step: 73060   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:16:19,949-Speed 5385.56 samples/sec   Loss 7.3242   LearningRate 0.1394   Epoch: 7   Global Step: 73070   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:16:27,387-Speed 5507.56 samples/sec   Loss 7.3456   LearningRate 0.1394   Epoch: 7   Global Step: 73080   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:16:34,867-Speed 5476.57 samples/sec   Loss 7.3780   LearningRate 0.1394   Epoch: 7   Global Step: 73090   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:16:42,361-Speed 5466.74 samples/sec   Loss 7.2944   LearningRate 0.1394   Epoch: 7   Global Step: 73100   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:16:49,909-Speed 5426.86 samples/sec   Loss 7.3415   LearningRate 0.1393   Epoch: 7   Global Step: 73110   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:16:57,453-Speed 5430.62 samples/sec   Loss 7.3750   LearningRate 0.1393   Epoch: 7   Global Step: 73120   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:17:05,031-Speed 5405.72 samples/sec   Loss 7.3238   LearningRate 0.1393   Epoch: 7   Global Step: 73130   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:17:12,485-Speed 5496.03 samples/sec   Loss 7.3030   LearningRate 0.1393   Epoch: 7   Global Step: 73140   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:17:20,134-Speed 5355.58 samples/sec   Loss 7.3428   LearningRate 0.1393   Epoch: 7   Global Step: 73150   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:17:27,697-Speed 5416.74 samples/sec   Loss 7.3082   LearningRate 0.1392   Epoch: 7   Global Step: 73160   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:17:35,270-Speed 5409.26 samples/sec   Loss 7.2856   LearningRate 0.1392   Epoch: 7   Global Step: 73170   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:17:43,119-Speed 5218.97 samples/sec   Loss 7.3544   LearningRate 0.1392   Epoch: 7   Global Step: 73180   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:17:50,785-Speed 5343.76 samples/sec   Loss 7.3999   LearningRate 0.1392   Epoch: 7   Global Step: 73190   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:17:58,330-Speed 5429.85 samples/sec   Loss 7.3746   LearningRate 0.1392   Epoch: 7   Global Step: 73200   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:18:05,943-Speed 5380.67 samples/sec   Loss 7.3891   LearningRate 0.1391   Epoch: 7   Global Step: 73210   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:18:13,378-Speed 5509.82 samples/sec   Loss 7.3750   LearningRate 0.1391   Epoch: 7   Global Step: 73220   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:18:20,924-Speed 5428.89 samples/sec   Loss 7.3895   LearningRate 0.1391   Epoch: 7   Global Step: 73230   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:18:28,430-Speed 5457.48 samples/sec   Loss 7.4100   LearningRate 0.1391   Epoch: 7   Global Step: 73240   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:18:36,107-Speed 5336.08 samples/sec   Loss 7.3321   LearningRate 0.1391   Epoch: 7   Global Step: 73250   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:18:43,614-Speed 5457.12 samples/sec   Loss 7.2886   LearningRate 0.1390   Epoch: 7   Global Step: 73260   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:18:51,079-Speed 5487.78 samples/sec   Loss 7.2673   LearningRate 0.1390   Epoch: 7   Global Step: 73270   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:18:58,664-Speed 5400.06 samples/sec   Loss 7.3262   LearningRate 0.1390   Epoch: 7   Global Step: 73280   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:19:06,204-Speed 5433.51 samples/sec   Loss 7.3968   LearningRate 0.1390   Epoch: 7   Global Step: 73290   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:19:13,734-Speed 5440.29 samples/sec   Loss 7.3426   LearningRate 0.1390   Epoch: 7   Global Step: 73300   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:19:21,295-Speed 5417.76 samples/sec   Loss 7.2947   LearningRate 0.1389   Epoch: 7   Global Step: 73310   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:19:28,839-Speed 5430.23 samples/sec   Loss 7.3300   LearningRate 0.1389   Epoch: 7   Global Step: 73320   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:19:36,481-Speed 5360.37 samples/sec   Loss 7.4398   LearningRate 0.1389   Epoch: 7   Global Step: 73330   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:19:44,026-Speed 5429.55 samples/sec   Loss 7.3370   LearningRate 0.1389   Epoch: 7   Global Step: 73340   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:19:51,617-Speed 5396.67 samples/sec   Loss 7.4187   LearningRate 0.1388   Epoch: 7   Global Step: 73350   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:19:59,165-Speed 5427.44 samples/sec   Loss 7.3668   LearningRate 0.1388   Epoch: 7   Global Step: 73360   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:20:06,578-Speed 5525.78 samples/sec   Loss 7.2713   LearningRate 0.1388   Epoch: 7   Global Step: 73370   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:20:14,073-Speed 5465.72 samples/sec   Loss 7.2707   LearningRate 0.1388   Epoch: 7   Global Step: 73380   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:20:21,566-Speed 5467.65 samples/sec   Loss 7.3502   LearningRate 0.1388   Epoch: 7   Global Step: 73390   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:20:29,162-Speed 5392.49 samples/sec   Loss 7.3104   LearningRate 0.1387   Epoch: 7   Global Step: 73400   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:20:36,679-Speed 5449.82 samples/sec   Loss 7.3261   LearningRate 0.1387   Epoch: 7   Global Step: 73410   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:20:44,171-Speed 5467.98 samples/sec   Loss 7.2624   LearningRate 0.1387   Epoch: 7   Global Step: 73420   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:20:51,663-Speed 5468.34 samples/sec   Loss 7.4119   LearningRate 0.1387   Epoch: 7   Global Step: 73430   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:20:59,198-Speed 5436.52 samples/sec   Loss 7.3823   LearningRate 0.1387   Epoch: 7   Global Step: 73440   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:21:06,786-Speed 5399.12 samples/sec   Loss 7.3459   LearningRate 0.1386   Epoch: 7   Global Step: 73450   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:21:14,360-Speed 5408.46 samples/sec   Loss 7.3949   LearningRate 0.1386   Epoch: 7   Global Step: 73460   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:21:21,862-Speed 5460.24 samples/sec   Loss 7.3264   LearningRate 0.1386   Epoch: 7   Global Step: 73470   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:21:29,439-Speed 5407.33 samples/sec   Loss 7.3855   LearningRate 0.1386   Epoch: 7   Global Step: 73480   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:21:37,082-Speed 5359.33 samples/sec   Loss 7.3033   LearningRate 0.1386   Epoch: 7   Global Step: 73490   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:21:44,649-Speed 5413.91 samples/sec   Loss 7.2858   LearningRate 0.1385   Epoch: 7   Global Step: 73500   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:21:52,306-Speed 5350.21 samples/sec   Loss 7.4086   LearningRate 0.1385   Epoch: 7   Global Step: 73510   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:21:59,976-Speed 5340.74 samples/sec   Loss 7.3528   LearningRate 0.1385   Epoch: 7   Global Step: 73520   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:22:07,568-Speed 5396.17 samples/sec   Loss 7.3109   LearningRate 0.1385   Epoch: 7   Global Step: 73530   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:22:15,162-Speed 5394.33 samples/sec   Loss 7.3337   LearningRate 0.1385   Epoch: 7   Global Step: 73540   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:22:22,818-Speed 5350.44 samples/sec   Loss 7.3592   LearningRate 0.1384   Epoch: 7   Global Step: 73550   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:22:30,364-Speed 5428.76 samples/sec   Loss 7.3481   LearningRate 0.1384   Epoch: 7   Global Step: 73560   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:22:37,902-Speed 5434.86 samples/sec   Loss 7.3372   LearningRate 0.1384   Epoch: 7   Global Step: 73570   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:22:45,488-Speed 5400.14 samples/sec   Loss 7.3858   LearningRate 0.1384   Epoch: 7   Global Step: 73580   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:22:53,058-Speed 5410.72 samples/sec   Loss 7.3065   LearningRate 0.1384   Epoch: 7   Global Step: 73590   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:23:00,736-Speed 5336.04 samples/sec   Loss 7.3381   LearningRate 0.1383   Epoch: 7   Global Step: 73600   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:23:08,277-Speed 5431.92 samples/sec   Loss 7.3055   LearningRate 0.1383   Epoch: 7   Global Step: 73610   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:23:15,812-Speed 5437.03 samples/sec   Loss 7.2350   LearningRate 0.1383   Epoch: 7   Global Step: 73620   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:23:23,313-Speed 5461.26 samples/sec   Loss 7.3341   LearningRate 0.1383   Epoch: 7   Global Step: 73630   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:23:30,914-Speed 5389.50 samples/sec   Loss 7.2410   LearningRate 0.1382   Epoch: 7   Global Step: 73640   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:23:38,511-Speed 5392.25 samples/sec   Loss 7.3459   LearningRate 0.1382   Epoch: 7   Global Step: 73650   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:23:46,101-Speed 5397.11 samples/sec   Loss 7.2982   LearningRate 0.1382   Epoch: 7   Global Step: 73660   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:23:53,610-Speed 5455.58 samples/sec   Loss 7.2504   LearningRate 0.1382   Epoch: 7   Global Step: 73670   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:24:01,126-Speed 5449.82 samples/sec   Loss 7.3471   LearningRate 0.1382   Epoch: 7   Global Step: 73680   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:24:08,697-Speed 5410.88 samples/sec   Loss 7.2519   LearningRate 0.1381   Epoch: 7   Global Step: 73690   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:24:16,164-Speed 5486.64 samples/sec   Loss 7.3081   LearningRate 0.1381   Epoch: 7   Global Step: 73700   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:24:23,700-Speed 5435.99 samples/sec   Loss 7.3066   LearningRate 0.1381   Epoch: 7   Global Step: 73710   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:24:31,287-Speed 5398.69 samples/sec   Loss 7.2883   LearningRate 0.1381   Epoch: 7   Global Step: 73720   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:24:38,929-Speed 5360.81 samples/sec   Loss 7.3668   LearningRate 0.1381   Epoch: 7   Global Step: 73730   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:24:46,408-Speed 5477.62 samples/sec   Loss 7.3485   LearningRate 0.1380   Epoch: 7   Global Step: 73740   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:24:53,881-Speed 5482.03 samples/sec   Loss 7.3353   LearningRate 0.1380   Epoch: 7   Global Step: 73750   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:25:01,357-Speed 5478.98 samples/sec   Loss 7.3647   LearningRate 0.1380   Epoch: 7   Global Step: 73760   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:25:09,002-Speed 5359.00 samples/sec   Loss 7.3358   LearningRate 0.1380   Epoch: 7   Global Step: 73770   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:25:16,433-Speed 5512.70 samples/sec   Loss 7.2709   LearningRate 0.1380   Epoch: 7   Global Step: 73780   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:25:23,911-Speed 5478.19 samples/sec   Loss 7.3466   LearningRate 0.1379   Epoch: 7   Global Step: 73790   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:25:31,346-Speed 5510.13 samples/sec   Loss 7.3815   LearningRate 0.1379   Epoch: 7   Global Step: 73800   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:25:38,854-Speed 5456.05 samples/sec   Loss 7.3244   LearningRate 0.1379   Epoch: 7   Global Step: 73810   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:25:46,348-Speed 5466.56 samples/sec   Loss 7.3181   LearningRate 0.1379   Epoch: 7   Global Step: 73820   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:25:53,950-Speed 5388.63 samples/sec   Loss 7.3172   LearningRate 0.1379   Epoch: 7   Global Step: 73830   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:26:01,501-Speed 5425.70 samples/sec   Loss 7.3322   LearningRate 0.1378   Epoch: 7   Global Step: 73840   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:26:08,974-Speed 5481.36 samples/sec   Loss 7.1700   LearningRate 0.1378   Epoch: 7   Global Step: 73850   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:26:16,470-Speed 5465.03 samples/sec   Loss 7.2920   LearningRate 0.1378   Epoch: 7   Global Step: 73860   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:26:23,995-Speed 5444.46 samples/sec   Loss 7.2975   LearningRate 0.1378   Epoch: 7   Global Step: 73870   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:26:31,583-Speed 5398.01 samples/sec   Loss 7.3256   LearningRate 0.1378   Epoch: 7   Global Step: 73880   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:26:39,140-Speed 5420.69 samples/sec   Loss 7.2928   LearningRate 0.1377   Epoch: 7   Global Step: 73890   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:26:46,598-Speed 5492.94 samples/sec   Loss 7.2635   LearningRate 0.1377   Epoch: 7   Global Step: 73900   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:26:54,113-Speed 5451.78 samples/sec   Loss 7.3294   LearningRate 0.1377   Epoch: 7   Global Step: 73910   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:27:01,618-Speed 5457.67 samples/sec   Loss 7.3042   LearningRate 0.1377   Epoch: 7   Global Step: 73920   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:27:09,093-Speed 5480.00 samples/sec   Loss 7.3062   LearningRate 0.1377   Epoch: 7   Global Step: 73930   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:27:16,649-Speed 5422.41 samples/sec   Loss 7.3311   LearningRate 0.1376   Epoch: 7   Global Step: 73940   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:27:24,166-Speed 5449.35 samples/sec   Loss 7.3221   LearningRate 0.1376   Epoch: 7   Global Step: 73950   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:27:31,749-Speed 5402.43 samples/sec   Loss 7.2737   LearningRate 0.1376   Epoch: 7   Global Step: 73960   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:27:39,309-Speed 5418.31 samples/sec   Loss 7.3335   LearningRate 0.1376   Epoch: 7   Global Step: 73970   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:27:46,805-Speed 5464.76 samples/sec   Loss 7.2530   LearningRate 0.1375   Epoch: 7   Global Step: 73980   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:27:54,340-Speed 5437.27 samples/sec   Loss 7.2838   LearningRate 0.1375   Epoch: 7   Global Step: 73990   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:28:01,898-Speed 5419.65 samples/sec   Loss 7.2819   LearningRate 0.1375   Epoch: 7   Global Step: 74000   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:28:46,382-[lfw][74000]XNorm: 23.041898
Training: 2022-01-08 11:28:46,383-[lfw][74000]Accuracy-Flip: 0.99783+-0.00269
Training: 2022-01-08 11:28:46,383-[lfw][74000]Accuracy-Highest: 0.99817
Training: 2022-01-08 11:29:38,159-[cfp_fp][74000]XNorm: 20.887236
Training: 2022-01-08 11:29:38,160-[cfp_fp][74000]Accuracy-Flip: 0.98371+-0.00539
Training: 2022-01-08 11:29:38,160-[cfp_fp][74000]Accuracy-Highest: 0.98771
Training: 2022-01-08 11:30:24,007-[agedb_30][74000]XNorm: 22.723156
Training: 2022-01-08 11:30:24,008-[agedb_30][74000]Accuracy-Flip: 0.97283+-0.01019
Training: 2022-01-08 11:30:24,008-[agedb_30][74000]Accuracy-Highest: 0.97667
Training: 2022-01-08 11:30:31,601-Speed 273.61 samples/sec   Loss 7.3666   LearningRate 0.1375   Epoch: 7   Global Step: 74010   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:30:39,034-Speed 5512.69 samples/sec   Loss 7.2581   LearningRate 0.1375   Epoch: 7   Global Step: 74020   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:30:46,540-Speed 5457.83 samples/sec   Loss 7.3262   LearningRate 0.1374   Epoch: 7   Global Step: 74030   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:30:54,087-Speed 5428.79 samples/sec   Loss 7.2425   LearningRate 0.1374   Epoch: 7   Global Step: 74040   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:31:01,702-Speed 5379.39 samples/sec   Loss 7.3546   LearningRate 0.1374   Epoch: 7   Global Step: 74050   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:31:09,197-Speed 5466.81 samples/sec   Loss 7.3082   LearningRate 0.1374   Epoch: 7   Global Step: 74060   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:31:16,857-Speed 5347.78 samples/sec   Loss 7.2786   LearningRate 0.1374   Epoch: 7   Global Step: 74070   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:31:24,441-Speed 5401.30 samples/sec   Loss 7.2803   LearningRate 0.1373   Epoch: 7   Global Step: 74080   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:31:31,900-Speed 5492.20 samples/sec   Loss 7.3386   LearningRate 0.1373   Epoch: 7   Global Step: 74090   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:31:39,424-Speed 5444.83 samples/sec   Loss 7.2248   LearningRate 0.1373   Epoch: 7   Global Step: 74100   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:31:47,019-Speed 5393.70 samples/sec   Loss 7.3724   LearningRate 0.1373   Epoch: 7   Global Step: 74110   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:31:54,478-Speed 5491.37 samples/sec   Loss 7.2424   LearningRate 0.1373   Epoch: 7   Global Step: 74120   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:32:02,076-Speed 5391.63 samples/sec   Loss 7.3084   LearningRate 0.1372   Epoch: 7   Global Step: 74130   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:32:09,538-Speed 5490.47 samples/sec   Loss 7.2460   LearningRate 0.1372   Epoch: 7   Global Step: 74140   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:32:16,977-Speed 5506.43 samples/sec   Loss 7.3549   LearningRate 0.1372   Epoch: 7   Global Step: 74150   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 11:32:24,459-Speed 5475.18 samples/sec   Loss 7.2781   LearningRate 0.1372   Epoch: 7   Global Step: 74160   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:32:32,005-Speed 5428.48 samples/sec   Loss 7.3207   LearningRate 0.1372   Epoch: 7   Global Step: 74170   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:32:39,466-Speed 5491.26 samples/sec   Loss 7.2321   LearningRate 0.1371   Epoch: 7   Global Step: 74180   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:32:46,995-Speed 5440.46 samples/sec   Loss 7.2846   LearningRate 0.1371   Epoch: 7   Global Step: 74190   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:32:54,452-Speed 5493.42 samples/sec   Loss 7.2611   LearningRate 0.1371   Epoch: 7   Global Step: 74200   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:33:01,886-Speed 5510.74 samples/sec   Loss 7.3099   LearningRate 0.1371   Epoch: 7   Global Step: 74210   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:33:09,393-Speed 5457.31 samples/sec   Loss 7.2675   LearningRate 0.1371   Epoch: 7   Global Step: 74220   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:33:16,971-Speed 5405.81 samples/sec   Loss 7.2332   LearningRate 0.1370   Epoch: 7   Global Step: 74230   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:33:24,415-Speed 5502.42 samples/sec   Loss 7.2851   LearningRate 0.1370   Epoch: 7   Global Step: 74240   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:33:31,878-Speed 5489.39 samples/sec   Loss 7.2783   LearningRate 0.1370   Epoch: 7   Global Step: 74250   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:33:39,352-Speed 5481.35 samples/sec   Loss 7.3512   LearningRate 0.1370   Epoch: 7   Global Step: 74260   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:33:47,025-Speed 5338.92 samples/sec   Loss 7.2827   LearningRate 0.1369   Epoch: 7   Global Step: 74270   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:33:54,496-Speed 5482.76 samples/sec   Loss 7.2579   LearningRate 0.1369   Epoch: 7   Global Step: 74280   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:34:01,905-Speed 5528.96 samples/sec   Loss 7.2702   LearningRate 0.1369   Epoch: 7   Global Step: 74290   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:34:09,399-Speed 5466.37 samples/sec   Loss 7.3246   LearningRate 0.1369   Epoch: 7   Global Step: 74300   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:34:16,913-Speed 5452.40 samples/sec   Loss 7.3266   LearningRate 0.1369   Epoch: 7   Global Step: 74310   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:34:24,447-Speed 5436.97 samples/sec   Loss 7.2812   LearningRate 0.1368   Epoch: 7   Global Step: 74320   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:34:32,054-Speed 5385.43 samples/sec   Loss 7.2799   LearningRate 0.1368   Epoch: 7   Global Step: 74330   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:34:39,557-Speed 5460.32 samples/sec   Loss 7.2674   LearningRate 0.1368   Epoch: 7   Global Step: 74340   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:34:46,995-Speed 5507.68 samples/sec   Loss 7.2769   LearningRate 0.1368   Epoch: 7   Global Step: 74350   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:34:54,626-Speed 5367.93 samples/sec   Loss 7.3163   LearningRate 0.1368   Epoch: 7   Global Step: 74360   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:35:02,256-Speed 5368.71 samples/sec   Loss 7.3279   LearningRate 0.1367   Epoch: 7   Global Step: 74370   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:35:09,853-Speed 5393.04 samples/sec   Loss 7.2389   LearningRate 0.1367   Epoch: 7   Global Step: 74380   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 11:35:17,381-Speed 5441.29 samples/sec   Loss 7.2787   LearningRate 0.1367   Epoch: 7   Global Step: 74390   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:35:24,949-Speed 5413.22 samples/sec   Loss 7.2613   LearningRate 0.1367   Epoch: 7   Global Step: 74400   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:35:32,384-Speed 5509.66 samples/sec   Loss 7.3326   LearningRate 0.1367   Epoch: 7   Global Step: 74410   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:35:39,883-Speed 5462.99 samples/sec   Loss 7.3081   LearningRate 0.1366   Epoch: 7   Global Step: 74420   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:35:47,349-Speed 5486.37 samples/sec   Loss 7.3071   LearningRate 0.1366   Epoch: 7   Global Step: 74430   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:35:54,925-Speed 5407.41 samples/sec   Loss 7.3342   LearningRate 0.1366   Epoch: 7   Global Step: 74440   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 11:36:02,389-Speed 5489.14 samples/sec   Loss 7.2979   LearningRate 0.1366   Epoch: 7   Global Step: 74450   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:36:09,884-Speed 5465.34 samples/sec   Loss 7.3092   LearningRate 0.1366   Epoch: 7   Global Step: 74460   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:36:17,482-Speed 5391.92 samples/sec   Loss 7.3195   LearningRate 0.1365   Epoch: 7   Global Step: 74470   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:36:25,009-Speed 5442.40 samples/sec   Loss 7.1674   LearningRate 0.1365   Epoch: 7   Global Step: 74480   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:36:32,496-Speed 5471.86 samples/sec   Loss 7.2314   LearningRate 0.1365   Epoch: 7   Global Step: 74490   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:36:40,069-Speed 5409.03 samples/sec   Loss 7.2255   LearningRate 0.1365   Epoch: 7   Global Step: 74500   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:36:47,625-Speed 5421.36 samples/sec   Loss 7.2612   LearningRate 0.1365   Epoch: 7   Global Step: 74510   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:36:55,069-Speed 5503.26 samples/sec   Loss 7.2322   LearningRate 0.1364   Epoch: 7   Global Step: 74520   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:37:02,539-Speed 5484.36 samples/sec   Loss 7.2428   LearningRate 0.1364   Epoch: 7   Global Step: 74530   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:37:10,018-Speed 5477.00 samples/sec   Loss 7.2596   LearningRate 0.1364   Epoch: 7   Global Step: 74540   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:37:17,514-Speed 5465.22 samples/sec   Loss 7.2537   LearningRate 0.1364   Epoch: 7   Global Step: 74550   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:37:25,071-Speed 5420.77 samples/sec   Loss 7.3093   LearningRate 0.1364   Epoch: 7   Global Step: 74560   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:37:32,553-Speed 5475.78 samples/sec   Loss 7.2859   LearningRate 0.1363   Epoch: 7   Global Step: 74570   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:37:40,031-Speed 5478.12 samples/sec   Loss 7.2872   LearningRate 0.1363   Epoch: 7   Global Step: 74580   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:37:47,604-Speed 5409.10 samples/sec   Loss 7.2555   LearningRate 0.1363   Epoch: 7   Global Step: 74590   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:37:55,105-Speed 5461.88 samples/sec   Loss 7.2856   LearningRate 0.1363   Epoch: 7   Global Step: 74600   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:38:02,619-Speed 5451.27 samples/sec   Loss 7.2977   LearningRate 0.1363   Epoch: 7   Global Step: 74610   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:38:10,109-Speed 5469.36 samples/sec   Loss 7.2820   LearningRate 0.1362   Epoch: 7   Global Step: 74620   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:38:17,631-Speed 5446.52 samples/sec   Loss 7.2198   LearningRate 0.1362   Epoch: 7   Global Step: 74630   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:38:25,081-Speed 5498.85 samples/sec   Loss 7.2578   LearningRate 0.1362   Epoch: 7   Global Step: 74640   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:38:32,506-Speed 5516.84 samples/sec   Loss 7.2566   LearningRate 0.1362   Epoch: 7   Global Step: 74650   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:38:40,007-Speed 5461.05 samples/sec   Loss 7.1925   LearningRate 0.1361   Epoch: 7   Global Step: 74660   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:38:47,522-Speed 5451.56 samples/sec   Loss 7.2688   LearningRate 0.1361   Epoch: 7   Global Step: 74670   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:38:55,007-Speed 5473.08 samples/sec   Loss 7.2208   LearningRate 0.1361   Epoch: 7   Global Step: 74680   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:39:02,561-Speed 5423.00 samples/sec   Loss 7.2931   LearningRate 0.1361   Epoch: 7   Global Step: 74690   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:39:10,031-Speed 5483.99 samples/sec   Loss 7.2717   LearningRate 0.1361   Epoch: 7   Global Step: 74700   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:39:17,546-Speed 5451.37 samples/sec   Loss 7.2889   LearningRate 0.1360   Epoch: 7   Global Step: 74710   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 11:39:24,996-Speed 5498.70 samples/sec   Loss 7.2833   LearningRate 0.1360   Epoch: 7   Global Step: 74720   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 11:39:32,437-Speed 5505.60 samples/sec   Loss 7.2244   LearningRate 0.1360   Epoch: 7   Global Step: 74730   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 11:39:39,908-Speed 5483.08 samples/sec   Loss 7.2970   LearningRate 0.1360   Epoch: 7   Global Step: 74740   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 11:39:47,468-Speed 5418.88 samples/sec   Loss 7.1870   LearningRate 0.1360   Epoch: 7   Global Step: 74750   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 11:39:55,128-Speed 5348.45 samples/sec   Loss 7.2078   LearningRate 0.1359   Epoch: 7   Global Step: 74760   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 11:40:02,552-Speed 5517.81 samples/sec   Loss 7.2223   LearningRate 0.1359   Epoch: 7   Global Step: 74770   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 11:40:10,167-Speed 5379.49 samples/sec   Loss 7.2332   LearningRate 0.1359   Epoch: 7   Global Step: 74780   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 11:40:17,689-Speed 5446.06 samples/sec   Loss 7.1839   LearningRate 0.1359   Epoch: 7   Global Step: 74790   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 11:40:25,159-Speed 5484.43 samples/sec   Loss 7.2209   LearningRate 0.1359   Epoch: 7   Global Step: 74800   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 11:40:32,636-Speed 5478.93 samples/sec   Loss 7.2077   LearningRate 0.1358   Epoch: 7   Global Step: 74810   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:40:40,070-Speed 5509.87 samples/sec   Loss 7.2806   LearningRate 0.1358   Epoch: 7   Global Step: 74820   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:40:47,652-Speed 5403.60 samples/sec   Loss 7.2017   LearningRate 0.1358   Epoch: 7   Global Step: 74830   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:40:55,152-Speed 5462.28 samples/sec   Loss 7.2977   LearningRate 0.1358   Epoch: 7   Global Step: 74840   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:41:02,580-Speed 5514.90 samples/sec   Loss 7.2257   LearningRate 0.1358   Epoch: 7   Global Step: 74850   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:41:10,092-Speed 5453.36 samples/sec   Loss 7.2307   LearningRate 0.1357   Epoch: 7   Global Step: 74860   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:41:17,580-Speed 5471.10 samples/sec   Loss 7.2839   LearningRate 0.1357   Epoch: 7   Global Step: 74870   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:41:24,988-Speed 5529.46 samples/sec   Loss 7.2226   LearningRate 0.1357   Epoch: 7   Global Step: 74880   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:41:32,453-Speed 5488.06 samples/sec   Loss 7.2623   LearningRate 0.1357   Epoch: 7   Global Step: 74890   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:41:39,946-Speed 5466.66 samples/sec   Loss 7.2458   LearningRate 0.1357   Epoch: 7   Global Step: 74900   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:41:47,436-Speed 5469.38 samples/sec   Loss 7.2818   LearningRate 0.1356   Epoch: 7   Global Step: 74910   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:41:54,867-Speed 5513.28 samples/sec   Loss 7.2271   LearningRate 0.1356   Epoch: 7   Global Step: 74920   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:42:02,363-Speed 5465.00 samples/sec   Loss 7.2636   LearningRate 0.1356   Epoch: 7   Global Step: 74930   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:42:09,894-Speed 5439.55 samples/sec   Loss 7.2812   LearningRate 0.1356   Epoch: 7   Global Step: 74940   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:42:17,370-Speed 5479.70 samples/sec   Loss 7.3073   LearningRate 0.1356   Epoch: 7   Global Step: 74950   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:42:24,789-Speed 5522.14 samples/sec   Loss 7.2017   LearningRate 0.1355   Epoch: 7   Global Step: 74960   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:42:32,482-Speed 5324.63 samples/sec   Loss 7.2081   LearningRate 0.1355   Epoch: 7   Global Step: 74970   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:42:39,959-Speed 5478.74 samples/sec   Loss 7.2351   LearningRate 0.1355   Epoch: 7   Global Step: 74980   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:42:47,415-Speed 5494.66 samples/sec   Loss 7.2368   LearningRate 0.1355   Epoch: 7   Global Step: 74990   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:42:54,997-Speed 5403.60 samples/sec   Loss 7.2188   LearningRate 0.1355   Epoch: 7   Global Step: 75000   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:43:02,499-Speed 5460.60 samples/sec   Loss 7.2893   LearningRate 0.1354   Epoch: 7   Global Step: 75010   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:43:10,192-Speed 5324.74 samples/sec   Loss 7.2450   LearningRate 0.1354   Epoch: 7   Global Step: 75020   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:43:17,698-Speed 5457.58 samples/sec   Loss 7.1920   LearningRate 0.1354   Epoch: 7   Global Step: 75030   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:43:25,159-Speed 5490.99 samples/sec   Loss 7.2481   LearningRate 0.1354   Epoch: 7   Global Step: 75040   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:43:32,679-Speed 5447.32 samples/sec   Loss 7.2095   LearningRate 0.1353   Epoch: 7   Global Step: 75050   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:43:40,129-Speed 5499.03 samples/sec   Loss 7.2408   LearningRate 0.1353   Epoch: 7   Global Step: 75060   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:43:47,573-Speed 5503.32 samples/sec   Loss 7.2067   LearningRate 0.1353   Epoch: 7   Global Step: 75070   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:43:55,040-Speed 5486.10 samples/sec   Loss 7.1844   LearningRate 0.1353   Epoch: 7   Global Step: 75080   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:44:02,520-Speed 5477.10 samples/sec   Loss 7.2050   LearningRate 0.1353   Epoch: 7   Global Step: 75090   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:44:10,011-Speed 5468.08 samples/sec   Loss 7.2196   LearningRate 0.1352   Epoch: 7   Global Step: 75100   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:44:17,619-Speed 5384.51 samples/sec   Loss 7.2529   LearningRate 0.1352   Epoch: 7   Global Step: 75110   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:44:25,055-Speed 5509.32 samples/sec   Loss 7.2293   LearningRate 0.1352   Epoch: 7   Global Step: 75120   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:44:32,537-Speed 5475.44 samples/sec   Loss 7.3014   LearningRate 0.1352   Epoch: 7   Global Step: 75130   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:44:40,072-Speed 5436.60 samples/sec   Loss 7.2711   LearningRate 0.1352   Epoch: 7   Global Step: 75140   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:44:47,510-Speed 5507.84 samples/sec   Loss 7.2253   LearningRate 0.1351   Epoch: 7   Global Step: 75150   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:44:54,900-Speed 5543.04 samples/sec   Loss 7.1776   LearningRate 0.1351   Epoch: 7   Global Step: 75160   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:45:02,381-Speed 5476.15 samples/sec   Loss 7.2629   LearningRate 0.1351   Epoch: 7   Global Step: 75170   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:45:09,912-Speed 5439.91 samples/sec   Loss 7.1792   LearningRate 0.1351   Epoch: 7   Global Step: 75180   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:45:17,501-Speed 5397.30 samples/sec   Loss 7.2127   LearningRate 0.1351   Epoch: 7   Global Step: 75190   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:45:24,984-Speed 5475.25 samples/sec   Loss 7.2911   LearningRate 0.1350   Epoch: 7   Global Step: 75200   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:45:32,477-Speed 5467.30 samples/sec   Loss 7.2318   LearningRate 0.1350   Epoch: 7   Global Step: 75210   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:45:40,039-Speed 5416.64 samples/sec   Loss 7.2653   LearningRate 0.1350   Epoch: 7   Global Step: 75220   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:45:47,469-Speed 5513.60 samples/sec   Loss 7.1780   LearningRate 0.1350   Epoch: 7   Global Step: 75230   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:45:54,942-Speed 5482.30 samples/sec   Loss 7.2854   LearningRate 0.1350   Epoch: 7   Global Step: 75240   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:46:02,407-Speed 5487.94 samples/sec   Loss 7.2213   LearningRate 0.1349   Epoch: 7   Global Step: 75250   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:46:09,929-Speed 5446.09 samples/sec   Loss 7.2220   LearningRate 0.1349   Epoch: 7   Global Step: 75260   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:46:17,425-Speed 5464.59 samples/sec   Loss 7.2710   LearningRate 0.1349   Epoch: 7   Global Step: 75270   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:46:24,861-Speed 5508.68 samples/sec   Loss 7.2296   LearningRate 0.1349   Epoch: 7   Global Step: 75280   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:46:32,285-Speed 5518.29 samples/sec   Loss 7.2335   LearningRate 0.1349   Epoch: 7   Global Step: 75290   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:46:39,738-Speed 5496.91 samples/sec   Loss 7.1901   LearningRate 0.1348   Epoch: 7   Global Step: 75300   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:46:47,297-Speed 5419.14 samples/sec   Loss 7.1855   LearningRate 0.1348   Epoch: 7   Global Step: 75310   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:46:54,747-Speed 5498.84 samples/sec   Loss 7.2048   LearningRate 0.1348   Epoch: 7   Global Step: 75320   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:47:02,302-Speed 5422.66 samples/sec   Loss 7.2707   LearningRate 0.1348   Epoch: 7   Global Step: 75330   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:47:09,839-Speed 5435.30 samples/sec   Loss 7.2250   LearningRate 0.1348   Epoch: 7   Global Step: 75340   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:47:17,317-Speed 5477.38 samples/sec   Loss 7.1963   LearningRate 0.1347   Epoch: 7   Global Step: 75350   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:47:24,767-Speed 5499.58 samples/sec   Loss 7.2482   LearningRate 0.1347   Epoch: 7   Global Step: 75360   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:47:32,264-Speed 5463.86 samples/sec   Loss 7.1944   LearningRate 0.1347   Epoch: 7   Global Step: 75370   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:47:39,894-Speed 5369.33 samples/sec   Loss 7.2216   LearningRate 0.1347   Epoch: 7   Global Step: 75380   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:47:47,443-Speed 5426.55 samples/sec   Loss 7.2841   LearningRate 0.1347   Epoch: 7   Global Step: 75390   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:47:54,882-Speed 5506.49 samples/sec   Loss 7.2369   LearningRate 0.1346   Epoch: 7   Global Step: 75400   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:48:02,325-Speed 5504.35 samples/sec   Loss 7.2126   LearningRate 0.1346   Epoch: 7   Global Step: 75410   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:48:09,941-Speed 5378.66 samples/sec   Loss 7.1815   LearningRate 0.1346   Epoch: 7   Global Step: 75420   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:48:17,591-Speed 5355.26 samples/sec   Loss 7.2742   LearningRate 0.1346   Epoch: 7   Global Step: 75430   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:48:25,166-Speed 5407.41 samples/sec   Loss 7.2570   LearningRate 0.1346   Epoch: 7   Global Step: 75440   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:48:32,709-Speed 5430.86 samples/sec   Loss 7.2319   LearningRate 0.1345   Epoch: 7   Global Step: 75450   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:48:40,373-Speed 5345.47 samples/sec   Loss 7.1777   LearningRate 0.1345   Epoch: 7   Global Step: 75460   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:48:47,871-Speed 5463.39 samples/sec   Loss 7.2297   LearningRate 0.1345   Epoch: 7   Global Step: 75470   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:48:55,453-Speed 5402.95 samples/sec   Loss 7.2317   LearningRate 0.1345   Epoch: 7   Global Step: 75480   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:49:02,886-Speed 5511.06 samples/sec   Loss 7.2578   LearningRate 0.1345   Epoch: 7   Global Step: 75490   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:49:10,382-Speed 5465.47 samples/sec   Loss 7.1194   LearningRate 0.1344   Epoch: 7   Global Step: 75500   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:49:17,863-Speed 5475.47 samples/sec   Loss 7.2470   LearningRate 0.1344   Epoch: 7   Global Step: 75510   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:49:25,324-Speed 5490.21 samples/sec   Loss 7.2002   LearningRate 0.1344   Epoch: 7   Global Step: 75520   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:49:32,844-Speed 5447.63 samples/sec   Loss 7.2368   LearningRate 0.1344   Epoch: 7   Global Step: 75530   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:49:40,478-Speed 5366.75 samples/sec   Loss 7.2945   LearningRate 0.1343   Epoch: 7   Global Step: 75540   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:49:48,049-Speed 5410.98 samples/sec   Loss 7.3033   LearningRate 0.1343   Epoch: 7   Global Step: 75550   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:49:55,539-Speed 5468.75 samples/sec   Loss 7.1816   LearningRate 0.1343   Epoch: 7   Global Step: 75560   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:50:03,016-Speed 5479.27 samples/sec   Loss 7.1776   LearningRate 0.1343   Epoch: 7   Global Step: 75570   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:50:10,597-Speed 5403.70 samples/sec   Loss 7.2363   LearningRate 0.1343   Epoch: 7   Global Step: 75580   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:50:18,054-Speed 5493.28 samples/sec   Loss 7.2536   LearningRate 0.1342   Epoch: 7   Global Step: 75590   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:50:25,498-Speed 5503.05 samples/sec   Loss 7.2474   LearningRate 0.1342   Epoch: 7   Global Step: 75600   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:50:32,958-Speed 5490.88 samples/sec   Loss 7.1820   LearningRate 0.1342   Epoch: 7   Global Step: 75610   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:50:40,429-Speed 5483.75 samples/sec   Loss 7.1804   LearningRate 0.1342   Epoch: 7   Global Step: 75620   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:50:47,896-Speed 5485.88 samples/sec   Loss 7.2017   LearningRate 0.1342   Epoch: 7   Global Step: 75630   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:50:55,382-Speed 5472.38 samples/sec   Loss 7.1600   LearningRate 0.1341   Epoch: 7   Global Step: 75640   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:51:02,955-Speed 5409.33 samples/sec   Loss 7.1844   LearningRate 0.1341   Epoch: 7   Global Step: 75650   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:51:10,465-Speed 5455.07 samples/sec   Loss 7.2601   LearningRate 0.1341   Epoch: 7   Global Step: 75660   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:51:17,928-Speed 5489.18 samples/sec   Loss 7.2404   LearningRate 0.1341   Epoch: 7   Global Step: 75670   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:51:25,322-Speed 5540.10 samples/sec   Loss 7.2999   LearningRate 0.1341   Epoch: 7   Global Step: 75680   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:51:32,753-Speed 5513.10 samples/sec   Loss 7.1725   LearningRate 0.1340   Epoch: 7   Global Step: 75690   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:51:40,194-Speed 5505.17 samples/sec   Loss 7.1468   LearningRate 0.1340   Epoch: 7   Global Step: 75700   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:51:47,677-Speed 5474.51 samples/sec   Loss 7.2149   LearningRate 0.1340   Epoch: 7   Global Step: 75710   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:51:55,093-Speed 5523.55 samples/sec   Loss 7.1829   LearningRate 0.1340   Epoch: 7   Global Step: 75720   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:52:02,571-Speed 5477.86 samples/sec   Loss 7.1982   LearningRate 0.1340   Epoch: 7   Global Step: 75730   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:52:10,076-Speed 5458.85 samples/sec   Loss 7.1932   LearningRate 0.1339   Epoch: 7   Global Step: 75740   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:52:17,603-Speed 5442.54 samples/sec   Loss 7.2428   LearningRate 0.1339   Epoch: 7   Global Step: 75750   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:52:25,022-Speed 5521.94 samples/sec   Loss 7.2288   LearningRate 0.1339   Epoch: 7   Global Step: 75760   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:52:32,527-Speed 5457.59 samples/sec   Loss 7.2045   LearningRate 0.1339   Epoch: 7   Global Step: 75770   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:52:40,003-Speed 5479.70 samples/sec   Loss 7.1802   LearningRate 0.1339   Epoch: 7   Global Step: 75780   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:52:47,479-Speed 5479.65 samples/sec   Loss 7.2436   LearningRate 0.1338   Epoch: 7   Global Step: 75790   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:52:54,997-Speed 5449.17 samples/sec   Loss 7.1145   LearningRate 0.1338   Epoch: 7   Global Step: 75800   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:53:02,511-Speed 5451.89 samples/sec   Loss 7.1872   LearningRate 0.1338   Epoch: 7   Global Step: 75810   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:53:10,097-Speed 5400.43 samples/sec   Loss 7.1165   LearningRate 0.1338   Epoch: 7   Global Step: 75820   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:53:17,554-Speed 5493.55 samples/sec   Loss 7.2490   LearningRate 0.1338   Epoch: 7   Global Step: 75830   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:53:25,014-Speed 5490.96 samples/sec   Loss 7.2536   LearningRate 0.1337   Epoch: 7   Global Step: 75840   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:53:32,537-Speed 5444.84 samples/sec   Loss 7.2176   LearningRate 0.1337   Epoch: 7   Global Step: 75850   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:53:40,042-Speed 5458.99 samples/sec   Loss 7.1163   LearningRate 0.1337   Epoch: 7   Global Step: 75860   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:53:47,640-Speed 5391.32 samples/sec   Loss 7.1285   LearningRate 0.1337   Epoch: 7   Global Step: 75870   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:53:55,153-Speed 5453.03 samples/sec   Loss 7.1984   LearningRate 0.1337   Epoch: 7   Global Step: 75880   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:54:02,697-Speed 5429.33 samples/sec   Loss 7.2644   LearningRate 0.1336   Epoch: 7   Global Step: 75890   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:54:10,230-Speed 5438.15 samples/sec   Loss 7.1686   LearningRate 0.1336   Epoch: 7   Global Step: 75900   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:54:17,737-Speed 5457.04 samples/sec   Loss 7.2091   LearningRate 0.1336   Epoch: 7   Global Step: 75910   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:54:25,235-Speed 5463.46 samples/sec   Loss 7.2030   LearningRate 0.1336   Epoch: 7   Global Step: 75920   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:54:32,763-Speed 5442.32 samples/sec   Loss 7.1538   LearningRate 0.1336   Epoch: 7   Global Step: 75930   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:54:40,387-Speed 5373.19 samples/sec   Loss 7.1385   LearningRate 0.1335   Epoch: 7   Global Step: 75940   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:54:47,912-Speed 5443.81 samples/sec   Loss 7.1715   LearningRate 0.1335   Epoch: 7   Global Step: 75950   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:54:55,479-Speed 5413.66 samples/sec   Loss 7.3184   LearningRate 0.1335   Epoch: 7   Global Step: 75960   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:55:03,101-Speed 5374.41 samples/sec   Loss 7.2225   LearningRate 0.1335   Epoch: 7   Global Step: 75970   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:55:10,644-Speed 5431.04 samples/sec   Loss 7.2583   LearningRate 0.1335   Epoch: 7   Global Step: 75980   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:55:18,298-Speed 5352.28 samples/sec   Loss 7.2212   LearningRate 0.1334   Epoch: 7   Global Step: 75990   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:55:25,745-Speed 5501.08 samples/sec   Loss 7.1384   LearningRate 0.1334   Epoch: 7   Global Step: 76000   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:56:10,041-[lfw][76000]XNorm: 22.878465
Training: 2022-01-08 11:56:10,042-[lfw][76000]Accuracy-Flip: 0.99717+-0.00279
Training: 2022-01-08 11:56:10,043-[lfw][76000]Accuracy-Highest: 0.99817
Training: 2022-01-08 11:57:01,801-[cfp_fp][76000]XNorm: 20.644460
Training: 2022-01-08 11:57:01,802-[cfp_fp][76000]Accuracy-Flip: 0.98629+-0.00487
Training: 2022-01-08 11:57:01,803-[cfp_fp][76000]Accuracy-Highest: 0.98771
Training: 2022-01-08 11:57:47,409-[agedb_30][76000]XNorm: 22.711006
Training: 2022-01-08 11:57:47,410-[agedb_30][76000]Accuracy-Flip: 0.97450+-0.00785
Training: 2022-01-08 11:57:47,411-[agedb_30][76000]Accuracy-Highest: 0.97667
Training: 2022-01-08 11:57:55,012-Speed 274.41 samples/sec   Loss 7.1943   LearningRate 0.1334   Epoch: 7   Global Step: 76010   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:58:02,469-Speed 5494.93 samples/sec   Loss 7.2153   LearningRate 0.1334   Epoch: 7   Global Step: 76020   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:58:10,035-Speed 5415.70 samples/sec   Loss 7.1822   LearningRate 0.1334   Epoch: 7   Global Step: 76030   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:58:17,409-Speed 5556.47 samples/sec   Loss 7.1313   LearningRate 0.1333   Epoch: 7   Global Step: 76040   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 11:58:24,894-Speed 5473.45 samples/sec   Loss 7.1569   LearningRate 0.1333   Epoch: 7   Global Step: 76050   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:58:32,446-Speed 5424.48 samples/sec   Loss 7.1722   LearningRate 0.1333   Epoch: 7   Global Step: 76060   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:58:40,022-Speed 5407.43 samples/sec   Loss 7.2019   LearningRate 0.1333   Epoch: 7   Global Step: 76070   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:58:47,491-Speed 5484.84 samples/sec   Loss 7.1185   LearningRate 0.1333   Epoch: 7   Global Step: 76080   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:58:55,082-Speed 5398.10 samples/sec   Loss 7.1450   LearningRate 0.1332   Epoch: 7   Global Step: 76090   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:59:02,686-Speed 5387.48 samples/sec   Loss 7.1939   LearningRate 0.1332   Epoch: 7   Global Step: 76100   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:59:10,201-Speed 5451.46 samples/sec   Loss 7.1721   LearningRate 0.1332   Epoch: 7   Global Step: 76110   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:59:17,745-Speed 5430.07 samples/sec   Loss 7.1680   LearningRate 0.1332   Epoch: 7   Global Step: 76120   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:59:25,260-Speed 5450.38 samples/sec   Loss 7.1719   LearningRate 0.1331   Epoch: 7   Global Step: 76130   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 11:59:32,688-Speed 5515.77 samples/sec   Loss 7.1155   LearningRate 0.1331   Epoch: 7   Global Step: 76140   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 11:59:40,312-Speed 5373.33 samples/sec   Loss 7.1865   LearningRate 0.1331   Epoch: 7   Global Step: 76150   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 11:59:47,819-Speed 5456.99 samples/sec   Loss 7.2045   LearningRate 0.1331   Epoch: 7   Global Step: 76160   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 11:59:55,334-Speed 5450.97 samples/sec   Loss 7.1701   LearningRate 0.1331   Epoch: 7   Global Step: 76170   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:00:02,850-Speed 5450.40 samples/sec   Loss 7.2086   LearningRate 0.1330   Epoch: 7   Global Step: 76180   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:00:10,316-Speed 5487.35 samples/sec   Loss 7.1375   LearningRate 0.1330   Epoch: 7   Global Step: 76190   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:00:17,775-Speed 5491.48 samples/sec   Loss 7.1649   LearningRate 0.1330   Epoch: 7   Global Step: 76200   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:00:25,295-Speed 5447.33 samples/sec   Loss 7.1957   LearningRate 0.1330   Epoch: 7   Global Step: 76210   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:00:32,776-Speed 5475.35 samples/sec   Loss 7.2206   LearningRate 0.1330   Epoch: 7   Global Step: 76220   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:00:40,240-Speed 5489.06 samples/sec   Loss 7.1968   LearningRate 0.1329   Epoch: 7   Global Step: 76230   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:00:47,787-Speed 5428.25 samples/sec   Loss 7.1278   LearningRate 0.1329   Epoch: 7   Global Step: 76240   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:00:55,304-Speed 5449.42 samples/sec   Loss 7.2051   LearningRate 0.1329   Epoch: 7   Global Step: 76250   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:01:02,918-Speed 5380.06 samples/sec   Loss 7.1614   LearningRate 0.1329   Epoch: 7   Global Step: 76260   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:01:10,366-Speed 5500.61 samples/sec   Loss 7.1577   LearningRate 0.1329   Epoch: 7   Global Step: 76270   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:01:17,842-Speed 5479.64 samples/sec   Loss 7.2243   LearningRate 0.1328   Epoch: 7   Global Step: 76280   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:01:25,333-Speed 5468.06 samples/sec   Loss 7.1005   LearningRate 0.1328   Epoch: 7   Global Step: 76290   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:01:32,845-Speed 5453.40 samples/sec   Loss 7.1172   LearningRate 0.1328   Epoch: 7   Global Step: 76300   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:01:40,320-Speed 5480.57 samples/sec   Loss 7.1546   LearningRate 0.1328   Epoch: 7   Global Step: 76310   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:01:47,760-Speed 5506.18 samples/sec   Loss 7.1468   LearningRate 0.1328   Epoch: 7   Global Step: 76320   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:01:55,269-Speed 5455.80 samples/sec   Loss 7.1680   LearningRate 0.1327   Epoch: 7   Global Step: 76330   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:02:02,863-Speed 5394.41 samples/sec   Loss 7.2027   LearningRate 0.1327   Epoch: 7   Global Step: 76340   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:02:10,443-Speed 5404.11 samples/sec   Loss 7.1728   LearningRate 0.1327   Epoch: 7   Global Step: 76350   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:02:17,951-Speed 5456.65 samples/sec   Loss 7.1763   LearningRate 0.1327   Epoch: 7   Global Step: 76360   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:02:25,418-Speed 5486.19 samples/sec   Loss 7.1208   LearningRate 0.1327   Epoch: 7   Global Step: 76370   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:02:32,907-Speed 5470.08 samples/sec   Loss 7.1383   LearningRate 0.1326   Epoch: 7   Global Step: 76380   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:02:40,442-Speed 5436.91 samples/sec   Loss 7.1199   LearningRate 0.1326   Epoch: 7   Global Step: 76390   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:02:47,902-Speed 5491.21 samples/sec   Loss 7.1040   LearningRate 0.1326   Epoch: 7   Global Step: 76400   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:02:55,386-Speed 5473.68 samples/sec   Loss 7.1032   LearningRate 0.1326   Epoch: 7   Global Step: 76410   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:03:02,876-Speed 5469.51 samples/sec   Loss 7.2159   LearningRate 0.1326   Epoch: 7   Global Step: 76420   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:03:10,374-Speed 5463.55 samples/sec   Loss 7.1639   LearningRate 0.1325   Epoch: 7   Global Step: 76430   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:03:17,879-Speed 5458.95 samples/sec   Loss 7.1869   LearningRate 0.1325   Epoch: 7   Global Step: 76440   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:03:25,367-Speed 5470.82 samples/sec   Loss 7.2045   LearningRate 0.1325   Epoch: 7   Global Step: 76450   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:03:32,885-Speed 5448.43 samples/sec   Loss 7.1806   LearningRate 0.1325   Epoch: 7   Global Step: 76460   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:03:40,329-Speed 5503.67 samples/sec   Loss 7.1212   LearningRate 0.1325   Epoch: 7   Global Step: 76470   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:03:47,832-Speed 5460.00 samples/sec   Loss 7.1668   LearningRate 0.1324   Epoch: 7   Global Step: 76480   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:03:55,259-Speed 5515.88 samples/sec   Loss 7.1431   LearningRate 0.1324   Epoch: 7   Global Step: 76490   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:04:02,829-Speed 5410.73 samples/sec   Loss 7.1446   LearningRate 0.1324   Epoch: 7   Global Step: 76500   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:04:10,443-Speed 5380.96 samples/sec   Loss 7.1230   LearningRate 0.1324   Epoch: 7   Global Step: 76510   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:04:18,062-Speed 5376.73 samples/sec   Loss 7.0999   LearningRate 0.1324   Epoch: 7   Global Step: 76520   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:04:25,631-Speed 5412.42 samples/sec   Loss 7.1704   LearningRate 0.1323   Epoch: 7   Global Step: 76530   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:04:33,188-Speed 5420.50 samples/sec   Loss 7.1676   LearningRate 0.1323   Epoch: 7   Global Step: 76540   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:04:40,855-Speed 5343.17 samples/sec   Loss 7.1335   LearningRate 0.1323   Epoch: 7   Global Step: 76550   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:04:48,383-Speed 5441.28 samples/sec   Loss 7.1738   LearningRate 0.1323   Epoch: 7   Global Step: 76560   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:04:55,978-Speed 5394.22 samples/sec   Loss 7.1308   LearningRate 0.1323   Epoch: 7   Global Step: 76570   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:05:03,566-Speed 5398.89 samples/sec   Loss 7.1362   LearningRate 0.1322   Epoch: 7   Global Step: 76580   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:05:11,097-Speed 5439.54 samples/sec   Loss 7.1110   LearningRate 0.1322   Epoch: 7   Global Step: 76590   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:05:18,709-Speed 5381.06 samples/sec   Loss 7.1963   LearningRate 0.1322   Epoch: 7   Global Step: 76600   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:05:26,317-Speed 5384.74 samples/sec   Loss 7.1065   LearningRate 0.1322   Epoch: 7   Global Step: 76610   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:05:33,845-Speed 5442.05 samples/sec   Loss 7.1126   LearningRate 0.1322   Epoch: 7   Global Step: 76620   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:05:41,406-Speed 5418.10 samples/sec   Loss 7.0862   LearningRate 0.1321   Epoch: 7   Global Step: 76630   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:05:48,887-Speed 5475.81 samples/sec   Loss 7.0957   LearningRate 0.1321   Epoch: 7   Global Step: 76640   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:05:56,368-Speed 5475.88 samples/sec   Loss 7.1005   LearningRate 0.1321   Epoch: 7   Global Step: 76650   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:06:04,018-Speed 5354.98 samples/sec   Loss 7.1102   LearningRate 0.1321   Epoch: 7   Global Step: 76660   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:06:11,765-Speed 5288.21 samples/sec   Loss 7.1126   LearningRate 0.1321   Epoch: 7   Global Step: 76670   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:06:19,322-Speed 5421.23 samples/sec   Loss 7.0827   LearningRate 0.1320   Epoch: 7   Global Step: 76680   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:06:26,801-Speed 5476.91 samples/sec   Loss 7.1762   LearningRate 0.1320   Epoch: 7   Global Step: 76690   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:06:34,368-Speed 5413.75 samples/sec   Loss 7.1132   LearningRate 0.1320   Epoch: 7   Global Step: 76700   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:06:41,932-Speed 5416.24 samples/sec   Loss 7.1405   LearningRate 0.1320   Epoch: 7   Global Step: 76710   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:06:49,521-Speed 5397.72 samples/sec   Loss 7.1456   LearningRate 0.1320   Epoch: 7   Global Step: 76720   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:06:56,995-Speed 5480.92 samples/sec   Loss 7.1310   LearningRate 0.1319   Epoch: 7   Global Step: 76730   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:07:04,576-Speed 5403.52 samples/sec   Loss 7.1261   LearningRate 0.1319   Epoch: 7   Global Step: 76740   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:07:12,192-Speed 5379.54 samples/sec   Loss 7.1420   LearningRate 0.1319   Epoch: 7   Global Step: 76750   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:07:19,787-Speed 5393.49 samples/sec   Loss 7.1669   LearningRate 0.1319   Epoch: 7   Global Step: 76760   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:07:27,359-Speed 5409.89 samples/sec   Loss 7.1353   LearningRate 0.1319   Epoch: 7   Global Step: 76770   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:07:34,853-Speed 5466.04 samples/sec   Loss 7.1317   LearningRate 0.1318   Epoch: 7   Global Step: 76780   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:07:42,338-Speed 5473.34 samples/sec   Loss 7.1246   LearningRate 0.1318   Epoch: 7   Global Step: 76790   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:07:49,932-Speed 5394.59 samples/sec   Loss 7.1215   LearningRate 0.1318   Epoch: 7   Global Step: 76800   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:07:57,473-Speed 5431.82 samples/sec   Loss 7.1631   LearningRate 0.1318   Epoch: 7   Global Step: 76810   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:08:05,078-Speed 5387.15 samples/sec   Loss 7.0952   LearningRate 0.1318   Epoch: 7   Global Step: 76820   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:08:12,622-Speed 5430.37 samples/sec   Loss 7.0788   LearningRate 0.1317   Epoch: 7   Global Step: 76830   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:08:20,404-Speed 5263.70 samples/sec   Loss 7.1348   LearningRate 0.1317   Epoch: 7   Global Step: 76840   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:08:27,916-Speed 5453.14 samples/sec   Loss 7.1276   LearningRate 0.1317   Epoch: 7   Global Step: 76850   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:08:35,489-Speed 5409.53 samples/sec   Loss 7.1393   LearningRate 0.1317   Epoch: 7   Global Step: 76860   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:08:42,900-Speed 5528.13 samples/sec   Loss 7.0658   LearningRate 0.1317   Epoch: 7   Global Step: 76870   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:08:50,390-Speed 5468.81 samples/sec   Loss 7.1556   LearningRate 0.1316   Epoch: 7   Global Step: 76880   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:08:57,924-Speed 5437.42 samples/sec   Loss 7.1329   LearningRate 0.1316   Epoch: 7   Global Step: 76890   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:09:05,440-Speed 5450.07 samples/sec   Loss 7.1290   LearningRate 0.1316   Epoch: 7   Global Step: 76900   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:09:13,034-Speed 5394.90 samples/sec   Loss 7.1428   LearningRate 0.1316   Epoch: 7   Global Step: 76910   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:09:20,605-Speed 5410.75 samples/sec   Loss 7.1636   LearningRate 0.1316   Epoch: 7   Global Step: 76920   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:09:28,059-Speed 5496.03 samples/sec   Loss 7.1419   LearningRate 0.1315   Epoch: 7   Global Step: 76930   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:09:35,591-Speed 5438.63 samples/sec   Loss 7.0836   LearningRate 0.1315   Epoch: 7   Global Step: 76940   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:09:43,083-Speed 5467.86 samples/sec   Loss 7.1457   LearningRate 0.1315   Epoch: 7   Global Step: 76950   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:09:50,630-Speed 5428.41 samples/sec   Loss 7.1089   LearningRate 0.1315   Epoch: 7   Global Step: 76960   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:09:58,084-Speed 5495.31 samples/sec   Loss 7.1767   LearningRate 0.1315   Epoch: 7   Global Step: 76970   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:10:05,553-Speed 5485.41 samples/sec   Loss 7.1552   LearningRate 0.1314   Epoch: 7   Global Step: 76980   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:10:12,995-Speed 5504.37 samples/sec   Loss 7.1637   LearningRate 0.1314   Epoch: 7   Global Step: 76990   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:10:20,522-Speed 5442.65 samples/sec   Loss 7.1518   LearningRate 0.1314   Epoch: 7   Global Step: 77000   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:10:28,031-Speed 5455.61 samples/sec   Loss 7.1687   LearningRate 0.1314   Epoch: 7   Global Step: 77010   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:10:35,653-Speed 5374.27 samples/sec   Loss 7.1945   LearningRate 0.1313   Epoch: 7   Global Step: 77020   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:10:43,160-Speed 5457.22 samples/sec   Loss 7.1601   LearningRate 0.1313   Epoch: 7   Global Step: 77030   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:10:50,773-Speed 5380.73 samples/sec   Loss 7.1152   LearningRate 0.1313   Epoch: 7   Global Step: 77040   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:10:58,319-Speed 5429.38 samples/sec   Loss 7.1731   LearningRate 0.1313   Epoch: 7   Global Step: 77050   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:11:06,037-Speed 5307.00 samples/sec   Loss 7.0719   LearningRate 0.1313   Epoch: 7   Global Step: 77060   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:11:13,535-Speed 5463.37 samples/sec   Loss 7.0955   LearningRate 0.1312   Epoch: 7   Global Step: 77070   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:11:21,137-Speed 5389.05 samples/sec   Loss 7.1295   LearningRate 0.1312   Epoch: 7   Global Step: 77080   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:11:28,708-Speed 5410.99 samples/sec   Loss 7.1059   LearningRate 0.1312   Epoch: 7   Global Step: 77090   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:11:36,207-Speed 5462.49 samples/sec   Loss 7.1557   LearningRate 0.1312   Epoch: 7   Global Step: 77100   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:11:43,784-Speed 5406.43 samples/sec   Loss 7.1205   LearningRate 0.1312   Epoch: 7   Global Step: 77110   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:11:51,553-Speed 5273.16 samples/sec   Loss 7.1314   LearningRate 0.1311   Epoch: 7   Global Step: 77120   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:11:59,122-Speed 5412.74 samples/sec   Loss 7.1348   LearningRate 0.1311   Epoch: 7   Global Step: 77130   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:12:06,649-Speed 5442.38 samples/sec   Loss 7.1436   LearningRate 0.1311   Epoch: 7   Global Step: 77140   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:12:14,189-Speed 5432.30 samples/sec   Loss 7.1537   LearningRate 0.1311   Epoch: 7   Global Step: 77150   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:12:21,712-Speed 5445.87 samples/sec   Loss 7.1071   LearningRate 0.1311   Epoch: 7   Global Step: 77160   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:12:29,298-Speed 5400.06 samples/sec   Loss 7.0893   LearningRate 0.1310   Epoch: 7   Global Step: 77170   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:12:36,926-Speed 5370.58 samples/sec   Loss 7.0373   LearningRate 0.1310   Epoch: 7   Global Step: 77180   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:12:44,527-Speed 5388.84 samples/sec   Loss 7.1434   LearningRate 0.1310   Epoch: 7   Global Step: 77190   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:12:52,104-Speed 5406.61 samples/sec   Loss 7.1075   LearningRate 0.1310   Epoch: 7   Global Step: 77200   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:12:59,777-Speed 5339.19 samples/sec   Loss 7.0472   LearningRate 0.1310   Epoch: 7   Global Step: 77210   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:13:07,458-Speed 5333.02 samples/sec   Loss 7.0909   LearningRate 0.1309   Epoch: 7   Global Step: 77220   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:13:14,988-Speed 5440.16 samples/sec   Loss 7.1804   LearningRate 0.1309   Epoch: 7   Global Step: 77230   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:13:22,505-Speed 5449.44 samples/sec   Loss 7.1694   LearningRate 0.1309   Epoch: 7   Global Step: 77240   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:13:30,100-Speed 5393.92 samples/sec   Loss 7.1039   LearningRate 0.1309   Epoch: 7   Global Step: 77250   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:13:37,630-Speed 5440.79 samples/sec   Loss 7.1835   LearningRate 0.1309   Epoch: 7   Global Step: 77260   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:13:45,085-Speed 5494.66 samples/sec   Loss 7.1280   LearningRate 0.1308   Epoch: 7   Global Step: 77270   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:13:52,552-Speed 5485.77 samples/sec   Loss 7.1406   LearningRate 0.1308   Epoch: 7   Global Step: 77280   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:14:00,078-Speed 5443.81 samples/sec   Loss 7.1087   LearningRate 0.1308   Epoch: 7   Global Step: 77290   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:14:07,736-Speed 5349.06 samples/sec   Loss 7.1462   LearningRate 0.1308   Epoch: 7   Global Step: 77300   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:14:15,323-Speed 5399.30 samples/sec   Loss 7.0577   LearningRate 0.1308   Epoch: 7   Global Step: 77310   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:14:22,916-Speed 5395.07 samples/sec   Loss 7.1116   LearningRate 0.1307   Epoch: 7   Global Step: 77320   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:14:30,412-Speed 5465.25 samples/sec   Loss 7.0620   LearningRate 0.1307   Epoch: 7   Global Step: 77330   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 12:14:37,813-Speed 5534.72 samples/sec   Loss 7.1161   LearningRate 0.1307   Epoch: 7   Global Step: 77340   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:14:45,231-Speed 5522.87 samples/sec   Loss 7.0578   LearningRate 0.1307   Epoch: 7   Global Step: 77350   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:14:52,789-Speed 5420.04 samples/sec   Loss 7.1450   LearningRate 0.1307   Epoch: 7   Global Step: 77360   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:15:00,241-Speed 5497.29 samples/sec   Loss 7.0617   LearningRate 0.1306   Epoch: 7   Global Step: 77370   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:15:07,901-Speed 5347.79 samples/sec   Loss 7.1026   LearningRate 0.1306   Epoch: 7   Global Step: 77380   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:15:15,423-Speed 5446.28 samples/sec   Loss 7.0895   LearningRate 0.1306   Epoch: 7   Global Step: 77390   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:15:22,801-Speed 5552.05 samples/sec   Loss 7.0986   LearningRate 0.1306   Epoch: 7   Global Step: 77400   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:15:30,304-Speed 5460.22 samples/sec   Loss 7.1054   LearningRate 0.1306   Epoch: 7   Global Step: 77410   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:15:37,867-Speed 5416.69 samples/sec   Loss 7.1150   LearningRate 0.1305   Epoch: 7   Global Step: 77420   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:15:45,440-Speed 5408.87 samples/sec   Loss 7.1455   LearningRate 0.1305   Epoch: 7   Global Step: 77430   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:15:52,960-Speed 5447.23 samples/sec   Loss 7.0736   LearningRate 0.1305   Epoch: 7   Global Step: 77440   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:16:00,445-Speed 5473.60 samples/sec   Loss 7.0961   LearningRate 0.1305   Epoch: 7   Global Step: 77450   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:16:07,986-Speed 5431.76 samples/sec   Loss 7.1079   LearningRate 0.1305   Epoch: 7   Global Step: 77460   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:16:15,416-Speed 5513.99 samples/sec   Loss 7.0823   LearningRate 0.1304   Epoch: 7   Global Step: 77470   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:16:22,893-Speed 5478.94 samples/sec   Loss 7.0670   LearningRate 0.1304   Epoch: 7   Global Step: 77480   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:16:30,452-Speed 5419.25 samples/sec   Loss 7.1045   LearningRate 0.1304   Epoch: 7   Global Step: 77490   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:16:38,060-Speed 5385.20 samples/sec   Loss 7.0791   LearningRate 0.1304   Epoch: 7   Global Step: 77500   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:16:45,519-Speed 5491.54 samples/sec   Loss 7.0901   LearningRate 0.1304   Epoch: 7   Global Step: 77510   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:16:53,039-Speed 5448.05 samples/sec   Loss 7.1146   LearningRate 0.1303   Epoch: 7   Global Step: 77520   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:17:00,534-Speed 5465.05 samples/sec   Loss 7.1434   LearningRate 0.1303   Epoch: 7   Global Step: 77530   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:17:08,039-Speed 5459.38 samples/sec   Loss 7.1269   LearningRate 0.1303   Epoch: 7   Global Step: 77540   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:17:15,790-Speed 5284.45 samples/sec   Loss 7.0972   LearningRate 0.1303   Epoch: 7   Global Step: 77550   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:17:23,334-Speed 5430.50 samples/sec   Loss 7.1134   LearningRate 0.1303   Epoch: 7   Global Step: 77560   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:17:30,733-Speed 5536.40 samples/sec   Loss 7.0952   LearningRate 0.1302   Epoch: 7   Global Step: 77570   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:17:38,168-Speed 5510.53 samples/sec   Loss 7.1079   LearningRate 0.1302   Epoch: 7   Global Step: 77580   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:17:45,631-Speed 5488.82 samples/sec   Loss 7.1650   LearningRate 0.1302   Epoch: 7   Global Step: 77590   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:17:53,174-Speed 5431.24 samples/sec   Loss 7.0915   LearningRate 0.1302   Epoch: 7   Global Step: 77600   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:18:00,596-Speed 5519.37 samples/sec   Loss 7.1101   LearningRate 0.1302   Epoch: 7   Global Step: 77610   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:18:08,018-Speed 5519.45 samples/sec   Loss 7.0558   LearningRate 0.1301   Epoch: 7   Global Step: 77620   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:18:15,518-Speed 5461.92 samples/sec   Loss 7.0771   LearningRate 0.1301   Epoch: 7   Global Step: 77630   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:18:23,098-Speed 5404.36 samples/sec   Loss 7.0661   LearningRate 0.1301   Epoch: 7   Global Step: 77640   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:18:30,629-Speed 5439.92 samples/sec   Loss 7.1085   LearningRate 0.1301   Epoch: 7   Global Step: 77650   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:18:38,075-Speed 5501.83 samples/sec   Loss 7.1449   LearningRate 0.1301   Epoch: 7   Global Step: 77660   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:18:45,530-Speed 5495.16 samples/sec   Loss 7.0756   LearningRate 0.1300   Epoch: 7   Global Step: 77670   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:18:52,979-Speed 5499.24 samples/sec   Loss 7.0916   LearningRate 0.1300   Epoch: 7   Global Step: 77680   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:19:00,515-Speed 5435.86 samples/sec   Loss 7.1067   LearningRate 0.1300   Epoch: 7   Global Step: 77690   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:19:08,142-Speed 5371.42 samples/sec   Loss 7.1057   LearningRate 0.1300   Epoch: 7   Global Step: 77700   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:19:15,720-Speed 5405.72 samples/sec   Loss 7.0776   LearningRate 0.1300   Epoch: 7   Global Step: 77710   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:19:23,145-Speed 5517.68 samples/sec   Loss 7.1117   LearningRate 0.1299   Epoch: 7   Global Step: 77720   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:19:30,537-Speed 5541.57 samples/sec   Loss 7.1827   LearningRate 0.1299   Epoch: 7   Global Step: 77730   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:19:37,934-Speed 5538.45 samples/sec   Loss 7.1163   LearningRate 0.1299   Epoch: 7   Global Step: 77740   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:19:45,363-Speed 5514.52 samples/sec   Loss 7.1315   LearningRate 0.1299   Epoch: 7   Global Step: 77750   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:19:53,012-Speed 5355.28 samples/sec   Loss 7.1392   LearningRate 0.1299   Epoch: 7   Global Step: 77760   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:20:00,438-Speed 5516.51 samples/sec   Loss 7.1066   LearningRate 0.1298   Epoch: 7   Global Step: 77770   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:20:07,872-Speed 5510.54 samples/sec   Loss 7.1289   LearningRate 0.1298   Epoch: 7   Global Step: 77780   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:20:15,326-Speed 5496.23 samples/sec   Loss 7.0792   LearningRate 0.1298   Epoch: 7   Global Step: 77790   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:20:22,890-Speed 5415.72 samples/sec   Loss 7.0746   LearningRate 0.1298   Epoch: 7   Global Step: 77800   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:20:30,514-Speed 5373.19 samples/sec   Loss 7.0758   LearningRate 0.1298   Epoch: 7   Global Step: 77810   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:20:38,171-Speed 5350.34 samples/sec   Loss 7.0654   LearningRate 0.1297   Epoch: 7   Global Step: 77820   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:20:45,656-Speed 5472.82 samples/sec   Loss 7.1261   LearningRate 0.1297   Epoch: 7   Global Step: 77830   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:20:53,217-Speed 5418.52 samples/sec   Loss 7.0875   LearningRate 0.1297   Epoch: 7   Global Step: 77840   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:21:00,670-Speed 5496.41 samples/sec   Loss 7.0076   LearningRate 0.1297   Epoch: 7   Global Step: 77850   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:21:08,121-Speed 5497.56 samples/sec   Loss 7.0022   LearningRate 0.1297   Epoch: 7   Global Step: 77860   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:21:15,674-Speed 5424.05 samples/sec   Loss 7.0083   LearningRate 0.1296   Epoch: 7   Global Step: 77870   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:21:23,110-Speed 5508.96 samples/sec   Loss 7.0863   LearningRate 0.1296   Epoch: 7   Global Step: 77880   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:21:30,548-Speed 5507.70 samples/sec   Loss 7.0879   LearningRate 0.1296   Epoch: 7   Global Step: 77890   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:21:38,031-Speed 5474.32 samples/sec   Loss 7.0670   LearningRate 0.1296   Epoch: 7   Global Step: 77900   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:21:45,425-Speed 5540.90 samples/sec   Loss 7.0685   LearningRate 0.1296   Epoch: 7   Global Step: 77910   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:21:52,864-Speed 5506.95 samples/sec   Loss 7.1304   LearningRate 0.1295   Epoch: 7   Global Step: 77920   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:22:00,408-Speed 5429.71 samples/sec   Loss 7.0231   LearningRate 0.1295   Epoch: 7   Global Step: 77930   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:22:07,819-Speed 5528.13 samples/sec   Loss 7.0791   LearningRate 0.1295   Epoch: 7   Global Step: 77940   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:22:15,383-Speed 5415.59 samples/sec   Loss 7.0748   LearningRate 0.1295   Epoch: 7   Global Step: 77950   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:22:22,806-Speed 5518.71 samples/sec   Loss 7.1175   LearningRate 0.1295   Epoch: 7   Global Step: 77960   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:22:30,266-Speed 5490.92 samples/sec   Loss 7.0759   LearningRate 0.1294   Epoch: 7   Global Step: 77970   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:22:37,755-Speed 5470.20 samples/sec   Loss 7.0404   LearningRate 0.1294   Epoch: 7   Global Step: 77980   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:22:45,318-Speed 5416.92 samples/sec   Loss 7.1376   LearningRate 0.1294   Epoch: 7   Global Step: 77990   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:22:52,756-Speed 5507.21 samples/sec   Loss 7.0497   LearningRate 0.1294   Epoch: 7   Global Step: 78000   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:23:37,084-[lfw][78000]XNorm: 22.960847
Training: 2022-01-08 12:23:37,085-[lfw][78000]Accuracy-Flip: 0.99783+-0.00299
Training: 2022-01-08 12:23:37,085-[lfw][78000]Accuracy-Highest: 0.99817
Training: 2022-01-08 12:24:29,762-[cfp_fp][78000]XNorm: 20.959540
Training: 2022-01-08 12:24:29,763-[cfp_fp][78000]Accuracy-Flip: 0.98814+-0.00389
Training: 2022-01-08 12:24:29,764-[cfp_fp][78000]Accuracy-Highest: 0.98814
Training: 2022-01-08 12:25:18,486-[agedb_30][78000]XNorm: 23.009706
Training: 2022-01-08 12:25:18,487-[agedb_30][78000]Accuracy-Flip: 0.97533+-0.00781
Training: 2022-01-08 12:25:18,487-[agedb_30][78000]Accuracy-Highest: 0.97667
Training: 2022-01-08 12:25:26,120-Speed 267.08 samples/sec   Loss 7.0236   LearningRate 0.1294   Epoch: 7   Global Step: 78010   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:25:33,653-Speed 5439.14 samples/sec   Loss 7.0691   LearningRate 0.1293   Epoch: 7   Global Step: 78020   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:25:41,173-Speed 5447.72 samples/sec   Loss 7.0310   LearningRate 0.1293   Epoch: 7   Global Step: 78030   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:25:48,798-Speed 5373.21 samples/sec   Loss 7.0910   LearningRate 0.1293   Epoch: 7   Global Step: 78040   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:25:56,316-Speed 5449.15 samples/sec   Loss 7.0309   LearningRate 0.1293   Epoch: 7   Global Step: 78050   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:26:03,767-Speed 5498.13 samples/sec   Loss 7.0905   LearningRate 0.1293   Epoch: 7   Global Step: 78060   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:26:11,307-Speed 5432.80 samples/sec   Loss 7.0912   LearningRate 0.1292   Epoch: 7   Global Step: 78070   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:26:18,824-Speed 5450.29 samples/sec   Loss 7.0669   LearningRate 0.1292   Epoch: 7   Global Step: 78080   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:26:26,359-Speed 5436.07 samples/sec   Loss 7.0665   LearningRate 0.1292   Epoch: 7   Global Step: 78090   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:26:33,856-Speed 5464.61 samples/sec   Loss 7.0131   LearningRate 0.1292   Epoch: 7   Global Step: 78100   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:26:41,337-Speed 5476.35 samples/sec   Loss 7.0728   LearningRate 0.1292   Epoch: 7   Global Step: 78110   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:26:48,890-Speed 5423.61 samples/sec   Loss 7.0352   LearningRate 0.1291   Epoch: 7   Global Step: 78120   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:26:56,411-Speed 5446.20 samples/sec   Loss 7.0956   LearningRate 0.1291   Epoch: 7   Global Step: 78130   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:27:03,919-Speed 5456.32 samples/sec   Loss 7.0941   LearningRate 0.1291   Epoch: 7   Global Step: 78140   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:27:11,379-Speed 5491.66 samples/sec   Loss 7.0755   LearningRate 0.1291   Epoch: 7   Global Step: 78150   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:27:18,832-Speed 5496.61 samples/sec   Loss 7.0548   LearningRate 0.1291   Epoch: 7   Global Step: 78160   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:27:26,362-Speed 5439.75 samples/sec   Loss 7.0716   LearningRate 0.1290   Epoch: 7   Global Step: 78170   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:27:33,827-Speed 5487.87 samples/sec   Loss 7.0659   LearningRate 0.1290   Epoch: 7   Global Step: 78180   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:27:41,340-Speed 5452.73 samples/sec   Loss 7.0689   LearningRate 0.1290   Epoch: 7   Global Step: 78190   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:27:48,822-Speed 5475.49 samples/sec   Loss 7.0883   LearningRate 0.1290   Epoch: 7   Global Step: 78200   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:27:56,356-Speed 5436.64 samples/sec   Loss 7.0799   LearningRate 0.1290   Epoch: 7   Global Step: 78210   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:28:03,863-Speed 5456.96 samples/sec   Loss 6.9707   LearningRate 0.1289   Epoch: 7   Global Step: 78220   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:28:11,373-Speed 5455.53 samples/sec   Loss 7.0636   LearningRate 0.1289   Epoch: 7   Global Step: 78230   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:28:18,851-Speed 5477.65 samples/sec   Loss 7.0557   LearningRate 0.1289   Epoch: 7   Global Step: 78240   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:28:26,301-Speed 5499.00 samples/sec   Loss 7.0125   LearningRate 0.1289   Epoch: 7   Global Step: 78250   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:28:33,747-Speed 5501.04 samples/sec   Loss 7.0941   LearningRate 0.1289   Epoch: 7   Global Step: 78260   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:28:41,295-Speed 5428.10 samples/sec   Loss 7.1138   LearningRate 0.1288   Epoch: 7   Global Step: 78270   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:28:48,776-Speed 5475.71 samples/sec   Loss 7.0921   LearningRate 0.1288   Epoch: 7   Global Step: 78280   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:28:56,210-Speed 5510.54 samples/sec   Loss 7.0063   LearningRate 0.1288   Epoch: 7   Global Step: 78290   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:29:03,694-Speed 5474.09 samples/sec   Loss 7.0587   LearningRate 0.1288   Epoch: 7   Global Step: 78300   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:29:11,253-Speed 5419.38 samples/sec   Loss 7.0530   LearningRate 0.1288   Epoch: 7   Global Step: 78310   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:29:18,764-Speed 5453.86 samples/sec   Loss 7.0189   LearningRate 0.1287   Epoch: 7   Global Step: 78320   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:29:26,368-Speed 5387.26 samples/sec   Loss 7.1225   LearningRate 0.1287   Epoch: 7   Global Step: 78330   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:29:33,892-Speed 5445.01 samples/sec   Loss 7.0690   LearningRate 0.1287   Epoch: 7   Global Step: 78340   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:29:41,352-Speed 5491.47 samples/sec   Loss 7.0793   LearningRate 0.1287   Epoch: 7   Global Step: 78350   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:29:48,756-Speed 5532.95 samples/sec   Loss 6.9969   LearningRate 0.1287   Epoch: 7   Global Step: 78360   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:29:56,213-Speed 5493.90 samples/sec   Loss 7.0970   LearningRate 0.1286   Epoch: 7   Global Step: 78370   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:30:03,650-Speed 5508.10 samples/sec   Loss 6.9909   LearningRate 0.1286   Epoch: 7   Global Step: 78380   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:30:11,099-Speed 5499.40 samples/sec   Loss 7.0532   LearningRate 0.1286   Epoch: 7   Global Step: 78390   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:30:18,576-Speed 5478.89 samples/sec   Loss 7.0467   LearningRate 0.1286   Epoch: 7   Global Step: 78400   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:30:26,067-Speed 5469.12 samples/sec   Loss 7.0374   LearningRate 0.1286   Epoch: 7   Global Step: 78410   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:30:33,493-Speed 5515.75 samples/sec   Loss 7.0499   LearningRate 0.1285   Epoch: 7   Global Step: 78420   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:30:40,983-Speed 5469.82 samples/sec   Loss 7.0218   LearningRate 0.1285   Epoch: 7   Global Step: 78430   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:30:48,376-Speed 5540.74 samples/sec   Loss 7.0870   LearningRate 0.1285   Epoch: 7   Global Step: 78440   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:30:55,875-Speed 5463.43 samples/sec   Loss 7.0372   LearningRate 0.1285   Epoch: 7   Global Step: 78450   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:31:03,509-Speed 5366.06 samples/sec   Loss 7.0896   LearningRate 0.1285   Epoch: 7   Global Step: 78460   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:31:11,016-Speed 5456.60 samples/sec   Loss 7.0604   LearningRate 0.1284   Epoch: 7   Global Step: 78470   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:31:18,517-Speed 5461.32 samples/sec   Loss 7.0862   LearningRate 0.1284   Epoch: 7   Global Step: 78480   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:31:26,004-Speed 5471.91 samples/sec   Loss 7.0198   LearningRate 0.1284   Epoch: 7   Global Step: 78490   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:31:33,450-Speed 5501.66 samples/sec   Loss 7.0433   LearningRate 0.1284   Epoch: 7   Global Step: 78500   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:31:40,908-Speed 5492.44 samples/sec   Loss 6.9995   LearningRate 0.1284   Epoch: 7   Global Step: 78510   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:31:48,394-Speed 5472.37 samples/sec   Loss 7.0159   LearningRate 0.1283   Epoch: 7   Global Step: 78520   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:31:57,312-Speed 5510.93 samples/sec   Loss 7.0128   LearningRate 0.1283   Epoch: 7   Global Step: 78530   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:32:04,816-Speed 5458.84 samples/sec   Loss 7.0648   LearningRate 0.1283   Epoch: 7   Global Step: 78540   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:32:12,325-Speed 5455.56 samples/sec   Loss 7.0588   LearningRate 0.1283   Epoch: 7   Global Step: 78550   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:32:19,857-Speed 5438.97 samples/sec   Loss 7.0750   LearningRate 0.1283   Epoch: 7   Global Step: 78560   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:32:27,402-Speed 5429.36 samples/sec   Loss 7.0802   LearningRate 0.1282   Epoch: 7   Global Step: 78570   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:32:34,896-Speed 5466.80 samples/sec   Loss 7.0180   LearningRate 0.1282   Epoch: 7   Global Step: 78580   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:32:42,368-Speed 5482.17 samples/sec   Loss 7.0260   LearningRate 0.1282   Epoch: 7   Global Step: 78590   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:32:49,984-Speed 5378.61 samples/sec   Loss 7.0192   LearningRate 0.1282   Epoch: 7   Global Step: 78600   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:32:57,422-Speed 5508.39 samples/sec   Loss 7.0485   LearningRate 0.1282   Epoch: 7   Global Step: 78610   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:33:04,984-Speed 5417.57 samples/sec   Loss 7.0097   LearningRate 0.1281   Epoch: 7   Global Step: 78620   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 12:33:12,482-Speed 5463.29 samples/sec   Loss 7.0468   LearningRate 0.1281   Epoch: 7   Global Step: 78630   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:33:19,899-Speed 5522.59 samples/sec   Loss 6.9760   LearningRate 0.1281   Epoch: 7   Global Step: 78640   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:33:27,346-Speed 5501.52 samples/sec   Loss 7.0350   LearningRate 0.1281   Epoch: 7   Global Step: 78650   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 12:33:34,741-Speed 5539.85 samples/sec   Loss 7.0925   LearningRate 0.1281   Epoch: 7   Global Step: 78660   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:33:42,213-Speed 5482.00 samples/sec   Loss 7.0795   LearningRate 0.1280   Epoch: 7   Global Step: 78670   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:33:49,632-Speed 5521.47 samples/sec   Loss 7.0042   LearningRate 0.1280   Epoch: 7   Global Step: 78680   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:33:57,120-Speed 5471.43 samples/sec   Loss 7.0459   LearningRate 0.1280   Epoch: 7   Global Step: 78690   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:34:04,617-Speed 5464.07 samples/sec   Loss 7.0448   LearningRate 0.1280   Epoch: 7   Global Step: 78700   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:34:12,052-Speed 5509.90 samples/sec   Loss 7.0212   LearningRate 0.1280   Epoch: 7   Global Step: 78710   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:34:19,546-Speed 5466.24 samples/sec   Loss 7.1044   LearningRate 0.1279   Epoch: 7   Global Step: 78720   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:34:26,963-Speed 5523.37 samples/sec   Loss 6.9864   LearningRate 0.1279   Epoch: 7   Global Step: 78730   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 12:34:34,412-Speed 5499.69 samples/sec   Loss 6.9974   LearningRate 0.1279   Epoch: 7   Global Step: 78740   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:34:41,922-Speed 5454.78 samples/sec   Loss 7.0113   LearningRate 0.1279   Epoch: 7   Global Step: 78750   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:34:49,336-Speed 5524.89 samples/sec   Loss 7.0107   LearningRate 0.1279   Epoch: 7   Global Step: 78760   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:34:56,894-Speed 5420.31 samples/sec   Loss 7.0175   LearningRate 0.1278   Epoch: 7   Global Step: 78770   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:35:04,379-Speed 5473.19 samples/sec   Loss 6.9954   LearningRate 0.1278   Epoch: 7   Global Step: 78780   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:35:11,861-Speed 5475.55 samples/sec   Loss 7.0186   LearningRate 0.1278   Epoch: 7   Global Step: 78790   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:35:19,345-Speed 5473.77 samples/sec   Loss 7.0701   LearningRate 0.1278   Epoch: 7   Global Step: 78800   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:35:26,844-Speed 5462.51 samples/sec   Loss 7.0115   LearningRate 0.1278   Epoch: 7   Global Step: 78810   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:35:34,293-Speed 5499.73 samples/sec   Loss 7.0802   LearningRate 0.1277   Epoch: 7   Global Step: 78820   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:35:41,761-Speed 5485.28 samples/sec   Loss 7.0173   LearningRate 0.1277   Epoch: 7   Global Step: 78830   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:35:49,214-Speed 5496.42 samples/sec   Loss 7.0624   LearningRate 0.1277   Epoch: 7   Global Step: 78840   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:35:56,747-Speed 5437.89 samples/sec   Loss 7.0047   LearningRate 0.1277   Epoch: 7   Global Step: 78850   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:36:04,236-Speed 5470.17 samples/sec   Loss 6.9793   LearningRate 0.1277   Epoch: 7   Global Step: 78860   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:36:11,835-Speed 5391.42 samples/sec   Loss 7.0056   LearningRate 0.1276   Epoch: 7   Global Step: 78870   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:36:19,331-Speed 5464.54 samples/sec   Loss 7.0190   LearningRate 0.1276   Epoch: 7   Global Step: 78880   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:36:26,812-Speed 5476.06 samples/sec   Loss 7.0822   LearningRate 0.1276   Epoch: 7   Global Step: 78890   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:36:34,271-Speed 5491.96 samples/sec   Loss 6.9836   LearningRate 0.1276   Epoch: 7   Global Step: 78900   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:36:41,816-Speed 5429.67 samples/sec   Loss 7.0234   LearningRate 0.1276   Epoch: 7   Global Step: 78910   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:36:49,306-Speed 5469.59 samples/sec   Loss 6.9832   LearningRate 0.1275   Epoch: 7   Global Step: 78920   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:36:56,771-Speed 5487.28 samples/sec   Loss 7.0532   LearningRate 0.1275   Epoch: 7   Global Step: 78930   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:37:04,343-Speed 5410.53 samples/sec   Loss 6.9768   LearningRate 0.1275   Epoch: 7   Global Step: 78940   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:37:11,809-Speed 5487.11 samples/sec   Loss 7.0296   LearningRate 0.1275   Epoch: 7   Global Step: 78950   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:37:19,321-Speed 5453.34 samples/sec   Loss 7.0149   LearningRate 0.1275   Epoch: 7   Global Step: 78960   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:37:26,916-Speed 5394.06 samples/sec   Loss 7.0242   LearningRate 0.1274   Epoch: 7   Global Step: 78970   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:37:34,501-Speed 5401.13 samples/sec   Loss 6.9655   LearningRate 0.1274   Epoch: 7   Global Step: 78980   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:37:42,055-Speed 5423.44 samples/sec   Loss 6.9935   LearningRate 0.1274   Epoch: 7   Global Step: 78990   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:37:49,588-Speed 5438.21 samples/sec   Loss 7.0568   LearningRate 0.1274   Epoch: 7   Global Step: 79000   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:37:57,072-Speed 5473.48 samples/sec   Loss 7.0314   LearningRate 0.1274   Epoch: 7   Global Step: 79010   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:38:04,749-Speed 5335.89 samples/sec   Loss 7.0281   LearningRate 0.1274   Epoch: 7   Global Step: 79020   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:38:12,369-Speed 5375.68 samples/sec   Loss 6.9760   LearningRate 0.1273   Epoch: 7   Global Step: 79030   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:38:19,893-Speed 5445.01 samples/sec   Loss 7.0802   LearningRate 0.1273   Epoch: 7   Global Step: 79040   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:38:27,360-Speed 5486.17 samples/sec   Loss 7.0048   LearningRate 0.1273   Epoch: 7   Global Step: 79050   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:38:34,865-Speed 5458.55 samples/sec   Loss 7.0523   LearningRate 0.1273   Epoch: 7   Global Step: 79060   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:38:42,381-Speed 5449.99 samples/sec   Loss 6.9977   LearningRate 0.1273   Epoch: 7   Global Step: 79070   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:38:49,789-Speed 5530.38 samples/sec   Loss 7.0056   LearningRate 0.1272   Epoch: 7   Global Step: 79080   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:38:57,303-Speed 5451.73 samples/sec   Loss 7.0250   LearningRate 0.1272   Epoch: 7   Global Step: 79090   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:39:04,797-Speed 5466.03 samples/sec   Loss 6.9837   LearningRate 0.1272   Epoch: 7   Global Step: 79100   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:39:12,257-Speed 5491.39 samples/sec   Loss 6.9602   LearningRate 0.1272   Epoch: 7   Global Step: 79110   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:39:19,720-Speed 5489.17 samples/sec   Loss 6.9106   LearningRate 0.1272   Epoch: 7   Global Step: 79120   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:39:27,183-Speed 5488.94 samples/sec   Loss 7.0869   LearningRate 0.1271   Epoch: 7   Global Step: 79130   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:39:34,574-Speed 5542.44 samples/sec   Loss 7.0009   LearningRate 0.1271   Epoch: 7   Global Step: 79140   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:39:42,038-Speed 5488.97 samples/sec   Loss 7.0318   LearningRate 0.1271   Epoch: 7   Global Step: 79150   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:39:49,568-Speed 5439.80 samples/sec   Loss 6.9412   LearningRate 0.1271   Epoch: 7   Global Step: 79160   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:39:57,080-Speed 5453.90 samples/sec   Loss 6.9966   LearningRate 0.1271   Epoch: 7   Global Step: 79170   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:40:04,619-Speed 5433.25 samples/sec   Loss 7.0034   LearningRate 0.1270   Epoch: 7   Global Step: 79180   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:40:12,109-Speed 5469.51 samples/sec   Loss 6.9858   LearningRate 0.1270   Epoch: 7   Global Step: 79190   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:40:19,593-Speed 5473.97 samples/sec   Loss 7.0279   LearningRate 0.1270   Epoch: 7   Global Step: 79200   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:40:27,078-Speed 5473.19 samples/sec   Loss 7.0094   LearningRate 0.1270   Epoch: 7   Global Step: 79210   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:40:34,627-Speed 5426.34 samples/sec   Loss 6.9649   LearningRate 0.1270   Epoch: 7   Global Step: 79220   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:40:42,077-Speed 5498.66 samples/sec   Loss 6.9911   LearningRate 0.1269   Epoch: 7   Global Step: 79230   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:40:49,538-Speed 5490.78 samples/sec   Loss 7.0405   LearningRate 0.1269   Epoch: 7   Global Step: 79240   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:40:57,001-Speed 5489.18 samples/sec   Loss 6.9644   LearningRate 0.1269   Epoch: 7   Global Step: 79250   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:41:04,498-Speed 5464.04 samples/sec   Loss 7.0269   LearningRate 0.1269   Epoch: 7   Global Step: 79260   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:41:12,004-Speed 5457.76 samples/sec   Loss 7.0032   LearningRate 0.1269   Epoch: 7   Global Step: 79270   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:41:19,567-Speed 5416.93 samples/sec   Loss 7.0147   LearningRate 0.1268   Epoch: 7   Global Step: 79280   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:41:27,053-Speed 5472.65 samples/sec   Loss 6.9858   LearningRate 0.1268   Epoch: 7   Global Step: 79290   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:41:34,453-Speed 5534.99 samples/sec   Loss 6.9676   LearningRate 0.1268   Epoch: 7   Global Step: 79300   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:41:42,010-Speed 5421.50 samples/sec   Loss 6.9374   LearningRate 0.1268   Epoch: 7   Global Step: 79310   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:41:49,457-Speed 5500.79 samples/sec   Loss 6.9671   LearningRate 0.1268   Epoch: 7   Global Step: 79320   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:42:02,437-Speed 3155.77 samples/sec   Loss 7.0071   LearningRate 0.1267   Epoch: 7   Global Step: 79330   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:42:09,934-Speed 5464.58 samples/sec   Loss 6.9362   LearningRate 0.1267   Epoch: 7   Global Step: 79340   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:42:17,363-Speed 5514.04 samples/sec   Loss 6.9445   LearningRate 0.1267   Epoch: 7   Global Step: 79350   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:42:24,899-Speed 5436.15 samples/sec   Loss 6.9337   LearningRate 0.1267   Epoch: 7   Global Step: 79360   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:42:32,351-Speed 5496.97 samples/sec   Loss 6.9773   LearningRate 0.1267   Epoch: 7   Global Step: 79370   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:42:39,866-Speed 5451.43 samples/sec   Loss 6.9895   LearningRate 0.1266   Epoch: 7   Global Step: 79380   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:42:47,398-Speed 5438.67 samples/sec   Loss 6.9764   LearningRate 0.1266   Epoch: 7   Global Step: 79390   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:42:54,888-Speed 5469.74 samples/sec   Loss 7.0393   LearningRate 0.1266   Epoch: 7   Global Step: 79400   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:43:02,498-Speed 5383.42 samples/sec   Loss 6.9789   LearningRate 0.1266   Epoch: 7   Global Step: 79410   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:43:10,100-Speed 5388.07 samples/sec   Loss 6.9925   LearningRate 0.1266   Epoch: 7   Global Step: 79420   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:43:17,660-Speed 5418.94 samples/sec   Loss 6.9990   LearningRate 0.1265   Epoch: 7   Global Step: 79430   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:43:25,220-Speed 5419.36 samples/sec   Loss 6.9894   LearningRate 0.1265   Epoch: 7   Global Step: 79440   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:43:32,671-Speed 5498.20 samples/sec   Loss 6.9153   LearningRate 0.1265   Epoch: 7   Global Step: 79450   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:43:40,175-Speed 5458.48 samples/sec   Loss 6.9076   LearningRate 0.1265   Epoch: 7   Global Step: 79460   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:43:47,754-Speed 5405.26 samples/sec   Loss 6.9809   LearningRate 0.1265   Epoch: 7   Global Step: 79470   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:43:55,293-Speed 5434.28 samples/sec   Loss 6.9806   LearningRate 0.1264   Epoch: 7   Global Step: 79480   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:44:02,834-Speed 5432.28 samples/sec   Loss 6.9501   LearningRate 0.1264   Epoch: 7   Global Step: 79490   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:44:10,339-Speed 5457.95 samples/sec   Loss 6.9850   LearningRate 0.1264   Epoch: 7   Global Step: 79500   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:44:17,832-Speed 5467.38 samples/sec   Loss 6.8911   LearningRate 0.1264   Epoch: 7   Global Step: 79510   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:44:25,323-Speed 5468.86 samples/sec   Loss 6.9583   LearningRate 0.1264   Epoch: 7   Global Step: 79520   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:44:32,858-Speed 5437.03 samples/sec   Loss 6.9631   LearningRate 0.1263   Epoch: 7   Global Step: 79530   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:44:40,645-Speed 5260.28 samples/sec   Loss 7.0021   LearningRate 0.1263   Epoch: 7   Global Step: 79540   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:44:48,206-Speed 5418.16 samples/sec   Loss 6.9729   LearningRate 0.1263   Epoch: 7   Global Step: 79550   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:44:55,786-Speed 5403.91 samples/sec   Loss 6.9840   LearningRate 0.1263   Epoch: 7   Global Step: 79560   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:45:03,333-Speed 5428.18 samples/sec   Loss 7.0168   LearningRate 0.1263   Epoch: 7   Global Step: 79570   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:45:10,814-Speed 5476.22 samples/sec   Loss 6.9802   LearningRate 0.1262   Epoch: 7   Global Step: 79580   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:45:18,383-Speed 5412.38 samples/sec   Loss 7.0552   LearningRate 0.1262   Epoch: 7   Global Step: 79590   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:45:26,016-Speed 5366.64 samples/sec   Loss 7.0075   LearningRate 0.1262   Epoch: 7   Global Step: 79600   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:45:34,586-Speed 4780.09 samples/sec   Loss 6.9915   LearningRate 0.1262   Epoch: 7   Global Step: 79610   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:45:42,162-Speed 5406.95 samples/sec   Loss 7.0037   LearningRate 0.1262   Epoch: 7   Global Step: 79620   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:45:49,727-Speed 5415.06 samples/sec   Loss 6.9253   LearningRate 0.1261   Epoch: 7   Global Step: 79630   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:45:57,331-Speed 5387.41 samples/sec   Loss 6.9715   LearningRate 0.1261   Epoch: 7   Global Step: 79640   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:46:04,984-Speed 5353.12 samples/sec   Loss 6.9678   LearningRate 0.1261   Epoch: 7   Global Step: 79650   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:46:12,523-Speed 5433.43 samples/sec   Loss 6.9901   LearningRate 0.1261   Epoch: 7   Global Step: 79660   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:46:20,077-Speed 5423.28 samples/sec   Loss 7.0127   LearningRate 0.1261   Epoch: 7   Global Step: 79670   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:46:27,620-Speed 5430.19 samples/sec   Loss 7.0407   LearningRate 0.1260   Epoch: 7   Global Step: 79680   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:46:35,097-Speed 5479.65 samples/sec   Loss 7.0246   LearningRate 0.1260   Epoch: 7   Global Step: 79690   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:46:42,638-Speed 5432.17 samples/sec   Loss 6.9567   LearningRate 0.1260   Epoch: 7   Global Step: 79700   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:46:50,158-Speed 5447.41 samples/sec   Loss 6.9977   LearningRate 0.1260   Epoch: 7   Global Step: 79710   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:46:57,696-Speed 5434.23 samples/sec   Loss 6.9613   LearningRate 0.1260   Epoch: 7   Global Step: 79720   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:47:05,283-Speed 5399.50 samples/sec   Loss 6.9703   LearningRate 0.1259   Epoch: 7   Global Step: 79730   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:47:12,874-Speed 5396.85 samples/sec   Loss 6.9855   LearningRate 0.1259   Epoch: 7   Global Step: 79740   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:47:20,446-Speed 5410.12 samples/sec   Loss 6.9660   LearningRate 0.1259   Epoch: 7   Global Step: 79750   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:47:28,075-Speed 5369.44 samples/sec   Loss 6.9608   LearningRate 0.1259   Epoch: 7   Global Step: 79760   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:47:35,699-Speed 5373.50 samples/sec   Loss 6.8927   LearningRate 0.1259   Epoch: 7   Global Step: 79770   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:47:43,364-Speed 5344.01 samples/sec   Loss 6.9863   LearningRate 0.1258   Epoch: 7   Global Step: 79780   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:47:51,012-Speed 5356.25 samples/sec   Loss 7.0034   LearningRate 0.1258   Epoch: 7   Global Step: 79790   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:47:58,624-Speed 5381.43 samples/sec   Loss 6.9783   LearningRate 0.1258   Epoch: 7   Global Step: 79800   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:48:06,212-Speed 5399.21 samples/sec   Loss 6.9409   LearningRate 0.1258   Epoch: 7   Global Step: 79810   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:48:13,799-Speed 5399.70 samples/sec   Loss 6.9660   LearningRate 0.1258   Epoch: 7   Global Step: 79820   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:48:21,422-Speed 5373.58 samples/sec   Loss 6.9454   LearningRate 0.1257   Epoch: 7   Global Step: 79830   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:48:28,955-Speed 5438.41 samples/sec   Loss 6.9353   LearningRate 0.1257   Epoch: 7   Global Step: 79840   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:48:36,564-Speed 5383.81 samples/sec   Loss 6.9776   LearningRate 0.1257   Epoch: 7   Global Step: 79850   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:48:44,151-Speed 5399.15 samples/sec   Loss 6.9815   LearningRate 0.1257   Epoch: 7   Global Step: 79860   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:48:51,624-Speed 5482.33 samples/sec   Loss 6.9617   LearningRate 0.1257   Epoch: 7   Global Step: 79870   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:48:59,217-Speed 5395.00 samples/sec   Loss 6.9921   LearningRate 0.1256   Epoch: 7   Global Step: 79880   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:49:06,775-Speed 5419.77 samples/sec   Loss 6.9786   LearningRate 0.1256   Epoch: 7   Global Step: 79890   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:49:14,287-Speed 5453.14 samples/sec   Loss 6.9873   LearningRate 0.1256   Epoch: 7   Global Step: 79900   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:49:21,762-Speed 5481.00 samples/sec   Loss 7.0124   LearningRate 0.1256   Epoch: 7   Global Step: 79910   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:49:29,253-Speed 5468.20 samples/sec   Loss 6.9862   LearningRate 0.1256   Epoch: 7   Global Step: 79920   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:49:36,769-Speed 5450.59 samples/sec   Loss 6.9388   LearningRate 0.1256   Epoch: 7   Global Step: 79930   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:49:44,263-Speed 5466.58 samples/sec   Loss 6.9054   LearningRate 0.1255   Epoch: 7   Global Step: 79940   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:49:51,891-Speed 5370.23 samples/sec   Loss 6.9521   LearningRate 0.1255   Epoch: 7   Global Step: 79950   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:49:59,458-Speed 5412.96 samples/sec   Loss 6.9033   LearningRate 0.1255   Epoch: 7   Global Step: 79960   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:50:06,935-Speed 5479.02 samples/sec   Loss 6.9393   LearningRate 0.1255   Epoch: 7   Global Step: 79970   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:50:14,478-Speed 5431.24 samples/sec   Loss 6.9276   LearningRate 0.1255   Epoch: 7   Global Step: 79980   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:50:22,062-Speed 5401.72 samples/sec   Loss 6.9926   LearningRate 0.1254   Epoch: 7   Global Step: 79990   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:50:29,575-Speed 5452.25 samples/sec   Loss 6.9126   LearningRate 0.1254   Epoch: 7   Global Step: 80000   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:51:14,000-[lfw][80000]XNorm: 23.802571
Training: 2022-01-08 12:51:14,004-[lfw][80000]Accuracy-Flip: 0.99750+-0.00300
Training: 2022-01-08 12:51:14,004-[lfw][80000]Accuracy-Highest: 0.99817
Training: 2022-01-08 12:52:05,842-[cfp_fp][80000]XNorm: 21.845129
Training: 2022-01-08 12:52:05,843-[cfp_fp][80000]Accuracy-Flip: 0.98586+-0.00627
Training: 2022-01-08 12:52:05,843-[cfp_fp][80000]Accuracy-Highest: 0.98814
Training: 2022-01-08 12:52:51,768-[agedb_30][80000]XNorm: 23.331471
Training: 2022-01-08 12:52:51,769-[agedb_30][80000]Accuracy-Flip: 0.97483+-0.00758
Training: 2022-01-08 12:52:51,770-[agedb_30][80000]Accuracy-Highest: 0.97667
Training: 2022-01-08 12:52:59,371-Speed 273.44 samples/sec   Loss 6.9404   LearningRate 0.1254   Epoch: 7   Global Step: 80010   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:53:06,934-Speed 5417.49 samples/sec   Loss 6.9746   LearningRate 0.1254   Epoch: 7   Global Step: 80020   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:53:14,619-Speed 5330.39 samples/sec   Loss 6.9392   LearningRate 0.1254   Epoch: 7   Global Step: 80030   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:53:22,150-Speed 5440.35 samples/sec   Loss 6.9231   LearningRate 0.1253   Epoch: 7   Global Step: 80040   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:53:29,640-Speed 5470.18 samples/sec   Loss 6.9976   LearningRate 0.1253   Epoch: 7   Global Step: 80050   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:53:37,249-Speed 5384.73 samples/sec   Loss 6.9550   LearningRate 0.1253   Epoch: 7   Global Step: 80060   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:53:44,777-Speed 5442.34 samples/sec   Loss 6.9799   LearningRate 0.1253   Epoch: 7   Global Step: 80070   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:53:52,290-Speed 5452.48 samples/sec   Loss 6.8671   LearningRate 0.1253   Epoch: 7   Global Step: 80080   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 12:53:59,769-Speed 5477.28 samples/sec   Loss 6.9198   LearningRate 0.1252   Epoch: 7   Global Step: 80090   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:54:07,335-Speed 5414.83 samples/sec   Loss 6.9096   LearningRate 0.1252   Epoch: 7   Global Step: 80100   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:54:14,950-Speed 5379.56 samples/sec   Loss 6.9341   LearningRate 0.1252   Epoch: 7   Global Step: 80110   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:54:22,569-Speed 5376.36 samples/sec   Loss 6.9749   LearningRate 0.1252   Epoch: 7   Global Step: 80120   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:54:30,237-Speed 5342.05 samples/sec   Loss 7.0116   LearningRate 0.1252   Epoch: 7   Global Step: 80130   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:54:37,895-Speed 5349.88 samples/sec   Loss 6.9875   LearningRate 0.1251   Epoch: 7   Global Step: 80140   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:54:45,537-Speed 5360.70 samples/sec   Loss 6.9627   LearningRate 0.1251   Epoch: 7   Global Step: 80150   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:54:53,032-Speed 5465.20 samples/sec   Loss 6.8742   LearningRate 0.1251   Epoch: 7   Global Step: 80160   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:55:00,537-Speed 5458.36 samples/sec   Loss 6.9375   LearningRate 0.1251   Epoch: 7   Global Step: 80170   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:55:08,036-Speed 5463.25 samples/sec   Loss 6.8900   LearningRate 0.1251   Epoch: 7   Global Step: 80180   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:55:15,587-Speed 5424.81 samples/sec   Loss 7.0181   LearningRate 0.1250   Epoch: 7   Global Step: 80190   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:55:23,183-Speed 5392.95 samples/sec   Loss 6.8951   LearningRate 0.1250   Epoch: 7   Global Step: 80200   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:55:30,710-Speed 5442.75 samples/sec   Loss 6.9452   LearningRate 0.1250   Epoch: 7   Global Step: 80210   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:55:38,316-Speed 5386.22 samples/sec   Loss 6.9904   LearningRate 0.1250   Epoch: 7   Global Step: 80220   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:55:45,752-Speed 5508.60 samples/sec   Loss 6.9317   LearningRate 0.1250   Epoch: 7   Global Step: 80230   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:55:53,311-Speed 5419.14 samples/sec   Loss 6.9224   LearningRate 0.1249   Epoch: 7   Global Step: 80240   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:56:00,979-Speed 5342.25 samples/sec   Loss 6.9793   LearningRate 0.1249   Epoch: 7   Global Step: 80250   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:56:08,520-Speed 5432.90 samples/sec   Loss 6.8986   LearningRate 0.1249   Epoch: 7   Global Step: 80260   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:56:16,034-Speed 5451.90 samples/sec   Loss 6.9694   LearningRate 0.1249   Epoch: 7   Global Step: 80270   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:56:23,513-Speed 5477.46 samples/sec   Loss 6.8665   LearningRate 0.1249   Epoch: 7   Global Step: 80280   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:56:31,029-Speed 5450.30 samples/sec   Loss 6.9143   LearningRate 0.1248   Epoch: 7   Global Step: 80290   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:56:38,513-Speed 5473.44 samples/sec   Loss 6.9178   LearningRate 0.1248   Epoch: 7   Global Step: 80300   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:56:45,994-Speed 5476.07 samples/sec   Loss 6.9768   LearningRate 0.1248   Epoch: 7   Global Step: 80310   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:56:53,512-Speed 5449.34 samples/sec   Loss 6.8986   LearningRate 0.1248   Epoch: 7   Global Step: 80320   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:57:01,112-Speed 5389.64 samples/sec   Loss 6.9523   LearningRate 0.1248   Epoch: 7   Global Step: 80330   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:57:08,737-Speed 5373.14 samples/sec   Loss 6.9524   LearningRate 0.1247   Epoch: 7   Global Step: 80340   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:57:16,461-Speed 5303.28 samples/sec   Loss 6.8382   LearningRate 0.1247   Epoch: 7   Global Step: 80350   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:57:23,970-Speed 5455.85 samples/sec   Loss 6.8983   LearningRate 0.1247   Epoch: 7   Global Step: 80360   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 12:57:31,594-Speed 5372.91 samples/sec   Loss 6.9167   LearningRate 0.1247   Epoch: 7   Global Step: 80370   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:57:39,138-Speed 5430.61 samples/sec   Loss 6.9607   LearningRate 0.1247   Epoch: 7   Global Step: 80380   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:57:46,616-Speed 5477.47 samples/sec   Loss 6.9189   LearningRate 0.1246   Epoch: 7   Global Step: 80390   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:57:54,486-Speed 5205.23 samples/sec   Loss 6.8650   LearningRate 0.1246   Epoch: 7   Global Step: 80400   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:58:01,941-Speed 5495.35 samples/sec   Loss 6.8980   LearningRate 0.1246   Epoch: 7   Global Step: 80410   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:58:09,391-Speed 5498.58 samples/sec   Loss 6.9572   LearningRate 0.1246   Epoch: 7   Global Step: 80420   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:58:16,881-Speed 5469.09 samples/sec   Loss 6.9199   LearningRate 0.1246   Epoch: 7   Global Step: 80430   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:58:24,281-Speed 5536.26 samples/sec   Loss 6.9191   LearningRate 0.1245   Epoch: 7   Global Step: 80440   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:58:31,766-Speed 5472.72 samples/sec   Loss 6.9670   LearningRate 0.1245   Epoch: 7   Global Step: 80450   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:58:39,268-Speed 5460.84 samples/sec   Loss 6.8626   LearningRate 0.1245   Epoch: 7   Global Step: 80460   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:58:46,705-Speed 5508.44 samples/sec   Loss 6.9196   LearningRate 0.1245   Epoch: 7   Global Step: 80470   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:58:54,168-Speed 5488.75 samples/sec   Loss 6.9800   LearningRate 0.1245   Epoch: 7   Global Step: 80480   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:59:01,739-Speed 5410.81 samples/sec   Loss 6.9043   LearningRate 0.1245   Epoch: 7   Global Step: 80490   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:59:09,278-Speed 5434.57 samples/sec   Loss 6.9867   LearningRate 0.1244   Epoch: 7   Global Step: 80500   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:59:16,789-Speed 5453.87 samples/sec   Loss 6.9136   LearningRate 0.1244   Epoch: 7   Global Step: 80510   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:59:24,286-Speed 5464.20 samples/sec   Loss 6.8690   LearningRate 0.1244   Epoch: 7   Global Step: 80520   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:59:31,793-Speed 5457.18 samples/sec   Loss 6.9847   LearningRate 0.1244   Epoch: 7   Global Step: 80530   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:59:39,305-Speed 5452.89 samples/sec   Loss 6.9265   LearningRate 0.1244   Epoch: 7   Global Step: 80540   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:59:46,836-Speed 5439.74 samples/sec   Loss 6.9121   LearningRate 0.1243   Epoch: 7   Global Step: 80550   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 12:59:54,373-Speed 5435.25 samples/sec   Loss 6.8696   LearningRate 0.1243   Epoch: 7   Global Step: 80560   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:00:01,863-Speed 5469.15 samples/sec   Loss 6.8688   LearningRate 0.1243   Epoch: 7   Global Step: 80570   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:00:09,331-Speed 5485.26 samples/sec   Loss 6.9519   LearningRate 0.1243   Epoch: 7   Global Step: 80580   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:00:16,897-Speed 5414.42 samples/sec   Loss 6.9291   LearningRate 0.1243   Epoch: 7   Global Step: 80590   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:00:24,454-Speed 5420.55 samples/sec   Loss 6.9026   LearningRate 0.1242   Epoch: 7   Global Step: 80600   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:00:31,936-Speed 5475.07 samples/sec   Loss 6.9321   LearningRate 0.1242   Epoch: 7   Global Step: 80610   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:00:39,420-Speed 5474.52 samples/sec   Loss 6.9264   LearningRate 0.1242   Epoch: 7   Global Step: 80620   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:00:46,859-Speed 5506.32 samples/sec   Loss 6.8945   LearningRate 0.1242   Epoch: 7   Global Step: 80630   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:00:54,326-Speed 5485.95 samples/sec   Loss 6.9159   LearningRate 0.1242   Epoch: 7   Global Step: 80640   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:01:01,978-Speed 5354.02 samples/sec   Loss 6.9256   LearningRate 0.1241   Epoch: 7   Global Step: 80650   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:01:09,609-Speed 5367.73 samples/sec   Loss 6.9055   LearningRate 0.1241   Epoch: 7   Global Step: 80660   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:01:17,073-Speed 5489.11 samples/sec   Loss 6.9605   LearningRate 0.1241   Epoch: 7   Global Step: 80670   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:01:24,514-Speed 5505.35 samples/sec   Loss 6.8568   LearningRate 0.1241   Epoch: 7   Global Step: 80680   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:01:31,968-Speed 5495.27 samples/sec   Loss 6.8741   LearningRate 0.1241   Epoch: 7   Global Step: 80690   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:01:39,468-Speed 5462.14 samples/sec   Loss 6.8647   LearningRate 0.1240   Epoch: 7   Global Step: 80700   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:01:47,098-Speed 5369.20 samples/sec   Loss 6.8563   LearningRate 0.1240   Epoch: 7   Global Step: 80710   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:01:54,605-Speed 5457.21 samples/sec   Loss 6.9038   LearningRate 0.1240   Epoch: 7   Global Step: 80720   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:02:02,072-Speed 5486.39 samples/sec   Loss 6.9406   LearningRate 0.1240   Epoch: 7   Global Step: 80730   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:02:09,516-Speed 5503.32 samples/sec   Loss 6.8396   LearningRate 0.1240   Epoch: 7   Global Step: 80740   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:02:17,069-Speed 5424.05 samples/sec   Loss 6.9057   LearningRate 0.1239   Epoch: 7   Global Step: 80750   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:02:24,543-Speed 5481.02 samples/sec   Loss 6.8855   LearningRate 0.1239   Epoch: 7   Global Step: 80760   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:02:32,027-Speed 5473.91 samples/sec   Loss 6.8736   LearningRate 0.1239   Epoch: 7   Global Step: 80770   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:02:39,656-Speed 5369.73 samples/sec   Loss 6.9216   LearningRate 0.1239   Epoch: 7   Global Step: 80780   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:02:47,190-Speed 5437.77 samples/sec   Loss 6.9356   LearningRate 0.1239   Epoch: 7   Global Step: 80790   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:02:54,794-Speed 5386.71 samples/sec   Loss 6.8668   LearningRate 0.1238   Epoch: 7   Global Step: 80800   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:03:02,358-Speed 5416.31 samples/sec   Loss 6.8387   LearningRate 0.1238   Epoch: 7   Global Step: 80810   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:03:09,822-Speed 5488.15 samples/sec   Loss 6.9331   LearningRate 0.1238   Epoch: 7   Global Step: 80820   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:03:17,366-Speed 5430.45 samples/sec   Loss 6.9034   LearningRate 0.1238   Epoch: 7   Global Step: 80830   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:03:24,991-Speed 5371.91 samples/sec   Loss 6.9533   LearningRate 0.1238   Epoch: 7   Global Step: 80840   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:03:32,518-Speed 5442.60 samples/sec   Loss 6.9203   LearningRate 0.1237   Epoch: 7   Global Step: 80850   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:03:40,029-Speed 5453.76 samples/sec   Loss 6.9786   LearningRate 0.1237   Epoch: 7   Global Step: 80860   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:03:47,529-Speed 5462.28 samples/sec   Loss 6.8616   LearningRate 0.1237   Epoch: 7   Global Step: 80870   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:03:55,002-Speed 5481.63 samples/sec   Loss 6.8453   LearningRate 0.1237   Epoch: 7   Global Step: 80880   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:04:02,516-Speed 5451.96 samples/sec   Loss 6.8958   LearningRate 0.1237   Epoch: 7   Global Step: 80890   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:04:09,964-Speed 5500.13 samples/sec   Loss 6.8733   LearningRate 0.1236   Epoch: 7   Global Step: 80900   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:04:17,424-Speed 5491.74 samples/sec   Loss 6.8865   LearningRate 0.1236   Epoch: 7   Global Step: 80910   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:04:24,953-Speed 5440.78 samples/sec   Loss 6.9110   LearningRate 0.1236   Epoch: 7   Global Step: 80920   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:04:32,409-Speed 5494.25 samples/sec   Loss 6.8558   LearningRate 0.1236   Epoch: 7   Global Step: 80930   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:04:39,908-Speed 5462.72 samples/sec   Loss 6.9204   LearningRate 0.1236   Epoch: 7   Global Step: 80940   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:04:47,340-Speed 5512.31 samples/sec   Loss 6.9292   LearningRate 0.1235   Epoch: 7   Global Step: 80950   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:04:54,865-Speed 5443.27 samples/sec   Loss 6.8674   LearningRate 0.1235   Epoch: 7   Global Step: 80960   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:05:02,419-Speed 5423.24 samples/sec   Loss 6.8948   LearningRate 0.1235   Epoch: 7   Global Step: 80970   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:05:09,942-Speed 5445.45 samples/sec   Loss 6.8929   LearningRate 0.1235   Epoch: 7   Global Step: 80980   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:05:17,571-Speed 5369.55 samples/sec   Loss 6.8916   LearningRate 0.1235   Epoch: 7   Global Step: 80990   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:05:25,087-Speed 5449.84 samples/sec   Loss 6.8639   LearningRate 0.1235   Epoch: 7   Global Step: 81000   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:05:32,660-Speed 5409.76 samples/sec   Loss 6.8956   LearningRate 0.1234   Epoch: 7   Global Step: 81010   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:05:40,221-Speed 5417.97 samples/sec   Loss 6.9008   LearningRate 0.1234   Epoch: 7   Global Step: 81020   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:05:47,738-Speed 5450.17 samples/sec   Loss 6.8708   LearningRate 0.1234   Epoch: 7   Global Step: 81030   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:05:55,343-Speed 5386.00 samples/sec   Loss 6.9503   LearningRate 0.1234   Epoch: 7   Global Step: 81040   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:06:02,911-Speed 5413.34 samples/sec   Loss 6.8997   LearningRate 0.1234   Epoch: 7   Global Step: 81050   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:06:10,450-Speed 5433.81 samples/sec   Loss 6.9198   LearningRate 0.1233   Epoch: 7   Global Step: 81060   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:06:18,080-Speed 5368.57 samples/sec   Loss 6.8772   LearningRate 0.1233   Epoch: 7   Global Step: 81070   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:06:25,583-Speed 5460.20 samples/sec   Loss 6.8654   LearningRate 0.1233   Epoch: 7   Global Step: 81080   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:06:33,083-Speed 5462.01 samples/sec   Loss 6.8799   LearningRate 0.1233   Epoch: 7   Global Step: 81090   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:06:40,635-Speed 5424.32 samples/sec   Loss 6.8699   LearningRate 0.1233   Epoch: 7   Global Step: 81100   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:06:48,210-Speed 5408.04 samples/sec   Loss 6.8701   LearningRate 0.1232   Epoch: 7   Global Step: 81110   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:06:55,738-Speed 5441.43 samples/sec   Loss 6.8374   LearningRate 0.1232   Epoch: 7   Global Step: 81120   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:07:03,308-Speed 5411.30 samples/sec   Loss 6.9342   LearningRate 0.1232   Epoch: 7   Global Step: 81130   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:07:10,962-Speed 5352.43 samples/sec   Loss 6.8479   LearningRate 0.1232   Epoch: 7   Global Step: 81140   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:07:18,634-Speed 5339.83 samples/sec   Loss 6.9447   LearningRate 0.1232   Epoch: 7   Global Step: 81150   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:07:26,181-Speed 5427.82 samples/sec   Loss 6.9205   LearningRate 0.1231   Epoch: 7   Global Step: 81160   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:07:33,667-Speed 5472.40 samples/sec   Loss 6.8717   LearningRate 0.1231   Epoch: 7   Global Step: 81170   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:07:41,228-Speed 5417.92 samples/sec   Loss 6.8672   LearningRate 0.1231   Epoch: 7   Global Step: 81180   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:07:48,795-Speed 5413.81 samples/sec   Loss 6.8340   LearningRate 0.1231   Epoch: 7   Global Step: 81190   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:07:56,305-Speed 5454.69 samples/sec   Loss 6.8657   LearningRate 0.1231   Epoch: 7   Global Step: 81200   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:08:03,802-Speed 5464.52 samples/sec   Loss 6.8914   LearningRate 0.1230   Epoch: 7   Global Step: 81210   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:08:11,338-Speed 5435.87 samples/sec   Loss 6.8864   LearningRate 0.1230   Epoch: 7   Global Step: 81220   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:08:18,890-Speed 5424.68 samples/sec   Loss 6.9096   LearningRate 0.1230   Epoch: 7   Global Step: 81230   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:08:26,514-Speed 5373.21 samples/sec   Loss 6.7822   LearningRate 0.1230   Epoch: 7   Global Step: 81240   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:08:34,048-Speed 5437.10 samples/sec   Loss 6.9300   LearningRate 0.1230   Epoch: 7   Global Step: 81250   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:08:41,535-Speed 5472.00 samples/sec   Loss 6.9402   LearningRate 0.1229   Epoch: 7   Global Step: 81260   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:08:49,081-Speed 5429.09 samples/sec   Loss 6.8681   LearningRate 0.1229   Epoch: 7   Global Step: 81270   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:08:56,675-Speed 5394.70 samples/sec   Loss 6.8934   LearningRate 0.1229   Epoch: 7   Global Step: 81280   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:09:04,213-Speed 5433.97 samples/sec   Loss 6.8728   LearningRate 0.1229   Epoch: 7   Global Step: 81290   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:09:11,721-Speed 5456.94 samples/sec   Loss 6.8966   LearningRate 0.1229   Epoch: 7   Global Step: 81300   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:09:19,266-Speed 5428.99 samples/sec   Loss 6.8921   LearningRate 0.1228   Epoch: 7   Global Step: 81310   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:09:26,827-Speed 5417.92 samples/sec   Loss 6.8970   LearningRate 0.1228   Epoch: 7   Global Step: 81320   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:09:34,358-Speed 5439.53 samples/sec   Loss 6.8726   LearningRate 0.1228   Epoch: 7   Global Step: 81330   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:09:41,883-Speed 5444.36 samples/sec   Loss 6.8289   LearningRate 0.1228   Epoch: 7   Global Step: 81340   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:09:49,370-Speed 5471.82 samples/sec   Loss 6.9455   LearningRate 0.1228   Epoch: 7   Global Step: 81350   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:09:56,993-Speed 5373.62 samples/sec   Loss 6.8398   LearningRate 0.1227   Epoch: 7   Global Step: 81360   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:10:04,493-Speed 5462.30 samples/sec   Loss 6.8959   LearningRate 0.1227   Epoch: 7   Global Step: 81370   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:10:12,017-Speed 5444.22 samples/sec   Loss 6.9282   LearningRate 0.1227   Epoch: 7   Global Step: 81380   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:10:19,540-Speed 5445.50 samples/sec   Loss 6.8923   LearningRate 0.1227   Epoch: 7   Global Step: 81390   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:10:27,033-Speed 5467.62 samples/sec   Loss 6.8288   LearningRate 0.1227   Epoch: 7   Global Step: 81400   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:10:34,622-Speed 5397.52 samples/sec   Loss 6.8689   LearningRate 0.1227   Epoch: 7   Global Step: 81410   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:10:42,225-Speed 5388.57 samples/sec   Loss 6.8417   LearningRate 0.1226   Epoch: 7   Global Step: 81420   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:10:49,739-Speed 5451.84 samples/sec   Loss 6.8527   LearningRate 0.1226   Epoch: 7   Global Step: 81430   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:10:57,468-Speed 5300.28 samples/sec   Loss 6.9190   LearningRate 0.1226   Epoch: 7   Global Step: 81440   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:11:05,066-Speed 5391.66 samples/sec   Loss 6.8829   LearningRate 0.1226   Epoch: 7   Global Step: 81450   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:11:12,495-Speed 5514.00 samples/sec   Loss 6.8175   LearningRate 0.1226   Epoch: 7   Global Step: 81460   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:11:20,019-Speed 5445.04 samples/sec   Loss 6.8683   LearningRate 0.1225   Epoch: 7   Global Step: 81470   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:11:27,535-Speed 5450.02 samples/sec   Loss 6.8150   LearningRate 0.1225   Epoch: 7   Global Step: 81480   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:11:35,137-Speed 5389.07 samples/sec   Loss 6.8404   LearningRate 0.1225   Epoch: 7   Global Step: 81490   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:11:42,615-Speed 5477.98 samples/sec   Loss 6.8614   LearningRate 0.1225   Epoch: 7   Global Step: 81500   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:11:50,094-Speed 5477.09 samples/sec   Loss 6.8415   LearningRate 0.1225   Epoch: 7   Global Step: 81510   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:11:57,681-Speed 5399.32 samples/sec   Loss 6.9165   LearningRate 0.1224   Epoch: 7   Global Step: 81520   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:12:05,290-Speed 5384.50 samples/sec   Loss 6.8932   LearningRate 0.1224   Epoch: 7   Global Step: 81530   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:12:12,855-Speed 5414.76 samples/sec   Loss 6.8869   LearningRate 0.1224   Epoch: 7   Global Step: 81540   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:12:20,542-Speed 5329.03 samples/sec   Loss 6.8188   LearningRate 0.1224   Epoch: 7   Global Step: 81550   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:12:28,138-Speed 5393.13 samples/sec   Loss 6.8837   LearningRate 0.1224   Epoch: 7   Global Step: 81560   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:12:35,678-Speed 5433.60 samples/sec   Loss 6.8050   LearningRate 0.1223   Epoch: 7   Global Step: 81570   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:12:43,242-Speed 5415.65 samples/sec   Loss 6.8560   LearningRate 0.1223   Epoch: 7   Global Step: 81580   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:12:50,740-Speed 5463.05 samples/sec   Loss 6.8335   LearningRate 0.1223   Epoch: 7   Global Step: 81590   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:12:58,264-Speed 5444.69 samples/sec   Loss 6.7994   LearningRate 0.1223   Epoch: 7   Global Step: 81600   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:13:05,839-Speed 5407.96 samples/sec   Loss 6.8517   LearningRate 0.1223   Epoch: 7   Global Step: 81610   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:13:13,393-Speed 5423.13 samples/sec   Loss 6.8381   LearningRate 0.1222   Epoch: 7   Global Step: 81620   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:13:20,888-Speed 5465.39 samples/sec   Loss 6.8572   LearningRate 0.1222   Epoch: 7   Global Step: 81630   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:13:28,378-Speed 5469.63 samples/sec   Loss 6.8313   LearningRate 0.1222   Epoch: 7   Global Step: 81640   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:13:35,901-Speed 5445.45 samples/sec   Loss 6.8448   LearningRate 0.1222   Epoch: 7   Global Step: 81650   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:13:43,500-Speed 5391.08 samples/sec   Loss 6.8551   LearningRate 0.1222   Epoch: 7   Global Step: 81660   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:13:50,967-Speed 5485.73 samples/sec   Loss 6.8295   LearningRate 0.1221   Epoch: 7   Global Step: 81670   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:13:58,426-Speed 5492.26 samples/sec   Loss 6.7993   LearningRate 0.1221   Epoch: 7   Global Step: 81680   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:14:05,990-Speed 5415.65 samples/sec   Loss 6.8161   LearningRate 0.1221   Epoch: 7   Global Step: 81690   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:14:13,508-Speed 5448.76 samples/sec   Loss 6.7567   LearningRate 0.1221   Epoch: 7   Global Step: 81700   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:14:21,028-Speed 5448.15 samples/sec   Loss 6.8323   LearningRate 0.1221   Epoch: 7   Global Step: 81710   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:14:28,530-Speed 5460.73 samples/sec   Loss 6.8291   LearningRate 0.1220   Epoch: 7   Global Step: 81720   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:14:35,972-Speed 5504.56 samples/sec   Loss 6.8045   LearningRate 0.1220   Epoch: 7   Global Step: 81730   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:14:43,551-Speed 5405.47 samples/sec   Loss 6.7925   LearningRate 0.1220   Epoch: 7   Global Step: 81740   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:14:51,050-Speed 5462.79 samples/sec   Loss 6.8152   LearningRate 0.1220   Epoch: 7   Global Step: 81750   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:14:58,522-Speed 5482.44 samples/sec   Loss 6.8686   LearningRate 0.1220   Epoch: 7   Global Step: 81760   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:15:06,108-Speed 5399.93 samples/sec   Loss 6.8603   LearningRate 0.1220   Epoch: 7   Global Step: 81770   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:15:13,652-Speed 5430.34 samples/sec   Loss 6.8769   LearningRate 0.1219   Epoch: 7   Global Step: 81780   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:15:21,129-Speed 5479.16 samples/sec   Loss 6.7761   LearningRate 0.1219   Epoch: 7   Global Step: 81790   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:15:28,639-Speed 5454.96 samples/sec   Loss 6.7952   LearningRate 0.1219   Epoch: 7   Global Step: 81800   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:15:36,247-Speed 5384.16 samples/sec   Loss 6.8591   LearningRate 0.1219   Epoch: 7   Global Step: 81810   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:15:43,876-Speed 5370.40 samples/sec   Loss 6.8267   LearningRate 0.1219   Epoch: 7   Global Step: 81820   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:15:51,378-Speed 5460.19 samples/sec   Loss 6.8258   LearningRate 0.1218   Epoch: 7   Global Step: 81830   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:15:58,935-Speed 5420.95 samples/sec   Loss 6.9120   LearningRate 0.1218   Epoch: 7   Global Step: 81840   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:16:06,566-Speed 5368.34 samples/sec   Loss 6.8755   LearningRate 0.1218   Epoch: 7   Global Step: 81850   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:16:14,171-Speed 5386.58 samples/sec   Loss 6.8532   LearningRate 0.1218   Epoch: 7   Global Step: 81860   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:16:21,759-Speed 5398.92 samples/sec   Loss 6.8682   LearningRate 0.1218   Epoch: 7   Global Step: 81870   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:16:29,264-Speed 5458.10 samples/sec   Loss 6.8923   LearningRate 0.1217   Epoch: 7   Global Step: 81880   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:16:36,792-Speed 5441.41 samples/sec   Loss 6.8503   LearningRate 0.1217   Epoch: 7   Global Step: 81890   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:16:44,328-Speed 5436.23 samples/sec   Loss 6.8157   LearningRate 0.1217   Epoch: 7   Global Step: 81900   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:16:51,879-Speed 5425.37 samples/sec   Loss 6.8693   LearningRate 0.1217   Epoch: 7   Global Step: 81910   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:16:59,343-Speed 5488.30 samples/sec   Loss 6.8582   LearningRate 0.1217   Epoch: 7   Global Step: 81920   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:17:06,914-Speed 5410.51 samples/sec   Loss 6.9215   LearningRate 0.1216   Epoch: 7   Global Step: 81930   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:17:14,401-Speed 5472.64 samples/sec   Loss 6.8739   LearningRate 0.1216   Epoch: 7   Global Step: 81940   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:17:21,956-Speed 5422.13 samples/sec   Loss 6.8060   LearningRate 0.1216   Epoch: 7   Global Step: 81950   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:17:29,669-Speed 5310.87 samples/sec   Loss 6.7922   LearningRate 0.1216   Epoch: 7   Global Step: 81960   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:17:37,194-Speed 5444.39 samples/sec   Loss 6.7703   LearningRate 0.1216   Epoch: 7   Global Step: 81970   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:17:44,676-Speed 5474.79 samples/sec   Loss 6.8191   LearningRate 0.1215   Epoch: 7   Global Step: 81980   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:17:52,220-Speed 5430.34 samples/sec   Loss 6.8304   LearningRate 0.1215   Epoch: 7   Global Step: 81990   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:17:59,856-Speed 5364.80 samples/sec   Loss 6.8408   LearningRate 0.1215   Epoch: 7   Global Step: 82000   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:18:44,258-[lfw][82000]XNorm: 23.526637
Training: 2022-01-08 13:18:44,258-[lfw][82000]Accuracy-Flip: 0.99783+-0.00279
Training: 2022-01-08 13:18:44,259-[lfw][82000]Accuracy-Highest: 0.99817
Training: 2022-01-08 13:19:36,214-[cfp_fp][82000]XNorm: 21.435723
Training: 2022-01-08 13:19:36,215-[cfp_fp][82000]Accuracy-Flip: 0.98600+-0.00595
Training: 2022-01-08 13:19:36,215-[cfp_fp][82000]Accuracy-Highest: 0.98814
Training: 2022-01-08 13:20:22,297-[agedb_30][82000]XNorm: 23.263862
Training: 2022-01-08 13:20:22,297-[agedb_30][82000]Accuracy-Flip: 0.97567+-0.00742
Training: 2022-01-08 13:20:22,298-[agedb_30][82000]Accuracy-Highest: 0.97667
Training: 2022-01-08 13:20:29,809-Speed 273.15 samples/sec   Loss 6.8505   LearningRate 0.1215   Epoch: 7   Global Step: 82010   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:20:37,272-Speed 5490.16 samples/sec   Loss 6.8522   LearningRate 0.1215   Epoch: 7   Global Step: 82020   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:20:44,812-Speed 5433.29 samples/sec   Loss 6.8137   LearningRate 0.1214   Epoch: 7   Global Step: 82030   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:20:52,342-Speed 5441.00 samples/sec   Loss 6.8749   LearningRate 0.1214   Epoch: 7   Global Step: 82040   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:20:59,920-Speed 5405.93 samples/sec   Loss 6.8409   LearningRate 0.1214   Epoch: 7   Global Step: 82050   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:21:07,478-Speed 5420.18 samples/sec   Loss 6.8442   LearningRate 0.1214   Epoch: 7   Global Step: 82060   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:21:15,005-Speed 5442.80 samples/sec   Loss 6.8602   LearningRate 0.1214   Epoch: 7   Global Step: 82070   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:21:22,685-Speed 5333.90 samples/sec   Loss 6.8871   LearningRate 0.1214   Epoch: 7   Global Step: 82080   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:21:30,319-Speed 5366.14 samples/sec   Loss 6.8454   LearningRate 0.1213   Epoch: 7   Global Step: 82090   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:21:37,838-Speed 5447.53 samples/sec   Loss 6.8376   LearningRate 0.1213   Epoch: 7   Global Step: 82100   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:21:45,384-Speed 5429.30 samples/sec   Loss 6.8707   LearningRate 0.1213   Epoch: 7   Global Step: 82110   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:21:52,835-Speed 5498.00 samples/sec   Loss 6.8191   LearningRate 0.1213   Epoch: 7   Global Step: 82120   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:22:00,334-Speed 5462.98 samples/sec   Loss 6.8136   LearningRate 0.1213   Epoch: 7   Global Step: 82130   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:22:08,025-Speed 5326.03 samples/sec   Loss 6.8303   LearningRate 0.1212   Epoch: 7   Global Step: 82140   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:22:15,521-Speed 5464.98 samples/sec   Loss 6.8243   LearningRate 0.1212   Epoch: 7   Global Step: 82150   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:22:23,025-Speed 5459.83 samples/sec   Loss 6.8769   LearningRate 0.1212   Epoch: 7   Global Step: 82160   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:22:30,604-Speed 5404.82 samples/sec   Loss 6.8698   LearningRate 0.1212   Epoch: 7   Global Step: 82170   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:22:38,128-Speed 5445.14 samples/sec   Loss 6.7984   LearningRate 0.1212   Epoch: 7   Global Step: 82180   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:22:45,604-Speed 5479.50 samples/sec   Loss 6.7740   LearningRate 0.1211   Epoch: 7   Global Step: 82190   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:22:53,029-Speed 5516.85 samples/sec   Loss 6.7843   LearningRate 0.1211   Epoch: 7   Global Step: 82200   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:23:00,547-Speed 5449.11 samples/sec   Loss 6.8022   LearningRate 0.1211   Epoch: 7   Global Step: 82210   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:23:08,131-Speed 5402.11 samples/sec   Loss 6.7919   LearningRate 0.1211   Epoch: 7   Global Step: 82220   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:23:15,723-Speed 5395.29 samples/sec   Loss 6.8603   LearningRate 0.1211   Epoch: 7   Global Step: 82230   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:23:23,305-Speed 5403.26 samples/sec   Loss 6.7882   LearningRate 0.1210   Epoch: 7   Global Step: 82240   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:23:30,839-Speed 5437.93 samples/sec   Loss 6.8267   LearningRate 0.1210   Epoch: 7   Global Step: 82250   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:23:38,293-Speed 5495.64 samples/sec   Loss 6.8743   LearningRate 0.1210   Epoch: 7   Global Step: 82260   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:23:45,892-Speed 5390.64 samples/sec   Loss 6.8164   LearningRate 0.1210   Epoch: 7   Global Step: 82270   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:23:53,365-Speed 5481.62 samples/sec   Loss 6.8537   LearningRate 0.1210   Epoch: 7   Global Step: 82280   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:24:00,849-Speed 5473.70 samples/sec   Loss 6.7608   LearningRate 0.1209   Epoch: 7   Global Step: 82290   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:24:08,343-Speed 5466.15 samples/sec   Loss 6.8415   LearningRate 0.1209   Epoch: 7   Global Step: 82300   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:24:15,786-Speed 5504.00 samples/sec   Loss 6.8159   LearningRate 0.1209   Epoch: 7   Global Step: 82310   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:24:23,258-Speed 5482.19 samples/sec   Loss 6.7993   LearningRate 0.1209   Epoch: 7   Global Step: 82320   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:24:30,694-Speed 5510.62 samples/sec   Loss 6.7676   LearningRate 0.1209   Epoch: 7   Global Step: 82330   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:24:38,236-Speed 5431.69 samples/sec   Loss 6.7921   LearningRate 0.1208   Epoch: 7   Global Step: 82340   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:24:45,796-Speed 5418.81 samples/sec   Loss 6.8438   LearningRate 0.1208   Epoch: 7   Global Step: 82350   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:24:53,386-Speed 5397.15 samples/sec   Loss 6.8082   LearningRate 0.1208   Epoch: 7   Global Step: 82360   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:25:00,855-Speed 5484.94 samples/sec   Loss 6.8519   LearningRate 0.1208   Epoch: 7   Global Step: 82370   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:25:08,314-Speed 5492.19 samples/sec   Loss 6.7724   LearningRate 0.1208   Epoch: 7   Global Step: 82380   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:25:15,830-Speed 5450.08 samples/sec   Loss 6.7867   LearningRate 0.1208   Epoch: 7   Global Step: 82390   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:25:23,293-Speed 5489.68 samples/sec   Loss 6.8081   LearningRate 0.1207   Epoch: 7   Global Step: 82400   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:25:30,833-Speed 5432.98 samples/sec   Loss 6.7785   LearningRate 0.1207   Epoch: 7   Global Step: 82410   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:25:38,359-Speed 5443.46 samples/sec   Loss 6.8204   LearningRate 0.1207   Epoch: 7   Global Step: 82420   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:25:45,889-Speed 5439.81 samples/sec   Loss 6.8589   LearningRate 0.1207   Epoch: 7   Global Step: 82430   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:25:53,464-Speed 5408.37 samples/sec   Loss 6.8821   LearningRate 0.1207   Epoch: 7   Global Step: 82440   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:26:01,011-Speed 5428.56 samples/sec   Loss 6.8690   LearningRate 0.1206   Epoch: 7   Global Step: 82450   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:26:08,530-Speed 5447.88 samples/sec   Loss 6.7861   LearningRate 0.1206   Epoch: 7   Global Step: 82460   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:26:16,058-Speed 5441.70 samples/sec   Loss 6.7754   LearningRate 0.1206   Epoch: 7   Global Step: 82470   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:26:23,589-Speed 5439.55 samples/sec   Loss 6.8182   LearningRate 0.1206   Epoch: 7   Global Step: 82480   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:26:31,072-Speed 5474.44 samples/sec   Loss 6.7691   LearningRate 0.1206   Epoch: 7   Global Step: 82490   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:26:38,613-Speed 5432.85 samples/sec   Loss 6.8011   LearningRate 0.1205   Epoch: 7   Global Step: 82500   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:26:46,126-Speed 5452.39 samples/sec   Loss 6.7955   LearningRate 0.1205   Epoch: 7   Global Step: 82510   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:26:53,627-Speed 5461.08 samples/sec   Loss 6.7379   LearningRate 0.1205   Epoch: 7   Global Step: 82520   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:27:01,208-Speed 5403.22 samples/sec   Loss 6.7885   LearningRate 0.1205   Epoch: 7   Global Step: 82530   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:27:08,712-Speed 5459.87 samples/sec   Loss 6.8038   LearningRate 0.1205   Epoch: 7   Global Step: 82540   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:27:16,272-Speed 5418.38 samples/sec   Loss 6.7469   LearningRate 0.1204   Epoch: 7   Global Step: 82550   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:27:23,753-Speed 5475.71 samples/sec   Loss 6.8891   LearningRate 0.1204   Epoch: 7   Global Step: 82560   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:27:31,395-Speed 5360.50 samples/sec   Loss 6.7822   LearningRate 0.1204   Epoch: 7   Global Step: 82570   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:27:38,929-Speed 5438.10 samples/sec   Loss 6.7738   LearningRate 0.1204   Epoch: 7   Global Step: 82580   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:27:46,454-Speed 5443.92 samples/sec   Loss 6.8650   LearningRate 0.1204   Epoch: 7   Global Step: 82590   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:27:54,024-Speed 5411.01 samples/sec   Loss 6.7845   LearningRate 0.1203   Epoch: 7   Global Step: 82600   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 13:28:01,532-Speed 5456.21 samples/sec   Loss 6.8092   LearningRate 0.1203   Epoch: 7   Global Step: 82610   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:28:09,091-Speed 5419.88 samples/sec   Loss 6.7614   LearningRate 0.1203   Epoch: 7   Global Step: 82620   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:28:16,657-Speed 5414.00 samples/sec   Loss 6.8262   LearningRate 0.1203   Epoch: 7   Global Step: 82630   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:28:24,161-Speed 5458.85 samples/sec   Loss 6.7785   LearningRate 0.1203   Epoch: 7   Global Step: 82640   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:28:31,719-Speed 5420.35 samples/sec   Loss 6.7980   LearningRate 0.1202   Epoch: 7   Global Step: 82650   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:28:39,350-Speed 5368.91 samples/sec   Loss 6.7522   LearningRate 0.1202   Epoch: 7   Global Step: 82660   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:28:46,871-Speed 5446.75 samples/sec   Loss 6.7787   LearningRate 0.1202   Epoch: 7   Global Step: 82670   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:28:54,376-Speed 5457.56 samples/sec   Loss 6.8106   LearningRate 0.1202   Epoch: 7   Global Step: 82680   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:29:01,943-Speed 5414.10 samples/sec   Loss 6.7994   LearningRate 0.1202   Epoch: 7   Global Step: 82690   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:29:09,447-Speed 5459.48 samples/sec   Loss 6.7423   LearningRate 0.1202   Epoch: 7   Global Step: 82700   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:29:16,891-Speed 5502.86 samples/sec   Loss 6.7746   LearningRate 0.1201   Epoch: 7   Global Step: 82710   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:29:24,346-Speed 5495.44 samples/sec   Loss 6.7613   LearningRate 0.1201   Epoch: 7   Global Step: 82720   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:29:31,855-Speed 5455.60 samples/sec   Loss 6.7625   LearningRate 0.1201   Epoch: 7   Global Step: 82730   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:29:39,446-Speed 5396.77 samples/sec   Loss 6.7965   LearningRate 0.1201   Epoch: 7   Global Step: 82740   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:29:47,020-Speed 5408.95 samples/sec   Loss 6.7833   LearningRate 0.1201   Epoch: 7   Global Step: 82750   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:29:54,565-Speed 5428.71 samples/sec   Loss 6.7995   LearningRate 0.1200   Epoch: 7   Global Step: 82760   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:30:02,106-Speed 5432.55 samples/sec   Loss 6.7880   LearningRate 0.1200   Epoch: 7   Global Step: 82770   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:30:09,638-Speed 5439.28 samples/sec   Loss 6.7569   LearningRate 0.1200   Epoch: 7   Global Step: 82780   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:30:17,136-Speed 5463.82 samples/sec   Loss 6.8262   LearningRate 0.1200   Epoch: 7   Global Step: 82790   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:30:24,817-Speed 5332.74 samples/sec   Loss 6.8493   LearningRate 0.1200   Epoch: 7   Global Step: 82800   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:30:32,321-Speed 5459.16 samples/sec   Loss 6.7949   LearningRate 0.1199   Epoch: 7   Global Step: 82810   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:30:39,769-Speed 5499.97 samples/sec   Loss 6.8414   LearningRate 0.1199   Epoch: 7   Global Step: 82820   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:30:47,262-Speed 5467.53 samples/sec   Loss 6.8470   LearningRate 0.1199   Epoch: 7   Global Step: 82830   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:30:54,841-Speed 5405.43 samples/sec   Loss 6.8180   LearningRate 0.1199   Epoch: 7   Global Step: 82840   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:31:02,318-Speed 5478.78 samples/sec   Loss 6.8273   LearningRate 0.1199   Epoch: 7   Global Step: 82850   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:31:09,855-Speed 5434.59 samples/sec   Loss 6.8278   LearningRate 0.1198   Epoch: 7   Global Step: 82860   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:31:17,375-Speed 5447.76 samples/sec   Loss 6.8658   LearningRate 0.1198   Epoch: 7   Global Step: 82870   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:31:24,876-Speed 5461.78 samples/sec   Loss 6.8169   LearningRate 0.1198   Epoch: 7   Global Step: 82880   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:31:32,402-Speed 5442.72 samples/sec   Loss 6.7158   LearningRate 0.1198   Epoch: 7   Global Step: 82890   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:31:40,015-Speed 5380.32 samples/sec   Loss 6.7418   LearningRate 0.1198   Epoch: 7   Global Step: 82900   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:31:47,658-Speed 5360.40 samples/sec   Loss 6.8020   LearningRate 0.1197   Epoch: 7   Global Step: 82910   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:31:55,346-Speed 5328.65 samples/sec   Loss 6.7827   LearningRate 0.1197   Epoch: 7   Global Step: 82920   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:32:02,964-Speed 5377.20 samples/sec   Loss 6.7356   LearningRate 0.1197   Epoch: 7   Global Step: 82930   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:32:10,572-Speed 5384.25 samples/sec   Loss 6.7332   LearningRate 0.1197   Epoch: 7   Global Step: 82940   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:32:18,205-Speed 5367.23 samples/sec   Loss 6.7804   LearningRate 0.1197   Epoch: 7   Global Step: 82950   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:32:42,090-Speed 1714.95 samples/sec   Loss 6.8057   LearningRate 0.1197   Epoch: 8   Global Step: 82960   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:32:49,570-Speed 5476.42 samples/sec   Loss 6.7310   LearningRate 0.1196   Epoch: 8   Global Step: 82970   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:32:57,015-Speed 5502.73 samples/sec   Loss 6.7719   LearningRate 0.1196   Epoch: 8   Global Step: 82980   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:33:04,514-Speed 5462.88 samples/sec   Loss 6.8267   LearningRate 0.1196   Epoch: 8   Global Step: 82990   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:33:12,028-Speed 5451.84 samples/sec   Loss 6.7499   LearningRate 0.1196   Epoch: 8   Global Step: 83000   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:33:19,516-Speed 5470.86 samples/sec   Loss 6.7412   LearningRate 0.1196   Epoch: 8   Global Step: 83010   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:33:27,014-Speed 5463.61 samples/sec   Loss 6.7547   LearningRate 0.1195   Epoch: 8   Global Step: 83020   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:33:34,519-Speed 5458.64 samples/sec   Loss 6.8006   LearningRate 0.1195   Epoch: 8   Global Step: 83030   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:33:42,078-Speed 5419.35 samples/sec   Loss 6.7273   LearningRate 0.1195   Epoch: 8   Global Step: 83040   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:33:49,782-Speed 5317.95 samples/sec   Loss 6.7542   LearningRate 0.1195   Epoch: 8   Global Step: 83050   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-01-08 13:33:57,249-Speed 5485.83 samples/sec   Loss 6.7656   LearningRate 0.1195   Epoch: 8   Global Step: 83060   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:34:04,803-Speed 5423.26 samples/sec   Loss 6.8000   LearningRate 0.1194   Epoch: 8   Global Step: 83070   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:34:12,382-Speed 5405.49 samples/sec   Loss 6.7368   LearningRate 0.1194   Epoch: 8   Global Step: 83080   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 13:34:19,848-Speed 5486.98 samples/sec   Loss 6.7328   LearningRate 0.1194   Epoch: 8   Global Step: 83090   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:34:27,274-Speed 5516.91 samples/sec   Loss 6.7861   LearningRate 0.1194   Epoch: 8   Global Step: 83100   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:34:34,732-Speed 5492.31 samples/sec   Loss 6.7516   LearningRate 0.1194   Epoch: 8   Global Step: 83110   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:34:42,193-Speed 5491.01 samples/sec   Loss 6.7344   LearningRate 0.1193   Epoch: 8   Global Step: 83120   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:34:49,930-Speed 5294.66 samples/sec   Loss 6.7283   LearningRate 0.1193   Epoch: 8   Global Step: 83130   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:34:57,635-Speed 5317.02 samples/sec   Loss 6.7435   LearningRate 0.1193   Epoch: 8   Global Step: 83140   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:35:05,246-Speed 5382.17 samples/sec   Loss 6.7458   LearningRate 0.1193   Epoch: 8   Global Step: 83150   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:35:13,094-Speed 5220.19 samples/sec   Loss 6.7048   LearningRate 0.1193   Epoch: 8   Global Step: 83160   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:35:20,703-Speed 5383.69 samples/sec   Loss 6.7427   LearningRate 0.1192   Epoch: 8   Global Step: 83170   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:35:28,302-Speed 5391.04 samples/sec   Loss 6.8020   LearningRate 0.1192   Epoch: 8   Global Step: 83180   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:35:35,927-Speed 5372.24 samples/sec   Loss 6.7153   LearningRate 0.1192   Epoch: 8   Global Step: 83190   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:35:43,582-Speed 5352.13 samples/sec   Loss 6.7000   LearningRate 0.1192   Epoch: 8   Global Step: 83200   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:35:51,194-Speed 5381.01 samples/sec   Loss 6.7783   LearningRate 0.1192   Epoch: 8   Global Step: 83210   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:35:58,867-Speed 5339.18 samples/sec   Loss 6.6986   LearningRate 0.1192   Epoch: 8   Global Step: 83220   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:36:06,500-Speed 5367.04 samples/sec   Loss 6.8127   LearningRate 0.1191   Epoch: 8   Global Step: 83230   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:36:14,114-Speed 5380.75 samples/sec   Loss 6.7492   LearningRate 0.1191   Epoch: 8   Global Step: 83240   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:36:21,755-Speed 5360.99 samples/sec   Loss 6.6518   LearningRate 0.1191   Epoch: 8   Global Step: 83250   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:36:29,222-Speed 5485.92 samples/sec   Loss 6.6992   LearningRate 0.1191   Epoch: 8   Global Step: 83260   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:36:36,668-Speed 5501.80 samples/sec   Loss 6.7568   LearningRate 0.1191   Epoch: 8   Global Step: 83270   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:36:44,101-Speed 5511.50 samples/sec   Loss 6.7556   LearningRate 0.1190   Epoch: 8   Global Step: 83280   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:36:51,516-Speed 5524.46 samples/sec   Loss 6.7757   LearningRate 0.1190   Epoch: 8   Global Step: 83290   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:36:58,943-Speed 5515.17 samples/sec   Loss 6.7626   LearningRate 0.1190   Epoch: 8   Global Step: 83300   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 13:37:06,387-Speed 5503.55 samples/sec   Loss 6.7023   LearningRate 0.1190   Epoch: 8   Global Step: 83310   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 13:37:13,845-Speed 5492.58 samples/sec   Loss 6.7527   LearningRate 0.1190   Epoch: 8   Global Step: 83320   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 13:37:21,313-Speed 5485.56 samples/sec   Loss 6.7138   LearningRate 0.1189   Epoch: 8   Global Step: 83330   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 13:37:28,805-Speed 5467.81 samples/sec   Loss 6.7285   LearningRate 0.1189   Epoch: 8   Global Step: 83340   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 13:37:36,324-Speed 5447.97 samples/sec   Loss 6.7674   LearningRate 0.1189   Epoch: 8   Global Step: 83350   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 13:37:43,773-Speed 5499.85 samples/sec   Loss 6.7373   LearningRate 0.1189   Epoch: 8   Global Step: 83360   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 13:37:51,275-Speed 5460.63 samples/sec   Loss 6.6828   LearningRate 0.1189   Epoch: 8   Global Step: 83370   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 13:37:58,733-Speed 5492.61 samples/sec   Loss 6.7673   LearningRate 0.1188   Epoch: 8   Global Step: 83380   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 13:38:06,304-Speed 5410.50 samples/sec   Loss 6.7964   LearningRate 0.1188   Epoch: 8   Global Step: 83390   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 13:38:14,003-Speed 5321.30 samples/sec   Loss 6.7275   LearningRate 0.1188   Epoch: 8   Global Step: 83400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:38:21,575-Speed 5410.08 samples/sec   Loss 6.7134   LearningRate 0.1188   Epoch: 8   Global Step: 83410   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:38:29,056-Speed 5476.09 samples/sec   Loss 6.7488   LearningRate 0.1188   Epoch: 8   Global Step: 83420   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:38:36,508-Speed 5497.22 samples/sec   Loss 6.7602   LearningRate 0.1187   Epoch: 8   Global Step: 83430   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:38:43,970-Speed 5489.75 samples/sec   Loss 6.7365   LearningRate 0.1187   Epoch: 8   Global Step: 83440   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:38:51,397-Speed 5515.92 samples/sec   Loss 6.7427   LearningRate 0.1187   Epoch: 8   Global Step: 83450   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:38:58,815-Speed 5522.40 samples/sec   Loss 6.7299   LearningRate 0.1187   Epoch: 8   Global Step: 83460   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:39:06,368-Speed 5423.33 samples/sec   Loss 6.7190   LearningRate 0.1187   Epoch: 8   Global Step: 83470   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:39:13,943-Speed 5408.58 samples/sec   Loss 6.7309   LearningRate 0.1187   Epoch: 8   Global Step: 83480   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:39:21,391-Speed 5499.88 samples/sec   Loss 6.7657   LearningRate 0.1186   Epoch: 8   Global Step: 83490   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:39:28,955-Speed 5416.38 samples/sec   Loss 6.7365   LearningRate 0.1186   Epoch: 8   Global Step: 83500   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:39:36,591-Speed 5364.36 samples/sec   Loss 6.7440   LearningRate 0.1186   Epoch: 8   Global Step: 83510   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:39:44,147-Speed 5421.84 samples/sec   Loss 6.7696   LearningRate 0.1186   Epoch: 8   Global Step: 83520   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:39:51,628-Speed 5475.98 samples/sec   Loss 6.7055   LearningRate 0.1186   Epoch: 8   Global Step: 83530   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:39:59,077-Speed 5499.94 samples/sec   Loss 6.7403   LearningRate 0.1185   Epoch: 8   Global Step: 83540   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:40:06,635-Speed 5419.52 samples/sec   Loss 6.7208   LearningRate 0.1185   Epoch: 8   Global Step: 83550   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:40:14,188-Speed 5424.58 samples/sec   Loss 6.7118   LearningRate 0.1185   Epoch: 8   Global Step: 83560   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:40:21,702-Speed 5451.46 samples/sec   Loss 6.7500   LearningRate 0.1185   Epoch: 8   Global Step: 83570   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:40:29,251-Speed 5426.84 samples/sec   Loss 6.7194   LearningRate 0.1185   Epoch: 8   Global Step: 83580   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:40:36,790-Speed 5433.43 samples/sec   Loss 6.7658   LearningRate 0.1184   Epoch: 8   Global Step: 83590   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:40:44,385-Speed 5394.36 samples/sec   Loss 6.7546   LearningRate 0.1184   Epoch: 8   Global Step: 83600   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:40:51,895-Speed 5454.32 samples/sec   Loss 6.7519   LearningRate 0.1184   Epoch: 8   Global Step: 83610   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:40:59,440-Speed 5429.57 samples/sec   Loss 6.7388   LearningRate 0.1184   Epoch: 8   Global Step: 83620   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:41:06,992-Speed 5424.35 samples/sec   Loss 6.7276   LearningRate 0.1184   Epoch: 8   Global Step: 83630   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:41:14,603-Speed 5383.12 samples/sec   Loss 6.7384   LearningRate 0.1183   Epoch: 8   Global Step: 83640   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:41:22,138-Speed 5435.99 samples/sec   Loss 6.7601   LearningRate 0.1183   Epoch: 8   Global Step: 83650   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:41:29,668-Speed 5440.36 samples/sec   Loss 6.7822   LearningRate 0.1183   Epoch: 8   Global Step: 83660   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:41:37,212-Speed 5430.59 samples/sec   Loss 6.7569   LearningRate 0.1183   Epoch: 8   Global Step: 83670   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:41:44,640-Speed 5514.95 samples/sec   Loss 6.7695   LearningRate 0.1183   Epoch: 8   Global Step: 83680   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:41:52,110-Speed 5484.13 samples/sec   Loss 6.7680   LearningRate 0.1183   Epoch: 8   Global Step: 83690   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:41:59,655-Speed 5429.42 samples/sec   Loss 6.7301   LearningRate 0.1182   Epoch: 8   Global Step: 83700   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:42:07,129-Speed 5481.60 samples/sec   Loss 6.6840   LearningRate 0.1182   Epoch: 8   Global Step: 83710   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:42:14,684-Speed 5422.24 samples/sec   Loss 6.7314   LearningRate 0.1182   Epoch: 8   Global Step: 83720   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:42:22,200-Speed 5450.19 samples/sec   Loss 6.6555   LearningRate 0.1182   Epoch: 8   Global Step: 83730   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:42:29,804-Speed 5387.88 samples/sec   Loss 6.7391   LearningRate 0.1182   Epoch: 8   Global Step: 83740   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:42:37,372-Speed 5412.60 samples/sec   Loss 6.7288   LearningRate 0.1181   Epoch: 8   Global Step: 83750   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:42:44,965-Speed 5395.33 samples/sec   Loss 6.7925   LearningRate 0.1181   Epoch: 8   Global Step: 83760   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:42:52,532-Speed 5413.80 samples/sec   Loss 6.7585   LearningRate 0.1181   Epoch: 8   Global Step: 83770   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:43:00,119-Speed 5399.71 samples/sec   Loss 6.6597   LearningRate 0.1181   Epoch: 8   Global Step: 83780   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:43:07,776-Speed 5349.96 samples/sec   Loss 6.6532   LearningRate 0.1181   Epoch: 8   Global Step: 83790   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:43:15,276-Speed 5461.98 samples/sec   Loss 6.7538   LearningRate 0.1180   Epoch: 8   Global Step: 83800   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:43:22,790-Speed 5451.94 samples/sec   Loss 6.7616   LearningRate 0.1180   Epoch: 8   Global Step: 83810   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:43:30,314-Speed 5444.37 samples/sec   Loss 6.7258   LearningRate 0.1180   Epoch: 8   Global Step: 83820   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:43:37,765-Speed 5498.03 samples/sec   Loss 6.7050   LearningRate 0.1180   Epoch: 8   Global Step: 83830   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:43:45,230-Speed 5487.72 samples/sec   Loss 6.7559   LearningRate 0.1180   Epoch: 8   Global Step: 83840   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:43:52,809-Speed 5405.29 samples/sec   Loss 6.7673   LearningRate 0.1179   Epoch: 8   Global Step: 83850   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:44:00,432-Speed 5374.05 samples/sec   Loss 6.7110   LearningRate 0.1179   Epoch: 8   Global Step: 83860   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:44:07,928-Speed 5464.88 samples/sec   Loss 6.7734   LearningRate 0.1179   Epoch: 8   Global Step: 83870   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:44:15,514-Speed 5400.23 samples/sec   Loss 6.7774   LearningRate 0.1179   Epoch: 8   Global Step: 83880   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:44:22,982-Speed 5485.45 samples/sec   Loss 6.7304   LearningRate 0.1179   Epoch: 8   Global Step: 83890   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:44:30,478-Speed 5465.48 samples/sec   Loss 6.7360   LearningRate 0.1179   Epoch: 8   Global Step: 83900   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:44:38,004-Speed 5443.01 samples/sec   Loss 6.7352   LearningRate 0.1178   Epoch: 8   Global Step: 83910   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:44:45,444-Speed 5506.49 samples/sec   Loss 6.7983   LearningRate 0.1178   Epoch: 8   Global Step: 83920   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:44:52,916-Speed 5482.04 samples/sec   Loss 6.7211   LearningRate 0.1178   Epoch: 8   Global Step: 83930   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:45:00,437-Speed 5447.23 samples/sec   Loss 6.6908   LearningRate 0.1178   Epoch: 8   Global Step: 83940   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:45:08,082-Speed 5358.30 samples/sec   Loss 6.6897   LearningRate 0.1178   Epoch: 8   Global Step: 83950   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:45:15,719-Speed 5364.01 samples/sec   Loss 6.7198   LearningRate 0.1177   Epoch: 8   Global Step: 83960   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:45:23,174-Speed 5495.03 samples/sec   Loss 6.6763   LearningRate 0.1177   Epoch: 8   Global Step: 83970   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:45:30,742-Speed 5413.09 samples/sec   Loss 6.6443   LearningRate 0.1177   Epoch: 8   Global Step: 83980   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:45:38,263-Speed 5446.65 samples/sec   Loss 6.6775   LearningRate 0.1177   Epoch: 8   Global Step: 83990   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:45:45,755-Speed 5467.87 samples/sec   Loss 6.6942   LearningRate 0.1177   Epoch: 8   Global Step: 84000   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:46:29,826-[lfw][84000]XNorm: 22.226733
Training: 2022-01-08 13:46:29,827-[lfw][84000]Accuracy-Flip: 0.99783+-0.00279
Training: 2022-01-08 13:46:29,827-[lfw][84000]Accuracy-Highest: 0.99817
Training: 2022-01-08 13:47:21,553-[cfp_fp][84000]XNorm: 20.178478
Training: 2022-01-08 13:47:21,554-[cfp_fp][84000]Accuracy-Flip: 0.98671+-0.00507
Training: 2022-01-08 13:47:21,555-[cfp_fp][84000]Accuracy-Highest: 0.98814
Training: 2022-01-08 13:48:07,354-[agedb_30][84000]XNorm: 21.844130
Training: 2022-01-08 13:48:07,355-[agedb_30][84000]Accuracy-Flip: 0.97350+-0.01004
Training: 2022-01-08 13:48:07,355-[agedb_30][84000]Accuracy-Highest: 0.97667
Training: 2022-01-08 13:48:14,836-Speed 274.75 samples/sec   Loss 6.7012   LearningRate 0.1176   Epoch: 8   Global Step: 84010   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:48:22,470-Speed 5366.47 samples/sec   Loss 6.7188   LearningRate 0.1176   Epoch: 8   Global Step: 84020   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:48:30,017-Speed 5428.36 samples/sec   Loss 6.7422   LearningRate 0.1176   Epoch: 8   Global Step: 84030   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:48:37,653-Speed 5365.60 samples/sec   Loss 6.6631   LearningRate 0.1176   Epoch: 8   Global Step: 84040   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:48:45,332-Speed 5335.16 samples/sec   Loss 6.7228   LearningRate 0.1176   Epoch: 8   Global Step: 84050   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:48:52,835-Speed 5459.74 samples/sec   Loss 6.6781   LearningRate 0.1175   Epoch: 8   Global Step: 84060   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:49:00,465-Speed 5369.05 samples/sec   Loss 6.6801   LearningRate 0.1175   Epoch: 8   Global Step: 84070   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:49:08,061-Speed 5393.18 samples/sec   Loss 6.7173   LearningRate 0.1175   Epoch: 8   Global Step: 84080   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:49:15,531-Speed 5484.06 samples/sec   Loss 6.6935   LearningRate 0.1175   Epoch: 8   Global Step: 84090   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:49:23,107-Speed 5407.66 samples/sec   Loss 6.7323   LearningRate 0.1175   Epoch: 8   Global Step: 84100   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:49:30,813-Speed 5315.75 samples/sec   Loss 6.7788   LearningRate 0.1175   Epoch: 8   Global Step: 84110   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:49:38,286-Speed 5481.49 samples/sec   Loss 6.6747   LearningRate 0.1174   Epoch: 8   Global Step: 84120   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:49:45,770-Speed 5473.98 samples/sec   Loss 6.6829   LearningRate 0.1174   Epoch: 8   Global Step: 84130   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:49:53,199-Speed 5513.87 samples/sec   Loss 6.7607   LearningRate 0.1174   Epoch: 8   Global Step: 84140   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:50:00,722-Speed 5445.92 samples/sec   Loss 6.6801   LearningRate 0.1174   Epoch: 8   Global Step: 84150   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:50:08,252-Speed 5440.40 samples/sec   Loss 6.6725   LearningRate 0.1174   Epoch: 8   Global Step: 84160   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:50:15,855-Speed 5388.36 samples/sec   Loss 6.7276   LearningRate 0.1173   Epoch: 8   Global Step: 84170   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:50:23,360-Speed 5457.82 samples/sec   Loss 6.7520   LearningRate 0.1173   Epoch: 8   Global Step: 84180   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:50:30,838-Speed 5478.20 samples/sec   Loss 6.7588   LearningRate 0.1173   Epoch: 8   Global Step: 84190   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:50:38,311-Speed 5482.16 samples/sec   Loss 6.6819   LearningRate 0.1173   Epoch: 8   Global Step: 84200   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:50:45,789-Speed 5478.01 samples/sec   Loss 6.7098   LearningRate 0.1173   Epoch: 8   Global Step: 84210   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:50:53,444-Speed 5351.60 samples/sec   Loss 6.7440   LearningRate 0.1172   Epoch: 8   Global Step: 84220   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:51:01,129-Speed 5330.50 samples/sec   Loss 6.6540   LearningRate 0.1172   Epoch: 8   Global Step: 84230   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:51:08,674-Speed 5429.61 samples/sec   Loss 6.6974   LearningRate 0.1172   Epoch: 8   Global Step: 84240   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:51:16,106-Speed 5512.23 samples/sec   Loss 6.6935   LearningRate 0.1172   Epoch: 8   Global Step: 84250   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:51:23,606-Speed 5462.19 samples/sec   Loss 6.6991   LearningRate 0.1172   Epoch: 8   Global Step: 84260   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:51:31,098-Speed 5467.34 samples/sec   Loss 6.6913   LearningRate 0.1171   Epoch: 8   Global Step: 84270   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:51:38,522-Speed 5518.41 samples/sec   Loss 6.6136   LearningRate 0.1171   Epoch: 8   Global Step: 84280   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:51:45,964-Speed 5504.38 samples/sec   Loss 6.6909   LearningRate 0.1171   Epoch: 8   Global Step: 84290   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:51:53,440-Speed 5479.81 samples/sec   Loss 6.6680   LearningRate 0.1171   Epoch: 8   Global Step: 84300   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:52:00,894-Speed 5495.25 samples/sec   Loss 6.7151   LearningRate 0.1171   Epoch: 8   Global Step: 84310   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:52:08,439-Speed 5429.24 samples/sec   Loss 6.6489   LearningRate 0.1171   Epoch: 8   Global Step: 84320   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:52:15,909-Speed 5484.30 samples/sec   Loss 6.7609   LearningRate 0.1170   Epoch: 8   Global Step: 84330   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:52:23,339-Speed 5513.37 samples/sec   Loss 6.6413   LearningRate 0.1170   Epoch: 8   Global Step: 84340   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:52:30,793-Speed 5496.19 samples/sec   Loss 6.7017   LearningRate 0.1170   Epoch: 8   Global Step: 84350   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:52:38,258-Speed 5487.75 samples/sec   Loss 6.6317   LearningRate 0.1170   Epoch: 8   Global Step: 84360   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:52:45,750-Speed 5467.42 samples/sec   Loss 6.6771   LearningRate 0.1170   Epoch: 8   Global Step: 84370   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:52:53,258-Speed 5456.95 samples/sec   Loss 6.7131   LearningRate 0.1169   Epoch: 8   Global Step: 84380   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:53:00,723-Speed 5487.58 samples/sec   Loss 6.7536   LearningRate 0.1169   Epoch: 8   Global Step: 84390   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:53:08,296-Speed 5409.32 samples/sec   Loss 6.6427   LearningRate 0.1169   Epoch: 8   Global Step: 84400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:53:15,907-Speed 5381.88 samples/sec   Loss 6.6591   LearningRate 0.1169   Epoch: 8   Global Step: 84410   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:53:23,433-Speed 5443.53 samples/sec   Loss 6.6570   LearningRate 0.1169   Epoch: 8   Global Step: 84420   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:53:30,897-Speed 5488.21 samples/sec   Loss 6.7111   LearningRate 0.1168   Epoch: 8   Global Step: 84430   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:53:38,355-Speed 5492.87 samples/sec   Loss 6.6456   LearningRate 0.1168   Epoch: 8   Global Step: 84440   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:53:45,815-Speed 5491.76 samples/sec   Loss 6.7416   LearningRate 0.1168   Epoch: 8   Global Step: 84450   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:53:53,271-Speed 5494.38 samples/sec   Loss 6.7094   LearningRate 0.1168   Epoch: 8   Global Step: 84460   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:54:00,823-Speed 5424.09 samples/sec   Loss 6.6838   LearningRate 0.1168   Epoch: 8   Global Step: 84470   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:54:08,536-Speed 5311.21 samples/sec   Loss 6.6623   LearningRate 0.1167   Epoch: 8   Global Step: 84480   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:54:16,059-Speed 5446.01 samples/sec   Loss 6.6271   LearningRate 0.1167   Epoch: 8   Global Step: 84490   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:54:23,601-Speed 5431.73 samples/sec   Loss 6.6358   LearningRate 0.1167   Epoch: 8   Global Step: 84500   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:54:31,101-Speed 5461.80 samples/sec   Loss 6.6724   LearningRate 0.1167   Epoch: 8   Global Step: 84510   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:54:38,579-Speed 5478.11 samples/sec   Loss 6.7000   LearningRate 0.1167   Epoch: 8   Global Step: 84520   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:54:46,104-Speed 5443.71 samples/sec   Loss 6.7025   LearningRate 0.1167   Epoch: 8   Global Step: 84530   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:54:53,583-Speed 5477.68 samples/sec   Loss 6.7022   LearningRate 0.1166   Epoch: 8   Global Step: 84540   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:55:01,114-Speed 5440.03 samples/sec   Loss 6.6689   LearningRate 0.1166   Epoch: 8   Global Step: 84550   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:55:08,589-Speed 5480.33 samples/sec   Loss 6.7097   LearningRate 0.1166   Epoch: 8   Global Step: 84560   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:55:16,104-Speed 5450.61 samples/sec   Loss 6.7188   LearningRate 0.1166   Epoch: 8   Global Step: 84570   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:55:23,577-Speed 5482.34 samples/sec   Loss 6.6684   LearningRate 0.1166   Epoch: 8   Global Step: 84580   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:55:31,156-Speed 5405.10 samples/sec   Loss 6.6914   LearningRate 0.1165   Epoch: 8   Global Step: 84590   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:55:38,629-Speed 5481.51 samples/sec   Loss 6.6629   LearningRate 0.1165   Epoch: 8   Global Step: 84600   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:55:46,102-Speed 5482.29 samples/sec   Loss 6.5725   LearningRate 0.1165   Epoch: 8   Global Step: 84610   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:55:53,720-Speed 5377.50 samples/sec   Loss 6.6533   LearningRate 0.1165   Epoch: 8   Global Step: 84620   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:56:01,232-Speed 5453.68 samples/sec   Loss 6.6627   LearningRate 0.1165   Epoch: 8   Global Step: 84630   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 13:56:08,693-Speed 5490.46 samples/sec   Loss 6.6788   LearningRate 0.1164   Epoch: 8   Global Step: 84640   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:56:16,214-Speed 5447.09 samples/sec   Loss 6.7088   LearningRate 0.1164   Epoch: 8   Global Step: 84650   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:56:23,771-Speed 5420.36 samples/sec   Loss 6.6853   LearningRate 0.1164   Epoch: 8   Global Step: 84660   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:56:31,234-Speed 5489.00 samples/sec   Loss 6.6726   LearningRate 0.1164   Epoch: 8   Global Step: 84670   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:56:38,643-Speed 5529.18 samples/sec   Loss 6.6997   LearningRate 0.1164   Epoch: 8   Global Step: 84680   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:56:46,088-Speed 5502.89 samples/sec   Loss 6.7408   LearningRate 0.1163   Epoch: 8   Global Step: 84690   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:56:53,494-Speed 5531.40 samples/sec   Loss 6.6564   LearningRate 0.1163   Epoch: 8   Global Step: 84700   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:57:00,972-Speed 5478.18 samples/sec   Loss 6.6698   LearningRate 0.1163   Epoch: 8   Global Step: 84710   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:57:08,470-Speed 5463.10 samples/sec   Loss 6.6970   LearningRate 0.1163   Epoch: 8   Global Step: 84720   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:57:15,973-Speed 5459.98 samples/sec   Loss 6.6650   LearningRate 0.1163   Epoch: 8   Global Step: 84730   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:57:23,397-Speed 5518.51 samples/sec   Loss 6.6756   LearningRate 0.1163   Epoch: 8   Global Step: 84740   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:57:30,928-Speed 5439.08 samples/sec   Loss 6.6847   LearningRate 0.1162   Epoch: 8   Global Step: 84750   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:57:38,415-Speed 5472.01 samples/sec   Loss 6.6914   LearningRate 0.1162   Epoch: 8   Global Step: 84760   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:57:45,929-Speed 5452.13 samples/sec   Loss 6.6630   LearningRate 0.1162   Epoch: 8   Global Step: 84770   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:57:53,512-Speed 5401.77 samples/sec   Loss 6.6886   LearningRate 0.1162   Epoch: 8   Global Step: 84780   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:58:01,175-Speed 5345.79 samples/sec   Loss 6.6029   LearningRate 0.1162   Epoch: 8   Global Step: 84790   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:58:08,862-Speed 5329.71 samples/sec   Loss 6.6242   LearningRate 0.1161   Epoch: 8   Global Step: 84800   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:58:16,552-Speed 5327.26 samples/sec   Loss 6.6601   LearningRate 0.1161   Epoch: 8   Global Step: 84810   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:58:24,084-Speed 5438.94 samples/sec   Loss 6.7393   LearningRate 0.1161   Epoch: 8   Global Step: 84820   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:58:31,585-Speed 5460.72 samples/sec   Loss 6.6310   LearningRate 0.1161   Epoch: 8   Global Step: 84830   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:58:39,044-Speed 5491.99 samples/sec   Loss 6.6463   LearningRate 0.1161   Epoch: 8   Global Step: 84840   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:58:46,539-Speed 5465.66 samples/sec   Loss 6.6610   LearningRate 0.1160   Epoch: 8   Global Step: 84850   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:58:54,045-Speed 5457.74 samples/sec   Loss 6.6573   LearningRate 0.1160   Epoch: 8   Global Step: 84860   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:59:01,661-Speed 5378.89 samples/sec   Loss 6.6794   LearningRate 0.1160   Epoch: 8   Global Step: 84870   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:59:09,154-Speed 5467.21 samples/sec   Loss 6.6377   LearningRate 0.1160   Epoch: 8   Global Step: 84880   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:59:16,661-Speed 5456.73 samples/sec   Loss 6.6345   LearningRate 0.1160   Epoch: 8   Global Step: 84890   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:59:24,183-Speed 5445.97 samples/sec   Loss 6.6645   LearningRate 0.1159   Epoch: 8   Global Step: 84900   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:59:31,871-Speed 5328.98 samples/sec   Loss 6.6578   LearningRate 0.1159   Epoch: 8   Global Step: 84910   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:59:39,343-Speed 5482.57 samples/sec   Loss 6.6042   LearningRate 0.1159   Epoch: 8   Global Step: 84920   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:59:46,835-Speed 5467.67 samples/sec   Loss 6.6086   LearningRate 0.1159   Epoch: 8   Global Step: 84930   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 13:59:54,290-Speed 5494.49 samples/sec   Loss 6.6345   LearningRate 0.1159   Epoch: 8   Global Step: 84940   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:00:01,807-Speed 5449.58 samples/sec   Loss 6.6443   LearningRate 0.1159   Epoch: 8   Global Step: 84950   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:00:09,297-Speed 5469.69 samples/sec   Loss 6.6419   LearningRate 0.1158   Epoch: 8   Global Step: 84960   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:00:16,709-Speed 5526.67 samples/sec   Loss 6.6365   LearningRate 0.1158   Epoch: 8   Global Step: 84970   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:00:24,159-Speed 5498.98 samples/sec   Loss 6.6748   LearningRate 0.1158   Epoch: 8   Global Step: 84980   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:00:31,686-Speed 5442.66 samples/sec   Loss 6.5995   LearningRate 0.1158   Epoch: 8   Global Step: 84990   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:00:39,169-Speed 5474.28 samples/sec   Loss 6.6704   LearningRate 0.1158   Epoch: 8   Global Step: 85000   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:00:46,830-Speed 5346.93 samples/sec   Loss 6.6853   LearningRate 0.1157   Epoch: 8   Global Step: 85010   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:00:54,355-Speed 5443.70 samples/sec   Loss 6.6442   LearningRate 0.1157   Epoch: 8   Global Step: 85020   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:01:01,875-Speed 5447.75 samples/sec   Loss 6.6507   LearningRate 0.1157   Epoch: 8   Global Step: 85030   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:01:09,405-Speed 5440.22 samples/sec   Loss 6.5955   LearningRate 0.1157   Epoch: 8   Global Step: 85040   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:01:16,930-Speed 5444.12 samples/sec   Loss 6.6117   LearningRate 0.1157   Epoch: 8   Global Step: 85050   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:01:24,447-Speed 5449.37 samples/sec   Loss 6.6186   LearningRate 0.1156   Epoch: 8   Global Step: 85060   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:01:32,030-Speed 5443.67 samples/sec   Loss 6.6410   LearningRate 0.1156   Epoch: 8   Global Step: 85070   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:01:39,681-Speed 5370.53 samples/sec   Loss 6.6339   LearningRate 0.1156   Epoch: 8   Global Step: 85080   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:01:47,137-Speed 5494.91 samples/sec   Loss 6.6984   LearningRate 0.1156   Epoch: 8   Global Step: 85090   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:01:54,662-Speed 5443.44 samples/sec   Loss 6.5941   LearningRate 0.1156   Epoch: 8   Global Step: 85100   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:02:02,157-Speed 5465.28 samples/sec   Loss 6.6401   LearningRate 0.1156   Epoch: 8   Global Step: 85110   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:02:09,709-Speed 5424.98 samples/sec   Loss 6.6007   LearningRate 0.1155   Epoch: 8   Global Step: 85120   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:02:17,363-Speed 5352.07 samples/sec   Loss 6.6086   LearningRate 0.1155   Epoch: 8   Global Step: 85130   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:02:24,910-Speed 5428.33 samples/sec   Loss 6.6530   LearningRate 0.1155   Epoch: 8   Global Step: 85140   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:02:32,360-Speed 5498.20 samples/sec   Loss 6.6298   LearningRate 0.1155   Epoch: 8   Global Step: 85150   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:02:39,837-Speed 5478.76 samples/sec   Loss 6.6294   LearningRate 0.1155   Epoch: 8   Global Step: 85160   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:02:47,373-Speed 5436.30 samples/sec   Loss 6.5985   LearningRate 0.1154   Epoch: 8   Global Step: 85170   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:02:54,938-Speed 5415.52 samples/sec   Loss 6.6267   LearningRate 0.1154   Epoch: 8   Global Step: 85180   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:03:02,497-Speed 5419.39 samples/sec   Loss 6.6779   LearningRate 0.1154   Epoch: 8   Global Step: 85190   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:03:10,057-Speed 5418.60 samples/sec   Loss 6.5809   LearningRate 0.1154   Epoch: 8   Global Step: 85200   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:03:17,686-Speed 5369.93 samples/sec   Loss 6.6340   LearningRate 0.1154   Epoch: 8   Global Step: 85210   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:03:25,241-Speed 5422.03 samples/sec   Loss 6.6439   LearningRate 0.1153   Epoch: 8   Global Step: 85220   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:03:32,743-Speed 5460.76 samples/sec   Loss 6.6016   LearningRate 0.1153   Epoch: 8   Global Step: 85230   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:03:40,253-Speed 5454.66 samples/sec   Loss 6.6276   LearningRate 0.1153   Epoch: 8   Global Step: 85240   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:03:47,699-Speed 5501.96 samples/sec   Loss 6.6952   LearningRate 0.1153   Epoch: 8   Global Step: 85250   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:03:55,236-Speed 5435.13 samples/sec   Loss 6.6535   LearningRate 0.1153   Epoch: 8   Global Step: 85260   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:04:02,685-Speed 5498.82 samples/sec   Loss 6.6060   LearningRate 0.1153   Epoch: 8   Global Step: 85270   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:04:10,255-Speed 5412.12 samples/sec   Loss 6.6795   LearningRate 0.1152   Epoch: 8   Global Step: 85280   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:04:17,839-Speed 5401.68 samples/sec   Loss 6.6528   LearningRate 0.1152   Epoch: 8   Global Step: 85290   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:04:25,425-Speed 5399.88 samples/sec   Loss 6.6296   LearningRate 0.1152   Epoch: 8   Global Step: 85300   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:04:32,923-Speed 5464.34 samples/sec   Loss 6.6822   LearningRate 0.1152   Epoch: 8   Global Step: 85310   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:04:40,514-Speed 5396.32 samples/sec   Loss 6.7030   LearningRate 0.1152   Epoch: 8   Global Step: 85320   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:04:48,054-Speed 5432.67 samples/sec   Loss 6.6743   LearningRate 0.1151   Epoch: 8   Global Step: 85330   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:04:55,580-Speed 5443.56 samples/sec   Loss 6.6477   LearningRate 0.1151   Epoch: 8   Global Step: 85340   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:05:03,093-Speed 5452.95 samples/sec   Loss 6.6743   LearningRate 0.1151   Epoch: 8   Global Step: 85350   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:05:10,529-Speed 5508.41 samples/sec   Loss 6.6544   LearningRate 0.1151   Epoch: 8   Global Step: 85360   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:05:17,977-Speed 5500.69 samples/sec   Loss 6.6225   LearningRate 0.1151   Epoch: 8   Global Step: 85370   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:05:25,482-Speed 5458.07 samples/sec   Loss 6.6257   LearningRate 0.1150   Epoch: 8   Global Step: 85380   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:05:32,962-Speed 5476.94 samples/sec   Loss 6.6041   LearningRate 0.1150   Epoch: 8   Global Step: 85390   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:05:40,474-Speed 5453.18 samples/sec   Loss 6.6351   LearningRate 0.1150   Epoch: 8   Global Step: 85400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:05:48,063-Speed 5398.60 samples/sec   Loss 6.6061   LearningRate 0.1150   Epoch: 8   Global Step: 85410   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:05:55,655-Speed 5395.55 samples/sec   Loss 6.5946   LearningRate 0.1150   Epoch: 8   Global Step: 85420   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:06:03,122-Speed 5486.26 samples/sec   Loss 6.5891   LearningRate 0.1149   Epoch: 8   Global Step: 85430   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:06:10,639-Speed 5449.89 samples/sec   Loss 6.6104   LearningRate 0.1149   Epoch: 8   Global Step: 85440   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:06:18,200-Speed 5417.61 samples/sec   Loss 6.5990   LearningRate 0.1149   Epoch: 8   Global Step: 85450   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:06:25,654-Speed 5496.27 samples/sec   Loss 6.6013   LearningRate 0.1149   Epoch: 8   Global Step: 85460   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:06:33,129-Speed 5480.35 samples/sec   Loss 6.6100   LearningRate 0.1149   Epoch: 8   Global Step: 85470   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:06:40,690-Speed 5417.63 samples/sec   Loss 6.6041   LearningRate 0.1149   Epoch: 8   Global Step: 85480   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:06:48,162-Speed 5482.99 samples/sec   Loss 6.5650   LearningRate 0.1148   Epoch: 8   Global Step: 85490   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:06:55,635-Speed 5481.49 samples/sec   Loss 6.5526   LearningRate 0.1148   Epoch: 8   Global Step: 85500   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:07:03,260-Speed 5372.80 samples/sec   Loss 6.6491   LearningRate 0.1148   Epoch: 8   Global Step: 85510   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:07:10,837-Speed 5406.43 samples/sec   Loss 6.5901   LearningRate 0.1148   Epoch: 8   Global Step: 85520   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:07:18,453-Speed 5379.14 samples/sec   Loss 6.5953   LearningRate 0.1148   Epoch: 8   Global Step: 85530   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:07:26,057-Speed 5387.30 samples/sec   Loss 6.6504   LearningRate 0.1147   Epoch: 8   Global Step: 85540   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:07:33,517-Speed 5491.35 samples/sec   Loss 6.6399   LearningRate 0.1147   Epoch: 8   Global Step: 85550   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:07:41,046-Speed 5440.83 samples/sec   Loss 6.6101   LearningRate 0.1147   Epoch: 8   Global Step: 85560   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:07:48,794-Speed 5287.37 samples/sec   Loss 6.6206   LearningRate 0.1147   Epoch: 8   Global Step: 85570   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:07:56,305-Speed 5453.68 samples/sec   Loss 6.6191   LearningRate 0.1147   Epoch: 8   Global Step: 85580   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:08:03,819-Speed 5452.27 samples/sec   Loss 6.5498   LearningRate 0.1146   Epoch: 8   Global Step: 85590   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:08:11,269-Speed 5498.23 samples/sec   Loss 6.6459   LearningRate 0.1146   Epoch: 8   Global Step: 85600   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:08:18,813-Speed 5430.30 samples/sec   Loss 6.6465   LearningRate 0.1146   Epoch: 8   Global Step: 85610   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:08:26,326-Speed 5452.65 samples/sec   Loss 6.6723   LearningRate 0.1146   Epoch: 8   Global Step: 85620   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:08:33,931-Speed 5387.09 samples/sec   Loss 6.5721   LearningRate 0.1146   Epoch: 8   Global Step: 85630   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:08:41,654-Speed 5303.99 samples/sec   Loss 6.5831   LearningRate 0.1146   Epoch: 8   Global Step: 85640   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:08:49,200-Speed 5428.67 samples/sec   Loss 6.6914   LearningRate 0.1145   Epoch: 8   Global Step: 85650   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:08:56,712-Speed 5453.55 samples/sec   Loss 6.6657   LearningRate 0.1145   Epoch: 8   Global Step: 85660   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:09:04,293-Speed 5403.69 samples/sec   Loss 6.6270   LearningRate 0.1145   Epoch: 8   Global Step: 85670   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:09:11,959-Speed 5343.50 samples/sec   Loss 6.6019   LearningRate 0.1145   Epoch: 8   Global Step: 85680   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:09:19,426-Speed 5486.41 samples/sec   Loss 6.6154   LearningRate 0.1145   Epoch: 8   Global Step: 85690   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:09:26,952-Speed 5442.94 samples/sec   Loss 6.5736   LearningRate 0.1144   Epoch: 8   Global Step: 85700   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:09:34,357-Speed 5532.66 samples/sec   Loss 6.5747   LearningRate 0.1144   Epoch: 8   Global Step: 85710   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:09:41,873-Speed 5449.84 samples/sec   Loss 6.5323   LearningRate 0.1144   Epoch: 8   Global Step: 85720   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:09:49,356-Speed 5474.82 samples/sec   Loss 6.5779   LearningRate 0.1144   Epoch: 8   Global Step: 85730   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:09:56,929-Speed 5408.99 samples/sec   Loss 6.6295   LearningRate 0.1144   Epoch: 8   Global Step: 85740   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:10:04,445-Speed 5450.61 samples/sec   Loss 6.6227   LearningRate 0.1143   Epoch: 8   Global Step: 85750   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:10:11,980-Speed 5437.04 samples/sec   Loss 6.6486   LearningRate 0.1143   Epoch: 8   Global Step: 85760   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:10:19,599-Speed 5376.69 samples/sec   Loss 6.6223   LearningRate 0.1143   Epoch: 8   Global Step: 85770   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:10:27,096-Speed 5463.86 samples/sec   Loss 6.6199   LearningRate 0.1143   Epoch: 8   Global Step: 85780   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:10:34,731-Speed 5365.68 samples/sec   Loss 6.6067   LearningRate 0.1143   Epoch: 8   Global Step: 85790   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:10:42,214-Speed 5474.74 samples/sec   Loss 6.6109   LearningRate 0.1143   Epoch: 8   Global Step: 85800   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:10:49,747-Speed 5438.15 samples/sec   Loss 6.5998   LearningRate 0.1142   Epoch: 8   Global Step: 85810   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:10:57,275-Speed 5442.04 samples/sec   Loss 6.6522   LearningRate 0.1142   Epoch: 8   Global Step: 85820   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:11:04,865-Speed 5396.87 samples/sec   Loss 6.6028   LearningRate 0.1142   Epoch: 8   Global Step: 85830   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:11:12,416-Speed 5425.28 samples/sec   Loss 6.5670   LearningRate 0.1142   Epoch: 8   Global Step: 85840   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:11:19,922-Speed 5457.64 samples/sec   Loss 6.6922   LearningRate 0.1142   Epoch: 8   Global Step: 85850   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:11:27,416-Speed 5466.59 samples/sec   Loss 6.6007   LearningRate 0.1141   Epoch: 8   Global Step: 85860   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:11:34,887-Speed 5483.07 samples/sec   Loss 6.5650   LearningRate 0.1141   Epoch: 8   Global Step: 85870   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:11:42,413-Speed 5443.73 samples/sec   Loss 6.6010   LearningRate 0.1141   Epoch: 8   Global Step: 85880   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:11:49,913-Speed 5461.53 samples/sec   Loss 6.5969   LearningRate 0.1141   Epoch: 8   Global Step: 85890   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:11:57,529-Speed 5379.08 samples/sec   Loss 6.5985   LearningRate 0.1141   Epoch: 8   Global Step: 85900   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:12:05,073-Speed 5430.28 samples/sec   Loss 6.6449   LearningRate 0.1140   Epoch: 8   Global Step: 85910   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:12:12,638-Speed 5414.93 samples/sec   Loss 6.6295   LearningRate 0.1140   Epoch: 8   Global Step: 85920   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:12:20,171-Speed 5438.18 samples/sec   Loss 6.6713   LearningRate 0.1140   Epoch: 8   Global Step: 85930   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:12:27,758-Speed 5399.72 samples/sec   Loss 6.6258   LearningRate 0.1140   Epoch: 8   Global Step: 85940   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:12:35,518-Speed 5279.09 samples/sec   Loss 6.5884   LearningRate 0.1140   Epoch: 8   Global Step: 85950   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:12:43,032-Speed 5451.64 samples/sec   Loss 6.6518   LearningRate 0.1140   Epoch: 8   Global Step: 85960   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:12:50,613-Speed 5403.68 samples/sec   Loss 6.6567   LearningRate 0.1139   Epoch: 8   Global Step: 85970   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:12:58,185-Speed 5410.60 samples/sec   Loss 6.6062   LearningRate 0.1139   Epoch: 8   Global Step: 85980   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:13:05,727-Speed 5431.32 samples/sec   Loss 6.6085   LearningRate 0.1139   Epoch: 8   Global Step: 85990   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:13:13,222-Speed 5466.01 samples/sec   Loss 6.5295   LearningRate 0.1139   Epoch: 8   Global Step: 86000   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:13:57,266-[lfw][86000]XNorm: 22.979678
Training: 2022-01-08 14:13:57,267-[lfw][86000]Accuracy-Flip: 0.99750+-0.00261
Training: 2022-01-08 14:13:57,268-[lfw][86000]Accuracy-Highest: 0.99817
Training: 2022-01-08 14:14:48,332-[cfp_fp][86000]XNorm: 20.765421
Training: 2022-01-08 14:14:48,333-[cfp_fp][86000]Accuracy-Flip: 0.98671+-0.00811
Training: 2022-01-08 14:14:48,333-[cfp_fp][86000]Accuracy-Highest: 0.98814
Training: 2022-01-08 14:15:33,898-[agedb_30][86000]XNorm: 22.846304
Training: 2022-01-08 14:15:33,899-[agedb_30][86000]Accuracy-Flip: 0.97600+-0.00700
Training: 2022-01-08 14:15:33,899-[agedb_30][86000]Accuracy-Highest: 0.97667
Training: 2022-01-08 14:15:41,169-Speed 276.86 samples/sec   Loss 6.5563   LearningRate 0.1139   Epoch: 8   Global Step: 86010   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:15:48,760-Speed 5397.11 samples/sec   Loss 6.6396   LearningRate 0.1138   Epoch: 8   Global Step: 86020   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:15:56,323-Speed 5416.77 samples/sec   Loss 6.6102   LearningRate 0.1138   Epoch: 8   Global Step: 86030   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:16:03,990-Speed 5343.52 samples/sec   Loss 6.6289   LearningRate 0.1138   Epoch: 8   Global Step: 86040   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:16:11,504-Speed 5452.26 samples/sec   Loss 6.5637   LearningRate 0.1138   Epoch: 8   Global Step: 86050   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:16:19,090-Speed 5400.18 samples/sec   Loss 6.5569   LearningRate 0.1138   Epoch: 8   Global Step: 86060   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:16:26,604-Speed 5452.33 samples/sec   Loss 6.6015   LearningRate 0.1137   Epoch: 8   Global Step: 86070   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:16:34,083-Speed 5477.11 samples/sec   Loss 6.5902   LearningRate 0.1137   Epoch: 8   Global Step: 86080   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:16:41,611-Speed 5440.80 samples/sec   Loss 6.5898   LearningRate 0.1137   Epoch: 8   Global Step: 86090   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:16:49,109-Speed 5463.97 samples/sec   Loss 6.5686   LearningRate 0.1137   Epoch: 8   Global Step: 86100   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:16:56,659-Speed 5426.01 samples/sec   Loss 6.6322   LearningRate 0.1137   Epoch: 8   Global Step: 86110   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:17:04,211-Speed 5424.04 samples/sec   Loss 6.5806   LearningRate 0.1137   Epoch: 8   Global Step: 86120   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:17:11,721-Speed 5455.11 samples/sec   Loss 6.5742   LearningRate 0.1136   Epoch: 8   Global Step: 86130   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:17:19,209-Speed 5470.80 samples/sec   Loss 6.5818   LearningRate 0.1136   Epoch: 8   Global Step: 86140   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:17:26,683-Speed 5481.31 samples/sec   Loss 6.6351   LearningRate 0.1136   Epoch: 8   Global Step: 86150   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:17:34,225-Speed 5430.95 samples/sec   Loss 6.6104   LearningRate 0.1136   Epoch: 8   Global Step: 86160   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:17:41,750-Speed 5443.66 samples/sec   Loss 6.6033   LearningRate 0.1136   Epoch: 8   Global Step: 86170   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:17:49,346-Speed 5393.56 samples/sec   Loss 6.5982   LearningRate 0.1135   Epoch: 8   Global Step: 86180   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:17:56,912-Speed 5414.45 samples/sec   Loss 6.6266   LearningRate 0.1135   Epoch: 8   Global Step: 86190   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:18:04,486-Speed 5408.20 samples/sec   Loss 6.5845   LearningRate 0.1135   Epoch: 8   Global Step: 86200   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:18:12,146-Speed 5347.55 samples/sec   Loss 6.5876   LearningRate 0.1135   Epoch: 8   Global Step: 86210   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:18:19,896-Speed 5286.41 samples/sec   Loss 6.5313   LearningRate 0.1135   Epoch: 8   Global Step: 86220   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:18:27,502-Speed 5386.22 samples/sec   Loss 6.5755   LearningRate 0.1134   Epoch: 8   Global Step: 86230   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:18:35,169-Speed 5342.66 samples/sec   Loss 6.6144   LearningRate 0.1134   Epoch: 8   Global Step: 86240   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:18:42,809-Speed 5361.70 samples/sec   Loss 6.5775   LearningRate 0.1134   Epoch: 8   Global Step: 86250   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:18:50,274-Speed 5488.10 samples/sec   Loss 6.5691   LearningRate 0.1134   Epoch: 8   Global Step: 86260   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:18:57,761-Speed 5471.48 samples/sec   Loss 6.5938   LearningRate 0.1134   Epoch: 8   Global Step: 86270   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:19:05,347-Speed 5400.12 samples/sec   Loss 6.5649   LearningRate 0.1134   Epoch: 8   Global Step: 86280   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:19:12,887-Speed 5432.53 samples/sec   Loss 6.5641   LearningRate 0.1133   Epoch: 8   Global Step: 86290   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:19:20,437-Speed 5426.64 samples/sec   Loss 6.5344   LearningRate 0.1133   Epoch: 8   Global Step: 86300   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:19:27,905-Speed 5485.33 samples/sec   Loss 6.5587   LearningRate 0.1133   Epoch: 8   Global Step: 86310   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:19:35,348-Speed 5503.89 samples/sec   Loss 6.6037   LearningRate 0.1133   Epoch: 8   Global Step: 86320   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:19:42,851-Speed 5459.75 samples/sec   Loss 6.5260   LearningRate 0.1133   Epoch: 8   Global Step: 86330   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:19:50,406-Speed 5422.39 samples/sec   Loss 6.5502   LearningRate 0.1132   Epoch: 8   Global Step: 86340   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:19:58,026-Speed 5375.97 samples/sec   Loss 6.5923   LearningRate 0.1132   Epoch: 8   Global Step: 86350   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:20:05,717-Speed 5327.08 samples/sec   Loss 6.5624   LearningRate 0.1132   Epoch: 8   Global Step: 86360   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:20:13,324-Speed 5384.47 samples/sec   Loss 6.5669   LearningRate 0.1132   Epoch: 8   Global Step: 86370   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:20:20,823-Speed 5463.20 samples/sec   Loss 6.5674   LearningRate 0.1132   Epoch: 8   Global Step: 86380   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:20:28,331-Speed 5456.76 samples/sec   Loss 6.5360   LearningRate 0.1131   Epoch: 8   Global Step: 86390   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:20:35,896-Speed 5415.18 samples/sec   Loss 6.5466   LearningRate 0.1131   Epoch: 8   Global Step: 86400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:20:43,465-Speed 5411.98 samples/sec   Loss 6.5258   LearningRate 0.1131   Epoch: 8   Global Step: 86410   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:20:51,132-Speed 5343.12 samples/sec   Loss 6.5610   LearningRate 0.1131   Epoch: 8   Global Step: 86420   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:20:58,777-Speed 5357.94 samples/sec   Loss 6.5281   LearningRate 0.1131   Epoch: 8   Global Step: 86430   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:21:06,429-Speed 5353.91 samples/sec   Loss 6.5732   LearningRate 0.1131   Epoch: 8   Global Step: 86440   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:21:14,037-Speed 5385.00 samples/sec   Loss 6.5941   LearningRate 0.1130   Epoch: 8   Global Step: 86450   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:21:21,717-Speed 5333.49 samples/sec   Loss 6.6196   LearningRate 0.1130   Epoch: 8   Global Step: 86460   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:21:29,213-Speed 5465.20 samples/sec   Loss 6.5997   LearningRate 0.1130   Epoch: 8   Global Step: 86470   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:21:36,725-Speed 5453.35 samples/sec   Loss 6.5409   LearningRate 0.1130   Epoch: 8   Global Step: 86480   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:21:44,239-Speed 5451.44 samples/sec   Loss 6.6006   LearningRate 0.1130   Epoch: 8   Global Step: 86490   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:21:51,696-Speed 5494.04 samples/sec   Loss 6.5397   LearningRate 0.1129   Epoch: 8   Global Step: 86500   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:21:59,142-Speed 5501.30 samples/sec   Loss 6.5218   LearningRate 0.1129   Epoch: 8   Global Step: 86510   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:22:06,691-Speed 5426.77 samples/sec   Loss 6.5934   LearningRate 0.1129   Epoch: 8   Global Step: 86520   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:22:14,337-Speed 5357.69 samples/sec   Loss 6.5721   LearningRate 0.1129   Epoch: 8   Global Step: 86530   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:22:22,108-Speed 5271.34 samples/sec   Loss 6.5413   LearningRate 0.1129   Epoch: 8   Global Step: 86540   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:22:29,599-Speed 5468.31 samples/sec   Loss 6.5416   LearningRate 0.1128   Epoch: 8   Global Step: 86550   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:22:37,194-Speed 5394.07 samples/sec   Loss 6.5346   LearningRate 0.1128   Epoch: 8   Global Step: 86560   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:22:44,634-Speed 5506.11 samples/sec   Loss 6.5697   LearningRate 0.1128   Epoch: 8   Global Step: 86570   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:22:52,105-Speed 5482.85 samples/sec   Loss 6.5613   LearningRate 0.1128   Epoch: 8   Global Step: 86580   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:22:59,536-Speed 5512.96 samples/sec   Loss 6.5719   LearningRate 0.1128   Epoch: 8   Global Step: 86590   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:23:07,100-Speed 5415.25 samples/sec   Loss 6.5800   LearningRate 0.1128   Epoch: 8   Global Step: 86600   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:23:14,603-Speed 5460.57 samples/sec   Loss 6.6131   LearningRate 0.1127   Epoch: 8   Global Step: 86610   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:23:22,184-Speed 5403.52 samples/sec   Loss 6.5258   LearningRate 0.1127   Epoch: 8   Global Step: 86620   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:23:29,852-Speed 5341.80 samples/sec   Loss 6.5403   LearningRate 0.1127   Epoch: 8   Global Step: 86630   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:23:37,415-Speed 5416.68 samples/sec   Loss 6.5195   LearningRate 0.1127   Epoch: 8   Global Step: 86640   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:23:44,828-Speed 5526.60 samples/sec   Loss 6.5532   LearningRate 0.1127   Epoch: 8   Global Step: 86650   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:23:52,253-Speed 5516.99 samples/sec   Loss 6.5063   LearningRate 0.1126   Epoch: 8   Global Step: 86660   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:23:59,800-Speed 5427.91 samples/sec   Loss 6.4804   LearningRate 0.1126   Epoch: 8   Global Step: 86670   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:24:07,280-Speed 5476.46 samples/sec   Loss 6.5180   LearningRate 0.1126   Epoch: 8   Global Step: 86680   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:24:14,754-Speed 5481.61 samples/sec   Loss 6.5770   LearningRate 0.1126   Epoch: 8   Global Step: 86690   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:24:22,301-Speed 5428.31 samples/sec   Loss 6.5446   LearningRate 0.1126   Epoch: 8   Global Step: 86700   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:24:29,799-Speed 5463.02 samples/sec   Loss 6.5686   LearningRate 0.1125   Epoch: 8   Global Step: 86710   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:24:37,400-Speed 5389.41 samples/sec   Loss 6.5436   LearningRate 0.1125   Epoch: 8   Global Step: 86720   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:24:44,931-Speed 5439.19 samples/sec   Loss 6.5260   LearningRate 0.1125   Epoch: 8   Global Step: 86730   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:24:52,401-Speed 5484.85 samples/sec   Loss 6.5867   LearningRate 0.1125   Epoch: 8   Global Step: 86740   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:24:59,884-Speed 5474.24 samples/sec   Loss 6.5263   LearningRate 0.1125   Epoch: 8   Global Step: 86750   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:25:07,420-Speed 5436.13 samples/sec   Loss 6.6333   LearningRate 0.1125   Epoch: 8   Global Step: 86760   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:25:14,962-Speed 5431.54 samples/sec   Loss 6.5863   LearningRate 0.1124   Epoch: 8   Global Step: 86770   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:25:22,526-Speed 5415.73 samples/sec   Loss 6.5230   LearningRate 0.1124   Epoch: 8   Global Step: 86780   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:25:30,059-Speed 5437.89 samples/sec   Loss 6.5364   LearningRate 0.1124   Epoch: 8   Global Step: 86790   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:25:37,660-Speed 5389.39 samples/sec   Loss 6.5455   LearningRate 0.1124   Epoch: 8   Global Step: 86800   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:25:45,227-Speed 5414.27 samples/sec   Loss 6.5188   LearningRate 0.1124   Epoch: 8   Global Step: 86810   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:25:52,869-Speed 5360.01 samples/sec   Loss 6.5139   LearningRate 0.1123   Epoch: 8   Global Step: 86820   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:26:00,420-Speed 5425.60 samples/sec   Loss 6.5544   LearningRate 0.1123   Epoch: 8   Global Step: 86830   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 14:26:07,982-Speed 5417.16 samples/sec   Loss 6.5744   LearningRate 0.1123   Epoch: 8   Global Step: 86840   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:26:15,553-Speed 5411.09 samples/sec   Loss 6.5436   LearningRate 0.1123   Epoch: 8   Global Step: 86850   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:26:23,103-Speed 5425.60 samples/sec   Loss 6.5187   LearningRate 0.1123   Epoch: 8   Global Step: 86860   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:26:30,803-Speed 5320.29 samples/sec   Loss 6.5088   LearningRate 0.1123   Epoch: 8   Global Step: 86870   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:26:38,372-Speed 5412.74 samples/sec   Loss 6.5493   LearningRate 0.1122   Epoch: 8   Global Step: 86880   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:26:45,869-Speed 5464.45 samples/sec   Loss 6.5725   LearningRate 0.1122   Epoch: 8   Global Step: 86890   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:26:53,347-Speed 5478.02 samples/sec   Loss 6.5601   LearningRate 0.1122   Epoch: 8   Global Step: 86900   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:27:00,977-Speed 5368.46 samples/sec   Loss 6.4556   LearningRate 0.1122   Epoch: 8   Global Step: 86910   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:27:08,511-Speed 5437.79 samples/sec   Loss 6.4469   LearningRate 0.1122   Epoch: 8   Global Step: 86920   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:27:15,981-Speed 5483.66 samples/sec   Loss 6.5879   LearningRate 0.1121   Epoch: 8   Global Step: 86930   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:27:23,468-Speed 5471.80 samples/sec   Loss 6.5859   LearningRate 0.1121   Epoch: 8   Global Step: 86940   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:27:31,028-Speed 5418.99 samples/sec   Loss 6.5576   LearningRate 0.1121   Epoch: 8   Global Step: 86950   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:27:38,541-Speed 5452.84 samples/sec   Loss 6.5761   LearningRate 0.1121   Epoch: 8   Global Step: 86960   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:27:46,050-Speed 5455.52 samples/sec   Loss 6.5944   LearningRate 0.1121   Epoch: 8   Global Step: 86970   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:27:53,670-Speed 5375.44 samples/sec   Loss 6.5158   LearningRate 0.1120   Epoch: 8   Global Step: 86980   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:28:01,228-Speed 5420.18 samples/sec   Loss 6.4729   LearningRate 0.1120   Epoch: 8   Global Step: 86990   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:28:08,700-Speed 5483.08 samples/sec   Loss 6.4808   LearningRate 0.1120   Epoch: 8   Global Step: 87000   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:28:16,094-Speed 5540.33 samples/sec   Loss 6.5204   LearningRate 0.1120   Epoch: 8   Global Step: 87010   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:28:23,533-Speed 5506.61 samples/sec   Loss 6.5357   LearningRate 0.1120   Epoch: 8   Global Step: 87020   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:28:31,056-Speed 5445.34 samples/sec   Loss 6.5431   LearningRate 0.1120   Epoch: 8   Global Step: 87030   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:28:38,612-Speed 5422.04 samples/sec   Loss 6.6078   LearningRate 0.1119   Epoch: 8   Global Step: 87040   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:28:46,089-Speed 5479.19 samples/sec   Loss 6.4954   LearningRate 0.1119   Epoch: 8   Global Step: 87050   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:28:53,614-Speed 5443.43 samples/sec   Loss 6.5551   LearningRate 0.1119   Epoch: 8   Global Step: 87060   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:29:01,238-Speed 5373.54 samples/sec   Loss 6.5791   LearningRate 0.1119   Epoch: 8   Global Step: 87070   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:29:08,804-Speed 5414.11 samples/sec   Loss 6.6358   LearningRate 0.1119   Epoch: 8   Global Step: 87080   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:29:16,292-Speed 5471.07 samples/sec   Loss 6.5703   LearningRate 0.1118   Epoch: 8   Global Step: 87090   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:29:23,834-Speed 5431.84 samples/sec   Loss 6.5423   LearningRate 0.1118   Epoch: 8   Global Step: 87100   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:29:31,344-Speed 5454.80 samples/sec   Loss 6.6079   LearningRate 0.1118   Epoch: 8   Global Step: 87110   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:29:38,794-Speed 5498.79 samples/sec   Loss 6.4643   LearningRate 0.1118   Epoch: 8   Global Step: 87120   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:29:46,179-Speed 5547.04 samples/sec   Loss 6.5327   LearningRate 0.1118   Epoch: 8   Global Step: 87130   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:29:53,611-Speed 5511.84 samples/sec   Loss 6.5204   LearningRate 0.1117   Epoch: 8   Global Step: 87140   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:30:01,097-Speed 5472.29 samples/sec   Loss 6.4250   LearningRate 0.1117   Epoch: 8   Global Step: 87150   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:30:08,591-Speed 5466.48 samples/sec   Loss 6.4492   LearningRate 0.1117   Epoch: 8   Global Step: 87160   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:30:16,091-Speed 5461.71 samples/sec   Loss 6.4994   LearningRate 0.1117   Epoch: 8   Global Step: 87170   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:30:23,613-Speed 5446.37 samples/sec   Loss 6.4958   LearningRate 0.1117   Epoch: 8   Global Step: 87180   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:30:31,223-Speed 5383.45 samples/sec   Loss 6.5020   LearningRate 0.1117   Epoch: 8   Global Step: 87190   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:30:38,823-Speed 5389.86 samples/sec   Loss 6.5312   LearningRate 0.1116   Epoch: 8   Global Step: 87200   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:30:46,494-Speed 5340.44 samples/sec   Loss 6.5205   LearningRate 0.1116   Epoch: 8   Global Step: 87210   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:30:54,009-Speed 5451.47 samples/sec   Loss 6.5057   LearningRate 0.1116   Epoch: 8   Global Step: 87220   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:31:01,720-Speed 5312.68 samples/sec   Loss 6.5945   LearningRate 0.1116   Epoch: 8   Global Step: 87230   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:31:09,246-Speed 5442.82 samples/sec   Loss 6.4951   LearningRate 0.1116   Epoch: 8   Global Step: 87240   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:31:16,859-Speed 5381.34 samples/sec   Loss 6.5164   LearningRate 0.1115   Epoch: 8   Global Step: 87250   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:31:24,323-Speed 5488.37 samples/sec   Loss 6.5187   LearningRate 0.1115   Epoch: 8   Global Step: 87260   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:31:31,778-Speed 5495.10 samples/sec   Loss 6.4611   LearningRate 0.1115   Epoch: 8   Global Step: 87270   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:31:39,384-Speed 5385.43 samples/sec   Loss 6.6098   LearningRate 0.1115   Epoch: 8   Global Step: 87280   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:31:46,903-Speed 5448.06 samples/sec   Loss 6.5308   LearningRate 0.1115   Epoch: 8   Global Step: 87290   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:31:54,389-Speed 5472.89 samples/sec   Loss 6.5394   LearningRate 0.1115   Epoch: 8   Global Step: 87300   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:32:01,931-Speed 5431.46 samples/sec   Loss 6.4782   LearningRate 0.1114   Epoch: 8   Global Step: 87310   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:32:09,690-Speed 5279.75 samples/sec   Loss 6.5320   LearningRate 0.1114   Epoch: 8   Global Step: 87320   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:32:17,196-Speed 5457.28 samples/sec   Loss 6.5071   LearningRate 0.1114   Epoch: 8   Global Step: 87330   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:32:24,699-Speed 5460.43 samples/sec   Loss 6.5851   LearningRate 0.1114   Epoch: 8   Global Step: 87340   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:32:32,216-Speed 5449.60 samples/sec   Loss 6.5002   LearningRate 0.1114   Epoch: 8   Global Step: 87350   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:32:39,656-Speed 5506.03 samples/sec   Loss 6.5565   LearningRate 0.1113   Epoch: 8   Global Step: 87360   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:32:47,128-Speed 5482.42 samples/sec   Loss 6.4385   LearningRate 0.1113   Epoch: 8   Global Step: 87370   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:32:54,667-Speed 5434.26 samples/sec   Loss 6.5946   LearningRate 0.1113   Epoch: 8   Global Step: 87380   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 14:33:02,167-Speed 5461.97 samples/sec   Loss 6.5329   LearningRate 0.1113   Epoch: 8   Global Step: 87390   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:33:09,661-Speed 5466.56 samples/sec   Loss 6.5448   LearningRate 0.1113   Epoch: 8   Global Step: 87400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 14:33:17,159-Speed 5463.71 samples/sec   Loss 6.4769   LearningRate 0.1112   Epoch: 8   Global Step: 87410   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:33:24,560-Speed 5534.54 samples/sec   Loss 6.5296   LearningRate 0.1112   Epoch: 8   Global Step: 87420   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:33:32,118-Speed 5420.46 samples/sec   Loss 6.5081   LearningRate 0.1112   Epoch: 8   Global Step: 87430   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:33:39,602-Speed 5474.41 samples/sec   Loss 6.5596   LearningRate 0.1112   Epoch: 8   Global Step: 87440   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:33:47,071-Speed 5484.44 samples/sec   Loss 6.5460   LearningRate 0.1112   Epoch: 8   Global Step: 87450   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:33:54,528-Speed 5493.27 samples/sec   Loss 6.5015   LearningRate 0.1112   Epoch: 8   Global Step: 87460   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:34:01,976-Speed 5499.93 samples/sec   Loss 6.4388   LearningRate 0.1111   Epoch: 8   Global Step: 87470   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:34:09,447-Speed 5483.79 samples/sec   Loss 6.4546   LearningRate 0.1111   Epoch: 8   Global Step: 87480   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:34:16,973-Speed 5442.90 samples/sec   Loss 6.4014   LearningRate 0.1111   Epoch: 8   Global Step: 87490   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:34:24,406-Speed 5511.64 samples/sec   Loss 6.5205   LearningRate 0.1111   Epoch: 8   Global Step: 87500   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:34:31,910-Speed 5458.76 samples/sec   Loss 6.4658   LearningRate 0.1111   Epoch: 8   Global Step: 87510   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:34:39,381-Speed 5484.16 samples/sec   Loss 6.4696   LearningRate 0.1110   Epoch: 8   Global Step: 87520   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:34:46,874-Speed 5466.66 samples/sec   Loss 6.5542   LearningRate 0.1110   Epoch: 8   Global Step: 87530   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:34:54,292-Speed 5522.79 samples/sec   Loss 6.5198   LearningRate 0.1110   Epoch: 8   Global Step: 87540   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:35:01,724-Speed 5511.84 samples/sec   Loss 6.5048   LearningRate 0.1110   Epoch: 8   Global Step: 87550   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:35:09,234-Speed 5455.39 samples/sec   Loss 6.5135   LearningRate 0.1110   Epoch: 8   Global Step: 87560   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:35:16,810-Speed 5407.30 samples/sec   Loss 6.4934   LearningRate 0.1109   Epoch: 8   Global Step: 87570   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:35:24,254-Speed 5503.14 samples/sec   Loss 6.4770   LearningRate 0.1109   Epoch: 8   Global Step: 87580   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:35:31,762-Speed 5455.91 samples/sec   Loss 6.4936   LearningRate 0.1109   Epoch: 8   Global Step: 87590   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:35:39,251-Speed 5470.23 samples/sec   Loss 6.5100   LearningRate 0.1109   Epoch: 8   Global Step: 87600   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:35:46,822-Speed 5411.03 samples/sec   Loss 6.5049   LearningRate 0.1109   Epoch: 8   Global Step: 87610   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:35:54,312-Speed 5469.46 samples/sec   Loss 6.5568   LearningRate 0.1109   Epoch: 8   Global Step: 87620   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:36:01,764-Speed 5497.06 samples/sec   Loss 6.4914   LearningRate 0.1108   Epoch: 8   Global Step: 87630   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:36:09,399-Speed 5365.92 samples/sec   Loss 6.5042   LearningRate 0.1108   Epoch: 8   Global Step: 87640   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:36:17,160-Speed 5278.08 samples/sec   Loss 6.5146   LearningRate 0.1108   Epoch: 8   Global Step: 87650   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:36:24,774-Speed 5380.79 samples/sec   Loss 6.4296   LearningRate 0.1108   Epoch: 8   Global Step: 87660   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:36:32,311-Speed 5434.65 samples/sec   Loss 6.4777   LearningRate 0.1108   Epoch: 8   Global Step: 87670   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:36:39,916-Speed 5387.53 samples/sec   Loss 6.5373   LearningRate 0.1107   Epoch: 8   Global Step: 87680   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:36:47,399-Speed 5474.22 samples/sec   Loss 6.4894   LearningRate 0.1107   Epoch: 8   Global Step: 87690   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:36:54,950-Speed 5424.86 samples/sec   Loss 6.4816   LearningRate 0.1107   Epoch: 8   Global Step: 87700   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:37:02,499-Speed 5427.11 samples/sec   Loss 6.5544   LearningRate 0.1107   Epoch: 8   Global Step: 87710   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:37:10,140-Speed 5361.04 samples/sec   Loss 6.5143   LearningRate 0.1107   Epoch: 8   Global Step: 87720   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:37:17,858-Speed 5308.00 samples/sec   Loss 6.5364   LearningRate 0.1107   Epoch: 8   Global Step: 87730   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:37:25,416-Speed 5419.87 samples/sec   Loss 6.5567   LearningRate 0.1106   Epoch: 8   Global Step: 87740   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:37:33,002-Speed 5400.28 samples/sec   Loss 6.4690   LearningRate 0.1106   Epoch: 8   Global Step: 87750   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:37:40,524-Speed 5446.62 samples/sec   Loss 6.4855   LearningRate 0.1106   Epoch: 8   Global Step: 87760   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:37:48,012-Speed 5470.70 samples/sec   Loss 6.4897   LearningRate 0.1106   Epoch: 8   Global Step: 87770   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:37:55,539-Speed 5442.65 samples/sec   Loss 6.4411   LearningRate 0.1106   Epoch: 8   Global Step: 87780   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:38:03,190-Speed 5354.05 samples/sec   Loss 6.4324   LearningRate 0.1105   Epoch: 8   Global Step: 87790   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:38:10,841-Speed 5354.13 samples/sec   Loss 6.5131   LearningRate 0.1105   Epoch: 8   Global Step: 87800   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:38:18,524-Speed 5332.31 samples/sec   Loss 6.5238   LearningRate 0.1105   Epoch: 8   Global Step: 87810   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:38:25,999-Speed 5479.91 samples/sec   Loss 6.5248   LearningRate 0.1105   Epoch: 8   Global Step: 87820   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:38:33,435-Speed 5509.22 samples/sec   Loss 6.4973   LearningRate 0.1105   Epoch: 8   Global Step: 87830   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:38:40,897-Speed 5490.32 samples/sec   Loss 6.5034   LearningRate 0.1105   Epoch: 8   Global Step: 87840   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:38:48,574-Speed 5335.66 samples/sec   Loss 6.4885   LearningRate 0.1104   Epoch: 8   Global Step: 87850   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:38:56,268-Speed 5324.65 samples/sec   Loss 6.5831   LearningRate 0.1104   Epoch: 8   Global Step: 87860   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:39:03,948-Speed 5334.26 samples/sec   Loss 6.5015   LearningRate 0.1104   Epoch: 8   Global Step: 87870   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:39:11,682-Speed 5296.30 samples/sec   Loss 6.4384   LearningRate 0.1104   Epoch: 8   Global Step: 87880   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:39:19,361-Speed 5334.91 samples/sec   Loss 6.5189   LearningRate 0.1104   Epoch: 8   Global Step: 87890   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:39:27,078-Speed 5308.63 samples/sec   Loss 6.5050   LearningRate 0.1103   Epoch: 8   Global Step: 87900   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:39:34,616-Speed 5434.20 samples/sec   Loss 6.4674   LearningRate 0.1103   Epoch: 8   Global Step: 87910   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:39:42,180-Speed 5415.98 samples/sec   Loss 6.5106   LearningRate 0.1103   Epoch: 8   Global Step: 87920   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:39:49,761-Speed 5403.94 samples/sec   Loss 6.4786   LearningRate 0.1103   Epoch: 8   Global Step: 87930   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:39:57,319-Speed 5419.74 samples/sec   Loss 6.5092   LearningRate 0.1103   Epoch: 8   Global Step: 87940   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:40:04,786-Speed 5486.17 samples/sec   Loss 6.5093   LearningRate 0.1102   Epoch: 8   Global Step: 87950   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:40:12,229-Speed 5504.15 samples/sec   Loss 6.4786   LearningRate 0.1102   Epoch: 8   Global Step: 87960   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:40:19,859-Speed 5369.13 samples/sec   Loss 6.5096   LearningRate 0.1102   Epoch: 8   Global Step: 87970   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:40:27,308-Speed 5499.31 samples/sec   Loss 6.4696   LearningRate 0.1102   Epoch: 8   Global Step: 87980   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:40:34,819-Speed 5453.94 samples/sec   Loss 6.5140   LearningRate 0.1102   Epoch: 8   Global Step: 87990   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:40:42,313-Speed 5465.99 samples/sec   Loss 6.4865   LearningRate 0.1102   Epoch: 8   Global Step: 88000   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:41:26,191-[lfw][88000]XNorm: 22.549660
Training: 2022-01-08 14:41:26,192-[lfw][88000]Accuracy-Flip: 0.99800+-0.00287
Training: 2022-01-08 14:41:26,192-[lfw][88000]Accuracy-Highest: 0.99817
Training: 2022-01-08 14:42:17,888-[cfp_fp][88000]XNorm: 20.955835
Training: 2022-01-08 14:42:17,890-[cfp_fp][88000]Accuracy-Flip: 0.98814+-0.00649
Training: 2022-01-08 14:42:17,890-[cfp_fp][88000]Accuracy-Highest: 0.98814
Training: 2022-01-08 14:43:04,039-[agedb_30][88000]XNorm: 22.499002
Training: 2022-01-08 14:43:04,040-[agedb_30][88000]Accuracy-Flip: 0.97617+-0.00757
Training: 2022-01-08 14:43:04,040-[agedb_30][88000]Accuracy-Highest: 0.97667
Training: 2022-01-08 14:43:11,656-Speed 274.27 samples/sec   Loss 6.4254   LearningRate 0.1101   Epoch: 8   Global Step: 88010   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:43:19,349-Speed 5325.90 samples/sec   Loss 6.4497   LearningRate 0.1101   Epoch: 8   Global Step: 88020   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:43:26,926-Speed 5407.72 samples/sec   Loss 6.4636   LearningRate 0.1101   Epoch: 8   Global Step: 88030   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:43:34,376-Speed 5502.39 samples/sec   Loss 6.4747   LearningRate 0.1101   Epoch: 8   Global Step: 88040   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:43:41,836-Speed 5491.62 samples/sec   Loss 6.4934   LearningRate 0.1101   Epoch: 8   Global Step: 88050   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:43:49,345-Speed 5456.24 samples/sec   Loss 6.5079   LearningRate 0.1100   Epoch: 8   Global Step: 88060   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:43:56,830-Speed 5473.27 samples/sec   Loss 6.5134   LearningRate 0.1100   Epoch: 8   Global Step: 88070   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:44:04,324-Speed 5466.64 samples/sec   Loss 6.5256   LearningRate 0.1100   Epoch: 8   Global Step: 88080   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:44:11,841-Speed 5450.17 samples/sec   Loss 6.4902   LearningRate 0.1100   Epoch: 8   Global Step: 88090   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:44:19,560-Speed 5306.51 samples/sec   Loss 6.5148   LearningRate 0.1100   Epoch: 8   Global Step: 88100   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:44:27,097-Speed 5435.66 samples/sec   Loss 6.4969   LearningRate 0.1100   Epoch: 8   Global Step: 88110   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:44:34,642-Speed 5429.70 samples/sec   Loss 6.4717   LearningRate 0.1099   Epoch: 8   Global Step: 88120   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:44:42,254-Speed 5381.73 samples/sec   Loss 6.4687   LearningRate 0.1099   Epoch: 8   Global Step: 88130   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:44:49,768-Speed 5451.88 samples/sec   Loss 6.5128   LearningRate 0.1099   Epoch: 8   Global Step: 88140   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:44:57,601-Speed 5229.88 samples/sec   Loss 6.4657   LearningRate 0.1099   Epoch: 8   Global Step: 88150   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:45:05,273-Speed 5339.62 samples/sec   Loss 6.5135   LearningRate 0.1099   Epoch: 8   Global Step: 88160   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:45:12,781-Speed 5457.05 samples/sec   Loss 6.4891   LearningRate 0.1098   Epoch: 8   Global Step: 88170   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:45:20,315-Speed 5437.39 samples/sec   Loss 6.4667   LearningRate 0.1098   Epoch: 8   Global Step: 88180   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:45:27,786-Speed 5482.95 samples/sec   Loss 6.4431   LearningRate 0.1098   Epoch: 8   Global Step: 88190   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:45:35,278-Speed 5468.14 samples/sec   Loss 6.4601   LearningRate 0.1098   Epoch: 8   Global Step: 88200   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:45:42,783-Speed 5458.26 samples/sec   Loss 6.4890   LearningRate 0.1098   Epoch: 8   Global Step: 88210   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:45:50,257-Speed 5481.55 samples/sec   Loss 6.5283   LearningRate 0.1097   Epoch: 8   Global Step: 88220   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:45:57,848-Speed 5396.07 samples/sec   Loss 6.4621   LearningRate 0.1097   Epoch: 8   Global Step: 88230   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:46:05,504-Speed 5351.53 samples/sec   Loss 6.5228   LearningRate 0.1097   Epoch: 8   Global Step: 88240   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:46:13,256-Speed 5283.84 samples/sec   Loss 6.4658   LearningRate 0.1097   Epoch: 8   Global Step: 88250   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:46:20,841-Speed 5401.05 samples/sec   Loss 6.4720   LearningRate 0.1097   Epoch: 8   Global Step: 88260   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:46:28,398-Speed 5420.82 samples/sec   Loss 6.4756   LearningRate 0.1097   Epoch: 8   Global Step: 88270   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:46:35,863-Speed 5487.75 samples/sec   Loss 6.5055   LearningRate 0.1096   Epoch: 8   Global Step: 88280   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:46:43,378-Speed 5451.10 samples/sec   Loss 6.4524   LearningRate 0.1096   Epoch: 8   Global Step: 88290   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:46:50,989-Speed 5382.20 samples/sec   Loss 6.4151   LearningRate 0.1096   Epoch: 8   Global Step: 88300   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:46:58,598-Speed 5383.90 samples/sec   Loss 6.4476   LearningRate 0.1096   Epoch: 8   Global Step: 88310   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:47:06,117-Speed 5448.43 samples/sec   Loss 6.4878   LearningRate 0.1096   Epoch: 8   Global Step: 88320   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:47:13,628-Speed 5453.81 samples/sec   Loss 6.4546   LearningRate 0.1095   Epoch: 8   Global Step: 88330   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:47:21,059-Speed 5512.54 samples/sec   Loss 6.4686   LearningRate 0.1095   Epoch: 8   Global Step: 88340   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:47:28,600-Speed 5432.82 samples/sec   Loss 6.4621   LearningRate 0.1095   Epoch: 8   Global Step: 88350   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:47:36,174-Speed 5408.08 samples/sec   Loss 6.4167   LearningRate 0.1095   Epoch: 8   Global Step: 88360   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:47:43,692-Speed 5449.10 samples/sec   Loss 6.4290   LearningRate 0.1095   Epoch: 8   Global Step: 88370   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:47:51,240-Speed 5427.41 samples/sec   Loss 6.4395   LearningRate 0.1095   Epoch: 8   Global Step: 88380   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:47:58,702-Speed 5490.22 samples/sec   Loss 6.4060   LearningRate 0.1094   Epoch: 8   Global Step: 88390   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:48:06,142-Speed 5505.60 samples/sec   Loss 6.4112   LearningRate 0.1094   Epoch: 8   Global Step: 88400   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:48:13,668-Speed 5443.19 samples/sec   Loss 6.4438   LearningRate 0.1094   Epoch: 8   Global Step: 88410   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:48:21,191-Speed 5445.80 samples/sec   Loss 6.4655   LearningRate 0.1094   Epoch: 8   Global Step: 88420   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:48:28,660-Speed 5484.48 samples/sec   Loss 6.4693   LearningRate 0.1094   Epoch: 8   Global Step: 88430   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:48:36,088-Speed 5515.04 samples/sec   Loss 6.5211   LearningRate 0.1093   Epoch: 8   Global Step: 88440   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:48:43,629-Speed 5432.12 samples/sec   Loss 6.4738   LearningRate 0.1093   Epoch: 8   Global Step: 88450   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:48:51,234-Speed 5386.30 samples/sec   Loss 6.4889   LearningRate 0.1093   Epoch: 8   Global Step: 88460   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:48:58,684-Speed 5499.38 samples/sec   Loss 6.4430   LearningRate 0.1093   Epoch: 8   Global Step: 88470   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:49:06,160-Speed 5479.28 samples/sec   Loss 6.4322   LearningRate 0.1093   Epoch: 8   Global Step: 88480   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:49:13,597-Speed 5508.62 samples/sec   Loss 6.4331   LearningRate 0.1093   Epoch: 8   Global Step: 88490   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:49:21,103-Speed 5457.40 samples/sec   Loss 6.4752   LearningRate 0.1092   Epoch: 8   Global Step: 88500   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:49:28,680-Speed 5406.12 samples/sec   Loss 6.4343   LearningRate 0.1092   Epoch: 8   Global Step: 88510   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:49:36,129-Speed 5500.05 samples/sec   Loss 6.4081   LearningRate 0.1092   Epoch: 8   Global Step: 88520   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:49:43,573-Speed 5503.23 samples/sec   Loss 6.4773   LearningRate 0.1092   Epoch: 8   Global Step: 88530   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:49:51,054-Speed 5475.73 samples/sec   Loss 6.4383   LearningRate 0.1092   Epoch: 8   Global Step: 88540   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:49:58,506-Speed 5497.19 samples/sec   Loss 6.4617   LearningRate 0.1091   Epoch: 8   Global Step: 88550   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:50:06,016-Speed 5454.89 samples/sec   Loss 6.4731   LearningRate 0.1091   Epoch: 8   Global Step: 88560   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:50:13,563-Speed 5428.10 samples/sec   Loss 6.4417   LearningRate 0.1091   Epoch: 8   Global Step: 88570   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:50:21,143-Speed 5404.41 samples/sec   Loss 6.4855   LearningRate 0.1091   Epoch: 8   Global Step: 88580   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:50:28,651-Speed 5455.88 samples/sec   Loss 6.4632   LearningRate 0.1091   Epoch: 8   Global Step: 88590   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:50:36,143-Speed 5468.38 samples/sec   Loss 6.4491   LearningRate 0.1091   Epoch: 8   Global Step: 88600   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:50:43,762-Speed 5376.78 samples/sec   Loss 6.4847   LearningRate 0.1090   Epoch: 8   Global Step: 88610   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:50:51,264-Speed 5460.90 samples/sec   Loss 6.4710   LearningRate 0.1090   Epoch: 8   Global Step: 88620   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:50:58,891-Speed 5370.84 samples/sec   Loss 6.4398   LearningRate 0.1090   Epoch: 8   Global Step: 88630   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:51:06,428-Speed 5435.07 samples/sec   Loss 6.4351   LearningRate 0.1090   Epoch: 8   Global Step: 88640   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:51:13,931-Speed 5460.18 samples/sec   Loss 6.4224   LearningRate 0.1090   Epoch: 8   Global Step: 88650   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:51:21,433-Speed 5460.38 samples/sec   Loss 6.4299   LearningRate 0.1089   Epoch: 8   Global Step: 88660   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:51:28,973-Speed 5432.75 samples/sec   Loss 6.4157   LearningRate 0.1089   Epoch: 8   Global Step: 88670   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:51:36,444-Speed 5482.96 samples/sec   Loss 6.4658   LearningRate 0.1089   Epoch: 8   Global Step: 88680   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:51:43,879-Speed 5510.05 samples/sec   Loss 6.4258   LearningRate 0.1089   Epoch: 8   Global Step: 88690   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:51:51,396-Speed 5449.42 samples/sec   Loss 6.4355   LearningRate 0.1089   Epoch: 8   Global Step: 88700   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:51:58,865-Speed 5485.07 samples/sec   Loss 6.4722   LearningRate 0.1088   Epoch: 8   Global Step: 88710   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:52:06,359-Speed 5466.39 samples/sec   Loss 6.4979   LearningRate 0.1088   Epoch: 8   Global Step: 88720   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:52:13,876-Speed 5449.92 samples/sec   Loss 6.4357   LearningRate 0.1088   Epoch: 8   Global Step: 88730   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:52:21,406-Speed 5440.31 samples/sec   Loss 6.4828   LearningRate 0.1088   Epoch: 8   Global Step: 88740   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:52:28,937-Speed 5439.14 samples/sec   Loss 6.4607   LearningRate 0.1088   Epoch: 8   Global Step: 88750   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:52:36,532-Speed 5393.87 samples/sec   Loss 6.4329   LearningRate 0.1088   Epoch: 8   Global Step: 88760   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:52:44,009-Speed 5478.68 samples/sec   Loss 6.4224   LearningRate 0.1087   Epoch: 8   Global Step: 88770   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:52:51,551-Speed 5431.94 samples/sec   Loss 6.4631   LearningRate 0.1087   Epoch: 8   Global Step: 88780   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:52:59,140-Speed 5397.86 samples/sec   Loss 6.4299   LearningRate 0.1087   Epoch: 8   Global Step: 88790   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:53:06,712-Speed 5409.89 samples/sec   Loss 6.4882   LearningRate 0.1087   Epoch: 8   Global Step: 88800   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:53:14,247-Speed 5437.27 samples/sec   Loss 6.4441   LearningRate 0.1087   Epoch: 8   Global Step: 88810   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:53:21,765-Speed 5448.92 samples/sec   Loss 6.3701   LearningRate 0.1086   Epoch: 8   Global Step: 88820   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:53:29,295-Speed 5440.17 samples/sec   Loss 6.3729   LearningRate 0.1086   Epoch: 8   Global Step: 88830   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:53:36,810-Speed 5450.94 samples/sec   Loss 6.4285   LearningRate 0.1086   Epoch: 8   Global Step: 88840   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:53:44,349-Speed 5433.75 samples/sec   Loss 6.3561   LearningRate 0.1086   Epoch: 8   Global Step: 88850   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:53:51,879-Speed 5440.32 samples/sec   Loss 6.4592   LearningRate 0.1086   Epoch: 8   Global Step: 88860   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:53:59,341-Speed 5490.07 samples/sec   Loss 6.4488   LearningRate 0.1086   Epoch: 8   Global Step: 88870   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:54:06,851-Speed 5454.55 samples/sec   Loss 6.4374   LearningRate 0.1085   Epoch: 8   Global Step: 88880   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:54:14,396-Speed 5429.37 samples/sec   Loss 6.4067   LearningRate 0.1085   Epoch: 8   Global Step: 88890   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:54:21,889-Speed 5467.12 samples/sec   Loss 6.4604   LearningRate 0.1085   Epoch: 8   Global Step: 88900   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:54:29,337-Speed 5500.55 samples/sec   Loss 6.4271   LearningRate 0.1085   Epoch: 8   Global Step: 88910   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:54:36,783-Speed 5501.18 samples/sec   Loss 6.4697   LearningRate 0.1085   Epoch: 8   Global Step: 88920   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:54:44,239-Speed 5494.56 samples/sec   Loss 6.4096   LearningRate 0.1084   Epoch: 8   Global Step: 88930   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:54:51,747-Speed 5456.57 samples/sec   Loss 6.4158   LearningRate 0.1084   Epoch: 8   Global Step: 88940   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:54:59,296-Speed 5426.32 samples/sec   Loss 6.4616   LearningRate 0.1084   Epoch: 8   Global Step: 88950   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:55:06,990-Speed 5324.43 samples/sec   Loss 6.4114   LearningRate 0.1084   Epoch: 8   Global Step: 88960   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:55:14,501-Speed 5453.92 samples/sec   Loss 6.4091   LearningRate 0.1084   Epoch: 8   Global Step: 88970   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:55:22,005-Speed 5459.20 samples/sec   Loss 6.5031   LearningRate 0.1084   Epoch: 8   Global Step: 88980   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:55:29,532-Speed 5442.78 samples/sec   Loss 6.4737   LearningRate 0.1083   Epoch: 8   Global Step: 88990   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:55:37,050-Speed 5448.52 samples/sec   Loss 6.4257   LearningRate 0.1083   Epoch: 8   Global Step: 89000   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:55:44,633-Speed 5401.84 samples/sec   Loss 6.4046   LearningRate 0.1083   Epoch: 8   Global Step: 89010   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:55:52,135-Speed 5461.18 samples/sec   Loss 6.4413   LearningRate 0.1083   Epoch: 8   Global Step: 89020   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:55:59,717-Speed 5403.35 samples/sec   Loss 6.4026   LearningRate 0.1083   Epoch: 8   Global Step: 89030   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:56:07,271-Speed 5422.77 samples/sec   Loss 6.4552   LearningRate 0.1082   Epoch: 8   Global Step: 89040   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:56:14,842-Speed 5410.13 samples/sec   Loss 6.4768   LearningRate 0.1082   Epoch: 8   Global Step: 89050   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:56:22,363-Speed 5447.52 samples/sec   Loss 6.4669   LearningRate 0.1082   Epoch: 8   Global Step: 89060   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:56:29,946-Speed 5402.35 samples/sec   Loss 6.4206   LearningRate 0.1082   Epoch: 8   Global Step: 89070   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:56:37,373-Speed 5515.08 samples/sec   Loss 6.4364   LearningRate 0.1082   Epoch: 8   Global Step: 89080   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:56:44,813-Speed 5506.02 samples/sec   Loss 6.4136   LearningRate 0.1082   Epoch: 8   Global Step: 89090   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:56:52,277-Speed 5488.94 samples/sec   Loss 6.4471   LearningRate 0.1081   Epoch: 8   Global Step: 89100   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:56:59,800-Speed 5445.29 samples/sec   Loss 6.4242   LearningRate 0.1081   Epoch: 8   Global Step: 89110   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:57:07,316-Speed 5450.52 samples/sec   Loss 6.3847   LearningRate 0.1081   Epoch: 8   Global Step: 89120   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:57:14,808-Speed 5467.44 samples/sec   Loss 6.4057   LearningRate 0.1081   Epoch: 8   Global Step: 89130   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:57:22,312-Speed 5459.38 samples/sec   Loss 6.4144   LearningRate 0.1081   Epoch: 8   Global Step: 89140   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:57:29,822-Speed 5454.95 samples/sec   Loss 6.4757   LearningRate 0.1080   Epoch: 8   Global Step: 89150   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:57:37,325-Speed 5460.11 samples/sec   Loss 6.3540   LearningRate 0.1080   Epoch: 8   Global Step: 89160   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:57:44,832-Speed 5456.82 samples/sec   Loss 6.4082   LearningRate 0.1080   Epoch: 8   Global Step: 89170   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:57:52,345-Speed 5452.42 samples/sec   Loss 6.4029   LearningRate 0.1080   Epoch: 8   Global Step: 89180   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:57:59,847-Speed 5460.42 samples/sec   Loss 6.4151   LearningRate 0.1080   Epoch: 8   Global Step: 89190   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:58:07,276-Speed 5514.42 samples/sec   Loss 6.3998   LearningRate 0.1080   Epoch: 8   Global Step: 89200   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:58:14,892-Speed 5378.47 samples/sec   Loss 6.4071   LearningRate 0.1079   Epoch: 8   Global Step: 89210   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:58:22,487-Speed 5394.38 samples/sec   Loss 6.3815   LearningRate 0.1079   Epoch: 8   Global Step: 89220   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:58:29,979-Speed 5467.99 samples/sec   Loss 6.3675   LearningRate 0.1079   Epoch: 8   Global Step: 89230   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:58:37,447-Speed 5485.41 samples/sec   Loss 6.4078   LearningRate 0.1079   Epoch: 8   Global Step: 89240   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:58:44,867-Speed 5520.19 samples/sec   Loss 6.4233   LearningRate 0.1079   Epoch: 8   Global Step: 89250   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:58:52,408-Speed 5432.79 samples/sec   Loss 6.3854   LearningRate 0.1078   Epoch: 8   Global Step: 89260   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:58:59,936-Speed 5442.21 samples/sec   Loss 6.3858   LearningRate 0.1078   Epoch: 8   Global Step: 89270   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:59:07,500-Speed 5415.79 samples/sec   Loss 6.3811   LearningRate 0.1078   Epoch: 8   Global Step: 89280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:59:14,975-Speed 5479.94 samples/sec   Loss 6.4029   LearningRate 0.1078   Epoch: 8   Global Step: 89290   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 14:59:22,500-Speed 5444.61 samples/sec   Loss 6.3941   LearningRate 0.1078   Epoch: 8   Global Step: 89300   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:59:30,093-Speed 5394.90 samples/sec   Loss 6.4273   LearningRate 0.1078   Epoch: 8   Global Step: 89310   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 14:59:37,595-Speed 5460.43 samples/sec   Loss 6.4112   LearningRate 0.1077   Epoch: 8   Global Step: 89320   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:59:45,063-Speed 5485.33 samples/sec   Loss 6.4124   LearningRate 0.1077   Epoch: 8   Global Step: 89330   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 14:59:52,607-Speed 5430.79 samples/sec   Loss 6.4071   LearningRate 0.1077   Epoch: 8   Global Step: 89340   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 15:00:00,126-Speed 5448.05 samples/sec   Loss 6.3861   LearningRate 0.1077   Epoch: 8   Global Step: 89350   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 15:00:07,643-Speed 5449.53 samples/sec   Loss 6.3482   LearningRate 0.1077   Epoch: 8   Global Step: 89360   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 15:00:15,140-Speed 5463.98 samples/sec   Loss 6.3601   LearningRate 0.1076   Epoch: 8   Global Step: 89370   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 15:00:22,612-Speed 5482.87 samples/sec   Loss 6.4595   LearningRate 0.1076   Epoch: 8   Global Step: 89380   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 15:00:30,084-Speed 5483.24 samples/sec   Loss 6.4344   LearningRate 0.1076   Epoch: 8   Global Step: 89390   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 15:00:37,629-Speed 5429.26 samples/sec   Loss 6.3933   LearningRate 0.1076   Epoch: 8   Global Step: 89400   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 15:00:45,197-Speed 5412.54 samples/sec   Loss 6.3880   LearningRate 0.1076   Epoch: 8   Global Step: 89410   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 15:00:52,720-Speed 5445.65 samples/sec   Loss 6.3341   LearningRate 0.1075   Epoch: 8   Global Step: 89420   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:01:00,214-Speed 5466.91 samples/sec   Loss 6.3826   LearningRate 0.1075   Epoch: 8   Global Step: 89430   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:01:07,670-Speed 5494.28 samples/sec   Loss 6.3709   LearningRate 0.1075   Epoch: 8   Global Step: 89440   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:01:15,172-Speed 5460.64 samples/sec   Loss 6.4135   LearningRate 0.1075   Epoch: 8   Global Step: 89450   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:01:22,651-Speed 5477.51 samples/sec   Loss 6.3834   LearningRate 0.1075   Epoch: 8   Global Step: 89460   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:01:30,161-Speed 5455.05 samples/sec   Loss 6.4166   LearningRate 0.1075   Epoch: 8   Global Step: 89470   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:01:37,614-Speed 5495.96 samples/sec   Loss 6.3495   LearningRate 0.1074   Epoch: 8   Global Step: 89480   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:01:45,106-Speed 5467.74 samples/sec   Loss 6.3576   LearningRate 0.1074   Epoch: 8   Global Step: 89490   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:01:52,726-Speed 5376.81 samples/sec   Loss 6.4389   LearningRate 0.1074   Epoch: 8   Global Step: 89500   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:02:00,264-Speed 5434.45 samples/sec   Loss 6.4464   LearningRate 0.1074   Epoch: 8   Global Step: 89510   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:02:07,835-Speed 5411.02 samples/sec   Loss 6.3840   LearningRate 0.1074   Epoch: 8   Global Step: 89520   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:02:15,327-Speed 5467.02 samples/sec   Loss 6.3994   LearningRate 0.1073   Epoch: 8   Global Step: 89530   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:02:22,854-Speed 5442.79 samples/sec   Loss 6.3826   LearningRate 0.1073   Epoch: 8   Global Step: 89540   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:02:30,300-Speed 5502.05 samples/sec   Loss 6.3630   LearningRate 0.1073   Epoch: 8   Global Step: 89550   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:02:37,812-Speed 5453.14 samples/sec   Loss 6.3198   LearningRate 0.1073   Epoch: 8   Global Step: 89560   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:02:45,375-Speed 5417.01 samples/sec   Loss 6.3964   LearningRate 0.1073   Epoch: 8   Global Step: 89570   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:02:52,977-Speed 5388.85 samples/sec   Loss 6.4135   LearningRate 0.1073   Epoch: 8   Global Step: 89580   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:03:00,515-Speed 5434.75 samples/sec   Loss 6.4021   LearningRate 0.1072   Epoch: 8   Global Step: 89590   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:03:08,024-Speed 5455.15 samples/sec   Loss 6.3364   LearningRate 0.1072   Epoch: 8   Global Step: 89600   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:03:15,500-Speed 5479.36 samples/sec   Loss 6.3678   LearningRate 0.1072   Epoch: 8   Global Step: 89610   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:03:23,091-Speed 5396.55 samples/sec   Loss 6.3409   LearningRate 0.1072   Epoch: 8   Global Step: 89620   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:03:30,580-Speed 5470.55 samples/sec   Loss 6.4530   LearningRate 0.1072   Epoch: 8   Global Step: 89630   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:03:38,057-Speed 5478.92 samples/sec   Loss 6.4023   LearningRate 0.1071   Epoch: 8   Global Step: 89640   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:03:45,614-Speed 5420.65 samples/sec   Loss 6.4812   LearningRate 0.1071   Epoch: 8   Global Step: 89650   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:03:53,234-Speed 5376.20 samples/sec   Loss 6.4000   LearningRate 0.1071   Epoch: 8   Global Step: 89660   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:04:00,888-Speed 5352.60 samples/sec   Loss 6.3886   LearningRate 0.1071   Epoch: 8   Global Step: 89670   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:04:08,424-Speed 5436.28 samples/sec   Loss 6.3715   LearningRate 0.1071   Epoch: 8   Global Step: 89680   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:04:15,965-Speed 5432.06 samples/sec   Loss 6.3963   LearningRate 0.1071   Epoch: 8   Global Step: 89690   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:04:23,496-Speed 5439.32 samples/sec   Loss 6.3437   LearningRate 0.1070   Epoch: 8   Global Step: 89700   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:04:31,021-Speed 5443.69 samples/sec   Loss 6.3988   LearningRate 0.1070   Epoch: 8   Global Step: 89710   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:04:38,522-Speed 5461.58 samples/sec   Loss 6.4325   LearningRate 0.1070   Epoch: 8   Global Step: 89720   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:04:46,039-Speed 5449.93 samples/sec   Loss 6.3708   LearningRate 0.1070   Epoch: 8   Global Step: 89730   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:04:53,684-Speed 5358.10 samples/sec   Loss 6.3532   LearningRate 0.1070   Epoch: 8   Global Step: 89740   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:05:01,139-Speed 5495.06 samples/sec   Loss 6.3392   LearningRate 0.1069   Epoch: 8   Global Step: 89750   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:05:08,673-Speed 5437.60 samples/sec   Loss 6.4639   LearningRate 0.1069   Epoch: 8   Global Step: 89760   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:05:16,181-Speed 5456.68 samples/sec   Loss 6.3796   LearningRate 0.1069   Epoch: 8   Global Step: 89770   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:05:23,674-Speed 5466.79 samples/sec   Loss 6.4247   LearningRate 0.1069   Epoch: 8   Global Step: 89780   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:05:31,132-Speed 5492.52 samples/sec   Loss 6.3639   LearningRate 0.1069   Epoch: 8   Global Step: 89790   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:05:38,689-Speed 5421.29 samples/sec   Loss 6.3495   LearningRate 0.1069   Epoch: 8   Global Step: 89800   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:05:46,153-Speed 5488.55 samples/sec   Loss 6.3642   LearningRate 0.1068   Epoch: 8   Global Step: 89810   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:05:53,709-Speed 5420.77 samples/sec   Loss 6.4081   LearningRate 0.1068   Epoch: 8   Global Step: 89820   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:06:01,291-Speed 5403.43 samples/sec   Loss 6.3738   LearningRate 0.1068   Epoch: 8   Global Step: 89830   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:06:08,803-Speed 5453.45 samples/sec   Loss 6.3720   LearningRate 0.1068   Epoch: 8   Global Step: 89840   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:06:16,324-Speed 5446.56 samples/sec   Loss 6.3849   LearningRate 0.1068   Epoch: 8   Global Step: 89850   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 15:06:23,791-Speed 5486.01 samples/sec   Loss 6.4089   LearningRate 0.1067   Epoch: 8   Global Step: 89860   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 15:06:31,385-Speed 5394.91 samples/sec   Loss 6.3789   LearningRate 0.1067   Epoch: 8   Global Step: 89870   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 15:06:38,824-Speed 5506.88 samples/sec   Loss 6.3288   LearningRate 0.1067   Epoch: 8   Global Step: 89880   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 15:06:46,358-Speed 5437.36 samples/sec   Loss 6.3717   LearningRate 0.1067   Epoch: 8   Global Step: 89890   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 15:06:53,996-Speed 5363.33 samples/sec   Loss 6.4259   LearningRate 0.1067   Epoch: 8   Global Step: 89900   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 15:07:01,560-Speed 5416.02 samples/sec   Loss 6.3720   LearningRate 0.1067   Epoch: 8   Global Step: 89910   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 15:07:09,009-Speed 5499.07 samples/sec   Loss 6.4074   LearningRate 0.1066   Epoch: 8   Global Step: 89920   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 15:07:16,654-Speed 5358.79 samples/sec   Loss 6.3809   LearningRate 0.1066   Epoch: 8   Global Step: 89930   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 15:07:24,144-Speed 5468.95 samples/sec   Loss 6.4517   LearningRate 0.1066   Epoch: 8   Global Step: 89940   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 15:07:31,683-Speed 5434.18 samples/sec   Loss 6.4408   LearningRate 0.1066   Epoch: 8   Global Step: 89950   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:07:39,178-Speed 5465.55 samples/sec   Loss 6.3435   LearningRate 0.1066   Epoch: 8   Global Step: 89960   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:07:46,727-Speed 5426.72 samples/sec   Loss 6.3637   LearningRate 0.1065   Epoch: 8   Global Step: 89970   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:07:54,183-Speed 5494.63 samples/sec   Loss 6.4197   LearningRate 0.1065   Epoch: 8   Global Step: 89980   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:08:01,726-Speed 5431.15 samples/sec   Loss 6.3457   LearningRate 0.1065   Epoch: 8   Global Step: 89990   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:08:09,332-Speed 5386.04 samples/sec   Loss 6.3363   LearningRate 0.1065   Epoch: 8   Global Step: 90000   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:08:53,779-[lfw][90000]XNorm: 23.255685
Training: 2022-01-08 15:08:53,780-[lfw][90000]Accuracy-Flip: 0.99767+-0.00271
Training: 2022-01-08 15:08:53,781-[lfw][90000]Accuracy-Highest: 0.99817
Training: 2022-01-08 15:09:45,467-[cfp_fp][90000]XNorm: 20.862160
Training: 2022-01-08 15:09:45,468-[cfp_fp][90000]Accuracy-Flip: 0.98814+-0.00561
Training: 2022-01-08 15:09:45,468-[cfp_fp][90000]Accuracy-Highest: 0.98814
Training: 2022-01-08 15:10:31,192-[agedb_30][90000]XNorm: 22.981298
Training: 2022-01-08 15:10:31,193-[agedb_30][90000]Accuracy-Flip: 0.97833+-0.00898
Training: 2022-01-08 15:10:31,194-[agedb_30][90000]Accuracy-Highest: 0.97833
Training: 2022-01-08 15:10:39,000-Speed 273.67 samples/sec   Loss 6.3452   LearningRate 0.1065   Epoch: 8   Global Step: 90010   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:10:46,574-Speed 5409.16 samples/sec   Loss 6.3746   LearningRate 0.1065   Epoch: 8   Global Step: 90020   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:10:54,142-Speed 5413.09 samples/sec   Loss 6.2902   LearningRate 0.1064   Epoch: 8   Global Step: 90030   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:11:01,702-Speed 5419.12 samples/sec   Loss 6.3753   LearningRate 0.1064   Epoch: 8   Global Step: 90040   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:11:09,168-Speed 5487.01 samples/sec   Loss 6.3624   LearningRate 0.1064   Epoch: 8   Global Step: 90050   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:11:16,682-Speed 5451.77 samples/sec   Loss 6.3515   LearningRate 0.1064   Epoch: 8   Global Step: 90060   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:11:24,400-Speed 5307.36 samples/sec   Loss 6.3791   LearningRate 0.1064   Epoch: 8   Global Step: 90070   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:11:31,923-Speed 5445.39 samples/sec   Loss 6.2899   LearningRate 0.1063   Epoch: 8   Global Step: 90080   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:11:39,599-Speed 5337.11 samples/sec   Loss 6.3140   LearningRate 0.1063   Epoch: 8   Global Step: 90090   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:11:47,083-Speed 5473.61 samples/sec   Loss 6.2826   LearningRate 0.1063   Epoch: 8   Global Step: 90100   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:11:54,547-Speed 5487.81 samples/sec   Loss 6.3622   LearningRate 0.1063   Epoch: 8   Global Step: 90110   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:12:02,087-Speed 5433.61 samples/sec   Loss 6.3358   LearningRate 0.1063   Epoch: 8   Global Step: 90120   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:12:09,617-Speed 5440.76 samples/sec   Loss 6.4184   LearningRate 0.1063   Epoch: 8   Global Step: 90130   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:12:17,045-Speed 5514.27 samples/sec   Loss 6.4360   LearningRate 0.1062   Epoch: 8   Global Step: 90140   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:12:24,460-Speed 5524.53 samples/sec   Loss 6.4104   LearningRate 0.1062   Epoch: 8   Global Step: 90150   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:12:31,944-Speed 5474.40 samples/sec   Loss 6.3747   LearningRate 0.1062   Epoch: 8   Global Step: 90160   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:12:39,348-Speed 5533.21 samples/sec   Loss 6.3470   LearningRate 0.1062   Epoch: 8   Global Step: 90170   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:12:46,785-Speed 5507.89 samples/sec   Loss 6.3291   LearningRate 0.1062   Epoch: 8   Global Step: 90180   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:12:54,226-Speed 5504.92 samples/sec   Loss 6.3336   LearningRate 0.1062   Epoch: 8   Global Step: 90190   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:13:01,698-Speed 5482.92 samples/sec   Loss 6.3329   LearningRate 0.1061   Epoch: 8   Global Step: 90200   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:13:09,331-Speed 5367.51 samples/sec   Loss 6.3444   LearningRate 0.1061   Epoch: 8   Global Step: 90210   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:13:16,763-Speed 5511.65 samples/sec   Loss 6.3900   LearningRate 0.1061   Epoch: 8   Global Step: 90220   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:13:24,261-Speed 5463.57 samples/sec   Loss 6.3918   LearningRate 0.1061   Epoch: 8   Global Step: 90230   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:13:31,732-Speed 5483.04 samples/sec   Loss 6.2464   LearningRate 0.1061   Epoch: 8   Global Step: 90240   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:13:39,251-Speed 5448.31 samples/sec   Loss 6.3708   LearningRate 0.1060   Epoch: 8   Global Step: 90250   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:13:46,749-Speed 5463.32 samples/sec   Loss 6.4156   LearningRate 0.1060   Epoch: 8   Global Step: 90260   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:13:54,169-Speed 5520.80 samples/sec   Loss 6.3272   LearningRate 0.1060   Epoch: 8   Global Step: 90270   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:14:01,727-Speed 5420.68 samples/sec   Loss 6.3815   LearningRate 0.1060   Epoch: 8   Global Step: 90280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:14:09,215-Speed 5471.03 samples/sec   Loss 6.3406   LearningRate 0.1060   Epoch: 8   Global Step: 90290   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:14:16,771-Speed 5420.97 samples/sec   Loss 6.3088   LearningRate 0.1060   Epoch: 8   Global Step: 90300   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:14:24,269-Speed 5463.97 samples/sec   Loss 6.3658   LearningRate 0.1059   Epoch: 8   Global Step: 90310   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:14:31,780-Speed 5453.67 samples/sec   Loss 6.3960   LearningRate 0.1059   Epoch: 8   Global Step: 90320   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:14:39,380-Speed 5390.93 samples/sec   Loss 6.2892   LearningRate 0.1059   Epoch: 8   Global Step: 90330   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:14:46,890-Speed 5454.47 samples/sec   Loss 6.3221   LearningRate 0.1059   Epoch: 8   Global Step: 90340   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:14:54,555-Speed 5344.78 samples/sec   Loss 6.3578   LearningRate 0.1059   Epoch: 8   Global Step: 90350   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:15:02,074-Speed 5447.68 samples/sec   Loss 6.3630   LearningRate 0.1058   Epoch: 8   Global Step: 90360   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:15:09,513-Speed 5507.20 samples/sec   Loss 6.3999   LearningRate 0.1058   Epoch: 8   Global Step: 90370   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:15:17,032-Speed 5448.50 samples/sec   Loss 6.3220   LearningRate 0.1058   Epoch: 8   Global Step: 90380   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:15:24,465-Speed 5511.50 samples/sec   Loss 6.3393   LearningRate 0.1058   Epoch: 8   Global Step: 90390   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:15:31,964-Speed 5462.72 samples/sec   Loss 6.3505   LearningRate 0.1058   Epoch: 8   Global Step: 90400   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:15:39,423-Speed 5492.19 samples/sec   Loss 6.3802   LearningRate 0.1058   Epoch: 8   Global Step: 90410   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:15:46,917-Speed 5466.81 samples/sec   Loss 6.3551   LearningRate 0.1057   Epoch: 8   Global Step: 90420   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:15:54,443-Speed 5442.70 samples/sec   Loss 6.3927   LearningRate 0.1057   Epoch: 8   Global Step: 90430   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:16:01,962-Speed 5448.71 samples/sec   Loss 6.3473   LearningRate 0.1057   Epoch: 8   Global Step: 90440   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:16:09,397-Speed 5510.19 samples/sec   Loss 6.3131   LearningRate 0.1057   Epoch: 8   Global Step: 90450   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:16:16,927-Speed 5440.52 samples/sec   Loss 6.3459   LearningRate 0.1057   Epoch: 8   Global Step: 90460   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:16:24,373-Speed 5501.05 samples/sec   Loss 6.3706   LearningRate 0.1056   Epoch: 8   Global Step: 90470   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:16:31,796-Speed 5518.54 samples/sec   Loss 6.3643   LearningRate 0.1056   Epoch: 8   Global Step: 90480   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:16:39,368-Speed 5410.37 samples/sec   Loss 6.3781   LearningRate 0.1056   Epoch: 8   Global Step: 90490   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:16:46,880-Speed 5453.67 samples/sec   Loss 6.3256   LearningRate 0.1056   Epoch: 8   Global Step: 90500   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:16:54,432-Speed 5424.32 samples/sec   Loss 6.3696   LearningRate 0.1056   Epoch: 8   Global Step: 90510   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:17:01,949-Speed 5449.93 samples/sec   Loss 6.3220   LearningRate 0.1056   Epoch: 8   Global Step: 90520   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:17:09,551-Speed 5388.81 samples/sec   Loss 6.3445   LearningRate 0.1055   Epoch: 8   Global Step: 90530   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:17:17,062-Speed 5453.98 samples/sec   Loss 6.3724   LearningRate 0.1055   Epoch: 8   Global Step: 90540   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:17:24,870-Speed 5246.43 samples/sec   Loss 6.3242   LearningRate 0.1055   Epoch: 8   Global Step: 90550   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:17:32,514-Speed 5359.14 samples/sec   Loss 6.3584   LearningRate 0.1055   Epoch: 8   Global Step: 90560   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:17:40,032-Speed 5449.77 samples/sec   Loss 6.2874   LearningRate 0.1055   Epoch: 8   Global Step: 90570   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:17:47,692-Speed 5347.92 samples/sec   Loss 6.2505   LearningRate 0.1054   Epoch: 8   Global Step: 90580   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:17:55,531-Speed 5225.81 samples/sec   Loss 6.3352   LearningRate 0.1054   Epoch: 8   Global Step: 90590   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:18:03,099-Speed 5413.15 samples/sec   Loss 6.3023   LearningRate 0.1054   Epoch: 8   Global Step: 90600   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:18:10,605-Speed 5457.85 samples/sec   Loss 6.3072   LearningRate 0.1054   Epoch: 8   Global Step: 90610   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:18:18,100-Speed 5465.82 samples/sec   Loss 6.4032   LearningRate 0.1054   Epoch: 8   Global Step: 90620   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:18:25,620-Speed 5447.21 samples/sec   Loss 6.3771   LearningRate 0.1054   Epoch: 8   Global Step: 90630   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:18:33,225-Speed 5387.13 samples/sec   Loss 6.3449   LearningRate 0.1053   Epoch: 8   Global Step: 90640   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:18:40,839-Speed 5379.77 samples/sec   Loss 6.4204   LearningRate 0.1053   Epoch: 8   Global Step: 90650   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:18:48,306-Speed 5486.65 samples/sec   Loss 6.2851   LearningRate 0.1053   Epoch: 8   Global Step: 90660   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:18:55,762-Speed 5494.37 samples/sec   Loss 6.3137   LearningRate 0.1053   Epoch: 8   Global Step: 90670   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:19:03,203-Speed 5505.27 samples/sec   Loss 6.2575   LearningRate 0.1053   Epoch: 8   Global Step: 90680   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:19:10,692-Speed 5470.64 samples/sec   Loss 6.2902   LearningRate 0.1052   Epoch: 8   Global Step: 90690   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:19:18,152-Speed 5491.28 samples/sec   Loss 6.2806   LearningRate 0.1052   Epoch: 8   Global Step: 90700   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:19:25,631-Speed 5477.12 samples/sec   Loss 6.3334   LearningRate 0.1052   Epoch: 8   Global Step: 90710   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:19:33,120-Speed 5469.97 samples/sec   Loss 6.3050   LearningRate 0.1052   Epoch: 8   Global Step: 90720   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:19:40,729-Speed 5383.90 samples/sec   Loss 6.2830   LearningRate 0.1052   Epoch: 8   Global Step: 90730   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:19:48,250-Speed 5447.17 samples/sec   Loss 6.3148   LearningRate 0.1052   Epoch: 8   Global Step: 90740   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:19:55,743-Speed 5467.26 samples/sec   Loss 6.3356   LearningRate 0.1051   Epoch: 8   Global Step: 90750   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:20:03,188-Speed 5501.96 samples/sec   Loss 6.3581   LearningRate 0.1051   Epoch: 8   Global Step: 90760   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:20:10,656-Speed 5485.83 samples/sec   Loss 6.2625   LearningRate 0.1051   Epoch: 8   Global Step: 90770   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:20:18,133-Speed 5479.17 samples/sec   Loss 6.3098   LearningRate 0.1051   Epoch: 8   Global Step: 90780   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:20:25,637-Speed 5459.23 samples/sec   Loss 6.3058   LearningRate 0.1051   Epoch: 8   Global Step: 90790   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:20:33,073-Speed 5509.19 samples/sec   Loss 6.2819   LearningRate 0.1050   Epoch: 8   Global Step: 90800   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:20:40,484-Speed 5526.98 samples/sec   Loss 6.3386   LearningRate 0.1050   Epoch: 8   Global Step: 90810   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:20:47,868-Speed 5548.63 samples/sec   Loss 6.2967   LearningRate 0.1050   Epoch: 8   Global Step: 90820   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:20:55,327-Speed 5491.66 samples/sec   Loss 6.3526   LearningRate 0.1050   Epoch: 8   Global Step: 90830   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:21:02,914-Speed 5399.61 samples/sec   Loss 6.3293   LearningRate 0.1050   Epoch: 8   Global Step: 90840   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:21:10,448-Speed 5437.12 samples/sec   Loss 6.3278   LearningRate 0.1050   Epoch: 8   Global Step: 90850   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:21:17,812-Speed 5563.36 samples/sec   Loss 6.3178   LearningRate 0.1049   Epoch: 8   Global Step: 90860   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:21:25,490-Speed 5335.81 samples/sec   Loss 6.3089   LearningRate 0.1049   Epoch: 8   Global Step: 90870   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:21:32,948-Speed 5492.40 samples/sec   Loss 6.3197   LearningRate 0.1049   Epoch: 8   Global Step: 90880   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:21:40,610-Speed 5346.46 samples/sec   Loss 6.2640   LearningRate 0.1049   Epoch: 8   Global Step: 90890   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:21:48,116-Speed 5458.24 samples/sec   Loss 6.3681   LearningRate 0.1049   Epoch: 8   Global Step: 90900   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:21:55,546-Speed 5513.99 samples/sec   Loss 6.3520   LearningRate 0.1049   Epoch: 8   Global Step: 90910   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:22:03,103-Speed 5420.17 samples/sec   Loss 6.3399   LearningRate 0.1048   Epoch: 8   Global Step: 90920   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:22:10,689-Speed 5400.65 samples/sec   Loss 6.2985   LearningRate 0.1048   Epoch: 8   Global Step: 90930   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:22:18,100-Speed 5527.20 samples/sec   Loss 6.3509   LearningRate 0.1048   Epoch: 8   Global Step: 90940   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:22:25,555-Speed 5495.77 samples/sec   Loss 6.3236   LearningRate 0.1048   Epoch: 8   Global Step: 90950   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:22:33,081-Speed 5443.08 samples/sec   Loss 6.3061   LearningRate 0.1048   Epoch: 8   Global Step: 90960   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:22:40,541-Speed 5490.92 samples/sec   Loss 6.2942   LearningRate 0.1047   Epoch: 8   Global Step: 90970   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:22:47,975-Speed 5510.53 samples/sec   Loss 6.2979   LearningRate 0.1047   Epoch: 8   Global Step: 90980   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:22:55,479-Speed 5459.82 samples/sec   Loss 6.3374   LearningRate 0.1047   Epoch: 8   Global Step: 90990   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:23:02,897-Speed 5522.11 samples/sec   Loss 6.3189   LearningRate 0.1047   Epoch: 8   Global Step: 91000   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:23:10,330-Speed 5511.18 samples/sec   Loss 6.3186   LearningRate 0.1047   Epoch: 8   Global Step: 91010   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:23:17,778-Speed 5500.44 samples/sec   Loss 6.3274   LearningRate 0.1047   Epoch: 8   Global Step: 91020   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:23:25,215-Speed 5508.53 samples/sec   Loss 6.3365   LearningRate 0.1046   Epoch: 8   Global Step: 91030   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:23:32,660-Speed 5502.39 samples/sec   Loss 6.2798   LearningRate 0.1046   Epoch: 8   Global Step: 91040   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:23:40,071-Speed 5527.18 samples/sec   Loss 6.3000   LearningRate 0.1046   Epoch: 8   Global Step: 91050   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:23:47,517-Speed 5502.35 samples/sec   Loss 6.3145   LearningRate 0.1046   Epoch: 8   Global Step: 91060   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:23:54,929-Speed 5526.83 samples/sec   Loss 6.2854   LearningRate 0.1046   Epoch: 8   Global Step: 91070   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:24:02,384-Speed 5495.14 samples/sec   Loss 6.3048   LearningRate 0.1045   Epoch: 8   Global Step: 91080   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:24:09,807-Speed 5518.18 samples/sec   Loss 6.2779   LearningRate 0.1045   Epoch: 8   Global Step: 91090   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:24:17,203-Speed 5539.12 samples/sec   Loss 6.3090   LearningRate 0.1045   Epoch: 8   Global Step: 91100   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:24:24,574-Speed 5557.86 samples/sec   Loss 6.2912   LearningRate 0.1045   Epoch: 8   Global Step: 91110   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:24:31,965-Speed 5542.71 samples/sec   Loss 6.2864   LearningRate 0.1045   Epoch: 8   Global Step: 91120   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:24:39,383-Speed 5522.36 samples/sec   Loss 6.3078   LearningRate 0.1045   Epoch: 8   Global Step: 91130   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:24:46,807-Speed 5517.98 samples/sec   Loss 6.2804   LearningRate 0.1044   Epoch: 8   Global Step: 91140   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:24:54,257-Speed 5498.80 samples/sec   Loss 6.3397   LearningRate 0.1044   Epoch: 8   Global Step: 91150   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:25:01,714-Speed 5494.00 samples/sec   Loss 6.3092   LearningRate 0.1044   Epoch: 8   Global Step: 91160   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:25:09,118-Speed 5532.15 samples/sec   Loss 6.2745   LearningRate 0.1044   Epoch: 8   Global Step: 91170   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:25:16,570-Speed 5497.54 samples/sec   Loss 6.2381   LearningRate 0.1044   Epoch: 8   Global Step: 91180   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:25:23,943-Speed 5555.87 samples/sec   Loss 6.3105   LearningRate 0.1043   Epoch: 8   Global Step: 91190   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:25:31,344-Speed 5535.75 samples/sec   Loss 6.2773   LearningRate 0.1043   Epoch: 8   Global Step: 91200   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:25:38,751-Speed 5530.56 samples/sec   Loss 6.3379   LearningRate 0.1043   Epoch: 8   Global Step: 91210   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:25:46,136-Speed 5547.44 samples/sec   Loss 6.2506   LearningRate 0.1043   Epoch: 8   Global Step: 91220   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:25:53,751-Speed 5378.90 samples/sec   Loss 6.3453   LearningRate 0.1043   Epoch: 8   Global Step: 91230   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:26:01,223-Speed 5483.58 samples/sec   Loss 6.3038   LearningRate 0.1043   Epoch: 8   Global Step: 91240   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:26:08,625-Speed 5533.80 samples/sec   Loss 6.2639   LearningRate 0.1042   Epoch: 8   Global Step: 91250   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:26:16,000-Speed 5554.81 samples/sec   Loss 6.2487   LearningRate 0.1042   Epoch: 8   Global Step: 91260   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:26:23,409-Speed 5529.20 samples/sec   Loss 6.2601   LearningRate 0.1042   Epoch: 8   Global Step: 91270   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:26:30,966-Speed 5421.04 samples/sec   Loss 6.3619   LearningRate 0.1042   Epoch: 8   Global Step: 91280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:26:38,377-Speed 5527.63 samples/sec   Loss 6.4110   LearningRate 0.1042   Epoch: 8   Global Step: 91290   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:26:45,920-Speed 5430.93 samples/sec   Loss 6.3204   LearningRate 0.1041   Epoch: 8   Global Step: 91300   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:26:53,478-Speed 5420.69 samples/sec   Loss 6.2689   LearningRate 0.1041   Epoch: 8   Global Step: 91310   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:27:00,933-Speed 5495.21 samples/sec   Loss 6.3546   LearningRate 0.1041   Epoch: 8   Global Step: 91320   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:27:08,372-Speed 5506.87 samples/sec   Loss 6.2657   LearningRate 0.1041   Epoch: 8   Global Step: 91330   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:27:15,864-Speed 5467.61 samples/sec   Loss 6.2774   LearningRate 0.1041   Epoch: 8   Global Step: 91340   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:27:23,297-Speed 5510.72 samples/sec   Loss 6.2442   LearningRate 0.1041   Epoch: 8   Global Step: 91350   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:27:30,809-Speed 5454.53 samples/sec   Loss 6.2770   LearningRate 0.1040   Epoch: 8   Global Step: 91360   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:27:38,251-Speed 5505.23 samples/sec   Loss 6.2990   LearningRate 0.1040   Epoch: 8   Global Step: 91370   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:27:45,670-Speed 5521.61 samples/sec   Loss 6.3120   LearningRate 0.1040   Epoch: 8   Global Step: 91380   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:27:53,104-Speed 5510.31 samples/sec   Loss 6.2560   LearningRate 0.1040   Epoch: 8   Global Step: 91390   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:28:00,527-Speed 5518.67 samples/sec   Loss 6.2076   LearningRate 0.1040   Epoch: 8   Global Step: 91400   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:28:07,952-Speed 5517.76 samples/sec   Loss 6.2169   LearningRate 0.1040   Epoch: 8   Global Step: 91410   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:28:15,365-Speed 5525.60 samples/sec   Loss 6.2184   LearningRate 0.1039   Epoch: 8   Global Step: 91420   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:28:22,765-Speed 5535.77 samples/sec   Loss 6.2977   LearningRate 0.1039   Epoch: 8   Global Step: 91430   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:28:30,172-Speed 5530.99 samples/sec   Loss 6.3111   LearningRate 0.1039   Epoch: 8   Global Step: 91440   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:28:37,659-Speed 5472.10 samples/sec   Loss 6.2831   LearningRate 0.1039   Epoch: 8   Global Step: 91450   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:28:45,072-Speed 5526.01 samples/sec   Loss 6.2973   LearningRate 0.1039   Epoch: 8   Global Step: 91460   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:28:52,499-Speed 5515.84 samples/sec   Loss 6.3061   LearningRate 0.1038   Epoch: 8   Global Step: 91470   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:28:59,938-Speed 5506.88 samples/sec   Loss 6.2954   LearningRate 0.1038   Epoch: 8   Global Step: 91480   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:29:07,419-Speed 5476.27 samples/sec   Loss 6.2270   LearningRate 0.1038   Epoch: 8   Global Step: 91490   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:29:14,884-Speed 5487.52 samples/sec   Loss 6.3000   LearningRate 0.1038   Epoch: 8   Global Step: 91500   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:29:22,292-Speed 5529.97 samples/sec   Loss 6.2513   LearningRate 0.1038   Epoch: 8   Global Step: 91510   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:29:29,715-Speed 5518.56 samples/sec   Loss 6.2636   LearningRate 0.1038   Epoch: 8   Global Step: 91520   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:29:37,181-Speed 5487.21 samples/sec   Loss 6.2848   LearningRate 0.1037   Epoch: 8   Global Step: 91530   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:29:44,613-Speed 5511.81 samples/sec   Loss 6.2905   LearningRate 0.1037   Epoch: 8   Global Step: 91540   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:29:52,036-Speed 5518.65 samples/sec   Loss 6.3115   LearningRate 0.1037   Epoch: 8   Global Step: 91550   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:29:59,516-Speed 5476.92 samples/sec   Loss 6.3707   LearningRate 0.1037   Epoch: 8   Global Step: 91560   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:30:06,944-Speed 5515.02 samples/sec   Loss 6.2532   LearningRate 0.1037   Epoch: 8   Global Step: 91570   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:30:14,417-Speed 5481.61 samples/sec   Loss 6.2800   LearningRate 0.1036   Epoch: 8   Global Step: 91580   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:30:21,969-Speed 5424.52 samples/sec   Loss 6.2478   LearningRate 0.1036   Epoch: 8   Global Step: 91590   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:30:29,384-Speed 5524.90 samples/sec   Loss 6.2577   LearningRate 0.1036   Epoch: 8   Global Step: 91600   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:30:36,803-Speed 5522.24 samples/sec   Loss 6.2241   LearningRate 0.1036   Epoch: 8   Global Step: 91610   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 15:30:44,194-Speed 5542.58 samples/sec   Loss 6.2363   LearningRate 0.1036   Epoch: 8   Global Step: 91620   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:30:51,583-Speed 5544.27 samples/sec   Loss 6.2836   LearningRate 0.1036   Epoch: 8   Global Step: 91630   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:30:59,111-Speed 5441.54 samples/sec   Loss 6.3297   LearningRate 0.1035   Epoch: 8   Global Step: 91640   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:31:06,513-Speed 5534.10 samples/sec   Loss 6.2977   LearningRate 0.1035   Epoch: 8   Global Step: 91650   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:31:13,918-Speed 5532.44 samples/sec   Loss 6.2006   LearningRate 0.1035   Epoch: 8   Global Step: 91660   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:31:21,339-Speed 5520.40 samples/sec   Loss 6.2383   LearningRate 0.1035   Epoch: 8   Global Step: 91670   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:31:28,761-Speed 5519.16 samples/sec   Loss 6.2726   LearningRate 0.1035   Epoch: 8   Global Step: 91680   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:31:36,176-Speed 5524.80 samples/sec   Loss 6.3024   LearningRate 0.1035   Epoch: 8   Global Step: 91690   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:31:43,602-Speed 5516.54 samples/sec   Loss 6.2806   LearningRate 0.1034   Epoch: 8   Global Step: 91700   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:31:50,986-Speed 5548.38 samples/sec   Loss 6.2568   LearningRate 0.1034   Epoch: 8   Global Step: 91710   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 15:31:58,432-Speed 5501.60 samples/sec   Loss 6.2321   LearningRate 0.1034   Epoch: 8   Global Step: 91720   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 15:32:06,009-Speed 5406.23 samples/sec   Loss 6.2570   LearningRate 0.1034   Epoch: 8   Global Step: 91730   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 15:32:13,408-Speed 5536.75 samples/sec   Loss 6.3544   LearningRate 0.1034   Epoch: 8   Global Step: 91740   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 15:32:20,826-Speed 5522.60 samples/sec   Loss 6.2821   LearningRate 0.1033   Epoch: 8   Global Step: 91750   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:32:28,257-Speed 5513.06 samples/sec   Loss 6.3301   LearningRate 0.1033   Epoch: 8   Global Step: 91760   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:32:35,666-Speed 5528.66 samples/sec   Loss 6.2232   LearningRate 0.1033   Epoch: 8   Global Step: 91770   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:32:43,139-Speed 5482.31 samples/sec   Loss 6.3090   LearningRate 0.1033   Epoch: 8   Global Step: 91780   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:32:50,567-Speed 5514.66 samples/sec   Loss 6.2921   LearningRate 0.1033   Epoch: 8   Global Step: 91790   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:32:58,022-Speed 5495.39 samples/sec   Loss 6.2909   LearningRate 0.1033   Epoch: 8   Global Step: 91800   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:33:05,541-Speed 5448.02 samples/sec   Loss 6.1989   LearningRate 0.1032   Epoch: 8   Global Step: 91810   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:33:12,933-Speed 5542.51 samples/sec   Loss 6.2928   LearningRate 0.1032   Epoch: 8   Global Step: 91820   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:33:20,387-Speed 5495.57 samples/sec   Loss 6.2368   LearningRate 0.1032   Epoch: 8   Global Step: 91830   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:33:27,811-Speed 5517.82 samples/sec   Loss 6.3148   LearningRate 0.1032   Epoch: 8   Global Step: 91840   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:33:35,261-Speed 5498.87 samples/sec   Loss 6.2618   LearningRate 0.1032   Epoch: 8   Global Step: 91850   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:33:42,720-Speed 5492.02 samples/sec   Loss 6.2479   LearningRate 0.1031   Epoch: 8   Global Step: 91860   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:33:50,139-Speed 5521.92 samples/sec   Loss 6.3346   LearningRate 0.1031   Epoch: 8   Global Step: 91870   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:33:57,584-Speed 5501.68 samples/sec   Loss 6.2762   LearningRate 0.1031   Epoch: 8   Global Step: 91880   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:34:05,037-Speed 5497.06 samples/sec   Loss 6.2637   LearningRate 0.1031   Epoch: 8   Global Step: 91890   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:34:12,527-Speed 5469.14 samples/sec   Loss 6.2077   LearningRate 0.1031   Epoch: 8   Global Step: 91900   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:34:19,941-Speed 5525.63 samples/sec   Loss 6.2496   LearningRate 0.1031   Epoch: 8   Global Step: 91910   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:34:27,364-Speed 5518.62 samples/sec   Loss 6.1925   LearningRate 0.1030   Epoch: 8   Global Step: 91920   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:34:34,788-Speed 5518.01 samples/sec   Loss 6.2479   LearningRate 0.1030   Epoch: 8   Global Step: 91930   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:34:42,462-Speed 5338.25 samples/sec   Loss 6.2613   LearningRate 0.1030   Epoch: 8   Global Step: 91940   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:34:49,873-Speed 5527.82 samples/sec   Loss 6.2601   LearningRate 0.1030   Epoch: 8   Global Step: 91950   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:34:57,294-Speed 5520.08 samples/sec   Loss 6.2287   LearningRate 0.1030   Epoch: 8   Global Step: 91960   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:35:04,728-Speed 5510.46 samples/sec   Loss 6.2270   LearningRate 0.1030   Epoch: 8   Global Step: 91970   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:35:12,170-Speed 5504.91 samples/sec   Loss 6.2496   LearningRate 0.1029   Epoch: 8   Global Step: 91980   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:35:19,576-Speed 5531.59 samples/sec   Loss 6.2186   LearningRate 0.1029   Epoch: 8   Global Step: 91990   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:35:27,171-Speed 5393.36 samples/sec   Loss 6.2122   LearningRate 0.1029   Epoch: 8   Global Step: 92000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:36:11,446-[lfw][92000]XNorm: 23.362174
Training: 2022-01-08 15:36:11,447-[lfw][92000]Accuracy-Flip: 0.99733+-0.00260
Training: 2022-01-08 15:36:11,447-[lfw][92000]Accuracy-Highest: 0.99817
Training: 2022-01-08 15:37:03,307-[cfp_fp][92000]XNorm: 21.384648
Training: 2022-01-08 15:37:03,307-[cfp_fp][92000]Accuracy-Flip: 0.98800+-0.00466
Training: 2022-01-08 15:37:03,308-[cfp_fp][92000]Accuracy-Highest: 0.98814
Training: 2022-01-08 15:37:49,100-[agedb_30][92000]XNorm: 23.008124
Training: 2022-01-08 15:37:49,101-[agedb_30][92000]Accuracy-Flip: 0.97750+-0.00676
Training: 2022-01-08 15:37:49,102-[agedb_30][92000]Accuracy-Highest: 0.97833
Training: 2022-01-08 15:37:56,589-Speed 274.14 samples/sec   Loss 6.2756   LearningRate 0.1029   Epoch: 8   Global Step: 92010   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:38:03,998-Speed 5529.26 samples/sec   Loss 6.2044   LearningRate 0.1029   Epoch: 8   Global Step: 92020   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:38:11,386-Speed 5545.01 samples/sec   Loss 6.2701   LearningRate 0.1028   Epoch: 8   Global Step: 92030   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:38:18,757-Speed 5558.25 samples/sec   Loss 6.2542   LearningRate 0.1028   Epoch: 8   Global Step: 92040   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:38:26,150-Speed 5540.83 samples/sec   Loss 6.2458   LearningRate 0.1028   Epoch: 8   Global Step: 92050   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:38:33,555-Speed 5531.98 samples/sec   Loss 6.2544   LearningRate 0.1028   Epoch: 8   Global Step: 92060   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:38:40,991-Speed 5509.31 samples/sec   Loss 6.2489   LearningRate 0.1028   Epoch: 8   Global Step: 92070   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:38:48,497-Speed 5457.66 samples/sec   Loss 6.2244   LearningRate 0.1028   Epoch: 8   Global Step: 92080   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:38:55,900-Speed 5533.68 samples/sec   Loss 6.1725   LearningRate 0.1027   Epoch: 8   Global Step: 92090   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:39:03,416-Speed 5450.99 samples/sec   Loss 6.2601   LearningRate 0.1027   Epoch: 8   Global Step: 92100   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:39:10,812-Speed 5539.31 samples/sec   Loss 6.2976   LearningRate 0.1027   Epoch: 8   Global Step: 92110   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:39:18,193-Speed 5549.20 samples/sec   Loss 6.2722   LearningRate 0.1027   Epoch: 8   Global Step: 92120   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:39:25,616-Speed 5519.41 samples/sec   Loss 6.2951   LearningRate 0.1027   Epoch: 8   Global Step: 92130   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:39:33,028-Speed 5526.62 samples/sec   Loss 6.2833   LearningRate 0.1026   Epoch: 8   Global Step: 92140   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:39:40,457-Speed 5514.31 samples/sec   Loss 6.2625   LearningRate 0.1026   Epoch: 8   Global Step: 92150   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:39:48,003-Speed 5428.55 samples/sec   Loss 6.2666   LearningRate 0.1026   Epoch: 8   Global Step: 92160   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:39:55,463-Speed 5491.81 samples/sec   Loss 6.2344   LearningRate 0.1026   Epoch: 8   Global Step: 92170   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:40:03,082-Speed 5376.79 samples/sec   Loss 6.2526   LearningRate 0.1026   Epoch: 8   Global Step: 92180   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:40:10,525-Speed 5504.42 samples/sec   Loss 6.2552   LearningRate 0.1026   Epoch: 8   Global Step: 92190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:40:17,967-Speed 5504.22 samples/sec   Loss 6.2046   LearningRate 0.1025   Epoch: 8   Global Step: 92200   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:40:25,385-Speed 5522.03 samples/sec   Loss 6.2740   LearningRate 0.1025   Epoch: 8   Global Step: 92210   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:40:32,857-Speed 5483.01 samples/sec   Loss 6.2386   LearningRate 0.1025   Epoch: 8   Global Step: 92220   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:40:40,388-Speed 5439.68 samples/sec   Loss 6.2194   LearningRate 0.1025   Epoch: 8   Global Step: 92230   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:40:47,880-Speed 5468.04 samples/sec   Loss 6.2733   LearningRate 0.1025   Epoch: 8   Global Step: 92240   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 15:40:55,461-Speed 5403.86 samples/sec   Loss 6.2406   LearningRate 0.1025   Epoch: 8   Global Step: 92250   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 15:41:03,066-Speed 5387.33 samples/sec   Loss 6.1955   LearningRate 0.1024   Epoch: 8   Global Step: 92260   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 15:41:10,559-Speed 5467.58 samples/sec   Loss 6.2620   LearningRate 0.1024   Epoch: 8   Global Step: 92270   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 15:41:18,088-Speed 5440.26 samples/sec   Loss 6.2542   LearningRate 0.1024   Epoch: 8   Global Step: 92280   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:41:25,608-Speed 5447.70 samples/sec   Loss 6.2994   LearningRate 0.1024   Epoch: 8   Global Step: 92290   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:41:33,062-Speed 5495.88 samples/sec   Loss 6.2479   LearningRate 0.1024   Epoch: 8   Global Step: 92300   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:41:40,736-Speed 5338.35 samples/sec   Loss 6.2063   LearningRate 0.1023   Epoch: 8   Global Step: 92310   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:41:48,183-Speed 5500.87 samples/sec   Loss 6.2487   LearningRate 0.1023   Epoch: 8   Global Step: 92320   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:41:55,600-Speed 5522.84 samples/sec   Loss 6.2418   LearningRate 0.1023   Epoch: 8   Global Step: 92330   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:42:03,079-Speed 5477.48 samples/sec   Loss 6.3139   LearningRate 0.1023   Epoch: 8   Global Step: 92340   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:42:10,518-Speed 5507.10 samples/sec   Loss 6.2614   LearningRate 0.1023   Epoch: 8   Global Step: 92350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:42:17,959-Speed 5505.15 samples/sec   Loss 6.1794   LearningRate 0.1023   Epoch: 8   Global Step: 92360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:42:25,331-Speed 5556.84 samples/sec   Loss 6.2759   LearningRate 0.1022   Epoch: 8   Global Step: 92370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:42:32,830-Speed 5462.68 samples/sec   Loss 6.2662   LearningRate 0.1022   Epoch: 8   Global Step: 92380   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 15:42:40,322-Speed 5468.45 samples/sec   Loss 6.1969   LearningRate 0.1022   Epoch: 8   Global Step: 92390   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 15:42:47,750-Speed 5515.19 samples/sec   Loss 6.2643   LearningRate 0.1022   Epoch: 8   Global Step: 92400   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:42:55,157-Speed 5530.49 samples/sec   Loss 6.2480   LearningRate 0.1022   Epoch: 8   Global Step: 92410   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:43:02,556-Speed 5536.70 samples/sec   Loss 6.1921   LearningRate 0.1021   Epoch: 8   Global Step: 92420   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:43:10,019-Speed 5489.10 samples/sec   Loss 6.2319   LearningRate 0.1021   Epoch: 8   Global Step: 92430   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:43:17,420-Speed 5535.67 samples/sec   Loss 6.2139   LearningRate 0.1021   Epoch: 8   Global Step: 92440   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:43:24,838-Speed 5522.24 samples/sec   Loss 6.2780   LearningRate 0.1021   Epoch: 8   Global Step: 92450   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:43:32,260-Speed 5518.98 samples/sec   Loss 6.2116   LearningRate 0.1021   Epoch: 8   Global Step: 92460   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:43:39,694-Speed 5511.21 samples/sec   Loss 6.1688   LearningRate 0.1021   Epoch: 8   Global Step: 92470   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:43:47,167-Speed 5481.76 samples/sec   Loss 6.2342   LearningRate 0.1020   Epoch: 8   Global Step: 92480   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:43:54,748-Speed 5403.75 samples/sec   Loss 6.2426   LearningRate 0.1020   Epoch: 8   Global Step: 92490   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:44:02,215-Speed 5485.86 samples/sec   Loss 6.1802   LearningRate 0.1020   Epoch: 8   Global Step: 92500   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:44:09,701-Speed 5472.43 samples/sec   Loss 6.2384   LearningRate 0.1020   Epoch: 8   Global Step: 92510   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:44:17,083-Speed 5549.33 samples/sec   Loss 6.2530   LearningRate 0.1020   Epoch: 8   Global Step: 92520   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:44:24,511-Speed 5515.26 samples/sec   Loss 6.1435   LearningRate 0.1020   Epoch: 8   Global Step: 92530   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:44:31,941-Speed 5513.46 samples/sec   Loss 6.2235   LearningRate 0.1019   Epoch: 8   Global Step: 92540   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:44:39,439-Speed 5463.50 samples/sec   Loss 6.3148   LearningRate 0.1019   Epoch: 8   Global Step: 92550   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:44:46,942-Speed 5460.09 samples/sec   Loss 6.2224   LearningRate 0.1019   Epoch: 8   Global Step: 92560   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:44:54,412-Speed 5484.14 samples/sec   Loss 6.2189   LearningRate 0.1019   Epoch: 8   Global Step: 92570   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:45:01,882-Speed 5483.60 samples/sec   Loss 6.2702   LearningRate 0.1019   Epoch: 8   Global Step: 92580   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:45:09,331-Speed 5499.05 samples/sec   Loss 6.1466   LearningRate 0.1018   Epoch: 8   Global Step: 92590   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:45:16,786-Speed 5495.83 samples/sec   Loss 6.2030   LearningRate 0.1018   Epoch: 8   Global Step: 92600   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:45:24,201-Speed 5524.88 samples/sec   Loss 6.1671   LearningRate 0.1018   Epoch: 8   Global Step: 92610   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:45:31,684-Speed 5474.18 samples/sec   Loss 6.1632   LearningRate 0.1018   Epoch: 8   Global Step: 92620   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:45:39,136-Speed 5496.84 samples/sec   Loss 6.1869   LearningRate 0.1018   Epoch: 8   Global Step: 92630   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:45:46,581-Speed 5502.81 samples/sec   Loss 6.2046   LearningRate 0.1018   Epoch: 8   Global Step: 92640   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 15:45:54,145-Speed 5415.76 samples/sec   Loss 6.1935   LearningRate 0.1017   Epoch: 8   Global Step: 92650   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 15:46:01,560-Speed 5525.02 samples/sec   Loss 6.2388   LearningRate 0.1017   Epoch: 8   Global Step: 92660   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 15:46:09,066-Speed 5457.56 samples/sec   Loss 6.1962   LearningRate 0.1017   Epoch: 8   Global Step: 92670   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 15:46:16,645-Speed 5405.12 samples/sec   Loss 6.1839   LearningRate 0.1017   Epoch: 8   Global Step: 92680   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:46:24,158-Speed 5452.68 samples/sec   Loss 6.2402   LearningRate 0.1017   Epoch: 8   Global Step: 92690   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:46:31,677-Speed 5447.88 samples/sec   Loss 6.2209   LearningRate 0.1017   Epoch: 8   Global Step: 92700   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:46:39,085-Speed 5530.46 samples/sec   Loss 6.2271   LearningRate 0.1016   Epoch: 8   Global Step: 92710   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:46:46,473-Speed 5544.69 samples/sec   Loss 6.2241   LearningRate 0.1016   Epoch: 8   Global Step: 92720   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:46:53,883-Speed 5528.80 samples/sec   Loss 6.2454   LearningRate 0.1016   Epoch: 8   Global Step: 92730   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:47:01,268-Speed 5546.66 samples/sec   Loss 6.2448   LearningRate 0.1016   Epoch: 8   Global Step: 92740   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:47:08,730-Speed 5489.96 samples/sec   Loss 6.2100   LearningRate 0.1016   Epoch: 8   Global Step: 92750   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:47:16,143-Speed 5526.30 samples/sec   Loss 6.2395   LearningRate 0.1015   Epoch: 8   Global Step: 92760   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:47:23,563-Speed 5520.77 samples/sec   Loss 6.2399   LearningRate 0.1015   Epoch: 8   Global Step: 92770   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:47:31,013-Speed 5499.10 samples/sec   Loss 6.2365   LearningRate 0.1015   Epoch: 8   Global Step: 92780   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:47:38,409-Speed 5539.06 samples/sec   Loss 6.2242   LearningRate 0.1015   Epoch: 8   Global Step: 92790   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:47:45,839-Speed 5512.93 samples/sec   Loss 6.1713   LearningRate 0.1015   Epoch: 8   Global Step: 92800   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:47:53,367-Speed 5442.59 samples/sec   Loss 6.2161   LearningRate 0.1015   Epoch: 8   Global Step: 92810   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:48:00,833-Speed 5486.79 samples/sec   Loss 6.2398   LearningRate 0.1014   Epoch: 8   Global Step: 92820   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:48:08,343-Speed 5454.44 samples/sec   Loss 6.2677   LearningRate 0.1014   Epoch: 8   Global Step: 92830   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:48:15,769-Speed 5516.23 samples/sec   Loss 6.2198   LearningRate 0.1014   Epoch: 8   Global Step: 92840   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:48:23,216-Speed 5501.65 samples/sec   Loss 6.1988   LearningRate 0.1014   Epoch: 8   Global Step: 92850   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:48:30,676-Speed 5491.03 samples/sec   Loss 6.1951   LearningRate 0.1014   Epoch: 8   Global Step: 92860   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:48:38,268-Speed 5395.46 samples/sec   Loss 6.2436   LearningRate 0.1014   Epoch: 8   Global Step: 92870   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:48:45,732-Speed 5489.06 samples/sec   Loss 6.2154   LearningRate 0.1013   Epoch: 8   Global Step: 92880   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:48:53,153-Speed 5520.27 samples/sec   Loss 6.2579   LearningRate 0.1013   Epoch: 8   Global Step: 92890   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:49:00,589-Speed 5509.14 samples/sec   Loss 6.2109   LearningRate 0.1013   Epoch: 8   Global Step: 92900   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:49:07,993-Speed 5532.57 samples/sec   Loss 6.2246   LearningRate 0.1013   Epoch: 8   Global Step: 92910   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:49:15,422-Speed 5514.50 samples/sec   Loss 6.2047   LearningRate 0.1013   Epoch: 8   Global Step: 92920   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:49:22,851-Speed 5513.98 samples/sec   Loss 6.2211   LearningRate 0.1012   Epoch: 8   Global Step: 92930   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:49:30,320-Speed 5485.36 samples/sec   Loss 6.2332   LearningRate 0.1012   Epoch: 8   Global Step: 92940   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:49:37,765-Speed 5501.98 samples/sec   Loss 6.1909   LearningRate 0.1012   Epoch: 8   Global Step: 92950   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:49:45,198-Speed 5511.31 samples/sec   Loss 6.1936   LearningRate 0.1012   Epoch: 8   Global Step: 92960   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:49:52,635-Speed 5507.97 samples/sec   Loss 6.2138   LearningRate 0.1012   Epoch: 8   Global Step: 92970   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:50:00,098-Speed 5489.91 samples/sec   Loss 6.2080   LearningRate 0.1012   Epoch: 8   Global Step: 92980   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 15:50:07,582-Speed 5473.24 samples/sec   Loss 6.2877   LearningRate 0.1011   Epoch: 8   Global Step: 92990   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:50:15,027-Speed 5502.32 samples/sec   Loss 6.1995   LearningRate 0.1011   Epoch: 8   Global Step: 93000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:50:22,520-Speed 5467.50 samples/sec   Loss 6.1979   LearningRate 0.1011   Epoch: 8   Global Step: 93010   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:50:29,950-Speed 5514.08 samples/sec   Loss 6.2157   LearningRate 0.1011   Epoch: 8   Global Step: 93020   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:50:37,421-Speed 5482.77 samples/sec   Loss 6.2433   LearningRate 0.1011   Epoch: 8   Global Step: 93030   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:50:44,788-Speed 5560.51 samples/sec   Loss 6.2616   LearningRate 0.1011   Epoch: 8   Global Step: 93040   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:50:52,188-Speed 5536.48 samples/sec   Loss 6.1631   LearningRate 0.1010   Epoch: 8   Global Step: 93050   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:50:59,561-Speed 5556.29 samples/sec   Loss 6.1922   LearningRate 0.1010   Epoch: 8   Global Step: 93060   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:51:06,965-Speed 5533.01 samples/sec   Loss 6.1849   LearningRate 0.1010   Epoch: 8   Global Step: 93070   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:51:14,366-Speed 5534.60 samples/sec   Loss 6.1784   LearningRate 0.1010   Epoch: 8   Global Step: 93080   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:51:21,828-Speed 5489.57 samples/sec   Loss 6.2383   LearningRate 0.1010   Epoch: 8   Global Step: 93090   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 15:51:29,311-Speed 5475.00 samples/sec   Loss 6.2376   LearningRate 0.1009   Epoch: 8   Global Step: 93100   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 15:51:36,701-Speed 5543.89 samples/sec   Loss 6.1708   LearningRate 0.1009   Epoch: 8   Global Step: 93110   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:51:44,154-Speed 5496.13 samples/sec   Loss 6.1590   LearningRate 0.1009   Epoch: 8   Global Step: 93120   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:51:51,581-Speed 5515.24 samples/sec   Loss 6.1857   LearningRate 0.1009   Epoch: 8   Global Step: 93130   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:51:59,051-Speed 5484.32 samples/sec   Loss 6.1947   LearningRate 0.1009   Epoch: 8   Global Step: 93140   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:52:06,493-Speed 5504.90 samples/sec   Loss 6.1821   LearningRate 0.1009   Epoch: 8   Global Step: 93150   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:52:13,921-Speed 5515.08 samples/sec   Loss 6.2036   LearningRate 0.1008   Epoch: 8   Global Step: 93160   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:52:21,375-Speed 5495.79 samples/sec   Loss 6.2568   LearningRate 0.1008   Epoch: 8   Global Step: 93170   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:52:28,860-Speed 5472.88 samples/sec   Loss 6.1668   LearningRate 0.1008   Epoch: 8   Global Step: 93180   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:52:36,292-Speed 5512.03 samples/sec   Loss 6.2083   LearningRate 0.1008   Epoch: 8   Global Step: 93190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:52:43,755-Speed 5488.96 samples/sec   Loss 6.1841   LearningRate 0.1008   Epoch: 8   Global Step: 93200   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:52:51,180-Speed 5517.33 samples/sec   Loss 6.1954   LearningRate 0.1007   Epoch: 8   Global Step: 93210   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:52:58,646-Speed 5487.00 samples/sec   Loss 6.2418   LearningRate 0.1007   Epoch: 8   Global Step: 93220   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:53:06,057-Speed 5528.43 samples/sec   Loss 6.1692   LearningRate 0.1007   Epoch: 8   Global Step: 93230   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:53:13,501-Speed 5502.66 samples/sec   Loss 6.1336   LearningRate 0.1007   Epoch: 8   Global Step: 93240   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:53:20,935-Speed 5510.51 samples/sec   Loss 6.2212   LearningRate 0.1007   Epoch: 8   Global Step: 93250   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:53:28,421-Speed 5472.59 samples/sec   Loss 6.2281   LearningRate 0.1007   Epoch: 8   Global Step: 93260   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:53:35,916-Speed 5465.87 samples/sec   Loss 6.1515   LearningRate 0.1006   Epoch: 8   Global Step: 93270   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:53:43,403-Speed 5471.58 samples/sec   Loss 6.1919   LearningRate 0.1006   Epoch: 8   Global Step: 93280   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:53:50,852-Speed 5499.08 samples/sec   Loss 6.1837   LearningRate 0.1006   Epoch: 8   Global Step: 93290   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:53:58,318-Speed 5487.07 samples/sec   Loss 6.1565   LearningRate 0.1006   Epoch: 8   Global Step: 93300   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:54:05,871-Speed 5424.13 samples/sec   Loss 6.1884   LearningRate 0.1006   Epoch: 8   Global Step: 93310   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:54:13,283-Speed 5526.85 samples/sec   Loss 6.1881   LearningRate 0.1006   Epoch: 8   Global Step: 93320   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:54:37,674-Speed 1679.33 samples/sec   Loss 6.1676   LearningRate 0.1005   Epoch: 9   Global Step: 93330   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:54:45,137-Speed 5489.27 samples/sec   Loss 6.1681   LearningRate 0.1005   Epoch: 9   Global Step: 93340   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:54:52,590-Speed 5496.90 samples/sec   Loss 6.1440   LearningRate 0.1005   Epoch: 9   Global Step: 93350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:55:00,024-Speed 5510.23 samples/sec   Loss 6.1849   LearningRate 0.1005   Epoch: 9   Global Step: 93360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:55:07,489-Speed 5488.22 samples/sec   Loss 6.1957   LearningRate 0.1005   Epoch: 9   Global Step: 93370   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:55:14,916-Speed 5515.84 samples/sec   Loss 6.1764   LearningRate 0.1005   Epoch: 9   Global Step: 93380   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:55:22,344-Speed 5515.22 samples/sec   Loss 6.1794   LearningRate 0.1004   Epoch: 9   Global Step: 93390   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:55:29,775-Speed 5512.37 samples/sec   Loss 6.1768   LearningRate 0.1004   Epoch: 9   Global Step: 93400   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:55:37,182-Speed 5530.73 samples/sec   Loss 6.1743   LearningRate 0.1004   Epoch: 9   Global Step: 93410   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:55:44,622-Speed 5506.70 samples/sec   Loss 6.1646   LearningRate 0.1004   Epoch: 9   Global Step: 93420   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:55:52,189-Speed 5413.60 samples/sec   Loss 6.1472   LearningRate 0.1004   Epoch: 9   Global Step: 93430   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:55:59,690-Speed 5460.81 samples/sec   Loss 6.1888   LearningRate 0.1003   Epoch: 9   Global Step: 93440   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:56:07,147-Speed 5493.58 samples/sec   Loss 6.1823   LearningRate 0.1003   Epoch: 9   Global Step: 93450   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:56:14,584-Speed 5508.61 samples/sec   Loss 6.1793   LearningRate 0.1003   Epoch: 9   Global Step: 93460   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:56:22,143-Speed 5419.69 samples/sec   Loss 6.2060   LearningRate 0.1003   Epoch: 9   Global Step: 93470   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:56:29,595-Speed 5497.06 samples/sec   Loss 6.2294   LearningRate 0.1003   Epoch: 9   Global Step: 93480   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:56:37,034-Speed 5506.94 samples/sec   Loss 6.1404   LearningRate 0.1003   Epoch: 9   Global Step: 93490   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:56:44,616-Speed 5402.89 samples/sec   Loss 6.1324   LearningRate 0.1002   Epoch: 9   Global Step: 93500   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:56:52,245-Speed 5370.39 samples/sec   Loss 6.1264   LearningRate 0.1002   Epoch: 9   Global Step: 93510   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:56:59,865-Speed 5375.69 samples/sec   Loss 6.1029   LearningRate 0.1002   Epoch: 9   Global Step: 93520   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:57:07,424-Speed 5419.14 samples/sec   Loss 6.1518   LearningRate 0.1002   Epoch: 9   Global Step: 93530   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:57:15,010-Speed 5400.28 samples/sec   Loss 6.0935   LearningRate 0.1002   Epoch: 9   Global Step: 93540   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:57:22,583-Speed 5409.39 samples/sec   Loss 6.1408   LearningRate 0.1002   Epoch: 9   Global Step: 93550   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:57:30,179-Speed 5393.07 samples/sec   Loss 6.1636   LearningRate 0.1001   Epoch: 9   Global Step: 93560   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:57:37,652-Speed 5481.96 samples/sec   Loss 6.1781   LearningRate 0.1001   Epoch: 9   Global Step: 93570   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:57:45,198-Speed 5428.08 samples/sec   Loss 6.1298   LearningRate 0.1001   Epoch: 9   Global Step: 93580   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:57:52,855-Speed 5350.96 samples/sec   Loss 6.1463   LearningRate 0.1001   Epoch: 9   Global Step: 93590   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:58:00,434-Speed 5404.88 samples/sec   Loss 6.2238   LearningRate 0.1001   Epoch: 9   Global Step: 93600   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:58:08,109-Speed 5337.60 samples/sec   Loss 6.0986   LearningRate 0.1000   Epoch: 9   Global Step: 93610   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:58:15,871-Speed 5277.37 samples/sec   Loss 6.2121   LearningRate 0.1000   Epoch: 9   Global Step: 93620   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 15:58:23,532-Speed 5347.34 samples/sec   Loss 6.2374   LearningRate 0.1000   Epoch: 9   Global Step: 93630   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:58:31,214-Speed 5332.69 samples/sec   Loss 6.1838   LearningRate 0.1000   Epoch: 9   Global Step: 93640   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:58:38,848-Speed 5365.83 samples/sec   Loss 6.1381   LearningRate 0.1000   Epoch: 9   Global Step: 93650   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:58:46,508-Speed 5348.09 samples/sec   Loss 6.1735   LearningRate 0.1000   Epoch: 9   Global Step: 93660   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:58:54,140-Speed 5367.89 samples/sec   Loss 6.1107   LearningRate 0.0999   Epoch: 9   Global Step: 93670   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:59:01,850-Speed 5312.90 samples/sec   Loss 6.1504   LearningRate 0.0999   Epoch: 9   Global Step: 93680   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:59:09,372-Speed 5446.04 samples/sec   Loss 6.1070   LearningRate 0.0999   Epoch: 9   Global Step: 93690   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:59:16,914-Speed 5431.51 samples/sec   Loss 6.1154   LearningRate 0.0999   Epoch: 9   Global Step: 93700   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:59:24,384-Speed 5483.93 samples/sec   Loss 6.1136   LearningRate 0.0999   Epoch: 9   Global Step: 93710   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:59:31,922-Speed 5435.97 samples/sec   Loss 6.1861   LearningRate 0.0999   Epoch: 9   Global Step: 93720   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:59:39,372-Speed 5498.18 samples/sec   Loss 6.1381   LearningRate 0.0998   Epoch: 9   Global Step: 93730   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 15:59:46,873-Speed 5460.99 samples/sec   Loss 6.1340   LearningRate 0.0998   Epoch: 9   Global Step: 93740   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 15:59:54,444-Speed 5411.18 samples/sec   Loss 6.1459   LearningRate 0.0998   Epoch: 9   Global Step: 93750   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:00:02,080-Speed 5364.59 samples/sec   Loss 6.1918   LearningRate 0.0998   Epoch: 9   Global Step: 93760   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:00:09,564-Speed 5474.01 samples/sec   Loss 6.0939   LearningRate 0.0998   Epoch: 9   Global Step: 93770   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:00:17,176-Speed 5381.21 samples/sec   Loss 6.1286   LearningRate 0.0997   Epoch: 9   Global Step: 93780   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:00:24,703-Speed 5442.37 samples/sec   Loss 6.0957   LearningRate 0.0997   Epoch: 9   Global Step: 93790   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:00:32,231-Speed 5442.42 samples/sec   Loss 6.1700   LearningRate 0.0997   Epoch: 9   Global Step: 93800   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:00:39,737-Speed 5457.63 samples/sec   Loss 6.1732   LearningRate 0.0997   Epoch: 9   Global Step: 93810   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:00:47,223-Speed 5472.00 samples/sec   Loss 6.1559   LearningRate 0.0997   Epoch: 9   Global Step: 93820   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:00:54,677-Speed 5496.01 samples/sec   Loss 6.1580   LearningRate 0.0997   Epoch: 9   Global Step: 93830   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:01:02,230-Speed 5423.56 samples/sec   Loss 6.1436   LearningRate 0.0996   Epoch: 9   Global Step: 93840   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:01:09,745-Speed 5451.28 samples/sec   Loss 6.1530   LearningRate 0.0996   Epoch: 9   Global Step: 93850   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:01:17,209-Speed 5488.08 samples/sec   Loss 6.1471   LearningRate 0.0996   Epoch: 9   Global Step: 93860   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:01:24,777-Speed 5413.21 samples/sec   Loss 6.1958   LearningRate 0.0996   Epoch: 9   Global Step: 93870   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:01:32,239-Speed 5490.73 samples/sec   Loss 6.1550   LearningRate 0.0996   Epoch: 9   Global Step: 93880   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:01:39,705-Speed 5486.72 samples/sec   Loss 6.2068   LearningRate 0.0996   Epoch: 9   Global Step: 93890   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:01:47,286-Speed 5403.48 samples/sec   Loss 6.1176   LearningRate 0.0995   Epoch: 9   Global Step: 93900   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:01:54,851-Speed 5415.06 samples/sec   Loss 6.1448   LearningRate 0.0995   Epoch: 9   Global Step: 93910   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:02:02,411-Speed 5418.99 samples/sec   Loss 6.1311   LearningRate 0.0995   Epoch: 9   Global Step: 93920   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:02:09,929-Speed 5449.12 samples/sec   Loss 6.1709   LearningRate 0.0995   Epoch: 9   Global Step: 93930   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:02:17,433-Speed 5458.63 samples/sec   Loss 6.1654   LearningRate 0.0995   Epoch: 9   Global Step: 93940   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:02:24,970-Speed 5435.11 samples/sec   Loss 6.2109   LearningRate 0.0994   Epoch: 9   Global Step: 93950   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:02:32,726-Speed 5281.73 samples/sec   Loss 6.1414   LearningRate 0.0994   Epoch: 9   Global Step: 93960   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:02:40,384-Speed 5349.94 samples/sec   Loss 6.1296   LearningRate 0.0994   Epoch: 9   Global Step: 93970   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:02:47,866-Speed 5474.83 samples/sec   Loss 6.1763   LearningRate 0.0994   Epoch: 9   Global Step: 93980   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:02:55,448-Speed 5402.81 samples/sec   Loss 6.1362   LearningRate 0.0994   Epoch: 9   Global Step: 93990   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:03:03,122-Speed 5338.08 samples/sec   Loss 6.0984   LearningRate 0.0994   Epoch: 9   Global Step: 94000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:03:47,050-[lfw][94000]XNorm: 23.219754
Training: 2022-01-08 16:03:47,051-[lfw][94000]Accuracy-Flip: 0.99783+-0.00269
Training: 2022-01-08 16:03:47,051-[lfw][94000]Accuracy-Highest: 0.99817
Training: 2022-01-08 16:04:38,930-[cfp_fp][94000]XNorm: 21.137395
Training: 2022-01-08 16:04:38,931-[cfp_fp][94000]Accuracy-Flip: 0.98800+-0.00600
Training: 2022-01-08 16:04:38,932-[cfp_fp][94000]Accuracy-Highest: 0.98814
Training: 2022-01-08 16:05:24,882-[agedb_30][94000]XNorm: 22.578027
Training: 2022-01-08 16:05:24,884-[agedb_30][94000]Accuracy-Flip: 0.97533+-0.00730
Training: 2022-01-08 16:05:24,884-[agedb_30][94000]Accuracy-Highest: 0.97833
Training: 2022-01-08 16:05:32,469-Speed 274.27 samples/sec   Loss 6.1263   LearningRate 0.0993   Epoch: 9   Global Step: 94010   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:05:39,943-Speed 5481.69 samples/sec   Loss 6.1672   LearningRate 0.0993   Epoch: 9   Global Step: 94020   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:05:47,413-Speed 5484.52 samples/sec   Loss 6.1518   LearningRate 0.0993   Epoch: 9   Global Step: 94030   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:05:54,877-Speed 5489.57 samples/sec   Loss 6.2016   LearningRate 0.0993   Epoch: 9   Global Step: 94040   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:06:02,342-Speed 5488.16 samples/sec   Loss 6.1367   LearningRate 0.0993   Epoch: 9   Global Step: 94050   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:06:09,927-Speed 5402.06 samples/sec   Loss 6.1719   LearningRate 0.0993   Epoch: 9   Global Step: 94060   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:06:17,429-Speed 5461.22 samples/sec   Loss 6.1579   LearningRate 0.0992   Epoch: 9   Global Step: 94070   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:06:24,928-Speed 5462.71 samples/sec   Loss 6.1808   LearningRate 0.0992   Epoch: 9   Global Step: 94080   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:06:32,381-Speed 5496.98 samples/sec   Loss 6.1626   LearningRate 0.0992   Epoch: 9   Global Step: 94090   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:06:39,865-Speed 5474.23 samples/sec   Loss 6.1935   LearningRate 0.0992   Epoch: 9   Global Step: 94100   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:06:47,293-Speed 5514.66 samples/sec   Loss 6.1007   LearningRate 0.0992   Epoch: 9   Global Step: 94110   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:06:55,027-Speed 5297.49 samples/sec   Loss 6.1218   LearningRate 0.0992   Epoch: 9   Global Step: 94120   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:07:02,634-Speed 5386.07 samples/sec   Loss 6.1792   LearningRate 0.0991   Epoch: 9   Global Step: 94130   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:07:10,064-Speed 5512.60 samples/sec   Loss 6.1486   LearningRate 0.0991   Epoch: 9   Global Step: 94140   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:07:17,562-Speed 5464.33 samples/sec   Loss 6.1231   LearningRate 0.0991   Epoch: 9   Global Step: 94150   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:07:25,093-Speed 5439.42 samples/sec   Loss 6.1228   LearningRate 0.0991   Epoch: 9   Global Step: 94160   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:07:32,496-Speed 5534.21 samples/sec   Loss 6.1331   LearningRate 0.0991   Epoch: 9   Global Step: 94170   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:07:39,907-Speed 5527.19 samples/sec   Loss 6.1488   LearningRate 0.0990   Epoch: 9   Global Step: 94180   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:07:47,469-Speed 5417.56 samples/sec   Loss 6.2034   LearningRate 0.0990   Epoch: 9   Global Step: 94190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:07:54,984-Speed 5451.51 samples/sec   Loss 6.2091   LearningRate 0.0990   Epoch: 9   Global Step: 94200   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:08:02,530-Speed 5428.56 samples/sec   Loss 6.1512   LearningRate 0.0990   Epoch: 9   Global Step: 94210   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:08:10,078-Speed 5427.40 samples/sec   Loss 6.1610   LearningRate 0.0990   Epoch: 9   Global Step: 94220   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:08:17,528-Speed 5498.48 samples/sec   Loss 6.1595   LearningRate 0.0990   Epoch: 9   Global Step: 94230   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:08:25,002-Speed 5481.15 samples/sec   Loss 6.1411   LearningRate 0.0989   Epoch: 9   Global Step: 94240   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:08:32,468-Speed 5486.74 samples/sec   Loss 6.1512   LearningRate 0.0989   Epoch: 9   Global Step: 94250   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:08:39,868-Speed 5536.56 samples/sec   Loss 6.1744   LearningRate 0.0989   Epoch: 9   Global Step: 94260   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:08:47,422-Speed 5422.85 samples/sec   Loss 6.1110   LearningRate 0.0989   Epoch: 9   Global Step: 94270   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:08:54,842-Speed 5520.58 samples/sec   Loss 6.1243   LearningRate 0.0989   Epoch: 9   Global Step: 94280   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:09:02,299-Speed 5494.06 samples/sec   Loss 6.1089   LearningRate 0.0989   Epoch: 9   Global Step: 94290   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:09:09,758-Speed 5491.74 samples/sec   Loss 6.1205   LearningRate 0.0988   Epoch: 9   Global Step: 94300   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:09:17,211-Speed 5497.41 samples/sec   Loss 6.1658   LearningRate 0.0988   Epoch: 9   Global Step: 94310   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:09:24,901-Speed 5326.56 samples/sec   Loss 6.1135   LearningRate 0.0988   Epoch: 9   Global Step: 94320   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:09:32,432-Speed 5439.81 samples/sec   Loss 6.1176   LearningRate 0.0988   Epoch: 9   Global Step: 94330   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:09:39,834-Speed 5534.52 samples/sec   Loss 6.1056   LearningRate 0.0988   Epoch: 9   Global Step: 94340   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:09:47,246-Speed 5526.78 samples/sec   Loss 6.1062   LearningRate 0.0987   Epoch: 9   Global Step: 94350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:09:54,959-Speed 5311.58 samples/sec   Loss 6.1418   LearningRate 0.0987   Epoch: 9   Global Step: 94360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:10:02,548-Speed 5397.72 samples/sec   Loss 6.1028   LearningRate 0.0987   Epoch: 9   Global Step: 94370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:10:10,090-Speed 5431.99 samples/sec   Loss 6.1149   LearningRate 0.0987   Epoch: 9   Global Step: 94380   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:10:17,604-Speed 5451.78 samples/sec   Loss 6.0728   LearningRate 0.0987   Epoch: 9   Global Step: 94390   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:10:25,216-Speed 5381.95 samples/sec   Loss 6.0875   LearningRate 0.0987   Epoch: 9   Global Step: 94400   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:10:32,639-Speed 5518.49 samples/sec   Loss 6.0878   LearningRate 0.0986   Epoch: 9   Global Step: 94410   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:10:40,131-Speed 5467.90 samples/sec   Loss 6.1932   LearningRate 0.0986   Epoch: 9   Global Step: 94420   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:10:47,588-Speed 5493.57 samples/sec   Loss 6.0749   LearningRate 0.0986   Epoch: 9   Global Step: 94430   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:10:55,088-Speed 5462.16 samples/sec   Loss 6.1191   LearningRate 0.0986   Epoch: 9   Global Step: 94440   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:11:02,574-Speed 5473.05 samples/sec   Loss 6.1010   LearningRate 0.0986   Epoch: 9   Global Step: 94450   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:11:10,131-Speed 5420.69 samples/sec   Loss 6.0994   LearningRate 0.0986   Epoch: 9   Global Step: 94460   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:11:17,683-Speed 5424.65 samples/sec   Loss 6.1199   LearningRate 0.0985   Epoch: 9   Global Step: 94470   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:11:25,158-Speed 5480.45 samples/sec   Loss 6.0858   LearningRate 0.0985   Epoch: 9   Global Step: 94480   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:11:32,596-Speed 5507.74 samples/sec   Loss 6.0832   LearningRate 0.0985   Epoch: 9   Global Step: 94490   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:11:40,078-Speed 5474.98 samples/sec   Loss 6.1610   LearningRate 0.0985   Epoch: 9   Global Step: 94500   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:11:47,607-Speed 5441.30 samples/sec   Loss 6.1821   LearningRate 0.0985   Epoch: 9   Global Step: 94510   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:11:55,085-Speed 5477.78 samples/sec   Loss 6.1298   LearningRate 0.0985   Epoch: 9   Global Step: 94520   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:12:02,536-Speed 5498.26 samples/sec   Loss 6.0916   LearningRate 0.0984   Epoch: 9   Global Step: 94530   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:12:09,987-Speed 5497.64 samples/sec   Loss 6.1150   LearningRate 0.0984   Epoch: 9   Global Step: 94540   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:12:17,486-Speed 5462.89 samples/sec   Loss 6.1563   LearningRate 0.0984   Epoch: 9   Global Step: 94550   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:12:24,983-Speed 5464.50 samples/sec   Loss 6.1623   LearningRate 0.0984   Epoch: 9   Global Step: 94560   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:12:32,453-Speed 5483.87 samples/sec   Loss 6.1614   LearningRate 0.0984   Epoch: 9   Global Step: 94570   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:12:39,907-Speed 5495.88 samples/sec   Loss 6.1470   LearningRate 0.0983   Epoch: 9   Global Step: 94580   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:12:47,389-Speed 5475.48 samples/sec   Loss 6.1175   LearningRate 0.0983   Epoch: 9   Global Step: 94590   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:12:54,791-Speed 5534.17 samples/sec   Loss 6.1358   LearningRate 0.0983   Epoch: 9   Global Step: 94600   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:13:02,333-Speed 5431.88 samples/sec   Loss 6.1171   LearningRate 0.0983   Epoch: 9   Global Step: 94610   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:13:09,877-Speed 5430.41 samples/sec   Loss 6.1154   LearningRate 0.0983   Epoch: 9   Global Step: 94620   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:13:17,404-Speed 5442.34 samples/sec   Loss 6.0347   LearningRate 0.0983   Epoch: 9   Global Step: 94630   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:13:24,812-Speed 5530.02 samples/sec   Loss 6.0873   LearningRate 0.0982   Epoch: 9   Global Step: 94640   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:13:32,298-Speed 5472.33 samples/sec   Loss 6.0570   LearningRate 0.0982   Epoch: 9   Global Step: 94650   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:13:39,789-Speed 5468.68 samples/sec   Loss 6.1250   LearningRate 0.0982   Epoch: 9   Global Step: 94660   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:13:47,200-Speed 5527.47 samples/sec   Loss 6.0990   LearningRate 0.0982   Epoch: 9   Global Step: 94670   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:13:54,654-Speed 5495.98 samples/sec   Loss 6.1206   LearningRate 0.0982   Epoch: 9   Global Step: 94680   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:14:02,114-Speed 5491.39 samples/sec   Loss 6.1095   LearningRate 0.0982   Epoch: 9   Global Step: 94690   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:14:09,604-Speed 5469.60 samples/sec   Loss 6.0939   LearningRate 0.0981   Epoch: 9   Global Step: 94700   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:14:17,050-Speed 5501.83 samples/sec   Loss 6.0287   LearningRate 0.0981   Epoch: 9   Global Step: 94710   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:14:24,516-Speed 5486.79 samples/sec   Loss 6.1136   LearningRate 0.0981   Epoch: 9   Global Step: 94720   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:14:31,947-Speed 5512.58 samples/sec   Loss 6.0627   LearningRate 0.0981   Epoch: 9   Global Step: 94730   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:14:39,591-Speed 5359.54 samples/sec   Loss 6.0941   LearningRate 0.0981   Epoch: 9   Global Step: 94740   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:14:47,076-Speed 5472.69 samples/sec   Loss 6.0464   LearningRate 0.0981   Epoch: 9   Global Step: 94750   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:14:54,474-Speed 5537.59 samples/sec   Loss 6.0621   LearningRate 0.0980   Epoch: 9   Global Step: 94760   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:15:01,927-Speed 5496.70 samples/sec   Loss 6.1435   LearningRate 0.0980   Epoch: 9   Global Step: 94770   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:15:09,434-Speed 5456.40 samples/sec   Loss 6.1008   LearningRate 0.0980   Epoch: 9   Global Step: 94780   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:15:16,878-Speed 5503.13 samples/sec   Loss 6.1326   LearningRate 0.0980   Epoch: 9   Global Step: 94790   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:15:24,428-Speed 5426.31 samples/sec   Loss 6.1192   LearningRate 0.0980   Epoch: 9   Global Step: 94800   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:15:31,866-Speed 5507.74 samples/sec   Loss 6.0359   LearningRate 0.0979   Epoch: 9   Global Step: 94810   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:15:39,391-Speed 5443.58 samples/sec   Loss 6.1056   LearningRate 0.0979   Epoch: 9   Global Step: 94820   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:15:46,972-Speed 5404.08 samples/sec   Loss 6.1144   LearningRate 0.0979   Epoch: 9   Global Step: 94830   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:15:54,524-Speed 5424.31 samples/sec   Loss 6.1161   LearningRate 0.0979   Epoch: 9   Global Step: 94840   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:16:02,056-Speed 5439.20 samples/sec   Loss 6.1358   LearningRate 0.0979   Epoch: 9   Global Step: 94850   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:16:09,592-Speed 5435.53 samples/sec   Loss 6.0472   LearningRate 0.0979   Epoch: 9   Global Step: 94860   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:16:17,161-Speed 5412.56 samples/sec   Loss 6.1548   LearningRate 0.0978   Epoch: 9   Global Step: 94870   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:16:24,667-Speed 5457.35 samples/sec   Loss 6.0017   LearningRate 0.0978   Epoch: 9   Global Step: 94880   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:16:32,109-Speed 5505.56 samples/sec   Loss 6.0621   LearningRate 0.0978   Epoch: 9   Global Step: 94890   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:16:39,608-Speed 5462.33 samples/sec   Loss 6.1026   LearningRate 0.0978   Epoch: 9   Global Step: 94900   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:16:47,054-Speed 5502.00 samples/sec   Loss 6.0965   LearningRate 0.0978   Epoch: 9   Global Step: 94910   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:16:54,609-Speed 5422.35 samples/sec   Loss 6.0593   LearningRate 0.0978   Epoch: 9   Global Step: 94920   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:17:02,139-Speed 5440.40 samples/sec   Loss 6.0751   LearningRate 0.0977   Epoch: 9   Global Step: 94930   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:17:09,603-Speed 5488.63 samples/sec   Loss 6.1680   LearningRate 0.0977   Epoch: 9   Global Step: 94940   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:17:17,113-Speed 5454.33 samples/sec   Loss 6.0785   LearningRate 0.0977   Epoch: 9   Global Step: 94950   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:17:24,641-Speed 5441.63 samples/sec   Loss 6.1529   LearningRate 0.0977   Epoch: 9   Global Step: 94960   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:17:32,129-Speed 5471.80 samples/sec   Loss 6.0516   LearningRate 0.0977   Epoch: 9   Global Step: 94970   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:17:39,590-Speed 5490.27 samples/sec   Loss 6.0779   LearningRate 0.0977   Epoch: 9   Global Step: 94980   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:17:47,048-Speed 5492.68 samples/sec   Loss 6.0425   LearningRate 0.0976   Epoch: 9   Global Step: 94990   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:17:54,736-Speed 5328.19 samples/sec   Loss 6.1059   LearningRate 0.0976   Epoch: 9   Global Step: 95000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:18:02,196-Speed 5492.02 samples/sec   Loss 6.1145   LearningRate 0.0976   Epoch: 9   Global Step: 95010   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:18:09,656-Speed 5491.50 samples/sec   Loss 6.1028   LearningRate 0.0976   Epoch: 9   Global Step: 95020   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:18:17,079-Speed 5518.59 samples/sec   Loss 6.0893   LearningRate 0.0976   Epoch: 9   Global Step: 95030   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:18:24,551-Speed 5481.75 samples/sec   Loss 6.1082   LearningRate 0.0975   Epoch: 9   Global Step: 95040   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:18:31,996-Speed 5503.36 samples/sec   Loss 6.0498   LearningRate 0.0975   Epoch: 9   Global Step: 95050   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:18:39,458-Speed 5489.65 samples/sec   Loss 6.0481   LearningRate 0.0975   Epoch: 9   Global Step: 95060   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:18:46,910-Speed 5496.75 samples/sec   Loss 6.0374   LearningRate 0.0975   Epoch: 9   Global Step: 95070   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:18:54,389-Speed 5477.94 samples/sec   Loss 6.0551   LearningRate 0.0975   Epoch: 9   Global Step: 95080   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:19:01,880-Speed 5468.52 samples/sec   Loss 6.0773   LearningRate 0.0975   Epoch: 9   Global Step: 95090   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:19:09,279-Speed 5536.87 samples/sec   Loss 6.1293   LearningRate 0.0974   Epoch: 9   Global Step: 95100   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:19:16,730-Speed 5497.62 samples/sec   Loss 6.1351   LearningRate 0.0974   Epoch: 9   Global Step: 95110   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:19:24,236-Speed 5457.52 samples/sec   Loss 6.1055   LearningRate 0.0974   Epoch: 9   Global Step: 95120   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:19:31,795-Speed 5420.04 samples/sec   Loss 6.1326   LearningRate 0.0974   Epoch: 9   Global Step: 95130   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:19:39,219-Speed 5517.71 samples/sec   Loss 6.0949   LearningRate 0.0974   Epoch: 9   Global Step: 95140   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:19:46,764-Speed 5429.41 samples/sec   Loss 6.0595   LearningRate 0.0974   Epoch: 9   Global Step: 95150   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:19:54,235-Speed 5483.25 samples/sec   Loss 6.0805   LearningRate 0.0973   Epoch: 9   Global Step: 95160   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:20:01,697-Speed 5490.06 samples/sec   Loss 6.0814   LearningRate 0.0973   Epoch: 9   Global Step: 95170   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:20:09,153-Speed 5495.08 samples/sec   Loss 6.0407   LearningRate 0.0973   Epoch: 9   Global Step: 95180   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:20:16,668-Speed 5450.34 samples/sec   Loss 6.0267   LearningRate 0.0973   Epoch: 9   Global Step: 95190   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:20:24,152-Speed 5473.84 samples/sec   Loss 6.0697   LearningRate 0.0973   Epoch: 9   Global Step: 95200   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:20:31,665-Speed 5452.80 samples/sec   Loss 6.0952   LearningRate 0.0973   Epoch: 9   Global Step: 95210   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:20:39,134-Speed 5485.28 samples/sec   Loss 6.0928   LearningRate 0.0972   Epoch: 9   Global Step: 95220   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:20:46,674-Speed 5433.12 samples/sec   Loss 6.1298   LearningRate 0.0972   Epoch: 9   Global Step: 95230   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:20:54,107-Speed 5510.70 samples/sec   Loss 6.0882   LearningRate 0.0972   Epoch: 9   Global Step: 95240   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:21:01,575-Speed 5485.72 samples/sec   Loss 6.0228   LearningRate 0.0972   Epoch: 9   Global Step: 95250   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:21:09,059-Speed 5474.40 samples/sec   Loss 6.0659   LearningRate 0.0972   Epoch: 9   Global Step: 95260   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:21:16,735-Speed 5336.66 samples/sec   Loss 6.0872   LearningRate 0.0971   Epoch: 9   Global Step: 95270   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:21:24,225-Speed 5469.30 samples/sec   Loss 6.0665   LearningRate 0.0971   Epoch: 9   Global Step: 95280   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:21:31,735-Speed 5454.76 samples/sec   Loss 6.0770   LearningRate 0.0971   Epoch: 9   Global Step: 95290   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:21:39,231-Speed 5465.15 samples/sec   Loss 6.1228   LearningRate 0.0971   Epoch: 9   Global Step: 95300   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:21:46,734-Speed 5459.75 samples/sec   Loss 6.0512   LearningRate 0.0971   Epoch: 9   Global Step: 95310   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:21:54,241-Speed 5456.99 samples/sec   Loss 6.1001   LearningRate 0.0971   Epoch: 9   Global Step: 95320   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:22:01,680-Speed 5507.14 samples/sec   Loss 6.0338   LearningRate 0.0970   Epoch: 9   Global Step: 95330   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:22:09,087-Speed 5530.43 samples/sec   Loss 6.0627   LearningRate 0.0970   Epoch: 9   Global Step: 95340   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:22:16,547-Speed 5491.47 samples/sec   Loss 6.0959   LearningRate 0.0970   Epoch: 9   Global Step: 95350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:22:24,052-Speed 5458.37 samples/sec   Loss 6.0946   LearningRate 0.0970   Epoch: 9   Global Step: 95360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:22:31,525-Speed 5481.20 samples/sec   Loss 6.0489   LearningRate 0.0970   Epoch: 9   Global Step: 95370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:22:39,016-Speed 5469.11 samples/sec   Loss 6.0902   LearningRate 0.0970   Epoch: 9   Global Step: 95380   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:22:46,689-Speed 5339.03 samples/sec   Loss 6.1263   LearningRate 0.0969   Epoch: 9   Global Step: 95390   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:22:54,250-Speed 5417.34 samples/sec   Loss 6.0177   LearningRate 0.0969   Epoch: 9   Global Step: 95400   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:23:01,684-Speed 5510.95 samples/sec   Loss 6.0722   LearningRate 0.0969   Epoch: 9   Global Step: 95410   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:23:09,194-Speed 5455.11 samples/sec   Loss 6.0676   LearningRate 0.0969   Epoch: 9   Global Step: 95420   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:23:16,669-Speed 5480.52 samples/sec   Loss 6.0398   LearningRate 0.0969   Epoch: 9   Global Step: 95430   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:23:24,138-Speed 5484.43 samples/sec   Loss 6.0675   LearningRate 0.0969   Epoch: 9   Global Step: 95440   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:23:31,619-Speed 5476.10 samples/sec   Loss 6.0675   LearningRate 0.0968   Epoch: 9   Global Step: 95450   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:23:39,108-Speed 5470.08 samples/sec   Loss 6.0292   LearningRate 0.0968   Epoch: 9   Global Step: 95460   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:23:46,586-Speed 5478.75 samples/sec   Loss 6.1101   LearningRate 0.0968   Epoch: 9   Global Step: 95470   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:23:54,024-Speed 5506.80 samples/sec   Loss 6.0903   LearningRate 0.0968   Epoch: 9   Global Step: 95480   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:24:01,539-Speed 5451.50 samples/sec   Loss 6.1088   LearningRate 0.0968   Epoch: 9   Global Step: 95490   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:24:09,100-Speed 5417.51 samples/sec   Loss 6.0161   LearningRate 0.0967   Epoch: 9   Global Step: 95500   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:24:16,602-Speed 5460.78 samples/sec   Loss 6.0390   LearningRate 0.0967   Epoch: 9   Global Step: 95510   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:24:24,090-Speed 5470.98 samples/sec   Loss 6.0092   LearningRate 0.0967   Epoch: 9   Global Step: 95520   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:24:31,548-Speed 5492.93 samples/sec   Loss 6.0207   LearningRate 0.0967   Epoch: 9   Global Step: 95530   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:24:39,233-Speed 5330.47 samples/sec   Loss 5.9704   LearningRate 0.0967   Epoch: 9   Global Step: 95540   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:24:46,707-Speed 5481.26 samples/sec   Loss 6.0408   LearningRate 0.0967   Epoch: 9   Global Step: 95550   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:24:54,174-Speed 5513.12 samples/sec   Loss 6.0675   LearningRate 0.0966   Epoch: 9   Global Step: 95560   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:25:01,657-Speed 5474.18 samples/sec   Loss 6.0567   LearningRate 0.0966   Epoch: 9   Global Step: 95570   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:25:09,133-Speed 5479.01 samples/sec   Loss 6.0780   LearningRate 0.0966   Epoch: 9   Global Step: 95580   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:25:16,610-Speed 5479.43 samples/sec   Loss 6.0826   LearningRate 0.0966   Epoch: 9   Global Step: 95590   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:25:24,062-Speed 5497.61 samples/sec   Loss 6.1142   LearningRate 0.0966   Epoch: 9   Global Step: 95600   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:25:31,540-Speed 5477.32 samples/sec   Loss 6.0282   LearningRate 0.0966   Epoch: 9   Global Step: 95610   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:25:39,023-Speed 5474.68 samples/sec   Loss 6.0741   LearningRate 0.0965   Epoch: 9   Global Step: 95620   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:25:46,569-Speed 5429.49 samples/sec   Loss 6.0449   LearningRate 0.0965   Epoch: 9   Global Step: 95630   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:25:53,993-Speed 5517.93 samples/sec   Loss 6.1039   LearningRate 0.0965   Epoch: 9   Global Step: 95640   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:26:01,503-Speed 5454.51 samples/sec   Loss 6.0781   LearningRate 0.0965   Epoch: 9   Global Step: 95650   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:26:08,978-Speed 5480.00 samples/sec   Loss 6.0008   LearningRate 0.0965   Epoch: 9   Global Step: 95660   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:26:16,437-Speed 5492.57 samples/sec   Loss 6.0649   LearningRate 0.0965   Epoch: 9   Global Step: 95670   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:26:23,887-Speed 5499.41 samples/sec   Loss 6.0004   LearningRate 0.0964   Epoch: 9   Global Step: 95680   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:26:31,375-Speed 5470.09 samples/sec   Loss 5.9887   LearningRate 0.0964   Epoch: 9   Global Step: 95690   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:26:38,830-Speed 5495.18 samples/sec   Loss 6.0618   LearningRate 0.0964   Epoch: 9   Global Step: 95700   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:26:46,367-Speed 5435.49 samples/sec   Loss 6.0540   LearningRate 0.0964   Epoch: 9   Global Step: 95710   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:26:53,843-Speed 5479.67 samples/sec   Loss 6.0548   LearningRate 0.0964   Epoch: 9   Global Step: 95720   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:27:01,442-Speed 5390.58 samples/sec   Loss 6.0368   LearningRate 0.0964   Epoch: 9   Global Step: 95730   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:27:09,016-Speed 5408.61 samples/sec   Loss 6.0355   LearningRate 0.0963   Epoch: 9   Global Step: 95740   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:27:16,499-Speed 5475.01 samples/sec   Loss 6.0743   LearningRate 0.0963   Epoch: 9   Global Step: 95750   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:27:24,192-Speed 5325.45 samples/sec   Loss 6.0208   LearningRate 0.0963   Epoch: 9   Global Step: 95760   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:27:31,673-Speed 5475.38 samples/sec   Loss 6.0435   LearningRate 0.0963   Epoch: 9   Global Step: 95770   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:27:39,143-Speed 5483.96 samples/sec   Loss 6.0216   LearningRate 0.0963   Epoch: 9   Global Step: 95780   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:27:46,599-Speed 5494.57 samples/sec   Loss 6.0968   LearningRate 0.0962   Epoch: 9   Global Step: 95790   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:27:54,191-Speed 5396.25 samples/sec   Loss 6.0482   LearningRate 0.0962   Epoch: 9   Global Step: 95800   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:28:01,660-Speed 5484.44 samples/sec   Loss 5.9630   LearningRate 0.0962   Epoch: 9   Global Step: 95810   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:28:09,056-Speed 5538.36 samples/sec   Loss 6.0505   LearningRate 0.0962   Epoch: 9   Global Step: 95820   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:28:16,569-Speed 5452.58 samples/sec   Loss 6.0397   LearningRate 0.0962   Epoch: 9   Global Step: 95830   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:28:24,069-Speed 5462.38 samples/sec   Loss 6.0266   LearningRate 0.0962   Epoch: 9   Global Step: 95840   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:28:31,703-Speed 5366.15 samples/sec   Loss 6.0406   LearningRate 0.0961   Epoch: 9   Global Step: 95850   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:28:39,234-Speed 5439.91 samples/sec   Loss 6.0301   LearningRate 0.0961   Epoch: 9   Global Step: 95860   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:28:46,783-Speed 5426.21 samples/sec   Loss 6.0932   LearningRate 0.0961   Epoch: 9   Global Step: 95870   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:28:54,353-Speed 5412.16 samples/sec   Loss 6.0382   LearningRate 0.0961   Epoch: 9   Global Step: 95880   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:29:01,910-Speed 5420.36 samples/sec   Loss 6.0792   LearningRate 0.0961   Epoch: 9   Global Step: 95890   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:29:09,540-Speed 5369.03 samples/sec   Loss 6.0455   LearningRate 0.0961   Epoch: 9   Global Step: 95900   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:29:16,975-Speed 5510.35 samples/sec   Loss 6.0534   LearningRate 0.0960   Epoch: 9   Global Step: 95910   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 16:29:24,488-Speed 5452.74 samples/sec   Loss 6.0856   LearningRate 0.0960   Epoch: 9   Global Step: 95920   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:29:32,026-Speed 5434.00 samples/sec   Loss 6.1233   LearningRate 0.0960   Epoch: 9   Global Step: 95930   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:29:39,647-Speed 5375.39 samples/sec   Loss 6.0985   LearningRate 0.0960   Epoch: 9   Global Step: 95940   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:29:47,057-Speed 5528.40 samples/sec   Loss 6.0519   LearningRate 0.0960   Epoch: 9   Global Step: 95950   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:29:54,526-Speed 5485.04 samples/sec   Loss 6.0279   LearningRate 0.0960   Epoch: 9   Global Step: 95960   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:30:02,034-Speed 5456.02 samples/sec   Loss 5.9632   LearningRate 0.0959   Epoch: 9   Global Step: 95970   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:30:09,570-Speed 5436.03 samples/sec   Loss 5.9539   LearningRate 0.0959   Epoch: 9   Global Step: 95980   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:30:17,138-Speed 5412.76 samples/sec   Loss 6.0283   LearningRate 0.0959   Epoch: 9   Global Step: 95990   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:30:24,690-Speed 5424.82 samples/sec   Loss 6.0502   LearningRate 0.0959   Epoch: 9   Global Step: 96000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:31:08,759-[lfw][96000]XNorm: 22.075303
Training: 2022-01-08 16:31:08,759-[lfw][96000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-01-08 16:31:08,760-[lfw][96000]Accuracy-Highest: 0.99817
Training: 2022-01-08 16:32:00,775-[cfp_fp][96000]XNorm: 20.072877
Training: 2022-01-08 16:32:00,776-[cfp_fp][96000]Accuracy-Flip: 0.98914+-0.00535
Training: 2022-01-08 16:32:00,777-[cfp_fp][96000]Accuracy-Highest: 0.98914
Training: 2022-01-08 16:32:46,615-[agedb_30][96000]XNorm: 21.833672
Training: 2022-01-08 16:32:46,617-[agedb_30][96000]Accuracy-Flip: 0.97550+-0.00746
Training: 2022-01-08 16:32:46,617-[agedb_30][96000]Accuracy-Highest: 0.97833
Training: 2022-01-08 16:32:54,256-Speed 273.86 samples/sec   Loss 6.0672   LearningRate 0.0959   Epoch: 9   Global Step: 96010   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:33:01,857-Speed 5390.22 samples/sec   Loss 6.0896   LearningRate 0.0959   Epoch: 9   Global Step: 96020   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:33:09,354-Speed 5464.25 samples/sec   Loss 6.0522   LearningRate 0.0958   Epoch: 9   Global Step: 96030   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:33:16,842-Speed 5471.96 samples/sec   Loss 6.0537   LearningRate 0.0958   Epoch: 9   Global Step: 96040   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:33:24,289-Speed 5501.65 samples/sec   Loss 6.0642   LearningRate 0.0958   Epoch: 9   Global Step: 96050   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:33:31,938-Speed 5356.37 samples/sec   Loss 6.0613   LearningRate 0.0958   Epoch: 9   Global Step: 96060   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:33:39,429-Speed 5469.29 samples/sec   Loss 6.0222   LearningRate 0.0958   Epoch: 9   Global Step: 96070   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:33:46,868-Speed 5506.98 samples/sec   Loss 6.0432   LearningRate 0.0957   Epoch: 9   Global Step: 96080   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:33:54,355-Speed 5472.77 samples/sec   Loss 5.9833   LearningRate 0.0957   Epoch: 9   Global Step: 96090   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:34:01,754-Speed 5536.47 samples/sec   Loss 6.0829   LearningRate 0.0957   Epoch: 9   Global Step: 96100   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:34:09,240-Speed 5472.71 samples/sec   Loss 6.0604   LearningRate 0.0957   Epoch: 9   Global Step: 96110   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:34:16,772-Speed 5438.39 samples/sec   Loss 6.0034   LearningRate 0.0957   Epoch: 9   Global Step: 96120   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:34:24,281-Speed 5456.54 samples/sec   Loss 5.9802   LearningRate 0.0957   Epoch: 9   Global Step: 96130   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:34:31,808-Speed 5442.09 samples/sec   Loss 6.0560   LearningRate 0.0956   Epoch: 9   Global Step: 96140   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:34:39,302-Speed 5466.24 samples/sec   Loss 5.9774   LearningRate 0.0956   Epoch: 9   Global Step: 96150   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:34:46,778-Speed 5479.36 samples/sec   Loss 6.0546   LearningRate 0.0956   Epoch: 9   Global Step: 96160   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:34:54,170-Speed 5542.11 samples/sec   Loss 6.0298   LearningRate 0.0956   Epoch: 9   Global Step: 96170   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:35:01,641-Speed 5482.99 samples/sec   Loss 6.0126   LearningRate 0.0956   Epoch: 9   Global Step: 96180   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:35:09,076-Speed 5509.50 samples/sec   Loss 5.9951   LearningRate 0.0956   Epoch: 9   Global Step: 96190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:35:16,516-Speed 5506.48 samples/sec   Loss 6.0178   LearningRate 0.0955   Epoch: 9   Global Step: 96200   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 16:35:24,006-Speed 5469.73 samples/sec   Loss 6.0064   LearningRate 0.0955   Epoch: 9   Global Step: 96210   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 16:35:31,521-Speed 5450.57 samples/sec   Loss 6.0217   LearningRate 0.0955   Epoch: 9   Global Step: 96220   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:35:39,028-Speed 5457.38 samples/sec   Loss 5.9929   LearningRate 0.0955   Epoch: 9   Global Step: 96230   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:35:46,565-Speed 5435.04 samples/sec   Loss 5.9779   LearningRate 0.0955   Epoch: 9   Global Step: 96240   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:35:53,960-Speed 5539.71 samples/sec   Loss 6.0170   LearningRate 0.0955   Epoch: 9   Global Step: 96250   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:36:01,421-Speed 5490.66 samples/sec   Loss 6.0528   LearningRate 0.0954   Epoch: 9   Global Step: 96260   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:36:08,836-Speed 5524.48 samples/sec   Loss 5.9855   LearningRate 0.0954   Epoch: 9   Global Step: 96270   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:36:16,382-Speed 5428.54 samples/sec   Loss 6.0257   LearningRate 0.0954   Epoch: 9   Global Step: 96280   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:36:23,937-Speed 5422.35 samples/sec   Loss 6.0138   LearningRate 0.0954   Epoch: 9   Global Step: 96290   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:36:31,396-Speed 5492.44 samples/sec   Loss 6.0039   LearningRate 0.0954   Epoch: 9   Global Step: 96300   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:36:38,978-Speed 5402.46 samples/sec   Loss 5.9967   LearningRate 0.0954   Epoch: 9   Global Step: 96310   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:36:46,406-Speed 5514.59 samples/sec   Loss 6.0279   LearningRate 0.0953   Epoch: 9   Global Step: 96320   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:36:53,949-Speed 5431.83 samples/sec   Loss 6.0617   LearningRate 0.0953   Epoch: 9   Global Step: 96330   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:37:01,420-Speed 5483.31 samples/sec   Loss 6.0538   LearningRate 0.0953   Epoch: 9   Global Step: 96340   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:37:08,916-Speed 5464.54 samples/sec   Loss 6.0345   LearningRate 0.0953   Epoch: 9   Global Step: 96350   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:37:16,413-Speed 5463.91 samples/sec   Loss 6.0225   LearningRate 0.0953   Epoch: 9   Global Step: 96360   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:37:23,846-Speed 5511.96 samples/sec   Loss 6.0374   LearningRate 0.0952   Epoch: 9   Global Step: 96370   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:37:31,368-Speed 5446.06 samples/sec   Loss 5.9915   LearningRate 0.0952   Epoch: 9   Global Step: 96380   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:37:38,989-Speed 5375.13 samples/sec   Loss 6.0282   LearningRate 0.0952   Epoch: 9   Global Step: 96390   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 16:37:46,432-Speed 5503.85 samples/sec   Loss 6.0050   LearningRate 0.0952   Epoch: 9   Global Step: 96400   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 16:37:53,960-Speed 5441.61 samples/sec   Loss 6.0460   LearningRate 0.0952   Epoch: 9   Global Step: 96410   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 16:38:01,468-Speed 5456.30 samples/sec   Loss 6.0081   LearningRate 0.0952   Epoch: 9   Global Step: 96420   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 16:38:08,924-Speed 5494.58 samples/sec   Loss 6.0079   LearningRate 0.0951   Epoch: 9   Global Step: 96430   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 16:38:16,379-Speed 5495.24 samples/sec   Loss 6.0207   LearningRate 0.0951   Epoch: 9   Global Step: 96440   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 16:38:23,807-Speed 5515.02 samples/sec   Loss 6.0143   LearningRate 0.0951   Epoch: 9   Global Step: 96450   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 16:38:31,261-Speed 5495.38 samples/sec   Loss 6.0199   LearningRate 0.0951   Epoch: 9   Global Step: 96460   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 16:38:38,756-Speed 5465.80 samples/sec   Loss 6.0168   LearningRate 0.0951   Epoch: 9   Global Step: 96470   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 16:38:46,250-Speed 5466.27 samples/sec   Loss 6.0368   LearningRate 0.0951   Epoch: 9   Global Step: 96480   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 16:38:53,715-Speed 5488.25 samples/sec   Loss 6.0043   LearningRate 0.0950   Epoch: 9   Global Step: 96490   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:39:01,192-Speed 5478.65 samples/sec   Loss 6.0119   LearningRate 0.0950   Epoch: 9   Global Step: 96500   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:39:08,697-Speed 5458.55 samples/sec   Loss 6.0148   LearningRate 0.0950   Epoch: 9   Global Step: 96510   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:39:16,124-Speed 5515.59 samples/sec   Loss 5.9972   LearningRate 0.0950   Epoch: 9   Global Step: 96520   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:39:23,609-Speed 5473.08 samples/sec   Loss 5.9040   LearningRate 0.0950   Epoch: 9   Global Step: 96530   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:39:31,050-Speed 5505.74 samples/sec   Loss 5.9121   LearningRate 0.0950   Epoch: 9   Global Step: 96540   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:39:38,502-Speed 5496.93 samples/sec   Loss 5.9386   LearningRate 0.0949   Epoch: 9   Global Step: 96550   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:39:45,988-Speed 5472.51 samples/sec   Loss 6.0167   LearningRate 0.0949   Epoch: 9   Global Step: 96560   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:39:53,472-Speed 5473.55 samples/sec   Loss 6.0376   LearningRate 0.0949   Epoch: 9   Global Step: 96570   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:40:00,873-Speed 5535.38 samples/sec   Loss 6.0368   LearningRate 0.0949   Epoch: 9   Global Step: 96580   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:40:08,338-Speed 5487.95 samples/sec   Loss 6.0078   LearningRate 0.0949   Epoch: 9   Global Step: 96590   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:40:15,848-Speed 5454.29 samples/sec   Loss 6.0195   LearningRate 0.0949   Epoch: 9   Global Step: 96600   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:40:23,276-Speed 5515.08 samples/sec   Loss 6.0050   LearningRate 0.0948   Epoch: 9   Global Step: 96610   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:40:30,813-Speed 5435.49 samples/sec   Loss 5.9825   LearningRate 0.0948   Epoch: 9   Global Step: 96620   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:40:38,207-Speed 5540.06 samples/sec   Loss 6.0333   LearningRate 0.0948   Epoch: 9   Global Step: 96630   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:40:45,700-Speed 5467.51 samples/sec   Loss 6.0130   LearningRate 0.0948   Epoch: 9   Global Step: 96640   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:40:53,148-Speed 5500.52 samples/sec   Loss 6.0041   LearningRate 0.0948   Epoch: 9   Global Step: 96650   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:41:00,663-Speed 5450.74 samples/sec   Loss 5.9723   LearningRate 0.0948   Epoch: 9   Global Step: 96660   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:41:08,065-Speed 5534.46 samples/sec   Loss 5.9739   LearningRate 0.0947   Epoch: 9   Global Step: 96670   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:41:15,488-Speed 5519.11 samples/sec   Loss 5.9344   LearningRate 0.0947   Epoch: 9   Global Step: 96680   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:41:23,155-Speed 5343.04 samples/sec   Loss 5.9945   LearningRate 0.0947   Epoch: 9   Global Step: 96690   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:41:30,721-Speed 5414.33 samples/sec   Loss 5.9502   LearningRate 0.0947   Epoch: 9   Global Step: 96700   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:41:38,223-Speed 5460.31 samples/sec   Loss 5.9925   LearningRate 0.0947   Epoch: 9   Global Step: 96710   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:41:45,682-Speed 5492.02 samples/sec   Loss 5.9538   LearningRate 0.0947   Epoch: 9   Global Step: 96720   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:41:53,101-Speed 5521.70 samples/sec   Loss 5.9576   LearningRate 0.0946   Epoch: 9   Global Step: 96730   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:42:00,510-Speed 5529.95 samples/sec   Loss 6.0170   LearningRate 0.0946   Epoch: 9   Global Step: 96740   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:42:08,005-Speed 5465.50 samples/sec   Loss 6.0091   LearningRate 0.0946   Epoch: 9   Global Step: 96750   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:42:15,531-Speed 5443.25 samples/sec   Loss 5.9562   LearningRate 0.0946   Epoch: 9   Global Step: 96760   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:42:22,960-Speed 5514.38 samples/sec   Loss 5.9474   LearningRate 0.0946   Epoch: 9   Global Step: 96770   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:42:30,406-Speed 5501.94 samples/sec   Loss 6.0652   LearningRate 0.0945   Epoch: 9   Global Step: 96780   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:42:37,794-Speed 5544.76 samples/sec   Loss 5.9995   LearningRate 0.0945   Epoch: 9   Global Step: 96790   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:42:45,210-Speed 5523.81 samples/sec   Loss 6.0104   LearningRate 0.0945   Epoch: 9   Global Step: 96800   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:42:52,690-Speed 5476.45 samples/sec   Loss 5.9527   LearningRate 0.0945   Epoch: 9   Global Step: 96810   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:43:00,136-Speed 5502.57 samples/sec   Loss 5.9726   LearningRate 0.0945   Epoch: 9   Global Step: 96820   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:43:07,592-Speed 5493.97 samples/sec   Loss 5.9234   LearningRate 0.0945   Epoch: 9   Global Step: 96830   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:43:15,058-Speed 5486.61 samples/sec   Loss 5.9566   LearningRate 0.0944   Epoch: 9   Global Step: 96840   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:43:22,441-Speed 5548.99 samples/sec   Loss 5.9529   LearningRate 0.0944   Epoch: 9   Global Step: 96850   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:43:29,842-Speed 5535.36 samples/sec   Loss 5.9627   LearningRate 0.0944   Epoch: 9   Global Step: 96860   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:43:37,382-Speed 5433.01 samples/sec   Loss 6.0062   LearningRate 0.0944   Epoch: 9   Global Step: 96870   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:43:44,933-Speed 5425.13 samples/sec   Loss 5.9716   LearningRate 0.0944   Epoch: 9   Global Step: 96880   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:43:52,357-Speed 5518.00 samples/sec   Loss 6.0005   LearningRate 0.0944   Epoch: 9   Global Step: 96890   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:43:59,867-Speed 5454.39 samples/sec   Loss 5.9791   LearningRate 0.0943   Epoch: 9   Global Step: 96900   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:44:07,341-Speed 5481.49 samples/sec   Loss 6.0062   LearningRate 0.0943   Epoch: 9   Global Step: 96910   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:44:14,832-Speed 5468.90 samples/sec   Loss 5.9865   LearningRate 0.0943   Epoch: 9   Global Step: 96920   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:44:22,288-Speed 5494.00 samples/sec   Loss 5.9836   LearningRate 0.0943   Epoch: 9   Global Step: 96930   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:44:29,764-Speed 5479.18 samples/sec   Loss 5.9458   LearningRate 0.0943   Epoch: 9   Global Step: 96940   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:44:37,196-Speed 5512.56 samples/sec   Loss 5.9860   LearningRate 0.0943   Epoch: 9   Global Step: 96950   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 16:44:44,658-Speed 5490.39 samples/sec   Loss 5.9953   LearningRate 0.0942   Epoch: 9   Global Step: 96960   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 16:44:52,092-Speed 5509.59 samples/sec   Loss 5.9328   LearningRate 0.0942   Epoch: 9   Global Step: 96970   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 16:44:59,508-Speed 5524.57 samples/sec   Loss 5.9508   LearningRate 0.0942   Epoch: 9   Global Step: 96980   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 16:45:06,929-Speed 5521.38 samples/sec   Loss 5.9648   LearningRate 0.0942   Epoch: 9   Global Step: 96990   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 16:45:22,035-Speed 2711.76 samples/sec   Loss 5.9574   LearningRate 0.0942   Epoch: 9   Global Step: 97000   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 16:45:29,630-Speed 5393.82 samples/sec   Loss 5.9221   LearningRate 0.0942   Epoch: 9   Global Step: 97010   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 16:45:37,099-Speed 5484.73 samples/sec   Loss 5.9927   LearningRate 0.0941   Epoch: 9   Global Step: 97020   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 16:45:44,597-Speed 5463.58 samples/sec   Loss 5.9656   LearningRate 0.0941   Epoch: 9   Global Step: 97030   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 16:45:52,057-Speed 5491.28 samples/sec   Loss 5.9655   LearningRate 0.0941   Epoch: 9   Global Step: 97040   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 16:45:59,528-Speed 5483.23 samples/sec   Loss 5.9968   LearningRate 0.0941   Epoch: 9   Global Step: 97050   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:46:07,013-Speed 5473.34 samples/sec   Loss 6.0331   LearningRate 0.0941   Epoch: 9   Global Step: 97060   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:46:14,521-Speed 5456.39 samples/sec   Loss 6.0200   LearningRate 0.0941   Epoch: 9   Global Step: 97070   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:46:22,013-Speed 5467.71 samples/sec   Loss 5.9362   LearningRate 0.0940   Epoch: 9   Global Step: 97080   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:46:29,546-Speed 5437.85 samples/sec   Loss 6.0079   LearningRate 0.0940   Epoch: 9   Global Step: 97090   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:46:37,054-Speed 5456.93 samples/sec   Loss 5.9472   LearningRate 0.0940   Epoch: 9   Global Step: 97100   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:46:44,491-Speed 5508.00 samples/sec   Loss 6.0017   LearningRate 0.0940   Epoch: 9   Global Step: 97110   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:46:51,981-Speed 5469.46 samples/sec   Loss 6.0004   LearningRate 0.0940   Epoch: 9   Global Step: 97120   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:46:59,444-Speed 5488.52 samples/sec   Loss 5.9807   LearningRate 0.0940   Epoch: 9   Global Step: 97130   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:47:06,881-Speed 5508.92 samples/sec   Loss 6.0065   LearningRate 0.0939   Epoch: 9   Global Step: 97140   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:47:14,388-Speed 5457.51 samples/sec   Loss 5.9418   LearningRate 0.0939   Epoch: 9   Global Step: 97150   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:47:21,910-Speed 5445.26 samples/sec   Loss 5.9018   LearningRate 0.0939   Epoch: 9   Global Step: 97160   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:47:29,375-Speed 5487.82 samples/sec   Loss 5.9127   LearningRate 0.0939   Epoch: 9   Global Step: 97170   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:47:36,940-Speed 5415.26 samples/sec   Loss 5.9636   LearningRate 0.0939   Epoch: 9   Global Step: 97180   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:47:44,426-Speed 5472.56 samples/sec   Loss 5.9683   LearningRate 0.0938   Epoch: 9   Global Step: 97190   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:47:51,928-Speed 5460.47 samples/sec   Loss 5.9806   LearningRate 0.0938   Epoch: 9   Global Step: 97200   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:47:59,490-Speed 5417.37 samples/sec   Loss 5.9752   LearningRate 0.0938   Epoch: 9   Global Step: 97210   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:48:06,973-Speed 5474.38 samples/sec   Loss 5.9437   LearningRate 0.0938   Epoch: 9   Global Step: 97220   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:48:14,569-Speed 5393.50 samples/sec   Loss 5.9016   LearningRate 0.0938   Epoch: 9   Global Step: 97230   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:48:22,021-Speed 5497.05 samples/sec   Loss 5.9487   LearningRate 0.0938   Epoch: 9   Global Step: 97240   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:48:29,522-Speed 5461.48 samples/sec   Loss 5.9398   LearningRate 0.0937   Epoch: 9   Global Step: 97250   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:48:37,064-Speed 5431.57 samples/sec   Loss 5.9604   LearningRate 0.0937   Epoch: 9   Global Step: 97260   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:48:44,527-Speed 5489.55 samples/sec   Loss 5.9599   LearningRate 0.0937   Epoch: 9   Global Step: 97270   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:48:51,984-Speed 5493.59 samples/sec   Loss 5.9812   LearningRate 0.0937   Epoch: 9   Global Step: 97280   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:48:59,462-Speed 5478.08 samples/sec   Loss 5.9590   LearningRate 0.0937   Epoch: 9   Global Step: 97290   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:49:06,891-Speed 5514.57 samples/sec   Loss 5.9467   LearningRate 0.0937   Epoch: 9   Global Step: 97300   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:49:14,352-Speed 5490.70 samples/sec   Loss 6.0050   LearningRate 0.0936   Epoch: 9   Global Step: 97310   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:49:21,834-Speed 5474.96 samples/sec   Loss 5.9773   LearningRate 0.0936   Epoch: 9   Global Step: 97320   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:49:29,261-Speed 5515.75 samples/sec   Loss 5.9482   LearningRate 0.0936   Epoch: 9   Global Step: 97330   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:49:36,753-Speed 5467.90 samples/sec   Loss 5.9751   LearningRate 0.0936   Epoch: 9   Global Step: 97340   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:49:44,191-Speed 5507.99 samples/sec   Loss 5.9247   LearningRate 0.0936   Epoch: 9   Global Step: 97350   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:49:51,676-Speed 5472.58 samples/sec   Loss 5.9881   LearningRate 0.0936   Epoch: 9   Global Step: 97360   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:49:59,174-Speed 5463.77 samples/sec   Loss 6.0201   LearningRate 0.0935   Epoch: 9   Global Step: 97370   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:50:06,630-Speed 5494.03 samples/sec   Loss 6.0094   LearningRate 0.0935   Epoch: 9   Global Step: 97380   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:50:14,122-Speed 5468.04 samples/sec   Loss 5.9599   LearningRate 0.0935   Epoch: 9   Global Step: 97390   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:50:21,569-Speed 5501.23 samples/sec   Loss 5.9433   LearningRate 0.0935   Epoch: 9   Global Step: 97400   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:50:29,042-Speed 5481.21 samples/sec   Loss 5.9674   LearningRate 0.0935   Epoch: 9   Global Step: 97410   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:50:36,512-Speed 5484.07 samples/sec   Loss 5.9628   LearningRate 0.0935   Epoch: 9   Global Step: 97420   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:50:43,958-Speed 5501.94 samples/sec   Loss 5.9078   LearningRate 0.0934   Epoch: 9   Global Step: 97430   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:50:51,429-Speed 5483.92 samples/sec   Loss 5.9156   LearningRate 0.0934   Epoch: 9   Global Step: 97440   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:50:58,945-Speed 5449.65 samples/sec   Loss 5.9529   LearningRate 0.0934   Epoch: 9   Global Step: 97450   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:51:06,392-Speed 5501.32 samples/sec   Loss 5.9559   LearningRate 0.0934   Epoch: 9   Global Step: 97460   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:51:13,805-Speed 5526.71 samples/sec   Loss 5.9374   LearningRate 0.0934   Epoch: 9   Global Step: 97470   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:51:21,337-Speed 5438.33 samples/sec   Loss 5.9477   LearningRate 0.0934   Epoch: 9   Global Step: 97480   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:51:28,883-Speed 5428.37 samples/sec   Loss 6.0216   LearningRate 0.0933   Epoch: 9   Global Step: 97490   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:51:36,284-Speed 5535.25 samples/sec   Loss 5.9152   LearningRate 0.0933   Epoch: 9   Global Step: 97500   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:51:43,874-Speed 5398.07 samples/sec   Loss 5.9357   LearningRate 0.0933   Epoch: 9   Global Step: 97510   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:51:51,462-Speed 5398.39 samples/sec   Loss 5.9377   LearningRate 0.0933   Epoch: 9   Global Step: 97520   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:51:58,923-Speed 5490.28 samples/sec   Loss 5.9294   LearningRate 0.0933   Epoch: 9   Global Step: 97530   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:52:06,367-Speed 5503.61 samples/sec   Loss 5.9095   LearningRate 0.0933   Epoch: 9   Global Step: 97540   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:52:13,826-Speed 5491.88 samples/sec   Loss 5.9156   LearningRate 0.0932   Epoch: 9   Global Step: 97550   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:52:21,301-Speed 5480.41 samples/sec   Loss 5.8903   LearningRate 0.0932   Epoch: 9   Global Step: 97560   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:52:28,757-Speed 5494.63 samples/sec   Loss 5.8986   LearningRate 0.0932   Epoch: 9   Global Step: 97570   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:52:36,173-Speed 5523.19 samples/sec   Loss 5.9842   LearningRate 0.0932   Epoch: 9   Global Step: 97580   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:52:43,604-Speed 5513.14 samples/sec   Loss 5.9698   LearningRate 0.0932   Epoch: 9   Global Step: 97590   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:52:51,050-Speed 5501.92 samples/sec   Loss 5.8670   LearningRate 0.0932   Epoch: 9   Global Step: 97600   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:52:58,455-Speed 5532.28 samples/sec   Loss 5.9572   LearningRate 0.0931   Epoch: 9   Global Step: 97610   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:53:05,874-Speed 5521.54 samples/sec   Loss 5.9260   LearningRate 0.0931   Epoch: 9   Global Step: 97620   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:53:13,325-Speed 5497.93 samples/sec   Loss 5.9569   LearningRate 0.0931   Epoch: 9   Global Step: 97630   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:53:20,781-Speed 5494.66 samples/sec   Loss 5.9329   LearningRate 0.0931   Epoch: 9   Global Step: 97640   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:53:28,573-Speed 5257.00 samples/sec   Loss 5.9847   LearningRate 0.0931   Epoch: 9   Global Step: 97650   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:53:35,982-Speed 5529.26 samples/sec   Loss 5.9345   LearningRate 0.0930   Epoch: 9   Global Step: 97660   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:53:43,500-Speed 5449.01 samples/sec   Loss 5.8826   LearningRate 0.0930   Epoch: 9   Global Step: 97670   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:53:50,947-Speed 5500.56 samples/sec   Loss 5.9411   LearningRate 0.0930   Epoch: 9   Global Step: 97680   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:53:58,403-Speed 5495.00 samples/sec   Loss 5.9706   LearningRate 0.0930   Epoch: 9   Global Step: 97690   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:54:05,897-Speed 5466.17 samples/sec   Loss 6.0161   LearningRate 0.0930   Epoch: 9   Global Step: 97700   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:54:13,385-Speed 5470.41 samples/sec   Loss 5.9326   LearningRate 0.0930   Epoch: 9   Global Step: 97710   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:54:20,912-Speed 5442.95 samples/sec   Loss 5.9343   LearningRate 0.0929   Epoch: 9   Global Step: 97720   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:54:28,581-Speed 5341.89 samples/sec   Loss 5.9158   LearningRate 0.0929   Epoch: 9   Global Step: 97730   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:54:36,010-Speed 5514.41 samples/sec   Loss 5.9878   LearningRate 0.0929   Epoch: 9   Global Step: 97740   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:54:43,596-Speed 5399.65 samples/sec   Loss 5.9162   LearningRate 0.0929   Epoch: 9   Global Step: 97750   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:54:51,066-Speed 5483.95 samples/sec   Loss 5.9876   LearningRate 0.0929   Epoch: 9   Global Step: 97760   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:54:58,479-Speed 5526.78 samples/sec   Loss 5.9371   LearningRate 0.0929   Epoch: 9   Global Step: 97770   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:55:05,932-Speed 5496.04 samples/sec   Loss 5.9200   LearningRate 0.0928   Epoch: 9   Global Step: 97780   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:55:13,382-Speed 5498.90 samples/sec   Loss 5.9297   LearningRate 0.0928   Epoch: 9   Global Step: 97790   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:55:20,946-Speed 5415.81 samples/sec   Loss 5.9281   LearningRate 0.0928   Epoch: 9   Global Step: 97800   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:55:28,475-Speed 5441.69 samples/sec   Loss 5.9630   LearningRate 0.0928   Epoch: 9   Global Step: 97810   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:55:36,083-Speed 5384.14 samples/sec   Loss 5.9627   LearningRate 0.0928   Epoch: 9   Global Step: 97820   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:55:43,697-Speed 5380.41 samples/sec   Loss 5.9323   LearningRate 0.0928   Epoch: 9   Global Step: 97830   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:55:51,471-Speed 5269.88 samples/sec   Loss 5.9534   LearningRate 0.0927   Epoch: 9   Global Step: 97840   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:55:59,083-Speed 5381.67 samples/sec   Loss 5.9328   LearningRate 0.0927   Epoch: 9   Global Step: 97850   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:56:06,727-Speed 5359.18 samples/sec   Loss 5.9155   LearningRate 0.0927   Epoch: 9   Global Step: 97860   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:56:14,424-Speed 5321.82 samples/sec   Loss 5.8856   LearningRate 0.0927   Epoch: 9   Global Step: 97870   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:56:22,044-Speed 5376.65 samples/sec   Loss 5.9303   LearningRate 0.0927   Epoch: 9   Global Step: 97880   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:56:29,637-Speed 5394.75 samples/sec   Loss 5.9170   LearningRate 0.0927   Epoch: 9   Global Step: 97890   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:56:37,168-Speed 5439.56 samples/sec   Loss 5.8718   LearningRate 0.0926   Epoch: 9   Global Step: 97900   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:56:44,715-Speed 5428.16 samples/sec   Loss 5.9488   LearningRate 0.0926   Epoch: 9   Global Step: 97910   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:56:52,172-Speed 5493.50 samples/sec   Loss 5.9306   LearningRate 0.0926   Epoch: 9   Global Step: 97920   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:56:59,672-Speed 5461.87 samples/sec   Loss 5.9543   LearningRate 0.0926   Epoch: 9   Global Step: 97930   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:57:07,222-Speed 5426.24 samples/sec   Loss 5.8943   LearningRate 0.0926   Epoch: 9   Global Step: 97940   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:57:14,751-Speed 5441.33 samples/sec   Loss 5.8884   LearningRate 0.0926   Epoch: 9   Global Step: 97950   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 16:57:22,279-Speed 5441.34 samples/sec   Loss 5.9096   LearningRate 0.0925   Epoch: 9   Global Step: 97960   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:57:29,784-Speed 5458.07 samples/sec   Loss 5.9348   LearningRate 0.0925   Epoch: 9   Global Step: 97970   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:57:37,237-Speed 5496.78 samples/sec   Loss 5.9451   LearningRate 0.0925   Epoch: 9   Global Step: 97980   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:57:44,760-Speed 5445.23 samples/sec   Loss 5.9514   LearningRate 0.0925   Epoch: 9   Global Step: 97990   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:57:52,324-Speed 5416.16 samples/sec   Loss 5.9314   LearningRate 0.0925   Epoch: 9   Global Step: 98000   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 16:58:36,657-[lfw][98000]XNorm: 21.642411
Training: 2022-01-08 16:58:36,658-[lfw][98000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-01-08 16:58:36,659-[lfw][98000]Accuracy-Highest: 0.99817
Training: 2022-01-08 16:59:27,973-[cfp_fp][98000]XNorm: 19.796313
Training: 2022-01-08 16:59:27,974-[cfp_fp][98000]Accuracy-Flip: 0.98700+-0.00614
Training: 2022-01-08 16:59:27,974-[cfp_fp][98000]Accuracy-Highest: 0.98914
Training: 2022-01-08 17:00:13,294-[agedb_30][98000]XNorm: 21.276060
Training: 2022-01-08 17:00:13,296-[agedb_30][98000]Accuracy-Flip: 0.97500+-0.00792
Training: 2022-01-08 17:00:13,296-[agedb_30][98000]Accuracy-Highest: 0.97833
Training: 2022-01-08 17:00:20,966-Speed 275.57 samples/sec   Loss 5.9450   LearningRate 0.0925   Epoch: 9   Global Step: 98010   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:00:28,609-Speed 5360.99 samples/sec   Loss 5.9587   LearningRate 0.0924   Epoch: 9   Global Step: 98020   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:00:36,172-Speed 5417.05 samples/sec   Loss 5.9542   LearningRate 0.0924   Epoch: 9   Global Step: 98030   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:00:43,609-Speed 5509.04 samples/sec   Loss 5.9416   LearningRate 0.0924   Epoch: 9   Global Step: 98040   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:00:51,159-Speed 5426.94 samples/sec   Loss 5.9300   LearningRate 0.0924   Epoch: 9   Global Step: 98050   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:00:58,608-Speed 5499.85 samples/sec   Loss 5.9360   LearningRate 0.0924   Epoch: 9   Global Step: 98060   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:01:06,148-Speed 5433.31 samples/sec   Loss 5.9215   LearningRate 0.0924   Epoch: 9   Global Step: 98070   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:01:13,754-Speed 5385.49 samples/sec   Loss 5.9186   LearningRate 0.0923   Epoch: 9   Global Step: 98080   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:01:21,457-Speed 5317.97 samples/sec   Loss 5.9625   LearningRate 0.0923   Epoch: 9   Global Step: 98090   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:01:28,990-Speed 5438.60 samples/sec   Loss 5.8819   LearningRate 0.0923   Epoch: 9   Global Step: 98100   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:01:36,416-Speed 5516.43 samples/sec   Loss 5.8942   LearningRate 0.0923   Epoch: 9   Global Step: 98110   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:01:43,907-Speed 5468.48 samples/sec   Loss 5.9086   LearningRate 0.0923   Epoch: 9   Global Step: 98120   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:01:51,397-Speed 5468.97 samples/sec   Loss 5.8427   LearningRate 0.0923   Epoch: 9   Global Step: 98130   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:01:58,863-Speed 5486.75 samples/sec   Loss 5.9523   LearningRate 0.0922   Epoch: 9   Global Step: 98140   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:02:06,311-Speed 5500.82 samples/sec   Loss 5.8918   LearningRate 0.0922   Epoch: 9   Global Step: 98150   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:02:13,770-Speed 5491.71 samples/sec   Loss 5.9214   LearningRate 0.0922   Epoch: 9   Global Step: 98160   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:02:21,186-Speed 5524.28 samples/sec   Loss 5.9423   LearningRate 0.0922   Epoch: 9   Global Step: 98170   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:02:28,721-Speed 5436.10 samples/sec   Loss 5.9355   LearningRate 0.0922   Epoch: 9   Global Step: 98180   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:02:36,167-Speed 5502.09 samples/sec   Loss 5.8735   LearningRate 0.0922   Epoch: 9   Global Step: 98190   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:02:43,610-Speed 5504.06 samples/sec   Loss 5.8832   LearningRate 0.0921   Epoch: 9   Global Step: 98200   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:02:51,130-Speed 5447.34 samples/sec   Loss 5.9157   LearningRate 0.0921   Epoch: 9   Global Step: 98210   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:02:58,532-Speed 5534.53 samples/sec   Loss 6.0068   LearningRate 0.0921   Epoch: 9   Global Step: 98220   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:03:06,004-Speed 5482.35 samples/sec   Loss 5.8947   LearningRate 0.0921   Epoch: 9   Global Step: 98230   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:03:13,501-Speed 5464.25 samples/sec   Loss 5.8975   LearningRate 0.0921   Epoch: 9   Global Step: 98240   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:03:20,970-Speed 5484.98 samples/sec   Loss 5.9148   LearningRate 0.0921   Epoch: 9   Global Step: 98250   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:03:28,463-Speed 5466.97 samples/sec   Loss 5.9487   LearningRate 0.0920   Epoch: 9   Global Step: 98260   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:03:35,931-Speed 5485.88 samples/sec   Loss 5.9053   LearningRate 0.0920   Epoch: 9   Global Step: 98270   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:03:43,475-Speed 5430.13 samples/sec   Loss 5.9337   LearningRate 0.0920   Epoch: 9   Global Step: 98280   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:03:50,943-Speed 5485.18 samples/sec   Loss 5.9093   LearningRate 0.0920   Epoch: 9   Global Step: 98290   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:03:58,371-Speed 5515.21 samples/sec   Loss 5.9025   LearningRate 0.0920   Epoch: 9   Global Step: 98300   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:04:05,831-Speed 5491.35 samples/sec   Loss 5.8366   LearningRate 0.0919   Epoch: 9   Global Step: 98310   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:04:13,248-Speed 5523.42 samples/sec   Loss 5.9091   LearningRate 0.0919   Epoch: 9   Global Step: 98320   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:04:20,772-Speed 5444.53 samples/sec   Loss 5.8786   LearningRate 0.0919   Epoch: 9   Global Step: 98330   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:04:28,324-Speed 5424.41 samples/sec   Loss 5.9147   LearningRate 0.0919   Epoch: 9   Global Step: 98340   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:04:35,885-Speed 5417.81 samples/sec   Loss 5.8753   LearningRate 0.0919   Epoch: 9   Global Step: 98350   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:04:43,444-Speed 5419.69 samples/sec   Loss 5.8582   LearningRate 0.0919   Epoch: 9   Global Step: 98360   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:04:51,036-Speed 5395.52 samples/sec   Loss 5.9478   LearningRate 0.0918   Epoch: 9   Global Step: 98370   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:04:58,564-Speed 5441.75 samples/sec   Loss 5.8726   LearningRate 0.0918   Epoch: 9   Global Step: 98380   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:05:05,959-Speed 5539.54 samples/sec   Loss 5.8431   LearningRate 0.0918   Epoch: 9   Global Step: 98390   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:05:13,403-Speed 5503.39 samples/sec   Loss 5.9086   LearningRate 0.0918   Epoch: 9   Global Step: 98400   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:05:20,811-Speed 5529.91 samples/sec   Loss 5.8483   LearningRate 0.0918   Epoch: 9   Global Step: 98410   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:05:28,277-Speed 5487.25 samples/sec   Loss 5.8686   LearningRate 0.0918   Epoch: 9   Global Step: 98420   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:05:35,741-Speed 5488.05 samples/sec   Loss 5.9245   LearningRate 0.0917   Epoch: 9   Global Step: 98430   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:05:43,233-Speed 5468.16 samples/sec   Loss 5.9240   LearningRate 0.0917   Epoch: 9   Global Step: 98440   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:05:50,651-Speed 5522.09 samples/sec   Loss 5.9030   LearningRate 0.0917   Epoch: 9   Global Step: 98450   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:05:58,140-Speed 5470.62 samples/sec   Loss 5.8758   LearningRate 0.0917   Epoch: 9   Global Step: 98460   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:06:05,564-Speed 5517.47 samples/sec   Loss 5.9487   LearningRate 0.0917   Epoch: 9   Global Step: 98470   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:06:12,994-Speed 5514.05 samples/sec   Loss 5.8596   LearningRate 0.0917   Epoch: 9   Global Step: 98480   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:06:20,446-Speed 5496.83 samples/sec   Loss 5.8287   LearningRate 0.0916   Epoch: 9   Global Step: 98490   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:06:27,922-Speed 5480.19 samples/sec   Loss 5.9332   LearningRate 0.0916   Epoch: 9   Global Step: 98500   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:06:35,333-Speed 5526.95 samples/sec   Loss 5.9061   LearningRate 0.0916   Epoch: 9   Global Step: 98510   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:06:42,987-Speed 5352.60 samples/sec   Loss 5.9028   LearningRate 0.0916   Epoch: 9   Global Step: 98520   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:06:50,514-Speed 5442.47 samples/sec   Loss 5.9311   LearningRate 0.0916   Epoch: 9   Global Step: 98530   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:06:58,080-Speed 5414.22 samples/sec   Loss 5.8870   LearningRate 0.0916   Epoch: 9   Global Step: 98540   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:07:05,605-Speed 5444.47 samples/sec   Loss 5.8540   LearningRate 0.0915   Epoch: 9   Global Step: 98550   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:07:13,116-Speed 5454.20 samples/sec   Loss 5.8915   LearningRate 0.0915   Epoch: 9   Global Step: 98560   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:07:20,574-Speed 5492.54 samples/sec   Loss 5.9646   LearningRate 0.0915   Epoch: 9   Global Step: 98570   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:07:28,008-Speed 5510.55 samples/sec   Loss 5.8784   LearningRate 0.0915   Epoch: 9   Global Step: 98580   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:07:35,528-Speed 5447.04 samples/sec   Loss 5.8561   LearningRate 0.0915   Epoch: 9   Global Step: 98590   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:07:43,000-Speed 5482.82 samples/sec   Loss 5.8536   LearningRate 0.0915   Epoch: 9   Global Step: 98600   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:07:50,473-Speed 5483.21 samples/sec   Loss 5.8409   LearningRate 0.0914   Epoch: 9   Global Step: 98610   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:07:57,942-Speed 5484.07 samples/sec   Loss 5.8537   LearningRate 0.0914   Epoch: 9   Global Step: 98620   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:08:05,619-Speed 5336.82 samples/sec   Loss 5.8964   LearningRate 0.0914   Epoch: 9   Global Step: 98630   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:08:13,195-Speed 5407.08 samples/sec   Loss 5.9179   LearningRate 0.0914   Epoch: 9   Global Step: 98640   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:08:20,621-Speed 5518.51 samples/sec   Loss 5.9086   LearningRate 0.0914   Epoch: 9   Global Step: 98650   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:08:28,135-Speed 5451.33 samples/sec   Loss 5.9379   LearningRate 0.0914   Epoch: 9   Global Step: 98660   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:08:35,664-Speed 5441.33 samples/sec   Loss 5.8909   LearningRate 0.0913   Epoch: 9   Global Step: 98670   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:08:43,200-Speed 5435.86 samples/sec   Loss 5.8611   LearningRate 0.0913   Epoch: 9   Global Step: 98680   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:08:50,645-Speed 5503.07 samples/sec   Loss 5.8419   LearningRate 0.0913   Epoch: 9   Global Step: 98690   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:08:58,149-Speed 5458.52 samples/sec   Loss 5.8846   LearningRate 0.0913   Epoch: 9   Global Step: 98700   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:09:05,628-Speed 5477.52 samples/sec   Loss 5.8835   LearningRate 0.0913   Epoch: 9   Global Step: 98710   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:09:13,071-Speed 5504.40 samples/sec   Loss 5.8832   LearningRate 0.0913   Epoch: 9   Global Step: 98720   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:09:20,623-Speed 5424.73 samples/sec   Loss 5.9123   LearningRate 0.0912   Epoch: 9   Global Step: 98730   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:09:28,108-Speed 5472.33 samples/sec   Loss 5.8598   LearningRate 0.0912   Epoch: 9   Global Step: 98740   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:09:35,617-Speed 5456.03 samples/sec   Loss 5.9073   LearningRate 0.0912   Epoch: 9   Global Step: 98750   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:09:43,140-Speed 5444.98 samples/sec   Loss 5.9002   LearningRate 0.0912   Epoch: 9   Global Step: 98760   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:09:50,686-Speed 5429.19 samples/sec   Loss 5.8691   LearningRate 0.0912   Epoch: 9   Global Step: 98770   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:09:58,233-Speed 5427.90 samples/sec   Loss 5.8480   LearningRate 0.0912   Epoch: 9   Global Step: 98780   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:10:05,639-Speed 5531.66 samples/sec   Loss 5.9227   LearningRate 0.0911   Epoch: 9   Global Step: 98790   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:10:13,065-Speed 5515.84 samples/sec   Loss 5.8615   LearningRate 0.0911   Epoch: 9   Global Step: 98800   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:10:20,542-Speed 5479.26 samples/sec   Loss 5.9560   LearningRate 0.0911   Epoch: 9   Global Step: 98810   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:10:28,004-Speed 5490.12 samples/sec   Loss 5.8614   LearningRate 0.0911   Epoch: 9   Global Step: 98820   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:10:35,469-Speed 5487.74 samples/sec   Loss 5.9212   LearningRate 0.0911   Epoch: 9   Global Step: 98830   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:10:42,923-Speed 5494.82 samples/sec   Loss 5.8276   LearningRate 0.0911   Epoch: 9   Global Step: 98840   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:10:50,457-Speed 5437.91 samples/sec   Loss 5.8978   LearningRate 0.0910   Epoch: 9   Global Step: 98850   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:10:57,956-Speed 5463.11 samples/sec   Loss 5.8388   LearningRate 0.0910   Epoch: 9   Global Step: 98860   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:11:05,486-Speed 5439.79 samples/sec   Loss 5.9307   LearningRate 0.0910   Epoch: 9   Global Step: 98870   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:11:13,011-Speed 5444.22 samples/sec   Loss 5.8540   LearningRate 0.0910   Epoch: 9   Global Step: 98880   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:11:20,543-Speed 5439.11 samples/sec   Loss 5.9104   LearningRate 0.0910   Epoch: 9   Global Step: 98890   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:11:28,145-Speed 5389.04 samples/sec   Loss 5.8753   LearningRate 0.0910   Epoch: 9   Global Step: 98900   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:11:35,740-Speed 5393.31 samples/sec   Loss 5.8880   LearningRate 0.0909   Epoch: 9   Global Step: 98910   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:11:43,258-Speed 5449.40 samples/sec   Loss 5.8476   LearningRate 0.0909   Epoch: 9   Global Step: 98920   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:11:50,863-Speed 5386.70 samples/sec   Loss 5.8282   LearningRate 0.0909   Epoch: 9   Global Step: 98930   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:11:58,311-Speed 5500.44 samples/sec   Loss 5.8813   LearningRate 0.0909   Epoch: 9   Global Step: 98940   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:12:05,913-Speed 5388.84 samples/sec   Loss 5.7937   LearningRate 0.0909   Epoch: 9   Global Step: 98950   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:12:13,447-Speed 5437.41 samples/sec   Loss 5.9319   LearningRate 0.0909   Epoch: 9   Global Step: 98960   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:12:21,215-Speed 5273.82 samples/sec   Loss 5.8384   LearningRate 0.0908   Epoch: 9   Global Step: 98970   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:12:28,742-Speed 5442.28 samples/sec   Loss 5.8475   LearningRate 0.0908   Epoch: 9   Global Step: 98980   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:12:36,279-Speed 5435.11 samples/sec   Loss 5.8426   LearningRate 0.0908   Epoch: 9   Global Step: 98990   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:12:43,852-Speed 5409.06 samples/sec   Loss 5.9387   LearningRate 0.0908   Epoch: 9   Global Step: 99000   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:12:51,301-Speed 5500.07 samples/sec   Loss 5.9047   LearningRate 0.0908   Epoch: 9   Global Step: 99010   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:12:58,775-Speed 5480.82 samples/sec   Loss 5.8145   LearningRate 0.0908   Epoch: 9   Global Step: 99020   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:13:06,247-Speed 5482.69 samples/sec   Loss 5.8358   LearningRate 0.0907   Epoch: 9   Global Step: 99030   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:13:13,740-Speed 5466.65 samples/sec   Loss 5.8635   LearningRate 0.0907   Epoch: 9   Global Step: 99040   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:13:21,155-Speed 5524.62 samples/sec   Loss 5.8620   LearningRate 0.0907   Epoch: 9   Global Step: 99050   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:13:28,726-Speed 5411.45 samples/sec   Loss 5.9109   LearningRate 0.0907   Epoch: 9   Global Step: 99060   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:13:36,224-Speed 5463.20 samples/sec   Loss 5.8942   LearningRate 0.0907   Epoch: 9   Global Step: 99070   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:13:43,776-Speed 5424.12 samples/sec   Loss 5.8525   LearningRate 0.0907   Epoch: 9   Global Step: 99080   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:13:51,246-Speed 5483.95 samples/sec   Loss 5.8661   LearningRate 0.0906   Epoch: 9   Global Step: 99090   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:13:58,736-Speed 5469.73 samples/sec   Loss 5.8369   LearningRate 0.0906   Epoch: 9   Global Step: 99100   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:14:06,223-Speed 5471.87 samples/sec   Loss 5.8496   LearningRate 0.0906   Epoch: 9   Global Step: 99110   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:14:13,701-Speed 5477.68 samples/sec   Loss 5.8480   LearningRate 0.0906   Epoch: 9   Global Step: 99120   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:14:21,379-Speed 5334.89 samples/sec   Loss 5.8254   LearningRate 0.0906   Epoch: 9   Global Step: 99130   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:14:29,040-Speed 5347.46 samples/sec   Loss 5.8469   LearningRate 0.0906   Epoch: 9   Global Step: 99140   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:14:36,481-Speed 5505.80 samples/sec   Loss 5.8725   LearningRate 0.0905   Epoch: 9   Global Step: 99150   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:14:44,001-Speed 5447.25 samples/sec   Loss 5.8910   LearningRate 0.0905   Epoch: 9   Global Step: 99160   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:14:51,573-Speed 5409.93 samples/sec   Loss 5.9084   LearningRate 0.0905   Epoch: 9   Global Step: 99170   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:14:59,017-Speed 5503.95 samples/sec   Loss 5.8766   LearningRate 0.0905   Epoch: 9   Global Step: 99180   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:15:06,500-Speed 5474.38 samples/sec   Loss 5.8903   LearningRate 0.0905   Epoch: 9   Global Step: 99190   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:15:14,021-Speed 5446.69 samples/sec   Loss 5.8558   LearningRate 0.0905   Epoch: 9   Global Step: 99200   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:15:21,534-Speed 5452.67 samples/sec   Loss 5.8709   LearningRate 0.0904   Epoch: 9   Global Step: 99210   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:15:29,031-Speed 5464.09 samples/sec   Loss 5.8201   LearningRate 0.0904   Epoch: 9   Global Step: 99220   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:15:36,522-Speed 5468.39 samples/sec   Loss 5.8237   LearningRate 0.0904   Epoch: 9   Global Step: 99230   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:15:44,043-Speed 5446.84 samples/sec   Loss 5.8645   LearningRate 0.0904   Epoch: 9   Global Step: 99240   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:15:51,593-Speed 5425.64 samples/sec   Loss 5.8910   LearningRate 0.0904   Epoch: 9   Global Step: 99250   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:15:59,192-Speed 5391.52 samples/sec   Loss 5.8770   LearningRate 0.0904   Epoch: 9   Global Step: 99260   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:16:06,778-Speed 5399.92 samples/sec   Loss 5.8422   LearningRate 0.0903   Epoch: 9   Global Step: 99270   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:16:14,177-Speed 5536.78 samples/sec   Loss 5.8748   LearningRate 0.0903   Epoch: 9   Global Step: 99280   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:16:21,737-Speed 5418.60 samples/sec   Loss 5.8486   LearningRate 0.0903   Epoch: 9   Global Step: 99290   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:16:29,259-Speed 5446.36 samples/sec   Loss 5.8377   LearningRate 0.0903   Epoch: 9   Global Step: 99300   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:16:36,831-Speed 5410.31 samples/sec   Loss 5.8743   LearningRate 0.0903   Epoch: 9   Global Step: 99310   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:16:44,359-Speed 5441.46 samples/sec   Loss 5.8269   LearningRate 0.0903   Epoch: 9   Global Step: 99320   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:16:51,870-Speed 5453.35 samples/sec   Loss 5.8458   LearningRate 0.0902   Epoch: 9   Global Step: 99330   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:16:59,303-Speed 5511.84 samples/sec   Loss 5.8650   LearningRate 0.0902   Epoch: 9   Global Step: 99340   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:17:06,784-Speed 5476.08 samples/sec   Loss 5.7909   LearningRate 0.0902   Epoch: 9   Global Step: 99350   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:17:14,333-Speed 5426.85 samples/sec   Loss 5.9110   LearningRate 0.0902   Epoch: 9   Global Step: 99360   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:17:21,827-Speed 5465.72 samples/sec   Loss 5.9122   LearningRate 0.0902   Epoch: 9   Global Step: 99370   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:17:29,332-Speed 5458.56 samples/sec   Loss 5.8192   LearningRate 0.0902   Epoch: 9   Global Step: 99380   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:17:36,802-Speed 5483.66 samples/sec   Loss 5.8567   LearningRate 0.0901   Epoch: 9   Global Step: 99390   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:17:44,323-Speed 5447.41 samples/sec   Loss 5.8652   LearningRate 0.0901   Epoch: 9   Global Step: 99400   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:17:51,890-Speed 5413.15 samples/sec   Loss 5.8877   LearningRate 0.0901   Epoch: 9   Global Step: 99410   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:17:59,422-Speed 5438.39 samples/sec   Loss 5.9192   LearningRate 0.0901   Epoch: 9   Global Step: 99420   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:18:06,876-Speed 5495.66 samples/sec   Loss 5.8283   LearningRate 0.0901   Epoch: 9   Global Step: 99430   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:18:14,443-Speed 5414.22 samples/sec   Loss 5.8564   LearningRate 0.0901   Epoch: 9   Global Step: 99440   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:18:21,975-Speed 5438.74 samples/sec   Loss 5.8348   LearningRate 0.0900   Epoch: 9   Global Step: 99450   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:18:29,465-Speed 5468.90 samples/sec   Loss 5.8025   LearningRate 0.0900   Epoch: 9   Global Step: 99460   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:18:36,940-Speed 5480.73 samples/sec   Loss 5.9184   LearningRate 0.0900   Epoch: 9   Global Step: 99470   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:18:44,463-Speed 5445.31 samples/sec   Loss 5.8598   LearningRate 0.0900   Epoch: 9   Global Step: 99480   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:18:51,922-Speed 5492.08 samples/sec   Loss 5.8317   LearningRate 0.0900   Epoch: 9   Global Step: 99490   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:18:59,346-Speed 5517.94 samples/sec   Loss 5.8730   LearningRate 0.0900   Epoch: 9   Global Step: 99500   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:19:06,840-Speed 5466.26 samples/sec   Loss 5.8872   LearningRate 0.0899   Epoch: 9   Global Step: 99510   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:19:14,354-Speed 5451.91 samples/sec   Loss 5.7859   LearningRate 0.0899   Epoch: 9   Global Step: 99520   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:19:21,830-Speed 5479.76 samples/sec   Loss 5.8381   LearningRate 0.0899   Epoch: 9   Global Step: 99530   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:19:29,369-Speed 5433.45 samples/sec   Loss 5.8585   LearningRate 0.0899   Epoch: 9   Global Step: 99540   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:19:36,853-Speed 5474.06 samples/sec   Loss 5.8373   LearningRate 0.0899   Epoch: 9   Global Step: 99550   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:19:44,443-Speed 5397.52 samples/sec   Loss 5.8549   LearningRate 0.0899   Epoch: 9   Global Step: 99560   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:19:51,952-Speed 5455.35 samples/sec   Loss 5.8423   LearningRate 0.0898   Epoch: 9   Global Step: 99570   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:19:59,484-Speed 5438.82 samples/sec   Loss 5.8953   LearningRate 0.0898   Epoch: 9   Global Step: 99580   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:20:06,903-Speed 5521.55 samples/sec   Loss 5.8338   LearningRate 0.0898   Epoch: 9   Global Step: 99590   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:20:14,355-Speed 5497.69 samples/sec   Loss 5.7789   LearningRate 0.0898   Epoch: 9   Global Step: 99600   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:20:21,835-Speed 5476.13 samples/sec   Loss 5.8433   LearningRate 0.0898   Epoch: 9   Global Step: 99610   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:20:29,306-Speed 5483.69 samples/sec   Loss 5.8639   LearningRate 0.0898   Epoch: 9   Global Step: 99620   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:20:36,817-Speed 5454.45 samples/sec   Loss 5.8071   LearningRate 0.0897   Epoch: 9   Global Step: 99630   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:20:44,414-Speed 5392.29 samples/sec   Loss 5.8113   LearningRate 0.0897   Epoch: 9   Global Step: 99640   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:20:51,993-Speed 5404.67 samples/sec   Loss 5.8550   LearningRate 0.0897   Epoch: 9   Global Step: 99650   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:20:59,542-Speed 5426.51 samples/sec   Loss 5.8145   LearningRate 0.0897   Epoch: 9   Global Step: 99660   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:21:07,067-Speed 5444.36 samples/sec   Loss 5.8285   LearningRate 0.0897   Epoch: 9   Global Step: 99670   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:21:14,612-Speed 5429.11 samples/sec   Loss 5.7736   LearningRate 0.0897   Epoch: 9   Global Step: 99680   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:21:22,255-Speed 5359.97 samples/sec   Loss 5.8604   LearningRate 0.0896   Epoch: 9   Global Step: 99690   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:21:29,803-Speed 5426.88 samples/sec   Loss 5.8281   LearningRate 0.0896   Epoch: 9   Global Step: 99700   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:21:37,345-Speed 5432.04 samples/sec   Loss 5.8646   LearningRate 0.0896   Epoch: 9   Global Step: 99710   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:21:44,824-Speed 5477.29 samples/sec   Loss 5.8077   LearningRate 0.0896   Epoch: 9   Global Step: 99720   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:21:52,377-Speed 5423.45 samples/sec   Loss 5.8111   LearningRate 0.0896   Epoch: 9   Global Step: 99730   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:21:59,893-Speed 5450.18 samples/sec   Loss 5.8231   LearningRate 0.0896   Epoch: 9   Global Step: 99740   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:22:07,383-Speed 5469.66 samples/sec   Loss 5.8113   LearningRate 0.0895   Epoch: 9   Global Step: 99750   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:22:14,980-Speed 5392.28 samples/sec   Loss 5.8394   LearningRate 0.0895   Epoch: 9   Global Step: 99760   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:22:22,561-Speed 5403.52 samples/sec   Loss 5.8597   LearningRate 0.0895   Epoch: 9   Global Step: 99770   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:22:30,045-Speed 5473.66 samples/sec   Loss 5.7229   LearningRate 0.0895   Epoch: 9   Global Step: 99780   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:22:37,564-Speed 5449.09 samples/sec   Loss 5.7852   LearningRate 0.0895   Epoch: 9   Global Step: 99790   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:22:45,031-Speed 5485.76 samples/sec   Loss 5.7637   LearningRate 0.0895   Epoch: 9   Global Step: 99800   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:22:52,505-Speed 5480.78 samples/sec   Loss 5.8304   LearningRate 0.0894   Epoch: 9   Global Step: 99810   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:23:00,014-Speed 5455.76 samples/sec   Loss 5.8608   LearningRate 0.0894   Epoch: 9   Global Step: 99820   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:23:07,525-Speed 5454.47 samples/sec   Loss 5.8328   LearningRate 0.0894   Epoch: 9   Global Step: 99830   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:23:15,139-Speed 5379.93 samples/sec   Loss 5.8173   LearningRate 0.0894   Epoch: 9   Global Step: 99840   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:23:22,714-Speed 5408.64 samples/sec   Loss 5.8491   LearningRate 0.0894   Epoch: 9   Global Step: 99850   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:23:30,212-Speed 5463.39 samples/sec   Loss 5.8541   LearningRate 0.0894   Epoch: 9   Global Step: 99860   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:23:37,791-Speed 5404.68 samples/sec   Loss 5.8365   LearningRate 0.0893   Epoch: 9   Global Step: 99870   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:23:45,422-Speed 5368.66 samples/sec   Loss 5.7973   LearningRate 0.0893   Epoch: 9   Global Step: 99880   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:23:52,918-Speed 5465.11 samples/sec   Loss 5.8301   LearningRate 0.0893   Epoch: 9   Global Step: 99890   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:24:00,454-Speed 5436.01 samples/sec   Loss 5.7926   LearningRate 0.0893   Epoch: 9   Global Step: 99900   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:24:07,980-Speed 5442.76 samples/sec   Loss 5.8243   LearningRate 0.0893   Epoch: 9   Global Step: 99910   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:24:15,482-Speed 5461.05 samples/sec   Loss 5.7784   LearningRate 0.0893   Epoch: 9   Global Step: 99920   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:24:22,979-Speed 5464.15 samples/sec   Loss 5.7456   LearningRate 0.0892   Epoch: 9   Global Step: 99930   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:24:30,450-Speed 5482.89 samples/sec   Loss 5.7991   LearningRate 0.0892   Epoch: 9   Global Step: 99940   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:24:37,960-Speed 5454.70 samples/sec   Loss 5.8123   LearningRate 0.0892   Epoch: 9   Global Step: 99950   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:24:45,399-Speed 5507.38 samples/sec   Loss 5.7896   LearningRate 0.0892   Epoch: 9   Global Step: 99960   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:24:52,961-Speed 5417.11 samples/sec   Loss 5.8116   LearningRate 0.0892   Epoch: 9   Global Step: 99970   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:25:00,641-Speed 5333.73 samples/sec   Loss 5.8019   LearningRate 0.0892   Epoch: 9   Global Step: 99980   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:25:08,199-Speed 5420.50 samples/sec   Loss 5.8438   LearningRate 0.0891   Epoch: 9   Global Step: 99990   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:25:15,680-Speed 5475.54 samples/sec   Loss 5.7990   LearningRate 0.0891   Epoch: 9   Global Step: 100000   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:25:59,681-[lfw][100000]XNorm: 22.346219
Training: 2022-01-08 17:25:59,682-[lfw][100000]Accuracy-Flip: 0.99800+-0.00277
Training: 2022-01-08 17:25:59,683-[lfw][100000]Accuracy-Highest: 0.99817
Training: 2022-01-08 17:26:51,031-[cfp_fp][100000]XNorm: 20.419401
Training: 2022-01-08 17:26:51,032-[cfp_fp][100000]Accuracy-Flip: 0.98586+-0.00505
Training: 2022-01-08 17:26:51,032-[cfp_fp][100000]Accuracy-Highest: 0.98914
Training: 2022-01-08 17:27:36,859-[agedb_30][100000]XNorm: 22.253314
Training: 2022-01-08 17:27:36,861-[agedb_30][100000]Accuracy-Flip: 0.97633+-0.00795
Training: 2022-01-08 17:27:36,861-[agedb_30][100000]Accuracy-Highest: 0.97833
Training: 2022-01-08 17:27:44,452-Speed 275.32 samples/sec   Loss 5.7759   LearningRate 0.0891   Epoch: 9   Global Step: 100010   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:27:52,120-Speed 5343.17 samples/sec   Loss 5.7999   LearningRate 0.0891   Epoch: 9   Global Step: 100020   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:27:59,760-Speed 5362.26 samples/sec   Loss 5.8647   LearningRate 0.0891   Epoch: 9   Global Step: 100030   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:28:07,436-Speed 5336.83 samples/sec   Loss 5.8029   LearningRate 0.0891   Epoch: 9   Global Step: 100040   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:28:14,994-Speed 5419.97 samples/sec   Loss 5.7877   LearningRate 0.0890   Epoch: 9   Global Step: 100050   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:28:22,574-Speed 5404.99 samples/sec   Loss 5.7940   LearningRate 0.0890   Epoch: 9   Global Step: 100060   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:28:30,044-Speed 5483.95 samples/sec   Loss 5.8160   LearningRate 0.0890   Epoch: 9   Global Step: 100070   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:28:37,587-Speed 5430.13 samples/sec   Loss 5.7769   LearningRate 0.0890   Epoch: 9   Global Step: 100080   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:28:45,214-Speed 5371.34 samples/sec   Loss 5.8178   LearningRate 0.0890   Epoch: 9   Global Step: 100090   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:28:52,755-Speed 5432.72 samples/sec   Loss 5.8490   LearningRate 0.0890   Epoch: 9   Global Step: 100100   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:29:00,226-Speed 5483.47 samples/sec   Loss 5.7437   LearningRate 0.0889   Epoch: 9   Global Step: 100110   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:29:07,767-Speed 5431.71 samples/sec   Loss 5.8311   LearningRate 0.0889   Epoch: 9   Global Step: 100120   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:29:15,309-Speed 5431.85 samples/sec   Loss 5.8514   LearningRate 0.0889   Epoch: 9   Global Step: 100130   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:29:22,827-Speed 5449.31 samples/sec   Loss 5.8480   LearningRate 0.0889   Epoch: 9   Global Step: 100140   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:29:30,223-Speed 5538.46 samples/sec   Loss 5.7887   LearningRate 0.0889   Epoch: 9   Global Step: 100150   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:29:37,676-Speed 5496.85 samples/sec   Loss 5.7897   LearningRate 0.0889   Epoch: 9   Global Step: 100160   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:29:45,144-Speed 5485.10 samples/sec   Loss 5.8212   LearningRate 0.0888   Epoch: 9   Global Step: 100170   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:29:52,717-Speed 5409.69 samples/sec   Loss 5.7848   LearningRate 0.0888   Epoch: 9   Global Step: 100180   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:30:00,233-Speed 5450.07 samples/sec   Loss 5.8009   LearningRate 0.0888   Epoch: 9   Global Step: 100190   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:30:07,761-Speed 5441.79 samples/sec   Loss 5.8244   LearningRate 0.0888   Epoch: 9   Global Step: 100200   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:30:15,284-Speed 5445.44 samples/sec   Loss 5.7940   LearningRate 0.0888   Epoch: 9   Global Step: 100210   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:30:22,818-Speed 5437.72 samples/sec   Loss 5.7994   LearningRate 0.0888   Epoch: 9   Global Step: 100220   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:30:30,379-Speed 5417.75 samples/sec   Loss 5.7730   LearningRate 0.0887   Epoch: 9   Global Step: 100230   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:30:37,931-Speed 5423.91 samples/sec   Loss 5.7596   LearningRate 0.0887   Epoch: 9   Global Step: 100240   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:30:45,402-Speed 5483.53 samples/sec   Loss 5.8241   LearningRate 0.0887   Epoch: 9   Global Step: 100250   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:30:52,938-Speed 5436.14 samples/sec   Loss 5.8622   LearningRate 0.0887   Epoch: 9   Global Step: 100260   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:31:00,520-Speed 5403.31 samples/sec   Loss 5.7915   LearningRate 0.0887   Epoch: 9   Global Step: 100270   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:31:08,249-Speed 5299.62 samples/sec   Loss 5.8023   LearningRate 0.0887   Epoch: 9   Global Step: 100280   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:31:15,675-Speed 5516.61 samples/sec   Loss 5.7837   LearningRate 0.0886   Epoch: 9   Global Step: 100290   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:31:23,152-Speed 5478.98 samples/sec   Loss 5.7377   LearningRate 0.0886   Epoch: 9   Global Step: 100300   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:31:30,720-Speed 5413.41 samples/sec   Loss 5.7351   LearningRate 0.0886   Epoch: 9   Global Step: 100310   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:31:38,267-Speed 5427.56 samples/sec   Loss 5.7783   LearningRate 0.0886   Epoch: 9   Global Step: 100320   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:31:45,741-Speed 5481.65 samples/sec   Loss 5.8210   LearningRate 0.0886   Epoch: 9   Global Step: 100330   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:31:53,309-Speed 5412.56 samples/sec   Loss 5.8419   LearningRate 0.0886   Epoch: 9   Global Step: 100340   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:32:00,859-Speed 5426.23 samples/sec   Loss 5.8307   LearningRate 0.0885   Epoch: 9   Global Step: 100350   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:32:08,491-Speed 5367.55 samples/sec   Loss 5.7687   LearningRate 0.0885   Epoch: 9   Global Step: 100360   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:32:16,033-Speed 5432.24 samples/sec   Loss 5.7681   LearningRate 0.0885   Epoch: 9   Global Step: 100370   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:32:23,502-Speed 5484.25 samples/sec   Loss 5.7492   LearningRate 0.0885   Epoch: 9   Global Step: 100380   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:32:30,966-Speed 5488.90 samples/sec   Loss 5.7284   LearningRate 0.0885   Epoch: 9   Global Step: 100390   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:32:38,527-Speed 5417.65 samples/sec   Loss 5.8007   LearningRate 0.0885   Epoch: 9   Global Step: 100400   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:32:46,024-Speed 5464.42 samples/sec   Loss 5.8219   LearningRate 0.0884   Epoch: 9   Global Step: 100410   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:32:53,513-Speed 5470.17 samples/sec   Loss 5.7997   LearningRate 0.0884   Epoch: 9   Global Step: 100420   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:33:01,030-Speed 5450.14 samples/sec   Loss 5.7246   LearningRate 0.0884   Epoch: 9   Global Step: 100430   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 17:33:08,656-Speed 5371.58 samples/sec   Loss 5.8210   LearningRate 0.0884   Epoch: 9   Global Step: 100440   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 17:33:16,100-Speed 5503.55 samples/sec   Loss 6.1275   LearningRate 0.0884   Epoch: 9   Global Step: 100450   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:33:23,629-Speed 5441.01 samples/sec   Loss 5.9902   LearningRate 0.0884   Epoch: 9   Global Step: 100460   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:33:31,105-Speed 5479.73 samples/sec   Loss 5.9300   LearningRate 0.0883   Epoch: 9   Global Step: 100470   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:33:38,668-Speed 5416.17 samples/sec   Loss 5.8268   LearningRate 0.0883   Epoch: 9   Global Step: 100480   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:33:46,300-Speed 5367.70 samples/sec   Loss 5.8757   LearningRate 0.0883   Epoch: 9   Global Step: 100490   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:33:54,129-Speed 5232.87 samples/sec   Loss 5.8189   LearningRate 0.0883   Epoch: 9   Global Step: 100500   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:34:01,632-Speed 5460.04 samples/sec   Loss 5.8259   LearningRate 0.0883   Epoch: 9   Global Step: 100510   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:34:09,119-Speed 5471.28 samples/sec   Loss 5.8353   LearningRate 0.0883   Epoch: 9   Global Step: 100520   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:34:16,573-Speed 5495.83 samples/sec   Loss 5.8057   LearningRate 0.0882   Epoch: 9   Global Step: 100530   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 17:34:24,068-Speed 5466.10 samples/sec   Loss 5.8870   LearningRate 0.0882   Epoch: 9   Global Step: 100540   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:34:31,518-Speed 5498.54 samples/sec   Loss 5.7770   LearningRate 0.0882   Epoch: 9   Global Step: 100550   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:34:38,977-Speed 5492.01 samples/sec   Loss 5.8021   LearningRate 0.0882   Epoch: 9   Global Step: 100560   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:34:46,470-Speed 5467.40 samples/sec   Loss 5.7617   LearningRate 0.0882   Epoch: 9   Global Step: 100570   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:34:53,972-Speed 5460.53 samples/sec   Loss 5.8070   LearningRate 0.0882   Epoch: 9   Global Step: 100580   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:35:01,472-Speed 5461.95 samples/sec   Loss 5.7907   LearningRate 0.0881   Epoch: 9   Global Step: 100590   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:35:08,976-Speed 5485.67 samples/sec   Loss 5.7969   LearningRate 0.0881   Epoch: 9   Global Step: 100600   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:35:16,521-Speed 5429.38 samples/sec   Loss 5.7955   LearningRate 0.0881   Epoch: 9   Global Step: 100610   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:35:24,082-Speed 5418.40 samples/sec   Loss 5.8217   LearningRate 0.0881   Epoch: 9   Global Step: 100620   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:35:31,644-Speed 5417.39 samples/sec   Loss 5.7421   LearningRate 0.0881   Epoch: 9   Global Step: 100630   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:35:39,147-Speed 5459.91 samples/sec   Loss 5.8311   LearningRate 0.0881   Epoch: 9   Global Step: 100640   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:35:46,603-Speed 5493.59 samples/sec   Loss 5.8138   LearningRate 0.0880   Epoch: 9   Global Step: 100650   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:35:54,107-Speed 5460.00 samples/sec   Loss 5.7297   LearningRate 0.0880   Epoch: 9   Global Step: 100660   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:36:01,693-Speed 5399.71 samples/sec   Loss 5.7530   LearningRate 0.0880   Epoch: 9   Global Step: 100670   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:36:09,350-Speed 5350.66 samples/sec   Loss 5.8207   LearningRate 0.0880   Epoch: 9   Global Step: 100680   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:36:16,873-Speed 5444.92 samples/sec   Loss 5.8098   LearningRate 0.0880   Epoch: 9   Global Step: 100690   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:36:24,519-Speed 5357.59 samples/sec   Loss 5.7773   LearningRate 0.0880   Epoch: 9   Global Step: 100700   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:36:31,968-Speed 5499.85 samples/sec   Loss 5.7904   LearningRate 0.0879   Epoch: 9   Global Step: 100710   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:36:39,471-Speed 5460.27 samples/sec   Loss 5.8059   LearningRate 0.0879   Epoch: 9   Global Step: 100720   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:36:46,988-Speed 5449.06 samples/sec   Loss 5.8058   LearningRate 0.0879   Epoch: 9   Global Step: 100730   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:36:54,563-Speed 5408.18 samples/sec   Loss 5.7529   LearningRate 0.0879   Epoch: 9   Global Step: 100740   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:37:02,026-Speed 5489.49 samples/sec   Loss 5.7403   LearningRate 0.0879   Epoch: 9   Global Step: 100750   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 17:37:09,496-Speed 5484.23 samples/sec   Loss 5.7222   LearningRate 0.0879   Epoch: 9   Global Step: 100760   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:37:17,038-Speed 5430.96 samples/sec   Loss 5.7780   LearningRate 0.0878   Epoch: 9   Global Step: 100770   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:37:24,658-Speed 5376.15 samples/sec   Loss 5.8112   LearningRate 0.0878   Epoch: 9   Global Step: 100780   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:37:32,187-Speed 5441.43 samples/sec   Loss 5.7591   LearningRate 0.0878   Epoch: 9   Global Step: 100790   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:37:39,786-Speed 5390.36 samples/sec   Loss 5.8237   LearningRate 0.0878   Epoch: 9   Global Step: 100800   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:37:47,264-Speed 5478.62 samples/sec   Loss 5.6653   LearningRate 0.0878   Epoch: 9   Global Step: 100810   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:37:54,760-Speed 5464.99 samples/sec   Loss 5.7366   LearningRate 0.0878   Epoch: 9   Global Step: 100820   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:38:02,315-Speed 5421.71 samples/sec   Loss 5.7638   LearningRate 0.0878   Epoch: 9   Global Step: 100830   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:38:09,759-Speed 5503.36 samples/sec   Loss 5.8252   LearningRate 0.0877   Epoch: 9   Global Step: 100840   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:38:17,212-Speed 5496.55 samples/sec   Loss 5.7518   LearningRate 0.0877   Epoch: 9   Global Step: 100850   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:38:24,722-Speed 5454.78 samples/sec   Loss 5.7309   LearningRate 0.0877   Epoch: 9   Global Step: 100860   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:38:32,157-Speed 5509.12 samples/sec   Loss 5.7636   LearningRate 0.0877   Epoch: 9   Global Step: 100870   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:38:39,647-Speed 5470.08 samples/sec   Loss 5.7297   LearningRate 0.0877   Epoch: 9   Global Step: 100880   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:38:47,200-Speed 5423.52 samples/sec   Loss 5.7602   LearningRate 0.0877   Epoch: 9   Global Step: 100890   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:38:54,729-Speed 5440.57 samples/sec   Loss 5.6918   LearningRate 0.0876   Epoch: 9   Global Step: 100900   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:39:02,231-Speed 5461.10 samples/sec   Loss 5.7572   LearningRate 0.0876   Epoch: 9   Global Step: 100910   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:39:09,774-Speed 5430.69 samples/sec   Loss 5.7683   LearningRate 0.0876   Epoch: 9   Global Step: 100920   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:39:17,322-Speed 5427.87 samples/sec   Loss 5.6999   LearningRate 0.0876   Epoch: 9   Global Step: 100930   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:39:24,891-Speed 5411.97 samples/sec   Loss 5.7674   LearningRate 0.0876   Epoch: 9   Global Step: 100940   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:39:32,444-Speed 5423.33 samples/sec   Loss 5.7512   LearningRate 0.0876   Epoch: 9   Global Step: 100950   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:39:40,152-Speed 5315.05 samples/sec   Loss 5.7340   LearningRate 0.0875   Epoch: 9   Global Step: 100960   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:39:47,632-Speed 5476.38 samples/sec   Loss 5.7369   LearningRate 0.0875   Epoch: 9   Global Step: 100970   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:39:55,080-Speed 5500.51 samples/sec   Loss 5.7360   LearningRate 0.0875   Epoch: 9   Global Step: 100980   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:40:02,548-Speed 5485.01 samples/sec   Loss 5.7116   LearningRate 0.0875   Epoch: 9   Global Step: 100990   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:40:10,059-Speed 5454.22 samples/sec   Loss 5.7786   LearningRate 0.0875   Epoch: 9   Global Step: 101000   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:40:17,560-Speed 5461.54 samples/sec   Loss 5.7825   LearningRate 0.0875   Epoch: 9   Global Step: 101010   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:40:24,986-Speed 5516.39 samples/sec   Loss 5.7358   LearningRate 0.0874   Epoch: 9   Global Step: 101020   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:40:32,604-Speed 5377.10 samples/sec   Loss 5.7500   LearningRate 0.0874   Epoch: 9   Global Step: 101030   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:40:40,162-Speed 5420.57 samples/sec   Loss 5.7810   LearningRate 0.0874   Epoch: 9   Global Step: 101040   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:40:47,727-Speed 5415.14 samples/sec   Loss 5.7946   LearningRate 0.0874   Epoch: 9   Global Step: 101050   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:40:55,251-Speed 5444.28 samples/sec   Loss 5.7353   LearningRate 0.0874   Epoch: 9   Global Step: 101060   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:41:02,670-Speed 5521.65 samples/sec   Loss 5.7430   LearningRate 0.0874   Epoch: 9   Global Step: 101070   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:41:10,167-Speed 5464.03 samples/sec   Loss 5.7477   LearningRate 0.0873   Epoch: 9   Global Step: 101080   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:41:17,666-Speed 5462.55 samples/sec   Loss 5.7138   LearningRate 0.0873   Epoch: 9   Global Step: 101090   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:41:25,209-Speed 5431.42 samples/sec   Loss 5.7114   LearningRate 0.0873   Epoch: 9   Global Step: 101100   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:41:32,716-Speed 5456.60 samples/sec   Loss 5.6809   LearningRate 0.0873   Epoch: 9   Global Step: 101110   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:41:40,216-Speed 5462.00 samples/sec   Loss 5.7440   LearningRate 0.0873   Epoch: 9   Global Step: 101120   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:41:47,788-Speed 5409.94 samples/sec   Loss 5.6606   LearningRate 0.0873   Epoch: 9   Global Step: 101130   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:41:55,303-Speed 5451.23 samples/sec   Loss 5.7451   LearningRate 0.0872   Epoch: 9   Global Step: 101140   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:42:02,788-Speed 5473.26 samples/sec   Loss 5.7265   LearningRate 0.0872   Epoch: 9   Global Step: 101150   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:42:10,287-Speed 5462.81 samples/sec   Loss 5.7281   LearningRate 0.0872   Epoch: 9   Global Step: 101160   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:42:17,797-Speed 5454.96 samples/sec   Loss 5.7071   LearningRate 0.0872   Epoch: 9   Global Step: 101170   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:42:25,310-Speed 5452.55 samples/sec   Loss 5.7605   LearningRate 0.0872   Epoch: 9   Global Step: 101180   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:42:32,764-Speed 5495.16 samples/sec   Loss 5.8253   LearningRate 0.0872   Epoch: 9   Global Step: 101190   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:42:40,282-Speed 5449.77 samples/sec   Loss 5.7700   LearningRate 0.0871   Epoch: 9   Global Step: 101200   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:42:47,773-Speed 5468.39 samples/sec   Loss 5.6165   LearningRate 0.0871   Epoch: 9   Global Step: 101210   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:42:55,194-Speed 5519.68 samples/sec   Loss 5.7988   LearningRate 0.0871   Epoch: 9   Global Step: 101220   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:43:02,613-Speed 5522.02 samples/sec   Loss 5.6884   LearningRate 0.0871   Epoch: 9   Global Step: 101230   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:43:10,114-Speed 5461.47 samples/sec   Loss 5.7416   LearningRate 0.0871   Epoch: 9   Global Step: 101240   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:43:17,714-Speed 5389.76 samples/sec   Loss 5.7376   LearningRate 0.0871   Epoch: 9   Global Step: 101250   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:43:25,210-Speed 5464.96 samples/sec   Loss 5.7482   LearningRate 0.0870   Epoch: 9   Global Step: 101260   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:43:32,807-Speed 5392.83 samples/sec   Loss 5.7026   LearningRate 0.0870   Epoch: 9   Global Step: 101270   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 17:43:40,289-Speed 5474.99 samples/sec   Loss 5.8012   LearningRate 0.0870   Epoch: 9   Global Step: 101280   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:43:47,715-Speed 5516.34 samples/sec   Loss 5.7853   LearningRate 0.0870   Epoch: 9   Global Step: 101290   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:43:55,197-Speed 5475.19 samples/sec   Loss 5.7274   LearningRate 0.0870   Epoch: 9   Global Step: 101300   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:44:02,776-Speed 5405.41 samples/sec   Loss 5.7286   LearningRate 0.0870   Epoch: 9   Global Step: 101310   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:44:10,432-Speed 5350.70 samples/sec   Loss 5.6982   LearningRate 0.0869   Epoch: 9   Global Step: 101320   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:44:17,981-Speed 5426.72 samples/sec   Loss 5.7659   LearningRate 0.0869   Epoch: 9   Global Step: 101330   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:44:25,619-Speed 5363.11 samples/sec   Loss 5.7990   LearningRate 0.0869   Epoch: 9   Global Step: 101340   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:44:33,217-Speed 5391.94 samples/sec   Loss 5.8025   LearningRate 0.0869   Epoch: 9   Global Step: 101350   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:44:41,097-Speed 5198.68 samples/sec   Loss 5.7914   LearningRate 0.0869   Epoch: 9   Global Step: 101360   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:44:48,594-Speed 5463.87 samples/sec   Loss 5.7716   LearningRate 0.0869   Epoch: 9   Global Step: 101370   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:44:56,098-Speed 5459.51 samples/sec   Loss 5.7501   LearningRate 0.0868   Epoch: 9   Global Step: 101380   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:45:03,677-Speed 5405.28 samples/sec   Loss 5.7961   LearningRate 0.0868   Epoch: 9   Global Step: 101390   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:45:11,121-Speed 5503.11 samples/sec   Loss 5.7766   LearningRate 0.0868   Epoch: 9   Global Step: 101400   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:45:18,687-Speed 5414.53 samples/sec   Loss 5.7821   LearningRate 0.0868   Epoch: 9   Global Step: 101410   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:45:26,257-Speed 5411.22 samples/sec   Loss 5.7537   LearningRate 0.0868   Epoch: 9   Global Step: 101420   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:45:33,777-Speed 5447.67 samples/sec   Loss 5.7129   LearningRate 0.0868   Epoch: 9   Global Step: 101430   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:45:41,440-Speed 5345.90 samples/sec   Loss 5.6986   LearningRate 0.0867   Epoch: 9   Global Step: 101440   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:45:49,038-Speed 5391.95 samples/sec   Loss 5.6346   LearningRate 0.0867   Epoch: 9   Global Step: 101450   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:45:56,576-Speed 5434.65 samples/sec   Loss 5.7292   LearningRate 0.0867   Epoch: 9   Global Step: 101460   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:46:04,099-Speed 5444.95 samples/sec   Loss 5.6616   LearningRate 0.0867   Epoch: 9   Global Step: 101470   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:46:11,654-Speed 5422.70 samples/sec   Loss 5.8135   LearningRate 0.0867   Epoch: 9   Global Step: 101480   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:46:19,171-Speed 5449.36 samples/sec   Loss 5.7115   LearningRate 0.0867   Epoch: 9   Global Step: 101490   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:46:26,715-Speed 5430.32 samples/sec   Loss 5.7351   LearningRate 0.0866   Epoch: 9   Global Step: 101500   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:46:34,189-Speed 5481.31 samples/sec   Loss 5.7021   LearningRate 0.0866   Epoch: 9   Global Step: 101510   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:46:41,649-Speed 5491.74 samples/sec   Loss 5.6848   LearningRate 0.0866   Epoch: 9   Global Step: 101520   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:46:49,202-Speed 5423.71 samples/sec   Loss 5.7480   LearningRate 0.0866   Epoch: 9   Global Step: 101530   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:46:56,671-Speed 5484.33 samples/sec   Loss 5.7501   LearningRate 0.0866   Epoch: 9   Global Step: 101540   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:47:04,118-Speed 5500.81 samples/sec   Loss 5.7275   LearningRate 0.0866   Epoch: 9   Global Step: 101550   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:47:11,609-Speed 5468.93 samples/sec   Loss 5.7392   LearningRate 0.0866   Epoch: 9   Global Step: 101560   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:47:19,189-Speed 5404.42 samples/sec   Loss 5.6881   LearningRate 0.0865   Epoch: 9   Global Step: 101570   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:47:26,780-Speed 5396.57 samples/sec   Loss 5.6860   LearningRate 0.0865   Epoch: 9   Global Step: 101580   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:47:34,396-Speed 5378.36 samples/sec   Loss 5.7010   LearningRate 0.0865   Epoch: 9   Global Step: 101590   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:47:41,869-Speed 5481.80 samples/sec   Loss 5.6904   LearningRate 0.0865   Epoch: 9   Global Step: 101600   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:47:49,457-Speed 5398.88 samples/sec   Loss 5.7035   LearningRate 0.0865   Epoch: 9   Global Step: 101610   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:47:56,912-Speed 5494.87 samples/sec   Loss 5.6881   LearningRate 0.0865   Epoch: 9   Global Step: 101620   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:48:04,436-Speed 5444.56 samples/sec   Loss 5.7761   LearningRate 0.0864   Epoch: 9   Global Step: 101630   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:48:11,876-Speed 5506.13 samples/sec   Loss 5.6763   LearningRate 0.0864   Epoch: 9   Global Step: 101640   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:48:19,355-Speed 5477.88 samples/sec   Loss 5.7009   LearningRate 0.0864   Epoch: 9   Global Step: 101650   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:48:26,859-Speed 5458.75 samples/sec   Loss 5.6750   LearningRate 0.0864   Epoch: 9   Global Step: 101660   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:48:34,345-Speed 5471.91 samples/sec   Loss 5.7142   LearningRate 0.0864   Epoch: 9   Global Step: 101670   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:48:41,903-Speed 5420.31 samples/sec   Loss 5.7290   LearningRate 0.0864   Epoch: 9   Global Step: 101680   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:48:49,427-Speed 5444.65 samples/sec   Loss 5.7698   LearningRate 0.0863   Epoch: 9   Global Step: 101690   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:48:56,912-Speed 5473.27 samples/sec   Loss 5.7148   LearningRate 0.0863   Epoch: 9   Global Step: 101700   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:49:04,475-Speed 5416.24 samples/sec   Loss 5.7508   LearningRate 0.0863   Epoch: 9   Global Step: 101710   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:49:11,984-Speed 5455.57 samples/sec   Loss 5.6730   LearningRate 0.0863   Epoch: 9   Global Step: 101720   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:49:19,536-Speed 5424.17 samples/sec   Loss 5.7083   LearningRate 0.0863   Epoch: 9   Global Step: 101730   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:49:26,953-Speed 5522.90 samples/sec   Loss 5.7781   LearningRate 0.0863   Epoch: 9   Global Step: 101740   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:49:34,472-Speed 5448.14 samples/sec   Loss 5.6791   LearningRate 0.0862   Epoch: 9   Global Step: 101750   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:49:41,939-Speed 5486.33 samples/sec   Loss 5.7285   LearningRate 0.0862   Epoch: 9   Global Step: 101760   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:49:49,459-Speed 5447.93 samples/sec   Loss 5.6689   LearningRate 0.0862   Epoch: 9   Global Step: 101770   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:49:56,926-Speed 5486.33 samples/sec   Loss 5.7153   LearningRate 0.0862   Epoch: 9   Global Step: 101780   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:50:04,434-Speed 5456.08 samples/sec   Loss 5.7251   LearningRate 0.0862   Epoch: 9   Global Step: 101790   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:50:12,016-Speed 5402.89 samples/sec   Loss 5.6699   LearningRate 0.0862   Epoch: 9   Global Step: 101800   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:50:19,525-Speed 5455.94 samples/sec   Loss 5.7279   LearningRate 0.0861   Epoch: 9   Global Step: 101810   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:50:27,097-Speed 5409.75 samples/sec   Loss 5.6996   LearningRate 0.0861   Epoch: 9   Global Step: 101820   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:50:34,662-Speed 5414.63 samples/sec   Loss 5.7623   LearningRate 0.0861   Epoch: 9   Global Step: 101830   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:50:42,240-Speed 5406.48 samples/sec   Loss 5.7483   LearningRate 0.0861   Epoch: 9   Global Step: 101840   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:50:49,776-Speed 5435.92 samples/sec   Loss 5.6840   LearningRate 0.0861   Epoch: 9   Global Step: 101850   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:50:57,319-Speed 5430.87 samples/sec   Loss 5.6871   LearningRate 0.0861   Epoch: 9   Global Step: 101860   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:51:04,918-Speed 5390.69 samples/sec   Loss 5.7051   LearningRate 0.0860   Epoch: 9   Global Step: 101870   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 17:51:12,398-Speed 5476.18 samples/sec   Loss 5.6924   LearningRate 0.0860   Epoch: 9   Global Step: 101880   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:51:19,925-Speed 5443.31 samples/sec   Loss 5.7198   LearningRate 0.0860   Epoch: 9   Global Step: 101890   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:51:27,435-Speed 5454.15 samples/sec   Loss 5.6691   LearningRate 0.0860   Epoch: 9   Global Step: 101900   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:51:34,932-Speed 5464.33 samples/sec   Loss 5.6991   LearningRate 0.0860   Epoch: 9   Global Step: 101910   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:51:42,447-Speed 5450.60 samples/sec   Loss 5.6906   LearningRate 0.0860   Epoch: 9   Global Step: 101920   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:51:49,937-Speed 5469.79 samples/sec   Loss 5.6929   LearningRate 0.0859   Epoch: 9   Global Step: 101930   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:51:57,461-Speed 5445.24 samples/sec   Loss 5.6833   LearningRate 0.0859   Epoch: 9   Global Step: 101940   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:52:04,949-Speed 5469.97 samples/sec   Loss 5.7172   LearningRate 0.0859   Epoch: 9   Global Step: 101950   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:52:12,482-Speed 5437.92 samples/sec   Loss 5.6871   LearningRate 0.0859   Epoch: 9   Global Step: 101960   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:52:19,944-Speed 5490.27 samples/sec   Loss 5.7098   LearningRate 0.0859   Epoch: 9   Global Step: 101970   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:52:27,426-Speed 5475.50 samples/sec   Loss 5.7178   LearningRate 0.0859   Epoch: 9   Global Step: 101980   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:52:35,111-Speed 5330.41 samples/sec   Loss 5.7218   LearningRate 0.0858   Epoch: 9   Global Step: 101990   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:52:42,680-Speed 5412.20 samples/sec   Loss 5.7095   LearningRate 0.0858   Epoch: 9   Global Step: 102000   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:53:26,276-[lfw][102000]XNorm: 24.103609
Training: 2022-01-08 17:53:26,277-[lfw][102000]Accuracy-Flip: 0.99783+-0.00279
Training: 2022-01-08 17:53:26,277-[lfw][102000]Accuracy-Highest: 0.99817
Training: 2022-01-08 17:54:18,050-[cfp_fp][102000]XNorm: 22.032256
Training: 2022-01-08 17:54:18,051-[cfp_fp][102000]Accuracy-Flip: 0.98843+-0.00555
Training: 2022-01-08 17:54:18,052-[cfp_fp][102000]Accuracy-Highest: 0.98914
Training: 2022-01-08 17:55:03,544-[agedb_30][102000]XNorm: 24.000977
Training: 2022-01-08 17:55:03,545-[agedb_30][102000]Accuracy-Flip: 0.97917+-0.00696
Training: 2022-01-08 17:55:03,545-[agedb_30][102000]Accuracy-Highest: 0.97917
Training: 2022-01-08 17:55:11,111-Speed 275.96 samples/sec   Loss 5.7522   LearningRate 0.0858   Epoch: 9   Global Step: 102010   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:55:18,531-Speed 5521.65 samples/sec   Loss 5.6786   LearningRate 0.0858   Epoch: 9   Global Step: 102020   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:55:26,028-Speed 5465.01 samples/sec   Loss 5.6444   LearningRate 0.0858   Epoch: 9   Global Step: 102030   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:55:33,714-Speed 5330.40 samples/sec   Loss 5.6498   LearningRate 0.0858   Epoch: 9   Global Step: 102040   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:55:41,346-Speed 5368.27 samples/sec   Loss 5.6467   LearningRate 0.0858   Epoch: 9   Global Step: 102050   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:55:49,015-Speed 5342.53 samples/sec   Loss 5.6510   LearningRate 0.0857   Epoch: 9   Global Step: 102060   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:55:56,486-Speed 5483.61 samples/sec   Loss 5.6760   LearningRate 0.0857   Epoch: 9   Global Step: 102070   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:56:03,956-Speed 5484.20 samples/sec   Loss 5.6673   LearningRate 0.0857   Epoch: 9   Global Step: 102080   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:56:11,503-Speed 5429.17 samples/sec   Loss 5.6490   LearningRate 0.0857   Epoch: 9   Global Step: 102090   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:56:19,056-Speed 5423.75 samples/sec   Loss 5.6935   LearningRate 0.0857   Epoch: 9   Global Step: 102100   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:56:26,714-Speed 5350.18 samples/sec   Loss 5.7478   LearningRate 0.0857   Epoch: 9   Global Step: 102110   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:56:34,284-Speed 5411.94 samples/sec   Loss 5.6988   LearningRate 0.0856   Epoch: 9   Global Step: 102120   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:56:41,860-Speed 5407.84 samples/sec   Loss 5.7294   LearningRate 0.0856   Epoch: 9   Global Step: 102130   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:56:49,475-Speed 5379.75 samples/sec   Loss 5.7020   LearningRate 0.0856   Epoch: 9   Global Step: 102140   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:56:56,941-Speed 5487.15 samples/sec   Loss 5.7080   LearningRate 0.0856   Epoch: 9   Global Step: 102150   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:57:04,430-Speed 5470.23 samples/sec   Loss 5.6420   LearningRate 0.0856   Epoch: 9   Global Step: 102160   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:57:11,868-Speed 5507.55 samples/sec   Loss 5.7484   LearningRate 0.0856   Epoch: 9   Global Step: 102170   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:57:19,356-Speed 5470.57 samples/sec   Loss 5.7124   LearningRate 0.0855   Epoch: 9   Global Step: 102180   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:57:26,873-Speed 5449.35 samples/sec   Loss 5.7272   LearningRate 0.0855   Epoch: 9   Global Step: 102190   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:57:34,329-Speed 5494.16 samples/sec   Loss 5.7271   LearningRate 0.0855   Epoch: 9   Global Step: 102200   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:57:41,784-Speed 5495.13 samples/sec   Loss 5.6731   LearningRate 0.0855   Epoch: 9   Global Step: 102210   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:57:49,320-Speed 5436.64 samples/sec   Loss 5.6646   LearningRate 0.0855   Epoch: 9   Global Step: 102220   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:57:56,900-Speed 5403.83 samples/sec   Loss 5.7085   LearningRate 0.0855   Epoch: 9   Global Step: 102230   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:58:04,397-Speed 5464.55 samples/sec   Loss 5.6535   LearningRate 0.0854   Epoch: 9   Global Step: 102240   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:58:11,876-Speed 5477.20 samples/sec   Loss 5.6595   LearningRate 0.0854   Epoch: 9   Global Step: 102250   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 17:58:19,383-Speed 5457.11 samples/sec   Loss 5.6643   LearningRate 0.0854   Epoch: 9   Global Step: 102260   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:58:26,834-Speed 5498.05 samples/sec   Loss 5.6957   LearningRate 0.0854   Epoch: 9   Global Step: 102270   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:58:34,263-Speed 5513.85 samples/sec   Loss 5.7024   LearningRate 0.0854   Epoch: 9   Global Step: 102280   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:58:41,755-Speed 5468.28 samples/sec   Loss 5.6987   LearningRate 0.0854   Epoch: 9   Global Step: 102290   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:58:49,187-Speed 5511.69 samples/sec   Loss 5.6870   LearningRate 0.0853   Epoch: 9   Global Step: 102300   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:58:56,638-Speed 5498.15 samples/sec   Loss 5.6716   LearningRate 0.0853   Epoch: 9   Global Step: 102310   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:59:04,135-Speed 5464.07 samples/sec   Loss 5.7614   LearningRate 0.0853   Epoch: 9   Global Step: 102320   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:59:11,572-Speed 5508.65 samples/sec   Loss 5.6735   LearningRate 0.0853   Epoch: 9   Global Step: 102330   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:59:19,098-Speed 5442.75 samples/sec   Loss 5.6487   LearningRate 0.0853   Epoch: 9   Global Step: 102340   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:59:26,651-Speed 5423.93 samples/sec   Loss 5.6544   LearningRate 0.0853   Epoch: 9   Global Step: 102350   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:59:34,078-Speed 5516.18 samples/sec   Loss 5.7238   LearningRate 0.0852   Epoch: 9   Global Step: 102360   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:59:41,616-Speed 5434.40 samples/sec   Loss 5.7454   LearningRate 0.0852   Epoch: 9   Global Step: 102370   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:59:49,042-Speed 5516.98 samples/sec   Loss 5.6866   LearningRate 0.0852   Epoch: 9   Global Step: 102380   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 17:59:56,533-Speed 5468.17 samples/sec   Loss 5.6761   LearningRate 0.0852   Epoch: 9   Global Step: 102390   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:00:04,037-Speed 5459.03 samples/sec   Loss 5.6651   LearningRate 0.0852   Epoch: 9   Global Step: 102400   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:00:11,575-Speed 5434.92 samples/sec   Loss 5.6831   LearningRate 0.0852   Epoch: 9   Global Step: 102410   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:00:19,098-Speed 5445.87 samples/sec   Loss 5.6723   LearningRate 0.0852   Epoch: 9   Global Step: 102420   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:00:26,782-Speed 5331.20 samples/sec   Loss 5.6937   LearningRate 0.0851   Epoch: 9   Global Step: 102430   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:00:34,281-Speed 5462.64 samples/sec   Loss 5.6690   LearningRate 0.0851   Epoch: 9   Global Step: 102440   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:00:41,820-Speed 5433.64 samples/sec   Loss 5.6272   LearningRate 0.0851   Epoch: 9   Global Step: 102450   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:00:49,257-Speed 5508.63 samples/sec   Loss 5.6425   LearningRate 0.0851   Epoch: 9   Global Step: 102460   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:00:56,704-Speed 5501.09 samples/sec   Loss 5.6892   LearningRate 0.0851   Epoch: 9   Global Step: 102470   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:01:04,185-Speed 5475.79 samples/sec   Loss 5.6646   LearningRate 0.0851   Epoch: 9   Global Step: 102480   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:01:11,673-Speed 5470.66 samples/sec   Loss 5.6864   LearningRate 0.0850   Epoch: 9   Global Step: 102490   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:01:19,186-Speed 5452.74 samples/sec   Loss 5.7045   LearningRate 0.0850   Epoch: 9   Global Step: 102500   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:01:26,754-Speed 5412.67 samples/sec   Loss 5.6240   LearningRate 0.0850   Epoch: 9   Global Step: 102510   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:01:34,267-Speed 5452.93 samples/sec   Loss 5.6552   LearningRate 0.0850   Epoch: 9   Global Step: 102520   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:01:41,867-Speed 5390.36 samples/sec   Loss 5.6484   LearningRate 0.0850   Epoch: 9   Global Step: 102530   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:01:49,383-Speed 5450.38 samples/sec   Loss 5.6957   LearningRate 0.0850   Epoch: 9   Global Step: 102540   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:01:56,873-Speed 5469.54 samples/sec   Loss 5.6569   LearningRate 0.0849   Epoch: 9   Global Step: 102550   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:02:04,459-Speed 5399.84 samples/sec   Loss 5.6547   LearningRate 0.0849   Epoch: 9   Global Step: 102560   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:02:11,942-Speed 5474.58 samples/sec   Loss 5.7073   LearningRate 0.0849   Epoch: 9   Global Step: 102570   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:02:19,372-Speed 5513.35 samples/sec   Loss 5.6542   LearningRate 0.0849   Epoch: 9   Global Step: 102580   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:02:26,886-Speed 5452.16 samples/sec   Loss 5.6771   LearningRate 0.0849   Epoch: 9   Global Step: 102590   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:02:34,438-Speed 5424.58 samples/sec   Loss 5.6152   LearningRate 0.0849   Epoch: 9   Global Step: 102600   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:02:44,618-Speed 4024.03 samples/sec   Loss 5.6807   LearningRate 0.0848   Epoch: 9   Global Step: 102610   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:02:52,108-Speed 5469.12 samples/sec   Loss 5.6445   LearningRate 0.0848   Epoch: 9   Global Step: 102620   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:02:59,619-Speed 5454.38 samples/sec   Loss 5.6729   LearningRate 0.0848   Epoch: 9   Global Step: 102630   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:03:07,114-Speed 5464.95 samples/sec   Loss 5.6096   LearningRate 0.0848   Epoch: 9   Global Step: 102640   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:03:14,667-Speed 5424.18 samples/sec   Loss 5.6540   LearningRate 0.0848   Epoch: 9   Global Step: 102650   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:03:22,252-Speed 5400.97 samples/sec   Loss 5.6687   LearningRate 0.0848   Epoch: 9   Global Step: 102660   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:03:29,863-Speed 5382.14 samples/sec   Loss 5.6943   LearningRate 0.0847   Epoch: 9   Global Step: 102670   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:03:37,331-Speed 5485.43 samples/sec   Loss 5.6968   LearningRate 0.0847   Epoch: 9   Global Step: 102680   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:03:44,804-Speed 5481.24 samples/sec   Loss 5.6851   LearningRate 0.0847   Epoch: 9   Global Step: 102690   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:03:52,343-Speed 5434.23 samples/sec   Loss 5.6804   LearningRate 0.0847   Epoch: 9   Global Step: 102700   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:03:59,874-Speed 5439.84 samples/sec   Loss 5.6837   LearningRate 0.0847   Epoch: 9   Global Step: 102710   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:04:07,439-Speed 5414.52 samples/sec   Loss 5.6949   LearningRate 0.0847   Epoch: 9   Global Step: 102720   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:04:14,899-Speed 5491.10 samples/sec   Loss 5.6358   LearningRate 0.0846   Epoch: 9   Global Step: 102730   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:04:22,394-Speed 5466.11 samples/sec   Loss 5.7054   LearningRate 0.0846   Epoch: 9   Global Step: 102740   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:04:29,864-Speed 5483.94 samples/sec   Loss 5.6857   LearningRate 0.0846   Epoch: 9   Global Step: 102750   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:04:37,344-Speed 5476.35 samples/sec   Loss 5.6354   LearningRate 0.0846   Epoch: 9   Global Step: 102760   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:04:44,812-Speed 5485.47 samples/sec   Loss 5.6796   LearningRate 0.0846   Epoch: 9   Global Step: 102770   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:04:52,289-Speed 5479.05 samples/sec   Loss 5.6391   LearningRate 0.0846   Epoch: 9   Global Step: 102780   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:04:59,743-Speed 5496.20 samples/sec   Loss 5.6108   LearningRate 0.0846   Epoch: 9   Global Step: 102790   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:05:07,270-Speed 5442.34 samples/sec   Loss 5.6510   LearningRate 0.0845   Epoch: 9   Global Step: 102800   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:05:14,731-Speed 5490.55 samples/sec   Loss 5.6588   LearningRate 0.0845   Epoch: 9   Global Step: 102810   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:05:22,215-Speed 5474.20 samples/sec   Loss 5.6370   LearningRate 0.0845   Epoch: 9   Global Step: 102820   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:05:29,785-Speed 5411.31 samples/sec   Loss 5.5982   LearningRate 0.0845   Epoch: 9   Global Step: 102830   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:05:37,303-Speed 5449.07 samples/sec   Loss 5.6151   LearningRate 0.0845   Epoch: 9   Global Step: 102840   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:05:44,736-Speed 5511.16 samples/sec   Loss 5.6696   LearningRate 0.0845   Epoch: 9   Global Step: 102850   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:05:52,279-Speed 5431.04 samples/sec   Loss 5.6185   LearningRate 0.0844   Epoch: 9   Global Step: 102860   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:05:59,776-Speed 5464.00 samples/sec   Loss 5.6548   LearningRate 0.0844   Epoch: 9   Global Step: 102870   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:06:07,283-Speed 5456.69 samples/sec   Loss 5.6607   LearningRate 0.0844   Epoch: 9   Global Step: 102880   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:06:14,746-Speed 5489.54 samples/sec   Loss 5.6702   LearningRate 0.0844   Epoch: 9   Global Step: 102890   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:06:22,237-Speed 5468.27 samples/sec   Loss 5.6704   LearningRate 0.0844   Epoch: 9   Global Step: 102900   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:06:29,701-Speed 5488.86 samples/sec   Loss 5.6638   LearningRate 0.0844   Epoch: 9   Global Step: 102910   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:06:37,153-Speed 5497.42 samples/sec   Loss 5.6181   LearningRate 0.0843   Epoch: 9   Global Step: 102920   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:06:44,716-Speed 5416.38 samples/sec   Loss 5.6114   LearningRate 0.0843   Epoch: 9   Global Step: 102930   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:06:52,245-Speed 5440.65 samples/sec   Loss 5.6460   LearningRate 0.0843   Epoch: 9   Global Step: 102940   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:06:59,842-Speed 5392.62 samples/sec   Loss 5.6444   LearningRate 0.0843   Epoch: 9   Global Step: 102950   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:07:07,352-Speed 5454.56 samples/sec   Loss 5.6855   LearningRate 0.0843   Epoch: 9   Global Step: 102960   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:07:14,877-Speed 5444.55 samples/sec   Loss 5.6507   LearningRate 0.0843   Epoch: 9   Global Step: 102970   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:07:22,381-Speed 5458.75 samples/sec   Loss 5.5752   LearningRate 0.0842   Epoch: 9   Global Step: 102980   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:07:29,926-Speed 5429.50 samples/sec   Loss 5.7149   LearningRate 0.0842   Epoch: 9   Global Step: 102990   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:07:37,530-Speed 5387.84 samples/sec   Loss 5.6278   LearningRate 0.0842   Epoch: 9   Global Step: 103000   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:07:45,380-Speed 5218.01 samples/sec   Loss 5.6564   LearningRate 0.0842   Epoch: 9   Global Step: 103010   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:07:52,924-Speed 5430.10 samples/sec   Loss 5.6771   LearningRate 0.0842   Epoch: 9   Global Step: 103020   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:08:00,515-Speed 5396.93 samples/sec   Loss 5.6344   LearningRate 0.0842   Epoch: 9   Global Step: 103030   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:08:07,970-Speed 5495.09 samples/sec   Loss 5.6440   LearningRate 0.0841   Epoch: 9   Global Step: 103040   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:08:15,505-Speed 5436.48 samples/sec   Loss 5.6250   LearningRate 0.0841   Epoch: 9   Global Step: 103050   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:08:22,949-Speed 5503.36 samples/sec   Loss 5.6701   LearningRate 0.0841   Epoch: 9   Global Step: 103060   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:08:30,447-Speed 5463.84 samples/sec   Loss 5.6479   LearningRate 0.0841   Epoch: 9   Global Step: 103070   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:08:37,983-Speed 5435.52 samples/sec   Loss 5.6079   LearningRate 0.0841   Epoch: 9   Global Step: 103080   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 18:08:45,530-Speed 5428.14 samples/sec   Loss 5.6394   LearningRate 0.0841   Epoch: 9   Global Step: 103090   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:08:53,225-Speed 5323.91 samples/sec   Loss 5.6251   LearningRate 0.0841   Epoch: 9   Global Step: 103100   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:09:00,889-Speed 5345.53 samples/sec   Loss 5.5571   LearningRate 0.0840   Epoch: 9   Global Step: 103110   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:09:08,422-Speed 5438.12 samples/sec   Loss 5.5914   LearningRate 0.0840   Epoch: 9   Global Step: 103120   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:09:15,947-Speed 5443.24 samples/sec   Loss 5.5792   LearningRate 0.0840   Epoch: 9   Global Step: 103130   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:09:23,422-Speed 5480.36 samples/sec   Loss 5.6150   LearningRate 0.0840   Epoch: 9   Global Step: 103140   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:09:30,918-Speed 5465.92 samples/sec   Loss 5.5765   LearningRate 0.0840   Epoch: 9   Global Step: 103150   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:09:38,387-Speed 5484.16 samples/sec   Loss 5.6111   LearningRate 0.0840   Epoch: 9   Global Step: 103160   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:09:45,837-Speed 5498.87 samples/sec   Loss 5.5847   LearningRate 0.0839   Epoch: 9   Global Step: 103170   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:09:53,387-Speed 5425.90 samples/sec   Loss 5.6061   LearningRate 0.0839   Epoch: 9   Global Step: 103180   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:10:00,859-Speed 5483.08 samples/sec   Loss 5.6746   LearningRate 0.0839   Epoch: 9   Global Step: 103190   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:10:08,379-Speed 5446.72 samples/sec   Loss 5.5864   LearningRate 0.0839   Epoch: 9   Global Step: 103200   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:10:15,854-Speed 5480.66 samples/sec   Loss 5.6502   LearningRate 0.0839   Epoch: 9   Global Step: 103210   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:10:23,340-Speed 5472.41 samples/sec   Loss 5.6257   LearningRate 0.0839   Epoch: 9   Global Step: 103220   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:10:30,949-Speed 5383.82 samples/sec   Loss 5.6793   LearningRate 0.0838   Epoch: 9   Global Step: 103230   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:10:38,488-Speed 5433.93 samples/sec   Loss 5.6736   LearningRate 0.0838   Epoch: 9   Global Step: 103240   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:10:45,967-Speed 5477.10 samples/sec   Loss 5.6382   LearningRate 0.0838   Epoch: 9   Global Step: 103250   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:10:53,436-Speed 5485.06 samples/sec   Loss 5.6396   LearningRate 0.0838   Epoch: 9   Global Step: 103260   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:11:01,053-Speed 5378.14 samples/sec   Loss 5.7037   LearningRate 0.0838   Epoch: 9   Global Step: 103270   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:11:08,644-Speed 5396.93 samples/sec   Loss 5.6258   LearningRate 0.0838   Epoch: 9   Global Step: 103280   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:11:16,215-Speed 5410.31 samples/sec   Loss 5.6692   LearningRate 0.0837   Epoch: 9   Global Step: 103290   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:11:23,778-Speed 5416.72 samples/sec   Loss 5.6434   LearningRate 0.0837   Epoch: 9   Global Step: 103300   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:11:31,284-Speed 5457.52 samples/sec   Loss 5.5981   LearningRate 0.0837   Epoch: 9   Global Step: 103310   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:11:38,742-Speed 5493.34 samples/sec   Loss 5.6176   LearningRate 0.0837   Epoch: 9   Global Step: 103320   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:11:46,231-Speed 5469.62 samples/sec   Loss 5.5807   LearningRate 0.0837   Epoch: 9   Global Step: 103330   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:11:53,732-Speed 5460.99 samples/sec   Loss 5.6416   LearningRate 0.0837   Epoch: 9   Global Step: 103340   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:12:01,204-Speed 5483.07 samples/sec   Loss 5.5994   LearningRate 0.0836   Epoch: 9   Global Step: 103350   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:12:08,860-Speed 5351.10 samples/sec   Loss 5.6299   LearningRate 0.0836   Epoch: 9   Global Step: 103360   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:12:16,354-Speed 5465.73 samples/sec   Loss 5.7116   LearningRate 0.0836   Epoch: 9   Global Step: 103370   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:12:24,031-Speed 5335.93 samples/sec   Loss 5.6290   LearningRate 0.0836   Epoch: 9   Global Step: 103380   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:12:31,708-Speed 5336.44 samples/sec   Loss 5.6502   LearningRate 0.0836   Epoch: 9   Global Step: 103390   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:12:39,444-Speed 5295.55 samples/sec   Loss 5.6518   LearningRate 0.0836   Epoch: 9   Global Step: 103400   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:12:46,942-Speed 5463.17 samples/sec   Loss 5.6181   LearningRate 0.0836   Epoch: 9   Global Step: 103410   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:12:54,688-Speed 5288.82 samples/sec   Loss 5.6528   LearningRate 0.0835   Epoch: 9   Global Step: 103420   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:13:02,172-Speed 5473.89 samples/sec   Loss 5.6303   LearningRate 0.0835   Epoch: 9   Global Step: 103430   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:13:09,706-Speed 5437.66 samples/sec   Loss 5.6127   LearningRate 0.0835   Epoch: 9   Global Step: 103440   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:13:19,553-Speed 4159.70 samples/sec   Loss 5.5219   LearningRate 0.0835   Epoch: 9   Global Step: 103450   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:13:27,084-Speed 5439.73 samples/sec   Loss 5.5856   LearningRate 0.0835   Epoch: 9   Global Step: 103460   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:13:34,603-Speed 5448.33 samples/sec   Loss 5.6498   LearningRate 0.0835   Epoch: 9   Global Step: 103470   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:13:42,033-Speed 5513.36 samples/sec   Loss 5.6369   LearningRate 0.0834   Epoch: 9   Global Step: 103480   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:13:49,689-Speed 5350.88 samples/sec   Loss 5.6055   LearningRate 0.0834   Epoch: 9   Global Step: 103490   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:13:57,283-Speed 5394.98 samples/sec   Loss 5.6486   LearningRate 0.0834   Epoch: 9   Global Step: 103500   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:14:04,771-Speed 5471.02 samples/sec   Loss 5.6213   LearningRate 0.0834   Epoch: 9   Global Step: 103510   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:14:12,373-Speed 5388.36 samples/sec   Loss 5.5850   LearningRate 0.0834   Epoch: 9   Global Step: 103520   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:14:19,828-Speed 5495.25 samples/sec   Loss 5.5870   LearningRate 0.0834   Epoch: 9   Global Step: 103530   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:14:27,341-Speed 5452.88 samples/sec   Loss 5.6132   LearningRate 0.0833   Epoch: 9   Global Step: 103540   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:14:34,883-Speed 5431.40 samples/sec   Loss 5.6921   LearningRate 0.0833   Epoch: 9   Global Step: 103550   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:14:42,427-Speed 5430.64 samples/sec   Loss 5.6337   LearningRate 0.0833   Epoch: 9   Global Step: 103560   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:14:49,871-Speed 5503.16 samples/sec   Loss 5.5689   LearningRate 0.0833   Epoch: 9   Global Step: 103570   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:14:57,321-Speed 5498.31 samples/sec   Loss 5.5844   LearningRate 0.0833   Epoch: 9   Global Step: 103580   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:15:04,996-Speed 5337.89 samples/sec   Loss 5.5964   LearningRate 0.0833   Epoch: 9   Global Step: 103590   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:15:12,558-Speed 5417.30 samples/sec   Loss 5.6139   LearningRate 0.0832   Epoch: 9   Global Step: 103600   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:15:20,114-Speed 5421.36 samples/sec   Loss 5.6334   LearningRate 0.0832   Epoch: 9   Global Step: 103610   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:15:27,579-Speed 5487.83 samples/sec   Loss 5.5316   LearningRate 0.0832   Epoch: 9   Global Step: 103620   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:15:35,137-Speed 5420.75 samples/sec   Loss 5.5781   LearningRate 0.0832   Epoch: 9   Global Step: 103630   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:15:42,607-Speed 5483.50 samples/sec   Loss 5.6514   LearningRate 0.0832   Epoch: 9   Global Step: 103640   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:15:50,141-Speed 5437.22 samples/sec   Loss 5.6320   LearningRate 0.0832   Epoch: 9   Global Step: 103650   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:15:57,778-Speed 5364.41 samples/sec   Loss 5.5927   LearningRate 0.0832   Epoch: 9   Global Step: 103660   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:16:05,347-Speed 5412.32 samples/sec   Loss 5.5904   LearningRate 0.0831   Epoch: 9   Global Step: 103670   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:16:12,869-Speed 5445.99 samples/sec   Loss 5.7069   LearningRate 0.0831   Epoch: 9   Global Step: 103680   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:16:20,280-Speed 5527.60 samples/sec   Loss 5.6682   LearningRate 0.0831   Epoch: 9   Global Step: 103690   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:16:42,945-Speed 1807.30 samples/sec   Loss 5.5986   LearningRate 0.0831   Epoch: 10   Global Step: 103700   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:16:50,382-Speed 5508.03 samples/sec   Loss 5.6259   LearningRate 0.0831   Epoch: 10   Global Step: 103710   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:16:57,866-Speed 5474.37 samples/sec   Loss 5.5833   LearningRate 0.0831   Epoch: 10   Global Step: 103720   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:17:05,358-Speed 5467.64 samples/sec   Loss 5.6207   LearningRate 0.0830   Epoch: 10   Global Step: 103730   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:17:12,762-Speed 5532.85 samples/sec   Loss 5.6011   LearningRate 0.0830   Epoch: 10   Global Step: 103740   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:17:20,166-Speed 5532.48 samples/sec   Loss 5.6077   LearningRate 0.0830   Epoch: 10   Global Step: 103750   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:17:27,759-Speed 5396.07 samples/sec   Loss 5.6526   LearningRate 0.0830   Epoch: 10   Global Step: 103760   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:17:35,196-Speed 5508.50 samples/sec   Loss 5.5429   LearningRate 0.0830   Epoch: 10   Global Step: 103770   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:17:42,617-Speed 5520.05 samples/sec   Loss 5.5894   LearningRate 0.0830   Epoch: 10   Global Step: 103780   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:17:50,022-Speed 5531.81 samples/sec   Loss 5.5818   LearningRate 0.0829   Epoch: 10   Global Step: 103790   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:17:57,444-Speed 5519.83 samples/sec   Loss 5.6205   LearningRate 0.0829   Epoch: 10   Global Step: 103800   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:18:04,853-Speed 5529.31 samples/sec   Loss 5.6116   LearningRate 0.0829   Epoch: 10   Global Step: 103810   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:18:12,253-Speed 5535.96 samples/sec   Loss 5.6013   LearningRate 0.0829   Epoch: 10   Global Step: 103820   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:18:19,788-Speed 5436.42 samples/sec   Loss 5.5969   LearningRate 0.0829   Epoch: 10   Global Step: 103830   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:18:27,277-Speed 5469.83 samples/sec   Loss 5.5861   LearningRate 0.0829   Epoch: 10   Global Step: 103840   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:18:34,674-Speed 5541.23 samples/sec   Loss 5.5836   LearningRate 0.0828   Epoch: 10   Global Step: 103850   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:18:42,282-Speed 5384.50 samples/sec   Loss 5.5996   LearningRate 0.0828   Epoch: 10   Global Step: 103860   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:18:49,836-Speed 5422.80 samples/sec   Loss 5.4872   LearningRate 0.0828   Epoch: 10   Global Step: 103870   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:18:57,470-Speed 5366.51 samples/sec   Loss 5.5840   LearningRate 0.0828   Epoch: 10   Global Step: 103880   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:19:05,066-Speed 5392.81 samples/sec   Loss 5.5559   LearningRate 0.0828   Epoch: 10   Global Step: 103890   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:19:12,727-Speed 5347.42 samples/sec   Loss 5.5640   LearningRate 0.0828   Epoch: 10   Global Step: 103900   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:19:20,319-Speed 5395.48 samples/sec   Loss 5.5460   LearningRate 0.0828   Epoch: 10   Global Step: 103910   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:19:27,910-Speed 5396.77 samples/sec   Loss 5.5626   LearningRate 0.0827   Epoch: 10   Global Step: 103920   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:19:35,542-Speed 5367.96 samples/sec   Loss 5.6298   LearningRate 0.0827   Epoch: 10   Global Step: 103930   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:19:43,135-Speed 5394.61 samples/sec   Loss 5.5947   LearningRate 0.0827   Epoch: 10   Global Step: 103940   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:19:50,726-Speed 5396.74 samples/sec   Loss 5.5812   LearningRate 0.0827   Epoch: 10   Global Step: 103950   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:19:58,430-Speed 5317.37 samples/sec   Loss 5.5849   LearningRate 0.0827   Epoch: 10   Global Step: 103960   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:20:06,240-Speed 5245.06 samples/sec   Loss 5.5689   LearningRate 0.0827   Epoch: 10   Global Step: 103970   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:20:13,821-Speed 5403.90 samples/sec   Loss 5.6072   LearningRate 0.0826   Epoch: 10   Global Step: 103980   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:20:21,416-Speed 5393.71 samples/sec   Loss 5.5898   LearningRate 0.0826   Epoch: 10   Global Step: 103990   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:20:29,022-Speed 5386.11 samples/sec   Loss 5.5938   LearningRate 0.0826   Epoch: 10   Global Step: 104000   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:21:13,147-[lfw][104000]XNorm: 22.918996
Training: 2022-01-08 18:21:13,147-[lfw][104000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-01-08 18:21:13,148-[lfw][104000]Accuracy-Highest: 0.99817
Training: 2022-01-08 18:22:04,797-[cfp_fp][104000]XNorm: 20.986479
Training: 2022-01-08 18:22:04,799-[cfp_fp][104000]Accuracy-Flip: 0.98914+-0.00444
Training: 2022-01-08 18:22:04,799-[cfp_fp][104000]Accuracy-Highest: 0.98914
Training: 2022-01-08 18:22:50,294-[agedb_30][104000]XNorm: 22.826693
Training: 2022-01-08 18:22:50,295-[agedb_30][104000]Accuracy-Flip: 0.97650+-0.00555
Training: 2022-01-08 18:22:50,296-[agedb_30][104000]Accuracy-Highest: 0.97917
Training: 2022-01-08 18:22:57,920-Speed 275.09 samples/sec   Loss 5.5904   LearningRate 0.0826   Epoch: 10   Global Step: 104010   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:23:05,557-Speed 5364.75 samples/sec   Loss 5.5722   LearningRate 0.0826   Epoch: 10   Global Step: 104020   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:23:13,075-Speed 5449.96 samples/sec   Loss 5.5740   LearningRate 0.0826   Epoch: 10   Global Step: 104030   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:23:20,515-Speed 5506.95 samples/sec   Loss 5.5842   LearningRate 0.0825   Epoch: 10   Global Step: 104040   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:23:27,952-Speed 5509.27 samples/sec   Loss 5.5599   LearningRate 0.0825   Epoch: 10   Global Step: 104050   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:23:35,322-Speed 5559.43 samples/sec   Loss 5.5466   LearningRate 0.0825   Epoch: 10   Global Step: 104060   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:23:42,768-Speed 5502.14 samples/sec   Loss 5.5919   LearningRate 0.0825   Epoch: 10   Global Step: 104070   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:23:50,223-Speed 5495.37 samples/sec   Loss 5.5543   LearningRate 0.0825   Epoch: 10   Global Step: 104080   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:23:57,724-Speed 5462.18 samples/sec   Loss 5.5938   LearningRate 0.0825   Epoch: 10   Global Step: 104090   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:24:05,160-Speed 5509.42 samples/sec   Loss 5.6096   LearningRate 0.0824   Epoch: 10   Global Step: 104100   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:24:12,675-Speed 5451.30 samples/sec   Loss 5.5818   LearningRate 0.0824   Epoch: 10   Global Step: 104110   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:24:20,148-Speed 5482.78 samples/sec   Loss 5.6027   LearningRate 0.0824   Epoch: 10   Global Step: 104120   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:24:27,660-Speed 5453.45 samples/sec   Loss 5.5817   LearningRate 0.0824   Epoch: 10   Global Step: 104130   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:24:35,139-Speed 5478.43 samples/sec   Loss 5.5842   LearningRate 0.0824   Epoch: 10   Global Step: 104140   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:24:42,547-Speed 5530.92 samples/sec   Loss 5.5287   LearningRate 0.0824   Epoch: 10   Global Step: 104150   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:24:50,099-Speed 5424.37 samples/sec   Loss 5.5240   LearningRate 0.0824   Epoch: 10   Global Step: 104160   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:24:57,591-Speed 5468.72 samples/sec   Loss 5.5917   LearningRate 0.0823   Epoch: 10   Global Step: 104170   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:25:05,142-Speed 5425.47 samples/sec   Loss 5.6116   LearningRate 0.0823   Epoch: 10   Global Step: 104180   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:25:12,272-Speed 5746.45 samples/sec   Loss 5.6354   LearningRate 0.0823   Epoch: 10   Global Step: 104190   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:25:19,917-Speed 5359.26 samples/sec   Loss 5.5380   LearningRate 0.0823   Epoch: 10   Global Step: 104200   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:25:27,426-Speed 5456.48 samples/sec   Loss 5.5834   LearningRate 0.0823   Epoch: 10   Global Step: 104210   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:25:35,077-Speed 5354.44 samples/sec   Loss 5.5826   LearningRate 0.0823   Epoch: 10   Global Step: 104220   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:25:42,529-Speed 5497.84 samples/sec   Loss 5.6405   LearningRate 0.0822   Epoch: 10   Global Step: 104230   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:25:50,141-Speed 5382.05 samples/sec   Loss 5.5621   LearningRate 0.0822   Epoch: 10   Global Step: 104240   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:25:57,481-Speed 5581.35 samples/sec   Loss 5.5544   LearningRate 0.0822   Epoch: 10   Global Step: 104250   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:26:04,542-Speed 5802.31 samples/sec   Loss 5.5538   LearningRate 0.0822   Epoch: 10   Global Step: 104260   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:26:11,988-Speed 5501.59 samples/sec   Loss 5.5332   LearningRate 0.0822   Epoch: 10   Global Step: 104270   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:26:19,484-Speed 5465.93 samples/sec   Loss 5.5712   LearningRate 0.0822   Epoch: 10   Global Step: 104280   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:26:27,074-Speed 5397.48 samples/sec   Loss 5.5468   LearningRate 0.0821   Epoch: 10   Global Step: 104290   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:26:34,430-Speed 5569.73 samples/sec   Loss 5.5796   LearningRate 0.0821   Epoch: 10   Global Step: 104300   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:26:41,708-Speed 5628.93 samples/sec   Loss 5.5689   LearningRate 0.0821   Epoch: 10   Global Step: 104310   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:26:49,309-Speed 5390.25 samples/sec   Loss 5.5484   LearningRate 0.0821   Epoch: 10   Global Step: 104320   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:26:56,430-Speed 5753.06 samples/sec   Loss 5.6046   LearningRate 0.0821   Epoch: 10   Global Step: 104330   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:27:03,479-Speed 5811.94 samples/sec   Loss 5.5522   LearningRate 0.0821   Epoch: 10   Global Step: 104340   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:27:10,633-Speed 5726.99 samples/sec   Loss 5.5803   LearningRate 0.0820   Epoch: 10   Global Step: 104350   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:27:18,064-Speed 5513.66 samples/sec   Loss 5.5177   LearningRate 0.0820   Epoch: 10   Global Step: 104360   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:27:25,639-Speed 5407.87 samples/sec   Loss 5.5919   LearningRate 0.0820   Epoch: 10   Global Step: 104370   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:27:33,154-Speed 5452.21 samples/sec   Loss 5.5449   LearningRate 0.0820   Epoch: 10   Global Step: 104380   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:27:40,341-Speed 5700.72 samples/sec   Loss 5.5532   LearningRate 0.0820   Epoch: 10   Global Step: 104390   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:27:47,540-Speed 5690.29 samples/sec   Loss 5.5332   LearningRate 0.0820   Epoch: 10   Global Step: 104400   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:27:55,073-Speed 5439.39 samples/sec   Loss 5.5747   LearningRate 0.0820   Epoch: 10   Global Step: 104410   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:28:02,502-Speed 5514.31 samples/sec   Loss 5.5677   LearningRate 0.0819   Epoch: 10   Global Step: 104420   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:28:10,135-Speed 5367.73 samples/sec   Loss 5.5528   LearningRate 0.0819   Epoch: 10   Global Step: 104430   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:28:17,618-Speed 5474.82 samples/sec   Loss 5.5513   LearningRate 0.0819   Epoch: 10   Global Step: 104440   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:28:25,066-Speed 5500.13 samples/sec   Loss 5.5839   LearningRate 0.0819   Epoch: 10   Global Step: 104450   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:28:32,586-Speed 5448.35 samples/sec   Loss 5.5399   LearningRate 0.0819   Epoch: 10   Global Step: 104460   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:28:40,139-Speed 5424.10 samples/sec   Loss 5.5776   LearningRate 0.0819   Epoch: 10   Global Step: 104470   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:28:47,620-Speed 5476.44 samples/sec   Loss 5.5694   LearningRate 0.0818   Epoch: 10   Global Step: 104480   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:28:54,922-Speed 5610.68 samples/sec   Loss 5.5880   LearningRate 0.0818   Epoch: 10   Global Step: 104490   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:29:02,229-Speed 5606.66 samples/sec   Loss 5.5900   LearningRate 0.0818   Epoch: 10   Global Step: 104500   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:29:09,727-Speed 5463.71 samples/sec   Loss 5.5987   LearningRate 0.0818   Epoch: 10   Global Step: 104510   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:29:17,098-Speed 5558.81 samples/sec   Loss 5.5352   LearningRate 0.0818   Epoch: 10   Global Step: 104520   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:29:24,696-Speed 5392.15 samples/sec   Loss 5.5280   LearningRate 0.0818   Epoch: 10   Global Step: 104530   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:29:32,027-Speed 5588.35 samples/sec   Loss 5.4868   LearningRate 0.0817   Epoch: 10   Global Step: 104540   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:29:39,532-Speed 5459.05 samples/sec   Loss 5.5622   LearningRate 0.0817   Epoch: 10   Global Step: 104550   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:29:47,126-Speed 5394.64 samples/sec   Loss 5.5682   LearningRate 0.0817   Epoch: 10   Global Step: 104560   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:29:54,564-Speed 5508.47 samples/sec   Loss 5.5528   LearningRate 0.0817   Epoch: 10   Global Step: 104570   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:30:02,096-Speed 5438.97 samples/sec   Loss 5.5680   LearningRate 0.0817   Epoch: 10   Global Step: 104580   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:30:09,608-Speed 5454.41 samples/sec   Loss 5.5925   LearningRate 0.0817   Epoch: 10   Global Step: 104590   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:30:17,049-Speed 5505.78 samples/sec   Loss 5.5690   LearningRate 0.0817   Epoch: 10   Global Step: 104600   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:30:24,501-Speed 5497.00 samples/sec   Loss 5.5737   LearningRate 0.0816   Epoch: 10   Global Step: 104610   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:30:31,947-Speed 5502.37 samples/sec   Loss 5.5786   LearningRate 0.0816   Epoch: 10   Global Step: 104620   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:30:39,490-Speed 5431.37 samples/sec   Loss 5.5453   LearningRate 0.0816   Epoch: 10   Global Step: 104630   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:30:46,953-Speed 5490.05 samples/sec   Loss 5.5340   LearningRate 0.0816   Epoch: 10   Global Step: 104640   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:30:54,347-Speed 5540.61 samples/sec   Loss 5.4915   LearningRate 0.0816   Epoch: 10   Global Step: 104650   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 18:31:01,758-Speed 5529.78 samples/sec   Loss 5.5978   LearningRate 0.0816   Epoch: 10   Global Step: 104660   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:31:09,163-Speed 5532.27 samples/sec   Loss 5.5779   LearningRate 0.0815   Epoch: 10   Global Step: 104670   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:31:16,327-Speed 5718.54 samples/sec   Loss 5.5254   LearningRate 0.0815   Epoch: 10   Global Step: 104680   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:31:23,762-Speed 5510.58 samples/sec   Loss 5.5397   LearningRate 0.0815   Epoch: 10   Global Step: 104690   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:31:31,191-Speed 5514.69 samples/sec   Loss 5.5817   LearningRate 0.0815   Epoch: 10   Global Step: 104700   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:31:38,484-Speed 5617.29 samples/sec   Loss 5.5846   LearningRate 0.0815   Epoch: 10   Global Step: 104710   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:31:45,681-Speed 5693.05 samples/sec   Loss 5.5340   LearningRate 0.0815   Epoch: 10   Global Step: 104720   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:31:52,838-Speed 5724.20 samples/sec   Loss 5.4837   LearningRate 0.0814   Epoch: 10   Global Step: 104730   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:32:00,365-Speed 5442.25 samples/sec   Loss 5.5342   LearningRate 0.0814   Epoch: 10   Global Step: 104740   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:32:07,886-Speed 5447.86 samples/sec   Loss 5.5782   LearningRate 0.0814   Epoch: 10   Global Step: 104750   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:32:15,393-Speed 5457.10 samples/sec   Loss 5.5776   LearningRate 0.0814   Epoch: 10   Global Step: 104760   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:32:22,673-Speed 5627.28 samples/sec   Loss 5.5698   LearningRate 0.0814   Epoch: 10   Global Step: 104770   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:32:29,820-Speed 5731.85 samples/sec   Loss 5.5697   LearningRate 0.0814   Epoch: 10   Global Step: 104780   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:32:37,002-Speed 5704.81 samples/sec   Loss 5.5441   LearningRate 0.0813   Epoch: 10   Global Step: 104790   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:32:44,610-Speed 5386.30 samples/sec   Loss 5.5064   LearningRate 0.0813   Epoch: 10   Global Step: 104800   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:32:52,041-Speed 5513.21 samples/sec   Loss 5.5133   LearningRate 0.0813   Epoch: 10   Global Step: 104810   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:32:59,486-Speed 5503.25 samples/sec   Loss 5.5858   LearningRate 0.0813   Epoch: 10   Global Step: 104820   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 18:33:06,935-Speed 5499.58 samples/sec   Loss 5.5153   LearningRate 0.0813   Epoch: 10   Global Step: 104830   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:33:14,338-Speed 5534.18 samples/sec   Loss 5.5086   LearningRate 0.0813   Epoch: 10   Global Step: 104840   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:33:21,838-Speed 5462.74 samples/sec   Loss 5.5613   LearningRate 0.0813   Epoch: 10   Global Step: 104850   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:33:29,215-Speed 5553.50 samples/sec   Loss 5.5487   LearningRate 0.0812   Epoch: 10   Global Step: 104860   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 18:33:36,837-Speed 5375.05 samples/sec   Loss 5.5171   LearningRate 0.0812   Epoch: 10   Global Step: 104870   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:33:44,204-Speed 5561.30 samples/sec   Loss 5.5381   LearningRate 0.0812   Epoch: 10   Global Step: 104880   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:33:51,381-Speed 5708.21 samples/sec   Loss 5.5514   LearningRate 0.0812   Epoch: 10   Global Step: 104890   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:33:58,892-Speed 5454.91 samples/sec   Loss 5.5299   LearningRate 0.0812   Epoch: 10   Global Step: 104900   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:34:06,377-Speed 5473.04 samples/sec   Loss 5.4862   LearningRate 0.0812   Epoch: 10   Global Step: 104910   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:34:13,858-Speed 5476.46 samples/sec   Loss 5.5425   LearningRate 0.0811   Epoch: 10   Global Step: 104920   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:34:21,412-Speed 5423.36 samples/sec   Loss 5.4927   LearningRate 0.0811   Epoch: 10   Global Step: 104930   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:34:29,001-Speed 5398.60 samples/sec   Loss 5.5771   LearningRate 0.0811   Epoch: 10   Global Step: 104940   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:34:36,433-Speed 5513.02 samples/sec   Loss 5.5171   LearningRate 0.0811   Epoch: 10   Global Step: 104950   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:34:43,769-Speed 5584.52 samples/sec   Loss 5.5427   LearningRate 0.0811   Epoch: 10   Global Step: 104960   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:34:51,338-Speed 5413.19 samples/sec   Loss 5.4662   LearningRate 0.0811   Epoch: 10   Global Step: 104970   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:34:58,765-Speed 5516.13 samples/sec   Loss 5.5327   LearningRate 0.0810   Epoch: 10   Global Step: 104980   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:35:06,375-Speed 5383.39 samples/sec   Loss 5.5165   LearningRate 0.0810   Epoch: 10   Global Step: 104990   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:35:13,682-Speed 5606.74 samples/sec   Loss 5.5139   LearningRate 0.0810   Epoch: 10   Global Step: 105000   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:35:20,745-Speed 5800.25 samples/sec   Loss 5.5531   LearningRate 0.0810   Epoch: 10   Global Step: 105010   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:35:28,240-Speed 5466.08 samples/sec   Loss 5.5548   LearningRate 0.0810   Epoch: 10   Global Step: 105020   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:35:35,670-Speed 5514.53 samples/sec   Loss 5.4938   LearningRate 0.0810   Epoch: 10   Global Step: 105030   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:35:43,199-Speed 5441.10 samples/sec   Loss 5.4709   LearningRate 0.0810   Epoch: 10   Global Step: 105040   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:35:50,767-Speed 5414.26 samples/sec   Loss 5.5385   LearningRate 0.0809   Epoch: 10   Global Step: 105050   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:35:58,378-Speed 5382.74 samples/sec   Loss 5.5712   LearningRate 0.0809   Epoch: 10   Global Step: 105060   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:36:06,097-Speed 5307.07 samples/sec   Loss 5.4975   LearningRate 0.0809   Epoch: 10   Global Step: 105070   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:36:13,810-Speed 5312.17 samples/sec   Loss 5.4489   LearningRate 0.0809   Epoch: 10   Global Step: 105080   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:36:21,362-Speed 5424.40 samples/sec   Loss 5.5552   LearningRate 0.0809   Epoch: 10   Global Step: 105090   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:36:28,973-Speed 5383.10 samples/sec   Loss 5.4961   LearningRate 0.0809   Epoch: 10   Global Step: 105100   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:36:36,591-Speed 5377.88 samples/sec   Loss 5.5582   LearningRate 0.0808   Epoch: 10   Global Step: 105110   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:36:44,255-Speed 5345.85 samples/sec   Loss 5.4937   LearningRate 0.0808   Epoch: 10   Global Step: 105120   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:36:51,816-Speed 5418.08 samples/sec   Loss 5.5229   LearningRate 0.0808   Epoch: 10   Global Step: 105130   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:36:59,280-Speed 5488.81 samples/sec   Loss 5.5206   LearningRate 0.0808   Epoch: 10   Global Step: 105140   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:37:06,872-Speed 5396.33 samples/sec   Loss 5.5540   LearningRate 0.0808   Epoch: 10   Global Step: 105150   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:37:14,485-Speed 5381.74 samples/sec   Loss 5.5198   LearningRate 0.0808   Epoch: 10   Global Step: 105160   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:37:21,726-Speed 5657.95 samples/sec   Loss 5.4943   LearningRate 0.0807   Epoch: 10   Global Step: 105170   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:37:29,008-Speed 5625.84 samples/sec   Loss 5.5590   LearningRate 0.0807   Epoch: 10   Global Step: 105180   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:37:36,500-Speed 5468.96 samples/sec   Loss 5.5455   LearningRate 0.0807   Epoch: 10   Global Step: 105190   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:37:44,055-Speed 5422.79 samples/sec   Loss 5.5727   LearningRate 0.0807   Epoch: 10   Global Step: 105200   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:37:52,642-Speed 5702.35 samples/sec   Loss 5.5463   LearningRate 0.0807   Epoch: 10   Global Step: 105210   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:37:59,955-Speed 5601.91 samples/sec   Loss 5.5015   LearningRate 0.0807   Epoch: 10   Global Step: 105220   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:38:07,501-Speed 5429.74 samples/sec   Loss 5.5376   LearningRate 0.0807   Epoch: 10   Global Step: 105230   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:38:15,086-Speed 5400.93 samples/sec   Loss 5.5589   LearningRate 0.0806   Epoch: 10   Global Step: 105240   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:38:22,591-Speed 5458.83 samples/sec   Loss 5.5530   LearningRate 0.0806   Epoch: 10   Global Step: 105250   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:38:30,183-Speed 5396.24 samples/sec   Loss 5.4896   LearningRate 0.0806   Epoch: 10   Global Step: 105260   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:38:37,729-Speed 5430.91 samples/sec   Loss 5.4582   LearningRate 0.0806   Epoch: 10   Global Step: 105270   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:38:45,422-Speed 5324.91 samples/sec   Loss 5.5160   LearningRate 0.0806   Epoch: 10   Global Step: 105280   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:38:52,956-Speed 5438.36 samples/sec   Loss 5.4947   LearningRate 0.0806   Epoch: 10   Global Step: 105290   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:39:00,573-Speed 5378.95 samples/sec   Loss 5.5015   LearningRate 0.0805   Epoch: 10   Global Step: 105300   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:39:08,070-Speed 5464.81 samples/sec   Loss 5.4877   LearningRate 0.0805   Epoch: 10   Global Step: 105310   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:39:15,636-Speed 5414.30 samples/sec   Loss 5.5063   LearningRate 0.0805   Epoch: 10   Global Step: 105320   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:39:23,442-Speed 5248.23 samples/sec   Loss 5.5026   LearningRate 0.0805   Epoch: 10   Global Step: 105330   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:39:31,015-Speed 5409.63 samples/sec   Loss 5.5286   LearningRate 0.0805   Epoch: 10   Global Step: 105340   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:39:38,553-Speed 5434.64 samples/sec   Loss 5.5437   LearningRate 0.0805   Epoch: 10   Global Step: 105350   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:39:46,180-Speed 5371.59 samples/sec   Loss 5.5428   LearningRate 0.0804   Epoch: 10   Global Step: 105360   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:39:53,665-Speed 5473.58 samples/sec   Loss 5.4831   LearningRate 0.0804   Epoch: 10   Global Step: 105370   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:40:01,251-Speed 5400.30 samples/sec   Loss 5.5140   LearningRate 0.0804   Epoch: 10   Global Step: 105380   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:40:08,805-Speed 5423.63 samples/sec   Loss 5.4863   LearningRate 0.0804   Epoch: 10   Global Step: 105390   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:40:16,207-Speed 5534.52 samples/sec   Loss 5.5286   LearningRate 0.0804   Epoch: 10   Global Step: 105400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:40:23,676-Speed 5485.59 samples/sec   Loss 5.5323   LearningRate 0.0804   Epoch: 10   Global Step: 105410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:40:31,142-Speed 5487.27 samples/sec   Loss 5.4812   LearningRate 0.0804   Epoch: 10   Global Step: 105420   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:40:38,679-Speed 5435.36 samples/sec   Loss 5.4823   LearningRate 0.0803   Epoch: 10   Global Step: 105430   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:40:46,097-Speed 5522.51 samples/sec   Loss 5.4962   LearningRate 0.0803   Epoch: 10   Global Step: 105440   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:40:53,523-Speed 5517.63 samples/sec   Loss 5.5273   LearningRate 0.0803   Epoch: 10   Global Step: 105450   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:41:00,926-Speed 5533.97 samples/sec   Loss 5.5108   LearningRate 0.0803   Epoch: 10   Global Step: 105460   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:41:08,221-Speed 5615.48 samples/sec   Loss 5.5321   LearningRate 0.0803   Epoch: 10   Global Step: 105470   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:41:15,729-Speed 5458.68 samples/sec   Loss 5.5002   LearningRate 0.0803   Epoch: 10   Global Step: 105480   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:41:23,167-Speed 5508.24 samples/sec   Loss 5.5289   LearningRate 0.0802   Epoch: 10   Global Step: 105490   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:41:30,656-Speed 5470.24 samples/sec   Loss 5.4721   LearningRate 0.0802   Epoch: 10   Global Step: 105500   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:41:38,136-Speed 5477.79 samples/sec   Loss 5.4824   LearningRate 0.0802   Epoch: 10   Global Step: 105510   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:41:45,601-Speed 5487.46 samples/sec   Loss 5.4967   LearningRate 0.0802   Epoch: 10   Global Step: 105520   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:41:53,159-Speed 5420.51 samples/sec   Loss 5.5340   LearningRate 0.0802   Epoch: 10   Global Step: 105530   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:42:00,711-Speed 5424.49 samples/sec   Loss 5.5441   LearningRate 0.0802   Epoch: 10   Global Step: 105540   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:42:08,176-Speed 5487.93 samples/sec   Loss 5.5173   LearningRate 0.0801   Epoch: 10   Global Step: 105550   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:42:15,670-Speed 5466.61 samples/sec   Loss 5.5552   LearningRate 0.0801   Epoch: 10   Global Step: 105560   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:42:23,143-Speed 5481.33 samples/sec   Loss 5.5441   LearningRate 0.0801   Epoch: 10   Global Step: 105570   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:42:30,622-Speed 5477.27 samples/sec   Loss 5.5072   LearningRate 0.0801   Epoch: 10   Global Step: 105580   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:42:38,146-Speed 5445.20 samples/sec   Loss 5.4885   LearningRate 0.0801   Epoch: 10   Global Step: 105590   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:42:45,561-Speed 5524.54 samples/sec   Loss 5.4736   LearningRate 0.0801   Epoch: 10   Global Step: 105600   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:42:53,068-Speed 5457.08 samples/sec   Loss 5.5381   LearningRate 0.0801   Epoch: 10   Global Step: 105610   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:43:00,500-Speed 5512.01 samples/sec   Loss 5.4504   LearningRate 0.0800   Epoch: 10   Global Step: 105620   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:43:07,975-Speed 5480.39 samples/sec   Loss 5.4734   LearningRate 0.0800   Epoch: 10   Global Step: 105630   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:43:15,440-Speed 5487.34 samples/sec   Loss 5.5088   LearningRate 0.0800   Epoch: 10   Global Step: 105640   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:43:22,938-Speed 5463.53 samples/sec   Loss 5.4371   LearningRate 0.0800   Epoch: 10   Global Step: 105650   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:43:30,420-Speed 5475.37 samples/sec   Loss 5.4730   LearningRate 0.0800   Epoch: 10   Global Step: 105660   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:43:37,969-Speed 5426.31 samples/sec   Loss 5.3926   LearningRate 0.0800   Epoch: 10   Global Step: 105670   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:43:45,409-Speed 5506.81 samples/sec   Loss 5.4802   LearningRate 0.0799   Epoch: 10   Global Step: 105680   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:43:52,905-Speed 5464.77 samples/sec   Loss 5.4171   LearningRate 0.0799   Epoch: 10   Global Step: 105690   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:44:00,518-Speed 5380.58 samples/sec   Loss 5.4930   LearningRate 0.0799   Epoch: 10   Global Step: 105700   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:44:08,020-Speed 5460.78 samples/sec   Loss 5.5018   LearningRate 0.0799   Epoch: 10   Global Step: 105710   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:44:15,505-Speed 5473.38 samples/sec   Loss 5.4843   LearningRate 0.0799   Epoch: 10   Global Step: 105720   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:44:22,945-Speed 5505.86 samples/sec   Loss 5.4841   LearningRate 0.0799   Epoch: 10   Global Step: 105730   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:44:30,386-Speed 5505.58 samples/sec   Loss 5.4576   LearningRate 0.0798   Epoch: 10   Global Step: 105740   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:44:37,839-Speed 5496.20 samples/sec   Loss 5.4595   LearningRate 0.0798   Epoch: 10   Global Step: 105750   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:44:45,391-Speed 5424.36 samples/sec   Loss 5.4660   LearningRate 0.0798   Epoch: 10   Global Step: 105760   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:44:52,891-Speed 5462.20 samples/sec   Loss 5.5013   LearningRate 0.0798   Epoch: 10   Global Step: 105770   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:45:00,364-Speed 5482.42 samples/sec   Loss 5.4621   LearningRate 0.0798   Epoch: 10   Global Step: 105780   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:45:07,823-Speed 5491.77 samples/sec   Loss 5.5255   LearningRate 0.0798   Epoch: 10   Global Step: 105790   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:45:15,800-Speed 5135.12 samples/sec   Loss 5.5786   LearningRate 0.0798   Epoch: 10   Global Step: 105800   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:45:23,283-Speed 5474.80 samples/sec   Loss 5.4327   LearningRate 0.0797   Epoch: 10   Global Step: 105810   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:45:30,822-Speed 5433.59 samples/sec   Loss 5.4394   LearningRate 0.0797   Epoch: 10   Global Step: 105820   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:45:38,316-Speed 5466.74 samples/sec   Loss 5.4948   LearningRate 0.0797   Epoch: 10   Global Step: 105830   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:45:45,865-Speed 5426.49 samples/sec   Loss 5.5096   LearningRate 0.0797   Epoch: 10   Global Step: 105840   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:45:53,298-Speed 5510.92 samples/sec   Loss 5.4967   LearningRate 0.0797   Epoch: 10   Global Step: 105850   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:46:00,758-Speed 5491.88 samples/sec   Loss 5.4969   LearningRate 0.0797   Epoch: 10   Global Step: 105860   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:46:08,237-Speed 5477.44 samples/sec   Loss 5.5303   LearningRate 0.0796   Epoch: 10   Global Step: 105870   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 18:46:15,687-Speed 5498.61 samples/sec   Loss 5.4703   LearningRate 0.0796   Epoch: 10   Global Step: 105880   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 18:46:23,141-Speed 5495.84 samples/sec   Loss 5.4773   LearningRate 0.0796   Epoch: 10   Global Step: 105890   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 18:46:30,654-Speed 5452.83 samples/sec   Loss 5.4632   LearningRate 0.0796   Epoch: 10   Global Step: 105900   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 18:46:38,102-Speed 5500.09 samples/sec   Loss 5.4317   LearningRate 0.0796   Epoch: 10   Global Step: 105910   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 18:46:45,529-Speed 5515.95 samples/sec   Loss 5.4666   LearningRate 0.0796   Epoch: 10   Global Step: 105920   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 18:46:53,010-Speed 5475.61 samples/sec   Loss 5.5002   LearningRate 0.0796   Epoch: 10   Global Step: 105930   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 18:47:00,563-Speed 5423.86 samples/sec   Loss 5.5083   LearningRate 0.0795   Epoch: 10   Global Step: 105940   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 18:47:08,184-Speed 5375.81 samples/sec   Loss 5.5088   LearningRate 0.0795   Epoch: 10   Global Step: 105950   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 18:47:15,767-Speed 5401.88 samples/sec   Loss 5.5120   LearningRate 0.0795   Epoch: 10   Global Step: 105960   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 18:47:23,254-Speed 5471.91 samples/sec   Loss 5.4972   LearningRate 0.0795   Epoch: 10   Global Step: 105970   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:47:30,750-Speed 5464.69 samples/sec   Loss 5.5286   LearningRate 0.0795   Epoch: 10   Global Step: 105980   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:47:38,242-Speed 5468.41 samples/sec   Loss 5.5160   LearningRate 0.0795   Epoch: 10   Global Step: 105990   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:47:45,683-Speed 5504.83 samples/sec   Loss 5.4761   LearningRate 0.0794   Epoch: 10   Global Step: 106000   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:48:29,990-[lfw][106000]XNorm: 23.475606
Training: 2022-01-08 18:48:29,991-[lfw][106000]Accuracy-Flip: 0.99800+-0.00287
Training: 2022-01-08 18:48:29,991-[lfw][106000]Accuracy-Highest: 0.99817
Training: 2022-01-08 18:49:21,301-[cfp_fp][106000]XNorm: 21.649124
Training: 2022-01-08 18:49:21,301-[cfp_fp][106000]Accuracy-Flip: 0.99043+-0.00429
Training: 2022-01-08 18:49:21,302-[cfp_fp][106000]Accuracy-Highest: 0.99043
Training: 2022-01-08 18:50:05,646-[agedb_30][106000]XNorm: 23.216536
Training: 2022-01-08 18:50:05,647-[agedb_30][106000]Accuracy-Flip: 0.97783+-0.00619
Training: 2022-01-08 18:50:05,647-[agedb_30][106000]Accuracy-Highest: 0.97917
Training: 2022-01-08 18:50:13,161-Speed 277.74 samples/sec   Loss 5.5376   LearningRate 0.0794   Epoch: 10   Global Step: 106010   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:50:20,667-Speed 5457.92 samples/sec   Loss 5.4482   LearningRate 0.0794   Epoch: 10   Global Step: 106020   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:50:28,138-Speed 5484.40 samples/sec   Loss 5.5169   LearningRate 0.0794   Epoch: 10   Global Step: 106030   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:50:35,578-Speed 5506.48 samples/sec   Loss 5.4660   LearningRate 0.0794   Epoch: 10   Global Step: 106040   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:50:43,025-Speed 5501.57 samples/sec   Loss 5.4478   LearningRate 0.0794   Epoch: 10   Global Step: 106050   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:50:50,468-Speed 5504.53 samples/sec   Loss 5.5010   LearningRate 0.0793   Epoch: 10   Global Step: 106060   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:50:57,915-Speed 5501.76 samples/sec   Loss 5.4412   LearningRate 0.0793   Epoch: 10   Global Step: 106070   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:51:05,362-Speed 5500.96 samples/sec   Loss 5.4483   LearningRate 0.0793   Epoch: 10   Global Step: 106080   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:51:12,826-Speed 5488.82 samples/sec   Loss 5.4830   LearningRate 0.0793   Epoch: 10   Global Step: 106090   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:51:20,375-Speed 5427.51 samples/sec   Loss 5.5091   LearningRate 0.0793   Epoch: 10   Global Step: 106100   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:51:27,839-Speed 5489.30 samples/sec   Loss 5.4965   LearningRate 0.0793   Epoch: 10   Global Step: 106110   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:51:35,054-Speed 5679.39 samples/sec   Loss 5.4489   LearningRate 0.0793   Epoch: 10   Global Step: 106120   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:51:42,597-Speed 5431.71 samples/sec   Loss 5.4778   LearningRate 0.0792   Epoch: 10   Global Step: 106130   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:51:50,147-Speed 5425.84 samples/sec   Loss 5.4635   LearningRate 0.0792   Epoch: 10   Global Step: 106140   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:51:57,557-Speed 5529.17 samples/sec   Loss 5.4921   LearningRate 0.0792   Epoch: 10   Global Step: 106150   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:52:05,023-Speed 5487.24 samples/sec   Loss 5.4393   LearningRate 0.0792   Epoch: 10   Global Step: 106160   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:52:12,162-Speed 5738.89 samples/sec   Loss 5.4254   LearningRate 0.0792   Epoch: 10   Global Step: 106170   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:52:19,671-Speed 5456.29 samples/sec   Loss 5.4472   LearningRate 0.0792   Epoch: 10   Global Step: 106180   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:52:27,126-Speed 5495.73 samples/sec   Loss 5.4386   LearningRate 0.0791   Epoch: 10   Global Step: 106190   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:52:34,643-Speed 5449.88 samples/sec   Loss 5.4464   LearningRate 0.0791   Epoch: 10   Global Step: 106200   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:52:42,173-Speed 5440.92 samples/sec   Loss 5.4625   LearningRate 0.0791   Epoch: 10   Global Step: 106210   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:52:49,639-Speed 5487.98 samples/sec   Loss 5.4927   LearningRate 0.0791   Epoch: 10   Global Step: 106220   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:52:57,100-Speed 5491.23 samples/sec   Loss 5.4768   LearningRate 0.0791   Epoch: 10   Global Step: 106230   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:53:04,575-Speed 5480.45 samples/sec   Loss 5.4820   LearningRate 0.0791   Epoch: 10   Global Step: 106240   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:53:11,998-Speed 5519.17 samples/sec   Loss 5.4732   LearningRate 0.0790   Epoch: 10   Global Step: 106250   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:53:19,442-Speed 5504.06 samples/sec   Loss 5.5514   LearningRate 0.0790   Epoch: 10   Global Step: 106260   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:53:26,983-Speed 5432.46 samples/sec   Loss 5.4769   LearningRate 0.0790   Epoch: 10   Global Step: 106270   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:53:34,489-Speed 5458.27 samples/sec   Loss 5.4062   LearningRate 0.0790   Epoch: 10   Global Step: 106280   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:53:41,951-Speed 5490.41 samples/sec   Loss 5.4200   LearningRate 0.0790   Epoch: 10   Global Step: 106290   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:53:49,350-Speed 5537.32 samples/sec   Loss 5.4700   LearningRate 0.0790   Epoch: 10   Global Step: 106300   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:53:56,774-Speed 5518.29 samples/sec   Loss 5.4402   LearningRate 0.0790   Epoch: 10   Global Step: 106310   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:54:04,184-Speed 5528.78 samples/sec   Loss 5.4776   LearningRate 0.0789   Epoch: 10   Global Step: 106320   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:54:11,579-Speed 5539.97 samples/sec   Loss 5.4487   LearningRate 0.0789   Epoch: 10   Global Step: 106330   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:54:19,084-Speed 5459.03 samples/sec   Loss 5.4346   LearningRate 0.0789   Epoch: 10   Global Step: 106340   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:54:26,458-Speed 5555.37 samples/sec   Loss 5.4871   LearningRate 0.0789   Epoch: 10   Global Step: 106350   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:54:33,889-Speed 5513.65 samples/sec   Loss 5.4531   LearningRate 0.0789   Epoch: 10   Global Step: 106360   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:54:41,178-Speed 5620.37 samples/sec   Loss 5.4854   LearningRate 0.0789   Epoch: 10   Global Step: 106370   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:54:48,684-Speed 5458.85 samples/sec   Loss 5.4502   LearningRate 0.0788   Epoch: 10   Global Step: 106380   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:54:56,102-Speed 5522.50 samples/sec   Loss 5.4986   LearningRate 0.0788   Epoch: 10   Global Step: 106390   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:55:03,440-Speed 5583.19 samples/sec   Loss 5.4629   LearningRate 0.0788   Epoch: 10   Global Step: 106400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:55:10,465-Speed 5831.81 samples/sec   Loss 5.4762   LearningRate 0.0788   Epoch: 10   Global Step: 106410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:55:17,877-Speed 5527.73 samples/sec   Loss 5.4461   LearningRate 0.0788   Epoch: 10   Global Step: 106420   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:55:25,683-Speed 5248.53 samples/sec   Loss 5.4205   LearningRate 0.0788   Epoch: 10   Global Step: 106430   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:55:33,319-Speed 5364.78 samples/sec   Loss 5.4479   LearningRate 0.0788   Epoch: 10   Global Step: 106440   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:55:40,735-Speed 5524.33 samples/sec   Loss 5.4379   LearningRate 0.0787   Epoch: 10   Global Step: 106450   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:55:48,318-Speed 5402.86 samples/sec   Loss 5.4680   LearningRate 0.0787   Epoch: 10   Global Step: 106460   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:55:55,820-Speed 5461.54 samples/sec   Loss 5.4469   LearningRate 0.0787   Epoch: 10   Global Step: 106470   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:56:03,315-Speed 5466.07 samples/sec   Loss 5.4273   LearningRate 0.0787   Epoch: 10   Global Step: 106480   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:56:10,760-Speed 5502.51 samples/sec   Loss 5.4279   LearningRate 0.0787   Epoch: 10   Global Step: 106490   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:56:18,221-Speed 5490.85 samples/sec   Loss 5.4883   LearningRate 0.0787   Epoch: 10   Global Step: 106500   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:56:25,696-Speed 5480.72 samples/sec   Loss 5.4908   LearningRate 0.0786   Epoch: 10   Global Step: 106510   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:56:33,224-Speed 5442.96 samples/sec   Loss 5.5101   LearningRate 0.0786   Epoch: 10   Global Step: 106520   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:56:40,679-Speed 5494.65 samples/sec   Loss 5.4508   LearningRate 0.0786   Epoch: 10   Global Step: 106530   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:56:48,157-Speed 5477.68 samples/sec   Loss 5.4556   LearningRate 0.0786   Epoch: 10   Global Step: 106540   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:56:55,686-Speed 5441.49 samples/sec   Loss 5.4545   LearningRate 0.0786   Epoch: 10   Global Step: 106550   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:57:03,399-Speed 5311.15 samples/sec   Loss 5.4490   LearningRate 0.0786   Epoch: 10   Global Step: 106560   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:57:19,005-Speed 2624.82 samples/sec   Loss 5.4241   LearningRate 0.0786   Epoch: 10   Global Step: 106570   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:57:26,399-Speed 5540.53 samples/sec   Loss 5.4505   LearningRate 0.0785   Epoch: 10   Global Step: 106580   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:57:33,871-Speed 5482.84 samples/sec   Loss 5.4367   LearningRate 0.0785   Epoch: 10   Global Step: 106590   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:57:41,355-Speed 5473.69 samples/sec   Loss 5.4846   LearningRate 0.0785   Epoch: 10   Global Step: 106600   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:57:48,945-Speed 5397.54 samples/sec   Loss 5.4764   LearningRate 0.0785   Epoch: 10   Global Step: 106610   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:57:56,385-Speed 5506.61 samples/sec   Loss 5.4401   LearningRate 0.0785   Epoch: 10   Global Step: 106620   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:58:03,840-Speed 5494.58 samples/sec   Loss 5.4064   LearningRate 0.0785   Epoch: 10   Global Step: 106630   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:58:11,288-Speed 5500.26 samples/sec   Loss 5.4462   LearningRate 0.0784   Epoch: 10   Global Step: 106640   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:58:18,831-Speed 5431.32 samples/sec   Loss 5.4794   LearningRate 0.0784   Epoch: 10   Global Step: 106650   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:58:26,330-Speed 5462.79 samples/sec   Loss 5.4341   LearningRate 0.0784   Epoch: 10   Global Step: 106660   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:58:33,744-Speed 5525.79 samples/sec   Loss 5.4512   LearningRate 0.0784   Epoch: 10   Global Step: 106670   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:58:41,217-Speed 5481.22 samples/sec   Loss 5.4438   LearningRate 0.0784   Epoch: 10   Global Step: 106680   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:58:48,647-Speed 5514.20 samples/sec   Loss 5.4363   LearningRate 0.0784   Epoch: 10   Global Step: 106690   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 18:58:56,078-Speed 5512.43 samples/sec   Loss 5.4485   LearningRate 0.0783   Epoch: 10   Global Step: 106700   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:59:03,518-Speed 5506.24 samples/sec   Loss 5.4645   LearningRate 0.0783   Epoch: 10   Global Step: 106710   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:59:11,028-Speed 5454.75 samples/sec   Loss 5.4383   LearningRate 0.0783   Epoch: 10   Global Step: 106720   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:59:18,537-Speed 5455.59 samples/sec   Loss 5.4117   LearningRate 0.0783   Epoch: 10   Global Step: 106730   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:59:26,047-Speed 5454.92 samples/sec   Loss 5.4278   LearningRate 0.0783   Epoch: 10   Global Step: 106740   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:59:33,471-Speed 5517.84 samples/sec   Loss 5.4206   LearningRate 0.0783   Epoch: 10   Global Step: 106750   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:59:40,886-Speed 5524.10 samples/sec   Loss 5.4467   LearningRate 0.0783   Epoch: 10   Global Step: 106760   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:59:48,409-Speed 5446.19 samples/sec   Loss 5.4756   LearningRate 0.0782   Epoch: 10   Global Step: 106770   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 18:59:55,886-Speed 5478.17 samples/sec   Loss 5.4535   LearningRate 0.0782   Epoch: 10   Global Step: 106780   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:00:03,318-Speed 5512.35 samples/sec   Loss 5.4313   LearningRate 0.0782   Epoch: 10   Global Step: 106790   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:00:10,785-Speed 5486.33 samples/sec   Loss 5.4337   LearningRate 0.0782   Epoch: 10   Global Step: 106800   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:00:18,257-Speed 5482.41 samples/sec   Loss 5.4509   LearningRate 0.0782   Epoch: 10   Global Step: 106810   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:00:25,768-Speed 5453.95 samples/sec   Loss 5.4783   LearningRate 0.0782   Epoch: 10   Global Step: 106820   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:00:33,251-Speed 5474.71 samples/sec   Loss 5.4210   LearningRate 0.0781   Epoch: 10   Global Step: 106830   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:00:40,726-Speed 5480.36 samples/sec   Loss 5.4043   LearningRate 0.0781   Epoch: 10   Global Step: 106840   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:00:48,160-Speed 5510.31 samples/sec   Loss 5.4343   LearningRate 0.0781   Epoch: 10   Global Step: 106850   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:00:55,677-Speed 5449.83 samples/sec   Loss 5.4602   LearningRate 0.0781   Epoch: 10   Global Step: 106860   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:01:03,154-Speed 5479.16 samples/sec   Loss 5.4240   LearningRate 0.0781   Epoch: 10   Global Step: 106870   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:01:10,626-Speed 5482.48 samples/sec   Loss 5.4648   LearningRate 0.0781   Epoch: 10   Global Step: 106880   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:01:18,116-Speed 5469.72 samples/sec   Loss 5.4503   LearningRate 0.0781   Epoch: 10   Global Step: 106890   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:01:25,541-Speed 5517.38 samples/sec   Loss 5.3629   LearningRate 0.0780   Epoch: 10   Global Step: 106900   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:01:33,039-Speed 5463.31 samples/sec   Loss 5.4023   LearningRate 0.0780   Epoch: 10   Global Step: 106910   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:01:40,527-Speed 5471.19 samples/sec   Loss 5.4404   LearningRate 0.0780   Epoch: 10   Global Step: 106920   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:01:47,986-Speed 5491.70 samples/sec   Loss 5.3870   LearningRate 0.0780   Epoch: 10   Global Step: 106930   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:01:55,450-Speed 5488.97 samples/sec   Loss 5.4066   LearningRate 0.0780   Epoch: 10   Global Step: 106940   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:02:02,877-Speed 5515.41 samples/sec   Loss 5.4331   LearningRate 0.0780   Epoch: 10   Global Step: 106950   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:02:10,403-Speed 5443.23 samples/sec   Loss 5.4404   LearningRate 0.0779   Epoch: 10   Global Step: 106960   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:02:17,937-Speed 5437.67 samples/sec   Loss 5.4421   LearningRate 0.0779   Epoch: 10   Global Step: 106970   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:02:25,426-Speed 5469.75 samples/sec   Loss 5.3935   LearningRate 0.0779   Epoch: 10   Global Step: 106980   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:02:32,875-Speed 5499.45 samples/sec   Loss 5.3764   LearningRate 0.0779   Epoch: 10   Global Step: 106990   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:02:40,375-Speed 5462.37 samples/sec   Loss 5.4239   LearningRate 0.0779   Epoch: 10   Global Step: 107000   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:02:47,893-Speed 5448.68 samples/sec   Loss 5.4014   LearningRate 0.0779   Epoch: 10   Global Step: 107010   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:02:55,346-Speed 5496.84 samples/sec   Loss 5.4768   LearningRate 0.0779   Epoch: 10   Global Step: 107020   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:03:02,822-Speed 5479.56 samples/sec   Loss 5.4058   LearningRate 0.0778   Epoch: 10   Global Step: 107030   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:03:10,271-Speed 5499.72 samples/sec   Loss 5.4541   LearningRate 0.0778   Epoch: 10   Global Step: 107040   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:03:17,752-Speed 5475.55 samples/sec   Loss 5.3659   LearningRate 0.0778   Epoch: 10   Global Step: 107050   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:03:25,369-Speed 5378.03 samples/sec   Loss 5.3846   LearningRate 0.0778   Epoch: 10   Global Step: 107060   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:03:32,894-Speed 5444.57 samples/sec   Loss 5.3929   LearningRate 0.0778   Epoch: 10   Global Step: 107070   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:03:40,540-Speed 5357.77 samples/sec   Loss 5.4146   LearningRate 0.0778   Epoch: 10   Global Step: 107080   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:03:48,091-Speed 5425.08 samples/sec   Loss 5.4354   LearningRate 0.0777   Epoch: 10   Global Step: 107090   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:03:55,617-Speed 5443.34 samples/sec   Loss 5.4312   LearningRate 0.0777   Epoch: 10   Global Step: 107100   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:04:03,113-Speed 5464.35 samples/sec   Loss 5.4658   LearningRate 0.0777   Epoch: 10   Global Step: 107110   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:04:10,639-Speed 5443.72 samples/sec   Loss 5.4496   LearningRate 0.0777   Epoch: 10   Global Step: 107120   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:04:18,106-Speed 5486.25 samples/sec   Loss 5.3899   LearningRate 0.0777   Epoch: 10   Global Step: 107130   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:04:25,621-Speed 5451.23 samples/sec   Loss 5.4112   LearningRate 0.0777   Epoch: 10   Global Step: 107140   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:04:33,206-Speed 5400.71 samples/sec   Loss 5.3492   LearningRate 0.0776   Epoch: 10   Global Step: 107150   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:04:40,740-Speed 5437.13 samples/sec   Loss 5.3583   LearningRate 0.0776   Epoch: 10   Global Step: 107160   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:04:48,151-Speed 5527.84 samples/sec   Loss 5.4487   LearningRate 0.0776   Epoch: 10   Global Step: 107170   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:04:55,639-Speed 5470.79 samples/sec   Loss 5.4532   LearningRate 0.0776   Epoch: 10   Global Step: 107180   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:05:03,093-Speed 5495.67 samples/sec   Loss 5.3635   LearningRate 0.0776   Epoch: 10   Global Step: 107190   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:05:10,562-Speed 5484.38 samples/sec   Loss 5.4330   LearningRate 0.0776   Epoch: 10   Global Step: 107200   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:05:18,045-Speed 5474.72 samples/sec   Loss 5.3867   LearningRate 0.0776   Epoch: 10   Global Step: 107210   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:05:25,537-Speed 5468.32 samples/sec   Loss 5.4135   LearningRate 0.0775   Epoch: 10   Global Step: 107220   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:05:33,057-Speed 5447.42 samples/sec   Loss 5.3486   LearningRate 0.0775   Epoch: 10   Global Step: 107230   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:05:40,462-Speed 5531.98 samples/sec   Loss 5.3703   LearningRate 0.0775   Epoch: 10   Global Step: 107240   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:05:47,937-Speed 5480.13 samples/sec   Loss 5.4330   LearningRate 0.0775   Epoch: 10   Global Step: 107250   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:05:55,381-Speed 5503.31 samples/sec   Loss 5.4159   LearningRate 0.0775   Epoch: 10   Global Step: 107260   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:06:02,794-Speed 5526.36 samples/sec   Loss 5.4095   LearningRate 0.0775   Epoch: 10   Global Step: 107270   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:06:10,306-Speed 5453.19 samples/sec   Loss 5.4386   LearningRate 0.0774   Epoch: 10   Global Step: 107280   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:06:17,750-Speed 5503.35 samples/sec   Loss 5.3776   LearningRate 0.0774   Epoch: 10   Global Step: 107290   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:06:25,211-Speed 5490.97 samples/sec   Loss 5.4685   LearningRate 0.0774   Epoch: 10   Global Step: 107300   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:06:32,807-Speed 5392.87 samples/sec   Loss 5.4210   LearningRate 0.0774   Epoch: 10   Global Step: 107310   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:06:40,370-Speed 5416.18 samples/sec   Loss 5.4276   LearningRate 0.0774   Epoch: 10   Global Step: 107320   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:06:47,918-Speed 5427.87 samples/sec   Loss 5.4158   LearningRate 0.0774   Epoch: 10   Global Step: 107330   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:06:55,476-Speed 5420.11 samples/sec   Loss 5.4155   LearningRate 0.0774   Epoch: 10   Global Step: 107340   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:07:03,008-Speed 5438.84 samples/sec   Loss 5.3900   LearningRate 0.0773   Epoch: 10   Global Step: 107350   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:07:10,583-Speed 5408.01 samples/sec   Loss 5.3468   LearningRate 0.0773   Epoch: 10   Global Step: 107360   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:07:18,107-Speed 5444.68 samples/sec   Loss 5.4155   LearningRate 0.0773   Epoch: 10   Global Step: 107370   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:07:25,694-Speed 5399.43 samples/sec   Loss 5.4121   LearningRate 0.0773   Epoch: 10   Global Step: 107380   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:07:33,244-Speed 5425.44 samples/sec   Loss 5.4246   LearningRate 0.0773   Epoch: 10   Global Step: 107390   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:07:40,730-Speed 5472.05 samples/sec   Loss 5.3701   LearningRate 0.0773   Epoch: 10   Global Step: 107400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:07:48,192-Speed 5489.95 samples/sec   Loss 5.3497   LearningRate 0.0772   Epoch: 10   Global Step: 107410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:07:55,615-Speed 5518.86 samples/sec   Loss 5.3809   LearningRate 0.0772   Epoch: 10   Global Step: 107420   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:08:03,091-Speed 5479.69 samples/sec   Loss 5.3936   LearningRate 0.0772   Epoch: 10   Global Step: 107430   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:08:10,677-Speed 5400.38 samples/sec   Loss 5.3644   LearningRate 0.0772   Epoch: 10   Global Step: 107440   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:08:18,143-Speed 5486.40 samples/sec   Loss 5.3917   LearningRate 0.0772   Epoch: 10   Global Step: 107450   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:08:25,610-Speed 5486.81 samples/sec   Loss 5.3602   LearningRate 0.0772   Epoch: 10   Global Step: 107460   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:08:33,140-Speed 5440.38 samples/sec   Loss 5.3650   LearningRate 0.0772   Epoch: 10   Global Step: 107470   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:08:40,640-Speed 5461.55 samples/sec   Loss 5.3833   LearningRate 0.0771   Epoch: 10   Global Step: 107480   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:08:48,173-Speed 5437.90 samples/sec   Loss 5.4360   LearningRate 0.0771   Epoch: 10   Global Step: 107490   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:08:55,727-Speed 5423.25 samples/sec   Loss 5.3551   LearningRate 0.0771   Epoch: 10   Global Step: 107500   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:09:03,395-Speed 5342.75 samples/sec   Loss 5.3781   LearningRate 0.0771   Epoch: 10   Global Step: 107510   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:09:10,901-Speed 5456.97 samples/sec   Loss 5.3735   LearningRate 0.0771   Epoch: 10   Global Step: 107520   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:09:18,465-Speed 5415.97 samples/sec   Loss 5.4340   LearningRate 0.0771   Epoch: 10   Global Step: 107530   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:09:25,992-Speed 5442.95 samples/sec   Loss 5.4024   LearningRate 0.0770   Epoch: 10   Global Step: 107540   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:09:33,754-Speed 5277.62 samples/sec   Loss 5.3942   LearningRate 0.0770   Epoch: 10   Global Step: 107550   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:09:41,314-Speed 5417.96 samples/sec   Loss 5.3349   LearningRate 0.0770   Epoch: 10   Global Step: 107560   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:09:48,859-Speed 5430.08 samples/sec   Loss 5.3481   LearningRate 0.0770   Epoch: 10   Global Step: 107570   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:09:56,340-Speed 5475.57 samples/sec   Loss 5.3956   LearningRate 0.0770   Epoch: 10   Global Step: 107580   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:10:03,798-Speed 5493.22 samples/sec   Loss 5.3608   LearningRate 0.0770   Epoch: 10   Global Step: 107590   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:10:11,299-Speed 5461.36 samples/sec   Loss 5.3564   LearningRate 0.0770   Epoch: 10   Global Step: 107600   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:10:18,799-Speed 5461.88 samples/sec   Loss 5.3899   LearningRate 0.0769   Epoch: 10   Global Step: 107610   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:10:26,351-Speed 5423.94 samples/sec   Loss 5.3722   LearningRate 0.0769   Epoch: 10   Global Step: 107620   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:10:33,859-Speed 5456.65 samples/sec   Loss 5.3630   LearningRate 0.0769   Epoch: 10   Global Step: 107630   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:10:41,340-Speed 5475.88 samples/sec   Loss 5.3609   LearningRate 0.0769   Epoch: 10   Global Step: 107640   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:10:48,819-Speed 5477.26 samples/sec   Loss 5.3817   LearningRate 0.0769   Epoch: 10   Global Step: 107650   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:10:56,342-Speed 5445.40 samples/sec   Loss 5.3209   LearningRate 0.0769   Epoch: 10   Global Step: 107660   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:11:03,794-Speed 5497.76 samples/sec   Loss 5.3009   LearningRate 0.0768   Epoch: 10   Global Step: 107670   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:11:11,238-Speed 5502.65 samples/sec   Loss 5.3954   LearningRate 0.0768   Epoch: 10   Global Step: 107680   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:11:18,708-Speed 5484.55 samples/sec   Loss 5.3435   LearningRate 0.0768   Epoch: 10   Global Step: 107690   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:11:26,168-Speed 5490.74 samples/sec   Loss 5.3606   LearningRate 0.0768   Epoch: 10   Global Step: 107700   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:11:33,605-Speed 5508.87 samples/sec   Loss 5.4451   LearningRate 0.0768   Epoch: 10   Global Step: 107710   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:11:40,997-Speed 5541.70 samples/sec   Loss 5.4200   LearningRate 0.0768   Epoch: 10   Global Step: 107720   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:11:48,525-Speed 5441.77 samples/sec   Loss 5.3649   LearningRate 0.0768   Epoch: 10   Global Step: 107730   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:11:55,929-Speed 5532.60 samples/sec   Loss 5.3790   LearningRate 0.0767   Epoch: 10   Global Step: 107740   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:12:03,455-Speed 5443.30 samples/sec   Loss 5.3720   LearningRate 0.0767   Epoch: 10   Global Step: 107750   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:12:10,928-Speed 5482.35 samples/sec   Loss 5.4046   LearningRate 0.0767   Epoch: 10   Global Step: 107760   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:12:18,407-Speed 5476.75 samples/sec   Loss 5.3797   LearningRate 0.0767   Epoch: 10   Global Step: 107770   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:12:25,914-Speed 5457.60 samples/sec   Loss 5.3784   LearningRate 0.0767   Epoch: 10   Global Step: 107780   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:12:33,342-Speed 5514.75 samples/sec   Loss 5.3790   LearningRate 0.0767   Epoch: 10   Global Step: 107790   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:12:40,789-Speed 5500.95 samples/sec   Loss 5.3614   LearningRate 0.0766   Epoch: 10   Global Step: 107800   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:12:48,360-Speed 5410.64 samples/sec   Loss 5.3468   LearningRate 0.0766   Epoch: 10   Global Step: 107810   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:12:55,793-Speed 5511.39 samples/sec   Loss 5.4119   LearningRate 0.0766   Epoch: 10   Global Step: 107820   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:13:03,310-Speed 5449.88 samples/sec   Loss 5.3751   LearningRate 0.0766   Epoch: 10   Global Step: 107830   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:13:10,786-Speed 5479.67 samples/sec   Loss 5.3293   LearningRate 0.0766   Epoch: 10   Global Step: 107840   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:13:18,308-Speed 5446.13 samples/sec   Loss 5.3463   LearningRate 0.0766   Epoch: 10   Global Step: 107850   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:13:25,773-Speed 5488.02 samples/sec   Loss 5.3349   LearningRate 0.0766   Epoch: 10   Global Step: 107860   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:13:33,235-Speed 5489.74 samples/sec   Loss 5.3661   LearningRate 0.0765   Epoch: 10   Global Step: 107870   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:13:40,733-Speed 5463.48 samples/sec   Loss 5.3227   LearningRate 0.0765   Epoch: 10   Global Step: 107880   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:13:48,239-Speed 5457.66 samples/sec   Loss 5.4020   LearningRate 0.0765   Epoch: 10   Global Step: 107890   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:13:55,746-Speed 5456.79 samples/sec   Loss 5.3358   LearningRate 0.0765   Epoch: 10   Global Step: 107900   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:14:03,232-Speed 5472.78 samples/sec   Loss 5.3645   LearningRate 0.0765   Epoch: 10   Global Step: 107910   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:14:10,778-Speed 5428.71 samples/sec   Loss 5.2709   LearningRate 0.0765   Epoch: 10   Global Step: 107920   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:14:18,322-Speed 5430.47 samples/sec   Loss 5.3379   LearningRate 0.0764   Epoch: 10   Global Step: 107930   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:14:25,902-Speed 5404.26 samples/sec   Loss 5.3289   LearningRate 0.0764   Epoch: 10   Global Step: 107940   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:14:33,540-Speed 5363.05 samples/sec   Loss 5.3918   LearningRate 0.0764   Epoch: 10   Global Step: 107950   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:14:41,100-Speed 5419.25 samples/sec   Loss 5.3740   LearningRate 0.0764   Epoch: 10   Global Step: 107960   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:14:48,742-Speed 5360.63 samples/sec   Loss 5.3083   LearningRate 0.0764   Epoch: 10   Global Step: 107970   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:14:56,180-Speed 5507.50 samples/sec   Loss 5.3475   LearningRate 0.0764   Epoch: 10   Global Step: 107980   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:15:03,694-Speed 5451.59 samples/sec   Loss 5.3314   LearningRate 0.0764   Epoch: 10   Global Step: 107990   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:15:11,115-Speed 5520.38 samples/sec   Loss 5.3657   LearningRate 0.0763   Epoch: 10   Global Step: 108000   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:15:55,173-[lfw][108000]XNorm: 21.434070
Training: 2022-01-08 19:15:55,174-[lfw][108000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-01-08 19:15:55,175-[lfw][108000]Accuracy-Highest: 0.99817
Training: 2022-01-08 19:16:46,395-[cfp_fp][108000]XNorm: 19.724399
Training: 2022-01-08 19:16:46,396-[cfp_fp][108000]Accuracy-Flip: 0.98900+-0.00507
Training: 2022-01-08 19:16:46,397-[cfp_fp][108000]Accuracy-Highest: 0.99043
Training: 2022-01-08 19:17:30,557-[agedb_30][108000]XNorm: 21.423153
Training: 2022-01-08 19:17:30,558-[agedb_30][108000]Accuracy-Flip: 0.97867+-0.00557
Training: 2022-01-08 19:17:30,559-[agedb_30][108000]Accuracy-Highest: 0.97917
Training: 2022-01-08 19:17:37,783-Speed 279.27 samples/sec   Loss 5.4095   LearningRate 0.0763   Epoch: 10   Global Step: 108010   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:17:45,179-Speed 5540.36 samples/sec   Loss 5.4122   LearningRate 0.0763   Epoch: 10   Global Step: 108020   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:17:52,638-Speed 5491.64 samples/sec   Loss 5.4180   LearningRate 0.0763   Epoch: 10   Global Step: 108030   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:18:00,059-Speed 5521.19 samples/sec   Loss 5.3817   LearningRate 0.0763   Epoch: 10   Global Step: 108040   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:18:07,564-Speed 5459.19 samples/sec   Loss 5.4186   LearningRate 0.0763   Epoch: 10   Global Step: 108050   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:18:15,141-Speed 5406.82 samples/sec   Loss 5.3511   LearningRate 0.0762   Epoch: 10   Global Step: 108060   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:18:22,579-Speed 5507.13 samples/sec   Loss 5.3526   LearningRate 0.0762   Epoch: 10   Global Step: 108070   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:18:29,983-Speed 5533.03 samples/sec   Loss 5.3607   LearningRate 0.0762   Epoch: 10   Global Step: 108080   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:18:37,549-Speed 5415.11 samples/sec   Loss 5.3209   LearningRate 0.0762   Epoch: 10   Global Step: 108090   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:18:45,000-Speed 5497.26 samples/sec   Loss 5.3781   LearningRate 0.0762   Epoch: 10   Global Step: 108100   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:18:52,526-Speed 5443.34 samples/sec   Loss 5.3486   LearningRate 0.0762   Epoch: 10   Global Step: 108110   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:19:00,034-Speed 5456.84 samples/sec   Loss 5.3494   LearningRate 0.0762   Epoch: 10   Global Step: 108120   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:19:07,596-Speed 5417.13 samples/sec   Loss 5.3043   LearningRate 0.0761   Epoch: 10   Global Step: 108130   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:19:15,114-Speed 5448.61 samples/sec   Loss 5.4219   LearningRate 0.0761   Epoch: 10   Global Step: 108140   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:19:22,658-Speed 5430.49 samples/sec   Loss 5.3793   LearningRate 0.0761   Epoch: 10   Global Step: 108150   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:19:30,154-Speed 5464.80 samples/sec   Loss 5.3417   LearningRate 0.0761   Epoch: 10   Global Step: 108160   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:19:37,941-Speed 5260.82 samples/sec   Loss 5.3272   LearningRate 0.0761   Epoch: 10   Global Step: 108170   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:19:45,421-Speed 5476.28 samples/sec   Loss 5.3395   LearningRate 0.0761   Epoch: 10   Global Step: 108180   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:19:52,899-Speed 5478.38 samples/sec   Loss 5.3283   LearningRate 0.0760   Epoch: 10   Global Step: 108190   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:20:00,359-Speed 5491.67 samples/sec   Loss 5.3691   LearningRate 0.0760   Epoch: 10   Global Step: 108200   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:20:07,799-Speed 5506.02 samples/sec   Loss 5.3747   LearningRate 0.0760   Epoch: 10   Global Step: 108210   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:20:15,267-Speed 5485.20 samples/sec   Loss 5.3385   LearningRate 0.0760   Epoch: 10   Global Step: 108220   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:20:22,782-Speed 5451.00 samples/sec   Loss 5.3021   LearningRate 0.0760   Epoch: 10   Global Step: 108230   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:20:30,245-Speed 5488.92 samples/sec   Loss 5.3634   LearningRate 0.0760   Epoch: 10   Global Step: 108240   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:20:37,681-Speed 5509.84 samples/sec   Loss 5.3582   LearningRate 0.0760   Epoch: 10   Global Step: 108250   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:20:45,276-Speed 5393.41 samples/sec   Loss 5.4142   LearningRate 0.0759   Epoch: 10   Global Step: 108260   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:20:52,722-Speed 5501.56 samples/sec   Loss 5.4024   LearningRate 0.0759   Epoch: 10   Global Step: 108270   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:21:00,347-Speed 5373.40 samples/sec   Loss 5.3392   LearningRate 0.0759   Epoch: 10   Global Step: 108280   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:21:07,804-Speed 5493.88 samples/sec   Loss 5.3399   LearningRate 0.0759   Epoch: 10   Global Step: 108290   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:21:15,248-Speed 5503.24 samples/sec   Loss 5.3671   LearningRate 0.0759   Epoch: 10   Global Step: 108300   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:21:22,863-Speed 5379.12 samples/sec   Loss 5.2822   LearningRate 0.0759   Epoch: 10   Global Step: 108310   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:21:30,554-Speed 5326.59 samples/sec   Loss 5.3376   LearningRate 0.0758   Epoch: 10   Global Step: 108320   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:21:38,085-Speed 5439.34 samples/sec   Loss 5.3327   LearningRate 0.0758   Epoch: 10   Global Step: 108330   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:21:45,603-Speed 5448.80 samples/sec   Loss 5.3252   LearningRate 0.0758   Epoch: 10   Global Step: 108340   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:21:53,185-Speed 5402.81 samples/sec   Loss 5.3701   LearningRate 0.0758   Epoch: 10   Global Step: 108350   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:22:00,576-Speed 5543.09 samples/sec   Loss 5.3435   LearningRate 0.0758   Epoch: 10   Global Step: 108360   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:22:08,008-Speed 5512.12 samples/sec   Loss 5.3543   LearningRate 0.0758   Epoch: 10   Global Step: 108370   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:22:15,544-Speed 5438.36 samples/sec   Loss 5.3168   LearningRate 0.0758   Epoch: 10   Global Step: 108380   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:22:22,953-Speed 5528.67 samples/sec   Loss 5.3810   LearningRate 0.0757   Epoch: 10   Global Step: 108390   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:22:30,405-Speed 5497.28 samples/sec   Loss 5.3531   LearningRate 0.0757   Epoch: 10   Global Step: 108400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:22:37,771-Speed 5561.08 samples/sec   Loss 5.3720   LearningRate 0.0757   Epoch: 10   Global Step: 108410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:22:45,322-Speed 5425.85 samples/sec   Loss 5.3090   LearningRate 0.0757   Epoch: 10   Global Step: 108420   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:22:52,871-Speed 5426.33 samples/sec   Loss 5.3384   LearningRate 0.0757   Epoch: 10   Global Step: 108430   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:23:00,311-Speed 5505.79 samples/sec   Loss 5.3481   LearningRate 0.0757   Epoch: 10   Global Step: 108440   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:23:07,908-Speed 5392.94 samples/sec   Loss 5.3932   LearningRate 0.0756   Epoch: 10   Global Step: 108450   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:23:15,400-Speed 5467.42 samples/sec   Loss 5.3329   LearningRate 0.0756   Epoch: 10   Global Step: 108460   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:23:22,923-Speed 5444.95 samples/sec   Loss 5.3572   LearningRate 0.0756   Epoch: 10   Global Step: 108470   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:23:30,553-Speed 5369.57 samples/sec   Loss 5.3029   LearningRate 0.0756   Epoch: 10   Global Step: 108480   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:23:38,050-Speed 5464.50 samples/sec   Loss 5.3230   LearningRate 0.0756   Epoch: 10   Global Step: 108490   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:23:45,533-Speed 5474.70 samples/sec   Loss 5.3153   LearningRate 0.0756   Epoch: 10   Global Step: 108500   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:23:53,154-Speed 5374.87 samples/sec   Loss 5.3352   LearningRate 0.0756   Epoch: 10   Global Step: 108510   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:24:00,701-Speed 5428.49 samples/sec   Loss 5.3503   LearningRate 0.0755   Epoch: 10   Global Step: 108520   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:24:08,146-Speed 5502.16 samples/sec   Loss 5.3913   LearningRate 0.0755   Epoch: 10   Global Step: 108530   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:24:15,657-Speed 5453.75 samples/sec   Loss 5.3357   LearningRate 0.0755   Epoch: 10   Global Step: 108540   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:24:23,267-Speed 5383.24 samples/sec   Loss 5.3694   LearningRate 0.0755   Epoch: 10   Global Step: 108550   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:24:30,682-Speed 5524.15 samples/sec   Loss 5.3657   LearningRate 0.0755   Epoch: 10   Global Step: 108560   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:24:38,186-Speed 5459.55 samples/sec   Loss 5.3485   LearningRate 0.0755   Epoch: 10   Global Step: 108570   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:24:45,856-Speed 5340.77 samples/sec   Loss 5.3582   LearningRate 0.0754   Epoch: 10   Global Step: 108580   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:24:53,383-Speed 5442.63 samples/sec   Loss 5.3111   LearningRate 0.0754   Epoch: 10   Global Step: 108590   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:25:00,921-Speed 5433.95 samples/sec   Loss 5.2937   LearningRate 0.0754   Epoch: 10   Global Step: 108600   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:25:08,363-Speed 5504.60 samples/sec   Loss 5.2796   LearningRate 0.0754   Epoch: 10   Global Step: 108610   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:25:16,387-Speed 5105.67 samples/sec   Loss 5.3194   LearningRate 0.0754   Epoch: 10   Global Step: 108620   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:25:23,838-Speed 5497.53 samples/sec   Loss 5.3384   LearningRate 0.0754   Epoch: 10   Global Step: 108630   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:25:31,366-Speed 5442.13 samples/sec   Loss 5.3215   LearningRate 0.0754   Epoch: 10   Global Step: 108640   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:25:38,843-Speed 5478.73 samples/sec   Loss 5.3441   LearningRate 0.0753   Epoch: 10   Global Step: 108650   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:25:46,302-Speed 5491.82 samples/sec   Loss 5.3153   LearningRate 0.0753   Epoch: 10   Global Step: 108660   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:25:53,855-Speed 5423.72 samples/sec   Loss 5.3712   LearningRate 0.0753   Epoch: 10   Global Step: 108670   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:26:01,355-Speed 5462.45 samples/sec   Loss 5.3406   LearningRate 0.0753   Epoch: 10   Global Step: 108680   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:26:08,841-Speed 5472.26 samples/sec   Loss 5.3425   LearningRate 0.0753   Epoch: 10   Global Step: 108690   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:26:16,352-Speed 5453.83 samples/sec   Loss 5.3844   LearningRate 0.0753   Epoch: 10   Global Step: 108700   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:26:23,890-Speed 5434.48 samples/sec   Loss 5.3207   LearningRate 0.0753   Epoch: 10   Global Step: 108710   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:26:31,389-Speed 5463.23 samples/sec   Loss 5.3110   LearningRate 0.0752   Epoch: 10   Global Step: 108720   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:26:38,879-Speed 5469.30 samples/sec   Loss 5.3431   LearningRate 0.0752   Epoch: 10   Global Step: 108730   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:26:46,376-Speed 5464.29 samples/sec   Loss 5.2860   LearningRate 0.0752   Epoch: 10   Global Step: 108740   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:26:53,996-Speed 5375.94 samples/sec   Loss 5.3118   LearningRate 0.0752   Epoch: 10   Global Step: 108750   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:27:01,463-Speed 5485.83 samples/sec   Loss 5.2689   LearningRate 0.0752   Epoch: 10   Global Step: 108760   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:27:08,934-Speed 5483.77 samples/sec   Loss 5.3435   LearningRate 0.0752   Epoch: 10   Global Step: 108770   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:27:16,484-Speed 5426.01 samples/sec   Loss 5.2759   LearningRate 0.0751   Epoch: 10   Global Step: 108780   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:27:24,020-Speed 5435.55 samples/sec   Loss 5.3510   LearningRate 0.0751   Epoch: 10   Global Step: 108790   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:27:31,502-Speed 5475.99 samples/sec   Loss 5.3368   LearningRate 0.0751   Epoch: 10   Global Step: 108800   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:27:39,071-Speed 5411.63 samples/sec   Loss 5.3153   LearningRate 0.0751   Epoch: 10   Global Step: 108810   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:27:46,497-Speed 5516.57 samples/sec   Loss 5.2879   LearningRate 0.0751   Epoch: 10   Global Step: 108820   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:27:53,961-Speed 5488.81 samples/sec   Loss 5.3432   LearningRate 0.0751   Epoch: 10   Global Step: 108830   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:28:01,483-Speed 5446.43 samples/sec   Loss 5.3130   LearningRate 0.0751   Epoch: 10   Global Step: 108840   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:28:08,863-Speed 5550.66 samples/sec   Loss 5.3151   LearningRate 0.0750   Epoch: 10   Global Step: 108850   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:28:16,275-Speed 5526.30 samples/sec   Loss 5.2989   LearningRate 0.0750   Epoch: 10   Global Step: 108860   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:28:23,781-Speed 5458.20 samples/sec   Loss 5.3176   LearningRate 0.0750   Epoch: 10   Global Step: 108870   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:28:31,273-Speed 5468.06 samples/sec   Loss 5.3410   LearningRate 0.0750   Epoch: 10   Global Step: 108880   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:28:38,666-Speed 5541.17 samples/sec   Loss 5.3255   LearningRate 0.0750   Epoch: 10   Global Step: 108890   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:28:46,105-Speed 5506.87 samples/sec   Loss 5.3371   LearningRate 0.0750   Epoch: 10   Global Step: 108900   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:28:53,544-Speed 5506.62 samples/sec   Loss 5.3359   LearningRate 0.0749   Epoch: 10   Global Step: 108910   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:29:01,038-Speed 5466.39 samples/sec   Loss 5.2829   LearningRate 0.0749   Epoch: 10   Global Step: 108920   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:29:08,569-Speed 5439.90 samples/sec   Loss 5.2997   LearningRate 0.0749   Epoch: 10   Global Step: 108930   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:29:16,119-Speed 5425.68 samples/sec   Loss 5.4006   LearningRate 0.0749   Epoch: 10   Global Step: 108940   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:29:23,632-Speed 5452.58 samples/sec   Loss 5.3253   LearningRate 0.0749   Epoch: 10   Global Step: 108950   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:29:31,084-Speed 5497.96 samples/sec   Loss 5.3100   LearningRate 0.0749   Epoch: 10   Global Step: 108960   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:29:38,568-Speed 5473.68 samples/sec   Loss 5.2993   LearningRate 0.0749   Epoch: 10   Global Step: 108970   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:29:46,066-Speed 5463.38 samples/sec   Loss 5.2936   LearningRate 0.0748   Epoch: 10   Global Step: 108980   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:29:53,520-Speed 5495.63 samples/sec   Loss 5.2913   LearningRate 0.0748   Epoch: 10   Global Step: 108990   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:30:01,084-Speed 5416.26 samples/sec   Loss 5.3078   LearningRate 0.0748   Epoch: 10   Global Step: 109000   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:30:08,539-Speed 5494.90 samples/sec   Loss 5.4003   LearningRate 0.0748   Epoch: 10   Global Step: 109010   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 19:30:16,044-Speed 5458.50 samples/sec   Loss 5.3459   LearningRate 0.0748   Epoch: 10   Global Step: 109020   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:30:23,472-Speed 5515.35 samples/sec   Loss 5.3964   LearningRate 0.0748   Epoch: 10   Global Step: 109030   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:30:30,930-Speed 5492.49 samples/sec   Loss 5.3246   LearningRate 0.0747   Epoch: 10   Global Step: 109040   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:30:38,377-Speed 5501.11 samples/sec   Loss 5.3513   LearningRate 0.0747   Epoch: 10   Global Step: 109050   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:30:45,806-Speed 5514.11 samples/sec   Loss 5.2851   LearningRate 0.0747   Epoch: 10   Global Step: 109060   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:30:53,222-Speed 5523.93 samples/sec   Loss 5.2606   LearningRate 0.0747   Epoch: 10   Global Step: 109070   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:31:00,699-Speed 5479.45 samples/sec   Loss 5.2494   LearningRate 0.0747   Epoch: 10   Global Step: 109080   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:31:08,155-Speed 5494.43 samples/sec   Loss 5.2790   LearningRate 0.0747   Epoch: 10   Global Step: 109090   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:31:15,594-Speed 5506.62 samples/sec   Loss 5.2405   LearningRate 0.0747   Epoch: 10   Global Step: 109100   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:31:23,020-Speed 5516.18 samples/sec   Loss 5.2399   LearningRate 0.0746   Epoch: 10   Global Step: 109110   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:31:30,442-Speed 5519.46 samples/sec   Loss 5.2422   LearningRate 0.0746   Epoch: 10   Global Step: 109120   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:31:37,871-Speed 5514.37 samples/sec   Loss 5.2315   LearningRate 0.0746   Epoch: 10   Global Step: 109130   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:31:45,350-Speed 5477.47 samples/sec   Loss 5.2643   LearningRate 0.0746   Epoch: 10   Global Step: 109140   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:31:52,776-Speed 5516.30 samples/sec   Loss 5.2619   LearningRate 0.0746   Epoch: 10   Global Step: 109150   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:32:00,392-Speed 5379.25 samples/sec   Loss 5.2730   LearningRate 0.0746   Epoch: 10   Global Step: 109160   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:32:07,834-Speed 5505.18 samples/sec   Loss 5.3102   LearningRate 0.0746   Epoch: 10   Global Step: 109170   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 19:32:15,241-Speed 5530.29 samples/sec   Loss 5.3070   LearningRate 0.0745   Epoch: 10   Global Step: 109180   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 19:32:22,720-Speed 5477.70 samples/sec   Loss 5.3066   LearningRate 0.0745   Epoch: 10   Global Step: 109190   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:32:30,141-Speed 5520.17 samples/sec   Loss 5.2949   LearningRate 0.0745   Epoch: 10   Global Step: 109200   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:32:37,590-Speed 5499.10 samples/sec   Loss 5.3239   LearningRate 0.0745   Epoch: 10   Global Step: 109210   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:32:45,136-Speed 5429.54 samples/sec   Loss 5.2739   LearningRate 0.0745   Epoch: 10   Global Step: 109220   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:32:52,621-Speed 5472.56 samples/sec   Loss 5.3262   LearningRate 0.0745   Epoch: 10   Global Step: 109230   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:33:00,104-Speed 5474.14 samples/sec   Loss 5.3630   LearningRate 0.0744   Epoch: 10   Global Step: 109240   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:33:07,543-Speed 5507.51 samples/sec   Loss 5.3136   LearningRate 0.0744   Epoch: 10   Global Step: 109250   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:33:14,960-Speed 5523.55 samples/sec   Loss 5.3017   LearningRate 0.0744   Epoch: 10   Global Step: 109260   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:33:22,384-Speed 5517.42 samples/sec   Loss 5.2994   LearningRate 0.0744   Epoch: 10   Global Step: 109270   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:33:29,809-Speed 5517.32 samples/sec   Loss 5.3266   LearningRate 0.0744   Epoch: 10   Global Step: 109280   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:33:37,238-Speed 5514.24 samples/sec   Loss 5.3516   LearningRate 0.0744   Epoch: 10   Global Step: 109290   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:33:44,655-Speed 5523.84 samples/sec   Loss 5.2775   LearningRate 0.0744   Epoch: 10   Global Step: 109300   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:33:52,093-Speed 5507.38 samples/sec   Loss 5.2920   LearningRate 0.0743   Epoch: 10   Global Step: 109310   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:33:59,541-Speed 5500.33 samples/sec   Loss 5.2882   LearningRate 0.0743   Epoch: 10   Global Step: 109320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:34:06,941-Speed 5535.99 samples/sec   Loss 5.3032   LearningRate 0.0743   Epoch: 10   Global Step: 109330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:34:14,396-Speed 5494.80 samples/sec   Loss 5.2713   LearningRate 0.0743   Epoch: 10   Global Step: 109340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:34:21,860-Speed 5488.38 samples/sec   Loss 5.3524   LearningRate 0.0743   Epoch: 10   Global Step: 109350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:34:29,310-Speed 5498.63 samples/sec   Loss 5.3518   LearningRate 0.0743   Epoch: 10   Global Step: 109360   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:34:36,725-Speed 5525.00 samples/sec   Loss 5.2844   LearningRate 0.0742   Epoch: 10   Global Step: 109370   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:34:44,156-Speed 5512.69 samples/sec   Loss 5.2512   LearningRate 0.0742   Epoch: 10   Global Step: 109380   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:34:51,604-Speed 5499.90 samples/sec   Loss 5.3212   LearningRate 0.0742   Epoch: 10   Global Step: 109390   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:34:59,044-Speed 5506.27 samples/sec   Loss 5.3354   LearningRate 0.0742   Epoch: 10   Global Step: 109400   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:35:06,511-Speed 5487.38 samples/sec   Loss 5.2737   LearningRate 0.0742   Epoch: 10   Global Step: 109410   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:35:14,043-Speed 5438.81 samples/sec   Loss 5.2822   LearningRate 0.0742   Epoch: 10   Global Step: 109420   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:35:21,492-Speed 5498.93 samples/sec   Loss 5.2704   LearningRate 0.0742   Epoch: 10   Global Step: 109430   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:35:28,908-Speed 5523.99 samples/sec   Loss 5.2524   LearningRate 0.0741   Epoch: 10   Global Step: 109440   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:35:36,326-Speed 5523.05 samples/sec   Loss 5.3026   LearningRate 0.0741   Epoch: 10   Global Step: 109450   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:35:43,753-Speed 5515.31 samples/sec   Loss 5.3056   LearningRate 0.0741   Epoch: 10   Global Step: 109460   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:35:51,159-Speed 5531.18 samples/sec   Loss 5.2992   LearningRate 0.0741   Epoch: 10   Global Step: 109470   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:35:58,673-Speed 5452.17 samples/sec   Loss 5.2731   LearningRate 0.0741   Epoch: 10   Global Step: 109480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:36:06,172-Speed 5462.45 samples/sec   Loss 5.2335   LearningRate 0.0741   Epoch: 10   Global Step: 109490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:36:13,624-Speed 5497.44 samples/sec   Loss 5.3027   LearningRate 0.0741   Epoch: 10   Global Step: 109500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:36:21,058-Speed 5510.21 samples/sec   Loss 5.3025   LearningRate 0.0740   Epoch: 10   Global Step: 109510   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:36:28,568-Speed 5455.16 samples/sec   Loss 5.3116   LearningRate 0.0740   Epoch: 10   Global Step: 109520   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:36:36,018-Speed 5498.63 samples/sec   Loss 5.2977   LearningRate 0.0740   Epoch: 10   Global Step: 109530   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:36:43,462-Speed 5503.49 samples/sec   Loss 5.2613   LearningRate 0.0740   Epoch: 10   Global Step: 109540   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:36:50,864-Speed 5534.59 samples/sec   Loss 5.3089   LearningRate 0.0740   Epoch: 10   Global Step: 109550   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:36:58,287-Speed 5518.69 samples/sec   Loss 5.2750   LearningRate 0.0740   Epoch: 10   Global Step: 109560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:37:05,825-Speed 5433.96 samples/sec   Loss 5.2778   LearningRate 0.0739   Epoch: 10   Global Step: 109570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:37:13,275-Speed 5498.93 samples/sec   Loss 5.2840   LearningRate 0.0739   Epoch: 10   Global Step: 109580   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:37:20,723-Speed 5500.09 samples/sec   Loss 5.3407   LearningRate 0.0739   Epoch: 10   Global Step: 109590   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:37:28,152-Speed 5514.38 samples/sec   Loss 5.2284   LearningRate 0.0739   Epoch: 10   Global Step: 109600   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:37:35,670-Speed 5448.63 samples/sec   Loss 5.2552   LearningRate 0.0739   Epoch: 10   Global Step: 109610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:37:43,159-Speed 5471.02 samples/sec   Loss 5.2699   LearningRate 0.0739   Epoch: 10   Global Step: 109620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:37:50,685-Speed 5442.96 samples/sec   Loss 5.3026   LearningRate 0.0739   Epoch: 10   Global Step: 109630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:37:58,153-Speed 5485.06 samples/sec   Loss 5.3081   LearningRate 0.0738   Epoch: 10   Global Step: 109640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:38:05,590-Speed 5508.77 samples/sec   Loss 5.2435   LearningRate 0.0738   Epoch: 10   Global Step: 109650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:38:13,029-Speed 5507.03 samples/sec   Loss 5.2528   LearningRate 0.0738   Epoch: 10   Global Step: 109660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:38:20,444-Speed 5524.44 samples/sec   Loss 5.2653   LearningRate 0.0738   Epoch: 10   Global Step: 109670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:38:27,890-Speed 5501.85 samples/sec   Loss 5.2234   LearningRate 0.0738   Epoch: 10   Global Step: 109680   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:38:35,331-Speed 5504.57 samples/sec   Loss 5.2898   LearningRate 0.0738   Epoch: 10   Global Step: 109690   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:38:42,788-Speed 5494.15 samples/sec   Loss 5.2733   LearningRate 0.0737   Epoch: 10   Global Step: 109700   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:38:50,284-Speed 5464.61 samples/sec   Loss 5.3489   LearningRate 0.0737   Epoch: 10   Global Step: 109710   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:38:57,758-Speed 5481.30 samples/sec   Loss 5.2969   LearningRate 0.0737   Epoch: 10   Global Step: 109720   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:39:05,289-Speed 5439.14 samples/sec   Loss 5.2748   LearningRate 0.0737   Epoch: 10   Global Step: 109730   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:39:12,763-Speed 5481.22 samples/sec   Loss 5.2576   LearningRate 0.0737   Epoch: 10   Global Step: 109740   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:39:20,346-Speed 5402.54 samples/sec   Loss 5.3213   LearningRate 0.0737   Epoch: 10   Global Step: 109750   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:39:27,980-Speed 5366.45 samples/sec   Loss 5.2253   LearningRate 0.0737   Epoch: 10   Global Step: 109760   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:39:35,542-Speed 5416.45 samples/sec   Loss 5.2822   LearningRate 0.0736   Epoch: 10   Global Step: 109770   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:39:43,094-Speed 5424.69 samples/sec   Loss 5.2897   LearningRate 0.0736   Epoch: 10   Global Step: 109780   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:39:50,561-Speed 5486.46 samples/sec   Loss 5.2929   LearningRate 0.0736   Epoch: 10   Global Step: 109790   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:39:57,986-Speed 5517.26 samples/sec   Loss 5.2620   LearningRate 0.0736   Epoch: 10   Global Step: 109800   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:40:05,447-Speed 5490.72 samples/sec   Loss 5.2573   LearningRate 0.0736   Epoch: 10   Global Step: 109810   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:40:12,918-Speed 5482.76 samples/sec   Loss 5.2658   LearningRate 0.0736   Epoch: 10   Global Step: 109820   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:40:20,378-Speed 5491.40 samples/sec   Loss 5.3077   LearningRate 0.0736   Epoch: 10   Global Step: 109830   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:40:27,852-Speed 5481.34 samples/sec   Loss 5.3099   LearningRate 0.0735   Epoch: 10   Global Step: 109840   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:40:35,269-Speed 5523.34 samples/sec   Loss 5.2751   LearningRate 0.0735   Epoch: 10   Global Step: 109850   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:40:42,709-Speed 5505.82 samples/sec   Loss 5.2295   LearningRate 0.0735   Epoch: 10   Global Step: 109860   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:40:50,128-Speed 5521.68 samples/sec   Loss 5.2378   LearningRate 0.0735   Epoch: 10   Global Step: 109870   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:40:57,566-Speed 5507.76 samples/sec   Loss 5.2736   LearningRate 0.0735   Epoch: 10   Global Step: 109880   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:41:05,037-Speed 5482.70 samples/sec   Loss 5.2576   LearningRate 0.0735   Epoch: 10   Global Step: 109890   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:41:12,548-Speed 5453.71 samples/sec   Loss 5.2755   LearningRate 0.0734   Epoch: 10   Global Step: 109900   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:41:20,025-Speed 5479.09 samples/sec   Loss 5.3048   LearningRate 0.0734   Epoch: 10   Global Step: 109910   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:41:27,545-Speed 5447.74 samples/sec   Loss 5.2394   LearningRate 0.0734   Epoch: 10   Global Step: 109920   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:41:35,036-Speed 5468.52 samples/sec   Loss 5.3107   LearningRate 0.0734   Epoch: 10   Global Step: 109930   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:41:42,491-Speed 5495.21 samples/sec   Loss 5.2673   LearningRate 0.0734   Epoch: 10   Global Step: 109940   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:41:50,005-Speed 5451.97 samples/sec   Loss 5.2102   LearningRate 0.0734   Epoch: 10   Global Step: 109950   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:41:57,519-Speed 5451.41 samples/sec   Loss 5.2540   LearningRate 0.0734   Epoch: 10   Global Step: 109960   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:42:05,227-Speed 5314.56 samples/sec   Loss 5.2526   LearningRate 0.0733   Epoch: 10   Global Step: 109970   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:42:12,703-Speed 5479.22 samples/sec   Loss 5.3202   LearningRate 0.0733   Epoch: 10   Global Step: 109980   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:42:20,138-Speed 5509.89 samples/sec   Loss 5.2706   LearningRate 0.0733   Epoch: 10   Global Step: 109990   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:42:27,554-Speed 5523.99 samples/sec   Loss 5.2781   LearningRate 0.0733   Epoch: 10   Global Step: 110000   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:43:11,251-[lfw][110000]XNorm: 23.009356
Training: 2022-01-08 19:43:11,252-[lfw][110000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-01-08 19:43:11,253-[lfw][110000]Accuracy-Highest: 0.99817
Training: 2022-01-08 19:44:03,337-[cfp_fp][110000]XNorm: 21.193388
Training: 2022-01-08 19:44:03,337-[cfp_fp][110000]Accuracy-Flip: 0.99014+-0.00509
Training: 2022-01-08 19:44:03,338-[cfp_fp][110000]Accuracy-Highest: 0.99043
Training: 2022-01-08 19:44:47,338-[agedb_30][110000]XNorm: 22.795291
Training: 2022-01-08 19:44:47,339-[agedb_30][110000]Accuracy-Flip: 0.97900+-0.00834
Training: 2022-01-08 19:44:47,339-[agedb_30][110000]Accuracy-Highest: 0.97917
Training: 2022-01-08 19:44:54,842-Speed 278.10 samples/sec   Loss 5.2561   LearningRate 0.0733   Epoch: 10   Global Step: 110010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:45:02,375-Speed 5438.81 samples/sec   Loss 5.2666   LearningRate 0.0733   Epoch: 10   Global Step: 110020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:45:09,806-Speed 5512.87 samples/sec   Loss 5.2735   LearningRate 0.0733   Epoch: 10   Global Step: 110030   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:45:17,284-Speed 5479.20 samples/sec   Loss 5.3091   LearningRate 0.0732   Epoch: 10   Global Step: 110040   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:45:24,860-Speed 5408.96 samples/sec   Loss 5.2312   LearningRate 0.0732   Epoch: 10   Global Step: 110050   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:45:32,373-Speed 5452.34 samples/sec   Loss 5.2104   LearningRate 0.0732   Epoch: 10   Global Step: 110060   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:45:39,973-Speed 5390.23 samples/sec   Loss 5.2567   LearningRate 0.0732   Epoch: 10   Global Step: 110070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:45:47,473-Speed 5462.53 samples/sec   Loss 5.2220   LearningRate 0.0732   Epoch: 10   Global Step: 110080   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:45:54,953-Speed 5476.51 samples/sec   Loss 5.2116   LearningRate 0.0732   Epoch: 10   Global Step: 110090   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:46:02,412-Speed 5492.08 samples/sec   Loss 5.2558   LearningRate 0.0731   Epoch: 10   Global Step: 110100   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:46:10,018-Speed 5386.03 samples/sec   Loss 5.2362   LearningRate 0.0731   Epoch: 10   Global Step: 110110   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:46:17,540-Speed 5445.86 samples/sec   Loss 5.2413   LearningRate 0.0731   Epoch: 10   Global Step: 110120   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:46:24,951-Speed 5527.96 samples/sec   Loss 5.2053   LearningRate 0.0731   Epoch: 10   Global Step: 110130   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:46:32,442-Speed 5468.37 samples/sec   Loss 5.2300   LearningRate 0.0731   Epoch: 10   Global Step: 110140   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:46:39,845-Speed 5533.53 samples/sec   Loss 5.2865   LearningRate 0.0731   Epoch: 10   Global Step: 110150   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:46:47,199-Speed 5570.88 samples/sec   Loss 5.2099   LearningRate 0.0731   Epoch: 10   Global Step: 110160   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:46:54,621-Speed 5519.46 samples/sec   Loss 5.1427   LearningRate 0.0730   Epoch: 10   Global Step: 110170   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:47:02,041-Speed 5520.95 samples/sec   Loss 5.1654   LearningRate 0.0730   Epoch: 10   Global Step: 110180   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:47:09,467-Speed 5516.14 samples/sec   Loss 5.2126   LearningRate 0.0730   Epoch: 10   Global Step: 110190   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:47:16,953-Speed 5472.27 samples/sec   Loss 5.2354   LearningRate 0.0730   Epoch: 10   Global Step: 110200   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:47:24,415-Speed 5490.27 samples/sec   Loss 5.2428   LearningRate 0.0730   Epoch: 10   Global Step: 110210   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:47:31,812-Speed 5538.36 samples/sec   Loss 5.2482   LearningRate 0.0730   Epoch: 10   Global Step: 110220   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:47:39,351-Speed 5433.71 samples/sec   Loss 5.2817   LearningRate 0.0730   Epoch: 10   Global Step: 110230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:47:46,798-Speed 5500.99 samples/sec   Loss 5.2595   LearningRate 0.0729   Epoch: 10   Global Step: 110240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:47:54,299-Speed 5461.31 samples/sec   Loss 5.3207   LearningRate 0.0729   Epoch: 10   Global Step: 110250   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:48:01,785-Speed 5472.23 samples/sec   Loss 5.2512   LearningRate 0.0729   Epoch: 10   Global Step: 110260   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:48:09,296-Speed 5453.79 samples/sec   Loss 5.2335   LearningRate 0.0729   Epoch: 10   Global Step: 110270   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:48:16,797-Speed 5461.92 samples/sec   Loss 5.1804   LearningRate 0.0729   Epoch: 10   Global Step: 110280   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:48:24,248-Speed 5497.19 samples/sec   Loss 5.2350   LearningRate 0.0729   Epoch: 10   Global Step: 110290   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:48:31,702-Speed 5495.62 samples/sec   Loss 5.2535   LearningRate 0.0728   Epoch: 10   Global Step: 110300   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:48:39,149-Speed 5501.91 samples/sec   Loss 5.2699   LearningRate 0.0728   Epoch: 10   Global Step: 110310   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:48:46,565-Speed 5523.11 samples/sec   Loss 5.2527   LearningRate 0.0728   Epoch: 10   Global Step: 110320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:48:53,983-Speed 5522.90 samples/sec   Loss 5.1957   LearningRate 0.0728   Epoch: 10   Global Step: 110330   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:49:01,495-Speed 5453.49 samples/sec   Loss 5.2270   LearningRate 0.0728   Epoch: 10   Global Step: 110340   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:49:09,055-Speed 5418.40 samples/sec   Loss 5.2607   LearningRate 0.0728   Epoch: 10   Global Step: 110350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:49:16,480-Speed 5517.24 samples/sec   Loss 5.1920   LearningRate 0.0728   Epoch: 10   Global Step: 110360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:49:23,916-Speed 5508.87 samples/sec   Loss 5.2544   LearningRate 0.0727   Epoch: 10   Global Step: 110370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:49:31,310-Speed 5540.48 samples/sec   Loss 5.2752   LearningRate 0.0727   Epoch: 10   Global Step: 110380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:49:38,709-Speed 5537.07 samples/sec   Loss 5.1973   LearningRate 0.0727   Epoch: 10   Global Step: 110390   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:49:46,136-Speed 5515.46 samples/sec   Loss 5.2462   LearningRate 0.0727   Epoch: 10   Global Step: 110400   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:49:53,590-Speed 5496.09 samples/sec   Loss 5.1898   LearningRate 0.0727   Epoch: 10   Global Step: 110410   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:50:00,972-Speed 5549.42 samples/sec   Loss 5.1940   LearningRate 0.0727   Epoch: 10   Global Step: 110420   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:50:08,410-Speed 5507.02 samples/sec   Loss 5.2250   LearningRate 0.0727   Epoch: 10   Global Step: 110430   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:50:15,769-Speed 5567.04 samples/sec   Loss 5.2450   LearningRate 0.0726   Epoch: 10   Global Step: 110440   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:50:23,142-Speed 5556.09 samples/sec   Loss 5.2336   LearningRate 0.0726   Epoch: 10   Global Step: 110450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:50:30,543-Speed 5534.95 samples/sec   Loss 5.2527   LearningRate 0.0726   Epoch: 10   Global Step: 110460   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:50:37,953-Speed 5528.62 samples/sec   Loss 5.2504   LearningRate 0.0726   Epoch: 10   Global Step: 110470   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:50:45,364-Speed 5528.18 samples/sec   Loss 5.2215   LearningRate 0.0726   Epoch: 10   Global Step: 110480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:50:52,768-Speed 5532.37 samples/sec   Loss 5.2232   LearningRate 0.0726   Epoch: 10   Global Step: 110490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:51:00,179-Speed 5528.22 samples/sec   Loss 5.2065   LearningRate 0.0725   Epoch: 10   Global Step: 110500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:51:07,561-Speed 5549.17 samples/sec   Loss 5.1977   LearningRate 0.0725   Epoch: 10   Global Step: 110510   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:51:14,976-Speed 5524.42 samples/sec   Loss 5.2040   LearningRate 0.0725   Epoch: 10   Global Step: 110520   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:51:22,370-Speed 5540.38 samples/sec   Loss 5.2283   LearningRate 0.0725   Epoch: 10   Global Step: 110530   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:51:29,814-Speed 5503.82 samples/sec   Loss 5.2575   LearningRate 0.0725   Epoch: 10   Global Step: 110540   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:51:37,213-Speed 5535.89 samples/sec   Loss 5.1858   LearningRate 0.0725   Epoch: 10   Global Step: 110550   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:51:44,713-Speed 5462.84 samples/sec   Loss 5.2338   LearningRate 0.0725   Epoch: 10   Global Step: 110560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:51:52,098-Speed 5547.03 samples/sec   Loss 5.1874   LearningRate 0.0724   Epoch: 10   Global Step: 110570   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:51:59,554-Speed 5494.21 samples/sec   Loss 5.2327   LearningRate 0.0724   Epoch: 10   Global Step: 110580   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:52:06,959-Speed 5531.97 samples/sec   Loss 5.2433   LearningRate 0.0724   Epoch: 10   Global Step: 110590   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:52:14,419-Speed 5491.93 samples/sec   Loss 5.2853   LearningRate 0.0724   Epoch: 10   Global Step: 110600   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:52:21,891-Speed 5481.91 samples/sec   Loss 5.2396   LearningRate 0.0724   Epoch: 10   Global Step: 110610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:52:29,344-Speed 5496.70 samples/sec   Loss 5.2512   LearningRate 0.0724   Epoch: 10   Global Step: 110620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:52:36,761-Speed 5523.37 samples/sec   Loss 5.2485   LearningRate 0.0724   Epoch: 10   Global Step: 110630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:52:44,279-Speed 5448.74 samples/sec   Loss 5.1959   LearningRate 0.0723   Epoch: 10   Global Step: 110640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:52:51,679-Speed 5536.43 samples/sec   Loss 5.2183   LearningRate 0.0723   Epoch: 10   Global Step: 110650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:52:59,023-Speed 5578.07 samples/sec   Loss 5.2537   LearningRate 0.0723   Epoch: 10   Global Step: 110660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:53:06,416-Speed 5540.63 samples/sec   Loss 5.2583   LearningRate 0.0723   Epoch: 10   Global Step: 110670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:53:13,797-Speed 5550.78 samples/sec   Loss 5.2380   LearningRate 0.0723   Epoch: 10   Global Step: 110680   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:53:21,170-Speed 5556.31 samples/sec   Loss 5.2384   LearningRate 0.0723   Epoch: 10   Global Step: 110690   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:53:28,598-Speed 5514.25 samples/sec   Loss 5.2399   LearningRate 0.0722   Epoch: 10   Global Step: 110700   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:53:36,070-Speed 5483.17 samples/sec   Loss 5.1821   LearningRate 0.0722   Epoch: 10   Global Step: 110710   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:53:43,554-Speed 5473.17 samples/sec   Loss 5.2673   LearningRate 0.0722   Epoch: 10   Global Step: 110720   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:53:51,017-Speed 5489.50 samples/sec   Loss 5.1963   LearningRate 0.0722   Epoch: 10   Global Step: 110730   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:53:58,464-Speed 5500.57 samples/sec   Loss 5.1766   LearningRate 0.0722   Epoch: 10   Global Step: 110740   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:54:05,914-Speed 5498.89 samples/sec   Loss 5.2076   LearningRate 0.0722   Epoch: 10   Global Step: 110750   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:54:13,325-Speed 5527.46 samples/sec   Loss 5.2086   LearningRate 0.0722   Epoch: 10   Global Step: 110760   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:54:20,681-Speed 5570.63 samples/sec   Loss 5.1826   LearningRate 0.0721   Epoch: 10   Global Step: 110770   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:54:28,061-Speed 5550.90 samples/sec   Loss 5.1924   LearningRate 0.0721   Epoch: 10   Global Step: 110780   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:54:35,437-Speed 5553.01 samples/sec   Loss 5.1603   LearningRate 0.0721   Epoch: 10   Global Step: 110790   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:54:42,888-Speed 5498.37 samples/sec   Loss 5.2258   LearningRate 0.0721   Epoch: 10   Global Step: 110800   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:54:50,323-Speed 5509.86 samples/sec   Loss 5.2209   LearningRate 0.0721   Epoch: 10   Global Step: 110810   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:54:57,719-Speed 5539.37 samples/sec   Loss 5.1945   LearningRate 0.0721   Epoch: 10   Global Step: 110820   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:55:05,145-Speed 5515.90 samples/sec   Loss 5.2141   LearningRate 0.0721   Epoch: 10   Global Step: 110830   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:55:12,539-Speed 5540.50 samples/sec   Loss 5.2529   LearningRate 0.0720   Epoch: 10   Global Step: 110840   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:55:20,059-Speed 5447.50 samples/sec   Loss 5.2462   LearningRate 0.0720   Epoch: 10   Global Step: 110850   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:55:27,485-Speed 5516.98 samples/sec   Loss 5.1565   LearningRate 0.0720   Epoch: 10   Global Step: 110860   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:55:34,910-Speed 5516.82 samples/sec   Loss 5.2044   LearningRate 0.0720   Epoch: 10   Global Step: 110870   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:55:42,349-Speed 5507.30 samples/sec   Loss 5.2218   LearningRate 0.0720   Epoch: 10   Global Step: 110880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:55:49,805-Speed 5494.27 samples/sec   Loss 5.2147   LearningRate 0.0720   Epoch: 10   Global Step: 110890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:55:57,193-Speed 5544.94 samples/sec   Loss 5.1860   LearningRate 0.0719   Epoch: 10   Global Step: 110900   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:56:04,658-Speed 5487.56 samples/sec   Loss 5.1893   LearningRate 0.0719   Epoch: 10   Global Step: 110910   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:56:12,060-Speed 5534.84 samples/sec   Loss 5.2053   LearningRate 0.0719   Epoch: 10   Global Step: 110920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:56:19,513-Speed 5496.61 samples/sec   Loss 5.2144   LearningRate 0.0719   Epoch: 10   Global Step: 110930   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:56:26,918-Speed 5532.32 samples/sec   Loss 5.1881   LearningRate 0.0719   Epoch: 10   Global Step: 110940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:56:34,464-Speed 5428.63 samples/sec   Loss 5.2057   LearningRate 0.0719   Epoch: 10   Global Step: 110950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:56:42,028-Speed 5415.97 samples/sec   Loss 5.2069   LearningRate 0.0719   Epoch: 10   Global Step: 110960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:56:49,500-Speed 5482.51 samples/sec   Loss 5.2017   LearningRate 0.0718   Epoch: 10   Global Step: 110970   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 19:56:56,997-Speed 5464.42 samples/sec   Loss 5.2308   LearningRate 0.0718   Epoch: 10   Global Step: 110980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:57:04,424-Speed 5515.83 samples/sec   Loss 5.1644   LearningRate 0.0718   Epoch: 10   Global Step: 110990   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 19:57:14,008-Speed 5528.88 samples/sec   Loss 5.2301   LearningRate 0.0718   Epoch: 10   Global Step: 111000   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:57:21,656-Speed 5356.82 samples/sec   Loss 5.1972   LearningRate 0.0718   Epoch: 10   Global Step: 111010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:57:29,119-Speed 5489.09 samples/sec   Loss 5.2593   LearningRate 0.0718   Epoch: 10   Global Step: 111020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:57:36,529-Speed 5528.86 samples/sec   Loss 5.1798   LearningRate 0.0718   Epoch: 10   Global Step: 111030   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:57:44,093-Speed 5415.29 samples/sec   Loss 5.2368   LearningRate 0.0717   Epoch: 10   Global Step: 111040   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:57:51,572-Speed 5477.40 samples/sec   Loss 5.2158   LearningRate 0.0717   Epoch: 10   Global Step: 111050   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:57:59,106-Speed 5437.43 samples/sec   Loss 5.2246   LearningRate 0.0717   Epoch: 10   Global Step: 111060   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:58:06,550-Speed 5503.32 samples/sec   Loss 5.2304   LearningRate 0.0717   Epoch: 10   Global Step: 111070   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:58:14,037-Speed 5471.48 samples/sec   Loss 5.1696   LearningRate 0.0717   Epoch: 10   Global Step: 111080   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:58:21,538-Speed 5460.92 samples/sec   Loss 5.2506   LearningRate 0.0717   Epoch: 10   Global Step: 111090   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:58:28,973-Speed 5509.80 samples/sec   Loss 5.2223   LearningRate 0.0716   Epoch: 10   Global Step: 111100   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:58:36,481-Speed 5456.29 samples/sec   Loss 5.1944   LearningRate 0.0716   Epoch: 10   Global Step: 111110   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:58:43,877-Speed 5538.82 samples/sec   Loss 5.1521   LearningRate 0.0716   Epoch: 10   Global Step: 111120   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:58:51,349-Speed 5483.02 samples/sec   Loss 5.1905   LearningRate 0.0716   Epoch: 10   Global Step: 111130   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:58:58,818-Speed 5484.48 samples/sec   Loss 5.0965   LearningRate 0.0716   Epoch: 10   Global Step: 111140   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:59:06,246-Speed 5514.86 samples/sec   Loss 5.1679   LearningRate 0.0716   Epoch: 10   Global Step: 111150   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:59:13,650-Speed 5532.65 samples/sec   Loss 5.1743   LearningRate 0.0716   Epoch: 10   Global Step: 111160   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:59:21,079-Speed 5514.34 samples/sec   Loss 5.2182   LearningRate 0.0715   Epoch: 10   Global Step: 111170   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:59:28,496-Speed 5523.16 samples/sec   Loss 5.1748   LearningRate 0.0715   Epoch: 10   Global Step: 111180   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 19:59:35,921-Speed 5517.45 samples/sec   Loss 5.1095   LearningRate 0.0715   Epoch: 10   Global Step: 111190   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:59:43,449-Speed 5441.62 samples/sec   Loss 5.1928   LearningRate 0.0715   Epoch: 10   Global Step: 111200   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:59:50,948-Speed 5462.53 samples/sec   Loss 5.1959   LearningRate 0.0715   Epoch: 10   Global Step: 111210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 19:59:58,401-Speed 5496.51 samples/sec   Loss 5.1899   LearningRate 0.0715   Epoch: 10   Global Step: 111220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:00:05,918-Speed 5449.98 samples/sec   Loss 5.1339   LearningRate 0.0715   Epoch: 10   Global Step: 111230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:00:13,386-Speed 5485.29 samples/sec   Loss 5.1522   LearningRate 0.0714   Epoch: 10   Global Step: 111240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:00:21,207-Speed 5238.47 samples/sec   Loss 5.1555   LearningRate 0.0714   Epoch: 10   Global Step: 111250   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:00:28,743-Speed 5436.19 samples/sec   Loss 5.1599   LearningRate 0.0714   Epoch: 10   Global Step: 111260   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:00:36,279-Speed 5436.12 samples/sec   Loss 5.1396   LearningRate 0.0714   Epoch: 10   Global Step: 111270   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:00:43,741-Speed 5489.83 samples/sec   Loss 5.1741   LearningRate 0.0714   Epoch: 10   Global Step: 111280   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:00:51,253-Speed 5453.17 samples/sec   Loss 5.1669   LearningRate 0.0714   Epoch: 10   Global Step: 111290   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:00:58,667-Speed 5525.84 samples/sec   Loss 5.1514   LearningRate 0.0714   Epoch: 10   Global Step: 111300   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:01:06,325-Speed 5349.52 samples/sec   Loss 5.1749   LearningRate 0.0713   Epoch: 10   Global Step: 111310   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:01:13,772-Speed 5500.97 samples/sec   Loss 5.1834   LearningRate 0.0713   Epoch: 10   Global Step: 111320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:01:21,196-Speed 5517.75 samples/sec   Loss 5.1432   LearningRate 0.0713   Epoch: 10   Global Step: 111330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:01:28,676-Speed 5476.73 samples/sec   Loss 5.1974   LearningRate 0.0713   Epoch: 10   Global Step: 111340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:01:36,135-Speed 5492.59 samples/sec   Loss 5.1514   LearningRate 0.0713   Epoch: 10   Global Step: 111350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:01:43,641-Speed 5457.32 samples/sec   Loss 5.1830   LearningRate 0.0713   Epoch: 10   Global Step: 111360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:01:51,267-Speed 5371.80 samples/sec   Loss 5.1846   LearningRate 0.0712   Epoch: 10   Global Step: 111370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:01:58,795-Speed 5441.47 samples/sec   Loss 5.2553   LearningRate 0.0712   Epoch: 10   Global Step: 111380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:02:06,284-Speed 5470.17 samples/sec   Loss 5.2156   LearningRate 0.0712   Epoch: 10   Global Step: 111390   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:02:13,764-Speed 5476.95 samples/sec   Loss 5.1618   LearningRate 0.0712   Epoch: 10   Global Step: 111400   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:02:21,255-Speed 5467.96 samples/sec   Loss 5.2002   LearningRate 0.0712   Epoch: 10   Global Step: 111410   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:02:28,807-Speed 5425.27 samples/sec   Loss 5.2006   LearningRate 0.0712   Epoch: 10   Global Step: 111420   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:02:36,385-Speed 5406.46 samples/sec   Loss 5.1831   LearningRate 0.0712   Epoch: 10   Global Step: 111430   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:02:43,933-Speed 5426.89 samples/sec   Loss 5.2137   LearningRate 0.0711   Epoch: 10   Global Step: 111440   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:02:51,476-Speed 5430.51 samples/sec   Loss 5.1943   LearningRate 0.0711   Epoch: 10   Global Step: 111450   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:02:59,007-Speed 5440.55 samples/sec   Loss 5.1493   LearningRate 0.0711   Epoch: 10   Global Step: 111460   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:03:06,622-Speed 5379.49 samples/sec   Loss 5.1663   LearningRate 0.0711   Epoch: 10   Global Step: 111470   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:03:14,122-Speed 5462.23 samples/sec   Loss 5.1265   LearningRate 0.0711   Epoch: 10   Global Step: 111480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:03:21,610-Speed 5470.24 samples/sec   Loss 5.2157   LearningRate 0.0711   Epoch: 10   Global Step: 111490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:03:29,069-Speed 5492.26 samples/sec   Loss 5.1319   LearningRate 0.0711   Epoch: 10   Global Step: 111500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:03:36,548-Speed 5477.96 samples/sec   Loss 5.2336   LearningRate 0.0710   Epoch: 10   Global Step: 111510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:03:44,021-Speed 5482.11 samples/sec   Loss 5.1308   LearningRate 0.0710   Epoch: 10   Global Step: 111520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:03:51,611-Speed 5396.70 samples/sec   Loss 5.1901   LearningRate 0.0710   Epoch: 10   Global Step: 111530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:03:59,072-Speed 5490.76 samples/sec   Loss 5.1097   LearningRate 0.0710   Epoch: 10   Global Step: 111540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:04:06,582-Speed 5454.58 samples/sec   Loss 5.1625   LearningRate 0.0710   Epoch: 10   Global Step: 111550   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:04:14,157-Speed 5408.63 samples/sec   Loss 5.1963   LearningRate 0.0710   Epoch: 10   Global Step: 111560   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:04:21,605-Speed 5499.65 samples/sec   Loss 5.1562   LearningRate 0.0710   Epoch: 10   Global Step: 111570   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:04:29,079-Speed 5480.85 samples/sec   Loss 5.2111   LearningRate 0.0709   Epoch: 10   Global Step: 111580   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:04:36,533-Speed 5496.29 samples/sec   Loss 5.1450   LearningRate 0.0709   Epoch: 10   Global Step: 111590   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:04:44,169-Speed 5364.68 samples/sec   Loss 5.1604   LearningRate 0.0709   Epoch: 10   Global Step: 111600   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:04:51,822-Speed 5352.79 samples/sec   Loss 5.1653   LearningRate 0.0709   Epoch: 10   Global Step: 111610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:04:59,373-Speed 5425.06 samples/sec   Loss 5.1559   LearningRate 0.0709   Epoch: 10   Global Step: 111620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:05:06,821-Speed 5500.60 samples/sec   Loss 5.2222   LearningRate 0.0709   Epoch: 10   Global Step: 111630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:05:14,345-Speed 5444.56 samples/sec   Loss 5.1701   LearningRate 0.0708   Epoch: 10   Global Step: 111640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:05:21,815-Speed 5483.63 samples/sec   Loss 5.1881   LearningRate 0.0708   Epoch: 10   Global Step: 111650   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:05:29,270-Speed 5495.00 samples/sec   Loss 5.1945   LearningRate 0.0708   Epoch: 10   Global Step: 111660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:05:36,707-Speed 5509.18 samples/sec   Loss 5.1258   LearningRate 0.0708   Epoch: 10   Global Step: 111670   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:05:44,195-Speed 5470.28 samples/sec   Loss 5.1404   LearningRate 0.0708   Epoch: 10   Global Step: 111680   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:05:51,658-Speed 5489.21 samples/sec   Loss 5.1907   LearningRate 0.0708   Epoch: 10   Global Step: 111690   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:05:59,120-Speed 5489.75 samples/sec   Loss 5.1749   LearningRate 0.0708   Epoch: 10   Global Step: 111700   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:06:06,566-Speed 5502.43 samples/sec   Loss 5.1372   LearningRate 0.0707   Epoch: 10   Global Step: 111710   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:06:14,087-Speed 5446.45 samples/sec   Loss 5.1630   LearningRate 0.0707   Epoch: 10   Global Step: 111720   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:06:21,566-Speed 5476.85 samples/sec   Loss 5.1630   LearningRate 0.0707   Epoch: 10   Global Step: 111730   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:06:28,991-Speed 5517.43 samples/sec   Loss 5.1760   LearningRate 0.0707   Epoch: 10   Global Step: 111740   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:06:36,403-Speed 5527.42 samples/sec   Loss 5.1420   LearningRate 0.0707   Epoch: 10   Global Step: 111750   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:06:43,866-Speed 5489.03 samples/sec   Loss 5.2180   LearningRate 0.0707   Epoch: 10   Global Step: 111760   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:06:51,311-Speed 5502.32 samples/sec   Loss 5.1468   LearningRate 0.0707   Epoch: 10   Global Step: 111770   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:06:58,744-Speed 5511.08 samples/sec   Loss 5.1770   LearningRate 0.0706   Epoch: 10   Global Step: 111780   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:07:06,293-Speed 5426.61 samples/sec   Loss 5.1616   LearningRate 0.0706   Epoch: 10   Global Step: 111790   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:07:13,762-Speed 5485.02 samples/sec   Loss 5.1550   LearningRate 0.0706   Epoch: 10   Global Step: 111800   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:07:21,410-Speed 5356.39 samples/sec   Loss 5.1175   LearningRate 0.0706   Epoch: 10   Global Step: 111810   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:07:28,894-Speed 5472.94 samples/sec   Loss 5.1692   LearningRate 0.0706   Epoch: 10   Global Step: 111820   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:07:36,376-Speed 5475.82 samples/sec   Loss 5.1734   LearningRate 0.0706   Epoch: 10   Global Step: 111830   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:07:43,863-Speed 5471.26 samples/sec   Loss 5.1366   LearningRate 0.0706   Epoch: 10   Global Step: 111840   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:07:51,302-Speed 5507.31 samples/sec   Loss 5.1431   LearningRate 0.0705   Epoch: 10   Global Step: 111850   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:07:58,788-Speed 5471.88 samples/sec   Loss 5.0931   LearningRate 0.0705   Epoch: 10   Global Step: 111860   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:08:06,268-Speed 5477.00 samples/sec   Loss 5.0906   LearningRate 0.0705   Epoch: 10   Global Step: 111870   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:08:13,725-Speed 5493.55 samples/sec   Loss 5.1607   LearningRate 0.0705   Epoch: 10   Global Step: 111880   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:08:21,235-Speed 5454.68 samples/sec   Loss 5.0932   LearningRate 0.0705   Epoch: 10   Global Step: 111890   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:08:28,829-Speed 5394.47 samples/sec   Loss 5.1870   LearningRate 0.0705   Epoch: 10   Global Step: 111900   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:08:36,323-Speed 5466.37 samples/sec   Loss 5.1038   LearningRate 0.0704   Epoch: 10   Global Step: 111910   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:08:43,778-Speed 5495.74 samples/sec   Loss 5.1702   LearningRate 0.0704   Epoch: 10   Global Step: 111920   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:08:51,289-Speed 5453.86 samples/sec   Loss 5.1497   LearningRate 0.0704   Epoch: 10   Global Step: 111930   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:08:58,758-Speed 5484.29 samples/sec   Loss 5.1362   LearningRate 0.0704   Epoch: 10   Global Step: 111940   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:09:06,243-Speed 5473.63 samples/sec   Loss 5.1681   LearningRate 0.0704   Epoch: 10   Global Step: 111950   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:09:13,799-Speed 5421.71 samples/sec   Loss 5.1343   LearningRate 0.0704   Epoch: 10   Global Step: 111960   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:09:21,398-Speed 5390.81 samples/sec   Loss 5.1418   LearningRate 0.0704   Epoch: 10   Global Step: 111970   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:09:28,850-Speed 5496.93 samples/sec   Loss 5.1259   LearningRate 0.0703   Epoch: 10   Global Step: 111980   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:09:36,322-Speed 5482.51 samples/sec   Loss 5.1250   LearningRate 0.0703   Epoch: 10   Global Step: 111990   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:09:43,843-Speed 5446.71 samples/sec   Loss 5.1607   LearningRate 0.0703   Epoch: 10   Global Step: 112000   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:10:27,648-[lfw][112000]XNorm: 23.461704
Training: 2022-01-08 20:10:27,648-[lfw][112000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-01-08 20:10:27,649-[lfw][112000]Accuracy-Highest: 0.99817
Training: 2022-01-08 20:11:18,659-[cfp_fp][112000]XNorm: 21.740752
Training: 2022-01-08 20:11:18,660-[cfp_fp][112000]Accuracy-Flip: 0.99057+-0.00415
Training: 2022-01-08 20:11:18,661-[cfp_fp][112000]Accuracy-Highest: 0.99057
Training: 2022-01-08 20:12:02,679-[agedb_30][112000]XNorm: 23.420655
Training: 2022-01-08 20:12:02,681-[agedb_30][112000]Accuracy-Flip: 0.97800+-0.00718
Training: 2022-01-08 20:12:02,681-[agedb_30][112000]Accuracy-Highest: 0.97917
Training: 2022-01-08 20:12:10,108-Speed 280.04 samples/sec   Loss 5.1471   LearningRate 0.0703   Epoch: 10   Global Step: 112010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:12:17,621-Speed 5453.30 samples/sec   Loss 5.1303   LearningRate 0.0703   Epoch: 10   Global Step: 112020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:12:25,050-Speed 5515.07 samples/sec   Loss 5.1593   LearningRate 0.0703   Epoch: 10   Global Step: 112030   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:12:32,541-Speed 5469.14 samples/sec   Loss 5.1509   LearningRate 0.0703   Epoch: 10   Global Step: 112040   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:12:40,257-Speed 5309.76 samples/sec   Loss 5.1513   LearningRate 0.0702   Epoch: 10   Global Step: 112050   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:12:47,882-Speed 5373.57 samples/sec   Loss 5.1337   LearningRate 0.0702   Epoch: 10   Global Step: 112060   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:12:55,450-Speed 5413.09 samples/sec   Loss 5.1214   LearningRate 0.0702   Epoch: 10   Global Step: 112070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:13:02,930-Speed 5476.69 samples/sec   Loss 5.1090   LearningRate 0.0702   Epoch: 10   Global Step: 112080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:13:10,414-Speed 5473.86 samples/sec   Loss 5.1276   LearningRate 0.0702   Epoch: 10   Global Step: 112090   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:13:17,881-Speed 5486.82 samples/sec   Loss 5.1539   LearningRate 0.0702   Epoch: 10   Global Step: 112100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:13:25,349-Speed 5484.93 samples/sec   Loss 5.1063   LearningRate 0.0702   Epoch: 10   Global Step: 112110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:13:32,800-Speed 5498.29 samples/sec   Loss 5.0837   LearningRate 0.0701   Epoch: 10   Global Step: 112120   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:13:40,197-Speed 5537.90 samples/sec   Loss 5.1139   LearningRate 0.0701   Epoch: 10   Global Step: 112130   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:13:47,665-Speed 5485.85 samples/sec   Loss 5.0939   LearningRate 0.0701   Epoch: 10   Global Step: 112140   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:13:55,076-Speed 5527.75 samples/sec   Loss 5.1512   LearningRate 0.0701   Epoch: 10   Global Step: 112150   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:14:02,549-Speed 5481.64 samples/sec   Loss 5.1688   LearningRate 0.0701   Epoch: 10   Global Step: 112160   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:14:09,983-Speed 5510.09 samples/sec   Loss 5.1415   LearningRate 0.0701   Epoch: 10   Global Step: 112170   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:14:17,454-Speed 5483.62 samples/sec   Loss 5.1278   LearningRate 0.0701   Epoch: 10   Global Step: 112180   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:14:24,995-Speed 5432.49 samples/sec   Loss 5.1177   LearningRate 0.0700   Epoch: 10   Global Step: 112190   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:14:32,441-Speed 5501.21 samples/sec   Loss 5.1047   LearningRate 0.0700   Epoch: 10   Global Step: 112200   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:14:39,942-Speed 5461.45 samples/sec   Loss 5.1098   LearningRate 0.0700   Epoch: 10   Global Step: 112210   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:14:47,378-Speed 5509.65 samples/sec   Loss 5.1380   LearningRate 0.0700   Epoch: 10   Global Step: 112220   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:14:54,834-Speed 5494.44 samples/sec   Loss 5.1175   LearningRate 0.0700   Epoch: 10   Global Step: 112230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:15:02,268-Speed 5510.09 samples/sec   Loss 5.1493   LearningRate 0.0700   Epoch: 10   Global Step: 112240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:15:09,755-Speed 5471.58 samples/sec   Loss 5.1862   LearningRate 0.0699   Epoch: 10   Global Step: 112250   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:15:17,292-Speed 5435.77 samples/sec   Loss 5.1379   LearningRate 0.0699   Epoch: 10   Global Step: 112260   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:15:24,764-Speed 5482.51 samples/sec   Loss 5.1191   LearningRate 0.0699   Epoch: 10   Global Step: 112270   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:15:32,240-Speed 5479.73 samples/sec   Loss 5.1605   LearningRate 0.0699   Epoch: 10   Global Step: 112280   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:15:39,822-Speed 5402.56 samples/sec   Loss 5.1465   LearningRate 0.0699   Epoch: 10   Global Step: 112290   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:15:47,352-Speed 5440.71 samples/sec   Loss 5.1164   LearningRate 0.0699   Epoch: 10   Global Step: 112300   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:15:55,064-Speed 5311.70 samples/sec   Loss 5.1286   LearningRate 0.0699   Epoch: 10   Global Step: 112310   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:16:02,561-Speed 5464.55 samples/sec   Loss 5.1492   LearningRate 0.0698   Epoch: 10   Global Step: 112320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:16:10,044-Speed 5473.98 samples/sec   Loss 5.1747   LearningRate 0.0698   Epoch: 10   Global Step: 112330   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:16:17,537-Speed 5467.79 samples/sec   Loss 5.1099   LearningRate 0.0698   Epoch: 10   Global Step: 112340   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:16:25,006-Speed 5484.48 samples/sec   Loss 5.1366   LearningRate 0.0698   Epoch: 10   Global Step: 112350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:16:32,489-Speed 5474.92 samples/sec   Loss 5.0841   LearningRate 0.0698   Epoch: 10   Global Step: 112360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:16:39,955-Speed 5486.35 samples/sec   Loss 5.1324   LearningRate 0.0698   Epoch: 10   Global Step: 112370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:16:47,417-Speed 5489.97 samples/sec   Loss 5.0603   LearningRate 0.0698   Epoch: 10   Global Step: 112380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:16:54,899-Speed 5475.49 samples/sec   Loss 5.1238   LearningRate 0.0697   Epoch: 10   Global Step: 112390   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:17:02,419-Speed 5447.68 samples/sec   Loss 5.0864   LearningRate 0.0697   Epoch: 10   Global Step: 112400   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:17:10,017-Speed 5391.39 samples/sec   Loss 5.1322   LearningRate 0.0697   Epoch: 10   Global Step: 112410   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:17:17,566-Speed 5426.10 samples/sec   Loss 5.1009   LearningRate 0.0697   Epoch: 10   Global Step: 112420   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:17:25,031-Speed 5488.17 samples/sec   Loss 5.0915   LearningRate 0.0697   Epoch: 10   Global Step: 112430   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:17:32,618-Speed 5399.24 samples/sec   Loss 5.1033   LearningRate 0.0697   Epoch: 10   Global Step: 112440   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:17:40,068-Speed 5498.75 samples/sec   Loss 5.1455   LearningRate 0.0697   Epoch: 10   Global Step: 112450   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:17:47,555-Speed 5471.38 samples/sec   Loss 5.1160   LearningRate 0.0696   Epoch: 10   Global Step: 112460   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:17:54,999-Speed 5503.56 samples/sec   Loss 5.0774   LearningRate 0.0696   Epoch: 10   Global Step: 112470   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:18:02,509-Speed 5454.20 samples/sec   Loss 5.0868   LearningRate 0.0696   Epoch: 10   Global Step: 112480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:18:09,991-Speed 5474.99 samples/sec   Loss 5.1288   LearningRate 0.0696   Epoch: 10   Global Step: 112490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:18:17,816-Speed 5235.08 samples/sec   Loss 5.1352   LearningRate 0.0696   Epoch: 10   Global Step: 112500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:18:25,296-Speed 5477.27 samples/sec   Loss 5.0939   LearningRate 0.0696   Epoch: 10   Global Step: 112510   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:18:32,742-Speed 5501.44 samples/sec   Loss 5.1396   LearningRate 0.0696   Epoch: 10   Global Step: 112520   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:18:40,304-Speed 5417.33 samples/sec   Loss 5.1567   LearningRate 0.0695   Epoch: 10   Global Step: 112530   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:18:47,899-Speed 5393.74 samples/sec   Loss 5.1280   LearningRate 0.0695   Epoch: 10   Global Step: 112540   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:18:55,355-Speed 5494.13 samples/sec   Loss 5.1133   LearningRate 0.0695   Epoch: 10   Global Step: 112550   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:19:02,912-Speed 5421.09 samples/sec   Loss 5.1108   LearningRate 0.0695   Epoch: 10   Global Step: 112560   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:19:10,399-Speed 5471.88 samples/sec   Loss 5.1157   LearningRate 0.0695   Epoch: 10   Global Step: 112570   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:19:18,000-Speed 5389.37 samples/sec   Loss 5.1465   LearningRate 0.0695   Epoch: 10   Global Step: 112580   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:19:25,476-Speed 5479.07 samples/sec   Loss 5.1311   LearningRate 0.0694   Epoch: 10   Global Step: 112590   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:19:32,926-Speed 5498.82 samples/sec   Loss 5.0938   LearningRate 0.0694   Epoch: 10   Global Step: 112600   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:19:40,376-Speed 5499.33 samples/sec   Loss 5.1216   LearningRate 0.0694   Epoch: 10   Global Step: 112610   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:19:47,801-Speed 5516.40 samples/sec   Loss 5.1031   LearningRate 0.0694   Epoch: 10   Global Step: 112620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:19:55,235-Speed 5511.08 samples/sec   Loss 5.1470   LearningRate 0.0694   Epoch: 10   Global Step: 112630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:20:02,658-Speed 5518.94 samples/sec   Loss 5.0801   LearningRate 0.0694   Epoch: 10   Global Step: 112640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:20:10,113-Speed 5495.06 samples/sec   Loss 5.1584   LearningRate 0.0694   Epoch: 10   Global Step: 112650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:20:17,547-Speed 5510.08 samples/sec   Loss 5.0937   LearningRate 0.0693   Epoch: 10   Global Step: 112660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:20:24,997-Speed 5499.05 samples/sec   Loss 5.1260   LearningRate 0.0693   Epoch: 10   Global Step: 112670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:20:32,424-Speed 5515.87 samples/sec   Loss 5.0867   LearningRate 0.0693   Epoch: 10   Global Step: 112680   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:20:39,930-Speed 5457.67 samples/sec   Loss 5.1047   LearningRate 0.0693   Epoch: 10   Global Step: 112690   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:20:47,345-Speed 5524.85 samples/sec   Loss 5.1151   LearningRate 0.0693   Epoch: 10   Global Step: 112700   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:20:54,766-Speed 5520.20 samples/sec   Loss 5.0683   LearningRate 0.0693   Epoch: 10   Global Step: 112710   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:21:02,167-Speed 5535.25 samples/sec   Loss 5.0684   LearningRate 0.0693   Epoch: 10   Global Step: 112720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:21:09,639-Speed 5482.38 samples/sec   Loss 5.0511   LearningRate 0.0692   Epoch: 10   Global Step: 112730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:21:17,084-Speed 5502.80 samples/sec   Loss 5.1081   LearningRate 0.0692   Epoch: 10   Global Step: 112740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:21:24,473-Speed 5544.36 samples/sec   Loss 5.1251   LearningRate 0.0692   Epoch: 10   Global Step: 112750   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:21:31,941-Speed 5485.31 samples/sec   Loss 5.0515   LearningRate 0.0692   Epoch: 10   Global Step: 112760   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:21:39,422-Speed 5476.43 samples/sec   Loss 5.1337   LearningRate 0.0692   Epoch: 10   Global Step: 112770   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:21:46,856-Speed 5510.51 samples/sec   Loss 5.0651   LearningRate 0.0692   Epoch: 10   Global Step: 112780   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:21:54,300-Speed 5502.93 samples/sec   Loss 5.1324   LearningRate 0.0692   Epoch: 10   Global Step: 112790   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:22:01,751-Speed 5498.28 samples/sec   Loss 5.1149   LearningRate 0.0691   Epoch: 10   Global Step: 112800   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:22:09,231-Speed 5477.07 samples/sec   Loss 5.1175   LearningRate 0.0691   Epoch: 10   Global Step: 112810   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:22:16,696-Speed 5487.60 samples/sec   Loss 5.1075   LearningRate 0.0691   Epoch: 10   Global Step: 112820   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:22:24,115-Speed 5521.38 samples/sec   Loss 5.1343   LearningRate 0.0691   Epoch: 10   Global Step: 112830   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:22:31,531-Speed 5524.12 samples/sec   Loss 5.0521   LearningRate 0.0691   Epoch: 10   Global Step: 112840   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:22:38,955-Speed 5518.06 samples/sec   Loss 5.1245   LearningRate 0.0691   Epoch: 10   Global Step: 112850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:22:46,473-Speed 5449.15 samples/sec   Loss 5.1116   LearningRate 0.0691   Epoch: 10   Global Step: 112860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:22:53,945-Speed 5482.07 samples/sec   Loss 5.1429   LearningRate 0.0690   Epoch: 10   Global Step: 112870   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:23:01,391-Speed 5502.05 samples/sec   Loss 5.1190   LearningRate 0.0690   Epoch: 10   Global Step: 112880   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:23:08,964-Speed 5409.64 samples/sec   Loss 5.0755   LearningRate 0.0690   Epoch: 10   Global Step: 112890   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:23:16,556-Speed 5396.17 samples/sec   Loss 5.0827   LearningRate 0.0690   Epoch: 10   Global Step: 112900   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:23:24,026-Speed 5483.39 samples/sec   Loss 5.0853   LearningRate 0.0690   Epoch: 10   Global Step: 112910   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:23:31,591-Speed 5415.24 samples/sec   Loss 5.0993   LearningRate 0.0690   Epoch: 10   Global Step: 112920   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:23:39,061-Speed 5484.17 samples/sec   Loss 5.0597   LearningRate 0.0690   Epoch: 10   Global Step: 112930   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:23:46,524-Speed 5488.93 samples/sec   Loss 5.0840   LearningRate 0.0689   Epoch: 10   Global Step: 112940   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:23:53,988-Speed 5488.11 samples/sec   Loss 5.0909   LearningRate 0.0689   Epoch: 10   Global Step: 112950   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:24:01,468-Speed 5477.05 samples/sec   Loss 5.0889   LearningRate 0.0689   Epoch: 10   Global Step: 112960   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:24:09,007-Speed 5434.02 samples/sec   Loss 5.0528   LearningRate 0.0689   Epoch: 10   Global Step: 112970   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:24:16,468-Speed 5490.71 samples/sec   Loss 5.1101   LearningRate 0.0689   Epoch: 10   Global Step: 112980   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:24:23,933-Speed 5487.75 samples/sec   Loss 5.0890   LearningRate 0.0689   Epoch: 10   Global Step: 112990   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:24:31,372-Speed 5507.12 samples/sec   Loss 5.1051   LearningRate 0.0688   Epoch: 10   Global Step: 113000   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:24:38,908-Speed 5436.13 samples/sec   Loss 5.1073   LearningRate 0.0688   Epoch: 10   Global Step: 113010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:24:46,332-Speed 5517.81 samples/sec   Loss 5.1040   LearningRate 0.0688   Epoch: 10   Global Step: 113020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:24:53,880-Speed 5427.36 samples/sec   Loss 5.0746   LearningRate 0.0688   Epoch: 10   Global Step: 113030   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:25:01,372-Speed 5467.46 samples/sec   Loss 5.1224   LearningRate 0.0688   Epoch: 10   Global Step: 113040   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:25:08,844-Speed 5483.17 samples/sec   Loss 5.1358   LearningRate 0.0688   Epoch: 10   Global Step: 113050   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:25:16,370-Speed 5443.23 samples/sec   Loss 5.1066   LearningRate 0.0688   Epoch: 10   Global Step: 113060   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:25:23,920-Speed 5425.60 samples/sec   Loss 5.1349   LearningRate 0.0687   Epoch: 10   Global Step: 113070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:25:31,401-Speed 5476.18 samples/sec   Loss 5.0536   LearningRate 0.0687   Epoch: 10   Global Step: 113080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:25:38,840-Speed 5506.76 samples/sec   Loss 5.0729   LearningRate 0.0687   Epoch: 10   Global Step: 113090   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:25:46,312-Speed 5483.19 samples/sec   Loss 5.0884   LearningRate 0.0687   Epoch: 10   Global Step: 113100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:25:53,789-Speed 5478.17 samples/sec   Loss 5.0754   LearningRate 0.0687   Epoch: 10   Global Step: 113110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:26:01,346-Speed 5421.05 samples/sec   Loss 5.1346   LearningRate 0.0687   Epoch: 10   Global Step: 113120   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:26:08,730-Speed 5547.91 samples/sec   Loss 5.1208   LearningRate 0.0687   Epoch: 10   Global Step: 113130   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:26:16,187-Speed 5493.89 samples/sec   Loss 5.0745   LearningRate 0.0686   Epoch: 10   Global Step: 113140   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:26:23,633-Speed 5501.69 samples/sec   Loss 5.0533   LearningRate 0.0686   Epoch: 10   Global Step: 113150   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:26:31,114-Speed 5475.16 samples/sec   Loss 5.0666   LearningRate 0.0686   Epoch: 10   Global Step: 113160   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:26:38,584-Speed 5484.76 samples/sec   Loss 5.0775   LearningRate 0.0686   Epoch: 10   Global Step: 113170   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:26:46,008-Speed 5517.45 samples/sec   Loss 5.0785   LearningRate 0.0686   Epoch: 10   Global Step: 113180   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:26:53,503-Speed 5465.76 samples/sec   Loss 5.0383   LearningRate 0.0686   Epoch: 10   Global Step: 113190   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:27:00,940-Speed 5508.60 samples/sec   Loss 5.0921   LearningRate 0.0686   Epoch: 10   Global Step: 113200   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:27:08,479-Speed 5433.64 samples/sec   Loss 5.0924   LearningRate 0.0685   Epoch: 10   Global Step: 113210   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:27:16,263-Speed 5263.11 samples/sec   Loss 5.1224   LearningRate 0.0685   Epoch: 10   Global Step: 113220   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 20:27:23,664-Speed 5535.12 samples/sec   Loss 5.1001   LearningRate 0.0685   Epoch: 10   Global Step: 113230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:27:31,050-Speed 5546.10 samples/sec   Loss 4.9954   LearningRate 0.0685   Epoch: 10   Global Step: 113240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:27:38,502-Speed 5497.39 samples/sec   Loss 5.0387   LearningRate 0.0685   Epoch: 10   Global Step: 113250   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:27:45,897-Speed 5539.52 samples/sec   Loss 5.1091   LearningRate 0.0685   Epoch: 10   Global Step: 113260   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:27:53,357-Speed 5491.68 samples/sec   Loss 5.1266   LearningRate 0.0685   Epoch: 10   Global Step: 113270   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:28:00,793-Speed 5508.83 samples/sec   Loss 5.0960   LearningRate 0.0684   Epoch: 10   Global Step: 113280   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:28:08,206-Speed 5526.26 samples/sec   Loss 5.0893   LearningRate 0.0684   Epoch: 10   Global Step: 113290   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:28:15,722-Speed 5450.07 samples/sec   Loss 5.0915   LearningRate 0.0684   Epoch: 10   Global Step: 113300   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:28:23,168-Speed 5502.23 samples/sec   Loss 5.0238   LearningRate 0.0684   Epoch: 10   Global Step: 113310   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:28:30,593-Speed 5517.09 samples/sec   Loss 5.0461   LearningRate 0.0684   Epoch: 10   Global Step: 113320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:28:38,011-Speed 5522.26 samples/sec   Loss 4.9855   LearningRate 0.0684   Epoch: 10   Global Step: 113330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:28:45,601-Speed 5397.34 samples/sec   Loss 5.0444   LearningRate 0.0684   Epoch: 10   Global Step: 113340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:28:53,057-Speed 5494.95 samples/sec   Loss 5.0712   LearningRate 0.0683   Epoch: 10   Global Step: 113350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:29:00,548-Speed 5468.21 samples/sec   Loss 5.0829   LearningRate 0.0683   Epoch: 10   Global Step: 113360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:29:07,990-Speed 5504.37 samples/sec   Loss 5.0448   LearningRate 0.0683   Epoch: 10   Global Step: 113370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:29:15,408-Speed 5522.81 samples/sec   Loss 5.0257   LearningRate 0.0683   Epoch: 10   Global Step: 113380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:29:22,884-Speed 5480.04 samples/sec   Loss 5.0872   LearningRate 0.0683   Epoch: 10   Global Step: 113390   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:29:30,329-Speed 5501.68 samples/sec   Loss 5.0372   LearningRate 0.0683   Epoch: 10   Global Step: 113400   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:29:37,804-Speed 5480.10 samples/sec   Loss 5.0119   LearningRate 0.0683   Epoch: 10   Global Step: 113410   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:29:45,305-Speed 5462.10 samples/sec   Loss 5.0595   LearningRate 0.0682   Epoch: 10   Global Step: 113420   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:29:52,733-Speed 5514.85 samples/sec   Loss 5.1309   LearningRate 0.0682   Epoch: 10   Global Step: 113430   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:30:00,179-Speed 5501.84 samples/sec   Loss 5.1177   LearningRate 0.0682   Epoch: 10   Global Step: 113440   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:30:07,682-Speed 5459.82 samples/sec   Loss 5.0638   LearningRate 0.0682   Epoch: 10   Global Step: 113450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:30:15,167-Speed 5472.66 samples/sec   Loss 5.0889   LearningRate 0.0682   Epoch: 10   Global Step: 113460   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:30:22,627-Speed 5492.17 samples/sec   Loss 5.0401   LearningRate 0.0682   Epoch: 10   Global Step: 113470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:30:30,045-Speed 5522.01 samples/sec   Loss 5.0856   LearningRate 0.0682   Epoch: 10   Global Step: 113480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 20:30:37,487-Speed 5504.39 samples/sec   Loss 5.0706   LearningRate 0.0681   Epoch: 10   Global Step: 113490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 20:30:44,905-Speed 5522.66 samples/sec   Loss 5.0489   LearningRate 0.0681   Epoch: 10   Global Step: 113500   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:30:52,418-Speed 5453.35 samples/sec   Loss 5.1222   LearningRate 0.0681   Epoch: 10   Global Step: 113510   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:30:59,844-Speed 5516.28 samples/sec   Loss 5.0861   LearningRate 0.0681   Epoch: 10   Global Step: 113520   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:31:07,297-Speed 5496.49 samples/sec   Loss 5.0636   LearningRate 0.0681   Epoch: 10   Global Step: 113530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:31:14,679-Speed 5548.96 samples/sec   Loss 5.0631   LearningRate 0.0681   Epoch: 10   Global Step: 113540   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:31:22,079-Speed 5536.27 samples/sec   Loss 5.0747   LearningRate 0.0680   Epoch: 10   Global Step: 113550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:31:29,526-Speed 5500.68 samples/sec   Loss 5.0343   LearningRate 0.0680   Epoch: 10   Global Step: 113560   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:31:36,957-Speed 5512.57 samples/sec   Loss 5.1321   LearningRate 0.0680   Epoch: 10   Global Step: 113570   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:31:44,471-Speed 5452.11 samples/sec   Loss 5.1165   LearningRate 0.0680   Epoch: 10   Global Step: 113580   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:31:52,038-Speed 5413.74 samples/sec   Loss 5.1198   LearningRate 0.0680   Epoch: 10   Global Step: 113590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 20:31:59,638-Speed 5390.55 samples/sec   Loss 5.0956   LearningRate 0.0680   Epoch: 10   Global Step: 113600   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 20:32:07,082-Speed 5502.79 samples/sec   Loss 5.0457   LearningRate 0.0680   Epoch: 10   Global Step: 113610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:32:14,529-Speed 5501.19 samples/sec   Loss 5.0712   LearningRate 0.0679   Epoch: 10   Global Step: 113620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:32:21,967-Speed 5507.27 samples/sec   Loss 5.0838   LearningRate 0.0679   Epoch: 10   Global Step: 113630   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:32:29,400-Speed 5511.84 samples/sec   Loss 5.0543   LearningRate 0.0679   Epoch: 10   Global Step: 113640   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:32:36,827-Speed 5515.38 samples/sec   Loss 5.1030   LearningRate 0.0679   Epoch: 10   Global Step: 113650   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:32:44,303-Speed 5479.64 samples/sec   Loss 5.0954   LearningRate 0.0679   Epoch: 10   Global Step: 113660   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:32:51,803-Speed 5462.74 samples/sec   Loss 5.0429   LearningRate 0.0679   Epoch: 10   Global Step: 113670   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:32:59,312-Speed 5455.35 samples/sec   Loss 5.0618   LearningRate 0.0679   Epoch: 10   Global Step: 113680   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:33:06,808-Speed 5465.23 samples/sec   Loss 5.0372   LearningRate 0.0678   Epoch: 10   Global Step: 113690   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:33:14,278-Speed 5483.62 samples/sec   Loss 5.0718   LearningRate 0.0678   Epoch: 10   Global Step: 113700   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:33:21,732-Speed 5496.43 samples/sec   Loss 5.0774   LearningRate 0.0678   Epoch: 10   Global Step: 113710   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:33:29,239-Speed 5456.84 samples/sec   Loss 5.0581   LearningRate 0.0678   Epoch: 10   Global Step: 113720   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:33:36,720-Speed 5476.10 samples/sec   Loss 5.0113   LearningRate 0.0678   Epoch: 10   Global Step: 113730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:33:44,163-Speed 5503.26 samples/sec   Loss 5.0346   LearningRate 0.0678   Epoch: 10   Global Step: 113740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:33:51,608-Speed 5502.83 samples/sec   Loss 5.0679   LearningRate 0.0678   Epoch: 10   Global Step: 113750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:33:59,046-Speed 5508.22 samples/sec   Loss 5.0295   LearningRate 0.0677   Epoch: 10   Global Step: 113760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:34:06,468-Speed 5518.76 samples/sec   Loss 5.0509   LearningRate 0.0677   Epoch: 10   Global Step: 113770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:34:13,928-Speed 5491.97 samples/sec   Loss 4.9956   LearningRate 0.0677   Epoch: 10   Global Step: 113780   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:34:21,421-Speed 5467.13 samples/sec   Loss 5.0406   LearningRate 0.0677   Epoch: 10   Global Step: 113790   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:34:28,951-Speed 5440.61 samples/sec   Loss 5.0565   LearningRate 0.0677   Epoch: 10   Global Step: 113800   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:34:36,441-Speed 5468.68 samples/sec   Loss 5.0633   LearningRate 0.0677   Epoch: 10   Global Step: 113810   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:34:43,962-Speed 5447.14 samples/sec   Loss 5.0235   LearningRate 0.0677   Epoch: 10   Global Step: 113820   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:34:51,469-Speed 5456.47 samples/sec   Loss 5.0599   LearningRate 0.0676   Epoch: 10   Global Step: 113830   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:34:58,947-Speed 5478.56 samples/sec   Loss 5.0767   LearningRate 0.0676   Epoch: 10   Global Step: 113840   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:35:06,481-Speed 5437.52 samples/sec   Loss 5.0373   LearningRate 0.0676   Epoch: 10   Global Step: 113850   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:35:13,961-Speed 5476.16 samples/sec   Loss 5.0630   LearningRate 0.0676   Epoch: 10   Global Step: 113860   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:35:21,438-Speed 5478.81 samples/sec   Loss 5.0428   LearningRate 0.0676   Epoch: 10   Global Step: 113870   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:35:28,911-Speed 5482.09 samples/sec   Loss 5.0420   LearningRate 0.0676   Epoch: 10   Global Step: 113880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:35:36,405-Speed 5466.60 samples/sec   Loss 5.0605   LearningRate 0.0676   Epoch: 10   Global Step: 113890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:35:43,946-Speed 5432.44 samples/sec   Loss 5.0743   LearningRate 0.0675   Epoch: 10   Global Step: 113900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:35:51,446-Speed 5461.72 samples/sec   Loss 5.0886   LearningRate 0.0675   Epoch: 10   Global Step: 113910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:35:58,922-Speed 5479.90 samples/sec   Loss 5.0298   LearningRate 0.0675   Epoch: 10   Global Step: 113920   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:36:06,414-Speed 5468.41 samples/sec   Loss 5.0092   LearningRate 0.0675   Epoch: 10   Global Step: 113930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:36:14,081-Speed 5342.79 samples/sec   Loss 5.0924   LearningRate 0.0675   Epoch: 10   Global Step: 113940   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:36:21,788-Speed 5315.33 samples/sec   Loss 5.0390   LearningRate 0.0675   Epoch: 10   Global Step: 113950   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:36:29,245-Speed 5493.86 samples/sec   Loss 5.0876   LearningRate 0.0675   Epoch: 10   Global Step: 113960   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:36:36,693-Speed 5500.49 samples/sec   Loss 5.0655   LearningRate 0.0674   Epoch: 10   Global Step: 113970   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:36:44,081-Speed 5544.12 samples/sec   Loss 5.0217   LearningRate 0.0674   Epoch: 10   Global Step: 113980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 20:36:51,529-Speed 5500.30 samples/sec   Loss 5.0927   LearningRate 0.0674   Epoch: 10   Global Step: 113990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:36:58,909-Speed 5551.57 samples/sec   Loss 5.0728   LearningRate 0.0674   Epoch: 10   Global Step: 114000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:37:42,978-[lfw][114000]XNorm: 23.642814
Training: 2022-01-08 20:37:42,979-[lfw][114000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-01-08 20:37:42,979-[lfw][114000]Accuracy-Highest: 0.99817
Training: 2022-01-08 20:38:34,330-[cfp_fp][114000]XNorm: 22.002958
Training: 2022-01-08 20:38:34,331-[cfp_fp][114000]Accuracy-Flip: 0.99043+-0.00495
Training: 2022-01-08 20:38:34,331-[cfp_fp][114000]Accuracy-Highest: 0.99057
Training: 2022-01-08 20:39:18,603-[agedb_30][114000]XNorm: 23.458689
Training: 2022-01-08 20:39:18,604-[agedb_30][114000]Accuracy-Flip: 0.97733+-0.00943
Training: 2022-01-08 20:39:18,605-[agedb_30][114000]Accuracy-Highest: 0.97917
Training: 2022-01-08 20:39:26,034-Speed 278.41 samples/sec   Loss 5.0199   LearningRate 0.0674   Epoch: 10   Global Step: 114010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:39:33,478-Speed 5504.09 samples/sec   Loss 5.0805   LearningRate 0.0674   Epoch: 10   Global Step: 114020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:39:40,915-Speed 5509.11 samples/sec   Loss 5.0395   LearningRate 0.0674   Epoch: 10   Global Step: 114030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:39:48,354-Speed 5507.02 samples/sec   Loss 5.0449   LearningRate 0.0673   Epoch: 10   Global Step: 114040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:39:55,777-Speed 5519.91 samples/sec   Loss 5.0741   LearningRate 0.0673   Epoch: 10   Global Step: 114050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:40:03,191-Speed 5525.71 samples/sec   Loss 5.0447   LearningRate 0.0673   Epoch: 10   Global Step: 114060   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:40:26,587-Speed 1750.83 samples/sec   Loss 5.0380   LearningRate 0.0673   Epoch: 11   Global Step: 114070   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:40:34,058-Speed 5483.40 samples/sec   Loss 5.0296   LearningRate 0.0673   Epoch: 11   Global Step: 114080   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:40:41,564-Speed 5457.78 samples/sec   Loss 5.0638   LearningRate 0.0673   Epoch: 11   Global Step: 114090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 20:40:49,021-Speed 5494.10 samples/sec   Loss 4.9859   LearningRate 0.0673   Epoch: 11   Global Step: 114100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 20:40:56,419-Speed 5537.55 samples/sec   Loss 4.9813   LearningRate 0.0672   Epoch: 11   Global Step: 114110   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:41:03,851-Speed 5512.01 samples/sec   Loss 5.0120   LearningRate 0.0672   Epoch: 11   Global Step: 114120   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:41:11,285-Speed 5509.91 samples/sec   Loss 5.0429   LearningRate 0.0672   Epoch: 11   Global Step: 114130   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:41:18,741-Speed 5495.06 samples/sec   Loss 4.9895   LearningRate 0.0672   Epoch: 11   Global Step: 114140   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:41:26,140-Speed 5536.43 samples/sec   Loss 5.0340   LearningRate 0.0672   Epoch: 11   Global Step: 114150   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:41:33,605-Speed 5488.04 samples/sec   Loss 5.0652   LearningRate 0.0672   Epoch: 11   Global Step: 114160   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:41:41,402-Speed 5253.83 samples/sec   Loss 5.0171   LearningRate 0.0672   Epoch: 11   Global Step: 114170   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:41:49,093-Speed 5325.79 samples/sec   Loss 5.0094   LearningRate 0.0671   Epoch: 11   Global Step: 114180   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:41:56,840-Speed 5288.13 samples/sec   Loss 5.0225   LearningRate 0.0671   Epoch: 11   Global Step: 114190   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:42:04,577-Speed 5295.15 samples/sec   Loss 5.0312   LearningRate 0.0671   Epoch: 11   Global Step: 114200   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:42:12,341-Speed 5276.06 samples/sec   Loss 4.9800   LearningRate 0.0671   Epoch: 11   Global Step: 114210   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:42:20,070-Speed 5300.75 samples/sec   Loss 5.0054   LearningRate 0.0671   Epoch: 11   Global Step: 114220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:42:27,824-Speed 5282.72 samples/sec   Loss 4.9886   LearningRate 0.0671   Epoch: 11   Global Step: 114230   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:42:35,536-Speed 5311.71 samples/sec   Loss 4.9686   LearningRate 0.0671   Epoch: 11   Global Step: 114240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:42:43,282-Speed 5288.62 samples/sec   Loss 4.9683   LearningRate 0.0670   Epoch: 11   Global Step: 114250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:42:51,095-Speed 5243.69 samples/sec   Loss 5.0407   LearningRate 0.0670   Epoch: 11   Global Step: 114260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:42:58,789-Speed 5324.02 samples/sec   Loss 5.0153   LearningRate 0.0670   Epoch: 11   Global Step: 114270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:43:06,456-Speed 5343.24 samples/sec   Loss 5.0232   LearningRate 0.0670   Epoch: 11   Global Step: 114280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:43:14,098-Speed 5360.25 samples/sec   Loss 5.0189   LearningRate 0.0670   Epoch: 11   Global Step: 114290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:43:21,799-Speed 5319.50 samples/sec   Loss 5.0222   LearningRate 0.0670   Epoch: 11   Global Step: 114300   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:43:29,434-Speed 5365.54 samples/sec   Loss 5.0263   LearningRate 0.0670   Epoch: 11   Global Step: 114310   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:43:36,845-Speed 5527.70 samples/sec   Loss 4.9916   LearningRate 0.0669   Epoch: 11   Global Step: 114320   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 20:43:44,254-Speed 5528.76 samples/sec   Loss 5.0307   LearningRate 0.0669   Epoch: 11   Global Step: 114330   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 20:43:51,682-Speed 5515.43 samples/sec   Loss 5.0360   LearningRate 0.0669   Epoch: 11   Global Step: 114340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:43:59,103-Speed 5520.17 samples/sec   Loss 4.9866   LearningRate 0.0669   Epoch: 11   Global Step: 114350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:44:06,596-Speed 5467.32 samples/sec   Loss 4.9827   LearningRate 0.0669   Epoch: 11   Global Step: 114360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:44:14,066-Speed 5484.01 samples/sec   Loss 4.9961   LearningRate 0.0669   Epoch: 11   Global Step: 114370   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:44:21,504-Speed 5507.79 samples/sec   Loss 5.0203   LearningRate 0.0669   Epoch: 11   Global Step: 114380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:44:29,025-Speed 5446.53 samples/sec   Loss 5.0082   LearningRate 0.0668   Epoch: 11   Global Step: 114390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:44:36,476-Speed 5498.53 samples/sec   Loss 5.0131   LearningRate 0.0668   Epoch: 11   Global Step: 114400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:44:43,900-Speed 5517.44 samples/sec   Loss 5.0158   LearningRate 0.0668   Epoch: 11   Global Step: 114410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:44:51,342-Speed 5504.61 samples/sec   Loss 4.9578   LearningRate 0.0668   Epoch: 11   Global Step: 114420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:44:58,788-Speed 5502.28 samples/sec   Loss 4.9551   LearningRate 0.0668   Epoch: 11   Global Step: 114430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:45:06,273-Speed 5472.70 samples/sec   Loss 4.9862   LearningRate 0.0668   Epoch: 11   Global Step: 114440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:45:13,737-Speed 5488.57 samples/sec   Loss 4.9962   LearningRate 0.0668   Epoch: 11   Global Step: 114450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:45:21,328-Speed 5396.73 samples/sec   Loss 4.9906   LearningRate 0.0667   Epoch: 11   Global Step: 114460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:45:28,742-Speed 5525.29 samples/sec   Loss 5.0096   LearningRate 0.0667   Epoch: 11   Global Step: 114470   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:45:36,328-Speed 5400.47 samples/sec   Loss 4.9716   LearningRate 0.0667   Epoch: 11   Global Step: 114480   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:45:43,964-Speed 5364.96 samples/sec   Loss 5.0139   LearningRate 0.0667   Epoch: 11   Global Step: 114490   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:45:51,553-Speed 5397.21 samples/sec   Loss 4.9999   LearningRate 0.0667   Epoch: 11   Global Step: 114500   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:45:59,030-Speed 5479.30 samples/sec   Loss 5.0246   LearningRate 0.0667   Epoch: 11   Global Step: 114510   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:46:06,589-Speed 5419.71 samples/sec   Loss 5.0739   LearningRate 0.0666   Epoch: 11   Global Step: 114520   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:46:14,015-Speed 5516.12 samples/sec   Loss 5.0386   LearningRate 0.0666   Epoch: 11   Global Step: 114530   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:46:21,509-Speed 5466.49 samples/sec   Loss 5.0072   LearningRate 0.0666   Epoch: 11   Global Step: 114540   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:46:28,948-Speed 5506.84 samples/sec   Loss 4.9597   LearningRate 0.0666   Epoch: 11   Global Step: 114550   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:46:36,440-Speed 5468.21 samples/sec   Loss 4.9674   LearningRate 0.0666   Epoch: 11   Global Step: 114560   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:46:43,928-Speed 5470.13 samples/sec   Loss 5.0168   LearningRate 0.0666   Epoch: 11   Global Step: 114570   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:46:51,425-Speed 5464.25 samples/sec   Loss 5.0354   LearningRate 0.0666   Epoch: 11   Global Step: 114580   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:46:58,979-Speed 5423.55 samples/sec   Loss 5.0103   LearningRate 0.0665   Epoch: 11   Global Step: 114590   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:47:06,417-Speed 5507.81 samples/sec   Loss 4.9970   LearningRate 0.0665   Epoch: 11   Global Step: 114600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:47:13,842-Speed 5516.74 samples/sec   Loss 4.9442   LearningRate 0.0665   Epoch: 11   Global Step: 114610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:47:21,326-Speed 5474.14 samples/sec   Loss 4.9597   LearningRate 0.0665   Epoch: 11   Global Step: 114620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:47:28,800-Speed 5481.11 samples/sec   Loss 5.0306   LearningRate 0.0665   Epoch: 11   Global Step: 114630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:47:36,256-Speed 5493.82 samples/sec   Loss 4.9958   LearningRate 0.0665   Epoch: 11   Global Step: 114640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:47:43,720-Speed 5489.11 samples/sec   Loss 4.9818   LearningRate 0.0665   Epoch: 11   Global Step: 114650   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:47:51,169-Speed 5499.19 samples/sec   Loss 5.0118   LearningRate 0.0664   Epoch: 11   Global Step: 114660   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:47:58,612-Speed 5504.02 samples/sec   Loss 4.9889   LearningRate 0.0664   Epoch: 11   Global Step: 114670   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:48:06,170-Speed 5419.77 samples/sec   Loss 5.0071   LearningRate 0.0664   Epoch: 11   Global Step: 114680   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:48:13,746-Speed 5407.64 samples/sec   Loss 4.9976   LearningRate 0.0664   Epoch: 11   Global Step: 114690   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:48:21,280-Speed 5437.38 samples/sec   Loss 4.9682   LearningRate 0.0664   Epoch: 11   Global Step: 114700   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:48:28,711-Speed 5512.82 samples/sec   Loss 5.0013   LearningRate 0.0664   Epoch: 11   Global Step: 114710   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:48:36,304-Speed 5394.72 samples/sec   Loss 4.9767   LearningRate 0.0664   Epoch: 11   Global Step: 114720   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:48:43,825-Speed 5446.74 samples/sec   Loss 4.9683   LearningRate 0.0663   Epoch: 11   Global Step: 114730   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:48:51,315-Speed 5469.49 samples/sec   Loss 5.0183   LearningRate 0.0663   Epoch: 11   Global Step: 114740   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:48:58,844-Speed 5441.23 samples/sec   Loss 5.0169   LearningRate 0.0663   Epoch: 11   Global Step: 114750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:49:06,361-Speed 5450.10 samples/sec   Loss 4.9905   LearningRate 0.0663   Epoch: 11   Global Step: 114760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:49:13,894-Speed 5437.65 samples/sec   Loss 4.9788   LearningRate 0.0663   Epoch: 11   Global Step: 114770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:49:21,458-Speed 5415.98 samples/sec   Loss 4.9860   LearningRate 0.0663   Epoch: 11   Global Step: 114780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:49:28,942-Speed 5473.40 samples/sec   Loss 4.9623   LearningRate 0.0663   Epoch: 11   Global Step: 114790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:49:36,385-Speed 5504.37 samples/sec   Loss 5.0557   LearningRate 0.0662   Epoch: 11   Global Step: 114800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:49:43,799-Speed 5524.81 samples/sec   Loss 5.0233   LearningRate 0.0662   Epoch: 11   Global Step: 114810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:49:51,287-Speed 5471.42 samples/sec   Loss 4.9641   LearningRate 0.0662   Epoch: 11   Global Step: 114820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:49:58,769-Speed 5475.31 samples/sec   Loss 4.9581   LearningRate 0.0662   Epoch: 11   Global Step: 114830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:50:06,204-Speed 5509.67 samples/sec   Loss 4.9797   LearningRate 0.0662   Epoch: 11   Global Step: 114840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:50:13,692-Speed 5470.59 samples/sec   Loss 5.0229   LearningRate 0.0662   Epoch: 11   Global Step: 114850   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 20:50:21,200-Speed 5456.25 samples/sec   Loss 4.9704   LearningRate 0.0662   Epoch: 11   Global Step: 114860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 20:50:28,736-Speed 5436.09 samples/sec   Loss 5.0139   LearningRate 0.0661   Epoch: 11   Global Step: 114870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 20:50:36,194-Speed 5493.32 samples/sec   Loss 5.0326   LearningRate 0.0661   Epoch: 11   Global Step: 114880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:50:43,627-Speed 5511.14 samples/sec   Loss 5.0253   LearningRate 0.0661   Epoch: 11   Global Step: 114890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:50:51,145-Speed 5448.84 samples/sec   Loss 4.9310   LearningRate 0.0661   Epoch: 11   Global Step: 114900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:50:58,654-Speed 5455.65 samples/sec   Loss 4.9672   LearningRate 0.0661   Epoch: 11   Global Step: 114910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:51:06,103-Speed 5498.93 samples/sec   Loss 5.0189   LearningRate 0.0661   Epoch: 11   Global Step: 114920   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:51:13,724-Speed 5375.69 samples/sec   Loss 4.9731   LearningRate 0.0661   Epoch: 11   Global Step: 114930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:51:21,286-Speed 5417.46 samples/sec   Loss 4.9492   LearningRate 0.0660   Epoch: 11   Global Step: 114940   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:51:28,787-Speed 5460.63 samples/sec   Loss 4.9486   LearningRate 0.0660   Epoch: 11   Global Step: 114950   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:51:36,265-Speed 5478.32 samples/sec   Loss 5.0150   LearningRate 0.0660   Epoch: 11   Global Step: 114960   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:51:43,779-Speed 5452.16 samples/sec   Loss 5.0035   LearningRate 0.0660   Epoch: 11   Global Step: 114970   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:51:51,269-Speed 5468.90 samples/sec   Loss 4.9817   LearningRate 0.0660   Epoch: 11   Global Step: 114980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 20:51:58,852-Speed 5402.71 samples/sec   Loss 4.9543   LearningRate 0.0660   Epoch: 11   Global Step: 114990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 20:52:06,350-Speed 5462.88 samples/sec   Loss 5.0105   LearningRate 0.0660   Epoch: 11   Global Step: 115000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 20:52:13,861-Speed 5454.87 samples/sec   Loss 4.9634   LearningRate 0.0659   Epoch: 11   Global Step: 115010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:52:21,326-Speed 5487.02 samples/sec   Loss 4.9829   LearningRate 0.0659   Epoch: 11   Global Step: 115020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:52:28,760-Speed 5510.54 samples/sec   Loss 5.0173   LearningRate 0.0659   Epoch: 11   Global Step: 115030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:52:36,211-Speed 5498.69 samples/sec   Loss 4.9668   LearningRate 0.0659   Epoch: 11   Global Step: 115040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:52:43,587-Speed 5553.66 samples/sec   Loss 4.9903   LearningRate 0.0659   Epoch: 11   Global Step: 115050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:52:51,070-Speed 5474.29 samples/sec   Loss 4.9878   LearningRate 0.0659   Epoch: 11   Global Step: 115060   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:52:58,574-Speed 5458.87 samples/sec   Loss 4.9830   LearningRate 0.0659   Epoch: 11   Global Step: 115070   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:53:06,016-Speed 5504.95 samples/sec   Loss 4.9111   LearningRate 0.0658   Epoch: 11   Global Step: 115080   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:53:13,422-Speed 5531.50 samples/sec   Loss 5.0044   LearningRate 0.0658   Epoch: 11   Global Step: 115090   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:53:20,892-Speed 5483.70 samples/sec   Loss 5.0019   LearningRate 0.0658   Epoch: 11   Global Step: 115100   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:53:28,379-Speed 5471.79 samples/sec   Loss 5.0080   LearningRate 0.0658   Epoch: 11   Global Step: 115110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 20:53:35,850-Speed 5483.02 samples/sec   Loss 5.0452   LearningRate 0.0658   Epoch: 11   Global Step: 115120   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:53:43,283-Speed 5511.62 samples/sec   Loss 4.9887   LearningRate 0.0658   Epoch: 11   Global Step: 115130   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:53:50,766-Speed 5474.23 samples/sec   Loss 4.9519   LearningRate 0.0658   Epoch: 11   Global Step: 115140   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:53:58,214-Speed 5500.39 samples/sec   Loss 4.9795   LearningRate 0.0657   Epoch: 11   Global Step: 115150   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:54:05,629-Speed 5524.71 samples/sec   Loss 4.9970   LearningRate 0.0657   Epoch: 11   Global Step: 115160   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:54:13,068-Speed 5506.76 samples/sec   Loss 4.9740   LearningRate 0.0657   Epoch: 11   Global Step: 115170   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:54:20,571-Speed 5460.10 samples/sec   Loss 5.0148   LearningRate 0.0657   Epoch: 11   Global Step: 115180   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:54:28,026-Speed 5495.04 samples/sec   Loss 4.9510   LearningRate 0.0657   Epoch: 11   Global Step: 115190   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:54:35,499-Speed 5481.66 samples/sec   Loss 4.9980   LearningRate 0.0657   Epoch: 11   Global Step: 115200   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:54:42,942-Speed 5503.80 samples/sec   Loss 4.9937   LearningRate 0.0657   Epoch: 11   Global Step: 115210   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:54:50,457-Speed 5451.63 samples/sec   Loss 5.0076   LearningRate 0.0656   Epoch: 11   Global Step: 115220   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:54:57,897-Speed 5505.92 samples/sec   Loss 4.9858   LearningRate 0.0656   Epoch: 11   Global Step: 115230   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:55:05,372-Speed 5480.11 samples/sec   Loss 4.9530   LearningRate 0.0656   Epoch: 11   Global Step: 115240   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:55:12,798-Speed 5517.04 samples/sec   Loss 4.9736   LearningRate 0.0656   Epoch: 11   Global Step: 115250   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:55:20,292-Speed 5466.18 samples/sec   Loss 4.9651   LearningRate 0.0656   Epoch: 11   Global Step: 115260   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:55:27,751-Speed 5491.72 samples/sec   Loss 4.9436   LearningRate 0.0656   Epoch: 11   Global Step: 115270   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 20:55:35,190-Speed 5507.32 samples/sec   Loss 4.9528   LearningRate 0.0656   Epoch: 11   Global Step: 115280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:55:42,666-Speed 5479.57 samples/sec   Loss 4.9605   LearningRate 0.0655   Epoch: 11   Global Step: 115290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:55:50,111-Speed 5502.70 samples/sec   Loss 5.0041   LearningRate 0.0655   Epoch: 11   Global Step: 115300   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:55:57,535-Speed 5517.43 samples/sec   Loss 4.9596   LearningRate 0.0655   Epoch: 11   Global Step: 115310   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:56:05,036-Speed 5461.48 samples/sec   Loss 4.9898   LearningRate 0.0655   Epoch: 11   Global Step: 115320   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:56:12,526-Speed 5469.49 samples/sec   Loss 4.9739   LearningRate 0.0655   Epoch: 11   Global Step: 115330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:56:20,004-Speed 5477.90 samples/sec   Loss 4.9493   LearningRate 0.0655   Epoch: 11   Global Step: 115340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:56:27,476-Speed 5482.47 samples/sec   Loss 4.9565   LearningRate 0.0655   Epoch: 11   Global Step: 115350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:56:34,960-Speed 5473.75 samples/sec   Loss 4.9533   LearningRate 0.0654   Epoch: 11   Global Step: 115360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:56:42,519-Speed 5419.82 samples/sec   Loss 4.9114   LearningRate 0.0654   Epoch: 11   Global Step: 115370   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:56:50,071-Speed 5423.97 samples/sec   Loss 4.9720   LearningRate 0.0654   Epoch: 11   Global Step: 115380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 20:56:57,513-Speed 5504.62 samples/sec   Loss 4.9400   LearningRate 0.0654   Epoch: 11   Global Step: 115390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 20:57:04,973-Speed 5490.91 samples/sec   Loss 4.9152   LearningRate 0.0654   Epoch: 11   Global Step: 115400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:57:12,402-Speed 5514.80 samples/sec   Loss 4.8933   LearningRate 0.0654   Epoch: 11   Global Step: 115410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:57:19,858-Speed 5494.47 samples/sec   Loss 4.9304   LearningRate 0.0654   Epoch: 11   Global Step: 115420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:57:27,299-Speed 5505.38 samples/sec   Loss 4.9344   LearningRate 0.0653   Epoch: 11   Global Step: 115430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:57:34,816-Speed 5449.85 samples/sec   Loss 4.9356   LearningRate 0.0653   Epoch: 11   Global Step: 115440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:57:42,290-Speed 5480.95 samples/sec   Loss 4.9735   LearningRate 0.0653   Epoch: 11   Global Step: 115450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:57:49,800-Speed 5454.58 samples/sec   Loss 4.9698   LearningRate 0.0653   Epoch: 11   Global Step: 115460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:57:57,368-Speed 5413.38 samples/sec   Loss 4.9515   LearningRate 0.0653   Epoch: 11   Global Step: 115470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:58:04,884-Speed 5450.02 samples/sec   Loss 4.9664   LearningRate 0.0653   Epoch: 11   Global Step: 115480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:58:12,349-Speed 5487.97 samples/sec   Loss 4.9845   LearningRate 0.0653   Epoch: 11   Global Step: 115490   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:58:19,832-Speed 5474.49 samples/sec   Loss 4.9425   LearningRate 0.0653   Epoch: 11   Global Step: 115500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 20:58:27,256-Speed 5518.22 samples/sec   Loss 4.9635   LearningRate 0.0652   Epoch: 11   Global Step: 115510   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:58:34,744-Speed 5470.55 samples/sec   Loss 4.8839   LearningRate 0.0652   Epoch: 11   Global Step: 115520   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:58:42,186-Speed 5504.70 samples/sec   Loss 4.9965   LearningRate 0.0652   Epoch: 11   Global Step: 115530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:58:49,649-Speed 5488.87 samples/sec   Loss 4.9946   LearningRate 0.0652   Epoch: 11   Global Step: 115540   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:58:57,140-Speed 5468.68 samples/sec   Loss 4.9078   LearningRate 0.0652   Epoch: 11   Global Step: 115550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:59:04,620-Speed 5476.58 samples/sec   Loss 4.9597   LearningRate 0.0652   Epoch: 11   Global Step: 115560   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:59:12,107-Speed 5471.81 samples/sec   Loss 4.9413   LearningRate 0.0652   Epoch: 11   Global Step: 115570   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:59:19,684-Speed 5406.27 samples/sec   Loss 4.9427   LearningRate 0.0651   Epoch: 11   Global Step: 115580   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:59:27,205-Speed 5447.31 samples/sec   Loss 4.9425   LearningRate 0.0651   Epoch: 11   Global Step: 115590   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:59:34,613-Speed 5529.49 samples/sec   Loss 4.9179   LearningRate 0.0651   Epoch: 11   Global Step: 115600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:59:42,062-Speed 5499.27 samples/sec   Loss 4.9299   LearningRate 0.0651   Epoch: 11   Global Step: 115610   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 20:59:49,505-Speed 5504.60 samples/sec   Loss 4.9656   LearningRate 0.0651   Epoch: 11   Global Step: 115620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 20:59:56,965-Speed 5491.01 samples/sec   Loss 4.9835   LearningRate 0.0651   Epoch: 11   Global Step: 115630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:00:04,451-Speed 5472.03 samples/sec   Loss 4.9165   LearningRate 0.0651   Epoch: 11   Global Step: 115640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:00:11,896-Speed 5502.65 samples/sec   Loss 4.9345   LearningRate 0.0650   Epoch: 11   Global Step: 115650   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:00:19,337-Speed 5505.07 samples/sec   Loss 4.9567   LearningRate 0.0650   Epoch: 11   Global Step: 115660   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:00:26,865-Speed 5442.30 samples/sec   Loss 4.9206   LearningRate 0.0650   Epoch: 11   Global Step: 115670   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:00:34,394-Speed 5440.43 samples/sec   Loss 4.9978   LearningRate 0.0650   Epoch: 11   Global Step: 115680   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:00:41,970-Speed 5407.81 samples/sec   Loss 4.9260   LearningRate 0.0650   Epoch: 11   Global Step: 115690   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:00:49,443-Speed 5481.78 samples/sec   Loss 4.9486   LearningRate 0.0650   Epoch: 11   Global Step: 115700   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:00:56,903-Speed 5490.81 samples/sec   Loss 4.9743   LearningRate 0.0650   Epoch: 11   Global Step: 115710   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:01:04,391-Speed 5471.10 samples/sec   Loss 4.8868   LearningRate 0.0649   Epoch: 11   Global Step: 115720   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:01:11,812-Speed 5520.07 samples/sec   Loss 4.9022   LearningRate 0.0649   Epoch: 11   Global Step: 115730   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:01:19,256-Speed 5503.05 samples/sec   Loss 4.9634   LearningRate 0.0649   Epoch: 11   Global Step: 115740   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:01:26,762-Speed 5457.50 samples/sec   Loss 4.8932   LearningRate 0.0649   Epoch: 11   Global Step: 115750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:01:34,280-Speed 5449.40 samples/sec   Loss 4.9692   LearningRate 0.0649   Epoch: 11   Global Step: 115760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:01:41,790-Speed 5454.35 samples/sec   Loss 4.8988   LearningRate 0.0649   Epoch: 11   Global Step: 115770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:01:49,337-Speed 5428.37 samples/sec   Loss 4.9084   LearningRate 0.0649   Epoch: 11   Global Step: 115780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:01:56,877-Speed 5433.24 samples/sec   Loss 4.9437   LearningRate 0.0648   Epoch: 11   Global Step: 115790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:02:04,483-Speed 5385.87 samples/sec   Loss 4.9539   LearningRate 0.0648   Epoch: 11   Global Step: 115800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:02:11,940-Speed 5492.78 samples/sec   Loss 4.9816   LearningRate 0.0648   Epoch: 11   Global Step: 115810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:02:19,349-Speed 5529.40 samples/sec   Loss 4.9540   LearningRate 0.0648   Epoch: 11   Global Step: 115820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:02:26,836-Speed 5471.51 samples/sec   Loss 4.9819   LearningRate 0.0648   Epoch: 11   Global Step: 115830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:02:34,313-Speed 5479.28 samples/sec   Loss 4.8895   LearningRate 0.0648   Epoch: 11   Global Step: 115840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:02:41,814-Speed 5461.33 samples/sec   Loss 4.9664   LearningRate 0.0648   Epoch: 11   Global Step: 115850   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:02:49,292-Speed 5478.06 samples/sec   Loss 4.9157   LearningRate 0.0647   Epoch: 11   Global Step: 115860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:02:56,759-Speed 5486.20 samples/sec   Loss 4.9460   LearningRate 0.0647   Epoch: 11   Global Step: 115870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:03:04,308-Speed 5426.33 samples/sec   Loss 4.9830   LearningRate 0.0647   Epoch: 11   Global Step: 115880   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:03:11,831-Speed 5445.47 samples/sec   Loss 4.9563   LearningRate 0.0647   Epoch: 11   Global Step: 115890   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:03:19,448-Speed 5378.08 samples/sec   Loss 4.9340   LearningRate 0.0647   Epoch: 11   Global Step: 115900   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:03:27,069-Speed 5375.64 samples/sec   Loss 4.9565   LearningRate 0.0647   Epoch: 11   Global Step: 115910   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:03:34,523-Speed 5495.84 samples/sec   Loss 4.9310   LearningRate 0.0647   Epoch: 11   Global Step: 115920   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:03:41,988-Speed 5487.20 samples/sec   Loss 4.9199   LearningRate 0.0646   Epoch: 11   Global Step: 115930   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:03:49,473-Speed 5473.14 samples/sec   Loss 4.8589   LearningRate 0.0646   Epoch: 11   Global Step: 115940   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:03:57,038-Speed 5414.77 samples/sec   Loss 4.8849   LearningRate 0.0646   Epoch: 11   Global Step: 115950   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:04:04,534-Speed 5465.39 samples/sec   Loss 4.9086   LearningRate 0.0646   Epoch: 11   Global Step: 115960   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:04:12,044-Speed 5454.89 samples/sec   Loss 4.9009   LearningRate 0.0646   Epoch: 11   Global Step: 115970   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:04:19,525-Speed 5475.72 samples/sec   Loss 4.9285   LearningRate 0.0646   Epoch: 11   Global Step: 115980   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:04:27,071-Speed 5429.14 samples/sec   Loss 4.9422   LearningRate 0.0646   Epoch: 11   Global Step: 115990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:04:34,656-Speed 5401.01 samples/sec   Loss 4.8840   LearningRate 0.0645   Epoch: 11   Global Step: 116000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:05:18,372-[lfw][116000]XNorm: 22.277321
Training: 2022-01-08 21:05:18,372-[lfw][116000]Accuracy-Flip: 0.99783+-0.00279
Training: 2022-01-08 21:05:18,373-[lfw][116000]Accuracy-Highest: 0.99817
Training: 2022-01-08 21:06:09,474-[cfp_fp][116000]XNorm: 20.367996
Training: 2022-01-08 21:06:09,474-[cfp_fp][116000]Accuracy-Flip: 0.98986+-0.00621
Training: 2022-01-08 21:06:09,475-[cfp_fp][116000]Accuracy-Highest: 0.99057
Training: 2022-01-08 21:06:53,770-[agedb_30][116000]XNorm: 22.038671
Training: 2022-01-08 21:06:53,771-[agedb_30][116000]Accuracy-Flip: 0.97550+-0.00796
Training: 2022-01-08 21:06:53,772-[agedb_30][116000]Accuracy-Highest: 0.97917
Training: 2022-01-08 21:07:00,982-Speed 279.92 samples/sec   Loss 4.9729   LearningRate 0.0645   Epoch: 11   Global Step: 116010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:07:08,565-Speed 5401.90 samples/sec   Loss 4.9270   LearningRate 0.0645   Epoch: 11   Global Step: 116020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:07:16,041-Speed 5479.64 samples/sec   Loss 4.8882   LearningRate 0.0645   Epoch: 11   Global Step: 116030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:07:23,585-Speed 5431.38 samples/sec   Loss 4.9377   LearningRate 0.0645   Epoch: 11   Global Step: 116040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:07:31,070-Speed 5473.61 samples/sec   Loss 4.9531   LearningRate 0.0645   Epoch: 11   Global Step: 116050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:07:38,630-Speed 5419.76 samples/sec   Loss 4.9500   LearningRate 0.0645   Epoch: 11   Global Step: 116060   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:07:46,104-Speed 5480.89 samples/sec   Loss 4.9379   LearningRate 0.0644   Epoch: 11   Global Step: 116070   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:07:53,563-Speed 5492.59 samples/sec   Loss 4.9239   LearningRate 0.0644   Epoch: 11   Global Step: 116080   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:08:01,055-Speed 5467.57 samples/sec   Loss 4.9721   LearningRate 0.0644   Epoch: 11   Global Step: 116090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:08:08,550-Speed 5466.04 samples/sec   Loss 4.9267   LearningRate 0.0644   Epoch: 11   Global Step: 116100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:08:16,040-Speed 5469.67 samples/sec   Loss 4.9075   LearningRate 0.0644   Epoch: 11   Global Step: 116110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:08:23,479-Speed 5506.88 samples/sec   Loss 4.8758   LearningRate 0.0644   Epoch: 11   Global Step: 116120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:08:30,931-Speed 5497.08 samples/sec   Loss 4.9073   LearningRate 0.0644   Epoch: 11   Global Step: 116130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:08:38,474-Speed 5430.85 samples/sec   Loss 4.9308   LearningRate 0.0643   Epoch: 11   Global Step: 116140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:08:45,948-Speed 5480.70 samples/sec   Loss 4.9161   LearningRate 0.0643   Epoch: 11   Global Step: 116150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:08:53,453-Speed 5463.03 samples/sec   Loss 4.8775   LearningRate 0.0643   Epoch: 11   Global Step: 116160   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:09:00,995-Speed 5431.26 samples/sec   Loss 4.9036   LearningRate 0.0643   Epoch: 11   Global Step: 116170   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:09:08,456-Speed 5491.00 samples/sec   Loss 4.8871   LearningRate 0.0643   Epoch: 11   Global Step: 116180   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:09:15,894-Speed 5506.74 samples/sec   Loss 4.8960   LearningRate 0.0643   Epoch: 11   Global Step: 116190   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:09:23,358-Speed 5489.03 samples/sec   Loss 4.9077   LearningRate 0.0643   Epoch: 11   Global Step: 116200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:09:30,801-Speed 5503.76 samples/sec   Loss 4.9072   LearningRate 0.0642   Epoch: 11   Global Step: 116210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:09:38,261-Speed 5491.33 samples/sec   Loss 4.8598   LearningRate 0.0642   Epoch: 11   Global Step: 116220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:09:45,708-Speed 5500.70 samples/sec   Loss 4.8525   LearningRate 0.0642   Epoch: 11   Global Step: 116230   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:09:53,246-Speed 5434.75 samples/sec   Loss 4.8893   LearningRate 0.0642   Epoch: 11   Global Step: 116240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:10:00,696-Speed 5499.27 samples/sec   Loss 4.9590   LearningRate 0.0642   Epoch: 11   Global Step: 116250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:10:08,221-Speed 5443.86 samples/sec   Loss 4.8654   LearningRate 0.0642   Epoch: 11   Global Step: 116260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:10:15,688-Speed 5486.34 samples/sec   Loss 4.8911   LearningRate 0.0642   Epoch: 11   Global Step: 116270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:10:23,188-Speed 5461.22 samples/sec   Loss 4.9235   LearningRate 0.0641   Epoch: 11   Global Step: 116280   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:10:30,634-Speed 5502.36 samples/sec   Loss 4.9029   LearningRate 0.0641   Epoch: 11   Global Step: 116290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:10:38,101-Speed 5486.61 samples/sec   Loss 4.9118   LearningRate 0.0641   Epoch: 11   Global Step: 116300   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:10:45,555-Speed 5495.21 samples/sec   Loss 4.8929   LearningRate 0.0641   Epoch: 11   Global Step: 116310   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:10:53,135-Speed 5404.77 samples/sec   Loss 4.9213   LearningRate 0.0641   Epoch: 11   Global Step: 116320   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:11:00,666-Speed 5439.70 samples/sec   Loss 4.8730   LearningRate 0.0641   Epoch: 11   Global Step: 116330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:11:08,194-Speed 5441.69 samples/sec   Loss 4.9035   LearningRate 0.0641   Epoch: 11   Global Step: 116340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:11:15,794-Speed 5390.17 samples/sec   Loss 4.9561   LearningRate 0.0640   Epoch: 11   Global Step: 116350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:11:23,242-Speed 5499.76 samples/sec   Loss 4.9497   LearningRate 0.0640   Epoch: 11   Global Step: 116360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:11:30,767-Speed 5444.57 samples/sec   Loss 4.9511   LearningRate 0.0640   Epoch: 11   Global Step: 116370   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:11:38,215-Speed 5499.66 samples/sec   Loss 4.9061   LearningRate 0.0640   Epoch: 11   Global Step: 116380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:11:45,743-Speed 5441.93 samples/sec   Loss 4.9431   LearningRate 0.0640   Epoch: 11   Global Step: 116390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:11:53,237-Speed 5466.78 samples/sec   Loss 4.9059   LearningRate 0.0640   Epoch: 11   Global Step: 116400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:12:00,735-Speed 5463.31 samples/sec   Loss 4.8964   LearningRate 0.0640   Epoch: 11   Global Step: 116410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:12:08,290-Speed 5422.16 samples/sec   Loss 4.9514   LearningRate 0.0640   Epoch: 11   Global Step: 116420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:12:15,812-Speed 5446.35 samples/sec   Loss 4.8788   LearningRate 0.0639   Epoch: 11   Global Step: 116430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:12:23,255-Speed 5503.77 samples/sec   Loss 4.9060   LearningRate 0.0639   Epoch: 11   Global Step: 116440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:12:30,730-Speed 5480.14 samples/sec   Loss 4.9326   LearningRate 0.0639   Epoch: 11   Global Step: 116450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:12:38,258-Speed 5442.43 samples/sec   Loss 4.9055   LearningRate 0.0639   Epoch: 11   Global Step: 116460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:12:45,759-Speed 5460.44 samples/sec   Loss 4.9270   LearningRate 0.0639   Epoch: 11   Global Step: 116470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:12:53,296-Speed 5435.48 samples/sec   Loss 4.9534   LearningRate 0.0639   Epoch: 11   Global Step: 116480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:13:00,780-Speed 5473.94 samples/sec   Loss 4.9288   LearningRate 0.0639   Epoch: 11   Global Step: 116490   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:13:08,351-Speed 5412.56 samples/sec   Loss 4.9029   LearningRate 0.0638   Epoch: 11   Global Step: 116500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:13:15,904-Speed 5423.22 samples/sec   Loss 4.9015   LearningRate 0.0638   Epoch: 11   Global Step: 116510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:13:23,381-Speed 5479.09 samples/sec   Loss 4.9191   LearningRate 0.0638   Epoch: 11   Global Step: 116520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:13:30,833-Speed 5496.86 samples/sec   Loss 4.8863   LearningRate 0.0638   Epoch: 11   Global Step: 116530   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:13:38,369-Speed 5436.46 samples/sec   Loss 4.9091   LearningRate 0.0638   Epoch: 11   Global Step: 116540   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:13:45,863-Speed 5466.82 samples/sec   Loss 4.8996   LearningRate 0.0638   Epoch: 11   Global Step: 116550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:13:53,326-Speed 5488.37 samples/sec   Loss 4.9121   LearningRate 0.0638   Epoch: 11   Global Step: 116560   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:14:00,899-Speed 5409.99 samples/sec   Loss 4.8932   LearningRate 0.0637   Epoch: 11   Global Step: 116570   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:14:08,360-Speed 5490.47 samples/sec   Loss 4.8736   LearningRate 0.0637   Epoch: 11   Global Step: 116580   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:14:15,886-Speed 5442.87 samples/sec   Loss 4.8877   LearningRate 0.0637   Epoch: 11   Global Step: 116590   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:14:23,442-Speed 5421.86 samples/sec   Loss 4.9216   LearningRate 0.0637   Epoch: 11   Global Step: 116600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:14:30,881-Speed 5507.21 samples/sec   Loss 4.9054   LearningRate 0.0637   Epoch: 11   Global Step: 116610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:14:38,285-Speed 5532.55 samples/sec   Loss 4.8632   LearningRate 0.0637   Epoch: 11   Global Step: 116620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:14:45,750-Speed 5487.77 samples/sec   Loss 4.8807   LearningRate 0.0637   Epoch: 11   Global Step: 116630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:14:53,203-Speed 5496.33 samples/sec   Loss 4.8686   LearningRate 0.0636   Epoch: 11   Global Step: 116640   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:15:00,645-Speed 5505.22 samples/sec   Loss 4.8818   LearningRate 0.0636   Epoch: 11   Global Step: 116650   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:15:08,054-Speed 5528.80 samples/sec   Loss 4.8932   LearningRate 0.0636   Epoch: 11   Global Step: 116660   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:15:15,511-Speed 5493.75 samples/sec   Loss 4.8688   LearningRate 0.0636   Epoch: 11   Global Step: 116670   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:15:22,961-Speed 5498.65 samples/sec   Loss 4.8785   LearningRate 0.0636   Epoch: 11   Global Step: 116680   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:15:30,428-Speed 5485.97 samples/sec   Loss 4.9369   LearningRate 0.0636   Epoch: 11   Global Step: 116690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:15:37,903-Speed 5481.17 samples/sec   Loss 4.8874   LearningRate 0.0636   Epoch: 11   Global Step: 116700   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:15:45,348-Speed 5501.53 samples/sec   Loss 4.9091   LearningRate 0.0635   Epoch: 11   Global Step: 116710   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:15:52,783-Speed 5510.07 samples/sec   Loss 4.8604   LearningRate 0.0635   Epoch: 11   Global Step: 116720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:16:00,237-Speed 5496.04 samples/sec   Loss 4.8975   LearningRate 0.0635   Epoch: 11   Global Step: 116730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:16:07,746-Speed 5455.47 samples/sec   Loss 4.8324   LearningRate 0.0635   Epoch: 11   Global Step: 116740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:16:15,254-Speed 5456.43 samples/sec   Loss 4.8814   LearningRate 0.0635   Epoch: 11   Global Step: 116750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:16:22,672-Speed 5522.06 samples/sec   Loss 4.9180   LearningRate 0.0635   Epoch: 11   Global Step: 116760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:16:30,071-Speed 5536.93 samples/sec   Loss 4.8511   LearningRate 0.0635   Epoch: 11   Global Step: 116770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:16:37,572-Speed 5461.45 samples/sec   Loss 4.8140   LearningRate 0.0634   Epoch: 11   Global Step: 116780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:16:44,951-Speed 5551.33 samples/sec   Loss 4.8897   LearningRate 0.0634   Epoch: 11   Global Step: 116790   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:16:52,441-Speed 5469.74 samples/sec   Loss 4.9275   LearningRate 0.0634   Epoch: 11   Global Step: 116800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:16:59,942-Speed 5461.22 samples/sec   Loss 4.9040   LearningRate 0.0634   Epoch: 11   Global Step: 116810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:17:07,415-Speed 5481.98 samples/sec   Loss 4.9297   LearningRate 0.0634   Epoch: 11   Global Step: 116820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:17:14,960-Speed 5429.66 samples/sec   Loss 4.9058   LearningRate 0.0634   Epoch: 11   Global Step: 116830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:17:22,402-Speed 5504.47 samples/sec   Loss 4.8632   LearningRate 0.0634   Epoch: 11   Global Step: 116840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:17:29,959-Speed 5421.05 samples/sec   Loss 4.8569   LearningRate 0.0633   Epoch: 11   Global Step: 116850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:17:37,526-Speed 5413.57 samples/sec   Loss 4.8284   LearningRate 0.0633   Epoch: 11   Global Step: 116860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:17:45,082-Speed 5421.46 samples/sec   Loss 4.8871   LearningRate 0.0633   Epoch: 11   Global Step: 116870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:17:52,623-Speed 5431.91 samples/sec   Loss 4.8568   LearningRate 0.0633   Epoch: 11   Global Step: 116880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:18:00,188-Speed 5415.03 samples/sec   Loss 4.9733   LearningRate 0.0633   Epoch: 11   Global Step: 116890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:18:07,710-Speed 5446.09 samples/sec   Loss 4.9544   LearningRate 0.0633   Epoch: 11   Global Step: 116900   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:18:15,230-Speed 5447.92 samples/sec   Loss 4.8511   LearningRate 0.0633   Epoch: 11   Global Step: 116910   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:18:22,752-Speed 5445.73 samples/sec   Loss 4.9006   LearningRate 0.0632   Epoch: 11   Global Step: 116920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:18:30,355-Speed 5388.24 samples/sec   Loss 4.8859   LearningRate 0.0632   Epoch: 11   Global Step: 116930   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:18:37,857-Speed 5460.77 samples/sec   Loss 4.8710   LearningRate 0.0632   Epoch: 11   Global Step: 116940   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:18:45,373-Speed 5450.52 samples/sec   Loss 4.8678   LearningRate 0.0632   Epoch: 11   Global Step: 116950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:18:52,821-Speed 5499.84 samples/sec   Loss 4.8482   LearningRate 0.0632   Epoch: 11   Global Step: 116960   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:19:00,242-Speed 5520.21 samples/sec   Loss 4.9201   LearningRate 0.0632   Epoch: 11   Global Step: 116970   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:19:07,751-Speed 5455.81 samples/sec   Loss 4.8699   LearningRate 0.0632   Epoch: 11   Global Step: 116980   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:19:15,212-Speed 5490.30 samples/sec   Loss 4.8789   LearningRate 0.0632   Epoch: 11   Global Step: 116990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:19:22,682-Speed 5484.09 samples/sec   Loss 4.9084   LearningRate 0.0631   Epoch: 11   Global Step: 117000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:19:30,144-Speed 5490.27 samples/sec   Loss 4.8932   LearningRate 0.0631   Epoch: 11   Global Step: 117010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:19:37,635-Speed 5468.81 samples/sec   Loss 4.9003   LearningRate 0.0631   Epoch: 11   Global Step: 117020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:19:45,088-Speed 5496.62 samples/sec   Loss 4.8783   LearningRate 0.0631   Epoch: 11   Global Step: 117030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:19:52,607-Speed 5447.76 samples/sec   Loss 4.8698   LearningRate 0.0631   Epoch: 11   Global Step: 117040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:20:00,124-Speed 5449.35 samples/sec   Loss 4.8911   LearningRate 0.0631   Epoch: 11   Global Step: 117050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:20:07,586-Speed 5490.14 samples/sec   Loss 4.8269   LearningRate 0.0631   Epoch: 11   Global Step: 117060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:20:15,130-Speed 5430.57 samples/sec   Loss 4.8987   LearningRate 0.0630   Epoch: 11   Global Step: 117070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:20:22,693-Speed 5416.58 samples/sec   Loss 4.8464   LearningRate 0.0630   Epoch: 11   Global Step: 117080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:20:30,181-Speed 5470.63 samples/sec   Loss 4.8062   LearningRate 0.0630   Epoch: 11   Global Step: 117090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:20:37,675-Speed 5465.83 samples/sec   Loss 4.8945   LearningRate 0.0630   Epoch: 11   Global Step: 117100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:20:45,284-Speed 5384.12 samples/sec   Loss 4.8137   LearningRate 0.0630   Epoch: 11   Global Step: 117110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:20:52,855-Speed 5411.48 samples/sec   Loss 4.8606   LearningRate 0.0630   Epoch: 11   Global Step: 117120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:21:00,349-Speed 5465.81 samples/sec   Loss 4.8195   LearningRate 0.0630   Epoch: 11   Global Step: 117130   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:21:07,858-Speed 5455.53 samples/sec   Loss 4.8928   LearningRate 0.0629   Epoch: 11   Global Step: 117140   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:21:15,330-Speed 5482.64 samples/sec   Loss 4.8668   LearningRate 0.0629   Epoch: 11   Global Step: 117150   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:21:22,826-Speed 5465.50 samples/sec   Loss 4.8149   LearningRate 0.0629   Epoch: 11   Global Step: 117160   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:21:30,352-Speed 5442.63 samples/sec   Loss 4.9023   LearningRate 0.0629   Epoch: 11   Global Step: 117170   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:21:37,834-Speed 5475.25 samples/sec   Loss 4.9190   LearningRate 0.0629   Epoch: 11   Global Step: 117180   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:21:45,285-Speed 5498.54 samples/sec   Loss 4.8276   LearningRate 0.0629   Epoch: 11   Global Step: 117190   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:21:52,807-Speed 5446.15 samples/sec   Loss 4.9002   LearningRate 0.0629   Epoch: 11   Global Step: 117200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:22:00,308-Speed 5460.48 samples/sec   Loss 4.8892   LearningRate 0.0628   Epoch: 11   Global Step: 117210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:22:07,787-Speed 5477.47 samples/sec   Loss 4.8914   LearningRate 0.0628   Epoch: 11   Global Step: 117220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:22:15,305-Speed 5449.90 samples/sec   Loss 4.8929   LearningRate 0.0628   Epoch: 11   Global Step: 117230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:22:22,755-Speed 5498.05 samples/sec   Loss 4.8602   LearningRate 0.0628   Epoch: 11   Global Step: 117240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:22:30,350-Speed 5394.18 samples/sec   Loss 4.8544   LearningRate 0.0628   Epoch: 11   Global Step: 117250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:22:37,929-Speed 5404.77 samples/sec   Loss 4.8313   LearningRate 0.0628   Epoch: 11   Global Step: 117260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:22:45,432-Speed 5459.60 samples/sec   Loss 4.8600   LearningRate 0.0628   Epoch: 11   Global Step: 117270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:22:52,970-Speed 5434.65 samples/sec   Loss 4.9040   LearningRate 0.0627   Epoch: 11   Global Step: 117280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:23:00,481-Speed 5454.05 samples/sec   Loss 4.8976   LearningRate 0.0627   Epoch: 11   Global Step: 117290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:23:08,059-Speed 5405.69 samples/sec   Loss 4.8614   LearningRate 0.0627   Epoch: 11   Global Step: 117300   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:23:15,544-Speed 5473.13 samples/sec   Loss 4.7723   LearningRate 0.0627   Epoch: 11   Global Step: 117310   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:23:23,025-Speed 5476.01 samples/sec   Loss 4.8754   LearningRate 0.0627   Epoch: 11   Global Step: 117320   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:23:30,710-Speed 5330.43 samples/sec   Loss 4.8449   LearningRate 0.0627   Epoch: 11   Global Step: 117330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:23:38,252-Speed 5431.54 samples/sec   Loss 4.8352   LearningRate 0.0627   Epoch: 11   Global Step: 117340   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:23:45,817-Speed 5415.65 samples/sec   Loss 4.8829   LearningRate 0.0626   Epoch: 11   Global Step: 117350   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:23:53,309-Speed 5467.94 samples/sec   Loss 4.8694   LearningRate 0.0626   Epoch: 11   Global Step: 117360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:24:00,789-Speed 5475.99 samples/sec   Loss 4.8397   LearningRate 0.0626   Epoch: 11   Global Step: 117370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:24:08,264-Speed 5480.48 samples/sec   Loss 4.8143   LearningRate 0.0626   Epoch: 11   Global Step: 117380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:24:15,733-Speed 5485.16 samples/sec   Loss 4.8402   LearningRate 0.0626   Epoch: 11   Global Step: 117390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:24:23,235-Speed 5460.46 samples/sec   Loss 4.7667   LearningRate 0.0626   Epoch: 11   Global Step: 117400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:24:30,743-Speed 5456.59 samples/sec   Loss 4.7905   LearningRate 0.0626   Epoch: 11   Global Step: 117410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:24:38,273-Speed 5440.04 samples/sec   Loss 4.8602   LearningRate 0.0626   Epoch: 11   Global Step: 117420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:24:45,869-Speed 5392.45 samples/sec   Loss 4.8014   LearningRate 0.0625   Epoch: 11   Global Step: 117430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:24:53,362-Speed 5467.45 samples/sec   Loss 4.8685   LearningRate 0.0625   Epoch: 11   Global Step: 117440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:25:00,879-Speed 5449.55 samples/sec   Loss 4.8617   LearningRate 0.0625   Epoch: 11   Global Step: 117450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:25:08,374-Speed 5466.29 samples/sec   Loss 4.8636   LearningRate 0.0625   Epoch: 11   Global Step: 117460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:25:15,934-Speed 5418.59 samples/sec   Loss 4.8606   LearningRate 0.0625   Epoch: 11   Global Step: 117470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:25:23,484-Speed 5425.55 samples/sec   Loss 4.8550   LearningRate 0.0625   Epoch: 11   Global Step: 117480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:25:30,983-Speed 5463.45 samples/sec   Loss 4.7889   LearningRate 0.0625   Epoch: 11   Global Step: 117490   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:25:38,491-Speed 5455.36 samples/sec   Loss 4.8254   LearningRate 0.0624   Epoch: 11   Global Step: 117500   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:25:46,055-Speed 5416.47 samples/sec   Loss 4.8895   LearningRate 0.0624   Epoch: 11   Global Step: 117510   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:25:53,533-Speed 5478.38 samples/sec   Loss 4.8463   LearningRate 0.0624   Epoch: 11   Global Step: 117520   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:26:01,026-Speed 5466.72 samples/sec   Loss 4.8182   LearningRate 0.0624   Epoch: 11   Global Step: 117530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:26:08,599-Speed 5409.73 samples/sec   Loss 4.8976   LearningRate 0.0624   Epoch: 11   Global Step: 117540   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:26:16,049-Speed 5498.50 samples/sec   Loss 4.8653   LearningRate 0.0624   Epoch: 11   Global Step: 117550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:26:23,556-Speed 5457.04 samples/sec   Loss 4.8293   LearningRate 0.0624   Epoch: 11   Global Step: 117560   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:26:31,119-Speed 5416.16 samples/sec   Loss 4.8603   LearningRate 0.0623   Epoch: 11   Global Step: 117570   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:26:38,596-Speed 5479.30 samples/sec   Loss 4.8716   LearningRate 0.0623   Epoch: 11   Global Step: 117580   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:26:46,075-Speed 5476.87 samples/sec   Loss 4.8153   LearningRate 0.0623   Epoch: 11   Global Step: 117590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:26:53,616-Speed 5432.53 samples/sec   Loss 4.8818   LearningRate 0.0623   Epoch: 11   Global Step: 117600   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:27:01,120-Speed 5459.32 samples/sec   Loss 4.7947   LearningRate 0.0623   Epoch: 11   Global Step: 117610   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:27:08,632-Speed 5453.27 samples/sec   Loss 4.8190   LearningRate 0.0623   Epoch: 11   Global Step: 117620   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:27:16,099-Speed 5486.29 samples/sec   Loss 4.8385   LearningRate 0.0623   Epoch: 11   Global Step: 117630   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:27:23,688-Speed 5398.11 samples/sec   Loss 4.7984   LearningRate 0.0622   Epoch: 11   Global Step: 117640   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 21:27:31,199-Speed 5453.92 samples/sec   Loss 4.7937   LearningRate 0.0622   Epoch: 11   Global Step: 117650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:27:38,764-Speed 5415.05 samples/sec   Loss 4.8451   LearningRate 0.0622   Epoch: 11   Global Step: 117660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:27:46,263-Speed 5462.47 samples/sec   Loss 4.8434   LearningRate 0.0622   Epoch: 11   Global Step: 117670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:27:54,103-Speed 5225.67 samples/sec   Loss 4.8343   LearningRate 0.0622   Epoch: 11   Global Step: 117680   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:28:01,582-Speed 5476.92 samples/sec   Loss 4.8534   LearningRate 0.0622   Epoch: 11   Global Step: 117690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:28:09,240-Speed 5349.92 samples/sec   Loss 4.8128   LearningRate 0.0622   Epoch: 11   Global Step: 117700   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:28:16,792-Speed 5424.08 samples/sec   Loss 4.8037   LearningRate 0.0621   Epoch: 11   Global Step: 117710   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:28:24,347-Speed 5422.19 samples/sec   Loss 4.8125   LearningRate 0.0621   Epoch: 11   Global Step: 117720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:28:31,857-Speed 5455.28 samples/sec   Loss 4.8117   LearningRate 0.0621   Epoch: 11   Global Step: 117730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:28:39,332-Speed 5479.85 samples/sec   Loss 4.8622   LearningRate 0.0621   Epoch: 11   Global Step: 117740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:28:46,801-Speed 5484.42 samples/sec   Loss 4.8520   LearningRate 0.0621   Epoch: 11   Global Step: 117750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 21:28:54,306-Speed 5458.83 samples/sec   Loss 4.8600   LearningRate 0.0621   Epoch: 11   Global Step: 117760   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:29:01,809-Speed 5460.08 samples/sec   Loss 4.8231   LearningRate 0.0621   Epoch: 11   Global Step: 117770   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:29:09,248-Speed 5507.17 samples/sec   Loss 4.8423   LearningRate 0.0621   Epoch: 11   Global Step: 117780   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:29:16,738-Speed 5469.26 samples/sec   Loss 4.8127   LearningRate 0.0620   Epoch: 11   Global Step: 117790   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:29:24,197-Speed 5492.33 samples/sec   Loss 4.8448   LearningRate 0.0620   Epoch: 11   Global Step: 117800   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:29:31,615-Speed 5522.58 samples/sec   Loss 4.8343   LearningRate 0.0620   Epoch: 11   Global Step: 117810   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:29:39,157-Speed 5431.41 samples/sec   Loss 4.8630   LearningRate 0.0620   Epoch: 11   Global Step: 117820   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:29:46,783-Speed 5371.97 samples/sec   Loss 4.8514   LearningRate 0.0620   Epoch: 11   Global Step: 117830   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-08 21:29:54,390-Speed 5384.96 samples/sec   Loss 4.8831   LearningRate 0.0620   Epoch: 11   Global Step: 117840   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:30:01,922-Speed 5438.81 samples/sec   Loss 4.8969   LearningRate 0.0620   Epoch: 11   Global Step: 117850   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:30:09,557-Speed 5365.71 samples/sec   Loss 4.8339   LearningRate 0.0619   Epoch: 11   Global Step: 117860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:30:17,168-Speed 5382.11 samples/sec   Loss 4.8752   LearningRate 0.0619   Epoch: 11   Global Step: 117870   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:30:24,655-Speed 5471.76 samples/sec   Loss 4.8122   LearningRate 0.0619   Epoch: 11   Global Step: 117880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:30:32,074-Speed 5521.61 samples/sec   Loss 4.7965   LearningRate 0.0619   Epoch: 11   Global Step: 117890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:30:39,617-Speed 5430.81 samples/sec   Loss 4.7756   LearningRate 0.0619   Epoch: 11   Global Step: 117900   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:30:47,106-Speed 5469.71 samples/sec   Loss 4.8418   LearningRate 0.0619   Epoch: 11   Global Step: 117910   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:30:54,593-Speed 5471.76 samples/sec   Loss 4.8011   LearningRate 0.0619   Epoch: 11   Global Step: 117920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:31:02,081-Speed 5470.86 samples/sec   Loss 4.7878   LearningRate 0.0618   Epoch: 11   Global Step: 117930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:31:09,614-Speed 5438.44 samples/sec   Loss 4.7984   LearningRate 0.0618   Epoch: 11   Global Step: 117940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:31:17,068-Speed 5495.44 samples/sec   Loss 4.7880   LearningRate 0.0618   Epoch: 11   Global Step: 117950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:31:24,613-Speed 5429.58 samples/sec   Loss 4.7855   LearningRate 0.0618   Epoch: 11   Global Step: 117960   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:31:32,089-Speed 5479.45 samples/sec   Loss 4.7854   LearningRate 0.0618   Epoch: 11   Global Step: 117970   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:31:39,555-Speed 5487.40 samples/sec   Loss 4.8021   LearningRate 0.0618   Epoch: 11   Global Step: 117980   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:31:47,021-Speed 5486.57 samples/sec   Loss 4.7558   LearningRate 0.0618   Epoch: 11   Global Step: 117990   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:31:54,510-Speed 5469.86 samples/sec   Loss 4.8682   LearningRate 0.0617   Epoch: 11   Global Step: 118000   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:32:38,059-[lfw][118000]XNorm: 23.955640
Training: 2022-01-08 21:32:38,060-[lfw][118000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-01-08 21:32:38,060-[lfw][118000]Accuracy-Highest: 0.99817
Training: 2022-01-08 21:33:29,355-[cfp_fp][118000]XNorm: 22.261557
Training: 2022-01-08 21:33:29,356-[cfp_fp][118000]Accuracy-Flip: 0.98986+-0.00440
Training: 2022-01-08 21:33:29,357-[cfp_fp][118000]Accuracy-Highest: 0.99057
Training: 2022-01-08 21:34:13,628-[agedb_30][118000]XNorm: 24.014558
Training: 2022-01-08 21:34:13,629-[agedb_30][118000]Accuracy-Flip: 0.97683+-0.00828
Training: 2022-01-08 21:34:13,629-[agedb_30][118000]Accuracy-Highest: 0.97917
Training: 2022-01-08 21:34:21,180-Speed 279.27 samples/sec   Loss 4.7775   LearningRate 0.0617   Epoch: 11   Global Step: 118010   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:34:28,649-Speed 5485.18 samples/sec   Loss 4.8039   LearningRate 0.0617   Epoch: 11   Global Step: 118020   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:34:36,101-Speed 5498.02 samples/sec   Loss 4.7949   LearningRate 0.0617   Epoch: 11   Global Step: 118030   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:34:43,666-Speed 5416.43 samples/sec   Loss 4.8215   LearningRate 0.0617   Epoch: 11   Global Step: 118040   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:34:51,178-Speed 5453.97 samples/sec   Loss 4.7745   LearningRate 0.0617   Epoch: 11   Global Step: 118050   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:34:58,723-Speed 5429.48 samples/sec   Loss 4.7933   LearningRate 0.0617   Epoch: 11   Global Step: 118060   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:35:06,197-Speed 5482.11 samples/sec   Loss 4.7831   LearningRate 0.0617   Epoch: 11   Global Step: 118070   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:35:13,642-Speed 5502.88 samples/sec   Loss 4.8524   LearningRate 0.0616   Epoch: 11   Global Step: 118080   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:35:21,074-Speed 5512.70 samples/sec   Loss 4.8235   LearningRate 0.0616   Epoch: 11   Global Step: 118090   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:35:28,563-Speed 5470.48 samples/sec   Loss 4.8219   LearningRate 0.0616   Epoch: 11   Global Step: 118100   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:35:35,999-Speed 5509.25 samples/sec   Loss 4.8140   LearningRate 0.0616   Epoch: 11   Global Step: 118110   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:35:43,475-Speed 5480.12 samples/sec   Loss 4.8188   LearningRate 0.0616   Epoch: 11   Global Step: 118120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:35:50,912-Speed 5508.37 samples/sec   Loss 4.8074   LearningRate 0.0616   Epoch: 11   Global Step: 118130   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:35:58,440-Speed 5441.49 samples/sec   Loss 4.8289   LearningRate 0.0616   Epoch: 11   Global Step: 118140   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:36:05,999-Speed 5419.46 samples/sec   Loss 4.8256   LearningRate 0.0615   Epoch: 11   Global Step: 118150   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:36:13,516-Speed 5449.95 samples/sec   Loss 4.8523   LearningRate 0.0615   Epoch: 11   Global Step: 118160   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:36:21,053-Speed 5435.33 samples/sec   Loss 4.7884   LearningRate 0.0615   Epoch: 11   Global Step: 118170   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:36:28,564-Speed 5453.75 samples/sec   Loss 4.8030   LearningRate 0.0615   Epoch: 11   Global Step: 118180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:36:36,086-Speed 5446.44 samples/sec   Loss 4.7762   LearningRate 0.0615   Epoch: 11   Global Step: 118190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:36:43,548-Speed 5489.42 samples/sec   Loss 4.7929   LearningRate 0.0615   Epoch: 11   Global Step: 118200   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:36:51,088-Speed 5433.83 samples/sec   Loss 4.7905   LearningRate 0.0615   Epoch: 11   Global Step: 118210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:36:58,476-Speed 5544.87 samples/sec   Loss 4.7394   LearningRate 0.0614   Epoch: 11   Global Step: 118220   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:37:06,027-Speed 5424.76 samples/sec   Loss 4.8051   LearningRate 0.0614   Epoch: 11   Global Step: 118230   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:37:13,502-Speed 5479.92 samples/sec   Loss 4.7982   LearningRate 0.0614   Epoch: 11   Global Step: 118240   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:37:20,922-Speed 5521.68 samples/sec   Loss 4.7966   LearningRate 0.0614   Epoch: 11   Global Step: 118250   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:37:28,411-Speed 5470.13 samples/sec   Loss 4.7563   LearningRate 0.0614   Epoch: 11   Global Step: 118260   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:37:35,894-Speed 5473.84 samples/sec   Loss 4.8195   LearningRate 0.0614   Epoch: 11   Global Step: 118270   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:37:43,364-Speed 5484.27 samples/sec   Loss 4.7960   LearningRate 0.0614   Epoch: 11   Global Step: 118280   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:37:50,875-Speed 5453.74 samples/sec   Loss 4.7882   LearningRate 0.0613   Epoch: 11   Global Step: 118290   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:37:58,331-Speed 5494.76 samples/sec   Loss 4.7637   LearningRate 0.0613   Epoch: 11   Global Step: 118300   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:38:05,838-Speed 5457.08 samples/sec   Loss 4.7827   LearningRate 0.0613   Epoch: 11   Global Step: 118310   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:38:13,289-Speed 5497.41 samples/sec   Loss 4.8071   LearningRate 0.0613   Epoch: 11   Global Step: 118320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:38:20,746-Speed 5493.67 samples/sec   Loss 4.7912   LearningRate 0.0613   Epoch: 11   Global Step: 118330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:38:28,236-Speed 5469.35 samples/sec   Loss 4.8118   LearningRate 0.0613   Epoch: 11   Global Step: 118340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:38:35,766-Speed 5440.13 samples/sec   Loss 4.7681   LearningRate 0.0613   Epoch: 11   Global Step: 118350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:38:43,311-Speed 5429.51 samples/sec   Loss 4.7393   LearningRate 0.0613   Epoch: 11   Global Step: 118360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:38:50,808-Speed 5464.52 samples/sec   Loss 4.7677   LearningRate 0.0612   Epoch: 11   Global Step: 118370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:38:58,323-Speed 5451.26 samples/sec   Loss 4.8546   LearningRate 0.0612   Epoch: 11   Global Step: 118380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:39:05,809-Speed 5472.03 samples/sec   Loss 4.8097   LearningRate 0.0612   Epoch: 11   Global Step: 118390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:39:13,361-Speed 5424.76 samples/sec   Loss 4.7927   LearningRate 0.0612   Epoch: 11   Global Step: 118400   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:39:20,804-Speed 5503.71 samples/sec   Loss 4.8033   LearningRate 0.0612   Epoch: 11   Global Step: 118410   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:39:28,373-Speed 5412.40 samples/sec   Loss 4.8147   LearningRate 0.0612   Epoch: 11   Global Step: 118420   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:39:35,899-Speed 5443.32 samples/sec   Loss 4.7710   LearningRate 0.0612   Epoch: 11   Global Step: 118430   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:39:43,443-Speed 5430.32 samples/sec   Loss 4.7905   LearningRate 0.0611   Epoch: 11   Global Step: 118440   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:39:51,182-Speed 5293.43 samples/sec   Loss 4.8060   LearningRate 0.0611   Epoch: 11   Global Step: 118450   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:39:58,676-Speed 5466.01 samples/sec   Loss 4.7759   LearningRate 0.0611   Epoch: 11   Global Step: 118460   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:40:06,199-Speed 5445.50 samples/sec   Loss 4.8041   LearningRate 0.0611   Epoch: 11   Global Step: 118470   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:40:13,673-Speed 5480.92 samples/sec   Loss 4.8020   LearningRate 0.0611   Epoch: 11   Global Step: 118480   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:40:21,123-Speed 5498.75 samples/sec   Loss 4.8001   LearningRate 0.0611   Epoch: 11   Global Step: 118490   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:40:28,551-Speed 5514.89 samples/sec   Loss 4.8436   LearningRate 0.0611   Epoch: 11   Global Step: 118500   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:40:36,005-Speed 5495.70 samples/sec   Loss 4.7554   LearningRate 0.0610   Epoch: 11   Global Step: 118510   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:40:43,444-Speed 5507.33 samples/sec   Loss 4.8580   LearningRate 0.0610   Epoch: 11   Global Step: 118520   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:40:51,010-Speed 5414.03 samples/sec   Loss 4.7571   LearningRate 0.0610   Epoch: 11   Global Step: 118530   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:40:58,551-Speed 5432.27 samples/sec   Loss 4.7837   LearningRate 0.0610   Epoch: 11   Global Step: 118540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:41:06,140-Speed 5398.65 samples/sec   Loss 4.8109   LearningRate 0.0610   Epoch: 11   Global Step: 118550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:41:13,654-Speed 5451.53 samples/sec   Loss 4.7680   LearningRate 0.0610   Epoch: 11   Global Step: 118560   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:41:21,164-Speed 5454.27 samples/sec   Loss 4.7560   LearningRate 0.0610   Epoch: 11   Global Step: 118570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:41:28,724-Speed 5419.11 samples/sec   Loss 4.7928   LearningRate 0.0609   Epoch: 11   Global Step: 118580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:41:36,190-Speed 5487.25 samples/sec   Loss 4.8048   LearningRate 0.0609   Epoch: 11   Global Step: 118590   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:41:43,914-Speed 5302.84 samples/sec   Loss 4.7830   LearningRate 0.0609   Epoch: 11   Global Step: 118600   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:41:51,518-Speed 5387.46 samples/sec   Loss 4.7918   LearningRate 0.0609   Epoch: 11   Global Step: 118610   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:41:59,054-Speed 5436.09 samples/sec   Loss 4.7949   LearningRate 0.0609   Epoch: 11   Global Step: 118620   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:42:06,527-Speed 5482.12 samples/sec   Loss 4.8227   LearningRate 0.0609   Epoch: 11   Global Step: 118630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:42:14,034-Speed 5456.29 samples/sec   Loss 4.8209   LearningRate 0.0609   Epoch: 11   Global Step: 118640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:42:21,481-Speed 5501.37 samples/sec   Loss 4.7619   LearningRate 0.0609   Epoch: 11   Global Step: 118650   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:42:28,988-Speed 5457.03 samples/sec   Loss 4.7964   LearningRate 0.0608   Epoch: 11   Global Step: 118660   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:42:36,475-Speed 5471.42 samples/sec   Loss 4.7636   LearningRate 0.0608   Epoch: 11   Global Step: 118670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:42:43,983-Speed 5456.00 samples/sec   Loss 4.7838   LearningRate 0.0608   Epoch: 11   Global Step: 118680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:42:51,737-Speed 5283.40 samples/sec   Loss 4.7982   LearningRate 0.0608   Epoch: 11   Global Step: 118690   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:42:59,240-Speed 5459.41 samples/sec   Loss 4.7804   LearningRate 0.0608   Epoch: 11   Global Step: 118700   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:43:06,933-Speed 5325.08 samples/sec   Loss 4.7909   LearningRate 0.0608   Epoch: 11   Global Step: 118710   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:43:14,477-Speed 5430.07 samples/sec   Loss 4.7831   LearningRate 0.0608   Epoch: 11   Global Step: 118720   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:43:21,964-Speed 5471.72 samples/sec   Loss 4.8002   LearningRate 0.0607   Epoch: 11   Global Step: 118730   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:43:29,490-Speed 5443.20 samples/sec   Loss 4.7573   LearningRate 0.0607   Epoch: 11   Global Step: 118740   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:43:36,951-Speed 5490.74 samples/sec   Loss 4.7687   LearningRate 0.0607   Epoch: 11   Global Step: 118750   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:43:44,389-Speed 5507.75 samples/sec   Loss 4.8252   LearningRate 0.0607   Epoch: 11   Global Step: 118760   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:43:51,919-Speed 5440.10 samples/sec   Loss 4.7378   LearningRate 0.0607   Epoch: 11   Global Step: 118770   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:43:59,409-Speed 5470.33 samples/sec   Loss 4.8257   LearningRate 0.0607   Epoch: 11   Global Step: 118780   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:44:06,983-Speed 5408.49 samples/sec   Loss 4.8227   LearningRate 0.0607   Epoch: 11   Global Step: 118790   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:44:14,607-Speed 5373.22 samples/sec   Loss 4.7681   LearningRate 0.0606   Epoch: 11   Global Step: 118800   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:44:22,132-Speed 5443.71 samples/sec   Loss 4.8337   LearningRate 0.0606   Epoch: 11   Global Step: 118810   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:44:29,754-Speed 5374.79 samples/sec   Loss 4.7875   LearningRate 0.0606   Epoch: 11   Global Step: 118820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:44:37,374-Speed 5376.30 samples/sec   Loss 4.7791   LearningRate 0.0606   Epoch: 11   Global Step: 118830   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:44:44,970-Speed 5392.72 samples/sec   Loss 4.8129   LearningRate 0.0606   Epoch: 11   Global Step: 118840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:44:52,469-Speed 5463.18 samples/sec   Loss 4.7858   LearningRate 0.0606   Epoch: 11   Global Step: 118850   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:44:59,912-Speed 5503.14 samples/sec   Loss 4.7724   LearningRate 0.0606   Epoch: 11   Global Step: 118860   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:45:07,380-Speed 5486.28 samples/sec   Loss 4.7975   LearningRate 0.0606   Epoch: 11   Global Step: 118870   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:45:14,905-Speed 5443.59 samples/sec   Loss 4.7350   LearningRate 0.0605   Epoch: 11   Global Step: 118880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:45:22,419-Speed 5451.83 samples/sec   Loss 4.7569   LearningRate 0.0605   Epoch: 11   Global Step: 118890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:45:29,883-Speed 5488.23 samples/sec   Loss 4.7873   LearningRate 0.0605   Epoch: 11   Global Step: 118900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:45:37,340-Speed 5493.70 samples/sec   Loss 4.7632   LearningRate 0.0605   Epoch: 11   Global Step: 118910   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:45:44,800-Speed 5491.72 samples/sec   Loss 4.7663   LearningRate 0.0605   Epoch: 11   Global Step: 118920   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:45:52,278-Speed 5477.98 samples/sec   Loss 4.7783   LearningRate 0.0605   Epoch: 11   Global Step: 118930   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:45:59,761-Speed 5474.38 samples/sec   Loss 4.8109   LearningRate 0.0605   Epoch: 11   Global Step: 118940   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:46:07,327-Speed 5414.67 samples/sec   Loss 4.7857   LearningRate 0.0604   Epoch: 11   Global Step: 118950   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:46:14,789-Speed 5490.11 samples/sec   Loss 4.7287   LearningRate 0.0604   Epoch: 11   Global Step: 118960   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:46:22,289-Speed 5461.43 samples/sec   Loss 4.7202   LearningRate 0.0604   Epoch: 11   Global Step: 118970   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:46:29,862-Speed 5409.40 samples/sec   Loss 4.7352   LearningRate 0.0604   Epoch: 11   Global Step: 118980   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:46:37,327-Speed 5488.58 samples/sec   Loss 4.7375   LearningRate 0.0604   Epoch: 11   Global Step: 118990   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:46:44,808-Speed 5475.07 samples/sec   Loss 4.7935   LearningRate 0.0604   Epoch: 11   Global Step: 119000   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:46:52,302-Speed 5466.95 samples/sec   Loss 4.8056   LearningRate 0.0604   Epoch: 11   Global Step: 119010   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:46:59,831-Speed 5441.11 samples/sec   Loss 4.7308   LearningRate 0.0603   Epoch: 11   Global Step: 119020   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:47:07,347-Speed 5449.96 samples/sec   Loss 4.7521   LearningRate 0.0603   Epoch: 11   Global Step: 119030   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:47:14,807-Speed 5491.37 samples/sec   Loss 4.7623   LearningRate 0.0603   Epoch: 11   Global Step: 119040   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:47:22,286-Speed 5477.53 samples/sec   Loss 4.7470   LearningRate 0.0603   Epoch: 11   Global Step: 119050   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:47:29,819-Speed 5438.12 samples/sec   Loss 4.8285   LearningRate 0.0603   Epoch: 11   Global Step: 119060   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:47:37,339-Speed 5447.75 samples/sec   Loss 4.7362   LearningRate 0.0603   Epoch: 11   Global Step: 119070   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:47:44,893-Speed 5423.18 samples/sec   Loss 4.7147   LearningRate 0.0603   Epoch: 11   Global Step: 119080   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:47:52,375-Speed 5475.06 samples/sec   Loss 4.7534   LearningRate 0.0603   Epoch: 11   Global Step: 119090   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:47:59,803-Speed 5514.57 samples/sec   Loss 4.7769   LearningRate 0.0602   Epoch: 11   Global Step: 119100   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:48:07,311-Speed 5456.75 samples/sec   Loss 4.7736   LearningRate 0.0602   Epoch: 11   Global Step: 119110   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:48:14,766-Speed 5494.88 samples/sec   Loss 4.8053   LearningRate 0.0602   Epoch: 11   Global Step: 119120   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:48:22,349-Speed 5401.92 samples/sec   Loss 4.7288   LearningRate 0.0602   Epoch: 11   Global Step: 119130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:48:30,011-Speed 5346.44 samples/sec   Loss 4.7424   LearningRate 0.0602   Epoch: 11   Global Step: 119140   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:48:37,764-Speed 5283.58 samples/sec   Loss 4.7684   LearningRate 0.0602   Epoch: 11   Global Step: 119150   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:48:45,303-Speed 5433.95 samples/sec   Loss 4.7504   LearningRate 0.0602   Epoch: 11   Global Step: 119160   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:48:52,954-Speed 5354.15 samples/sec   Loss 4.7817   LearningRate 0.0601   Epoch: 11   Global Step: 119170   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:49:00,458-Speed 5459.16 samples/sec   Loss 4.7506   LearningRate 0.0601   Epoch: 11   Global Step: 119180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:49:07,967-Speed 5455.28 samples/sec   Loss 4.7714   LearningRate 0.0601   Epoch: 11   Global Step: 119190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:49:15,539-Speed 5410.23 samples/sec   Loss 4.7264   LearningRate 0.0601   Epoch: 11   Global Step: 119200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:49:23,044-Speed 5458.62 samples/sec   Loss 4.7332   LearningRate 0.0601   Epoch: 11   Global Step: 119210   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:49:30,606-Speed 5417.34 samples/sec   Loss 4.7381   LearningRate 0.0601   Epoch: 11   Global Step: 119220   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:49:38,184-Speed 5405.52 samples/sec   Loss 4.7842   LearningRate 0.0601   Epoch: 11   Global Step: 119230   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:49:45,683-Speed 5462.91 samples/sec   Loss 4.7618   LearningRate 0.0600   Epoch: 11   Global Step: 119240   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:49:53,185-Speed 5460.95 samples/sec   Loss 4.7837   LearningRate 0.0600   Epoch: 11   Global Step: 119250   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:50:00,709-Speed 5444.15 samples/sec   Loss 4.7414   LearningRate 0.0600   Epoch: 11   Global Step: 119260   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:50:08,156-Speed 5501.26 samples/sec   Loss 4.7865   LearningRate 0.0600   Epoch: 11   Global Step: 119270   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:50:15,645-Speed 5470.21 samples/sec   Loss 4.7888   LearningRate 0.0600   Epoch: 11   Global Step: 119280   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:50:23,174-Speed 5440.68 samples/sec   Loss 4.7654   LearningRate 0.0600   Epoch: 11   Global Step: 119290   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:50:30,645-Speed 5483.20 samples/sec   Loss 4.7233   LearningRate 0.0600   Epoch: 11   Global Step: 119300   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:50:38,109-Speed 5488.55 samples/sec   Loss 4.7785   LearningRate 0.0600   Epoch: 11   Global Step: 119310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:50:45,583-Speed 5481.00 samples/sec   Loss 4.7884   LearningRate 0.0599   Epoch: 11   Global Step: 119320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:50:53,068-Speed 5473.27 samples/sec   Loss 4.7796   LearningRate 0.0599   Epoch: 11   Global Step: 119330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:51:00,661-Speed 5394.81 samples/sec   Loss 4.7414   LearningRate 0.0599   Epoch: 11   Global Step: 119340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:51:08,156-Speed 5466.06 samples/sec   Loss 4.7630   LearningRate 0.0599   Epoch: 11   Global Step: 119350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:51:15,706-Speed 5425.87 samples/sec   Loss 4.7858   LearningRate 0.0599   Epoch: 11   Global Step: 119360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:51:23,239-Speed 5437.64 samples/sec   Loss 4.7293   LearningRate 0.0599   Epoch: 11   Global Step: 119370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:51:30,852-Speed 5381.44 samples/sec   Loss 4.7499   LearningRate 0.0599   Epoch: 11   Global Step: 119380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:51:38,454-Speed 5388.82 samples/sec   Loss 4.6809   LearningRate 0.0598   Epoch: 11   Global Step: 119390   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:51:46,066-Speed 5381.87 samples/sec   Loss 4.7578   LearningRate 0.0598   Epoch: 11   Global Step: 119400   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:51:53,754-Speed 5328.22 samples/sec   Loss 4.6598   LearningRate 0.0598   Epoch: 11   Global Step: 119410   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:52:01,307-Speed 5423.95 samples/sec   Loss 4.6648   LearningRate 0.0598   Epoch: 11   Global Step: 119420   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:52:08,786-Speed 5477.12 samples/sec   Loss 4.7437   LearningRate 0.0598   Epoch: 11   Global Step: 119430   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:52:16,361-Speed 5407.80 samples/sec   Loss 4.7303   LearningRate 0.0598   Epoch: 11   Global Step: 119440   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:52:23,863-Speed 5461.02 samples/sec   Loss 4.7167   LearningRate 0.0598   Epoch: 11   Global Step: 119450   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:52:31,393-Speed 5439.76 samples/sec   Loss 4.7295   LearningRate 0.0597   Epoch: 11   Global Step: 119460   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:52:38,943-Speed 5426.13 samples/sec   Loss 4.7273   LearningRate 0.0597   Epoch: 11   Global Step: 119470   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:52:46,432-Speed 5470.12 samples/sec   Loss 4.7227   LearningRate 0.0597   Epoch: 11   Global Step: 119480   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:52:53,889-Speed 5494.11 samples/sec   Loss 4.6945   LearningRate 0.0597   Epoch: 11   Global Step: 119490   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:53:01,374-Speed 5472.15 samples/sec   Loss 4.7369   LearningRate 0.0597   Epoch: 11   Global Step: 119500   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:53:08,876-Speed 5461.00 samples/sec   Loss 4.7004   LearningRate 0.0597   Epoch: 11   Global Step: 119510   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:53:16,385-Speed 5455.70 samples/sec   Loss 4.7055   LearningRate 0.0597   Epoch: 11   Global Step: 119520   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:53:23,841-Speed 5494.21 samples/sec   Loss 4.7670   LearningRate 0.0597   Epoch: 11   Global Step: 119530   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:53:31,309-Speed 5485.65 samples/sec   Loss 4.7674   LearningRate 0.0596   Epoch: 11   Global Step: 119540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:53:38,757-Speed 5500.26 samples/sec   Loss 4.7156   LearningRate 0.0596   Epoch: 11   Global Step: 119550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:53:46,229-Speed 5482.42 samples/sec   Loss 4.7459   LearningRate 0.0596   Epoch: 11   Global Step: 119560   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:53:53,660-Speed 5512.93 samples/sec   Loss 4.7632   LearningRate 0.0596   Epoch: 11   Global Step: 119570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:54:01,154-Speed 5466.31 samples/sec   Loss 4.7657   LearningRate 0.0596   Epoch: 11   Global Step: 119580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:54:08,632-Speed 5477.66 samples/sec   Loss 4.7552   LearningRate 0.0596   Epoch: 11   Global Step: 119590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:54:16,045-Speed 5526.89 samples/sec   Loss 4.7811   LearningRate 0.0596   Epoch: 11   Global Step: 119600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:54:23,483-Speed 5507.30 samples/sec   Loss 4.7018   LearningRate 0.0595   Epoch: 11   Global Step: 119610   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:54:30,948-Speed 5487.90 samples/sec   Loss 4.7217   LearningRate 0.0595   Epoch: 11   Global Step: 119620   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:54:38,433-Speed 5472.62 samples/sec   Loss 4.7960   LearningRate 0.0595   Epoch: 11   Global Step: 119630   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:54:45,932-Speed 5463.10 samples/sec   Loss 4.7559   LearningRate 0.0595   Epoch: 11   Global Step: 119640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:54:53,422-Speed 5469.44 samples/sec   Loss 4.7676   LearningRate 0.0595   Epoch: 11   Global Step: 119650   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:55:00,893-Speed 5482.88 samples/sec   Loss 4.7544   LearningRate 0.0595   Epoch: 11   Global Step: 119660   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:55:08,490-Speed 5392.34 samples/sec   Loss 4.7500   LearningRate 0.0595   Epoch: 11   Global Step: 119670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:55:16,149-Speed 5348.72 samples/sec   Loss 4.7870   LearningRate 0.0594   Epoch: 11   Global Step: 119680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:55:23,619-Speed 5484.77 samples/sec   Loss 4.7223   LearningRate 0.0594   Epoch: 11   Global Step: 119690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:55:31,165-Speed 5427.84 samples/sec   Loss 4.7222   LearningRate 0.0594   Epoch: 11   Global Step: 119700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:55:38,650-Speed 5472.96 samples/sec   Loss 4.7495   LearningRate 0.0594   Epoch: 11   Global Step: 119710   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:55:46,161-Speed 5454.65 samples/sec   Loss 4.7127   LearningRate 0.0594   Epoch: 11   Global Step: 119720   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:55:53,738-Speed 5406.49 samples/sec   Loss 4.7301   LearningRate 0.0594   Epoch: 11   Global Step: 119730   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:56:01,290-Speed 5424.40 samples/sec   Loss 4.7751   LearningRate 0.0594   Epoch: 11   Global Step: 119740   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 21:56:08,760-Speed 5483.64 samples/sec   Loss 4.7307   LearningRate 0.0594   Epoch: 11   Global Step: 119750   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:56:16,231-Speed 5483.28 samples/sec   Loss 4.7443   LearningRate 0.0593   Epoch: 11   Global Step: 119760   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:56:23,679-Speed 5500.79 samples/sec   Loss 4.6514   LearningRate 0.0593   Epoch: 11   Global Step: 119770   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:56:31,283-Speed 5387.41 samples/sec   Loss 4.7000   LearningRate 0.0593   Epoch: 11   Global Step: 119780   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:56:38,912-Speed 5369.74 samples/sec   Loss 4.6975   LearningRate 0.0593   Epoch: 11   Global Step: 119790   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:56:46,392-Speed 5476.16 samples/sec   Loss 4.7552   LearningRate 0.0593   Epoch: 11   Global Step: 119800   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:56:53,876-Speed 5474.32 samples/sec   Loss 4.6939   LearningRate 0.0593   Epoch: 11   Global Step: 119810   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:57:01,403-Speed 5442.27 samples/sec   Loss 4.7295   LearningRate 0.0593   Epoch: 11   Global Step: 119820   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:57:08,462-Speed 5803.18 samples/sec   Loss 4.7095   LearningRate 0.0592   Epoch: 11   Global Step: 119830   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:57:15,468-Speed 5847.16 samples/sec   Loss 4.7142   LearningRate 0.0592   Epoch: 11   Global Step: 119840   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:57:22,844-Speed 5553.85 samples/sec   Loss 4.7863   LearningRate 0.0592   Epoch: 11   Global Step: 119850   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:57:30,357-Speed 5453.20 samples/sec   Loss 4.7151   LearningRate 0.0592   Epoch: 11   Global Step: 119860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:57:37,699-Speed 5579.35 samples/sec   Loss 4.7141   LearningRate 0.0592   Epoch: 11   Global Step: 119870   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:57:45,172-Speed 5481.91 samples/sec   Loss 4.7209   LearningRate 0.0592   Epoch: 11   Global Step: 119880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:57:52,718-Speed 5428.64 samples/sec   Loss 4.6668   LearningRate 0.0592   Epoch: 11   Global Step: 119890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:58:00,218-Speed 5462.40 samples/sec   Loss 4.7472   LearningRate 0.0592   Epoch: 11   Global Step: 119900   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:58:07,743-Speed 5443.52 samples/sec   Loss 4.7089   LearningRate 0.0591   Epoch: 11   Global Step: 119910   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:58:15,241-Speed 5463.57 samples/sec   Loss 4.7298   LearningRate 0.0591   Epoch: 11   Global Step: 119920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 21:58:22,682-Speed 5505.19 samples/sec   Loss 4.6764   LearningRate 0.0591   Epoch: 11   Global Step: 119930   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:58:30,148-Speed 5487.35 samples/sec   Loss 4.7746   LearningRate 0.0591   Epoch: 11   Global Step: 119940   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:58:37,696-Speed 5427.52 samples/sec   Loss 4.7541   LearningRate 0.0591   Epoch: 11   Global Step: 119950   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:58:45,229-Speed 5437.57 samples/sec   Loss 4.7569   LearningRate 0.0591   Epoch: 11   Global Step: 119960   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:58:52,706-Speed 5478.90 samples/sec   Loss 4.7239   LearningRate 0.0591   Epoch: 11   Global Step: 119970   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:59:00,170-Speed 5488.49 samples/sec   Loss 4.7300   LearningRate 0.0590   Epoch: 11   Global Step: 119980   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:59:07,725-Speed 5422.27 samples/sec   Loss 4.6921   LearningRate 0.0590   Epoch: 11   Global Step: 119990   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:59:15,282-Speed 5421.09 samples/sec   Loss 4.6723   LearningRate 0.0590   Epoch: 11   Global Step: 120000   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 21:59:59,272-[lfw][120000]XNorm: 21.978734
Training: 2022-01-08 21:59:59,273-[lfw][120000]Accuracy-Flip: 0.99767+-0.00318
Training: 2022-01-08 21:59:59,273-[lfw][120000]Accuracy-Highest: 0.99817
Training: 2022-01-08 22:00:51,080-[cfp_fp][120000]XNorm: 20.407298
Training: 2022-01-08 22:00:51,081-[cfp_fp][120000]Accuracy-Flip: 0.98886+-0.00428
Training: 2022-01-08 22:00:51,081-[cfp_fp][120000]Accuracy-Highest: 0.99057
Training: 2022-01-08 22:01:35,431-[agedb_30][120000]XNorm: 21.831277
Training: 2022-01-08 22:01:35,432-[agedb_30][120000]Accuracy-Flip: 0.97783+-0.00827
Training: 2022-01-08 22:01:35,433-[agedb_30][120000]Accuracy-Highest: 0.97917
Training: 2022-01-08 22:01:43,082-Speed 277.13 samples/sec   Loss 4.7560   LearningRate 0.0590   Epoch: 11   Global Step: 120010   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:01:50,637-Speed 5422.87 samples/sec   Loss 4.7786   LearningRate 0.0590   Epoch: 11   Global Step: 120020   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:01:58,079-Speed 5505.46 samples/sec   Loss 4.7092   LearningRate 0.0590   Epoch: 11   Global Step: 120030   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:02:05,579-Speed 5462.00 samples/sec   Loss 4.7607   LearningRate 0.0590   Epoch: 11   Global Step: 120040   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:02:13,064-Speed 5473.01 samples/sec   Loss 4.7610   LearningRate 0.0589   Epoch: 11   Global Step: 120050   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:02:20,507-Speed 5504.89 samples/sec   Loss 4.7086   LearningRate 0.0589   Epoch: 11   Global Step: 120060   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:02:27,988-Speed 5475.57 samples/sec   Loss 4.7042   LearningRate 0.0589   Epoch: 11   Global Step: 120070   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:02:35,514-Speed 5442.98 samples/sec   Loss 4.7212   LearningRate 0.0589   Epoch: 11   Global Step: 120080   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:02:42,947-Speed 5511.59 samples/sec   Loss 4.7005   LearningRate 0.0589   Epoch: 11   Global Step: 120090   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:02:50,404-Speed 5493.42 samples/sec   Loss 4.6705   LearningRate 0.0589   Epoch: 11   Global Step: 120100   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:02:57,866-Speed 5489.89 samples/sec   Loss 4.6825   LearningRate 0.0589   Epoch: 11   Global Step: 120110   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:03:05,372-Speed 5457.87 samples/sec   Loss 4.6969   LearningRate 0.0589   Epoch: 11   Global Step: 120120   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:03:12,885-Speed 5452.50 samples/sec   Loss 4.7095   LearningRate 0.0588   Epoch: 11   Global Step: 120130   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:03:20,350-Speed 5487.94 samples/sec   Loss 4.7314   LearningRate 0.0588   Epoch: 11   Global Step: 120140   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:03:27,811-Speed 5490.46 samples/sec   Loss 4.6850   LearningRate 0.0588   Epoch: 11   Global Step: 120150   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:03:35,473-Speed 5346.43 samples/sec   Loss 4.7304   LearningRate 0.0588   Epoch: 11   Global Step: 120160   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:03:42,916-Speed 5503.69 samples/sec   Loss 4.7261   LearningRate 0.0588   Epoch: 11   Global Step: 120170   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:03:50,513-Speed 5392.43 samples/sec   Loss 4.7133   LearningRate 0.0588   Epoch: 11   Global Step: 120180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:03:57,977-Speed 5488.71 samples/sec   Loss 4.6645   LearningRate 0.0588   Epoch: 11   Global Step: 120190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:04:05,544-Speed 5413.38 samples/sec   Loss 4.6438   LearningRate 0.0587   Epoch: 11   Global Step: 120200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:04:13,165-Speed 5375.05 samples/sec   Loss 4.6784   LearningRate 0.0587   Epoch: 11   Global Step: 120210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:04:20,709-Speed 5430.56 samples/sec   Loss 4.6606   LearningRate 0.0587   Epoch: 11   Global Step: 120220   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:04:28,269-Speed 5418.65 samples/sec   Loss 4.6739   LearningRate 0.0587   Epoch: 11   Global Step: 120230   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 22:04:35,804-Speed 5436.68 samples/sec   Loss 4.7063   LearningRate 0.0587   Epoch: 11   Global Step: 120240   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:04:43,389-Speed 5400.80 samples/sec   Loss 4.6697   LearningRate 0.0587   Epoch: 11   Global Step: 120250   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:04:50,950-Speed 5417.88 samples/sec   Loss 4.6784   LearningRate 0.0587   Epoch: 11   Global Step: 120260   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:04:58,464-Speed 5452.01 samples/sec   Loss 4.7324   LearningRate 0.0587   Epoch: 11   Global Step: 120270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:05:05,960-Speed 5464.43 samples/sec   Loss 4.6827   LearningRate 0.0586   Epoch: 11   Global Step: 120280   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:05:13,436-Speed 5479.57 samples/sec   Loss 4.6998   LearningRate 0.0586   Epoch: 11   Global Step: 120290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:05:20,935-Speed 5463.23 samples/sec   Loss 4.6494   LearningRate 0.0586   Epoch: 11   Global Step: 120300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:05:28,337-Speed 5534.40 samples/sec   Loss 4.7145   LearningRate 0.0586   Epoch: 11   Global Step: 120310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:05:35,275-Speed 5904.93 samples/sec   Loss 4.6854   LearningRate 0.0586   Epoch: 11   Global Step: 120320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:05:42,341-Speed 5797.14 samples/sec   Loss 4.6919   LearningRate 0.0586   Epoch: 11   Global Step: 120330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:05:50,007-Speed 5343.88 samples/sec   Loss 4.7032   LearningRate 0.0586   Epoch: 11   Global Step: 120340   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 22:05:57,823-Speed 5241.49 samples/sec   Loss 4.7290   LearningRate 0.0585   Epoch: 11   Global Step: 120350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:06:05,320-Speed 5464.11 samples/sec   Loss 4.6820   LearningRate 0.0585   Epoch: 11   Global Step: 120360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:06:12,798-Speed 5477.82 samples/sec   Loss 4.6508   LearningRate 0.0585   Epoch: 11   Global Step: 120370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:06:20,324-Speed 5443.26 samples/sec   Loss 4.6957   LearningRate 0.0585   Epoch: 11   Global Step: 120380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:06:27,792-Speed 5485.49 samples/sec   Loss 4.6727   LearningRate 0.0585   Epoch: 11   Global Step: 120390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:06:35,337-Speed 5429.09 samples/sec   Loss 4.7224   LearningRate 0.0585   Epoch: 11   Global Step: 120400   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:06:42,849-Speed 5453.71 samples/sec   Loss 4.6826   LearningRate 0.0585   Epoch: 11   Global Step: 120410   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:06:50,353-Speed 5459.83 samples/sec   Loss 4.6825   LearningRate 0.0584   Epoch: 11   Global Step: 120420   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:06:57,845-Speed 5467.20 samples/sec   Loss 4.6881   LearningRate 0.0584   Epoch: 11   Global Step: 120430   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:07:05,427-Speed 5402.97 samples/sec   Loss 4.7222   LearningRate 0.0584   Epoch: 11   Global Step: 120440   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:07:12,915-Speed 5471.42 samples/sec   Loss 4.6772   LearningRate 0.0584   Epoch: 11   Global Step: 120450   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 22:07:20,465-Speed 5425.40 samples/sec   Loss 4.7225   LearningRate 0.0584   Epoch: 11   Global Step: 120460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 22:07:27,998-Speed 5438.38 samples/sec   Loss 4.7126   LearningRate 0.0584   Epoch: 11   Global Step: 120470   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:07:35,430-Speed 5511.63 samples/sec   Loss 4.6751   LearningRate 0.0584   Epoch: 11   Global Step: 120480   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:07:42,987-Speed 5421.15 samples/sec   Loss 4.6655   LearningRate 0.0584   Epoch: 11   Global Step: 120490   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:07:50,571-Speed 5401.66 samples/sec   Loss 4.6730   LearningRate 0.0583   Epoch: 11   Global Step: 120500   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:07:58,099-Speed 5441.85 samples/sec   Loss 4.6625   LearningRate 0.0583   Epoch: 11   Global Step: 120510   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:08:05,673-Speed 5408.99 samples/sec   Loss 4.6419   LearningRate 0.0583   Epoch: 11   Global Step: 120520   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:08:13,296-Speed 5373.46 samples/sec   Loss 4.6823   LearningRate 0.0583   Epoch: 11   Global Step: 120530   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:08:20,949-Speed 5353.07 samples/sec   Loss 4.6670   LearningRate 0.0583   Epoch: 11   Global Step: 120540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:08:28,424-Speed 5480.25 samples/sec   Loss 4.7206   LearningRate 0.0583   Epoch: 11   Global Step: 120550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:08:35,915-Speed 5468.77 samples/sec   Loss 4.6959   LearningRate 0.0583   Epoch: 11   Global Step: 120560   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:08:43,465-Speed 5425.29 samples/sec   Loss 4.6989   LearningRate 0.0582   Epoch: 11   Global Step: 120570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:08:51,062-Speed 5392.76 samples/sec   Loss 4.6743   LearningRate 0.0582   Epoch: 11   Global Step: 120580   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:08:58,692-Speed 5368.82 samples/sec   Loss 4.6274   LearningRate 0.0582   Epoch: 11   Global Step: 120590   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:09:06,223-Speed 5440.14 samples/sec   Loss 4.7014   LearningRate 0.0582   Epoch: 11   Global Step: 120600   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:09:13,806-Speed 5401.83 samples/sec   Loss 4.6745   LearningRate 0.0582   Epoch: 11   Global Step: 120610   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:09:21,374-Speed 5412.94 samples/sec   Loss 4.6817   LearningRate 0.0582   Epoch: 11   Global Step: 120620   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:09:28,927-Speed 5423.43 samples/sec   Loss 4.6457   LearningRate 0.0582   Epoch: 11   Global Step: 120630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:09:36,547-Speed 5376.44 samples/sec   Loss 4.6197   LearningRate 0.0582   Epoch: 11   Global Step: 120640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:09:44,031-Speed 5473.33 samples/sec   Loss 4.6607   LearningRate 0.0581   Epoch: 11   Global Step: 120650   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:09:51,516-Speed 5473.30 samples/sec   Loss 4.6430   LearningRate 0.0581   Epoch: 11   Global Step: 120660   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:09:58,831-Speed 5599.77 samples/sec   Loss 4.7050   LearningRate 0.0581   Epoch: 11   Global Step: 120670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:10:06,321-Speed 5469.80 samples/sec   Loss 4.6741   LearningRate 0.0581   Epoch: 11   Global Step: 120680   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 22:10:13,824-Speed 5460.03 samples/sec   Loss 4.7039   LearningRate 0.0581   Epoch: 11   Global Step: 120690   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 22:10:21,359-Speed 5436.80 samples/sec   Loss 4.7121   LearningRate 0.0581   Epoch: 11   Global Step: 120700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:10:28,882-Speed 5445.09 samples/sec   Loss 4.6344   LearningRate 0.0581   Epoch: 11   Global Step: 120710   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:10:36,410-Speed 5441.78 samples/sec   Loss 4.6638   LearningRate 0.0580   Epoch: 11   Global Step: 120720   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:10:43,922-Speed 5453.38 samples/sec   Loss 4.6503   LearningRate 0.0580   Epoch: 11   Global Step: 120730   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:10:51,445-Speed 5444.60 samples/sec   Loss 4.6958   LearningRate 0.0580   Epoch: 11   Global Step: 120740   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:10:58,971-Speed 5443.52 samples/sec   Loss 4.6796   LearningRate 0.0580   Epoch: 11   Global Step: 120750   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:11:06,482-Speed 5454.07 samples/sec   Loss 4.6402   LearningRate 0.0580   Epoch: 11   Global Step: 120760   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:11:13,958-Speed 5479.81 samples/sec   Loss 4.6019   LearningRate 0.0580   Epoch: 11   Global Step: 120770   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:11:21,369-Speed 5527.65 samples/sec   Loss 4.6333   LearningRate 0.0580   Epoch: 11   Global Step: 120780   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:11:28,850-Speed 5475.70 samples/sec   Loss 4.6382   LearningRate 0.0580   Epoch: 11   Global Step: 120790   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:11:36,383-Speed 5438.16 samples/sec   Loss 4.6367   LearningRate 0.0579   Epoch: 11   Global Step: 120800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 22:11:43,872-Speed 5470.08 samples/sec   Loss 4.7024   LearningRate 0.0579   Epoch: 11   Global Step: 120810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 22:11:51,440-Speed 5413.23 samples/sec   Loss 4.6336   LearningRate 0.0579   Epoch: 11   Global Step: 120820   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:11:58,882-Speed 5503.82 samples/sec   Loss 4.6635   LearningRate 0.0579   Epoch: 11   Global Step: 120830   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:12:06,438-Speed 5422.34 samples/sec   Loss 4.6387   LearningRate 0.0579   Epoch: 11   Global Step: 120840   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:12:13,921-Speed 5474.56 samples/sec   Loss 4.6574   LearningRate 0.0579   Epoch: 11   Global Step: 120850   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:12:21,486-Speed 5414.94 samples/sec   Loss 4.6715   LearningRate 0.0579   Epoch: 11   Global Step: 120860   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:12:29,015-Speed 5440.62 samples/sec   Loss 4.6444   LearningRate 0.0578   Epoch: 11   Global Step: 120870   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:12:36,506-Speed 5469.17 samples/sec   Loss 4.6870   LearningRate 0.0578   Epoch: 11   Global Step: 120880   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:12:44,017-Speed 5454.13 samples/sec   Loss 4.6528   LearningRate 0.0578   Epoch: 11   Global Step: 120890   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:12:51,497-Speed 5476.55 samples/sec   Loss 4.6871   LearningRate 0.0578   Epoch: 11   Global Step: 120900   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:12:58,989-Speed 5467.97 samples/sec   Loss 4.7006   LearningRate 0.0578   Epoch: 11   Global Step: 120910   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:13:06,434-Speed 5501.99 samples/sec   Loss 4.6179   LearningRate 0.0578   Epoch: 11   Global Step: 120920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:13:13,917-Speed 5474.92 samples/sec   Loss 4.5977   LearningRate 0.0578   Epoch: 11   Global Step: 120930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:13:21,447-Speed 5440.10 samples/sec   Loss 4.6098   LearningRate 0.0578   Epoch: 11   Global Step: 120940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:13:29,056-Speed 5384.03 samples/sec   Loss 4.7186   LearningRate 0.0577   Epoch: 11   Global Step: 120950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:13:36,585-Speed 5440.68 samples/sec   Loss 4.6733   LearningRate 0.0577   Epoch: 11   Global Step: 120960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:13:44,185-Speed 5390.11 samples/sec   Loss 4.6732   LearningRate 0.0577   Epoch: 11   Global Step: 120970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:13:51,726-Speed 5432.51 samples/sec   Loss 4.6562   LearningRate 0.0577   Epoch: 11   Global Step: 120980   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:13:59,251-Speed 5443.48 samples/sec   Loss 4.6485   LearningRate 0.0577   Epoch: 11   Global Step: 120990   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:14:06,814-Speed 5416.86 samples/sec   Loss 4.6759   LearningRate 0.0577   Epoch: 11   Global Step: 121000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:14:14,276-Speed 5489.94 samples/sec   Loss 4.6450   LearningRate 0.0577   Epoch: 11   Global Step: 121010   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:14:21,748-Speed 5482.79 samples/sec   Loss 4.7023   LearningRate 0.0576   Epoch: 11   Global Step: 121020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 22:14:29,258-Speed 5454.66 samples/sec   Loss 4.6494   LearningRate 0.0576   Epoch: 11   Global Step: 121030   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:14:36,692-Speed 5510.45 samples/sec   Loss 4.6698   LearningRate 0.0576   Epoch: 11   Global Step: 121040   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:14:44,162-Speed 5483.90 samples/sec   Loss 4.6003   LearningRate 0.0576   Epoch: 11   Global Step: 121050   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:14:51,646-Speed 5473.54 samples/sec   Loss 4.6473   LearningRate 0.0576   Epoch: 11   Global Step: 121060   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:14:59,120-Speed 5481.11 samples/sec   Loss 4.6128   LearningRate 0.0576   Epoch: 11   Global Step: 121070   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:15:06,625-Speed 5458.41 samples/sec   Loss 4.6355   LearningRate 0.0576   Epoch: 11   Global Step: 121080   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:15:14,112-Speed 5471.91 samples/sec   Loss 4.6284   LearningRate 0.0576   Epoch: 11   Global Step: 121090   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:15:21,607-Speed 5466.20 samples/sec   Loss 4.5932   LearningRate 0.0575   Epoch: 11   Global Step: 121100   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:15:29,233-Speed 5371.19 samples/sec   Loss 4.6348   LearningRate 0.0575   Epoch: 11   Global Step: 121110   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:15:36,819-Speed 5399.92 samples/sec   Loss 4.5950   LearningRate 0.0575   Epoch: 11   Global Step: 121120   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:15:44,458-Speed 5363.20 samples/sec   Loss 4.6247   LearningRate 0.0575   Epoch: 11   Global Step: 121130   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:15:52,180-Speed 5304.64 samples/sec   Loss 4.6299   LearningRate 0.0575   Epoch: 11   Global Step: 121140   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:15:59,831-Speed 5354.91 samples/sec   Loss 4.6701   LearningRate 0.0575   Epoch: 11   Global Step: 121150   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:16:07,371-Speed 5432.22 samples/sec   Loss 4.6446   LearningRate 0.0575   Epoch: 11   Global Step: 121160   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:16:14,831-Speed 5492.06 samples/sec   Loss 4.6276   LearningRate 0.0574   Epoch: 11   Global Step: 121170   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:16:22,374-Speed 5430.52 samples/sec   Loss 4.6981   LearningRate 0.0574   Epoch: 11   Global Step: 121180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:16:29,924-Speed 5425.84 samples/sec   Loss 4.7196   LearningRate 0.0574   Epoch: 11   Global Step: 121190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:16:37,485-Speed 5418.36 samples/sec   Loss 4.6154   LearningRate 0.0574   Epoch: 11   Global Step: 121200   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:16:45,083-Speed 5391.44 samples/sec   Loss 4.6473   LearningRate 0.0574   Epoch: 11   Global Step: 121210   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:16:52,722-Speed 5363.02 samples/sec   Loss 4.6955   LearningRate 0.0574   Epoch: 11   Global Step: 121220   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:17:00,375-Speed 5352.57 samples/sec   Loss 4.6060   LearningRate 0.0574   Epoch: 11   Global Step: 121230   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:17:07,977-Speed 5389.10 samples/sec   Loss 4.6609   LearningRate 0.0574   Epoch: 11   Global Step: 121240   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:17:15,514-Speed 5435.01 samples/sec   Loss 4.6550   LearningRate 0.0573   Epoch: 11   Global Step: 121250   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:17:23,005-Speed 5468.53 samples/sec   Loss 4.6869   LearningRate 0.0573   Epoch: 11   Global Step: 121260   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:17:30,575-Speed 5411.52 samples/sec   Loss 4.6654   LearningRate 0.0573   Epoch: 11   Global Step: 121270   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:17:38,289-Speed 5310.67 samples/sec   Loss 4.6195   LearningRate 0.0573   Epoch: 11   Global Step: 121280   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:17:45,812-Speed 5444.98 samples/sec   Loss 4.6172   LearningRate 0.0573   Epoch: 11   Global Step: 121290   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:17:53,420-Speed 5384.58 samples/sec   Loss 4.7099   LearningRate 0.0573   Epoch: 11   Global Step: 121300   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:18:01,012-Speed 5395.99 samples/sec   Loss 4.6321   LearningRate 0.0573   Epoch: 11   Global Step: 121310   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:18:08,592-Speed 5404.06 samples/sec   Loss 4.6712   LearningRate 0.0572   Epoch: 11   Global Step: 121320   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:18:16,099-Speed 5456.90 samples/sec   Loss 4.6729   LearningRate 0.0572   Epoch: 11   Global Step: 121330   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:18:23,605-Speed 5458.40 samples/sec   Loss 4.7028   LearningRate 0.0572   Epoch: 11   Global Step: 121340   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:18:31,140-Speed 5435.85 samples/sec   Loss 4.6413   LearningRate 0.0572   Epoch: 11   Global Step: 121350   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:18:38,691-Speed 5425.43 samples/sec   Loss 4.6235   LearningRate 0.0572   Epoch: 11   Global Step: 121360   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:18:46,431-Speed 5292.55 samples/sec   Loss 4.6079   LearningRate 0.0572   Epoch: 11   Global Step: 121370   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:18:54,135-Speed 5317.94 samples/sec   Loss 4.6556   LearningRate 0.0572   Epoch: 11   Global Step: 121380   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:19:01,761-Speed 5371.32 samples/sec   Loss 4.6266   LearningRate 0.0572   Epoch: 11   Global Step: 121390   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:19:09,349-Speed 5399.03 samples/sec   Loss 4.6549   LearningRate 0.0571   Epoch: 11   Global Step: 121400   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:19:16,932-Speed 5401.82 samples/sec   Loss 4.6787   LearningRate 0.0571   Epoch: 11   Global Step: 121410   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:19:24,514-Speed 5403.26 samples/sec   Loss 4.6042   LearningRate 0.0571   Epoch: 11   Global Step: 121420   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:19:32,075-Speed 5417.71 samples/sec   Loss 4.6218   LearningRate 0.0571   Epoch: 11   Global Step: 121430   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:19:39,577-Speed 5460.96 samples/sec   Loss 4.5656   LearningRate 0.0571   Epoch: 11   Global Step: 121440   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:19:47,164-Speed 5399.46 samples/sec   Loss 4.6005   LearningRate 0.0571   Epoch: 11   Global Step: 121450   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:19:54,826-Speed 5346.75 samples/sec   Loss 4.6025   LearningRate 0.0571   Epoch: 11   Global Step: 121460   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:20:02,344-Speed 5448.76 samples/sec   Loss 4.6461   LearningRate 0.0570   Epoch: 11   Global Step: 121470   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:20:09,867-Speed 5445.04 samples/sec   Loss 4.5830   LearningRate 0.0570   Epoch: 11   Global Step: 121480   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:20:17,449-Speed 5403.49 samples/sec   Loss 4.6413   LearningRate 0.0570   Epoch: 11   Global Step: 121490   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:20:25,052-Speed 5388.03 samples/sec   Loss 4.6250   LearningRate 0.0570   Epoch: 11   Global Step: 121500   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 22:20:32,550-Speed 5462.82 samples/sec   Loss 4.6367   LearningRate 0.0570   Epoch: 11   Global Step: 121510   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 22:20:40,054-Speed 5459.79 samples/sec   Loss 4.5992   LearningRate 0.0570   Epoch: 11   Global Step: 121520   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 22:20:47,572-Speed 5448.50 samples/sec   Loss 4.6019   LearningRate 0.0570   Epoch: 11   Global Step: 121530   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 22:20:55,048-Speed 5479.96 samples/sec   Loss 4.6422   LearningRate 0.0570   Epoch: 11   Global Step: 121540   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:21:02,576-Speed 5441.66 samples/sec   Loss 4.7096   LearningRate 0.0569   Epoch: 11   Global Step: 121550   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:21:10,099-Speed 5444.96 samples/sec   Loss 4.6109   LearningRate 0.0569   Epoch: 11   Global Step: 121560   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:21:17,725-Speed 5372.67 samples/sec   Loss 4.6048   LearningRate 0.0569   Epoch: 11   Global Step: 121570   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:21:25,296-Speed 5410.67 samples/sec   Loss 4.5648   LearningRate 0.0569   Epoch: 11   Global Step: 121580   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:21:32,832-Speed 5435.58 samples/sec   Loss 4.6495   LearningRate 0.0569   Epoch: 11   Global Step: 121590   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:21:40,412-Speed 5404.04 samples/sec   Loss 4.6232   LearningRate 0.0569   Epoch: 11   Global Step: 121600   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:21:47,951-Speed 5434.31 samples/sec   Loss 4.5637   LearningRate 0.0569   Epoch: 11   Global Step: 121610   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:21:55,479-Speed 5441.58 samples/sec   Loss 4.5766   LearningRate 0.0568   Epoch: 11   Global Step: 121620   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:22:03,091-Speed 5382.04 samples/sec   Loss 4.5860   LearningRate 0.0568   Epoch: 11   Global Step: 121630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:22:10,626-Speed 5436.47 samples/sec   Loss 4.6070   LearningRate 0.0568   Epoch: 11   Global Step: 121640   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 22:22:18,168-Speed 5431.63 samples/sec   Loss 4.5926   LearningRate 0.0568   Epoch: 11   Global Step: 121650   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:22:25,659-Speed 5468.63 samples/sec   Loss 4.5640   LearningRate 0.0568   Epoch: 11   Global Step: 121660   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:22:33,242-Speed 5401.86 samples/sec   Loss 4.5763   LearningRate 0.0568   Epoch: 11   Global Step: 121670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:22:40,754-Speed 5453.51 samples/sec   Loss 4.6221   LearningRate 0.0568   Epoch: 11   Global Step: 121680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:22:48,390-Speed 5364.84 samples/sec   Loss 4.6412   LearningRate 0.0568   Epoch: 11   Global Step: 121690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:22:55,923-Speed 5437.89 samples/sec   Loss 4.6404   LearningRate 0.0567   Epoch: 11   Global Step: 121700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:23:03,466-Speed 5430.97 samples/sec   Loss 4.6130   LearningRate 0.0567   Epoch: 11   Global Step: 121710   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:23:11,002-Speed 5435.63 samples/sec   Loss 4.6001   LearningRate 0.0567   Epoch: 11   Global Step: 121720   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:23:18,505-Speed 5460.16 samples/sec   Loss 4.6022   LearningRate 0.0567   Epoch: 11   Global Step: 121730   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:23:26,004-Speed 5463.01 samples/sec   Loss 4.5716   LearningRate 0.0567   Epoch: 11   Global Step: 121740   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:23:33,499-Speed 5465.73 samples/sec   Loss 4.6150   LearningRate 0.0567   Epoch: 11   Global Step: 121750   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:23:41,041-Speed 5431.40 samples/sec   Loss 4.5947   LearningRate 0.0567   Epoch: 11   Global Step: 121760   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:23:48,559-Speed 5449.48 samples/sec   Loss 4.6648   LearningRate 0.0566   Epoch: 11   Global Step: 121770   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:23:56,098-Speed 5433.83 samples/sec   Loss 4.6145   LearningRate 0.0566   Epoch: 11   Global Step: 121780   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:24:03,669-Speed 5410.34 samples/sec   Loss 4.5828   LearningRate 0.0566   Epoch: 11   Global Step: 121790   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:24:11,257-Speed 5398.68 samples/sec   Loss 4.6327   LearningRate 0.0566   Epoch: 11   Global Step: 121800   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:24:18,835-Speed 5405.59 samples/sec   Loss 4.6128   LearningRate 0.0566   Epoch: 11   Global Step: 121810   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:24:26,387-Speed 5425.11 samples/sec   Loss 4.6176   LearningRate 0.0566   Epoch: 11   Global Step: 121820   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:24:34,018-Speed 5367.98 samples/sec   Loss 4.5877   LearningRate 0.0566   Epoch: 11   Global Step: 121830   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:24:41,604-Speed 5399.85 samples/sec   Loss 4.5686   LearningRate 0.0566   Epoch: 11   Global Step: 121840   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:24:49,111-Speed 5457.29 samples/sec   Loss 4.5876   LearningRate 0.0565   Epoch: 11   Global Step: 121850   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:24:56,754-Speed 5359.91 samples/sec   Loss 4.6217   LearningRate 0.0565   Epoch: 11   Global Step: 121860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:25:04,543-Speed 5259.18 samples/sec   Loss 4.6268   LearningRate 0.0565   Epoch: 11   Global Step: 121870   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:25:12,165-Speed 5375.02 samples/sec   Loss 4.5485   LearningRate 0.0565   Epoch: 11   Global Step: 121880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:25:19,693-Speed 5441.12 samples/sec   Loss 4.5878   LearningRate 0.0565   Epoch: 11   Global Step: 121890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:25:27,357-Speed 5345.63 samples/sec   Loss 4.5825   LearningRate 0.0565   Epoch: 11   Global Step: 121900   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:25:34,905-Speed 5426.88 samples/sec   Loss 4.5903   LearningRate 0.0565   Epoch: 11   Global Step: 121910   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:25:42,519-Speed 5380.96 samples/sec   Loss 4.5572   LearningRate 0.0565   Epoch: 11   Global Step: 121920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:25:50,094-Speed 5407.61 samples/sec   Loss 4.5832   LearningRate 0.0564   Epoch: 11   Global Step: 121930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:25:57,680-Speed 5400.34 samples/sec   Loss 4.5767   LearningRate 0.0564   Epoch: 11   Global Step: 121940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:26:05,191-Speed 5453.63 samples/sec   Loss 4.5973   LearningRate 0.0564   Epoch: 11   Global Step: 121950   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 22:26:12,750-Speed 5419.92 samples/sec   Loss 4.6112   LearningRate 0.0564   Epoch: 11   Global Step: 121960   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 22:26:20,288-Speed 5434.12 samples/sec   Loss 4.6176   LearningRate 0.0564   Epoch: 11   Global Step: 121970   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 22:26:27,867-Speed 5405.43 samples/sec   Loss 4.5718   LearningRate 0.0564   Epoch: 11   Global Step: 121980   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:26:35,439-Speed 5409.98 samples/sec   Loss 4.6090   LearningRate 0.0564   Epoch: 11   Global Step: 121990   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:26:43,048-Speed 5383.87 samples/sec   Loss 4.6209   LearningRate 0.0563   Epoch: 11   Global Step: 122000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:27:27,442-[lfw][122000]XNorm: 23.173306
Training: 2022-01-08 22:27:27,443-[lfw][122000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-01-08 22:27:27,443-[lfw][122000]Accuracy-Highest: 0.99817
Training: 2022-01-08 22:28:19,314-[cfp_fp][122000]XNorm: 21.469200
Training: 2022-01-08 22:28:19,315-[cfp_fp][122000]Accuracy-Flip: 0.98929+-0.00614
Training: 2022-01-08 22:28:19,316-[cfp_fp][122000]Accuracy-Highest: 0.99057
Training: 2022-01-08 22:29:04,109-[agedb_30][122000]XNorm: 23.163223
Training: 2022-01-08 22:29:04,111-[agedb_30][122000]Accuracy-Flip: 0.98000+-0.00687
Training: 2022-01-08 22:29:04,111-[agedb_30][122000]Accuracy-Highest: 0.98000
Training: 2022-01-08 22:29:11,655-Speed 275.63 samples/sec   Loss 4.6092   LearningRate 0.0563   Epoch: 11   Global Step: 122010   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:29:19,235-Speed 5405.27 samples/sec   Loss 4.6637   LearningRate 0.0563   Epoch: 11   Global Step: 122020   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:29:26,733-Speed 5464.81 samples/sec   Loss 4.6823   LearningRate 0.0563   Epoch: 11   Global Step: 122030   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:29:34,189-Speed 5494.55 samples/sec   Loss 4.6612   LearningRate 0.0563   Epoch: 11   Global Step: 122040   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:29:41,706-Speed 5450.24 samples/sec   Loss 4.5865   LearningRate 0.0563   Epoch: 11   Global Step: 122050   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:29:49,220-Speed 5452.77 samples/sec   Loss 4.5654   LearningRate 0.0563   Epoch: 11   Global Step: 122060   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:29:56,668-Speed 5500.51 samples/sec   Loss 4.5343   LearningRate 0.0563   Epoch: 11   Global Step: 122070   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:30:04,274-Speed 5386.50 samples/sec   Loss 4.6154   LearningRate 0.0562   Epoch: 11   Global Step: 122080   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:30:11,855-Speed 5404.02 samples/sec   Loss 4.6566   LearningRate 0.0562   Epoch: 11   Global Step: 122090   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:30:19,340-Speed 5473.09 samples/sec   Loss 4.6084   LearningRate 0.0562   Epoch: 11   Global Step: 122100   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:30:26,835-Speed 5466.00 samples/sec   Loss 4.5869   LearningRate 0.0562   Epoch: 11   Global Step: 122110   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:30:34,309-Speed 5480.78 samples/sec   Loss 4.6414   LearningRate 0.0562   Epoch: 11   Global Step: 122120   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:30:41,893-Speed 5401.88 samples/sec   Loss 4.6353   LearningRate 0.0562   Epoch: 11   Global Step: 122130   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:30:49,408-Speed 5450.82 samples/sec   Loss 4.6051   LearningRate 0.0562   Epoch: 11   Global Step: 122140   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:30:56,990-Speed 5402.98 samples/sec   Loss 4.5894   LearningRate 0.0561   Epoch: 11   Global Step: 122150   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:31:04,578-Speed 5398.54 samples/sec   Loss 4.5737   LearningRate 0.0561   Epoch: 11   Global Step: 122160   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:31:12,066-Speed 5470.91 samples/sec   Loss 4.5634   LearningRate 0.0561   Epoch: 11   Global Step: 122170   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:31:19,729-Speed 5346.12 samples/sec   Loss 4.5963   LearningRate 0.0561   Epoch: 11   Global Step: 122180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:31:27,261-Speed 5438.81 samples/sec   Loss 4.6170   LearningRate 0.0561   Epoch: 11   Global Step: 122190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:31:34,758-Speed 5463.89 samples/sec   Loss 4.6250   LearningRate 0.0561   Epoch: 11   Global Step: 122200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:31:42,325-Speed 5414.34 samples/sec   Loss 4.5390   LearningRate 0.0561   Epoch: 11   Global Step: 122210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:31:49,835-Speed 5454.58 samples/sec   Loss 4.5691   LearningRate 0.0561   Epoch: 11   Global Step: 122220   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:31:57,452-Speed 5377.96 samples/sec   Loss 4.5663   LearningRate 0.0560   Epoch: 11   Global Step: 122230   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:32:04,979-Speed 5442.15 samples/sec   Loss 4.5778   LearningRate 0.0560   Epoch: 11   Global Step: 122240   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 22:32:12,551-Speed 5410.70 samples/sec   Loss 4.5761   LearningRate 0.0560   Epoch: 11   Global Step: 122250   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:32:20,051-Speed 5461.65 samples/sec   Loss 4.5920   LearningRate 0.0560   Epoch: 11   Global Step: 122260   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:32:27,585-Speed 5437.36 samples/sec   Loss 4.5583   LearningRate 0.0560   Epoch: 11   Global Step: 122270   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:32:35,094-Speed 5455.23 samples/sec   Loss 4.5914   LearningRate 0.0560   Epoch: 11   Global Step: 122280   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:32:42,631-Speed 5435.23 samples/sec   Loss 4.5786   LearningRate 0.0560   Epoch: 11   Global Step: 122290   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 22:32:50,155-Speed 5445.13 samples/sec   Loss 4.5549   LearningRate 0.0559   Epoch: 11   Global Step: 122300   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:32:57,721-Speed 5414.13 samples/sec   Loss 4.5506   LearningRate 0.0559   Epoch: 11   Global Step: 122310   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:33:05,260-Speed 5434.14 samples/sec   Loss 4.6099   LearningRate 0.0559   Epoch: 11   Global Step: 122320   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:33:12,843-Speed 5402.16 samples/sec   Loss 4.6391   LearningRate 0.0559   Epoch: 11   Global Step: 122330   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:33:20,347-Speed 5459.20 samples/sec   Loss 4.5889   LearningRate 0.0559   Epoch: 11   Global Step: 122340   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:33:27,856-Speed 5454.81 samples/sec   Loss 4.5764   LearningRate 0.0559   Epoch: 11   Global Step: 122350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:33:35,412-Speed 5421.85 samples/sec   Loss 4.5933   LearningRate 0.0559   Epoch: 11   Global Step: 122360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:33:42,967-Speed 5422.71 samples/sec   Loss 4.5835   LearningRate 0.0559   Epoch: 11   Global Step: 122370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:33:50,617-Speed 5354.67 samples/sec   Loss 4.5781   LearningRate 0.0558   Epoch: 11   Global Step: 122380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:33:58,131-Speed 5451.84 samples/sec   Loss 4.5656   LearningRate 0.0558   Epoch: 11   Global Step: 122390   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:34:05,740-Speed 5383.44 samples/sec   Loss 4.5842   LearningRate 0.0558   Epoch: 11   Global Step: 122400   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:34:13,279-Speed 5434.31 samples/sec   Loss 4.5572   LearningRate 0.0558   Epoch: 11   Global Step: 122410   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:34:20,838-Speed 5419.35 samples/sec   Loss 4.5640   LearningRate 0.0558   Epoch: 11   Global Step: 122420   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:34:28,389-Speed 5425.03 samples/sec   Loss 4.5481   LearningRate 0.0558   Epoch: 11   Global Step: 122430   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:34:35,910-Speed 5446.65 samples/sec   Loss 4.5727   LearningRate 0.0558   Epoch: 11   Global Step: 122440   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:34:43,411-Speed 5461.69 samples/sec   Loss 4.5553   LearningRate 0.0558   Epoch: 11   Global Step: 122450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:34:51,035-Speed 5372.97 samples/sec   Loss 4.6201   LearningRate 0.0557   Epoch: 11   Global Step: 122460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:34:58,528-Speed 5467.53 samples/sec   Loss 4.5670   LearningRate 0.0557   Epoch: 11   Global Step: 122470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:35:06,078-Speed 5425.00 samples/sec   Loss 4.4967   LearningRate 0.0557   Epoch: 11   Global Step: 122480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:35:13,623-Speed 5429.82 samples/sec   Loss 4.5083   LearningRate 0.0557   Epoch: 11   Global Step: 122490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:35:21,230-Speed 5385.44 samples/sec   Loss 4.5902   LearningRate 0.0557   Epoch: 11   Global Step: 122500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:35:28,856-Speed 5372.01 samples/sec   Loss 4.5468   LearningRate 0.0557   Epoch: 11   Global Step: 122510   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:35:36,496-Speed 5361.42 samples/sec   Loss 4.6076   LearningRate 0.0557   Epoch: 11   Global Step: 122520   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:35:44,130-Speed 5366.44 samples/sec   Loss 4.5703   LearningRate 0.0556   Epoch: 11   Global Step: 122530   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:35:51,706-Speed 5407.03 samples/sec   Loss 4.5569   LearningRate 0.0556   Epoch: 11   Global Step: 122540   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:35:59,200-Speed 5466.75 samples/sec   Loss 4.5728   LearningRate 0.0556   Epoch: 11   Global Step: 122550   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:36:06,799-Speed 5390.84 samples/sec   Loss 4.5806   LearningRate 0.0556   Epoch: 11   Global Step: 122560   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:36:14,361-Speed 5417.00 samples/sec   Loss 4.5914   LearningRate 0.0556   Epoch: 11   Global Step: 122570   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:36:21,926-Speed 5414.96 samples/sec   Loss 4.5814   LearningRate 0.0556   Epoch: 11   Global Step: 122580   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:36:29,421-Speed 5466.03 samples/sec   Loss 4.6062   LearningRate 0.0556   Epoch: 11   Global Step: 122590   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:36:36,923-Speed 5460.61 samples/sec   Loss 4.6195   LearningRate 0.0556   Epoch: 11   Global Step: 122600   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:36:44,414-Speed 5468.74 samples/sec   Loss 4.5222   LearningRate 0.0555   Epoch: 11   Global Step: 122610   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:36:51,884-Speed 5484.09 samples/sec   Loss 4.5396   LearningRate 0.0555   Epoch: 11   Global Step: 122620   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:36:59,502-Speed 5377.03 samples/sec   Loss 4.5735   LearningRate 0.0555   Epoch: 11   Global Step: 122630   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:37:06,927-Speed 5517.89 samples/sec   Loss 4.5449   LearningRate 0.0555   Epoch: 11   Global Step: 122640   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:37:14,494-Speed 5413.07 samples/sec   Loss 4.5923   LearningRate 0.0555   Epoch: 11   Global Step: 122650   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:37:21,980-Speed 5472.99 samples/sec   Loss 4.5559   LearningRate 0.0555   Epoch: 11   Global Step: 122660   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:37:29,466-Speed 5471.96 samples/sec   Loss 4.5312   LearningRate 0.0555   Epoch: 11   Global Step: 122670   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:37:36,969-Speed 5459.65 samples/sec   Loss 4.6031   LearningRate 0.0555   Epoch: 11   Global Step: 122680   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:37:44,458-Speed 5469.90 samples/sec   Loss 4.5972   LearningRate 0.0554   Epoch: 11   Global Step: 122690   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:37:52,009-Speed 5425.91 samples/sec   Loss 4.5824   LearningRate 0.0554   Epoch: 11   Global Step: 122700   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:37:59,641-Speed 5367.32 samples/sec   Loss 4.5450   LearningRate 0.0554   Epoch: 11   Global Step: 122710   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:38:07,159-Speed 5449.04 samples/sec   Loss 4.6315   LearningRate 0.0554   Epoch: 11   Global Step: 122720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:38:14,710-Speed 5424.41 samples/sec   Loss 4.5692   LearningRate 0.0554   Epoch: 11   Global Step: 122730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:38:22,250-Speed 5433.69 samples/sec   Loss 4.5267   LearningRate 0.0554   Epoch: 11   Global Step: 122740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:38:29,751-Speed 5460.81 samples/sec   Loss 4.6186   LearningRate 0.0554   Epoch: 11   Global Step: 122750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:38:37,212-Speed 5490.59 samples/sec   Loss 4.5369   LearningRate 0.0553   Epoch: 11   Global Step: 122760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:38:44,761-Speed 5426.88 samples/sec   Loss 4.5084   LearningRate 0.0553   Epoch: 11   Global Step: 122770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:38:52,407-Speed 5357.93 samples/sec   Loss 4.5813   LearningRate 0.0553   Epoch: 11   Global Step: 122780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:38:59,851-Speed 5503.43 samples/sec   Loss 4.5342   LearningRate 0.0553   Epoch: 11   Global Step: 122790   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:39:07,324-Speed 5481.53 samples/sec   Loss 4.4856   LearningRate 0.0553   Epoch: 11   Global Step: 122800   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:39:15,195-Speed 5204.55 samples/sec   Loss 4.5064   LearningRate 0.0553   Epoch: 11   Global Step: 122810   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:39:22,680-Speed 5472.99 samples/sec   Loss 4.5472   LearningRate 0.0553   Epoch: 11   Global Step: 122820   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:39:30,105-Speed 5517.62 samples/sec   Loss 4.5371   LearningRate 0.0553   Epoch: 11   Global Step: 122830   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:39:37,667-Speed 5417.05 samples/sec   Loss 4.5783   LearningRate 0.0552   Epoch: 11   Global Step: 122840   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:39:45,209-Speed 5431.63 samples/sec   Loss 4.5735   LearningRate 0.0552   Epoch: 11   Global Step: 122850   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:39:52,736-Speed 5441.95 samples/sec   Loss 4.5582   LearningRate 0.0552   Epoch: 11   Global Step: 122860   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:40:00,252-Speed 5450.83 samples/sec   Loss 4.5596   LearningRate 0.0552   Epoch: 11   Global Step: 122870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:40:07,829-Speed 5406.49 samples/sec   Loss 4.5612   LearningRate 0.0552   Epoch: 11   Global Step: 122880   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:40:15,479-Speed 5354.55 samples/sec   Loss 4.5850   LearningRate 0.0552   Epoch: 11   Global Step: 122890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:40:23,094-Speed 5379.75 samples/sec   Loss 4.5706   LearningRate 0.0552   Epoch: 11   Global Step: 122900   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:40:30,615-Speed 5447.07 samples/sec   Loss 4.5410   LearningRate 0.0551   Epoch: 11   Global Step: 122910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:40:38,091-Speed 5479.57 samples/sec   Loss 4.5691   LearningRate 0.0551   Epoch: 11   Global Step: 122920   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:40:45,653-Speed 5416.80 samples/sec   Loss 4.5668   LearningRate 0.0551   Epoch: 11   Global Step: 122930   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:40:53,179-Speed 5443.08 samples/sec   Loss 4.5590   LearningRate 0.0551   Epoch: 11   Global Step: 122940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:41:00,717-Speed 5434.58 samples/sec   Loss 4.5235   LearningRate 0.0551   Epoch: 11   Global Step: 122950   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:41:08,214-Speed 5464.61 samples/sec   Loss 4.6185   LearningRate 0.0551   Epoch: 11   Global Step: 122960   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:41:15,679-Speed 5487.30 samples/sec   Loss 4.5529   LearningRate 0.0551   Epoch: 11   Global Step: 122970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:41:23,235-Speed 5421.81 samples/sec   Loss 4.5125   LearningRate 0.0551   Epoch: 11   Global Step: 122980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:41:30,787-Speed 5425.06 samples/sec   Loss 4.5469   LearningRate 0.0550   Epoch: 11   Global Step: 122990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:41:38,296-Speed 5454.88 samples/sec   Loss 4.5568   LearningRate 0.0550   Epoch: 11   Global Step: 123000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:41:45,789-Speed 5467.53 samples/sec   Loss 4.5659   LearningRate 0.0550   Epoch: 11   Global Step: 123010   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:41:53,427-Speed 5363.01 samples/sec   Loss 4.5444   LearningRate 0.0550   Epoch: 11   Global Step: 123020   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:42:00,918-Speed 5468.78 samples/sec   Loss 4.5377   LearningRate 0.0550   Epoch: 11   Global Step: 123030   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:42:08,455-Speed 5434.90 samples/sec   Loss 4.5649   LearningRate 0.0550   Epoch: 11   Global Step: 123040   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:42:15,973-Speed 5449.34 samples/sec   Loss 4.5348   LearningRate 0.0550   Epoch: 11   Global Step: 123050   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:42:23,500-Speed 5442.17 samples/sec   Loss 4.5110   LearningRate 0.0550   Epoch: 11   Global Step: 123060   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:42:31,019-Speed 5448.48 samples/sec   Loss 4.5426   LearningRate 0.0549   Epoch: 11   Global Step: 123070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:42:38,517-Speed 5463.09 samples/sec   Loss 4.5251   LearningRate 0.0549   Epoch: 11   Global Step: 123080   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:42:45,966-Speed 5499.76 samples/sec   Loss 4.5227   LearningRate 0.0549   Epoch: 11   Global Step: 123090   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:42:53,494-Speed 5441.73 samples/sec   Loss 4.5787   LearningRate 0.0549   Epoch: 11   Global Step: 123100   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:43:01,081-Speed 5399.19 samples/sec   Loss 4.5704   LearningRate 0.0549   Epoch: 11   Global Step: 123110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:43:08,600-Speed 5448.54 samples/sec   Loss 4.5011   LearningRate 0.0549   Epoch: 11   Global Step: 123120   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:43:16,136-Speed 5435.49 samples/sec   Loss 4.4967   LearningRate 0.0549   Epoch: 11   Global Step: 123130   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:43:23,685-Speed 5426.54 samples/sec   Loss 4.4924   LearningRate 0.0549   Epoch: 11   Global Step: 123140   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:43:31,188-Speed 5459.98 samples/sec   Loss 4.5197   LearningRate 0.0548   Epoch: 11   Global Step: 123150   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:43:38,624-Speed 5509.42 samples/sec   Loss 4.5442   LearningRate 0.0548   Epoch: 11   Global Step: 123160   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:43:46,110-Speed 5472.31 samples/sec   Loss 4.4377   LearningRate 0.0548   Epoch: 11   Global Step: 123170   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:43:53,611-Speed 5461.11 samples/sec   Loss 4.5245   LearningRate 0.0548   Epoch: 11   Global Step: 123180   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:44:01,049-Speed 5507.29 samples/sec   Loss 4.5587   LearningRate 0.0548   Epoch: 11   Global Step: 123190   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:44:08,513-Speed 5488.72 samples/sec   Loss 4.5364   LearningRate 0.0548   Epoch: 11   Global Step: 123200   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:44:15,972-Speed 5491.89 samples/sec   Loss 4.5413   LearningRate 0.0548   Epoch: 11   Global Step: 123210   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:44:23,448-Speed 5479.72 samples/sec   Loss 4.5808   LearningRate 0.0547   Epoch: 11   Global Step: 123220   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:44:30,910-Speed 5489.75 samples/sec   Loss 4.4949   LearningRate 0.0547   Epoch: 11   Global Step: 123230   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:44:38,497-Speed 5399.62 samples/sec   Loss 4.5233   LearningRate 0.0547   Epoch: 11   Global Step: 123240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:44:45,996-Speed 5463.17 samples/sec   Loss 4.5229   LearningRate 0.0547   Epoch: 11   Global Step: 123250   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:44:53,534-Speed 5434.15 samples/sec   Loss 4.5454   LearningRate 0.0547   Epoch: 11   Global Step: 123260   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:45:01,022-Speed 5470.52 samples/sec   Loss 4.5232   LearningRate 0.0547   Epoch: 11   Global Step: 123270   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:45:08,558-Speed 5435.71 samples/sec   Loss 4.4851   LearningRate 0.0547   Epoch: 11   Global Step: 123280   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:45:16,081-Speed 5445.86 samples/sec   Loss 4.5508   LearningRate 0.0547   Epoch: 11   Global Step: 123290   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:45:23,549-Speed 5485.24 samples/sec   Loss 4.5763   LearningRate 0.0546   Epoch: 11   Global Step: 123300   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:45:31,027-Speed 5477.83 samples/sec   Loss 4.5028   LearningRate 0.0546   Epoch: 11   Global Step: 123310   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:45:38,585-Speed 5420.75 samples/sec   Loss 4.5042   LearningRate 0.0546   Epoch: 11   Global Step: 123320   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:45:46,079-Speed 5466.57 samples/sec   Loss 4.5013   LearningRate 0.0546   Epoch: 11   Global Step: 123330   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:45:53,546-Speed 5485.69 samples/sec   Loss 4.4910   LearningRate 0.0546   Epoch: 11   Global Step: 123340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:46:00,998-Speed 5497.33 samples/sec   Loss 4.4902   LearningRate 0.0546   Epoch: 11   Global Step: 123350   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:46:08,482-Speed 5473.52 samples/sec   Loss 4.5570   LearningRate 0.0546   Epoch: 11   Global Step: 123360   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:46:15,970-Speed 5470.80 samples/sec   Loss 4.5509   LearningRate 0.0546   Epoch: 11   Global Step: 123370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:46:23,472-Speed 5460.64 samples/sec   Loss 4.5522   LearningRate 0.0545   Epoch: 11   Global Step: 123380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:46:31,016-Speed 5430.33 samples/sec   Loss 4.5065   LearningRate 0.0545   Epoch: 11   Global Step: 123390   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:46:38,590-Speed 5408.37 samples/sec   Loss 4.5384   LearningRate 0.0545   Epoch: 11   Global Step: 123400   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:46:46,068-Speed 5478.48 samples/sec   Loss 4.5214   LearningRate 0.0545   Epoch: 11   Global Step: 123410   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:46:53,498-Speed 5513.87 samples/sec   Loss 4.5116   LearningRate 0.0545   Epoch: 11   Global Step: 123420   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:47:00,950-Speed 5496.65 samples/sec   Loss 4.5468   LearningRate 0.0545   Epoch: 11   Global Step: 123430   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:47:08,453-Speed 5460.00 samples/sec   Loss 4.5409   LearningRate 0.0545   Epoch: 11   Global Step: 123440   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:47:15,965-Speed 5453.57 samples/sec   Loss 4.5079   LearningRate 0.0544   Epoch: 11   Global Step: 123450   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:47:23,443-Speed 5478.06 samples/sec   Loss 4.4679   LearningRate 0.0544   Epoch: 11   Global Step: 123460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:47:30,954-Speed 5453.36 samples/sec   Loss 4.5218   LearningRate 0.0544   Epoch: 11   Global Step: 123470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:47:38,580-Speed 5372.01 samples/sec   Loss 4.5525   LearningRate 0.0544   Epoch: 11   Global Step: 123480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:47:46,172-Speed 5395.80 samples/sec   Loss 4.5630   LearningRate 0.0544   Epoch: 11   Global Step: 123490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:47:53,677-Speed 5458.94 samples/sec   Loss 4.5433   LearningRate 0.0544   Epoch: 11   Global Step: 123500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:48:01,202-Speed 5443.68 samples/sec   Loss 4.5171   LearningRate 0.0544   Epoch: 11   Global Step: 123510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:48:08,698-Speed 5464.54 samples/sec   Loss 4.5261   LearningRate 0.0544   Epoch: 11   Global Step: 123520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:48:16,192-Speed 5466.53 samples/sec   Loss 4.5039   LearningRate 0.0543   Epoch: 11   Global Step: 123530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:48:24,456-Speed 4957.15 samples/sec   Loss 4.5465   LearningRate 0.0543   Epoch: 11   Global Step: 123540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:48:31,972-Speed 5450.97 samples/sec   Loss 4.4850   LearningRate 0.0543   Epoch: 11   Global Step: 123550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:48:39,503-Speed 5438.84 samples/sec   Loss 4.5687   LearningRate 0.0543   Epoch: 11   Global Step: 123560   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:48:47,043-Speed 5433.78 samples/sec   Loss 4.4812   LearningRate 0.0543   Epoch: 11   Global Step: 123570   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:48:54,549-Speed 5457.20 samples/sec   Loss 4.5674   LearningRate 0.0543   Epoch: 11   Global Step: 123580   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:49:02,044-Speed 5465.97 samples/sec   Loss 4.4995   LearningRate 0.0543   Epoch: 11   Global Step: 123590   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:49:09,694-Speed 5354.98 samples/sec   Loss 4.5180   LearningRate 0.0543   Epoch: 11   Global Step: 123600   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:49:17,288-Speed 5394.56 samples/sec   Loss 4.4857   LearningRate 0.0542   Epoch: 11   Global Step: 123610   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:49:24,833-Speed 5429.34 samples/sec   Loss 4.5401   LearningRate 0.0542   Epoch: 11   Global Step: 123620   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:49:32,434-Speed 5389.59 samples/sec   Loss 4.5097   LearningRate 0.0542   Epoch: 11   Global Step: 123630   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:49:39,944-Speed 5454.26 samples/sec   Loss 4.5380   LearningRate 0.0542   Epoch: 11   Global Step: 123640   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:49:47,467-Speed 5445.50 samples/sec   Loss 4.5250   LearningRate 0.0542   Epoch: 11   Global Step: 123650   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:49:54,952-Speed 5472.71 samples/sec   Loss 4.5170   LearningRate 0.0542   Epoch: 11   Global Step: 123660   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:50:02,390-Speed 5507.96 samples/sec   Loss 4.5329   LearningRate 0.0542   Epoch: 11   Global Step: 123670   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:50:09,928-Speed 5434.40 samples/sec   Loss 4.4704   LearningRate 0.0541   Epoch: 11   Global Step: 123680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:50:17,421-Speed 5467.39 samples/sec   Loss 4.5118   LearningRate 0.0541   Epoch: 11   Global Step: 123690   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:50:24,907-Speed 5473.96 samples/sec   Loss 4.5359   LearningRate 0.0541   Epoch: 11   Global Step: 123700   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:50:32,525-Speed 5377.45 samples/sec   Loss 4.5359   LearningRate 0.0541   Epoch: 11   Global Step: 123710   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:50:40,055-Speed 5440.21 samples/sec   Loss 4.5243   LearningRate 0.0541   Epoch: 11   Global Step: 123720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:50:47,609-Speed 5422.05 samples/sec   Loss 4.4821   LearningRate 0.0541   Epoch: 11   Global Step: 123730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:50:55,069-Speed 5492.01 samples/sec   Loss 4.4764   LearningRate 0.0541   Epoch: 11   Global Step: 123740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:51:02,592-Speed 5445.37 samples/sec   Loss 4.5365   LearningRate 0.0541   Epoch: 11   Global Step: 123750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:51:10,242-Speed 5355.21 samples/sec   Loss 4.5006   LearningRate 0.0540   Epoch: 11   Global Step: 123760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:51:17,740-Speed 5463.19 samples/sec   Loss 4.4725   LearningRate 0.0540   Epoch: 11   Global Step: 123770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:51:25,181-Speed 5505.69 samples/sec   Loss 4.5214   LearningRate 0.0540   Epoch: 11   Global Step: 123780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:51:32,642-Speed 5490.86 samples/sec   Loss 4.4925   LearningRate 0.0540   Epoch: 11   Global Step: 123790   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:51:40,205-Speed 5416.16 samples/sec   Loss 4.5163   LearningRate 0.0540   Epoch: 11   Global Step: 123800   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:51:47,676-Speed 5482.94 samples/sec   Loss 4.4735   LearningRate 0.0540   Epoch: 11   Global Step: 123810   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:51:55,165-Speed 5470.67 samples/sec   Loss 4.5026   LearningRate 0.0540   Epoch: 11   Global Step: 123820   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:52:02,578-Speed 5526.17 samples/sec   Loss 4.4971   LearningRate 0.0540   Epoch: 11   Global Step: 123830   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:52:10,148-Speed 5411.52 samples/sec   Loss 4.4979   LearningRate 0.0539   Epoch: 11   Global Step: 123840   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:52:17,580-Speed 5512.12 samples/sec   Loss 4.4423   LearningRate 0.0539   Epoch: 11   Global Step: 123850   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:52:25,177-Speed 5392.33 samples/sec   Loss 4.5078   LearningRate 0.0539   Epoch: 11   Global Step: 123860   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:52:32,779-Speed 5388.58 samples/sec   Loss 4.5245   LearningRate 0.0539   Epoch: 11   Global Step: 123870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:52:40,332-Speed 5423.66 samples/sec   Loss 4.4650   LearningRate 0.0539   Epoch: 11   Global Step: 123880   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:52:47,773-Speed 5505.81 samples/sec   Loss 4.5033   LearningRate 0.0539   Epoch: 11   Global Step: 123890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:52:55,314-Speed 5431.89 samples/sec   Loss 4.5277   LearningRate 0.0539   Epoch: 11   Global Step: 123900   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:53:02,926-Speed 5381.76 samples/sec   Loss 4.4842   LearningRate 0.0539   Epoch: 11   Global Step: 123910   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:53:10,452-Speed 5443.31 samples/sec   Loss 4.5194   LearningRate 0.0538   Epoch: 11   Global Step: 123920   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:53:17,964-Speed 5453.69 samples/sec   Loss 4.4780   LearningRate 0.0538   Epoch: 11   Global Step: 123930   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:53:25,552-Speed 5398.25 samples/sec   Loss 4.4936   LearningRate 0.0538   Epoch: 11   Global Step: 123940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:53:32,991-Speed 5507.39 samples/sec   Loss 4.4663   LearningRate 0.0538   Epoch: 11   Global Step: 123950   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:53:40,398-Speed 5530.79 samples/sec   Loss 4.5087   LearningRate 0.0538   Epoch: 11   Global Step: 123960   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:53:47,944-Speed 5428.49 samples/sec   Loss 4.5463   LearningRate 0.0538   Epoch: 11   Global Step: 123970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:53:55,401-Speed 5492.98 samples/sec   Loss 4.5007   LearningRate 0.0538   Epoch: 11   Global Step: 123980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:54:03,016-Speed 5380.31 samples/sec   Loss 4.5592   LearningRate 0.0537   Epoch: 11   Global Step: 123990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:54:10,508-Speed 5467.79 samples/sec   Loss 4.5316   LearningRate 0.0537   Epoch: 11   Global Step: 124000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:54:54,520-[lfw][124000]XNorm: 23.200261
Training: 2022-01-08 22:54:54,521-[lfw][124000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-01-08 22:54:54,521-[lfw][124000]Accuracy-Highest: 0.99817
Training: 2022-01-08 22:55:45,668-[cfp_fp][124000]XNorm: 21.354716
Training: 2022-01-08 22:55:45,668-[cfp_fp][124000]Accuracy-Flip: 0.98929+-0.00551
Training: 2022-01-08 22:55:45,669-[cfp_fp][124000]Accuracy-Highest: 0.99057
Training: 2022-01-08 22:56:29,556-[agedb_30][124000]XNorm: 23.171578
Training: 2022-01-08 22:56:29,557-[agedb_30][124000]Accuracy-Flip: 0.97917+-0.00790
Training: 2022-01-08 22:56:29,557-[agedb_30][124000]Accuracy-Highest: 0.98000
Training: 2022-01-08 22:56:37,070-Speed 279.47 samples/sec   Loss 4.5409   LearningRate 0.0537   Epoch: 11   Global Step: 124010   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:56:44,650-Speed 5403.90 samples/sec   Loss 4.5612   LearningRate 0.0537   Epoch: 11   Global Step: 124020   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:56:52,079-Speed 5514.34 samples/sec   Loss 4.4831   LearningRate 0.0537   Epoch: 11   Global Step: 124030   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:56:59,564-Speed 5472.74 samples/sec   Loss 4.4691   LearningRate 0.0537   Epoch: 11   Global Step: 124040   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:57:07,112-Speed 5428.06 samples/sec   Loss 4.5058   LearningRate 0.0537   Epoch: 11   Global Step: 124050   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:57:14,550-Speed 5507.88 samples/sec   Loss 4.4403   LearningRate 0.0537   Epoch: 11   Global Step: 124060   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:57:21,980-Speed 5512.69 samples/sec   Loss 4.4789   LearningRate 0.0536   Epoch: 11   Global Step: 124070   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:57:29,428-Speed 5500.91 samples/sec   Loss 4.5124   LearningRate 0.0536   Epoch: 11   Global Step: 124080   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:57:37,087-Speed 5348.91 samples/sec   Loss 4.4799   LearningRate 0.0536   Epoch: 11   Global Step: 124090   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:57:44,652-Speed 5414.38 samples/sec   Loss 4.5102   LearningRate 0.0536   Epoch: 11   Global Step: 124100   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:57:52,132-Speed 5476.88 samples/sec   Loss 4.5076   LearningRate 0.0536   Epoch: 11   Global Step: 124110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:57:59,760-Speed 5370.17 samples/sec   Loss 4.4724   LearningRate 0.0536   Epoch: 11   Global Step: 124120   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:58:07,329-Speed 5412.78 samples/sec   Loss 4.4935   LearningRate 0.0536   Epoch: 11   Global Step: 124130   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:58:14,942-Speed 5380.72 samples/sec   Loss 4.4544   LearningRate 0.0536   Epoch: 11   Global Step: 124140   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 22:58:22,427-Speed 5473.24 samples/sec   Loss 4.4592   LearningRate 0.0535   Epoch: 11   Global Step: 124150   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:58:29,949-Speed 5445.79 samples/sec   Loss 4.4792   LearningRate 0.0535   Epoch: 11   Global Step: 124160   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:58:37,532-Speed 5402.32 samples/sec   Loss 4.4665   LearningRate 0.0535   Epoch: 11   Global Step: 124170   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:58:45,067-Speed 5436.83 samples/sec   Loss 4.4653   LearningRate 0.0535   Epoch: 11   Global Step: 124180   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:58:52,594-Speed 5442.27 samples/sec   Loss 4.4907   LearningRate 0.0535   Epoch: 11   Global Step: 124190   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 22:59:00,014-Speed 5520.71 samples/sec   Loss 4.4841   LearningRate 0.0535   Epoch: 11   Global Step: 124200   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:59:07,539-Speed 5444.48 samples/sec   Loss 4.4741   LearningRate 0.0535   Epoch: 11   Global Step: 124210   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:59:14,986-Speed 5500.73 samples/sec   Loss 4.4986   LearningRate 0.0535   Epoch: 11   Global Step: 124220   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:59:22,441-Speed 5495.22 samples/sec   Loss 4.4993   LearningRate 0.0534   Epoch: 11   Global Step: 124230   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:59:29,983-Speed 5431.35 samples/sec   Loss 4.4908   LearningRate 0.0534   Epoch: 11   Global Step: 124240   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:59:37,496-Speed 5452.55 samples/sec   Loss 4.5500   LearningRate 0.0534   Epoch: 11   Global Step: 124250   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:59:45,008-Speed 5453.73 samples/sec   Loss 4.4935   LearningRate 0.0534   Epoch: 11   Global Step: 124260   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:59:52,505-Speed 5464.01 samples/sec   Loss 4.4653   LearningRate 0.0534   Epoch: 11   Global Step: 124270   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 22:59:59,950-Speed 5502.17 samples/sec   Loss 4.4420   LearningRate 0.0534   Epoch: 11   Global Step: 124280   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:00:07,470-Speed 5448.36 samples/sec   Loss 4.4579   LearningRate 0.0534   Epoch: 11   Global Step: 124290   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:00:15,072-Speed 5388.25 samples/sec   Loss 4.4628   LearningRate 0.0533   Epoch: 11   Global Step: 124300   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:00:22,508-Speed 5509.62 samples/sec   Loss 4.4481   LearningRate 0.0533   Epoch: 11   Global Step: 124310   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:00:30,025-Speed 5449.49 samples/sec   Loss 4.4507   LearningRate 0.0533   Epoch: 11   Global Step: 124320   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:00:37,471-Speed 5501.88 samples/sec   Loss 4.5141   LearningRate 0.0533   Epoch: 11   Global Step: 124330   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:00:44,956-Speed 5473.31 samples/sec   Loss 4.4650   LearningRate 0.0533   Epoch: 11   Global Step: 124340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:00:52,395-Speed 5506.31 samples/sec   Loss 4.4653   LearningRate 0.0533   Epoch: 11   Global Step: 124350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:00:59,934-Speed 5434.06 samples/sec   Loss 4.5533   LearningRate 0.0533   Epoch: 11   Global Step: 124360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:01:07,376-Speed 5504.87 samples/sec   Loss 4.5161   LearningRate 0.0533   Epoch: 11   Global Step: 124370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:01:14,837-Speed 5490.61 samples/sec   Loss 4.4748   LearningRate 0.0532   Epoch: 11   Global Step: 124380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:01:22,300-Speed 5489.16 samples/sec   Loss 4.4954   LearningRate 0.0532   Epoch: 11   Global Step: 124390   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:01:29,791-Speed 5468.18 samples/sec   Loss 4.4591   LearningRate 0.0532   Epoch: 11   Global Step: 124400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 23:01:37,241-Speed 5498.48 samples/sec   Loss 4.4883   LearningRate 0.0532   Epoch: 11   Global Step: 124410   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:01:44,750-Speed 5456.14 samples/sec   Loss 4.4818   LearningRate 0.0532   Epoch: 11   Global Step: 124420   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:01:52,234-Speed 5473.88 samples/sec   Loss 4.4659   LearningRate 0.0532   Epoch: 11   Global Step: 124430   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:02:14,463-Speed 1842.69 samples/sec   Loss 4.5146   LearningRate 0.0532   Epoch: 12   Global Step: 124440   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:02:21,923-Speed 5491.34 samples/sec   Loss 4.4836   LearningRate 0.0532   Epoch: 12   Global Step: 124450   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:02:29,427-Speed 5459.66 samples/sec   Loss 4.5169   LearningRate 0.0531   Epoch: 12   Global Step: 124460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:02:36,966-Speed 5433.66 samples/sec   Loss 4.4191   LearningRate 0.0531   Epoch: 12   Global Step: 124470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:02:44,471-Speed 5458.71 samples/sec   Loss 4.4290   LearningRate 0.0531   Epoch: 12   Global Step: 124480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:02:51,987-Speed 5450.20 samples/sec   Loss 4.4471   LearningRate 0.0531   Epoch: 12   Global Step: 124490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:02:59,517-Speed 5440.29 samples/sec   Loss 4.4524   LearningRate 0.0531   Epoch: 12   Global Step: 124500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:03:07,002-Speed 5472.81 samples/sec   Loss 4.4882   LearningRate 0.0531   Epoch: 12   Global Step: 124510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:03:14,500-Speed 5464.13 samples/sec   Loss 4.5026   LearningRate 0.0531   Epoch: 12   Global Step: 124520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:03:21,982-Speed 5474.58 samples/sec   Loss 4.4929   LearningRate 0.0531   Epoch: 12   Global Step: 124530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:03:29,608-Speed 5371.90 samples/sec   Loss 4.4954   LearningRate 0.0530   Epoch: 12   Global Step: 124540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:03:37,038-Speed 5514.14 samples/sec   Loss 4.4623   LearningRate 0.0530   Epoch: 12   Global Step: 124550   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:03:44,503-Speed 5486.83 samples/sec   Loss 4.4283   LearningRate 0.0530   Epoch: 12   Global Step: 124560   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:03:51,982-Speed 5477.57 samples/sec   Loss 4.4651   LearningRate 0.0530   Epoch: 12   Global Step: 124570   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:03:59,469-Speed 5472.25 samples/sec   Loss 4.4542   LearningRate 0.0530   Epoch: 12   Global Step: 124580   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:04:06,906-Speed 5508.10 samples/sec   Loss 4.4565   LearningRate 0.0530   Epoch: 12   Global Step: 124590   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:04:14,498-Speed 5395.72 samples/sec   Loss 4.4086   LearningRate 0.0530   Epoch: 12   Global Step: 124600   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:04:21,958-Speed 5491.67 samples/sec   Loss 4.4562   LearningRate 0.0530   Epoch: 12   Global Step: 124610   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:04:29,432-Speed 5481.36 samples/sec   Loss 4.3924   LearningRate 0.0529   Epoch: 12   Global Step: 124620   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:04:36,887-Speed 5495.19 samples/sec   Loss 4.4332   LearningRate 0.0529   Epoch: 12   Global Step: 124630   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:04:44,462-Speed 5407.61 samples/sec   Loss 4.4106   LearningRate 0.0529   Epoch: 12   Global Step: 124640   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:04:52,194-Speed 5297.63 samples/sec   Loss 4.4065   LearningRate 0.0529   Epoch: 12   Global Step: 124650   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:04:59,883-Speed 5328.75 samples/sec   Loss 4.4204   LearningRate 0.0529   Epoch: 12   Global Step: 124660   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:05:07,616-Speed 5297.09 samples/sec   Loss 4.3990   LearningRate 0.0529   Epoch: 12   Global Step: 124670   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:05:15,294-Speed 5335.22 samples/sec   Loss 4.3834   LearningRate 0.0529   Epoch: 12   Global Step: 124680   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:05:22,994-Speed 5319.83 samples/sec   Loss 4.4083   LearningRate 0.0529   Epoch: 12   Global Step: 124690   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:05:30,684-Speed 5327.34 samples/sec   Loss 4.4900   LearningRate 0.0528   Epoch: 12   Global Step: 124700   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:05:38,162-Speed 5478.38 samples/sec   Loss 4.4523   LearningRate 0.0528   Epoch: 12   Global Step: 124710   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:05:45,686-Speed 5444.42 samples/sec   Loss 4.4528   LearningRate 0.0528   Epoch: 12   Global Step: 124720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:05:53,215-Speed 5441.08 samples/sec   Loss 4.4492   LearningRate 0.0528   Epoch: 12   Global Step: 124730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:06:00,731-Speed 5450.65 samples/sec   Loss 4.3566   LearningRate 0.0528   Epoch: 12   Global Step: 124740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:06:08,283-Speed 5424.37 samples/sec   Loss 4.4774   LearningRate 0.0528   Epoch: 12   Global Step: 124750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:06:15,839-Speed 5421.16 samples/sec   Loss 4.4675   LearningRate 0.0528   Epoch: 12   Global Step: 124760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:06:23,379-Speed 5433.00 samples/sec   Loss 4.4054   LearningRate 0.0527   Epoch: 12   Global Step: 124770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:06:30,862-Speed 5475.15 samples/sec   Loss 4.4043   LearningRate 0.0527   Epoch: 12   Global Step: 124780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:06:38,403-Speed 5432.11 samples/sec   Loss 4.4408   LearningRate 0.0527   Epoch: 12   Global Step: 124790   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:06:45,911-Speed 5456.56 samples/sec   Loss 4.4206   LearningRate 0.0527   Epoch: 12   Global Step: 124800   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:06:53,418-Speed 5456.45 samples/sec   Loss 4.4896   LearningRate 0.0527   Epoch: 12   Global Step: 124810   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:07:00,914-Speed 5465.08 samples/sec   Loss 4.4908   LearningRate 0.0527   Epoch: 12   Global Step: 124820   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:07:08,427-Speed 5453.07 samples/sec   Loss 4.4479   LearningRate 0.0527   Epoch: 12   Global Step: 124830   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:07:15,944-Speed 5449.14 samples/sec   Loss 4.4710   LearningRate 0.0527   Epoch: 12   Global Step: 124840   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:07:23,514-Speed 5411.84 samples/sec   Loss 4.4216   LearningRate 0.0526   Epoch: 12   Global Step: 124850   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:07:31,068-Speed 5422.99 samples/sec   Loss 4.4036   LearningRate 0.0526   Epoch: 12   Global Step: 124860   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:07:38,606-Speed 5434.41 samples/sec   Loss 4.4533   LearningRate 0.0526   Epoch: 12   Global Step: 124870   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:07:46,265-Speed 5348.54 samples/sec   Loss 4.4809   LearningRate 0.0526   Epoch: 12   Global Step: 124880   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:07:53,906-Speed 5361.14 samples/sec   Loss 4.4800   LearningRate 0.0526   Epoch: 12   Global Step: 124890   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:08:01,434-Speed 5442.04 samples/sec   Loss 4.4459   LearningRate 0.0526   Epoch: 12   Global Step: 124900   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:08:08,912-Speed 5478.07 samples/sec   Loss 4.4451   LearningRate 0.0526   Epoch: 12   Global Step: 124910   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:08:16,779-Speed 5206.81 samples/sec   Loss 4.4545   LearningRate 0.0526   Epoch: 12   Global Step: 124920   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:08:24,292-Speed 5452.58 samples/sec   Loss 4.4602   LearningRate 0.0525   Epoch: 12   Global Step: 124930   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:08:31,916-Speed 5372.92 samples/sec   Loss 4.4754   LearningRate 0.0525   Epoch: 12   Global Step: 124940   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:08:39,458-Speed 5432.29 samples/sec   Loss 4.4779   LearningRate 0.0525   Epoch: 12   Global Step: 124950   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:08:46,965-Speed 5457.04 samples/sec   Loss 4.4365   LearningRate 0.0525   Epoch: 12   Global Step: 124960   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:08:54,521-Speed 5421.49 samples/sec   Loss 4.4430   LearningRate 0.0525   Epoch: 12   Global Step: 124970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:09:02,003-Speed 5474.92 samples/sec   Loss 4.4566   LearningRate 0.0525   Epoch: 12   Global Step: 124980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:09:09,541-Speed 5434.46 samples/sec   Loss 4.4684   LearningRate 0.0525   Epoch: 12   Global Step: 124990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:09:17,019-Speed 5478.02 samples/sec   Loss 4.4268   LearningRate 0.0525   Epoch: 12   Global Step: 125000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:09:24,519-Speed 5462.51 samples/sec   Loss 4.4547   LearningRate 0.0524   Epoch: 12   Global Step: 125010   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:09:32,008-Speed 5469.54 samples/sec   Loss 4.4632   LearningRate 0.0524   Epoch: 12   Global Step: 125020   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:09:39,605-Speed 5392.87 samples/sec   Loss 4.4242   LearningRate 0.0524   Epoch: 12   Global Step: 125030   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:09:47,130-Speed 5443.50 samples/sec   Loss 4.4508   LearningRate 0.0524   Epoch: 12   Global Step: 125040   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:09:54,741-Speed 5382.22 samples/sec   Loss 4.4578   LearningRate 0.0524   Epoch: 12   Global Step: 125050   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:10:02,239-Speed 5463.82 samples/sec   Loss 4.4382   LearningRate 0.0524   Epoch: 12   Global Step: 125060   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:10:09,686-Speed 5500.90 samples/sec   Loss 4.4234   LearningRate 0.0524   Epoch: 12   Global Step: 125070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:10:17,105-Speed 5521.41 samples/sec   Loss 4.4309   LearningRate 0.0524   Epoch: 12   Global Step: 125080   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:10:24,594-Speed 5470.40 samples/sec   Loss 4.4061   LearningRate 0.0523   Epoch: 12   Global Step: 125090   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:10:32,057-Speed 5489.07 samples/sec   Loss 4.3679   LearningRate 0.0523   Epoch: 12   Global Step: 125100   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:10:39,505-Speed 5500.08 samples/sec   Loss 4.4769   LearningRate 0.0523   Epoch: 12   Global Step: 125110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:10:46,948-Speed 5504.13 samples/sec   Loss 4.4324   LearningRate 0.0523   Epoch: 12   Global Step: 125120   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:10:54,419-Speed 5482.74 samples/sec   Loss 4.4122   LearningRate 0.0523   Epoch: 12   Global Step: 125130   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:11:02,004-Speed 5401.29 samples/sec   Loss 4.4255   LearningRate 0.0523   Epoch: 12   Global Step: 125140   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:11:09,506-Speed 5460.64 samples/sec   Loss 4.4717   LearningRate 0.0523   Epoch: 12   Global Step: 125150   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:11:16,966-Speed 5491.37 samples/sec   Loss 4.3951   LearningRate 0.0523   Epoch: 12   Global Step: 125160   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:11:24,454-Speed 5470.62 samples/sec   Loss 4.4463   LearningRate 0.0522   Epoch: 12   Global Step: 125170   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:11:31,896-Speed 5504.89 samples/sec   Loss 4.4590   LearningRate 0.0522   Epoch: 12   Global Step: 125180   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:11:39,404-Speed 5455.76 samples/sec   Loss 4.4629   LearningRate 0.0522   Epoch: 12   Global Step: 125190   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:11:46,900-Speed 5465.50 samples/sec   Loss 4.4192   LearningRate 0.0522   Epoch: 12   Global Step: 125200   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:11:54,361-Speed 5490.32 samples/sec   Loss 4.3959   LearningRate 0.0522   Epoch: 12   Global Step: 125210   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 23:12:01,803-Speed 5504.87 samples/sec   Loss 4.3880   LearningRate 0.0522   Epoch: 12   Global Step: 125220   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:12:09,380-Speed 5406.76 samples/sec   Loss 4.3862   LearningRate 0.0522   Epoch: 12   Global Step: 125230   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:12:16,824-Speed 5503.26 samples/sec   Loss 4.4123   LearningRate 0.0521   Epoch: 12   Global Step: 125240   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:12:24,307-Speed 5473.88 samples/sec   Loss 4.4224   LearningRate 0.0521   Epoch: 12   Global Step: 125250   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:12:31,767-Speed 5491.36 samples/sec   Loss 4.4253   LearningRate 0.0521   Epoch: 12   Global Step: 125260   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:12:39,263-Speed 5465.08 samples/sec   Loss 4.3935   LearningRate 0.0521   Epoch: 12   Global Step: 125270   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:12:46,738-Speed 5480.54 samples/sec   Loss 4.4594   LearningRate 0.0521   Epoch: 12   Global Step: 125280   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:12:54,238-Speed 5461.67 samples/sec   Loss 4.4324   LearningRate 0.0521   Epoch: 12   Global Step: 125290   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:13:01,717-Speed 5478.10 samples/sec   Loss 4.4407   LearningRate 0.0521   Epoch: 12   Global Step: 125300   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:13:09,225-Speed 5455.74 samples/sec   Loss 4.4737   LearningRate 0.0521   Epoch: 12   Global Step: 125310   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:13:16,727-Speed 5460.59 samples/sec   Loss 4.4235   LearningRate 0.0520   Epoch: 12   Global Step: 125320   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 23:13:24,297-Speed 5411.74 samples/sec   Loss 4.4331   LearningRate 0.0520   Epoch: 12   Global Step: 125330   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 23:13:31,724-Speed 5515.70 samples/sec   Loss 4.3792   LearningRate 0.0520   Epoch: 12   Global Step: 125340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:13:39,217-Speed 5467.54 samples/sec   Loss 4.4340   LearningRate 0.0520   Epoch: 12   Global Step: 125350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:13:46,732-Speed 5451.42 samples/sec   Loss 4.4039   LearningRate 0.0520   Epoch: 12   Global Step: 125360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:13:54,144-Speed 5526.20 samples/sec   Loss 4.4159   LearningRate 0.0520   Epoch: 12   Global Step: 125370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:14:01,627-Speed 5474.76 samples/sec   Loss 4.3996   LearningRate 0.0520   Epoch: 12   Global Step: 125380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:14:09,111-Speed 5473.44 samples/sec   Loss 4.4115   LearningRate 0.0520   Epoch: 12   Global Step: 125390   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:14:16,601-Speed 5469.78 samples/sec   Loss 4.4080   LearningRate 0.0519   Epoch: 12   Global Step: 125400   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:14:23,660-Speed 5803.22 samples/sec   Loss 4.4012   LearningRate 0.0519   Epoch: 12   Global Step: 125410   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:14:30,572-Speed 5926.63 samples/sec   Loss 4.4336   LearningRate 0.0519   Epoch: 12   Global Step: 125420   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:14:37,532-Speed 5885.34 samples/sec   Loss 4.4624   LearningRate 0.0519   Epoch: 12   Global Step: 125430   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:14:44,505-Speed 5875.54 samples/sec   Loss 4.3711   LearningRate 0.0519   Epoch: 12   Global Step: 125440   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 23:14:51,869-Speed 5563.13 samples/sec   Loss 4.4424   LearningRate 0.0519   Epoch: 12   Global Step: 125450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 23:14:59,447-Speed 5405.24 samples/sec   Loss 4.4187   LearningRate 0.0519   Epoch: 12   Global Step: 125460   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 23:15:06,949-Speed 5460.96 samples/sec   Loss 4.3771   LearningRate 0.0519   Epoch: 12   Global Step: 125470   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 23:15:14,488-Speed 5434.04 samples/sec   Loss 4.3527   LearningRate 0.0518   Epoch: 12   Global Step: 125480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:15:22,039-Speed 5424.57 samples/sec   Loss 4.4490   LearningRate 0.0518   Epoch: 12   Global Step: 125490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:15:29,604-Speed 5415.69 samples/sec   Loss 4.3816   LearningRate 0.0518   Epoch: 12   Global Step: 125500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:15:37,227-Speed 5373.71 samples/sec   Loss 4.4336   LearningRate 0.0518   Epoch: 12   Global Step: 125510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:15:44,775-Speed 5427.34 samples/sec   Loss 4.4402   LearningRate 0.0518   Epoch: 12   Global Step: 125520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:15:52,307-Speed 5438.28 samples/sec   Loss 4.4094   LearningRate 0.0518   Epoch: 12   Global Step: 125530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:15:59,763-Speed 5495.02 samples/sec   Loss 4.4601   LearningRate 0.0518   Epoch: 12   Global Step: 125540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:16:07,515-Speed 5283.77 samples/sec   Loss 4.4223   LearningRate 0.0518   Epoch: 12   Global Step: 125550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:16:14,972-Speed 5494.06 samples/sec   Loss 4.3840   LearningRate 0.0517   Epoch: 12   Global Step: 125560   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:16:22,494-Speed 5445.84 samples/sec   Loss 4.4157   LearningRate 0.0517   Epoch: 12   Global Step: 125570   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:16:30,072-Speed 5406.08 samples/sec   Loss 4.4282   LearningRate 0.0517   Epoch: 12   Global Step: 125580   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 23:16:37,605-Speed 5438.37 samples/sec   Loss 4.4207   LearningRate 0.0517   Epoch: 12   Global Step: 125590   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 23:16:45,127-Speed 5445.60 samples/sec   Loss 4.4152   LearningRate 0.0517   Epoch: 12   Global Step: 125600   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:16:52,751-Speed 5373.83 samples/sec   Loss 4.4006   LearningRate 0.0517   Epoch: 12   Global Step: 125610   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:17:00,411-Speed 5347.42 samples/sec   Loss 4.4197   LearningRate 0.0517   Epoch: 12   Global Step: 125620   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:17:07,988-Speed 5406.79 samples/sec   Loss 4.4462   LearningRate 0.0517   Epoch: 12   Global Step: 125630   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:17:15,542-Speed 5422.75 samples/sec   Loss 4.4208   LearningRate 0.0516   Epoch: 12   Global Step: 125640   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:17:23,061-Speed 5448.76 samples/sec   Loss 4.3638   LearningRate 0.0516   Epoch: 12   Global Step: 125650   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:17:30,545-Speed 5473.69 samples/sec   Loss 4.3602   LearningRate 0.0516   Epoch: 12   Global Step: 125660   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:17:37,979-Speed 5510.09 samples/sec   Loss 4.3754   LearningRate 0.0516   Epoch: 12   Global Step: 125670   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:17:45,487-Speed 5456.98 samples/sec   Loss 4.4544   LearningRate 0.0516   Epoch: 12   Global Step: 125680   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:17:53,024-Speed 5434.82 samples/sec   Loss 4.4085   LearningRate 0.0516   Epoch: 12   Global Step: 125690   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:18:00,499-Speed 5480.34 samples/sec   Loss 4.3356   LearningRate 0.0516   Epoch: 12   Global Step: 125700   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:18:08,009-Speed 5454.71 samples/sec   Loss 4.3533   LearningRate 0.0516   Epoch: 12   Global Step: 125710   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:18:15,556-Speed 5427.92 samples/sec   Loss 4.4273   LearningRate 0.0515   Epoch: 12   Global Step: 125720   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:18:23,190-Speed 5366.31 samples/sec   Loss 4.3632   LearningRate 0.0515   Epoch: 12   Global Step: 125730   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:18:30,661-Speed 5483.66 samples/sec   Loss 4.4217   LearningRate 0.0515   Epoch: 12   Global Step: 125740   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:18:38,147-Speed 5471.81 samples/sec   Loss 4.4040   LearningRate 0.0515   Epoch: 12   Global Step: 125750   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:18:45,585-Speed 5507.43 samples/sec   Loss 4.4051   LearningRate 0.0515   Epoch: 12   Global Step: 125760   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:18:53,214-Speed 5369.95 samples/sec   Loss 4.3977   LearningRate 0.0515   Epoch: 12   Global Step: 125770   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:19:00,909-Speed 5323.75 samples/sec   Loss 4.3886   LearningRate 0.0515   Epoch: 12   Global Step: 125780   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:19:08,457-Speed 5427.22 samples/sec   Loss 4.4316   LearningRate 0.0515   Epoch: 12   Global Step: 125790   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:19:16,054-Speed 5392.11 samples/sec   Loss 4.3819   LearningRate 0.0514   Epoch: 12   Global Step: 125800   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:19:23,556-Speed 5460.47 samples/sec   Loss 4.3535   LearningRate 0.0514   Epoch: 12   Global Step: 125810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:19:31,109-Speed 5424.45 samples/sec   Loss 4.4059   LearningRate 0.0514   Epoch: 12   Global Step: 125820   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:19:38,618-Speed 5455.22 samples/sec   Loss 4.4292   LearningRate 0.0514   Epoch: 12   Global Step: 125830   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:19:46,110-Speed 5467.60 samples/sec   Loss 4.3622   LearningRate 0.0514   Epoch: 12   Global Step: 125840   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:19:53,730-Speed 5375.98 samples/sec   Loss 4.3658   LearningRate 0.0514   Epoch: 12   Global Step: 125850   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:20:01,278-Speed 5427.67 samples/sec   Loss 4.4032   LearningRate 0.0514   Epoch: 12   Global Step: 125860   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:20:08,791-Speed 5452.06 samples/sec   Loss 4.3694   LearningRate 0.0514   Epoch: 12   Global Step: 125870   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:20:16,369-Speed 5405.71 samples/sec   Loss 4.4141   LearningRate 0.0513   Epoch: 12   Global Step: 125880   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:20:23,876-Speed 5457.56 samples/sec   Loss 4.3522   LearningRate 0.0513   Epoch: 12   Global Step: 125890   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:20:31,513-Speed 5364.24 samples/sec   Loss 4.4200   LearningRate 0.0513   Epoch: 12   Global Step: 125900   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:20:39,046-Speed 5437.75 samples/sec   Loss 4.3813   LearningRate 0.0513   Epoch: 12   Global Step: 125910   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:20:46,527-Speed 5476.31 samples/sec   Loss 4.3475   LearningRate 0.0513   Epoch: 12   Global Step: 125920   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:20:53,961-Speed 5510.63 samples/sec   Loss 4.4030   LearningRate 0.0513   Epoch: 12   Global Step: 125930   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:21:01,461-Speed 5461.89 samples/sec   Loss 4.3750   LearningRate 0.0513   Epoch: 12   Global Step: 125940   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:21:08,964-Speed 5459.61 samples/sec   Loss 4.4234   LearningRate 0.0513   Epoch: 12   Global Step: 125950   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:21:16,576-Speed 5381.88 samples/sec   Loss 4.3561   LearningRate 0.0512   Epoch: 12   Global Step: 125960   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:21:24,231-Speed 5351.77 samples/sec   Loss 4.4112   LearningRate 0.0512   Epoch: 12   Global Step: 125970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:21:31,720-Speed 5469.41 samples/sec   Loss 4.4354   LearningRate 0.0512   Epoch: 12   Global Step: 125980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:21:39,194-Speed 5481.12 samples/sec   Loss 4.4089   LearningRate 0.0512   Epoch: 12   Global Step: 125990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:21:46,636-Speed 5504.57 samples/sec   Loss 4.4302   LearningRate 0.0512   Epoch: 12   Global Step: 126000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:22:30,614-[lfw][126000]XNorm: 23.421095
Training: 2022-01-08 23:22:30,615-[lfw][126000]Accuracy-Flip: 0.99783+-0.00308
Training: 2022-01-08 23:22:30,616-[lfw][126000]Accuracy-Highest: 0.99817
Training: 2022-01-08 23:23:22,301-[cfp_fp][126000]XNorm: 21.811621
Training: 2022-01-08 23:23:22,302-[cfp_fp][126000]Accuracy-Flip: 0.99129+-0.00416
Training: 2022-01-08 23:23:22,303-[cfp_fp][126000]Accuracy-Highest: 0.99129
Training: 2022-01-08 23:24:06,997-[agedb_30][126000]XNorm: 23.365409
Training: 2022-01-08 23:24:06,998-[agedb_30][126000]Accuracy-Flip: 0.97933+-0.00716
Training: 2022-01-08 23:24:06,998-[agedb_30][126000]Accuracy-Highest: 0.98000
Training: 2022-01-08 23:24:14,700-Speed 276.64 samples/sec   Loss 4.3588   LearningRate 0.0512   Epoch: 12   Global Step: 126010   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:24:22,188-Speed 5471.28 samples/sec   Loss 4.3650   LearningRate 0.0512   Epoch: 12   Global Step: 126020   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:24:29,661-Speed 5482.63 samples/sec   Loss 4.3887   LearningRate 0.0512   Epoch: 12   Global Step: 126030   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:24:37,147-Speed 5471.69 samples/sec   Loss 4.3276   LearningRate 0.0511   Epoch: 12   Global Step: 126040   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:24:44,752-Speed 5386.85 samples/sec   Loss 4.4104   LearningRate 0.0511   Epoch: 12   Global Step: 126050   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:24:52,233-Speed 5476.44 samples/sec   Loss 4.3898   LearningRate 0.0511   Epoch: 12   Global Step: 126060   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:24:59,765-Speed 5438.41 samples/sec   Loss 4.3696   LearningRate 0.0511   Epoch: 12   Global Step: 126070   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 23:25:07,421-Speed 5351.06 samples/sec   Loss 4.3546   LearningRate 0.0511   Epoch: 12   Global Step: 126080   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 23:25:14,928-Speed 5456.84 samples/sec   Loss 4.3232   LearningRate 0.0511   Epoch: 12   Global Step: 126090   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 23:25:22,445-Speed 5449.84 samples/sec   Loss 4.4148   LearningRate 0.0511   Epoch: 12   Global Step: 126100   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 23:25:29,969-Speed 5444.73 samples/sec   Loss 4.3943   LearningRate 0.0511   Epoch: 12   Global Step: 126110   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:25:37,517-Speed 5427.54 samples/sec   Loss 4.3928   LearningRate 0.0510   Epoch: 12   Global Step: 126120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:25:44,964-Speed 5501.23 samples/sec   Loss 4.3530   LearningRate 0.0510   Epoch: 12   Global Step: 126130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:25:52,457-Speed 5466.99 samples/sec   Loss 4.3875   LearningRate 0.0510   Epoch: 12   Global Step: 126140   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:26:00,021-Speed 5415.61 samples/sec   Loss 4.3911   LearningRate 0.0510   Epoch: 12   Global Step: 126150   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:26:07,703-Speed 5332.93 samples/sec   Loss 4.3857   LearningRate 0.0510   Epoch: 12   Global Step: 126160   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:26:15,551-Speed 5220.12 samples/sec   Loss 4.3474   LearningRate 0.0510   Epoch: 12   Global Step: 126170   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:26:23,118-Speed 5413.65 samples/sec   Loss 4.3767   LearningRate 0.0510   Epoch: 12   Global Step: 126180   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:26:30,678-Speed 5418.59 samples/sec   Loss 4.3783   LearningRate 0.0510   Epoch: 12   Global Step: 126190   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:26:38,316-Speed 5363.07 samples/sec   Loss 4.4130   LearningRate 0.0509   Epoch: 12   Global Step: 126200   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:26:46,048-Speed 5298.53 samples/sec   Loss 4.3771   LearningRate 0.0509   Epoch: 12   Global Step: 126210   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:26:53,677-Speed 5369.92 samples/sec   Loss 4.4006   LearningRate 0.0509   Epoch: 12   Global Step: 126220   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:27:01,162-Speed 5472.07 samples/sec   Loss 4.3734   LearningRate 0.0509   Epoch: 12   Global Step: 126230   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:27:08,684-Speed 5446.85 samples/sec   Loss 4.3962   LearningRate 0.0509   Epoch: 12   Global Step: 126240   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:27:16,209-Speed 5443.31 samples/sec   Loss 4.4365   LearningRate 0.0509   Epoch: 12   Global Step: 126250   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:27:23,691-Speed 5475.62 samples/sec   Loss 4.3172   LearningRate 0.0509   Epoch: 12   Global Step: 126260   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:27:31,231-Speed 5432.58 samples/sec   Loss 4.3372   LearningRate 0.0508   Epoch: 12   Global Step: 126270   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:27:38,690-Speed 5492.59 samples/sec   Loss 4.4338   LearningRate 0.0508   Epoch: 12   Global Step: 126280   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:27:46,277-Speed 5399.12 samples/sec   Loss 4.3321   LearningRate 0.0508   Epoch: 12   Global Step: 126290   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:27:53,977-Speed 5320.16 samples/sec   Loss 4.3820   LearningRate 0.0508   Epoch: 12   Global Step: 126300   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:28:01,480-Speed 5459.73 samples/sec   Loss 4.3744   LearningRate 0.0508   Epoch: 12   Global Step: 126310   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 23:28:08,890-Speed 5528.63 samples/sec   Loss 4.3608   LearningRate 0.0508   Epoch: 12   Global Step: 126320   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:28:16,346-Speed 5494.38 samples/sec   Loss 4.3736   LearningRate 0.0508   Epoch: 12   Global Step: 126330   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:28:23,768-Speed 5520.00 samples/sec   Loss 4.3496   LearningRate 0.0508   Epoch: 12   Global Step: 126340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:28:31,317-Speed 5426.39 samples/sec   Loss 4.3320   LearningRate 0.0507   Epoch: 12   Global Step: 126350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:28:38,841-Speed 5444.10 samples/sec   Loss 4.3101   LearningRate 0.0507   Epoch: 12   Global Step: 126360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:28:46,354-Speed 5452.98 samples/sec   Loss 4.3376   LearningRate 0.0507   Epoch: 12   Global Step: 126370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:28:53,828-Speed 5481.00 samples/sec   Loss 4.3825   LearningRate 0.0507   Epoch: 12   Global Step: 126380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:29:01,424-Speed 5393.36 samples/sec   Loss 4.3432   LearningRate 0.0507   Epoch: 12   Global Step: 126390   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:29:08,971-Speed 5427.76 samples/sec   Loss 4.3407   LearningRate 0.0507   Epoch: 12   Global Step: 126400   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:29:16,413-Speed 5505.34 samples/sec   Loss 4.3408   LearningRate 0.0507   Epoch: 12   Global Step: 126410   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:29:23,878-Speed 5487.78 samples/sec   Loss 4.3859   LearningRate 0.0507   Epoch: 12   Global Step: 126420   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:29:31,401-Speed 5445.00 samples/sec   Loss 4.3628   LearningRate 0.0506   Epoch: 12   Global Step: 126430   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:29:38,943-Speed 5431.27 samples/sec   Loss 4.2956   LearningRate 0.0506   Epoch: 12   Global Step: 126440   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:29:46,385-Speed 5505.02 samples/sec   Loss 4.3345   LearningRate 0.0506   Epoch: 12   Global Step: 126450   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:29:53,824-Speed 5506.58 samples/sec   Loss 4.3672   LearningRate 0.0506   Epoch: 12   Global Step: 126460   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:30:01,295-Speed 5483.34 samples/sec   Loss 4.3140   LearningRate 0.0506   Epoch: 12   Global Step: 126470   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:30:12,393-Speed 3691.08 samples/sec   Loss 4.3674   LearningRate 0.0506   Epoch: 12   Global Step: 126480   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:30:20,020-Speed 5371.33 samples/sec   Loss 4.3714   LearningRate 0.0506   Epoch: 12   Global Step: 126490   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:30:27,539-Speed 5448.52 samples/sec   Loss 4.3575   LearningRate 0.0506   Epoch: 12   Global Step: 126500   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 23:30:35,081-Speed 5431.77 samples/sec   Loss 4.4059   LearningRate 0.0505   Epoch: 12   Global Step: 126510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:30:42,685-Speed 5386.69 samples/sec   Loss 4.3110   LearningRate 0.0505   Epoch: 12   Global Step: 126520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:30:50,237-Speed 5424.89 samples/sec   Loss 4.3551   LearningRate 0.0505   Epoch: 12   Global Step: 126530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:30:57,660-Speed 5518.74 samples/sec   Loss 4.3491   LearningRate 0.0505   Epoch: 12   Global Step: 126540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:31:05,131-Speed 5483.38 samples/sec   Loss 4.3403   LearningRate 0.0505   Epoch: 12   Global Step: 126550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:31:12,673-Speed 5431.63 samples/sec   Loss 4.3621   LearningRate 0.0505   Epoch: 12   Global Step: 126560   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:31:20,136-Speed 5489.10 samples/sec   Loss 4.3820   LearningRate 0.0505   Epoch: 12   Global Step: 126570   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:31:27,682-Speed 5428.96 samples/sec   Loss 4.3115   LearningRate 0.0505   Epoch: 12   Global Step: 126580   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:31:35,213-Speed 5439.51 samples/sec   Loss 4.3591   LearningRate 0.0504   Epoch: 12   Global Step: 126590   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:31:42,647-Speed 5510.52 samples/sec   Loss 4.3449   LearningRate 0.0504   Epoch: 12   Global Step: 126600   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:31:50,111-Speed 5488.82 samples/sec   Loss 4.3260   LearningRate 0.0504   Epoch: 12   Global Step: 126610   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 23:31:57,610-Speed 5462.25 samples/sec   Loss 4.2898   LearningRate 0.0504   Epoch: 12   Global Step: 126620   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 23:32:05,093-Speed 5474.82 samples/sec   Loss 4.3843   LearningRate 0.0504   Epoch: 12   Global Step: 126630   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 23:32:12,600-Speed 5456.60 samples/sec   Loss 4.2784   LearningRate 0.0504   Epoch: 12   Global Step: 126640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:32:20,045-Speed 5502.36 samples/sec   Loss 4.3136   LearningRate 0.0504   Epoch: 12   Global Step: 126650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:32:27,576-Speed 5440.02 samples/sec   Loss 4.3064   LearningRate 0.0504   Epoch: 12   Global Step: 126660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:32:35,026-Speed 5498.51 samples/sec   Loss 4.3357   LearningRate 0.0503   Epoch: 12   Global Step: 126670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:32:42,486-Speed 5491.69 samples/sec   Loss 4.4048   LearningRate 0.0503   Epoch: 12   Global Step: 126680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:32:49,955-Speed 5484.37 samples/sec   Loss 4.3980   LearningRate 0.0503   Epoch: 12   Global Step: 126690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:32:57,412-Speed 5493.38 samples/sec   Loss 4.4269   LearningRate 0.0503   Epoch: 12   Global Step: 126700   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:33:04,860-Speed 5500.91 samples/sec   Loss 4.3330   LearningRate 0.0503   Epoch: 12   Global Step: 126710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:33:12,368-Speed 5455.92 samples/sec   Loss 4.3159   LearningRate 0.0503   Epoch: 12   Global Step: 126720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:33:19,967-Speed 5390.54 samples/sec   Loss 4.3475   LearningRate 0.0503   Epoch: 12   Global Step: 126730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:33:27,563-Speed 5393.85 samples/sec   Loss 4.3315   LearningRate 0.0503   Epoch: 12   Global Step: 126740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:33:35,124-Speed 5417.67 samples/sec   Loss 4.3471   LearningRate 0.0502   Epoch: 12   Global Step: 126750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:33:42,653-Speed 5441.20 samples/sec   Loss 4.3280   LearningRate 0.0502   Epoch: 12   Global Step: 126760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:33:50,186-Speed 5437.37 samples/sec   Loss 4.3544   LearningRate 0.0502   Epoch: 12   Global Step: 126770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:33:57,710-Speed 5445.42 samples/sec   Loss 4.3735   LearningRate 0.0502   Epoch: 12   Global Step: 126780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:34:05,168-Speed 5492.61 samples/sec   Loss 4.3684   LearningRate 0.0502   Epoch: 12   Global Step: 126790   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:34:12,722-Speed 5423.02 samples/sec   Loss 4.3665   LearningRate 0.0502   Epoch: 12   Global Step: 126800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:34:20,265-Speed 5430.60 samples/sec   Loss 4.3679   LearningRate 0.0502   Epoch: 12   Global Step: 126810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:34:27,767-Speed 5460.70 samples/sec   Loss 4.3398   LearningRate 0.0502   Epoch: 12   Global Step: 126820   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:34:35,347-Speed 5404.35 samples/sec   Loss 4.3568   LearningRate 0.0502   Epoch: 12   Global Step: 126830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 23:34:42,818-Speed 5482.98 samples/sec   Loss 4.3205   LearningRate 0.0501   Epoch: 12   Global Step: 126840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 23:34:50,376-Speed 5420.64 samples/sec   Loss 4.3547   LearningRate 0.0501   Epoch: 12   Global Step: 126850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 23:34:57,782-Speed 5531.23 samples/sec   Loss 4.3231   LearningRate 0.0501   Epoch: 12   Global Step: 126860   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:35:05,359-Speed 5407.02 samples/sec   Loss 4.3388   LearningRate 0.0501   Epoch: 12   Global Step: 126870   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:35:12,839-Speed 5476.19 samples/sec   Loss 4.3624   LearningRate 0.0501   Epoch: 12   Global Step: 126880   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:35:20,355-Speed 5450.64 samples/sec   Loss 4.3346   LearningRate 0.0501   Epoch: 12   Global Step: 126890   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:35:27,840-Speed 5472.78 samples/sec   Loss 4.3147   LearningRate 0.0501   Epoch: 12   Global Step: 126900   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:35:35,365-Speed 5444.34 samples/sec   Loss 4.2828   LearningRate 0.0501   Epoch: 12   Global Step: 126910   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:35:42,858-Speed 5466.96 samples/sec   Loss 4.3705   LearningRate 0.0500   Epoch: 12   Global Step: 126920   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:35:50,432-Speed 5408.52 samples/sec   Loss 4.3030   LearningRate 0.0500   Epoch: 12   Global Step: 126930   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:35:58,197-Speed 5275.75 samples/sec   Loss 4.3030   LearningRate 0.0500   Epoch: 12   Global Step: 126940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:36:05,719-Speed 5446.44 samples/sec   Loss 4.3521   LearningRate 0.0500   Epoch: 12   Global Step: 126950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:36:13,201-Speed 5474.93 samples/sec   Loss 4.3372   LearningRate 0.0500   Epoch: 12   Global Step: 126960   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:36:20,803-Speed 5388.42 samples/sec   Loss 4.3277   LearningRate 0.0500   Epoch: 12   Global Step: 126970   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:36:28,391-Speed 5399.53 samples/sec   Loss 4.3085   LearningRate 0.0500   Epoch: 12   Global Step: 126980   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:36:35,886-Speed 5464.97 samples/sec   Loss 4.3662   LearningRate 0.0500   Epoch: 12   Global Step: 126990   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:36:43,460-Speed 5408.96 samples/sec   Loss 4.3372   LearningRate 0.0499   Epoch: 12   Global Step: 127000   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:36:50,981-Speed 5446.27 samples/sec   Loss 4.2958   LearningRate 0.0499   Epoch: 12   Global Step: 127010   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:36:58,564-Speed 5403.26 samples/sec   Loss 4.3834   LearningRate 0.0499   Epoch: 12   Global Step: 127020   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:37:06,122-Speed 5419.39 samples/sec   Loss 4.3334   LearningRate 0.0499   Epoch: 12   Global Step: 127030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:37:13,864-Speed 5291.76 samples/sec   Loss 4.3362   LearningRate 0.0499   Epoch: 12   Global Step: 127040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:37:21,352-Speed 5470.17 samples/sec   Loss 4.2933   LearningRate 0.0499   Epoch: 12   Global Step: 127050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:37:28,961-Speed 5384.09 samples/sec   Loss 4.3156   LearningRate 0.0499   Epoch: 12   Global Step: 127060   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:37:36,764-Speed 5250.04 samples/sec   Loss 4.2812   LearningRate 0.0499   Epoch: 12   Global Step: 127070   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:37:44,304-Speed 5432.82 samples/sec   Loss 4.3194   LearningRate 0.0498   Epoch: 12   Global Step: 127080   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:37:51,772-Speed 5485.31 samples/sec   Loss 4.3064   LearningRate 0.0498   Epoch: 12   Global Step: 127090   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:37:59,246-Speed 5481.56 samples/sec   Loss 4.3467   LearningRate 0.0498   Epoch: 12   Global Step: 127100   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:38:06,724-Speed 5478.07 samples/sec   Loss 4.2980   LearningRate 0.0498   Epoch: 12   Global Step: 127110   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:38:14,180-Speed 5494.48 samples/sec   Loss 4.3425   LearningRate 0.0498   Epoch: 12   Global Step: 127120   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:38:21,699-Speed 5448.34 samples/sec   Loss 4.2699   LearningRate 0.0498   Epoch: 12   Global Step: 127130   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 23:38:29,282-Speed 5402.24 samples/sec   Loss 4.2821   LearningRate 0.0498   Epoch: 12   Global Step: 127140   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 23:38:36,727-Speed 5502.59 samples/sec   Loss 4.3310   LearningRate 0.0498   Epoch: 12   Global Step: 127150   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:38:44,234-Speed 5456.34 samples/sec   Loss 4.3608   LearningRate 0.0497   Epoch: 12   Global Step: 127160   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:38:51,675-Speed 5505.36 samples/sec   Loss 4.3024   LearningRate 0.0497   Epoch: 12   Global Step: 127170   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:38:59,240-Speed 5415.30 samples/sec   Loss 4.3060   LearningRate 0.0497   Epoch: 12   Global Step: 127180   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:39:06,763-Speed 5445.68 samples/sec   Loss 4.2918   LearningRate 0.0497   Epoch: 12   Global Step: 127190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:39:14,236-Speed 5481.65 samples/sec   Loss 4.3217   LearningRate 0.0497   Epoch: 12   Global Step: 127200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:39:21,649-Speed 5525.90 samples/sec   Loss 4.3308   LearningRate 0.0497   Epoch: 12   Global Step: 127210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:39:29,111-Speed 5489.95 samples/sec   Loss 4.3513   LearningRate 0.0497   Epoch: 12   Global Step: 127220   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:39:36,606-Speed 5465.55 samples/sec   Loss 4.3141   LearningRate 0.0497   Epoch: 12   Global Step: 127230   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:39:44,157-Speed 5425.09 samples/sec   Loss 4.3089   LearningRate 0.0496   Epoch: 12   Global Step: 127240   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:39:51,715-Speed 5420.14 samples/sec   Loss 4.3246   LearningRate 0.0496   Epoch: 12   Global Step: 127250   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:39:59,160-Speed 5502.26 samples/sec   Loss 4.3314   LearningRate 0.0496   Epoch: 12   Global Step: 127260   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:40:06,624-Speed 5488.85 samples/sec   Loss 4.3176   LearningRate 0.0496   Epoch: 12   Global Step: 127270   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:40:14,078-Speed 5495.23 samples/sec   Loss 4.3356   LearningRate 0.0496   Epoch: 12   Global Step: 127280   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:40:21,517-Speed 5506.55 samples/sec   Loss 4.3192   LearningRate 0.0496   Epoch: 12   Global Step: 127290   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:40:28,992-Speed 5480.44 samples/sec   Loss 4.2939   LearningRate 0.0496   Epoch: 12   Global Step: 127300   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:40:36,467-Speed 5480.96 samples/sec   Loss 4.3246   LearningRate 0.0496   Epoch: 12   Global Step: 127310   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:40:43,984-Speed 5449.12 samples/sec   Loss 4.3400   LearningRate 0.0495   Epoch: 12   Global Step: 127320   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:40:51,507-Speed 5445.36 samples/sec   Loss 4.2977   LearningRate 0.0495   Epoch: 12   Global Step: 127330   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:40:58,995-Speed 5470.78 samples/sec   Loss 4.3157   LearningRate 0.0495   Epoch: 12   Global Step: 127340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:41:06,528-Speed 5438.16 samples/sec   Loss 4.3258   LearningRate 0.0495   Epoch: 12   Global Step: 127350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:41:13,980-Speed 5497.52 samples/sec   Loss 4.3192   LearningRate 0.0495   Epoch: 12   Global Step: 127360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:41:21,483-Speed 5459.51 samples/sec   Loss 4.2934   LearningRate 0.0495   Epoch: 12   Global Step: 127370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:41:28,983-Speed 5462.05 samples/sec   Loss 4.3138   LearningRate 0.0495   Epoch: 12   Global Step: 127380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:41:36,532-Speed 5426.70 samples/sec   Loss 4.2643   LearningRate 0.0495   Epoch: 12   Global Step: 127390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:41:44,004-Speed 5482.85 samples/sec   Loss 4.3361   LearningRate 0.0494   Epoch: 12   Global Step: 127400   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:41:51,453-Speed 5498.79 samples/sec   Loss 4.3220   LearningRate 0.0494   Epoch: 12   Global Step: 127410   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:41:58,906-Speed 5496.36 samples/sec   Loss 4.3420   LearningRate 0.0494   Epoch: 12   Global Step: 127420   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:42:06,373-Speed 5486.33 samples/sec   Loss 4.2551   LearningRate 0.0494   Epoch: 12   Global Step: 127430   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:42:13,874-Speed 5461.78 samples/sec   Loss 4.3327   LearningRate 0.0494   Epoch: 12   Global Step: 127440   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:42:21,336-Speed 5489.27 samples/sec   Loss 4.3385   LearningRate 0.0494   Epoch: 12   Global Step: 127450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 23:42:28,827-Speed 5468.68 samples/sec   Loss 4.3296   LearningRate 0.0494   Epoch: 12   Global Step: 127460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 23:42:36,339-Speed 5453.72 samples/sec   Loss 4.2856   LearningRate 0.0494   Epoch: 12   Global Step: 127470   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:42:43,859-Speed 5447.29 samples/sec   Loss 4.2799   LearningRate 0.0493   Epoch: 12   Global Step: 127480   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:42:51,416-Speed 5420.69 samples/sec   Loss 4.3113   LearningRate 0.0493   Epoch: 12   Global Step: 127490   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:42:58,994-Speed 5406.06 samples/sec   Loss 4.3004   LearningRate 0.0493   Epoch: 12   Global Step: 127500   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:43:06,704-Speed 5313.02 samples/sec   Loss 4.3125   LearningRate 0.0493   Epoch: 12   Global Step: 127510   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:43:14,194-Speed 5469.94 samples/sec   Loss 4.2963   LearningRate 0.0493   Epoch: 12   Global Step: 127520   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:43:21,676-Speed 5475.02 samples/sec   Loss 4.2889   LearningRate 0.0493   Epoch: 12   Global Step: 127530   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:43:29,301-Speed 5372.84 samples/sec   Loss 4.3255   LearningRate 0.0493   Epoch: 12   Global Step: 127540   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:43:36,787-Speed 5472.22 samples/sec   Loss 4.3016   LearningRate 0.0493   Epoch: 12   Global Step: 127550   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:43:44,276-Speed 5470.69 samples/sec   Loss 4.2981   LearningRate 0.0492   Epoch: 12   Global Step: 127560   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:43:51,752-Speed 5479.45 samples/sec   Loss 4.2997   LearningRate 0.0492   Epoch: 12   Global Step: 127570   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 23:43:59,213-Speed 5490.21 samples/sec   Loss 4.3763   LearningRate 0.0492   Epoch: 12   Global Step: 127580   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:44:06,706-Speed 5467.42 samples/sec   Loss 4.3135   LearningRate 0.0492   Epoch: 12   Global Step: 127590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:44:14,251-Speed 5429.43 samples/sec   Loss 4.2853   LearningRate 0.0492   Epoch: 12   Global Step: 127600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:44:21,866-Speed 5379.41 samples/sec   Loss 4.2855   LearningRate 0.0492   Epoch: 12   Global Step: 127610   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:44:29,466-Speed 5390.20 samples/sec   Loss 4.2878   LearningRate 0.0492   Epoch: 12   Global Step: 127620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:44:36,968-Speed 5460.77 samples/sec   Loss 4.2695   LearningRate 0.0492   Epoch: 12   Global Step: 127630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:44:44,531-Speed 5416.42 samples/sec   Loss 4.3248   LearningRate 0.0491   Epoch: 12   Global Step: 127640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:44:52,016-Speed 5472.87 samples/sec   Loss 4.2851   LearningRate 0.0491   Epoch: 12   Global Step: 127650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:44:59,461-Speed 5502.36 samples/sec   Loss 4.2484   LearningRate 0.0491   Epoch: 12   Global Step: 127660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:45:06,971-Speed 5454.92 samples/sec   Loss 4.3263   LearningRate 0.0491   Epoch: 12   Global Step: 127670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:45:14,431-Speed 5491.38 samples/sec   Loss 4.3045   LearningRate 0.0491   Epoch: 12   Global Step: 127680   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 23:45:21,848-Speed 5523.43 samples/sec   Loss 4.2741   LearningRate 0.0491   Epoch: 12   Global Step: 127690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:45:29,364-Speed 5450.09 samples/sec   Loss 4.3288   LearningRate 0.0491   Epoch: 12   Global Step: 127700   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:45:36,856-Speed 5468.10 samples/sec   Loss 4.2985   LearningRate 0.0491   Epoch: 12   Global Step: 127710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:45:44,378-Speed 5446.38 samples/sec   Loss 4.2683   LearningRate 0.0490   Epoch: 12   Global Step: 127720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:45:52,074-Speed 5323.12 samples/sec   Loss 4.2803   LearningRate 0.0490   Epoch: 12   Global Step: 127730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:45:59,655-Speed 5403.34 samples/sec   Loss 4.3113   LearningRate 0.0490   Epoch: 12   Global Step: 127740   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:46:07,093-Speed 5507.11 samples/sec   Loss 4.2620   LearningRate 0.0490   Epoch: 12   Global Step: 127750   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:46:14,692-Speed 5391.56 samples/sec   Loss 4.2688   LearningRate 0.0490   Epoch: 12   Global Step: 127760   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:46:22,254-Speed 5417.48 samples/sec   Loss 4.2650   LearningRate 0.0490   Epoch: 12   Global Step: 127770   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:46:29,734-Speed 5476.14 samples/sec   Loss 4.3106   LearningRate 0.0490   Epoch: 12   Global Step: 127780   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:46:37,220-Speed 5472.26 samples/sec   Loss 4.3256   LearningRate 0.0490   Epoch: 12   Global Step: 127790   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:46:44,720-Speed 5462.24 samples/sec   Loss 4.3105   LearningRate 0.0489   Epoch: 12   Global Step: 127800   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:46:52,181-Speed 5490.48 samples/sec   Loss 4.2925   LearningRate 0.0489   Epoch: 12   Global Step: 127810   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:46:59,750-Speed 5411.79 samples/sec   Loss 4.2348   LearningRate 0.0489   Epoch: 12   Global Step: 127820   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:47:07,224-Speed 5481.23 samples/sec   Loss 4.2752   LearningRate 0.0489   Epoch: 12   Global Step: 127830   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:47:14,782-Speed 5420.71 samples/sec   Loss 4.3289   LearningRate 0.0489   Epoch: 12   Global Step: 127840   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:47:22,280-Speed 5463.62 samples/sec   Loss 4.2421   LearningRate 0.0489   Epoch: 12   Global Step: 127850   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:47:29,799-Speed 5448.03 samples/sec   Loss 4.2717   LearningRate 0.0489   Epoch: 12   Global Step: 127860   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:47:37,213-Speed 5525.09 samples/sec   Loss 4.2365   LearningRate 0.0489   Epoch: 12   Global Step: 127870   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:47:44,680-Speed 5486.01 samples/sec   Loss 4.3013   LearningRate 0.0489   Epoch: 12   Global Step: 127880   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:47:52,109-Speed 5514.81 samples/sec   Loss 4.2988   LearningRate 0.0488   Epoch: 12   Global Step: 127890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:47:59,544-Speed 5509.81 samples/sec   Loss 4.3191   LearningRate 0.0488   Epoch: 12   Global Step: 127900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:48:07,066-Speed 5445.46 samples/sec   Loss 4.2838   LearningRate 0.0488   Epoch: 12   Global Step: 127910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:48:14,565-Speed 5463.52 samples/sec   Loss 4.2588   LearningRate 0.0488   Epoch: 12   Global Step: 127920   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:48:22,032-Speed 5485.97 samples/sec   Loss 4.2544   LearningRate 0.0488   Epoch: 12   Global Step: 127930   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:48:29,568-Speed 5436.46 samples/sec   Loss 4.2666   LearningRate 0.0488   Epoch: 12   Global Step: 127940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:48:37,148-Speed 5404.09 samples/sec   Loss 4.2737   LearningRate 0.0488   Epoch: 12   Global Step: 127950   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:48:44,687-Speed 5434.07 samples/sec   Loss 4.2527   LearningRate 0.0488   Epoch: 12   Global Step: 127960   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:48:52,186-Speed 5462.98 samples/sec   Loss 4.2537   LearningRate 0.0487   Epoch: 12   Global Step: 127970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:48:59,687-Speed 5461.17 samples/sec   Loss 4.2860   LearningRate 0.0487   Epoch: 12   Global Step: 127980   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:49:07,143-Speed 5494.52 samples/sec   Loss 4.2400   LearningRate 0.0487   Epoch: 12   Global Step: 127990   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:49:14,663-Speed 5447.21 samples/sec   Loss 4.2407   LearningRate 0.0487   Epoch: 12   Global Step: 128000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:49:58,279-[lfw][128000]XNorm: 21.962453
Training: 2022-01-08 23:49:58,280-[lfw][128000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-01-08 23:49:58,280-[lfw][128000]Accuracy-Highest: 0.99817
Training: 2022-01-08 23:50:49,158-[cfp_fp][128000]XNorm: 20.414131
Training: 2022-01-08 23:50:49,159-[cfp_fp][128000]Accuracy-Flip: 0.99157+-0.00480
Training: 2022-01-08 23:50:49,159-[cfp_fp][128000]Accuracy-Highest: 0.99157
Training: 2022-01-08 23:51:33,032-[agedb_30][128000]XNorm: 21.981309
Training: 2022-01-08 23:51:33,032-[agedb_30][128000]Accuracy-Flip: 0.97767+-0.00824
Training: 2022-01-08 23:51:33,033-[agedb_30][128000]Accuracy-Highest: 0.98000
Training: 2022-01-08 23:51:40,627-Speed 280.62 samples/sec   Loss 4.2651   LearningRate 0.0487   Epoch: 12   Global Step: 128010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:51:48,142-Speed 5451.51 samples/sec   Loss 4.2561   LearningRate 0.0487   Epoch: 12   Global Step: 128020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:51:55,719-Speed 5407.18 samples/sec   Loss 4.2506   LearningRate 0.0487   Epoch: 12   Global Step: 128030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:52:03,233-Speed 5451.58 samples/sec   Loss 4.2804   LearningRate 0.0487   Epoch: 12   Global Step: 128040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:52:10,698-Speed 5487.50 samples/sec   Loss 4.2709   LearningRate 0.0486   Epoch: 12   Global Step: 128050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 23:52:18,208-Speed 5455.36 samples/sec   Loss 4.3037   LearningRate 0.0486   Epoch: 12   Global Step: 128060   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:52:25,650-Speed 5504.32 samples/sec   Loss 4.2571   LearningRate 0.0486   Epoch: 12   Global Step: 128070   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:52:33,175-Speed 5443.47 samples/sec   Loss 4.2536   LearningRate 0.0486   Epoch: 12   Global Step: 128080   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:52:40,696-Speed 5447.13 samples/sec   Loss 4.2698   LearningRate 0.0486   Epoch: 12   Global Step: 128090   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:52:48,160-Speed 5488.31 samples/sec   Loss 4.2636   LearningRate 0.0486   Epoch: 12   Global Step: 128100   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:52:55,635-Speed 5480.96 samples/sec   Loss 4.2877   LearningRate 0.0486   Epoch: 12   Global Step: 128110   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:53:03,139-Speed 5458.63 samples/sec   Loss 4.2993   LearningRate 0.0486   Epoch: 12   Global Step: 128120   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:53:10,717-Speed 5405.90 samples/sec   Loss 4.2572   LearningRate 0.0485   Epoch: 12   Global Step: 128130   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:53:18,250-Speed 5438.55 samples/sec   Loss 4.2319   LearningRate 0.0485   Epoch: 12   Global Step: 128140   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:53:25,822-Speed 5410.05 samples/sec   Loss 4.2817   LearningRate 0.0485   Epoch: 12   Global Step: 128150   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:53:33,277-Speed 5494.80 samples/sec   Loss 4.2331   LearningRate 0.0485   Epoch: 12   Global Step: 128160   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:53:40,740-Speed 5488.74 samples/sec   Loss 4.2900   LearningRate 0.0485   Epoch: 12   Global Step: 128170   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:53:48,255-Speed 5451.34 samples/sec   Loss 4.2538   LearningRate 0.0485   Epoch: 12   Global Step: 128180   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:53:55,724-Speed 5488.38 samples/sec   Loss 4.2569   LearningRate 0.0485   Epoch: 12   Global Step: 128190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:54:03,236-Speed 5452.68 samples/sec   Loss 4.2222   LearningRate 0.0485   Epoch: 12   Global Step: 128200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:54:10,731-Speed 5466.09 samples/sec   Loss 4.2890   LearningRate 0.0484   Epoch: 12   Global Step: 128210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:54:18,352-Speed 5375.26 samples/sec   Loss 4.2708   LearningRate 0.0484   Epoch: 12   Global Step: 128220   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:54:26,026-Speed 5338.70 samples/sec   Loss 4.2678   LearningRate 0.0484   Epoch: 12   Global Step: 128230   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:54:33,562-Speed 5435.74 samples/sec   Loss 4.2905   LearningRate 0.0484   Epoch: 12   Global Step: 128240   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:54:41,118-Speed 5421.36 samples/sec   Loss 4.2312   LearningRate 0.0484   Epoch: 12   Global Step: 128250   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:54:48,700-Speed 5402.92 samples/sec   Loss 4.2511   LearningRate 0.0484   Epoch: 12   Global Step: 128260   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:54:56,409-Speed 5314.15 samples/sec   Loss 4.2374   LearningRate 0.0484   Epoch: 12   Global Step: 128270   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:55:04,001-Speed 5395.78 samples/sec   Loss 4.1969   LearningRate 0.0484   Epoch: 12   Global Step: 128280   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:55:11,535-Speed 5437.42 samples/sec   Loss 4.2669   LearningRate 0.0483   Epoch: 12   Global Step: 128290   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:55:19,182-Speed 5357.06 samples/sec   Loss 4.1976   LearningRate 0.0483   Epoch: 12   Global Step: 128300   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:55:26,689-Speed 5457.61 samples/sec   Loss 4.2585   LearningRate 0.0483   Epoch: 12   Global Step: 128310   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:55:34,313-Speed 5372.69 samples/sec   Loss 4.2388   LearningRate 0.0483   Epoch: 12   Global Step: 128320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:55:41,824-Speed 5453.50 samples/sec   Loss 4.2628   LearningRate 0.0483   Epoch: 12   Global Step: 128330   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:55:49,474-Speed 5355.47 samples/sec   Loss 4.2202   LearningRate 0.0483   Epoch: 12   Global Step: 128340   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:55:56,987-Speed 5452.38 samples/sec   Loss 4.2335   LearningRate 0.0483   Epoch: 12   Global Step: 128350   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 23:56:04,452-Speed 5487.84 samples/sec   Loss 4.1972   LearningRate 0.0483   Epoch: 12   Global Step: 128360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:56:12,040-Speed 5398.62 samples/sec   Loss 4.2345   LearningRate 0.0483   Epoch: 12   Global Step: 128370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:56:19,597-Speed 5420.81 samples/sec   Loss 4.2981   LearningRate 0.0482   Epoch: 12   Global Step: 128380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:56:27,160-Speed 5417.22 samples/sec   Loss 4.2439   LearningRate 0.0482   Epoch: 12   Global Step: 128390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:56:34,697-Speed 5434.99 samples/sec   Loss 4.2117   LearningRate 0.0482   Epoch: 12   Global Step: 128400   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:56:42,232-Speed 5436.83 samples/sec   Loss 4.2450   LearningRate 0.0482   Epoch: 12   Global Step: 128410   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:56:49,830-Speed 5391.28 samples/sec   Loss 4.2750   LearningRate 0.0482   Epoch: 12   Global Step: 128420   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:56:57,413-Speed 5402.91 samples/sec   Loss 4.2569   LearningRate 0.0482   Epoch: 12   Global Step: 128430   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:57:05,028-Speed 5379.66 samples/sec   Loss 4.2170   LearningRate 0.0482   Epoch: 12   Global Step: 128440   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:57:12,538-Speed 5454.44 samples/sec   Loss 4.2486   LearningRate 0.0482   Epoch: 12   Global Step: 128450   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:57:20,067-Speed 5440.82 samples/sec   Loss 4.1789   LearningRate 0.0481   Epoch: 12   Global Step: 128460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 23:57:27,573-Speed 5458.29 samples/sec   Loss 4.2594   LearningRate 0.0481   Epoch: 12   Global Step: 128470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 23:57:35,051-Speed 5478.13 samples/sec   Loss 4.2071   LearningRate 0.0481   Epoch: 12   Global Step: 128480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 23:57:42,519-Speed 5485.24 samples/sec   Loss 4.2370   LearningRate 0.0481   Epoch: 12   Global Step: 128490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 23:57:50,063-Speed 5430.18 samples/sec   Loss 4.2570   LearningRate 0.0481   Epoch: 12   Global Step: 128500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 23:57:57,602-Speed 5434.01 samples/sec   Loss 4.2566   LearningRate 0.0481   Epoch: 12   Global Step: 128510   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:58:05,094-Speed 5468.54 samples/sec   Loss 4.2826   LearningRate 0.0481   Epoch: 12   Global Step: 128520   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:58:12,582-Speed 5470.28 samples/sec   Loss 4.2368   LearningRate 0.0481   Epoch: 12   Global Step: 128530   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:58:20,033-Speed 5497.69 samples/sec   Loss 4.2713   LearningRate 0.0480   Epoch: 12   Global Step: 128540   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:58:27,598-Speed 5415.33 samples/sec   Loss 4.3260   LearningRate 0.0480   Epoch: 12   Global Step: 128550   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:58:35,135-Speed 5435.78 samples/sec   Loss 4.2767   LearningRate 0.0480   Epoch: 12   Global Step: 128560   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:58:42,675-Speed 5432.60 samples/sec   Loss 4.2658   LearningRate 0.0480   Epoch: 12   Global Step: 128570   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:58:50,186-Speed 5454.06 samples/sec   Loss 4.2594   LearningRate 0.0480   Epoch: 12   Global Step: 128580   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:58:57,714-Speed 5441.87 samples/sec   Loss 4.2132   LearningRate 0.0480   Epoch: 12   Global Step: 128590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:59:05,195-Speed 5476.42 samples/sec   Loss 4.2563   LearningRate 0.0480   Epoch: 12   Global Step: 128600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:59:12,678-Speed 5474.31 samples/sec   Loss 4.3009   LearningRate 0.0480   Epoch: 12   Global Step: 128610   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 23:59:20,150-Speed 5482.01 samples/sec   Loss 4.2381   LearningRate 0.0479   Epoch: 12   Global Step: 128620   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 23:59:27,675-Speed 5443.96 samples/sec   Loss 4.1895   LearningRate 0.0479   Epoch: 12   Global Step: 128630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:59:35,203-Speed 5442.35 samples/sec   Loss 4.2850   LearningRate 0.0479   Epoch: 12   Global Step: 128640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:59:42,747-Speed 5430.12 samples/sec   Loss 4.2332   LearningRate 0.0479   Epoch: 12   Global Step: 128650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:59:50,265-Speed 5448.75 samples/sec   Loss 4.2528   LearningRate 0.0479   Epoch: 12   Global Step: 128660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 23:59:57,696-Speed 5512.70 samples/sec   Loss 4.1849   LearningRate 0.0479   Epoch: 12   Global Step: 128670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:00:05,167-Speed 5483.97 samples/sec   Loss 4.2528   LearningRate 0.0479   Epoch: 12   Global Step: 128680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:00:12,656-Speed 5469.44 samples/sec   Loss 4.2192   LearningRate 0.0479   Epoch: 12   Global Step: 128690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:00:20,188-Speed 5439.00 samples/sec   Loss 4.2159   LearningRate 0.0478   Epoch: 12   Global Step: 128700   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:00:27,631-Speed 5504.07 samples/sec   Loss 4.2423   LearningRate 0.0478   Epoch: 12   Global Step: 128710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:00:35,136-Speed 5458.49 samples/sec   Loss 4.2241   LearningRate 0.0478   Epoch: 12   Global Step: 128720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:00:42,653-Speed 5449.80 samples/sec   Loss 4.1821   LearningRate 0.0478   Epoch: 12   Global Step: 128730   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-09 00:00:50,108-Speed 5495.09 samples/sec   Loss 4.1909   LearningRate 0.0478   Epoch: 12   Global Step: 128740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:00:57,579-Speed 5483.47 samples/sec   Loss 4.2361   LearningRate 0.0478   Epoch: 12   Global Step: 128750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:01:05,053-Speed 5481.40 samples/sec   Loss 4.2613   LearningRate 0.0478   Epoch: 12   Global Step: 128760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:01:12,504-Speed 5497.90 samples/sec   Loss 4.2227   LearningRate 0.0478   Epoch: 12   Global Step: 128770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:01:20,002-Speed 5463.14 samples/sec   Loss 4.2499   LearningRate 0.0478   Epoch: 12   Global Step: 128780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:01:27,737-Speed 5296.37 samples/sec   Loss 4.2249   LearningRate 0.0477   Epoch: 12   Global Step: 128790   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:01:35,281-Speed 5430.06 samples/sec   Loss 4.2425   LearningRate 0.0477   Epoch: 12   Global Step: 128800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:01:42,741-Speed 5491.81 samples/sec   Loss 4.2139   LearningRate 0.0477   Epoch: 12   Global Step: 128810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:01:50,355-Speed 5379.79 samples/sec   Loss 4.1920   LearningRate 0.0477   Epoch: 12   Global Step: 128820   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:01:57,797-Speed 5504.95 samples/sec   Loss 4.2301   LearningRate 0.0477   Epoch: 12   Global Step: 128830   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:02:05,300-Speed 5460.00 samples/sec   Loss 4.2091   LearningRate 0.0477   Epoch: 12   Global Step: 128840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-09 00:02:12,756-Speed 5494.70 samples/sec   Loss 4.2780   LearningRate 0.0477   Epoch: 12   Global Step: 128850   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:02:20,240-Speed 5473.41 samples/sec   Loss 4.2470   LearningRate 0.0477   Epoch: 12   Global Step: 128860   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:02:27,691-Speed 5498.09 samples/sec   Loss 4.1671   LearningRate 0.0476   Epoch: 12   Global Step: 128870   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:02:35,226-Speed 5436.70 samples/sec   Loss 4.1810   LearningRate 0.0476   Epoch: 12   Global Step: 128880   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:02:42,923-Speed 5322.29 samples/sec   Loss 4.2176   LearningRate 0.0476   Epoch: 12   Global Step: 128890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:02:50,566-Speed 5359.93 samples/sec   Loss 4.2397   LearningRate 0.0476   Epoch: 12   Global Step: 128900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:02:58,137-Speed 5410.99 samples/sec   Loss 4.2306   LearningRate 0.0476   Epoch: 12   Global Step: 128910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:03:05,809-Speed 5339.39 samples/sec   Loss 4.1908   LearningRate 0.0476   Epoch: 12   Global Step: 128920   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:03:13,315-Speed 5458.05 samples/sec   Loss 4.2175   LearningRate 0.0476   Epoch: 12   Global Step: 128930   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:03:20,860-Speed 5429.14 samples/sec   Loss 4.2786   LearningRate 0.0476   Epoch: 12   Global Step: 128940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:03:28,352-Speed 5467.77 samples/sec   Loss 4.2467   LearningRate 0.0475   Epoch: 12   Global Step: 128950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:03:35,831-Speed 5477.38 samples/sec   Loss 4.1829   LearningRate 0.0475   Epoch: 12   Global Step: 128960   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:03:43,357-Speed 5443.75 samples/sec   Loss 4.1519   LearningRate 0.0475   Epoch: 12   Global Step: 128970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:03:50,795-Speed 5507.71 samples/sec   Loss 4.1957   LearningRate 0.0475   Epoch: 12   Global Step: 128980   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:03:58,320-Speed 5443.51 samples/sec   Loss 4.2380   LearningRate 0.0475   Epoch: 12   Global Step: 128990   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:04:05,832-Speed 5452.85 samples/sec   Loss 4.1797   LearningRate 0.0475   Epoch: 12   Global Step: 129000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:04:13,330-Speed 5464.06 samples/sec   Loss 4.2135   LearningRate 0.0475   Epoch: 12   Global Step: 129010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:04:20,793-Speed 5489.02 samples/sec   Loss 4.1972   LearningRate 0.0475   Epoch: 12   Global Step: 129020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:04:28,341-Speed 5427.01 samples/sec   Loss 4.1944   LearningRate 0.0474   Epoch: 12   Global Step: 129030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:04:35,872-Speed 5439.30 samples/sec   Loss 4.1952   LearningRate 0.0474   Epoch: 12   Global Step: 129040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:04:43,364-Speed 5468.36 samples/sec   Loss 4.2280   LearningRate 0.0474   Epoch: 12   Global Step: 129050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:04:50,896-Speed 5439.21 samples/sec   Loss 4.2227   LearningRate 0.0474   Epoch: 12   Global Step: 129060   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:04:58,407-Speed 5454.08 samples/sec   Loss 4.2274   LearningRate 0.0474   Epoch: 12   Global Step: 129070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-09 00:05:06,002-Speed 5392.96 samples/sec   Loss 4.2439   LearningRate 0.0474   Epoch: 12   Global Step: 129080   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:05:13,488-Speed 5472.54 samples/sec   Loss 4.1579   LearningRate 0.0474   Epoch: 12   Global Step: 129090   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:05:21,164-Speed 5336.63 samples/sec   Loss 4.1994   LearningRate 0.0474   Epoch: 12   Global Step: 129100   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:05:28,664-Speed 5462.40 samples/sec   Loss 4.1819   LearningRate 0.0474   Epoch: 12   Global Step: 129110   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:05:36,138-Speed 5481.18 samples/sec   Loss 4.2196   LearningRate 0.0473   Epoch: 12   Global Step: 129120   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:05:43,621-Speed 5474.52 samples/sec   Loss 4.2139   LearningRate 0.0473   Epoch: 12   Global Step: 129130   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:05:51,053-Speed 5512.11 samples/sec   Loss 4.2275   LearningRate 0.0473   Epoch: 12   Global Step: 129140   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:05:58,539-Speed 5471.62 samples/sec   Loss 4.2418   LearningRate 0.0473   Epoch: 12   Global Step: 129150   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:06:06,053-Speed 5452.30 samples/sec   Loss 4.2211   LearningRate 0.0473   Epoch: 12   Global Step: 129160   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:06:13,512-Speed 5491.53 samples/sec   Loss 4.2080   LearningRate 0.0473   Epoch: 12   Global Step: 129170   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:06:21,035-Speed 5445.32 samples/sec   Loss 4.2956   LearningRate 0.0473   Epoch: 12   Global Step: 129180   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:06:28,525-Speed 5469.60 samples/sec   Loss 4.2118   LearningRate 0.0473   Epoch: 12   Global Step: 129190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:06:36,082-Speed 5420.95 samples/sec   Loss 4.2156   LearningRate 0.0472   Epoch: 12   Global Step: 129200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:06:43,498-Speed 5524.00 samples/sec   Loss 4.1902   LearningRate 0.0472   Epoch: 12   Global Step: 129210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:06:50,943-Speed 5502.09 samples/sec   Loss 4.1861   LearningRate 0.0472   Epoch: 12   Global Step: 129220   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:06:58,379-Speed 5509.02 samples/sec   Loss 4.2300   LearningRate 0.0472   Epoch: 12   Global Step: 129230   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:07:06,047-Speed 5342.67 samples/sec   Loss 4.1758   LearningRate 0.0472   Epoch: 12   Global Step: 129240   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:07:13,537-Speed 5468.86 samples/sec   Loss 4.1852   LearningRate 0.0472   Epoch: 12   Global Step: 129250   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:07:20,948-Speed 5528.31 samples/sec   Loss 4.2074   LearningRate 0.0472   Epoch: 12   Global Step: 129260   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:07:28,448-Speed 5461.64 samples/sec   Loss 4.2131   LearningRate 0.0472   Epoch: 12   Global Step: 129270   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:07:35,970-Speed 5446.00 samples/sec   Loss 4.1513   LearningRate 0.0471   Epoch: 12   Global Step: 129280   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:07:43,556-Speed 5400.27 samples/sec   Loss 4.1953   LearningRate 0.0471   Epoch: 12   Global Step: 129290   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:07:51,126-Speed 5412.24 samples/sec   Loss 4.1797   LearningRate 0.0471   Epoch: 12   Global Step: 129300   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:07:58,641-Speed 5451.29 samples/sec   Loss 4.2183   LearningRate 0.0471   Epoch: 12   Global Step: 129310   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:08:06,120-Speed 5476.95 samples/sec   Loss 4.2457   LearningRate 0.0471   Epoch: 12   Global Step: 129320   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:08:13,651-Speed 5439.45 samples/sec   Loss 4.1560   LearningRate 0.0471   Epoch: 12   Global Step: 129330   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:08:21,100-Speed 5499.86 samples/sec   Loss 4.1993   LearningRate 0.0471   Epoch: 12   Global Step: 129340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:08:28,600-Speed 5461.41 samples/sec   Loss 4.1741   LearningRate 0.0471   Epoch: 12   Global Step: 129350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:08:36,080-Speed 5476.74 samples/sec   Loss 4.1648   LearningRate 0.0470   Epoch: 12   Global Step: 129360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:08:43,536-Speed 5494.52 samples/sec   Loss 4.2128   LearningRate 0.0470   Epoch: 12   Global Step: 129370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:08:51,123-Speed 5399.18 samples/sec   Loss 4.1618   LearningRate 0.0470   Epoch: 12   Global Step: 129380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:08:58,743-Speed 5376.72 samples/sec   Loss 4.2199   LearningRate 0.0470   Epoch: 12   Global Step: 129390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:09:06,291-Speed 5426.82 samples/sec   Loss 4.1730   LearningRate 0.0470   Epoch: 12   Global Step: 129400   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:09:13,790-Speed 5462.49 samples/sec   Loss 4.2190   LearningRate 0.0470   Epoch: 12   Global Step: 129410   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:09:21,312-Speed 5445.93 samples/sec   Loss 4.1833   LearningRate 0.0470   Epoch: 12   Global Step: 129420   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:09:28,927-Speed 5380.36 samples/sec   Loss 4.2524   LearningRate 0.0470   Epoch: 12   Global Step: 129430   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:09:36,455-Speed 5441.27 samples/sec   Loss 4.2249   LearningRate 0.0470   Epoch: 12   Global Step: 129440   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:09:44,016-Speed 5417.65 samples/sec   Loss 4.2031   LearningRate 0.0469   Epoch: 12   Global Step: 129450   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:09:51,689-Speed 5339.16 samples/sec   Loss 4.2148   LearningRate 0.0469   Epoch: 12   Global Step: 129460   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:09:59,129-Speed 5506.30 samples/sec   Loss 4.1931   LearningRate 0.0469   Epoch: 12   Global Step: 129470   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:10:06,599-Speed 5483.52 samples/sec   Loss 4.1698   LearningRate 0.0469   Epoch: 12   Global Step: 129480   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:10:14,094-Speed 5466.15 samples/sec   Loss 4.2150   LearningRate 0.0469   Epoch: 12   Global Step: 129490   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:10:21,549-Speed 5495.02 samples/sec   Loss 4.1608   LearningRate 0.0469   Epoch: 12   Global Step: 129500   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:10:29,000-Speed 5497.60 samples/sec   Loss 4.2178   LearningRate 0.0469   Epoch: 12   Global Step: 129510   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:10:36,481-Speed 5476.24 samples/sec   Loss 4.1792   LearningRate 0.0469   Epoch: 12   Global Step: 129520   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:10:44,102-Speed 5374.93 samples/sec   Loss 4.2170   LearningRate 0.0468   Epoch: 12   Global Step: 129530   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:10:51,633-Speed 5439.70 samples/sec   Loss 4.1657   LearningRate 0.0468   Epoch: 12   Global Step: 129540   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:10:59,162-Speed 5441.07 samples/sec   Loss 4.2486   LearningRate 0.0468   Epoch: 12   Global Step: 129550   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:11:06,733-Speed 5410.72 samples/sec   Loss 4.1582   LearningRate 0.0468   Epoch: 12   Global Step: 129560   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:11:14,143-Speed 5528.79 samples/sec   Loss 4.1867   LearningRate 0.0468   Epoch: 12   Global Step: 129570   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:11:21,600-Speed 5492.66 samples/sec   Loss 4.2168   LearningRate 0.0468   Epoch: 12   Global Step: 129580   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:11:29,089-Speed 5471.13 samples/sec   Loss 4.1983   LearningRate 0.0468   Epoch: 12   Global Step: 129590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:11:36,570-Speed 5475.47 samples/sec   Loss 4.2064   LearningRate 0.0468   Epoch: 12   Global Step: 129600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:11:44,004-Speed 5510.39 samples/sec   Loss 4.1849   LearningRate 0.0467   Epoch: 12   Global Step: 129610   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:11:51,646-Speed 5360.90 samples/sec   Loss 4.1962   LearningRate 0.0467   Epoch: 12   Global Step: 129620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:11:59,064-Speed 5522.28 samples/sec   Loss 4.1969   LearningRate 0.0467   Epoch: 12   Global Step: 129630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:12:06,480-Speed 5524.11 samples/sec   Loss 4.2159   LearningRate 0.0467   Epoch: 12   Global Step: 129640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:12:13,942-Speed 5489.73 samples/sec   Loss 4.2573   LearningRate 0.0467   Epoch: 12   Global Step: 129650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:12:21,404-Speed 5489.98 samples/sec   Loss 4.1852   LearningRate 0.0467   Epoch: 12   Global Step: 129660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:12:28,834-Speed 5513.23 samples/sec   Loss 4.1708   LearningRate 0.0467   Epoch: 12   Global Step: 129670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:12:36,255-Speed 5520.47 samples/sec   Loss 4.1902   LearningRate 0.0467   Epoch: 12   Global Step: 129680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:12:43,676-Speed 5520.51 samples/sec   Loss 4.1331   LearningRate 0.0467   Epoch: 12   Global Step: 129690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:12:51,185-Speed 5455.13 samples/sec   Loss 4.1717   LearningRate 0.0466   Epoch: 12   Global Step: 129700   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:12:58,661-Speed 5479.50 samples/sec   Loss 4.1524   LearningRate 0.0466   Epoch: 12   Global Step: 129710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:13:06,191-Speed 5440.80 samples/sec   Loss 4.1594   LearningRate 0.0466   Epoch: 12   Global Step: 129720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:13:13,666-Speed 5480.51 samples/sec   Loss 4.1857   LearningRate 0.0466   Epoch: 12   Global Step: 129730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:13:21,087-Speed 5520.18 samples/sec   Loss 4.1780   LearningRate 0.0466   Epoch: 12   Global Step: 129740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:13:28,583-Speed 5464.83 samples/sec   Loss 4.1921   LearningRate 0.0466   Epoch: 12   Global Step: 129750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:13:36,129-Speed 5429.02 samples/sec   Loss 4.1780   LearningRate 0.0466   Epoch: 12   Global Step: 129760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:13:43,642-Speed 5452.36 samples/sec   Loss 4.1677   LearningRate 0.0466   Epoch: 12   Global Step: 129770   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-09 00:13:51,284-Speed 5360.54 samples/sec   Loss 4.2015   LearningRate 0.0465   Epoch: 12   Global Step: 129780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:13:58,737-Speed 5496.36 samples/sec   Loss 4.1457   LearningRate 0.0465   Epoch: 12   Global Step: 129790   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:14:06,247-Speed 5454.94 samples/sec   Loss 4.1865   LearningRate 0.0465   Epoch: 12   Global Step: 129800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:14:13,851-Speed 5387.39 samples/sec   Loss 4.1994   LearningRate 0.0465   Epoch: 12   Global Step: 129810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:14:21,405-Speed 5423.44 samples/sec   Loss 4.1851   LearningRate 0.0465   Epoch: 12   Global Step: 129820   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:14:28,880-Speed 5479.64 samples/sec   Loss 4.1367   LearningRate 0.0465   Epoch: 12   Global Step: 129830   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:14:36,451-Speed 5411.19 samples/sec   Loss 4.1884   LearningRate 0.0465   Epoch: 12   Global Step: 129840   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:14:43,992-Speed 5432.67 samples/sec   Loss 4.1884   LearningRate 0.0465   Epoch: 12   Global Step: 129850   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:14:51,488-Speed 5464.50 samples/sec   Loss 4.1315   LearningRate 0.0464   Epoch: 12   Global Step: 129860   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:14:58,979-Speed 5468.35 samples/sec   Loss 4.1678   LearningRate 0.0464   Epoch: 12   Global Step: 129870   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:15:06,553-Speed 5409.04 samples/sec   Loss 4.1425   LearningRate 0.0464   Epoch: 12   Global Step: 129880   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:15:14,065-Speed 5453.48 samples/sec   Loss 4.1518   LearningRate 0.0464   Epoch: 12   Global Step: 129890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:15:21,528-Speed 5488.99 samples/sec   Loss 4.2375   LearningRate 0.0464   Epoch: 12   Global Step: 129900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:15:28,977-Speed 5499.37 samples/sec   Loss 4.1748   LearningRate 0.0464   Epoch: 12   Global Step: 129910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:15:36,499-Speed 5446.57 samples/sec   Loss 4.1474   LearningRate 0.0464   Epoch: 12   Global Step: 129920   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:15:43,984-Speed 5472.80 samples/sec   Loss 4.1904   LearningRate 0.0464   Epoch: 12   Global Step: 129930   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:15:51,471-Speed 5471.39 samples/sec   Loss 4.1680   LearningRate 0.0464   Epoch: 12   Global Step: 129940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:15:59,066-Speed 5393.46 samples/sec   Loss 4.1841   LearningRate 0.0463   Epoch: 12   Global Step: 129950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:16:06,613-Speed 5428.63 samples/sec   Loss 4.1937   LearningRate 0.0463   Epoch: 12   Global Step: 129960   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:16:14,075-Speed 5489.81 samples/sec   Loss 4.1614   LearningRate 0.0463   Epoch: 12   Global Step: 129970   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:16:21,490-Speed 5524.64 samples/sec   Loss 4.1907   LearningRate 0.0463   Epoch: 12   Global Step: 129980   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:16:28,953-Speed 5489.21 samples/sec   Loss 4.1533   LearningRate 0.0463   Epoch: 12   Global Step: 129990   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:16:36,486-Speed 5438.00 samples/sec   Loss 4.1715   LearningRate 0.0463   Epoch: 12   Global Step: 130000   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:17:20,781-[lfw][130000]XNorm: 23.324596
Training: 2022-01-09 00:17:20,782-[lfw][130000]Accuracy-Flip: 0.99800+-0.00287
Training: 2022-01-09 00:17:20,782-[lfw][130000]Accuracy-Highest: 0.99817
Training: 2022-01-09 00:18:12,282-[cfp_fp][130000]XNorm: 21.926012
Training: 2022-01-09 00:18:12,283-[cfp_fp][130000]Accuracy-Flip: 0.98914+-0.00420
Training: 2022-01-09 00:18:12,283-[cfp_fp][130000]Accuracy-Highest: 0.99157
Training: 2022-01-09 00:18:56,575-[agedb_30][130000]XNorm: 23.240002
Training: 2022-01-09 00:18:56,576-[agedb_30][130000]Accuracy-Flip: 0.97833+-0.00703
Training: 2022-01-09 00:18:56,577-[agedb_30][130000]Accuracy-Highest: 0.98000
Training: 2022-01-09 00:19:04,167-Speed 277.36 samples/sec   Loss 4.1150   LearningRate 0.0463   Epoch: 12   Global Step: 130010   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:19:11,637-Speed 5484.33 samples/sec   Loss 4.1609   LearningRate 0.0463   Epoch: 12   Global Step: 130020   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:19:19,153-Speed 5450.32 samples/sec   Loss 4.1701   LearningRate 0.0462   Epoch: 12   Global Step: 130030   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:19:26,659-Speed 5457.50 samples/sec   Loss 4.1989   LearningRate 0.0462   Epoch: 12   Global Step: 130040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:19:34,245-Speed 5399.92 samples/sec   Loss 4.1574   LearningRate 0.0462   Epoch: 12   Global Step: 130050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:19:41,849-Speed 5387.83 samples/sec   Loss 4.2265   LearningRate 0.0462   Epoch: 12   Global Step: 130060   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:19:49,446-Speed 5391.72 samples/sec   Loss 4.1613   LearningRate 0.0462   Epoch: 12   Global Step: 130070   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:19:56,905-Speed 5492.19 samples/sec   Loss 4.1974   LearningRate 0.0462   Epoch: 12   Global Step: 130080   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:20:04,379-Speed 5480.64 samples/sec   Loss 4.1668   LearningRate 0.0462   Epoch: 12   Global Step: 130090   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:20:11,934-Speed 5423.14 samples/sec   Loss 4.1612   LearningRate 0.0462   Epoch: 12   Global Step: 130100   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:20:19,432-Speed 5463.74 samples/sec   Loss 4.1581   LearningRate 0.0461   Epoch: 12   Global Step: 130110   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:20:27,029-Speed 5392.04 samples/sec   Loss 4.1751   LearningRate 0.0461   Epoch: 12   Global Step: 130120   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:20:34,584-Speed 5421.66 samples/sec   Loss 4.1487   LearningRate 0.0461   Epoch: 12   Global Step: 130130   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:20:42,140-Speed 5422.11 samples/sec   Loss 4.1610   LearningRate 0.0461   Epoch: 12   Global Step: 130140   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-09 00:20:49,632-Speed 5467.80 samples/sec   Loss 4.1545   LearningRate 0.0461   Epoch: 12   Global Step: 130150   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:20:57,159-Speed 5442.42 samples/sec   Loss 4.1779   LearningRate 0.0461   Epoch: 12   Global Step: 130160   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:21:04,678-Speed 5448.24 samples/sec   Loss 4.1965   LearningRate 0.0461   Epoch: 12   Global Step: 130170   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:21:12,197-Speed 5448.58 samples/sec   Loss 4.1820   LearningRate 0.0461   Epoch: 12   Global Step: 130180   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:21:19,715-Speed 5448.75 samples/sec   Loss 4.1465   LearningRate 0.0461   Epoch: 12   Global Step: 130190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:21:27,180-Speed 5488.00 samples/sec   Loss 4.2097   LearningRate 0.0460   Epoch: 12   Global Step: 130200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:21:34,670-Speed 5468.75 samples/sec   Loss 4.1553   LearningRate 0.0460   Epoch: 12   Global Step: 130210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:21:42,081-Speed 5527.85 samples/sec   Loss 4.1559   LearningRate 0.0460   Epoch: 12   Global Step: 130220   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:21:49,555-Speed 5481.39 samples/sec   Loss 4.1768   LearningRate 0.0460   Epoch: 12   Global Step: 130230   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:21:56,962-Speed 5530.08 samples/sec   Loss 4.1839   LearningRate 0.0460   Epoch: 12   Global Step: 130240   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:22:04,437-Speed 5480.87 samples/sec   Loss 4.1247   LearningRate 0.0460   Epoch: 12   Global Step: 130250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-09 00:22:11,978-Speed 5432.55 samples/sec   Loss 4.1711   LearningRate 0.0460   Epoch: 12   Global Step: 130260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-09 00:22:19,448-Speed 5483.40 samples/sec   Loss 4.1627   LearningRate 0.0460   Epoch: 12   Global Step: 130270   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:22:26,938-Speed 5469.68 samples/sec   Loss 4.1745   LearningRate 0.0459   Epoch: 12   Global Step: 130280   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:22:34,507-Speed 5411.93 samples/sec   Loss 4.1338   LearningRate 0.0459   Epoch: 12   Global Step: 130290   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:22:42,034-Speed 5442.73 samples/sec   Loss 4.1567   LearningRate 0.0459   Epoch: 12   Global Step: 130300   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:22:49,613-Speed 5405.20 samples/sec   Loss 4.1257   LearningRate 0.0459   Epoch: 12   Global Step: 130310   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:22:57,201-Speed 5398.61 samples/sec   Loss 4.1456   LearningRate 0.0459   Epoch: 12   Global Step: 130320   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:23:04,794-Speed 5395.01 samples/sec   Loss 4.1461   LearningRate 0.0459   Epoch: 12   Global Step: 130330   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:23:12,295-Speed 5461.39 samples/sec   Loss 4.1741   LearningRate 0.0459   Epoch: 12   Global Step: 130340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:23:19,791-Speed 5464.81 samples/sec   Loss 4.1518   LearningRate 0.0459   Epoch: 12   Global Step: 130350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:23:27,315-Speed 5444.67 samples/sec   Loss 4.1308   LearningRate 0.0459   Epoch: 12   Global Step: 130360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:23:34,765-Speed 5499.07 samples/sec   Loss 4.1370   LearningRate 0.0458   Epoch: 12   Global Step: 130370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:23:42,274-Speed 5455.49 samples/sec   Loss 4.1245   LearningRate 0.0458   Epoch: 12   Global Step: 130380   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:23:49,826-Speed 5424.30 samples/sec   Loss 4.1715   LearningRate 0.0458   Epoch: 12   Global Step: 130390   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:23:57,448-Speed 5374.79 samples/sec   Loss 4.1778   LearningRate 0.0458   Epoch: 12   Global Step: 130400   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:24:04,923-Speed 5479.92 samples/sec   Loss 4.1944   LearningRate 0.0458   Epoch: 12   Global Step: 130410   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:24:12,475-Speed 5424.58 samples/sec   Loss 4.1756   LearningRate 0.0458   Epoch: 12   Global Step: 130420   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:24:20,158-Speed 5331.76 samples/sec   Loss 4.1597   LearningRate 0.0458   Epoch: 12   Global Step: 130430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:24:27,706-Speed 5427.78 samples/sec   Loss 4.1251   LearningRate 0.0458   Epoch: 12   Global Step: 130440   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:24:35,293-Speed 5398.82 samples/sec   Loss 4.1717   LearningRate 0.0457   Epoch: 12   Global Step: 130450   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:24:42,924-Speed 5368.77 samples/sec   Loss 4.1235   LearningRate 0.0457   Epoch: 12   Global Step: 130460   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:24:50,481-Speed 5420.98 samples/sec   Loss 4.1752   LearningRate 0.0457   Epoch: 12   Global Step: 130470   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:24:58,057-Speed 5407.54 samples/sec   Loss 4.1505   LearningRate 0.0457   Epoch: 12   Global Step: 130480   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:25:05,655-Speed 5390.90 samples/sec   Loss 4.1321   LearningRate 0.0457   Epoch: 12   Global Step: 130490   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:25:13,254-Speed 5390.82 samples/sec   Loss 4.1439   LearningRate 0.0457   Epoch: 12   Global Step: 130500   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:25:20,788-Speed 5438.58 samples/sec   Loss 4.1205   LearningRate 0.0457   Epoch: 12   Global Step: 130510   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:25:28,321-Speed 5438.30 samples/sec   Loss 4.1289   LearningRate 0.0457   Epoch: 12   Global Step: 130520   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:25:35,886-Speed 5414.64 samples/sec   Loss 4.1609   LearningRate 0.0456   Epoch: 12   Global Step: 130530   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:25:43,393-Speed 5457.17 samples/sec   Loss 4.2003   LearningRate 0.0456   Epoch: 12   Global Step: 130540   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:25:50,978-Speed 5400.64 samples/sec   Loss 4.1387   LearningRate 0.0456   Epoch: 12   Global Step: 130550   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:25:58,466-Speed 5470.85 samples/sec   Loss 4.1759   LearningRate 0.0456   Epoch: 12   Global Step: 130560   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:26:06,159-Speed 5324.89 samples/sec   Loss 4.1146   LearningRate 0.0456   Epoch: 12   Global Step: 130570   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:26:13,723-Speed 5415.81 samples/sec   Loss 4.1373   LearningRate 0.0456   Epoch: 12   Global Step: 130580   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:26:21,268-Speed 5429.29 samples/sec   Loss 4.0956   LearningRate 0.0456   Epoch: 12   Global Step: 130590   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:26:28,985-Speed 5309.15 samples/sec   Loss 4.1440   LearningRate 0.0456   Epoch: 12   Global Step: 130600   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:26:36,605-Speed 5375.28 samples/sec   Loss 4.1489   LearningRate 0.0456   Epoch: 12   Global Step: 130610   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:26:44,122-Speed 5450.05 samples/sec   Loss 4.1725   LearningRate 0.0455   Epoch: 12   Global Step: 130620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:26:51,657-Speed 5436.56 samples/sec   Loss 4.0923   LearningRate 0.0455   Epoch: 12   Global Step: 130630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:26:59,305-Speed 5356.58 samples/sec   Loss 4.1105   LearningRate 0.0455   Epoch: 12   Global Step: 130640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:27:07,014-Speed 5313.64 samples/sec   Loss 4.1248   LearningRate 0.0455   Epoch: 12   Global Step: 130650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:27:14,546-Speed 5438.78 samples/sec   Loss 4.1739   LearningRate 0.0455   Epoch: 12   Global Step: 130660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:27:22,095-Speed 5426.44 samples/sec   Loss 4.1222   LearningRate 0.0455   Epoch: 12   Global Step: 130670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:27:29,583-Speed 5471.07 samples/sec   Loss 4.1581   LearningRate 0.0455   Epoch: 12   Global Step: 130680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:27:37,097-Speed 5451.87 samples/sec   Loss 4.1414   LearningRate 0.0455   Epoch: 12   Global Step: 130690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:27:44,603-Speed 5457.75 samples/sec   Loss 4.1382   LearningRate 0.0454   Epoch: 12   Global Step: 130700   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:27:52,313-Speed 5313.45 samples/sec   Loss 4.1736   LearningRate 0.0454   Epoch: 12   Global Step: 130710   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-09 00:27:59,822-Speed 5455.21 samples/sec   Loss 4.1370   LearningRate 0.0454   Epoch: 12   Global Step: 130720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:28:07,323-Speed 5461.19 samples/sec   Loss 4.1827   LearningRate 0.0454   Epoch: 12   Global Step: 130730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:28:14,859-Speed 5436.66 samples/sec   Loss 4.1427   LearningRate 0.0454   Epoch: 12   Global Step: 130740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:28:22,348-Speed 5470.30 samples/sec   Loss 4.1131   LearningRate 0.0454   Epoch: 12   Global Step: 130750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:28:30,036-Speed 5328.43 samples/sec   Loss 4.1198   LearningRate 0.0454   Epoch: 12   Global Step: 130760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:28:37,672-Speed 5364.90 samples/sec   Loss 4.1142   LearningRate 0.0454   Epoch: 12   Global Step: 130770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:28:45,249-Speed 5406.08 samples/sec   Loss 4.1313   LearningRate 0.0454   Epoch: 12   Global Step: 130780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:28:52,771-Speed 5446.49 samples/sec   Loss 4.1299   LearningRate 0.0453   Epoch: 12   Global Step: 130790   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:29:00,306-Speed 5436.95 samples/sec   Loss 4.1392   LearningRate 0.0453   Epoch: 12   Global Step: 130800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:29:07,857-Speed 5424.75 samples/sec   Loss 4.1612   LearningRate 0.0453   Epoch: 12   Global Step: 130810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:29:15,422-Speed 5415.29 samples/sec   Loss 4.0909   LearningRate 0.0453   Epoch: 12   Global Step: 130820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-09 00:29:22,892-Speed 5484.23 samples/sec   Loss 4.0808   LearningRate 0.0453   Epoch: 12   Global Step: 130830   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:29:30,535-Speed 5359.93 samples/sec   Loss 4.0839   LearningRate 0.0453   Epoch: 12   Global Step: 130840   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:29:37,988-Speed 5496.60 samples/sec   Loss 4.0747   LearningRate 0.0453   Epoch: 12   Global Step: 130850   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:29:45,639-Speed 5353.89 samples/sec   Loss 4.0907   LearningRate 0.0453   Epoch: 12   Global Step: 130860   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:29:53,332-Speed 5325.51 samples/sec   Loss 4.1411   LearningRate 0.0452   Epoch: 12   Global Step: 130870   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:30:00,904-Speed 5410.60 samples/sec   Loss 4.1521   LearningRate 0.0452   Epoch: 12   Global Step: 130880   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:30:08,544-Speed 5361.50 samples/sec   Loss 4.1259   LearningRate 0.0452   Epoch: 12   Global Step: 130890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:30:16,068-Speed 5444.80 samples/sec   Loss 4.0994   LearningRate 0.0452   Epoch: 12   Global Step: 130900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:30:23,525-Speed 5493.55 samples/sec   Loss 4.1349   LearningRate 0.0452   Epoch: 12   Global Step: 130910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:30:30,962-Speed 5508.76 samples/sec   Loss 4.1077   LearningRate 0.0452   Epoch: 12   Global Step: 130920   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:30:38,511-Speed 5426.49 samples/sec   Loss 4.1133   LearningRate 0.0452   Epoch: 12   Global Step: 130930   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:30:46,041-Speed 5440.32 samples/sec   Loss 4.1410   LearningRate 0.0452   Epoch: 12   Global Step: 130940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-09 00:30:53,544-Speed 5459.39 samples/sec   Loss 4.0917   LearningRate 0.0452   Epoch: 12   Global Step: 130950   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:31:01,012-Speed 5485.77 samples/sec   Loss 4.1063   LearningRate 0.0451   Epoch: 12   Global Step: 130960   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:31:08,579-Speed 5413.41 samples/sec   Loss 4.1048   LearningRate 0.0451   Epoch: 12   Global Step: 130970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-09 00:31:16,167-Speed 5398.87 samples/sec   Loss 4.0951   LearningRate 0.0451   Epoch: 12   Global Step: 130980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:31:23,657-Speed 5469.49 samples/sec   Loss 4.1057   LearningRate 0.0451   Epoch: 12   Global Step: 130990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:31:31,272-Speed 5379.67 samples/sec   Loss 4.1238   LearningRate 0.0451   Epoch: 12   Global Step: 131000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:31:38,845-Speed 5409.48 samples/sec   Loss 4.1033   LearningRate 0.0451   Epoch: 12   Global Step: 131010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:31:46,419-Speed 5408.68 samples/sec   Loss 4.1126   LearningRate 0.0451   Epoch: 12   Global Step: 131020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:31:54,122-Speed 5318.05 samples/sec   Loss 4.0621   LearningRate 0.0451   Epoch: 12   Global Step: 131030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:32:01,715-Speed 5395.37 samples/sec   Loss 4.1464   LearningRate 0.0450   Epoch: 12   Global Step: 131040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:32:09,244-Speed 5440.38 samples/sec   Loss 4.1061   LearningRate 0.0450   Epoch: 12   Global Step: 131050   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:32:16,718-Speed 5481.35 samples/sec   Loss 4.1063   LearningRate 0.0450   Epoch: 12   Global Step: 131060   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:32:24,214-Speed 5464.98 samples/sec   Loss 4.1007   LearningRate 0.0450   Epoch: 12   Global Step: 131070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:32:31,691-Speed 5478.67 samples/sec   Loss 4.0737   LearningRate 0.0450   Epoch: 12   Global Step: 131080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:32:39,171-Speed 5476.96 samples/sec   Loss 4.0854   LearningRate 0.0450   Epoch: 12   Global Step: 131090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:32:46,792-Speed 5374.74 samples/sec   Loss 4.1046   LearningRate 0.0450   Epoch: 12   Global Step: 131100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:32:54,305-Speed 5453.37 samples/sec   Loss 4.0984   LearningRate 0.0450   Epoch: 12   Global Step: 131110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:33:01,838-Speed 5437.75 samples/sec   Loss 4.0607   LearningRate 0.0450   Epoch: 12   Global Step: 131120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:33:09,401-Speed 5416.50 samples/sec   Loss 4.1372   LearningRate 0.0449   Epoch: 12   Global Step: 131130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:33:16,882-Speed 5476.13 samples/sec   Loss 4.1374   LearningRate 0.0449   Epoch: 12   Global Step: 131140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:33:24,335-Speed 5496.09 samples/sec   Loss 4.0810   LearningRate 0.0449   Epoch: 12   Global Step: 131150   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-09 00:33:31,918-Speed 5402.13 samples/sec   Loss 4.1336   LearningRate 0.0449   Epoch: 12   Global Step: 131160   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-09 00:33:39,451-Speed 5438.37 samples/sec   Loss 4.0752   LearningRate 0.0449   Epoch: 12   Global Step: 131170   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-09 00:33:46,978-Speed 5442.52 samples/sec   Loss 4.1445   LearningRate 0.0449   Epoch: 12   Global Step: 131180   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-09 00:33:54,488-Speed 5455.08 samples/sec   Loss 4.0596   LearningRate 0.0449   Epoch: 12   Global Step: 131190   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-09 00:34:02,053-Speed 5414.84 samples/sec   Loss 4.0922   LearningRate 0.0449   Epoch: 12   Global Step: 131200   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-09 00:34:09,607-Speed 5423.15 samples/sec   Loss 4.1295   LearningRate 0.0448   Epoch: 12   Global Step: 131210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:34:17,201-Speed 5393.90 samples/sec   Loss 4.1709   LearningRate 0.0448   Epoch: 12   Global Step: 131220   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:34:24,772-Speed 5411.16 samples/sec   Loss 4.0999   LearningRate 0.0448   Epoch: 12   Global Step: 131230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:34:32,308-Speed 5436.04 samples/sec   Loss 4.0710   LearningRate 0.0448   Epoch: 12   Global Step: 131240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:34:39,866-Speed 5420.15 samples/sec   Loss 4.1242   LearningRate 0.0448   Epoch: 12   Global Step: 131250   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:34:47,451-Speed 5400.64 samples/sec   Loss 4.1029   LearningRate 0.0448   Epoch: 12   Global Step: 131260   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:34:54,979-Speed 5442.06 samples/sec   Loss 4.0874   LearningRate 0.0448   Epoch: 12   Global Step: 131270   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:35:02,494-Speed 5450.97 samples/sec   Loss 4.1265   LearningRate 0.0448   Epoch: 12   Global Step: 131280   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:35:10,076-Speed 5402.61 samples/sec   Loss 4.0552   LearningRate 0.0448   Epoch: 12   Global Step: 131290   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:35:17,603-Speed 5442.44 samples/sec   Loss 4.0900   LearningRate 0.0447   Epoch: 12   Global Step: 131300   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:35:25,172-Speed 5412.21 samples/sec   Loss 4.0987   LearningRate 0.0447   Epoch: 12   Global Step: 131310   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:35:32,706-Speed 5437.67 samples/sec   Loss 4.0707   LearningRate 0.0447   Epoch: 12   Global Step: 131320   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:35:40,278-Speed 5410.03 samples/sec   Loss 4.0141   LearningRate 0.0447   Epoch: 12   Global Step: 131330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:35:47,802-Speed 5444.27 samples/sec   Loss 4.1368   LearningRate 0.0447   Epoch: 12   Global Step: 131340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:35:55,313-Speed 5454.37 samples/sec   Loss 4.1201   LearningRate 0.0447   Epoch: 12   Global Step: 131350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:36:02,937-Speed 5373.28 samples/sec   Loss 4.0701   LearningRate 0.0447   Epoch: 12   Global Step: 131360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:36:10,533-Speed 5392.93 samples/sec   Loss 4.1048   LearningRate 0.0447   Epoch: 12   Global Step: 131370   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:36:18,051-Speed 5448.37 samples/sec   Loss 4.0609   LearningRate 0.0446   Epoch: 12   Global Step: 131380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:36:25,727-Speed 5337.45 samples/sec   Loss 4.1031   LearningRate 0.0446   Epoch: 12   Global Step: 131390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:36:33,314-Speed 5398.89 samples/sec   Loss 4.0687   LearningRate 0.0446   Epoch: 12   Global Step: 131400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:36:40,952-Speed 5363.57 samples/sec   Loss 4.0583   LearningRate 0.0446   Epoch: 12   Global Step: 131410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:36:48,474-Speed 5446.12 samples/sec   Loss 4.1312   LearningRate 0.0446   Epoch: 12   Global Step: 131420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:36:56,012-Speed 5434.33 samples/sec   Loss 4.1061   LearningRate 0.0446   Epoch: 12   Global Step: 131430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:37:03,531-Speed 5448.08 samples/sec   Loss 4.0718   LearningRate 0.0446   Epoch: 12   Global Step: 131440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:37:11,082-Speed 5425.38 samples/sec   Loss 4.0855   LearningRate 0.0446   Epoch: 12   Global Step: 131450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:37:18,698-Speed 5378.36 samples/sec   Loss 4.1415   LearningRate 0.0446   Epoch: 12   Global Step: 131460   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:37:26,347-Speed 5356.41 samples/sec   Loss 4.0485   LearningRate 0.0445   Epoch: 12   Global Step: 131470   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:37:33,965-Speed 5376.86 samples/sec   Loss 4.0846   LearningRate 0.0445   Epoch: 12   Global Step: 131480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-09 00:37:41,463-Speed 5463.61 samples/sec   Loss 4.0829   LearningRate 0.0445   Epoch: 12   Global Step: 131490   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:37:48,961-Speed 5463.24 samples/sec   Loss 4.0490   LearningRate 0.0445   Epoch: 12   Global Step: 131500   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:37:56,488-Speed 5443.05 samples/sec   Loss 4.0706   LearningRate 0.0445   Epoch: 12   Global Step: 131510   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:38:04,091-Speed 5388.34 samples/sec   Loss 4.0771   LearningRate 0.0445   Epoch: 12   Global Step: 131520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:38:11,521-Speed 5512.94 samples/sec   Loss 4.0918   LearningRate 0.0445   Epoch: 12   Global Step: 131530   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:38:19,023-Speed 5460.46 samples/sec   Loss 4.0779   LearningRate 0.0445   Epoch: 12   Global Step: 131540   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:38:26,509-Speed 5472.80 samples/sec   Loss 4.0747   LearningRate 0.0444   Epoch: 12   Global Step: 131550   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:38:34,055-Speed 5428.49 samples/sec   Loss 4.0614   LearningRate 0.0444   Epoch: 12   Global Step: 131560   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:38:41,594-Speed 5434.17 samples/sec   Loss 4.1162   LearningRate 0.0444   Epoch: 12   Global Step: 131570   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:38:49,055-Speed 5490.28 samples/sec   Loss 4.1203   LearningRate 0.0444   Epoch: 12   Global Step: 131580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:38:56,491-Speed 5509.31 samples/sec   Loss 4.1070   LearningRate 0.0444   Epoch: 12   Global Step: 131590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:39:04,048-Speed 5420.73 samples/sec   Loss 4.0499   LearningRate 0.0444   Epoch: 12   Global Step: 131600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:39:11,501-Speed 5496.39 samples/sec   Loss 4.0267   LearningRate 0.0444   Epoch: 12   Global Step: 131610   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:39:19,006-Speed 5458.39 samples/sec   Loss 4.0899   LearningRate 0.0444   Epoch: 12   Global Step: 131620   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:39:26,536-Speed 5440.57 samples/sec   Loss 4.0915   LearningRate 0.0444   Epoch: 12   Global Step: 131630   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:39:34,058-Speed 5446.47 samples/sec   Loss 4.0571   LearningRate 0.0443   Epoch: 12   Global Step: 131640   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:39:41,517-Speed 5491.76 samples/sec   Loss 4.1019   LearningRate 0.0443   Epoch: 12   Global Step: 131650   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:39:49,051-Speed 5437.25 samples/sec   Loss 4.0819   LearningRate 0.0443   Epoch: 12   Global Step: 131660   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:39:56,593-Speed 5431.93 samples/sec   Loss 4.0659   LearningRate 0.0443   Epoch: 12   Global Step: 131670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:40:04,269-Speed 5336.90 samples/sec   Loss 4.0906   LearningRate 0.0443   Epoch: 12   Global Step: 131680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:40:11,810-Speed 5432.30 samples/sec   Loss 4.0166   LearningRate 0.0443   Epoch: 12   Global Step: 131690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:40:19,318-Speed 5456.43 samples/sec   Loss 4.0766   LearningRate 0.0443   Epoch: 12   Global Step: 131700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:40:26,822-Speed 5459.09 samples/sec   Loss 4.0760   LearningRate 0.0443   Epoch: 12   Global Step: 131710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:40:34,322-Speed 5461.72 samples/sec   Loss 4.0554   LearningRate 0.0442   Epoch: 12   Global Step: 131720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:40:41,833-Speed 5454.11 samples/sec   Loss 4.0688   LearningRate 0.0442   Epoch: 12   Global Step: 131730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:40:49,382-Speed 5426.56 samples/sec   Loss 4.0893   LearningRate 0.0442   Epoch: 12   Global Step: 131740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:40:56,836-Speed 5495.86 samples/sec   Loss 4.0372   LearningRate 0.0442   Epoch: 12   Global Step: 131750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:41:04,483-Speed 5357.16 samples/sec   Loss 4.0986   LearningRate 0.0442   Epoch: 12   Global Step: 131760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:41:11,966-Speed 5474.48 samples/sec   Loss 4.0743   LearningRate 0.0442   Epoch: 12   Global Step: 131770   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:41:19,503-Speed 5435.35 samples/sec   Loss 4.0959   LearningRate 0.0442   Epoch: 12   Global Step: 131780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:41:27,081-Speed 5405.26 samples/sec   Loss 4.0642   LearningRate 0.0442   Epoch: 12   Global Step: 131790   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:41:34,669-Speed 5398.65 samples/sec   Loss 4.0992   LearningRate 0.0442   Epoch: 12   Global Step: 131800   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:41:42,243-Speed 5408.84 samples/sec   Loss 4.0671   LearningRate 0.0441   Epoch: 12   Global Step: 131810   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:41:49,723-Speed 5476.60 samples/sec   Loss 4.0882   LearningRate 0.0441   Epoch: 12   Global Step: 131820   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:41:57,192-Speed 5484.63 samples/sec   Loss 4.0907   LearningRate 0.0441   Epoch: 12   Global Step: 131830   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:42:04,719-Speed 5442.22 samples/sec   Loss 3.9949   LearningRate 0.0441   Epoch: 12   Global Step: 131840   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:42:12,200-Speed 5476.29 samples/sec   Loss 4.0157   LearningRate 0.0441   Epoch: 12   Global Step: 131850   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:42:19,679-Speed 5477.44 samples/sec   Loss 4.0568   LearningRate 0.0441   Epoch: 12   Global Step: 131860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:42:27,218-Speed 5433.63 samples/sec   Loss 4.0440   LearningRate 0.0441   Epoch: 12   Global Step: 131870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:42:34,688-Speed 5483.94 samples/sec   Loss 4.0631   LearningRate 0.0441   Epoch: 12   Global Step: 131880   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:42:42,223-Speed 5436.61 samples/sec   Loss 4.0555   LearningRate 0.0440   Epoch: 12   Global Step: 131890   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:42:49,742-Speed 5448.61 samples/sec   Loss 4.0287   LearningRate 0.0440   Epoch: 12   Global Step: 131900   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:42:57,314-Speed 5409.81 samples/sec   Loss 4.0524   LearningRate 0.0440   Epoch: 12   Global Step: 131910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:43:04,956-Speed 5360.46 samples/sec   Loss 4.0645   LearningRate 0.0440   Epoch: 12   Global Step: 131920   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:43:12,486-Speed 5440.21 samples/sec   Loss 4.1447   LearningRate 0.0440   Epoch: 12   Global Step: 131930   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:43:20,037-Speed 5425.37 samples/sec   Loss 4.0613   LearningRate 0.0440   Epoch: 12   Global Step: 131940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:43:27,650-Speed 5380.91 samples/sec   Loss 4.0958   LearningRate 0.0440   Epoch: 12   Global Step: 131950   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:43:35,237-Speed 5399.71 samples/sec   Loss 4.0567   LearningRate 0.0440   Epoch: 12   Global Step: 131960   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:43:42,926-Speed 5327.88 samples/sec   Loss 4.0659   LearningRate 0.0440   Epoch: 12   Global Step: 131970   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:43:50,503-Speed 5406.56 samples/sec   Loss 4.0749   LearningRate 0.0439   Epoch: 12   Global Step: 131980   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:43:57,988-Speed 5472.67 samples/sec   Loss 4.0723   LearningRate 0.0439   Epoch: 12   Global Step: 131990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:44:05,786-Speed 5253.83 samples/sec   Loss 4.0366   LearningRate 0.0439   Epoch: 12   Global Step: 132000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:44:49,804-[lfw][132000]XNorm: 23.863765
Training: 2022-01-09 00:44:49,805-[lfw][132000]Accuracy-Flip: 0.99750+-0.00291
Training: 2022-01-09 00:44:49,805-[lfw][132000]Accuracy-Highest: 0.99817
Training: 2022-01-09 00:45:41,109-[cfp_fp][132000]XNorm: 22.283643
Training: 2022-01-09 00:45:41,110-[cfp_fp][132000]Accuracy-Flip: 0.99100+-0.00523
Training: 2022-01-09 00:45:41,110-[cfp_fp][132000]Accuracy-Highest: 0.99157
Training: 2022-01-09 00:46:25,456-[agedb_30][132000]XNorm: 23.641398
Training: 2022-01-09 00:46:25,457-[agedb_30][132000]Accuracy-Flip: 0.98067+-0.00742
Training: 2022-01-09 00:46:25,458-[agedb_30][132000]Accuracy-Highest: 0.98067
Training: 2022-01-09 00:46:33,034-Speed 278.17 samples/sec   Loss 4.0845   LearningRate 0.0439   Epoch: 12   Global Step: 132010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:46:40,502-Speed 5484.76 samples/sec   Loss 4.0163   LearningRate 0.0439   Epoch: 12   Global Step: 132020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:46:48,112-Speed 5383.07 samples/sec   Loss 4.0342   LearningRate 0.0439   Epoch: 12   Global Step: 132030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:46:55,614-Speed 5460.98 samples/sec   Loss 4.0280   LearningRate 0.0439   Epoch: 12   Global Step: 132040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:47:03,107-Speed 5467.18 samples/sec   Loss 4.0602   LearningRate 0.0439   Epoch: 12   Global Step: 132050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:47:10,597-Speed 5469.38 samples/sec   Loss 4.0997   LearningRate 0.0438   Epoch: 12   Global Step: 132060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:47:18,085-Speed 5470.51 samples/sec   Loss 3.9804   LearningRate 0.0438   Epoch: 12   Global Step: 132070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:47:25,707-Speed 5374.47 samples/sec   Loss 3.9982   LearningRate 0.0438   Epoch: 12   Global Step: 132080   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:47:33,279-Speed 5410.23 samples/sec   Loss 4.0139   LearningRate 0.0438   Epoch: 12   Global Step: 132090   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:47:40,832-Speed 5423.97 samples/sec   Loss 4.0292   LearningRate 0.0438   Epoch: 12   Global Step: 132100   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:47:48,461-Speed 5369.30 samples/sec   Loss 4.0294   LearningRate 0.0438   Epoch: 12   Global Step: 132110   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:47:55,957-Speed 5465.07 samples/sec   Loss 4.0704   LearningRate 0.0438   Epoch: 12   Global Step: 132120   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:48:03,494-Speed 5435.47 samples/sec   Loss 4.0337   LearningRate 0.0438   Epoch: 12   Global Step: 132130   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:48:11,035-Speed 5432.22 samples/sec   Loss 4.0535   LearningRate 0.0438   Epoch: 12   Global Step: 132140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:48:18,671-Speed 5364.93 samples/sec   Loss 4.0565   LearningRate 0.0437   Epoch: 12   Global Step: 132150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:48:26,151-Speed 5476.76 samples/sec   Loss 4.0650   LearningRate 0.0437   Epoch: 12   Global Step: 132160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:48:33,700-Speed 5426.57 samples/sec   Loss 4.0296   LearningRate 0.0437   Epoch: 12   Global Step: 132170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:48:41,229-Speed 5441.11 samples/sec   Loss 4.0465   LearningRate 0.0437   Epoch: 12   Global Step: 132180   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:48:48,726-Speed 5464.12 samples/sec   Loss 4.0644   LearningRate 0.0437   Epoch: 12   Global Step: 132190   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:48:56,249-Speed 5445.23 samples/sec   Loss 4.0698   LearningRate 0.0437   Epoch: 12   Global Step: 132200   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:49:03,774-Speed 5444.03 samples/sec   Loss 4.0304   LearningRate 0.0437   Epoch: 12   Global Step: 132210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:49:11,266-Speed 5468.65 samples/sec   Loss 4.0288   LearningRate 0.0437   Epoch: 12   Global Step: 132220   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:49:18,898-Speed 5367.38 samples/sec   Loss 4.0344   LearningRate 0.0437   Epoch: 12   Global Step: 132230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:49:26,435-Speed 5435.09 samples/sec   Loss 4.0123   LearningRate 0.0436   Epoch: 12   Global Step: 132240   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-09 00:49:34,062-Speed 5371.18 samples/sec   Loss 4.0394   LearningRate 0.0436   Epoch: 12   Global Step: 132250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-09 00:49:41,780-Speed 5307.70 samples/sec   Loss 4.0495   LearningRate 0.0436   Epoch: 12   Global Step: 132260   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:49:49,532-Speed 5284.44 samples/sec   Loss 4.0333   LearningRate 0.0436   Epoch: 12   Global Step: 132270   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:49:57,089-Speed 5420.67 samples/sec   Loss 4.0347   LearningRate 0.0436   Epoch: 12   Global Step: 132280   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:50:04,729-Speed 5361.73 samples/sec   Loss 4.0360   LearningRate 0.0436   Epoch: 12   Global Step: 132290   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:50:12,173-Speed 5503.79 samples/sec   Loss 4.0513   LearningRate 0.0436   Epoch: 12   Global Step: 132300   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:50:19,734-Speed 5417.67 samples/sec   Loss 4.0386   LearningRate 0.0436   Epoch: 12   Global Step: 132310   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:50:29,541-Speed 4176.93 samples/sec   Loss 4.0809   LearningRate 0.0435   Epoch: 12   Global Step: 132320   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:50:37,175-Speed 5366.84 samples/sec   Loss 4.0570   LearningRate 0.0435   Epoch: 12   Global Step: 132330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:50:44,731-Speed 5421.54 samples/sec   Loss 4.0533   LearningRate 0.0435   Epoch: 12   Global Step: 132340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:50:52,253-Speed 5445.69 samples/sec   Loss 4.0021   LearningRate 0.0435   Epoch: 12   Global Step: 132350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:50:59,762-Speed 5455.09 samples/sec   Loss 4.0101   LearningRate 0.0435   Epoch: 12   Global Step: 132360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:51:07,354-Speed 5396.36 samples/sec   Loss 4.0198   LearningRate 0.0435   Epoch: 12   Global Step: 132370   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:51:14,924-Speed 5411.76 samples/sec   Loss 4.0552   LearningRate 0.0435   Epoch: 12   Global Step: 132380   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:51:22,455-Speed 5439.46 samples/sec   Loss 4.0688   LearningRate 0.0435   Epoch: 12   Global Step: 132390   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:51:30,075-Speed 5376.25 samples/sec   Loss 4.0434   LearningRate 0.0435   Epoch: 12   Global Step: 132400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:51:37,605-Speed 5440.35 samples/sec   Loss 4.0590   LearningRate 0.0434   Epoch: 12   Global Step: 132410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:51:45,147-Speed 5430.93 samples/sec   Loss 4.0220   LearningRate 0.0434   Epoch: 12   Global Step: 132420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:51:52,774-Speed 5371.19 samples/sec   Loss 3.9861   LearningRate 0.0434   Epoch: 12   Global Step: 132430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:52:00,371-Speed 5392.32 samples/sec   Loss 4.0551   LearningRate 0.0434   Epoch: 12   Global Step: 132440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:52:07,879-Speed 5456.34 samples/sec   Loss 4.0585   LearningRate 0.0434   Epoch: 12   Global Step: 132450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:52:15,550-Speed 5340.57 samples/sec   Loss 4.0770   LearningRate 0.0434   Epoch: 12   Global Step: 132460   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:52:23,147-Speed 5392.31 samples/sec   Loss 4.0583   LearningRate 0.0434   Epoch: 12   Global Step: 132470   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:52:30,992-Speed 5221.45 samples/sec   Loss 4.0203   LearningRate 0.0434   Epoch: 12   Global Step: 132480   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:52:38,550-Speed 5420.30 samples/sec   Loss 4.0270   LearningRate 0.0433   Epoch: 12   Global Step: 132490   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:52:46,113-Speed 5416.72 samples/sec   Loss 4.0747   LearningRate 0.0433   Epoch: 12   Global Step: 132500   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-09 00:52:53,693-Speed 5404.84 samples/sec   Loss 3.9836   LearningRate 0.0433   Epoch: 12   Global Step: 132510   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-09 00:53:01,329-Speed 5364.53 samples/sec   Loss 4.0450   LearningRate 0.0433   Epoch: 12   Global Step: 132520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:53:08,939-Speed 5383.72 samples/sec   Loss 4.0634   LearningRate 0.0433   Epoch: 12   Global Step: 132530   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:53:16,507-Speed 5412.90 samples/sec   Loss 4.0324   LearningRate 0.0433   Epoch: 12   Global Step: 132540   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:53:24,050-Speed 5430.51 samples/sec   Loss 4.0656   LearningRate 0.0433   Epoch: 12   Global Step: 132550   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:53:31,637-Speed 5399.50 samples/sec   Loss 4.0088   LearningRate 0.0433   Epoch: 12   Global Step: 132560   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:53:39,301-Speed 5345.04 samples/sec   Loss 3.9835   LearningRate 0.0433   Epoch: 12   Global Step: 132570   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:53:46,827-Speed 5443.74 samples/sec   Loss 4.0636   LearningRate 0.0432   Epoch: 12   Global Step: 132580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:53:54,345-Speed 5448.48 samples/sec   Loss 4.0147   LearningRate 0.0432   Epoch: 12   Global Step: 132590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:54:01,898-Speed 5423.87 samples/sec   Loss 4.0071   LearningRate 0.0432   Epoch: 12   Global Step: 132600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:54:09,470-Speed 5409.81 samples/sec   Loss 3.9989   LearningRate 0.0432   Epoch: 12   Global Step: 132610   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:54:17,050-Speed 5405.27 samples/sec   Loss 4.0005   LearningRate 0.0432   Epoch: 12   Global Step: 132620   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:54:24,549-Speed 5462.04 samples/sec   Loss 4.0082   LearningRate 0.0432   Epoch: 12   Global Step: 132630   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:54:32,052-Speed 5460.07 samples/sec   Loss 4.0385   LearningRate 0.0432   Epoch: 12   Global Step: 132640   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:54:39,925-Speed 5203.75 samples/sec   Loss 4.0482   LearningRate 0.0432   Epoch: 12   Global Step: 132650   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:54:47,560-Speed 5365.03 samples/sec   Loss 4.0271   LearningRate 0.0432   Epoch: 12   Global Step: 132660   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:54:55,100-Speed 5432.87 samples/sec   Loss 4.0400   LearningRate 0.0431   Epoch: 12   Global Step: 132670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:55:02,645-Speed 5429.70 samples/sec   Loss 4.0372   LearningRate 0.0431   Epoch: 12   Global Step: 132680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:55:10,165-Speed 5447.74 samples/sec   Loss 4.0147   LearningRate 0.0431   Epoch: 12   Global Step: 132690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:55:17,743-Speed 5406.15 samples/sec   Loss 4.0522   LearningRate 0.0431   Epoch: 12   Global Step: 132700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:55:25,236-Speed 5466.77 samples/sec   Loss 3.9911   LearningRate 0.0431   Epoch: 12   Global Step: 132710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:55:32,808-Speed 5410.59 samples/sec   Loss 4.0846   LearningRate 0.0431   Epoch: 12   Global Step: 132720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:55:40,383-Speed 5408.42 samples/sec   Loss 4.0280   LearningRate 0.0431   Epoch: 12   Global Step: 132730   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:55:47,999-Speed 5378.96 samples/sec   Loss 4.0289   LearningRate 0.0431   Epoch: 12   Global Step: 132740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:55:55,467-Speed 5485.37 samples/sec   Loss 4.0197   LearningRate 0.0430   Epoch: 12   Global Step: 132750   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:56:03,035-Speed 5412.87 samples/sec   Loss 4.0056   LearningRate 0.0430   Epoch: 12   Global Step: 132760   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:56:10,566-Speed 5439.76 samples/sec   Loss 4.0446   LearningRate 0.0430   Epoch: 12   Global Step: 132770   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:56:18,006-Speed 5506.06 samples/sec   Loss 3.9983   LearningRate 0.0430   Epoch: 12   Global Step: 132780   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:56:25,486-Speed 5476.48 samples/sec   Loss 4.0237   LearningRate 0.0430   Epoch: 12   Global Step: 132790   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:56:32,993-Speed 5456.95 samples/sec   Loss 4.0101   LearningRate 0.0430   Epoch: 12   Global Step: 132800   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:56:40,528-Speed 5437.79 samples/sec   Loss 4.0335   LearningRate 0.0430   Epoch: 12   Global Step: 132810   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:56:47,994-Speed 5487.62 samples/sec   Loss 4.0028   LearningRate 0.0430   Epoch: 12   Global Step: 132820   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:56:55,500-Speed 5457.47 samples/sec   Loss 3.9951   LearningRate 0.0430   Epoch: 12   Global Step: 132830   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:57:02,978-Speed 5477.69 samples/sec   Loss 3.9972   LearningRate 0.0429   Epoch: 12   Global Step: 132840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:57:10,542-Speed 5415.68 samples/sec   Loss 4.0102   LearningRate 0.0429   Epoch: 12   Global Step: 132850   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:57:18,038-Speed 5465.36 samples/sec   Loss 4.0317   LearningRate 0.0429   Epoch: 12   Global Step: 132860   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:57:25,498-Speed 5491.64 samples/sec   Loss 3.9783   LearningRate 0.0429   Epoch: 12   Global Step: 132870   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:57:32,966-Speed 5485.42 samples/sec   Loss 3.9889   LearningRate 0.0429   Epoch: 12   Global Step: 132880   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:57:40,518-Speed 5424.40 samples/sec   Loss 3.9870   LearningRate 0.0429   Epoch: 12   Global Step: 132890   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:57:47,989-Speed 5483.30 samples/sec   Loss 3.9864   LearningRate 0.0429   Epoch: 12   Global Step: 132900   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:57:55,485-Speed 5464.94 samples/sec   Loss 3.9910   LearningRate 0.0429   Epoch: 12   Global Step: 132910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:58:03,144-Speed 5348.20 samples/sec   Loss 4.0173   LearningRate 0.0429   Epoch: 12   Global Step: 132920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:58:10,689-Speed 5429.77 samples/sec   Loss 4.0106   LearningRate 0.0428   Epoch: 12   Global Step: 132930   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-09 00:58:18,157-Speed 5485.20 samples/sec   Loss 4.0116   LearningRate 0.0428   Epoch: 12   Global Step: 132940   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:58:25,699-Speed 5432.02 samples/sec   Loss 3.9847   LearningRate 0.0428   Epoch: 12   Global Step: 132950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:58:33,267-Speed 5412.60 samples/sec   Loss 3.9861   LearningRate 0.0428   Epoch: 12   Global Step: 132960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:58:40,772-Speed 5458.47 samples/sec   Loss 4.0254   LearningRate 0.0428   Epoch: 12   Global Step: 132970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:58:48,379-Speed 5385.52 samples/sec   Loss 4.0211   LearningRate 0.0428   Epoch: 12   Global Step: 132980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:58:55,967-Speed 5398.76 samples/sec   Loss 3.9695   LearningRate 0.0428   Epoch: 12   Global Step: 132990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:59:03,463-Speed 5465.00 samples/sec   Loss 3.9619   LearningRate 0.0428   Epoch: 12   Global Step: 133000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:59:11,013-Speed 5425.57 samples/sec   Loss 3.9730   LearningRate 0.0427   Epoch: 12   Global Step: 133010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:59:18,588-Speed 5407.99 samples/sec   Loss 3.9794   LearningRate 0.0427   Epoch: 12   Global Step: 133020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 00:59:26,100-Speed 5453.24 samples/sec   Loss 4.0070   LearningRate 0.0427   Epoch: 12   Global Step: 133030   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:59:33,603-Speed 5459.95 samples/sec   Loss 3.9897   LearningRate 0.0427   Epoch: 12   Global Step: 133040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:59:41,100-Speed 5463.98 samples/sec   Loss 3.9842   LearningRate 0.0427   Epoch: 12   Global Step: 133050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:59:48,616-Speed 5449.87 samples/sec   Loss 3.9849   LearningRate 0.0427   Epoch: 12   Global Step: 133060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 00:59:56,175-Speed 5419.74 samples/sec   Loss 4.0030   LearningRate 0.0427   Epoch: 12   Global Step: 133070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:00:03,701-Speed 5443.43 samples/sec   Loss 4.0188   LearningRate 0.0427   Epoch: 12   Global Step: 133080   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:00:11,166-Speed 5487.42 samples/sec   Loss 4.0538   LearningRate 0.0427   Epoch: 12   Global Step: 133090   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:00:18,749-Speed 5401.51 samples/sec   Loss 4.0109   LearningRate 0.0426   Epoch: 12   Global Step: 133100   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:00:26,283-Speed 5437.56 samples/sec   Loss 4.0120   LearningRate 0.0426   Epoch: 12   Global Step: 133110   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:00:34,007-Speed 5303.92 samples/sec   Loss 3.9876   LearningRate 0.0426   Epoch: 12   Global Step: 133120   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:00:41,173-Speed 5716.06 samples/sec   Loss 3.9885   LearningRate 0.0426   Epoch: 12   Global Step: 133130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:00:48,730-Speed 5420.97 samples/sec   Loss 3.9977   LearningRate 0.0426   Epoch: 12   Global Step: 133140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:00:56,369-Speed 5362.99 samples/sec   Loss 3.9464   LearningRate 0.0426   Epoch: 12   Global Step: 133150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:01:03,903-Speed 5437.77 samples/sec   Loss 3.9626   LearningRate 0.0426   Epoch: 12   Global Step: 133160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:01:11,505-Speed 5388.06 samples/sec   Loss 3.9771   LearningRate 0.0426   Epoch: 12   Global Step: 133170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:01:18,994-Speed 5470.12 samples/sec   Loss 4.0007   LearningRate 0.0426   Epoch: 12   Global Step: 133180   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:01:26,590-Speed 5392.97 samples/sec   Loss 3.9632   LearningRate 0.0425   Epoch: 12   Global Step: 133190   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:01:34,064-Speed 5481.22 samples/sec   Loss 4.0136   LearningRate 0.0425   Epoch: 12   Global Step: 133200   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:01:41,649-Speed 5401.10 samples/sec   Loss 4.0004   LearningRate 0.0425   Epoch: 12   Global Step: 133210   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:01:49,126-Speed 5478.26 samples/sec   Loss 4.0328   LearningRate 0.0425   Epoch: 12   Global Step: 133220   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:01:56,702-Speed 5407.57 samples/sec   Loss 3.9841   LearningRate 0.0425   Epoch: 12   Global Step: 133230   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:02:04,338-Speed 5364.71 samples/sec   Loss 3.9934   LearningRate 0.0425   Epoch: 12   Global Step: 133240   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:02:11,894-Speed 5421.64 samples/sec   Loss 3.9798   LearningRate 0.0425   Epoch: 12   Global Step: 133250   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:02:19,372-Speed 5478.30 samples/sec   Loss 3.9894   LearningRate 0.0425   Epoch: 12   Global Step: 133260   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:02:26,878-Speed 5457.73 samples/sec   Loss 3.9594   LearningRate 0.0425   Epoch: 12   Global Step: 133270   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:02:34,357-Speed 5476.84 samples/sec   Loss 3.9973   LearningRate 0.0424   Epoch: 12   Global Step: 133280   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:02:41,817-Speed 5491.79 samples/sec   Loss 4.0167   LearningRate 0.0424   Epoch: 12   Global Step: 133290   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:02:49,288-Speed 5483.07 samples/sec   Loss 4.0087   LearningRate 0.0424   Epoch: 12   Global Step: 133300   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:02:56,853-Speed 5415.14 samples/sec   Loss 3.9941   LearningRate 0.0424   Epoch: 12   Global Step: 133310   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:03:04,363-Speed 5454.92 samples/sec   Loss 3.9775   LearningRate 0.0424   Epoch: 12   Global Step: 133320   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:03:11,922-Speed 5418.93 samples/sec   Loss 3.9760   LearningRate 0.0424   Epoch: 12   Global Step: 133330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:03:19,587-Speed 5344.38 samples/sec   Loss 3.9771   LearningRate 0.0424   Epoch: 12   Global Step: 133340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:03:27,396-Speed 5245.97 samples/sec   Loss 3.9719   LearningRate 0.0424   Epoch: 12   Global Step: 133350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:03:34,937-Speed 5432.22 samples/sec   Loss 4.0166   LearningRate 0.0423   Epoch: 12   Global Step: 133360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:03:42,502-Speed 5415.29 samples/sec   Loss 3.9837   LearningRate 0.0423   Epoch: 12   Global Step: 133370   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:03:50,101-Speed 5391.01 samples/sec   Loss 3.9718   LearningRate 0.0423   Epoch: 12   Global Step: 133380   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:03:57,643-Speed 5431.51 samples/sec   Loss 3.9875   LearningRate 0.0423   Epoch: 12   Global Step: 133390   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:04:05,143-Speed 5461.75 samples/sec   Loss 4.0198   LearningRate 0.0423   Epoch: 12   Global Step: 133400   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:04:12,629-Speed 5472.38 samples/sec   Loss 3.9385   LearningRate 0.0423   Epoch: 12   Global Step: 133410   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:04:20,166-Speed 5435.18 samples/sec   Loss 3.9050   LearningRate 0.0423   Epoch: 12   Global Step: 133420   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:04:27,630-Speed 5488.20 samples/sec   Loss 3.9619   LearningRate 0.0423   Epoch: 12   Global Step: 133430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:04:35,216-Speed 5400.76 samples/sec   Loss 3.9187   LearningRate 0.0423   Epoch: 12   Global Step: 133440   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:04:42,774-Speed 5420.27 samples/sec   Loss 3.9625   LearningRate 0.0422   Epoch: 12   Global Step: 133450   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:04:50,364-Speed 5397.03 samples/sec   Loss 3.9567   LearningRate 0.0422   Epoch: 12   Global Step: 133460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:04:57,862-Speed 5463.26 samples/sec   Loss 3.9271   LearningRate 0.0422   Epoch: 12   Global Step: 133470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:05:05,375-Speed 5453.18 samples/sec   Loss 3.9771   LearningRate 0.0422   Epoch: 12   Global Step: 133480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:05:12,884-Speed 5455.65 samples/sec   Loss 3.9646   LearningRate 0.0422   Epoch: 12   Global Step: 133490   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:05:20,427-Speed 5430.87 samples/sec   Loss 3.9505   LearningRate 0.0422   Epoch: 12   Global Step: 133500   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:05:28,039-Speed 5381.61 samples/sec   Loss 4.0038   LearningRate 0.0422   Epoch: 12   Global Step: 133510   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:05:35,515-Speed 5479.55 samples/sec   Loss 3.9896   LearningRate 0.0422   Epoch: 12   Global Step: 133520   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:05:43,131-Speed 5379.04 samples/sec   Loss 4.0104   LearningRate 0.0422   Epoch: 12   Global Step: 133530   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:05:50,678-Speed 5427.90 samples/sec   Loss 3.8964   LearningRate 0.0421   Epoch: 12   Global Step: 133540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:05:58,184-Speed 5457.24 samples/sec   Loss 3.9588   LearningRate 0.0421   Epoch: 12   Global Step: 133550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:06:05,733-Speed 5427.47 samples/sec   Loss 4.0027   LearningRate 0.0421   Epoch: 12   Global Step: 133560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:06:13,232-Speed 5462.12 samples/sec   Loss 4.0007   LearningRate 0.0421   Epoch: 12   Global Step: 133570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:06:20,715-Speed 5474.87 samples/sec   Loss 3.9818   LearningRate 0.0421   Epoch: 12   Global Step: 133580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:06:28,338-Speed 5373.29 samples/sec   Loss 3.9629   LearningRate 0.0421   Epoch: 12   Global Step: 133590   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:06:35,941-Speed 5388.77 samples/sec   Loss 4.0087   LearningRate 0.0421   Epoch: 12   Global Step: 133600   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:06:43,513-Speed 5409.97 samples/sec   Loss 3.9790   LearningRate 0.0421   Epoch: 12   Global Step: 133610   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:06:50,955-Speed 5504.59 samples/sec   Loss 3.9494   LearningRate 0.0421   Epoch: 12   Global Step: 133620   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:06:58,477-Speed 5445.80 samples/sec   Loss 3.9575   LearningRate 0.0420   Epoch: 12   Global Step: 133630   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:07:05,974-Speed 5464.23 samples/sec   Loss 3.9999   LearningRate 0.0420   Epoch: 12   Global Step: 133640   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:07:13,497-Speed 5445.93 samples/sec   Loss 3.9738   LearningRate 0.0420   Epoch: 12   Global Step: 133650   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:07:20,969-Speed 5482.55 samples/sec   Loss 3.9756   LearningRate 0.0420   Epoch: 12   Global Step: 133660   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:07:28,533-Speed 5415.59 samples/sec   Loss 4.0230   LearningRate 0.0420   Epoch: 12   Global Step: 133670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:07:36,143-Speed 5383.41 samples/sec   Loss 3.9874   LearningRate 0.0420   Epoch: 12   Global Step: 133680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:07:43,697-Speed 5423.10 samples/sec   Loss 3.9508   LearningRate 0.0420   Epoch: 12   Global Step: 133690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:07:51,224-Speed 5442.66 samples/sec   Loss 3.9444   LearningRate 0.0420   Epoch: 12   Global Step: 133700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:07:58,758-Speed 5436.86 samples/sec   Loss 3.9454   LearningRate 0.0419   Epoch: 12   Global Step: 133710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:08:06,250-Speed 5468.23 samples/sec   Loss 3.9779   LearningRate 0.0419   Epoch: 12   Global Step: 133720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:08:13,815-Speed 5415.11 samples/sec   Loss 3.9601   LearningRate 0.0419   Epoch: 12   Global Step: 133730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:08:21,390-Speed 5407.64 samples/sec   Loss 3.9761   LearningRate 0.0419   Epoch: 12   Global Step: 133740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:08:28,929-Speed 5433.94 samples/sec   Loss 3.9630   LearningRate 0.0419   Epoch: 12   Global Step: 133750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:08:36,442-Speed 5452.52 samples/sec   Loss 3.9301   LearningRate 0.0419   Epoch: 12   Global Step: 133760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:08:43,971-Speed 5441.15 samples/sec   Loss 3.9862   LearningRate 0.0419   Epoch: 12   Global Step: 133770   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:08:51,475-Speed 5458.66 samples/sec   Loss 3.9496   LearningRate 0.0419   Epoch: 12   Global Step: 133780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:08:59,000-Speed 5444.39 samples/sec   Loss 3.9898   LearningRate 0.0419   Epoch: 12   Global Step: 133790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:09:06,576-Speed 5406.87 samples/sec   Loss 3.9896   LearningRate 0.0418   Epoch: 12   Global Step: 133800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:09:14,047-Speed 5483.03 samples/sec   Loss 3.9377   LearningRate 0.0418   Epoch: 12   Global Step: 133810   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-01-09 01:09:21,864-Speed 5240.98 samples/sec   Loss 3.9379   LearningRate 0.0418   Epoch: 12   Global Step: 133820   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-01-09 01:09:29,365-Speed 5460.86 samples/sec   Loss 3.9486   LearningRate 0.0418   Epoch: 12   Global Step: 133830   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-01-09 01:09:37,064-Speed 5320.92 samples/sec   Loss 3.9763   LearningRate 0.0418   Epoch: 12   Global Step: 133840   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-01-09 01:09:44,704-Speed 5361.73 samples/sec   Loss 4.0048   LearningRate 0.0418   Epoch: 12   Global Step: 133850   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-01-09 01:09:52,278-Speed 5408.55 samples/sec   Loss 3.9392   LearningRate 0.0418   Epoch: 12   Global Step: 133860   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-01-09 01:09:59,791-Speed 5453.03 samples/sec   Loss 3.9301   LearningRate 0.0418   Epoch: 12   Global Step: 133870   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-01-09 01:10:07,374-Speed 5402.09 samples/sec   Loss 3.9267   LearningRate 0.0418   Epoch: 12   Global Step: 133880   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-01-09 01:10:14,928-Speed 5422.95 samples/sec   Loss 3.9343   LearningRate 0.0417   Epoch: 12   Global Step: 133890   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-01-09 01:10:22,556-Speed 5370.11 samples/sec   Loss 3.9684   LearningRate 0.0417   Epoch: 12   Global Step: 133900   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-01-09 01:10:30,193-Speed 5364.43 samples/sec   Loss 3.9221   LearningRate 0.0417   Epoch: 12   Global Step: 133910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:10:37,878-Speed 5330.61 samples/sec   Loss 3.9087   LearningRate 0.0417   Epoch: 12   Global Step: 133920   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:10:45,358-Speed 5476.71 samples/sec   Loss 3.9882   LearningRate 0.0417   Epoch: 12   Global Step: 133930   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:10:52,887-Speed 5440.67 samples/sec   Loss 3.9640   LearningRate 0.0417   Epoch: 12   Global Step: 133940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:11:00,499-Speed 5381.97 samples/sec   Loss 3.9284   LearningRate 0.0417   Epoch: 12   Global Step: 133950   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:11:07,950-Speed 5497.83 samples/sec   Loss 3.9788   LearningRate 0.0417   Epoch: 12   Global Step: 133960   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:11:15,426-Speed 5479.47 samples/sec   Loss 3.9817   LearningRate 0.0417   Epoch: 12   Global Step: 133970   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:11:22,962-Speed 5436.53 samples/sec   Loss 3.9831   LearningRate 0.0416   Epoch: 12   Global Step: 133980   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:11:30,393-Speed 5512.49 samples/sec   Loss 3.9269   LearningRate 0.0416   Epoch: 12   Global Step: 133990   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:11:37,887-Speed 5466.39 samples/sec   Loss 3.9428   LearningRate 0.0416   Epoch: 12   Global Step: 134000   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:12:22,025-[lfw][134000]XNorm: 24.295032
Training: 2022-01-09 01:12:22,026-[lfw][134000]Accuracy-Flip: 0.99800+-0.00277
Training: 2022-01-09 01:12:22,026-[lfw][134000]Accuracy-Highest: 0.99817
Training: 2022-01-09 01:13:13,800-[cfp_fp][134000]XNorm: 22.341688
Training: 2022-01-09 01:13:13,801-[cfp_fp][134000]Accuracy-Flip: 0.99157+-0.00386
Training: 2022-01-09 01:13:13,802-[cfp_fp][134000]Accuracy-Highest: 0.99157
Training: 2022-01-09 01:13:58,203-[agedb_30][134000]XNorm: 23.833087
Training: 2022-01-09 01:13:58,203-[agedb_30][134000]Accuracy-Flip: 0.97900+-0.00569
Training: 2022-01-09 01:13:58,204-[agedb_30][134000]Accuracy-Highest: 0.98067
Training: 2022-01-09 01:14:05,827-Speed 276.87 samples/sec   Loss 3.9396   LearningRate 0.0416   Epoch: 12   Global Step: 134010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:14:13,320-Speed 5467.55 samples/sec   Loss 3.9851   LearningRate 0.0416   Epoch: 12   Global Step: 134020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:14:20,807-Speed 5471.06 samples/sec   Loss 3.9810   LearningRate 0.0416   Epoch: 12   Global Step: 134030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:14:28,246-Speed 5507.45 samples/sec   Loss 3.9674   LearningRate 0.0416   Epoch: 12   Global Step: 134040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:14:35,718-Speed 5481.93 samples/sec   Loss 3.9290   LearningRate 0.0416   Epoch: 12   Global Step: 134050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:14:43,200-Speed 5475.72 samples/sec   Loss 3.9697   LearningRate 0.0416   Epoch: 12   Global Step: 134060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:14:50,739-Speed 5433.98 samples/sec   Loss 3.9216   LearningRate 0.0415   Epoch: 12   Global Step: 134070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:14:58,149-Speed 5528.32 samples/sec   Loss 3.9248   LearningRate 0.0415   Epoch: 12   Global Step: 134080   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:15:05,698-Speed 5426.64 samples/sec   Loss 3.9387   LearningRate 0.0415   Epoch: 12   Global Step: 134090   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:15:13,201-Speed 5460.05 samples/sec   Loss 3.9264   LearningRate 0.0415   Epoch: 12   Global Step: 134100   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:15:20,757-Speed 5421.23 samples/sec   Loss 3.9363   LearningRate 0.0415   Epoch: 12   Global Step: 134110   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:15:28,299-Speed 5431.40 samples/sec   Loss 3.9350   LearningRate 0.0415   Epoch: 12   Global Step: 134120   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:15:35,921-Speed 5375.31 samples/sec   Loss 3.9355   LearningRate 0.0415   Epoch: 12   Global Step: 134130   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:15:43,392-Speed 5483.55 samples/sec   Loss 3.9272   LearningRate 0.0415   Epoch: 12   Global Step: 134140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:15:50,911-Speed 5447.94 samples/sec   Loss 3.8716   LearningRate 0.0414   Epoch: 12   Global Step: 134150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:15:58,482-Speed 5410.37 samples/sec   Loss 3.9520   LearningRate 0.0414   Epoch: 12   Global Step: 134160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:16:06,066-Speed 5402.13 samples/sec   Loss 3.9266   LearningRate 0.0414   Epoch: 12   Global Step: 134170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:16:13,708-Speed 5360.24 samples/sec   Loss 3.9394   LearningRate 0.0414   Epoch: 12   Global Step: 134180   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:16:21,175-Speed 5486.16 samples/sec   Loss 3.8778   LearningRate 0.0414   Epoch: 12   Global Step: 134190   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:16:28,750-Speed 5408.17 samples/sec   Loss 3.9327   LearningRate 0.0414   Epoch: 12   Global Step: 134200   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:16:36,227-Speed 5478.95 samples/sec   Loss 3.9444   LearningRate 0.0414   Epoch: 12   Global Step: 134210   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:16:43,674-Speed 5501.08 samples/sec   Loss 3.9854   LearningRate 0.0414   Epoch: 12   Global Step: 134220   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:16:51,238-Speed 5415.40 samples/sec   Loss 3.9402   LearningRate 0.0414   Epoch: 12   Global Step: 134230   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:16:58,700-Speed 5489.82 samples/sec   Loss 3.9303   LearningRate 0.0413   Epoch: 12   Global Step: 134240   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:17:06,217-Speed 5449.57 samples/sec   Loss 3.9328   LearningRate 0.0413   Epoch: 12   Global Step: 134250   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:17:13,675-Speed 5493.31 samples/sec   Loss 3.9139   LearningRate 0.0413   Epoch: 12   Global Step: 134260   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:17:21,157-Speed 5475.13 samples/sec   Loss 3.9034   LearningRate 0.0413   Epoch: 12   Global Step: 134270   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:17:28,679-Speed 5445.56 samples/sec   Loss 3.9146   LearningRate 0.0413   Epoch: 12   Global Step: 134280   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:17:36,193-Speed 5452.05 samples/sec   Loss 3.9661   LearningRate 0.0413   Epoch: 12   Global Step: 134290   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:17:43,688-Speed 5465.55 samples/sec   Loss 3.9457   LearningRate 0.0413   Epoch: 12   Global Step: 134300   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:17:51,205-Speed 5450.12 samples/sec   Loss 3.9403   LearningRate 0.0413   Epoch: 12   Global Step: 134310   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:17:58,688-Speed 5474.44 samples/sec   Loss 3.8955   LearningRate 0.0413   Epoch: 12   Global Step: 134320   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:18:06,209-Speed 5446.51 samples/sec   Loss 3.9197   LearningRate 0.0412   Epoch: 12   Global Step: 134330   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:18:13,715-Speed 5457.43 samples/sec   Loss 3.9112   LearningRate 0.0412   Epoch: 12   Global Step: 134340   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:18:21,304-Speed 5398.45 samples/sec   Loss 3.9280   LearningRate 0.0412   Epoch: 12   Global Step: 134350   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:18:29,002-Speed 5321.14 samples/sec   Loss 3.9366   LearningRate 0.0412   Epoch: 12   Global Step: 134360   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:18:36,511-Speed 5455.46 samples/sec   Loss 3.9335   LearningRate 0.0412   Epoch: 12   Global Step: 134370   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:18:44,052-Speed 5432.98 samples/sec   Loss 3.9321   LearningRate 0.0412   Epoch: 12   Global Step: 134380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:18:51,558-Speed 5457.88 samples/sec   Loss 3.9530   LearningRate 0.0412   Epoch: 12   Global Step: 134390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:18:59,118-Speed 5418.68 samples/sec   Loss 3.9322   LearningRate 0.0412   Epoch: 12   Global Step: 134400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:19:06,581-Speed 5488.78 samples/sec   Loss 3.9628   LearningRate 0.0412   Epoch: 12   Global Step: 134410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:19:14,099-Speed 5449.33 samples/sec   Loss 3.9622   LearningRate 0.0411   Epoch: 12   Global Step: 134420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:19:21,572-Speed 5481.91 samples/sec   Loss 3.9605   LearningRate 0.0411   Epoch: 12   Global Step: 134430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:19:29,040-Speed 5485.22 samples/sec   Loss 3.9102   LearningRate 0.0411   Epoch: 12   Global Step: 134440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:19:36,554-Speed 5451.45 samples/sec   Loss 3.9641   LearningRate 0.0411   Epoch: 12   Global Step: 134450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:19:44,094-Speed 5433.25 samples/sec   Loss 3.9326   LearningRate 0.0411   Epoch: 12   Global Step: 134460   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:19:51,628-Speed 5438.17 samples/sec   Loss 3.9273   LearningRate 0.0411   Epoch: 12   Global Step: 134470   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:19:59,207-Speed 5404.66 samples/sec   Loss 3.9314   LearningRate 0.0411   Epoch: 12   Global Step: 134480   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:20:06,660-Speed 5496.32 samples/sec   Loss 3.9346   LearningRate 0.0411   Epoch: 12   Global Step: 134490   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:20:14,202-Speed 5431.73 samples/sec   Loss 3.9020   LearningRate 0.0411   Epoch: 12   Global Step: 134500   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:20:21,846-Speed 5359.87 samples/sec   Loss 3.8788   LearningRate 0.0410   Epoch: 12   Global Step: 134510   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-09 01:20:29,424-Speed 5405.75 samples/sec   Loss 3.9431   LearningRate 0.0410   Epoch: 12   Global Step: 134520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:20:36,944-Speed 5447.13 samples/sec   Loss 3.9140   LearningRate 0.0410   Epoch: 12   Global Step: 134530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:20:44,458-Speed 5452.39 samples/sec   Loss 3.9303   LearningRate 0.0410   Epoch: 12   Global Step: 134540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:20:52,056-Speed 5391.48 samples/sec   Loss 3.9377   LearningRate 0.0410   Epoch: 12   Global Step: 134550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:20:59,613-Speed 5421.41 samples/sec   Loss 3.8965   LearningRate 0.0410   Epoch: 12   Global Step: 134560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:21:07,153-Speed 5432.53 samples/sec   Loss 3.9474   LearningRate 0.0410   Epoch: 12   Global Step: 134570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:21:14,622-Speed 5485.26 samples/sec   Loss 3.9161   LearningRate 0.0410   Epoch: 12   Global Step: 134580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:21:22,158-Speed 5436.33 samples/sec   Loss 3.9308   LearningRate 0.0410   Epoch: 12   Global Step: 134590   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:21:29,763-Speed 5386.24 samples/sec   Loss 3.9070   LearningRate 0.0409   Epoch: 12   Global Step: 134600   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:21:37,351-Speed 5398.40 samples/sec   Loss 3.9138   LearningRate 0.0409   Epoch: 12   Global Step: 134610   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:21:44,883-Speed 5439.68 samples/sec   Loss 3.8699   LearningRate 0.0409   Epoch: 12   Global Step: 134620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-09 01:21:52,363-Speed 5476.09 samples/sec   Loss 3.9345   LearningRate 0.0409   Epoch: 12   Global Step: 134630   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:21:59,963-Speed 5390.28 samples/sec   Loss 3.9178   LearningRate 0.0409   Epoch: 12   Global Step: 134640   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:22:07,459-Speed 5464.89 samples/sec   Loss 3.9045   LearningRate 0.0409   Epoch: 12   Global Step: 134650   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:22:15,048-Speed 5398.17 samples/sec   Loss 3.9256   LearningRate 0.0409   Epoch: 12   Global Step: 134660   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:22:22,560-Speed 5453.53 samples/sec   Loss 3.9077   LearningRate 0.0409   Epoch: 12   Global Step: 134670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:22:30,187-Speed 5370.97 samples/sec   Loss 3.9145   LearningRate 0.0409   Epoch: 12   Global Step: 134680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:22:37,928-Speed 5291.77 samples/sec   Loss 3.9366   LearningRate 0.0408   Epoch: 12   Global Step: 134690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:22:45,477-Speed 5426.54 samples/sec   Loss 3.9407   LearningRate 0.0408   Epoch: 12   Global Step: 134700   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:22:52,992-Speed 5451.45 samples/sec   Loss 3.9004   LearningRate 0.0408   Epoch: 12   Global Step: 134710   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:23:00,588-Speed 5392.86 samples/sec   Loss 3.8958   LearningRate 0.0408   Epoch: 12   Global Step: 134720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:23:08,239-Speed 5353.98 samples/sec   Loss 3.9070   LearningRate 0.0408   Epoch: 12   Global Step: 134730   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:23:15,772-Speed 5438.20 samples/sec   Loss 3.9342   LearningRate 0.0408   Epoch: 12   Global Step: 134740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:23:23,236-Speed 5488.13 samples/sec   Loss 3.9018   LearningRate 0.0408   Epoch: 12   Global Step: 134750   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:23:30,953-Speed 5308.92 samples/sec   Loss 3.8712   LearningRate 0.0408   Epoch: 12   Global Step: 134760   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:23:38,534-Speed 5403.70 samples/sec   Loss 3.9306   LearningRate 0.0408   Epoch: 12   Global Step: 134770   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:23:46,138-Speed 5386.68 samples/sec   Loss 3.9109   LearningRate 0.0407   Epoch: 12   Global Step: 134780   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:23:53,619-Speed 5476.21 samples/sec   Loss 3.9256   LearningRate 0.0407   Epoch: 12   Global Step: 134790   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:24:01,115-Speed 5465.22 samples/sec   Loss 3.9147   LearningRate 0.0407   Epoch: 12   Global Step: 134800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:24:24,850-Speed 1725.80 samples/sec   Loss 3.8855   LearningRate 0.0407   Epoch: 13   Global Step: 134810   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:24:32,349-Speed 5462.51 samples/sec   Loss 3.9445   LearningRate 0.0407   Epoch: 13   Global Step: 134820   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:24:39,827-Speed 5478.53 samples/sec   Loss 3.9019   LearningRate 0.0407   Epoch: 13   Global Step: 134830   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:24:47,348-Speed 5446.78 samples/sec   Loss 3.9364   LearningRate 0.0407   Epoch: 13   Global Step: 134840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:24:54,839-Speed 5468.40 samples/sec   Loss 3.9310   LearningRate 0.0407   Epoch: 13   Global Step: 134850   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:25:02,350-Speed 5454.39 samples/sec   Loss 3.8764   LearningRate 0.0406   Epoch: 13   Global Step: 134860   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:25:09,974-Speed 5373.07 samples/sec   Loss 3.9357   LearningRate 0.0406   Epoch: 13   Global Step: 134870   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:25:17,446-Speed 5482.47 samples/sec   Loss 3.9385   LearningRate 0.0406   Epoch: 13   Global Step: 134880   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:25:24,865-Speed 5521.06 samples/sec   Loss 3.9028   LearningRate 0.0406   Epoch: 13   Global Step: 134890   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:25:32,286-Speed 5520.77 samples/sec   Loss 3.8888   LearningRate 0.0406   Epoch: 13   Global Step: 134900   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:25:39,725-Speed 5507.00 samples/sec   Loss 3.8591   LearningRate 0.0406   Epoch: 13   Global Step: 134910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:25:47,195-Speed 5483.86 samples/sec   Loss 3.9079   LearningRate 0.0406   Epoch: 13   Global Step: 134920   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:25:54,674-Speed 5476.74 samples/sec   Loss 3.8940   LearningRate 0.0406   Epoch: 13   Global Step: 134930   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:26:02,172-Speed 5463.62 samples/sec   Loss 3.9098   LearningRate 0.0406   Epoch: 13   Global Step: 134940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:26:09,641-Speed 5485.38 samples/sec   Loss 3.8776   LearningRate 0.0405   Epoch: 13   Global Step: 134950   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:26:17,298-Speed 5349.84 samples/sec   Loss 3.8686   LearningRate 0.0405   Epoch: 13   Global Step: 134960   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:26:25,128-Speed 5231.62 samples/sec   Loss 3.8720   LearningRate 0.0405   Epoch: 13   Global Step: 134970   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:26:32,784-Speed 5350.84 samples/sec   Loss 3.8711   LearningRate 0.0405   Epoch: 13   Global Step: 134980   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:26:40,473-Speed 5328.01 samples/sec   Loss 3.8262   LearningRate 0.0405   Epoch: 13   Global Step: 134990   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:26:48,143-Speed 5341.01 samples/sec   Loss 3.9301   LearningRate 0.0405   Epoch: 13   Global Step: 135000   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:26:55,842-Speed 5320.15 samples/sec   Loss 3.8230   LearningRate 0.0405   Epoch: 13   Global Step: 135010   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:27:03,478-Speed 5365.07 samples/sec   Loss 3.8702   LearningRate 0.0405   Epoch: 13   Global Step: 135020   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:27:11,140-Speed 5346.64 samples/sec   Loss 3.8684   LearningRate 0.0405   Epoch: 13   Global Step: 135030   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:27:18,709-Speed 5412.13 samples/sec   Loss 3.8979   LearningRate 0.0404   Epoch: 13   Global Step: 135040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:27:26,236-Speed 5442.39 samples/sec   Loss 3.9180   LearningRate 0.0404   Epoch: 13   Global Step: 135050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:27:33,773-Speed 5435.32 samples/sec   Loss 3.8830   LearningRate 0.0404   Epoch: 13   Global Step: 135060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:27:41,415-Speed 5360.38 samples/sec   Loss 3.9087   LearningRate 0.0404   Epoch: 13   Global Step: 135070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:27:48,999-Speed 5401.56 samples/sec   Loss 3.8990   LearningRate 0.0404   Epoch: 13   Global Step: 135080   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:27:56,611-Speed 5381.53 samples/sec   Loss 3.8639   LearningRate 0.0404   Epoch: 13   Global Step: 135090   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-09 01:28:04,157-Speed 5429.07 samples/sec   Loss 3.8977   LearningRate 0.0404   Epoch: 13   Global Step: 135100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:28:11,793-Speed 5364.96 samples/sec   Loss 3.8681   LearningRate 0.0404   Epoch: 13   Global Step: 135110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:28:19,325-Speed 5438.77 samples/sec   Loss 3.8842   LearningRate 0.0404   Epoch: 13   Global Step: 135120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:28:26,962-Speed 5363.65 samples/sec   Loss 3.8706   LearningRate 0.0403   Epoch: 13   Global Step: 135130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:28:34,517-Speed 5422.75 samples/sec   Loss 3.8466   LearningRate 0.0403   Epoch: 13   Global Step: 135140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:28:42,022-Speed 5458.37 samples/sec   Loss 3.8460   LearningRate 0.0403   Epoch: 13   Global Step: 135150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:28:49,526-Speed 5459.16 samples/sec   Loss 3.8842   LearningRate 0.0403   Epoch: 13   Global Step: 135160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:28:57,097-Speed 5410.70 samples/sec   Loss 3.8718   LearningRate 0.0403   Epoch: 13   Global Step: 135170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:29:04,636-Speed 5433.96 samples/sec   Loss 3.8034   LearningRate 0.0403   Epoch: 13   Global Step: 135180   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:29:12,214-Speed 5405.58 samples/sec   Loss 3.8668   LearningRate 0.0403   Epoch: 13   Global Step: 135190   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:29:19,754-Speed 5432.88 samples/sec   Loss 3.9135   LearningRate 0.0403   Epoch: 13   Global Step: 135200   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:29:27,259-Speed 5458.41 samples/sec   Loss 3.8740   LearningRate 0.0403   Epoch: 13   Global Step: 135210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:29:34,755-Speed 5464.96 samples/sec   Loss 3.8986   LearningRate 0.0402   Epoch: 13   Global Step: 135220   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:29:42,238-Speed 5474.27 samples/sec   Loss 3.8588   LearningRate 0.0402   Epoch: 13   Global Step: 135230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:29:49,676-Speed 5507.86 samples/sec   Loss 3.8966   LearningRate 0.0402   Epoch: 13   Global Step: 135240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:29:57,115-Speed 5507.02 samples/sec   Loss 3.8526   LearningRate 0.0402   Epoch: 13   Global Step: 135250   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:30:04,609-Speed 5466.40 samples/sec   Loss 3.8800   LearningRate 0.0402   Epoch: 13   Global Step: 135260   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:30:12,115-Speed 5458.01 samples/sec   Loss 3.8810   LearningRate 0.0402   Epoch: 13   Global Step: 135270   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:30:19,556-Speed 5505.48 samples/sec   Loss 3.8713   LearningRate 0.0402   Epoch: 13   Global Step: 135280   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:30:27,147-Speed 5396.23 samples/sec   Loss 3.8723   LearningRate 0.0402   Epoch: 13   Global Step: 135290   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:30:34,626-Speed 5477.54 samples/sec   Loss 3.8629   LearningRate 0.0402   Epoch: 13   Global Step: 135300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-09 01:30:42,134-Speed 5456.20 samples/sec   Loss 3.8707   LearningRate 0.0401   Epoch: 13   Global Step: 135310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-09 01:30:49,735-Speed 5389.19 samples/sec   Loss 3.8665   LearningRate 0.0401   Epoch: 13   Global Step: 135320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-09 01:30:57,239-Speed 5459.44 samples/sec   Loss 3.8359   LearningRate 0.0401   Epoch: 13   Global Step: 135330   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-09 01:31:04,820-Speed 5403.71 samples/sec   Loss 3.9137   LearningRate 0.0401   Epoch: 13   Global Step: 135340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:31:12,347-Speed 5442.22 samples/sec   Loss 3.8651   LearningRate 0.0401   Epoch: 13   Global Step: 135350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:31:19,825-Speed 5478.11 samples/sec   Loss 3.8723   LearningRate 0.0401   Epoch: 13   Global Step: 135360   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:31:27,305-Speed 5476.98 samples/sec   Loss 3.8593   LearningRate 0.0401   Epoch: 13   Global Step: 135370   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:31:34,868-Speed 5416.75 samples/sec   Loss 3.8656   LearningRate 0.0401   Epoch: 13   Global Step: 135380   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:31:42,379-Speed 5453.96 samples/sec   Loss 3.8969   LearningRate 0.0401   Epoch: 13   Global Step: 135390   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:31:49,892-Speed 5452.47 samples/sec   Loss 3.8451   LearningRate 0.0400   Epoch: 13   Global Step: 135400   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:31:57,364-Speed 5482.52 samples/sec   Loss 3.8287   LearningRate 0.0400   Epoch: 13   Global Step: 135410   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:32:04,817-Speed 5496.55 samples/sec   Loss 3.8481   LearningRate 0.0400   Epoch: 13   Global Step: 135420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:32:12,378-Speed 5417.73 samples/sec   Loss 3.8669   LearningRate 0.0400   Epoch: 13   Global Step: 135430   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 01:32:19,898-Speed 5447.94 samples/sec   Loss 3.8206   LearningRate 0.0400   Epoch: 13   Global Step: 135440   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:32:27,472-Speed 5408.72 samples/sec   Loss 3.8682   LearningRate 0.0400   Epoch: 13   Global Step: 135450   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:32:34,950-Speed 5477.70 samples/sec   Loss 3.9144   LearningRate 0.0400   Epoch: 13   Global Step: 135460   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:32:42,447-Speed 5464.30 samples/sec   Loss 3.8984   LearningRate 0.0400   Epoch: 13   Global Step: 135470   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:32:49,919-Speed 5482.11 samples/sec   Loss 3.8964   LearningRate 0.0400   Epoch: 13   Global Step: 135480   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:32:57,438-Speed 5448.26 samples/sec   Loss 3.9086   LearningRate 0.0399   Epoch: 13   Global Step: 135490   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:33:04,967-Speed 5441.63 samples/sec   Loss 3.8410   LearningRate 0.0399   Epoch: 13   Global Step: 135500   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:33:12,579-Speed 5380.84 samples/sec   Loss 3.8912   LearningRate 0.0399   Epoch: 13   Global Step: 135510   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:33:20,082-Speed 5460.81 samples/sec   Loss 3.8815   LearningRate 0.0399   Epoch: 13   Global Step: 135520   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:33:27,691-Speed 5383.45 samples/sec   Loss 3.8732   LearningRate 0.0399   Epoch: 13   Global Step: 135530   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:33:35,234-Speed 5430.92 samples/sec   Loss 3.8711   LearningRate 0.0399   Epoch: 13   Global Step: 135540   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 01:33:42,687-Speed 5496.29 samples/sec   Loss 3.8904   LearningRate 0.0399   Epoch: 13   Global Step: 135550   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 01:33:50,147-Speed 5491.55 samples/sec   Loss 3.9092   LearningRate 0.0399   Epoch: 13   Global Step: 135560   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 01:33:57,632-Speed 5472.67 samples/sec   Loss 3.8242   LearningRate 0.0399   Epoch: 13   Global Step: 135570   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 01:34:05,062-Speed 5513.36 samples/sec   Loss 3.9079   LearningRate 0.0398   Epoch: 13   Global Step: 135580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:34:12,593-Speed 5439.31 samples/sec   Loss 3.8597   LearningRate 0.0398   Epoch: 13   Global Step: 135590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:34:20,074-Speed 5476.49 samples/sec   Loss 3.8384   LearningRate 0.0398   Epoch: 13   Global Step: 135600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:34:27,606-Speed 5438.96 samples/sec   Loss 3.8431   LearningRate 0.0398   Epoch: 13   Global Step: 135610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:34:35,094-Speed 5470.52 samples/sec   Loss 3.8494   LearningRate 0.0398   Epoch: 13   Global Step: 135620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:34:42,566-Speed 5482.31 samples/sec   Loss 3.8704   LearningRate 0.0398   Epoch: 13   Global Step: 135630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:34:50,099-Speed 5438.23 samples/sec   Loss 3.8507   LearningRate 0.0398   Epoch: 13   Global Step: 135640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:34:57,642-Speed 5430.97 samples/sec   Loss 3.8573   LearningRate 0.0398   Epoch: 13   Global Step: 135650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:35:05,088-Speed 5501.52 samples/sec   Loss 3.8275   LearningRate 0.0398   Epoch: 13   Global Step: 135660   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:35:12,579-Speed 5468.69 samples/sec   Loss 3.8845   LearningRate 0.0397   Epoch: 13   Global Step: 135670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:35:20,176-Speed 5392.41 samples/sec   Loss 3.8496   LearningRate 0.0397   Epoch: 13   Global Step: 135680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 01:35:27,676-Speed 5462.20 samples/sec   Loss 3.8281   LearningRate 0.0397   Epoch: 13   Global Step: 135690   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:35:35,110-Speed 5510.27 samples/sec   Loss 3.8325   LearningRate 0.0397   Epoch: 13   Global Step: 135700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:35:42,561-Speed 5498.26 samples/sec   Loss 3.8903   LearningRate 0.0397   Epoch: 13   Global Step: 135710   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:35:50,018-Speed 5493.31 samples/sec   Loss 3.8216   LearningRate 0.0397   Epoch: 13   Global Step: 135720   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:35:57,479-Speed 5491.18 samples/sec   Loss 3.8656   LearningRate 0.0397   Epoch: 13   Global Step: 135730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:36:04,916-Speed 5508.07 samples/sec   Loss 3.8864   LearningRate 0.0397   Epoch: 13   Global Step: 135740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:36:12,467-Speed 5425.01 samples/sec   Loss 3.8672   LearningRate 0.0397   Epoch: 13   Global Step: 135750   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:36:20,142-Speed 5337.26 samples/sec   Loss 3.8841   LearningRate 0.0396   Epoch: 13   Global Step: 135760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:36:27,726-Speed 5402.18 samples/sec   Loss 3.8893   LearningRate 0.0396   Epoch: 13   Global Step: 135770   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:36:35,256-Speed 5439.93 samples/sec   Loss 3.8515   LearningRate 0.0396   Epoch: 13   Global Step: 135780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:36:42,896-Speed 5362.39 samples/sec   Loss 3.8473   LearningRate 0.0396   Epoch: 13   Global Step: 135790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:36:50,597-Speed 5319.06 samples/sec   Loss 3.8377   LearningRate 0.0396   Epoch: 13   Global Step: 135800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:36:58,160-Speed 5417.28 samples/sec   Loss 3.8452   LearningRate 0.0396   Epoch: 13   Global Step: 135810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:37:05,766-Speed 5385.55 samples/sec   Loss 3.8933   LearningRate 0.0396   Epoch: 13   Global Step: 135820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:37:13,469-Speed 5318.41 samples/sec   Loss 3.8921   LearningRate 0.0396   Epoch: 13   Global Step: 135830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:37:21,214-Speed 5289.13 samples/sec   Loss 3.8571   LearningRate 0.0396   Epoch: 13   Global Step: 135840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:37:28,760-Speed 5428.59 samples/sec   Loss 3.8183   LearningRate 0.0395   Epoch: 13   Global Step: 135850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:37:36,306-Speed 5428.71 samples/sec   Loss 3.8245   LearningRate 0.0395   Epoch: 13   Global Step: 135860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:37:43,928-Speed 5374.50 samples/sec   Loss 3.8869   LearningRate 0.0395   Epoch: 13   Global Step: 135870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:37:51,548-Speed 5375.92 samples/sec   Loss 3.8655   LearningRate 0.0395   Epoch: 13   Global Step: 135880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:37:59,144-Speed 5393.23 samples/sec   Loss 3.8585   LearningRate 0.0395   Epoch: 13   Global Step: 135890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:38:06,686-Speed 5431.75 samples/sec   Loss 3.8469   LearningRate 0.0395   Epoch: 13   Global Step: 135900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:38:14,226-Speed 5432.94 samples/sec   Loss 3.8315   LearningRate 0.0395   Epoch: 13   Global Step: 135910   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:38:21,759-Speed 5438.09 samples/sec   Loss 3.8875   LearningRate 0.0395   Epoch: 13   Global Step: 135920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:38:29,339-Speed 5404.89 samples/sec   Loss 3.8061   LearningRate 0.0395   Epoch: 13   Global Step: 135930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:38:36,957-Speed 5377.12 samples/sec   Loss 3.8975   LearningRate 0.0394   Epoch: 13   Global Step: 135940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:38:44,543-Speed 5399.90 samples/sec   Loss 3.8129   LearningRate 0.0394   Epoch: 13   Global Step: 135950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:38:52,192-Speed 5355.14 samples/sec   Loss 3.9036   LearningRate 0.0394   Epoch: 13   Global Step: 135960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:38:59,715-Speed 5446.04 samples/sec   Loss 3.8247   LearningRate 0.0394   Epoch: 13   Global Step: 135970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:39:07,274-Speed 5419.46 samples/sec   Loss 3.8042   LearningRate 0.0394   Epoch: 13   Global Step: 135980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:39:14,778-Speed 5458.78 samples/sec   Loss 3.8883   LearningRate 0.0394   Epoch: 13   Global Step: 135990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:39:22,247-Speed 5484.61 samples/sec   Loss 3.8274   LearningRate 0.0394   Epoch: 13   Global Step: 136000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:40:06,309-[lfw][136000]XNorm: 22.902075
Training: 2022-01-09 01:40:06,310-[lfw][136000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-01-09 01:40:06,311-[lfw][136000]Accuracy-Highest: 0.99817
Training: 2022-01-09 01:40:57,383-[cfp_fp][136000]XNorm: 21.327764
Training: 2022-01-09 01:40:57,384-[cfp_fp][136000]Accuracy-Flip: 0.99029+-0.00455
Training: 2022-01-09 01:40:57,384-[cfp_fp][136000]Accuracy-Highest: 0.99157
Training: 2022-01-09 01:41:41,288-[agedb_30][136000]XNorm: 23.001569
Training: 2022-01-09 01:41:41,289-[agedb_30][136000]Accuracy-Flip: 0.98033+-0.00839
Training: 2022-01-09 01:41:41,289-[agedb_30][136000]Accuracy-Highest: 0.98067
Training: 2022-01-09 01:41:48,868-Speed 279.36 samples/sec   Loss 3.8001   LearningRate 0.0394   Epoch: 13   Global Step: 136010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:41:56,400-Speed 5439.19 samples/sec   Loss 3.8782   LearningRate 0.0394   Epoch: 13   Global Step: 136020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:42:03,927-Speed 5442.88 samples/sec   Loss 3.7791   LearningRate 0.0393   Epoch: 13   Global Step: 136030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:42:11,442-Speed 5450.28 samples/sec   Loss 3.8024   LearningRate 0.0393   Epoch: 13   Global Step: 136040   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:42:18,905-Speed 5489.11 samples/sec   Loss 3.8686   LearningRate 0.0393   Epoch: 13   Global Step: 136050   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:42:26,391-Speed 5472.75 samples/sec   Loss 3.8608   LearningRate 0.0393   Epoch: 13   Global Step: 136060   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:42:33,918-Speed 5441.98 samples/sec   Loss 3.8366   LearningRate 0.0393   Epoch: 13   Global Step: 136070   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:42:41,439-Speed 5447.38 samples/sec   Loss 3.8411   LearningRate 0.0393   Epoch: 13   Global Step: 136080   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 01:42:48,960-Speed 5446.47 samples/sec   Loss 3.8720   LearningRate 0.0393   Epoch: 13   Global Step: 136090   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:42:56,596-Speed 5365.08 samples/sec   Loss 3.8593   LearningRate 0.0393   Epoch: 13   Global Step: 136100   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:43:04,179-Speed 5401.76 samples/sec   Loss 3.8350   LearningRate 0.0393   Epoch: 13   Global Step: 136110   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:43:11,733-Speed 5423.17 samples/sec   Loss 3.8244   LearningRate 0.0392   Epoch: 13   Global Step: 136120   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:43:19,207-Speed 5480.55 samples/sec   Loss 3.8324   LearningRate 0.0392   Epoch: 13   Global Step: 136130   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:43:26,679-Speed 5483.25 samples/sec   Loss 3.8185   LearningRate 0.0392   Epoch: 13   Global Step: 136140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:43:34,163-Speed 5473.14 samples/sec   Loss 3.9016   LearningRate 0.0392   Epoch: 13   Global Step: 136150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:43:41,673-Speed 5455.32 samples/sec   Loss 3.8679   LearningRate 0.0392   Epoch: 13   Global Step: 136160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:43:49,228-Speed 5422.11 samples/sec   Loss 3.8072   LearningRate 0.0392   Epoch: 13   Global Step: 136170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:43:56,768-Speed 5432.92 samples/sec   Loss 3.7863   LearningRate 0.0392   Epoch: 13   Global Step: 136180   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:44:04,308-Speed 5433.53 samples/sec   Loss 3.7997   LearningRate 0.0392   Epoch: 13   Global Step: 136190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 01:44:11,873-Speed 5414.34 samples/sec   Loss 3.7990   LearningRate 0.0392   Epoch: 13   Global Step: 136200   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:44:19,451-Speed 5406.06 samples/sec   Loss 3.8798   LearningRate 0.0392   Epoch: 13   Global Step: 136210   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:44:26,967-Speed 5450.71 samples/sec   Loss 3.8298   LearningRate 0.0391   Epoch: 13   Global Step: 136220   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:44:34,414-Speed 5500.85 samples/sec   Loss 3.8189   LearningRate 0.0391   Epoch: 13   Global Step: 136230   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:44:41,971-Speed 5420.48 samples/sec   Loss 3.8435   LearningRate 0.0391   Epoch: 13   Global Step: 136240   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:44:49,489-Speed 5449.21 samples/sec   Loss 3.8322   LearningRate 0.0391   Epoch: 13   Global Step: 136250   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:44:57,081-Speed 5396.47 samples/sec   Loss 3.7697   LearningRate 0.0391   Epoch: 13   Global Step: 136260   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:45:04,699-Speed 5377.30 samples/sec   Loss 3.8239   LearningRate 0.0391   Epoch: 13   Global Step: 136270   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:45:12,200-Speed 5461.03 samples/sec   Loss 3.8464   LearningRate 0.0391   Epoch: 13   Global Step: 136280   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:45:19,746-Speed 5428.71 samples/sec   Loss 3.8626   LearningRate 0.0391   Epoch: 13   Global Step: 136290   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:45:27,311-Speed 5415.87 samples/sec   Loss 3.7912   LearningRate 0.0391   Epoch: 13   Global Step: 136300   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 01:45:34,753-Speed 5504.22 samples/sec   Loss 3.8404   LearningRate 0.0390   Epoch: 13   Global Step: 136310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:45:42,295-Speed 5431.51 samples/sec   Loss 3.8562   LearningRate 0.0390   Epoch: 13   Global Step: 136320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:45:49,824-Speed 5441.44 samples/sec   Loss 3.8084   LearningRate 0.0390   Epoch: 13   Global Step: 136330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:45:57,325-Speed 5461.01 samples/sec   Loss 3.7875   LearningRate 0.0390   Epoch: 13   Global Step: 136340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:46:05,030-Speed 5317.42 samples/sec   Loss 3.7794   LearningRate 0.0390   Epoch: 13   Global Step: 136350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:46:12,580-Speed 5425.71 samples/sec   Loss 3.8117   LearningRate 0.0390   Epoch: 13   Global Step: 136360   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:46:20,112-Speed 5438.96 samples/sec   Loss 3.8303   LearningRate 0.0390   Epoch: 13   Global Step: 136370   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:46:27,825-Speed 5311.35 samples/sec   Loss 3.8259   LearningRate 0.0390   Epoch: 13   Global Step: 136380   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:46:35,446-Speed 5374.84 samples/sec   Loss 3.8229   LearningRate 0.0390   Epoch: 13   Global Step: 136390   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:46:43,017-Speed 5410.68 samples/sec   Loss 3.7763   LearningRate 0.0389   Epoch: 13   Global Step: 136400   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:46:50,561-Speed 5430.22 samples/sec   Loss 3.7874   LearningRate 0.0389   Epoch: 13   Global Step: 136410   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 01:46:58,158-Speed 5392.30 samples/sec   Loss 3.7955   LearningRate 0.0389   Epoch: 13   Global Step: 136420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:47:05,658-Speed 5462.44 samples/sec   Loss 3.8067   LearningRate 0.0389   Epoch: 13   Global Step: 136430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:47:13,217-Speed 5418.80 samples/sec   Loss 3.8069   LearningRate 0.0389   Epoch: 13   Global Step: 136440   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:47:20,742-Speed 5444.18 samples/sec   Loss 3.8331   LearningRate 0.0389   Epoch: 13   Global Step: 136450   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:47:28,303-Speed 5417.71 samples/sec   Loss 3.8163   LearningRate 0.0389   Epoch: 13   Global Step: 136460   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:47:35,821-Speed 5448.99 samples/sec   Loss 3.8264   LearningRate 0.0389   Epoch: 13   Global Step: 136470   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:47:43,343-Speed 5446.14 samples/sec   Loss 3.7909   LearningRate 0.0389   Epoch: 13   Global Step: 136480   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:47:50,946-Speed 5388.06 samples/sec   Loss 3.7887   LearningRate 0.0388   Epoch: 13   Global Step: 136490   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:47:58,560-Speed 5380.05 samples/sec   Loss 3.7912   LearningRate 0.0388   Epoch: 13   Global Step: 136500   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:48:06,086-Speed 5443.45 samples/sec   Loss 3.8010   LearningRate 0.0388   Epoch: 13   Global Step: 136510   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:48:13,628-Speed 5431.42 samples/sec   Loss 3.8236   LearningRate 0.0388   Epoch: 13   Global Step: 136520   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 01:48:21,176-Speed 5427.54 samples/sec   Loss 3.8344   LearningRate 0.0388   Epoch: 13   Global Step: 136530   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:48:28,758-Speed 5402.68 samples/sec   Loss 3.8310   LearningRate 0.0388   Epoch: 13   Global Step: 136540   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:48:36,301-Speed 5431.45 samples/sec   Loss 3.8233   LearningRate 0.0388   Epoch: 13   Global Step: 136550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:48:43,876-Speed 5407.75 samples/sec   Loss 3.8213   LearningRate 0.0388   Epoch: 13   Global Step: 136560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:48:51,428-Speed 5424.59 samples/sec   Loss 3.8000   LearningRate 0.0388   Epoch: 13   Global Step: 136570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:48:58,944-Speed 5449.73 samples/sec   Loss 3.7620   LearningRate 0.0387   Epoch: 13   Global Step: 136580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:49:06,493-Speed 5427.37 samples/sec   Loss 3.8281   LearningRate 0.0387   Epoch: 13   Global Step: 136590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:49:14,029-Speed 5435.50 samples/sec   Loss 3.7824   LearningRate 0.0387   Epoch: 13   Global Step: 136600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:49:21,501-Speed 5482.42 samples/sec   Loss 3.8093   LearningRate 0.0387   Epoch: 13   Global Step: 136610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:49:29,064-Speed 5416.36 samples/sec   Loss 3.8308   LearningRate 0.0387   Epoch: 13   Global Step: 136620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:49:36,566-Speed 5460.98 samples/sec   Loss 3.7947   LearningRate 0.0387   Epoch: 13   Global Step: 136630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:49:44,099-Speed 5437.64 samples/sec   Loss 3.7933   LearningRate 0.0387   Epoch: 13   Global Step: 136640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:49:51,648-Speed 5427.11 samples/sec   Loss 3.7504   LearningRate 0.0387   Epoch: 13   Global Step: 136650   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:49:59,204-Speed 5421.25 samples/sec   Loss 3.7847   LearningRate 0.0387   Epoch: 13   Global Step: 136660   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:50:06,714-Speed 5454.79 samples/sec   Loss 3.8403   LearningRate 0.0386   Epoch: 13   Global Step: 136670   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:50:14,251-Speed 5434.80 samples/sec   Loss 3.7889   LearningRate 0.0386   Epoch: 13   Global Step: 136680   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:50:21,820-Speed 5412.82 samples/sec   Loss 3.8198   LearningRate 0.0386   Epoch: 13   Global Step: 136690   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:50:29,423-Speed 5387.31 samples/sec   Loss 3.8202   LearningRate 0.0386   Epoch: 13   Global Step: 136700   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:50:36,955-Speed 5440.95 samples/sec   Loss 3.8433   LearningRate 0.0386   Epoch: 13   Global Step: 136710   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:50:44,670-Speed 5309.50 samples/sec   Loss 3.7806   LearningRate 0.0386   Epoch: 13   Global Step: 136720   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:50:52,323-Speed 5352.70 samples/sec   Loss 3.8234   LearningRate 0.0386   Epoch: 13   Global Step: 136730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:50:59,888-Speed 5414.71 samples/sec   Loss 3.7830   LearningRate 0.0386   Epoch: 13   Global Step: 136740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:51:07,433-Speed 5430.06 samples/sec   Loss 3.7992   LearningRate 0.0386   Epoch: 13   Global Step: 136750   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:51:14,989-Speed 5420.96 samples/sec   Loss 3.7990   LearningRate 0.0385   Epoch: 13   Global Step: 136760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:51:22,451-Speed 5490.29 samples/sec   Loss 3.7776   LearningRate 0.0385   Epoch: 13   Global Step: 136770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:51:30,102-Speed 5353.54 samples/sec   Loss 3.8365   LearningRate 0.0385   Epoch: 13   Global Step: 136780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:51:37,735-Speed 5367.15 samples/sec   Loss 3.8325   LearningRate 0.0385   Epoch: 13   Global Step: 136790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:51:45,414-Speed 5334.91 samples/sec   Loss 3.7528   LearningRate 0.0385   Epoch: 13   Global Step: 136800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:51:53,019-Speed 5387.00 samples/sec   Loss 3.8177   LearningRate 0.0385   Epoch: 13   Global Step: 136810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:52:00,562-Speed 5430.22 samples/sec   Loss 3.8049   LearningRate 0.0385   Epoch: 13   Global Step: 136820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:52:08,068-Speed 5457.67 samples/sec   Loss 3.8293   LearningRate 0.0385   Epoch: 13   Global Step: 136830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:52:15,662-Speed 5394.74 samples/sec   Loss 3.7654   LearningRate 0.0385   Epoch: 13   Global Step: 136840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:52:23,186-Speed 5444.17 samples/sec   Loss 3.7501   LearningRate 0.0384   Epoch: 13   Global Step: 136850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:52:30,702-Speed 5450.62 samples/sec   Loss 3.8242   LearningRate 0.0384   Epoch: 13   Global Step: 136860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:52:38,233-Speed 5439.35 samples/sec   Loss 3.7520   LearningRate 0.0384   Epoch: 13   Global Step: 136870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:52:45,822-Speed 5397.66 samples/sec   Loss 3.8127   LearningRate 0.0384   Epoch: 13   Global Step: 136880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:52:53,551-Speed 5300.57 samples/sec   Loss 3.8179   LearningRate 0.0384   Epoch: 13   Global Step: 136890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:53:01,093-Speed 5431.28 samples/sec   Loss 3.7895   LearningRate 0.0384   Epoch: 13   Global Step: 136900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:53:08,692-Speed 5391.07 samples/sec   Loss 3.8253   LearningRate 0.0384   Epoch: 13   Global Step: 136910   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:53:16,190-Speed 5463.79 samples/sec   Loss 3.7737   LearningRate 0.0384   Epoch: 13   Global Step: 136920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:53:23,741-Speed 5425.07 samples/sec   Loss 3.7704   LearningRate 0.0384   Epoch: 13   Global Step: 136930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:53:31,262-Speed 5446.38 samples/sec   Loss 3.8242   LearningRate 0.0384   Epoch: 13   Global Step: 136940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:53:38,793-Speed 5439.71 samples/sec   Loss 3.7498   LearningRate 0.0383   Epoch: 13   Global Step: 136950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:53:46,347-Speed 5422.94 samples/sec   Loss 3.7327   LearningRate 0.0383   Epoch: 13   Global Step: 136960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:53:53,924-Speed 5406.45 samples/sec   Loss 3.7613   LearningRate 0.0383   Epoch: 13   Global Step: 136970   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 01:54:01,522-Speed 5391.17 samples/sec   Loss 3.8032   LearningRate 0.0383   Epoch: 13   Global Step: 136980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:54:09,090-Speed 5413.09 samples/sec   Loss 3.7838   LearningRate 0.0383   Epoch: 13   Global Step: 136990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:54:16,681-Speed 5397.13 samples/sec   Loss 3.7543   LearningRate 0.0383   Epoch: 13   Global Step: 137000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:54:24,324-Speed 5359.28 samples/sec   Loss 3.7634   LearningRate 0.0383   Epoch: 13   Global Step: 137010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:54:31,887-Speed 5416.81 samples/sec   Loss 3.7730   LearningRate 0.0383   Epoch: 13   Global Step: 137020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:54:39,345-Speed 5492.29 samples/sec   Loss 3.8405   LearningRate 0.0383   Epoch: 13   Global Step: 137030   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:54:46,908-Speed 5416.63 samples/sec   Loss 3.7769   LearningRate 0.0382   Epoch: 13   Global Step: 137040   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:54:54,512-Speed 5388.65 samples/sec   Loss 3.8275   LearningRate 0.0382   Epoch: 13   Global Step: 137050   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:55:02,058-Speed 5428.28 samples/sec   Loss 3.7618   LearningRate 0.0382   Epoch: 13   Global Step: 137060   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:55:09,590-Speed 5438.57 samples/sec   Loss 3.8272   LearningRate 0.0382   Epoch: 13   Global Step: 137070   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:55:17,118-Speed 5442.10 samples/sec   Loss 3.7763   LearningRate 0.0382   Epoch: 13   Global Step: 137080   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:55:24,656-Speed 5434.77 samples/sec   Loss 3.7953   LearningRate 0.0382   Epoch: 13   Global Step: 137090   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:55:32,263-Speed 5384.67 samples/sec   Loss 3.8016   LearningRate 0.0382   Epoch: 13   Global Step: 137100   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:55:39,765-Speed 5460.64 samples/sec   Loss 3.7854   LearningRate 0.0382   Epoch: 13   Global Step: 137110   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:55:47,239-Speed 5481.04 samples/sec   Loss 3.7374   LearningRate 0.0382   Epoch: 13   Global Step: 137120   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:55:54,802-Speed 5417.16 samples/sec   Loss 3.7691   LearningRate 0.0381   Epoch: 13   Global Step: 137130   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:56:02,347-Speed 5429.15 samples/sec   Loss 3.7527   LearningRate 0.0381   Epoch: 13   Global Step: 137140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:56:09,890-Speed 5430.53 samples/sec   Loss 3.7479   LearningRate 0.0381   Epoch: 13   Global Step: 137150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:56:17,480-Speed 5397.65 samples/sec   Loss 3.7705   LearningRate 0.0381   Epoch: 13   Global Step: 137160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:56:25,135-Speed 5351.15 samples/sec   Loss 3.8179   LearningRate 0.0381   Epoch: 13   Global Step: 137170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 01:56:32,618-Speed 5474.49 samples/sec   Loss 3.7625   LearningRate 0.0381   Epoch: 13   Global Step: 137180   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:56:40,208-Speed 5396.76 samples/sec   Loss 3.7926   LearningRate 0.0381   Epoch: 13   Global Step: 137190   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:56:47,764-Speed 5422.18 samples/sec   Loss 3.7390   LearningRate 0.0381   Epoch: 13   Global Step: 137200   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:56:55,332-Speed 5413.15 samples/sec   Loss 3.8115   LearningRate 0.0381   Epoch: 13   Global Step: 137210   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:57:02,801-Speed 5484.76 samples/sec   Loss 3.7665   LearningRate 0.0380   Epoch: 13   Global Step: 137220   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:57:10,352-Speed 5425.05 samples/sec   Loss 3.7756   LearningRate 0.0380   Epoch: 13   Global Step: 137230   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:57:17,959-Speed 5384.88 samples/sec   Loss 3.7745   LearningRate 0.0380   Epoch: 13   Global Step: 137240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:57:25,460-Speed 5461.68 samples/sec   Loss 3.7997   LearningRate 0.0380   Epoch: 13   Global Step: 137250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:57:32,939-Speed 5476.98 samples/sec   Loss 3.7982   LearningRate 0.0380   Epoch: 13   Global Step: 137260   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-09 01:57:40,418-Speed 5477.27 samples/sec   Loss 3.7780   LearningRate 0.0380   Epoch: 13   Global Step: 137270   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-09 01:57:47,933-Speed 5451.64 samples/sec   Loss 3.7934   LearningRate 0.0380   Epoch: 13   Global Step: 137280   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-09 01:57:55,442-Speed 5455.78 samples/sec   Loss 3.7438   LearningRate 0.0380   Epoch: 13   Global Step: 137290   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-09 01:58:03,036-Speed 5393.87 samples/sec   Loss 3.7614   LearningRate 0.0380   Epoch: 13   Global Step: 137300   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-09 01:58:10,527-Speed 5468.69 samples/sec   Loss 3.7399   LearningRate 0.0379   Epoch: 13   Global Step: 137310   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-09 01:58:18,091-Speed 5415.52 samples/sec   Loss 3.7388   LearningRate 0.0379   Epoch: 13   Global Step: 137320   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-09 01:58:25,051-Speed 5886.10 samples/sec   Loss 3.7954   LearningRate 0.0379   Epoch: 13   Global Step: 137330   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-09 01:58:32,361-Speed 5604.30 samples/sec   Loss 3.7850   LearningRate 0.0379   Epoch: 13   Global Step: 137340   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-09 01:58:39,852-Speed 5468.44 samples/sec   Loss 3.7401   LearningRate 0.0379   Epoch: 13   Global Step: 137350   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-09 01:58:47,399-Speed 5428.26 samples/sec   Loss 3.7597   LearningRate 0.0379   Epoch: 13   Global Step: 137360   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:58:54,900-Speed 5460.82 samples/sec   Loss 3.7965   LearningRate 0.0379   Epoch: 13   Global Step: 137370   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:59:02,535-Speed 5365.79 samples/sec   Loss 3.7451   LearningRate 0.0379   Epoch: 13   Global Step: 137380   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:59:10,056-Speed 5446.20 samples/sec   Loss 3.7975   LearningRate 0.0379   Epoch: 13   Global Step: 137390   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:59:17,717-Speed 5347.91 samples/sec   Loss 3.7738   LearningRate 0.0379   Epoch: 13   Global Step: 137400   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:59:25,205-Speed 5470.86 samples/sec   Loss 3.7810   LearningRate 0.0378   Epoch: 13   Global Step: 137410   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:59:32,680-Speed 5480.43 samples/sec   Loss 3.7118   LearningRate 0.0378   Epoch: 13   Global Step: 137420   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:59:40,283-Speed 5387.47 samples/sec   Loss 3.7303   LearningRate 0.0378   Epoch: 13   Global Step: 137430   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:59:47,802-Speed 5448.62 samples/sec   Loss 3.7739   LearningRate 0.0378   Epoch: 13   Global Step: 137440   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 01:59:55,271-Speed 5484.67 samples/sec   Loss 3.7598   LearningRate 0.0378   Epoch: 13   Global Step: 137450   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:00:02,794-Speed 5444.94 samples/sec   Loss 3.7184   LearningRate 0.0378   Epoch: 13   Global Step: 137460   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:00:10,352-Speed 5420.64 samples/sec   Loss 3.7208   LearningRate 0.0378   Epoch: 13   Global Step: 137470   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:00:17,841-Speed 5469.57 samples/sec   Loss 3.7314   LearningRate 0.0378   Epoch: 13   Global Step: 137480   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:00:25,345-Speed 5459.20 samples/sec   Loss 3.7859   LearningRate 0.0378   Epoch: 13   Global Step: 137490   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:00:32,827-Speed 5475.12 samples/sec   Loss 3.7727   LearningRate 0.0377   Epoch: 13   Global Step: 137500   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:00:40,398-Speed 5411.18 samples/sec   Loss 3.7118   LearningRate 0.0377   Epoch: 13   Global Step: 137510   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:00:47,871-Speed 5481.41 samples/sec   Loss 3.7010   LearningRate 0.0377   Epoch: 13   Global Step: 137520   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:00:55,385-Speed 5451.88 samples/sec   Loss 3.7792   LearningRate 0.0377   Epoch: 13   Global Step: 137530   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:01:02,889-Speed 5459.53 samples/sec   Loss 3.7673   LearningRate 0.0377   Epoch: 13   Global Step: 137540   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:01:10,385-Speed 5465.01 samples/sec   Loss 3.7796   LearningRate 0.0377   Epoch: 13   Global Step: 137550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:01:17,903-Speed 5448.49 samples/sec   Loss 3.7758   LearningRate 0.0377   Epoch: 13   Global Step: 137560   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 02:01:25,463-Speed 5419.08 samples/sec   Loss 3.7572   LearningRate 0.0377   Epoch: 13   Global Step: 137570   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 02:01:32,985-Speed 5446.76 samples/sec   Loss 3.7325   LearningRate 0.0377   Epoch: 13   Global Step: 137580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:01:40,627-Speed 5359.71 samples/sec   Loss 3.7123   LearningRate 0.0376   Epoch: 13   Global Step: 137590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:01:48,195-Speed 5413.27 samples/sec   Loss 3.7107   LearningRate 0.0376   Epoch: 13   Global Step: 137600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:01:55,692-Speed 5464.25 samples/sec   Loss 3.7445   LearningRate 0.0376   Epoch: 13   Global Step: 137610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:02:03,190-Speed 5463.95 samples/sec   Loss 3.7721   LearningRate 0.0376   Epoch: 13   Global Step: 137620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:02:10,665-Speed 5479.96 samples/sec   Loss 3.7249   LearningRate 0.0376   Epoch: 13   Global Step: 137630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:02:18,217-Speed 5424.66 samples/sec   Loss 3.7094   LearningRate 0.0376   Epoch: 13   Global Step: 137640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:02:25,805-Speed 5398.36 samples/sec   Loss 3.7568   LearningRate 0.0376   Epoch: 13   Global Step: 137650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:02:33,291-Speed 5472.56 samples/sec   Loss 3.7551   LearningRate 0.0376   Epoch: 13   Global Step: 137660   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:02:40,810-Speed 5448.47 samples/sec   Loss 3.7704   LearningRate 0.0376   Epoch: 13   Global Step: 137670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:02:48,408-Speed 5390.98 samples/sec   Loss 3.7248   LearningRate 0.0375   Epoch: 13   Global Step: 137680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 02:02:55,969-Speed 5418.26 samples/sec   Loss 3.8182   LearningRate 0.0375   Epoch: 13   Global Step: 137690   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:03:03,535-Speed 5414.84 samples/sec   Loss 3.7509   LearningRate 0.0375   Epoch: 13   Global Step: 137700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:03:11,218-Speed 5331.93 samples/sec   Loss 3.7524   LearningRate 0.0375   Epoch: 13   Global Step: 137710   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:03:18,831-Speed 5380.93 samples/sec   Loss 3.6958   LearningRate 0.0375   Epoch: 13   Global Step: 137720   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:03:26,369-Speed 5434.92 samples/sec   Loss 3.7362   LearningRate 0.0375   Epoch: 13   Global Step: 137730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:03:33,816-Speed 5500.30 samples/sec   Loss 3.7946   LearningRate 0.0375   Epoch: 13   Global Step: 137740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:03:41,482-Speed 5343.69 samples/sec   Loss 3.7402   LearningRate 0.0375   Epoch: 13   Global Step: 137750   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:03:48,994-Speed 5453.69 samples/sec   Loss 3.7114   LearningRate 0.0375   Epoch: 13   Global Step: 137760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:03:56,560-Speed 5414.10 samples/sec   Loss 3.7470   LearningRate 0.0375   Epoch: 13   Global Step: 137770   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:04:04,188-Speed 5370.90 samples/sec   Loss 3.7073   LearningRate 0.0374   Epoch: 13   Global Step: 137780   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:04:11,650-Speed 5489.87 samples/sec   Loss 3.7406   LearningRate 0.0374   Epoch: 13   Global Step: 137790   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 02:04:19,207-Speed 5420.87 samples/sec   Loss 3.7449   LearningRate 0.0374   Epoch: 13   Global Step: 137800   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 02:04:26,956-Speed 5286.48 samples/sec   Loss 3.7804   LearningRate 0.0374   Epoch: 13   Global Step: 137810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:04:34,646-Speed 5326.84 samples/sec   Loss 3.7159   LearningRate 0.0374   Epoch: 13   Global Step: 137820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:04:42,284-Speed 5363.22 samples/sec   Loss 3.7583   LearningRate 0.0374   Epoch: 13   Global Step: 137830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:04:49,757-Speed 5481.51 samples/sec   Loss 3.7238   LearningRate 0.0374   Epoch: 13   Global Step: 137840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:04:57,256-Speed 5463.15 samples/sec   Loss 3.7821   LearningRate 0.0374   Epoch: 13   Global Step: 137850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:05:04,813-Speed 5421.45 samples/sec   Loss 3.7683   LearningRate 0.0374   Epoch: 13   Global Step: 137860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:05:12,421-Speed 5384.16 samples/sec   Loss 3.7891   LearningRate 0.0373   Epoch: 13   Global Step: 137870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:05:20,042-Speed 5375.24 samples/sec   Loss 3.7321   LearningRate 0.0373   Epoch: 13   Global Step: 137880   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:05:27,747-Speed 5316.91 samples/sec   Loss 3.7541   LearningRate 0.0373   Epoch: 13   Global Step: 137890   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:05:35,364-Speed 5378.46 samples/sec   Loss 3.6941   LearningRate 0.0373   Epoch: 13   Global Step: 137900   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:05:42,847-Speed 5473.60 samples/sec   Loss 3.7567   LearningRate 0.0373   Epoch: 13   Global Step: 137910   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:05:50,428-Speed 5403.90 samples/sec   Loss 3.7224   LearningRate 0.0373   Epoch: 13   Global Step: 137920   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:05:58,054-Speed 5372.21 samples/sec   Loss 3.7568   LearningRate 0.0373   Epoch: 13   Global Step: 137930   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:06:05,639-Speed 5400.69 samples/sec   Loss 3.7639   LearningRate 0.0373   Epoch: 13   Global Step: 137940   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:06:13,165-Speed 5443.08 samples/sec   Loss 3.7704   LearningRate 0.0373   Epoch: 13   Global Step: 137950   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:06:20,733-Speed 5412.83 samples/sec   Loss 3.7371   LearningRate 0.0372   Epoch: 13   Global Step: 137960   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:06:28,268-Speed 5436.46 samples/sec   Loss 3.7566   LearningRate 0.0372   Epoch: 13   Global Step: 137970   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:06:35,844-Speed 5407.27 samples/sec   Loss 3.7516   LearningRate 0.0372   Epoch: 13   Global Step: 137980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:06:43,398-Speed 5423.38 samples/sec   Loss 3.7388   LearningRate 0.0372   Epoch: 13   Global Step: 137990   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:06:50,948-Speed 5425.56 samples/sec   Loss 3.7260   LearningRate 0.0372   Epoch: 13   Global Step: 138000   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:07:35,296-[lfw][138000]XNorm: 22.762273
Training: 2022-01-09 02:07:35,297-[lfw][138000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-01-09 02:07:35,298-[lfw][138000]Accuracy-Highest: 0.99817
Training: 2022-01-09 02:08:26,870-[cfp_fp][138000]XNorm: 21.471200
Training: 2022-01-09 02:08:26,870-[cfp_fp][138000]Accuracy-Flip: 0.99186+-0.00367
Training: 2022-01-09 02:08:26,871-[cfp_fp][138000]Accuracy-Highest: 0.99186
Training: 2022-01-09 02:09:11,247-[agedb_30][138000]XNorm: 22.569319
Training: 2022-01-09 02:09:11,248-[agedb_30][138000]Accuracy-Flip: 0.97950+-0.00843
Training: 2022-01-09 02:09:11,249-[agedb_30][138000]Accuracy-Highest: 0.98067
Training: 2022-01-09 02:09:18,796-Speed 277.04 samples/sec   Loss 3.7616   LearningRate 0.0372   Epoch: 13   Global Step: 138010   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:09:26,240-Speed 5503.08 samples/sec   Loss 3.7489   LearningRate 0.0372   Epoch: 13   Global Step: 138020   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:09:33,823-Speed 5402.14 samples/sec   Loss 3.7245   LearningRate 0.0372   Epoch: 13   Global Step: 138030   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:09:41,432-Speed 5384.40 samples/sec   Loss 3.6847   LearningRate 0.0372   Epoch: 13   Global Step: 138040   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:09:48,876-Speed 5502.40 samples/sec   Loss 3.6946   LearningRate 0.0372   Epoch: 13   Global Step: 138050   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:09:56,569-Speed 5325.41 samples/sec   Loss 3.7292   LearningRate 0.0371   Epoch: 13   Global Step: 138060   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:10:04,118-Speed 5426.26 samples/sec   Loss 3.7400   LearningRate 0.0371   Epoch: 13   Global Step: 138070   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:10:11,702-Speed 5402.14 samples/sec   Loss 3.7122   LearningRate 0.0371   Epoch: 13   Global Step: 138080   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:10:19,330-Speed 5370.23 samples/sec   Loss 3.7222   LearningRate 0.0371   Epoch: 13   Global Step: 138090   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:10:26,896-Speed 5414.39 samples/sec   Loss 3.7063   LearningRate 0.0371   Epoch: 13   Global Step: 138100   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:10:34,368-Speed 5481.77 samples/sec   Loss 3.7568   LearningRate 0.0371   Epoch: 13   Global Step: 138110   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:10:41,876-Speed 5456.68 samples/sec   Loss 3.7299   LearningRate 0.0371   Epoch: 13   Global Step: 138120   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:10:49,375-Speed 5462.38 samples/sec   Loss 3.7033   LearningRate 0.0371   Epoch: 13   Global Step: 138130   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:10:56,883-Speed 5456.62 samples/sec   Loss 3.7380   LearningRate 0.0371   Epoch: 13   Global Step: 138140   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:11:04,345-Speed 5489.56 samples/sec   Loss 3.7189   LearningRate 0.0370   Epoch: 13   Global Step: 138150   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:11:11,868-Speed 5445.91 samples/sec   Loss 3.6471   LearningRate 0.0370   Epoch: 13   Global Step: 138160   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:11:19,431-Speed 5416.77 samples/sec   Loss 3.7136   LearningRate 0.0370   Epoch: 13   Global Step: 138170   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:11:26,901-Speed 5483.45 samples/sec   Loss 3.7324   LearningRate 0.0370   Epoch: 13   Global Step: 138180   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:11:34,490-Speed 5398.12 samples/sec   Loss 3.7531   LearningRate 0.0370   Epoch: 13   Global Step: 138190   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:11:41,971-Speed 5475.87 samples/sec   Loss 3.7238   LearningRate 0.0370   Epoch: 13   Global Step: 138200   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:11:49,591-Speed 5375.80 samples/sec   Loss 3.7240   LearningRate 0.0370   Epoch: 13   Global Step: 138210   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:11:57,036-Speed 5502.67 samples/sec   Loss 3.7188   LearningRate 0.0370   Epoch: 13   Global Step: 138220   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:12:04,560-Speed 5444.31 samples/sec   Loss 3.7506   LearningRate 0.0370   Epoch: 13   Global Step: 138230   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:12:12,113-Speed 5424.55 samples/sec   Loss 3.6995   LearningRate 0.0369   Epoch: 13   Global Step: 138240   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:12:19,651-Speed 5434.46 samples/sec   Loss 3.7111   LearningRate 0.0369   Epoch: 13   Global Step: 138250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:12:27,106-Speed 5495.15 samples/sec   Loss 3.6959   LearningRate 0.0369   Epoch: 13   Global Step: 138260   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:12:34,622-Speed 5450.26 samples/sec   Loss 3.6966   LearningRate 0.0369   Epoch: 13   Global Step: 138270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:12:42,143-Speed 5446.87 samples/sec   Loss 3.6945   LearningRate 0.0369   Epoch: 13   Global Step: 138280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:12:49,658-Speed 5450.65 samples/sec   Loss 3.6992   LearningRate 0.0369   Epoch: 13   Global Step: 138290   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:12:57,172-Speed 5451.88 samples/sec   Loss 3.7443   LearningRate 0.0369   Epoch: 13   Global Step: 138300   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:13:04,709-Speed 5435.24 samples/sec   Loss 3.7014   LearningRate 0.0369   Epoch: 13   Global Step: 138310   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:13:12,259-Speed 5426.48 samples/sec   Loss 3.6649   LearningRate 0.0369   Epoch: 13   Global Step: 138320   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:13:19,848-Speed 5397.63 samples/sec   Loss 3.7135   LearningRate 0.0369   Epoch: 13   Global Step: 138330   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:13:27,382-Speed 5437.54 samples/sec   Loss 3.6951   LearningRate 0.0368   Epoch: 13   Global Step: 138340   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:13:34,915-Speed 5437.95 samples/sec   Loss 3.7134   LearningRate 0.0368   Epoch: 13   Global Step: 138350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:13:42,420-Speed 5457.99 samples/sec   Loss 3.6808   LearningRate 0.0368   Epoch: 13   Global Step: 138360   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:13:49,920-Speed 5462.81 samples/sec   Loss 3.7418   LearningRate 0.0368   Epoch: 13   Global Step: 138370   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:13:57,464-Speed 5429.70 samples/sec   Loss 3.6837   LearningRate 0.0368   Epoch: 13   Global Step: 138380   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:14:05,140-Speed 5337.60 samples/sec   Loss 3.6750   LearningRate 0.0368   Epoch: 13   Global Step: 138390   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:14:12,642-Speed 5459.88 samples/sec   Loss 3.7820   LearningRate 0.0368   Epoch: 13   Global Step: 138400   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:14:20,148-Speed 5458.30 samples/sec   Loss 3.7148   LearningRate 0.0368   Epoch: 13   Global Step: 138410   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:14:27,697-Speed 5426.43 samples/sec   Loss 3.7586   LearningRate 0.0368   Epoch: 13   Global Step: 138420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:14:35,204-Speed 5456.77 samples/sec   Loss 3.7163   LearningRate 0.0367   Epoch: 13   Global Step: 138430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:14:42,774-Speed 5411.69 samples/sec   Loss 3.7230   LearningRate 0.0367   Epoch: 13   Global Step: 138440   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:14:50,356-Speed 5403.08 samples/sec   Loss 3.7048   LearningRate 0.0367   Epoch: 13   Global Step: 138450   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 02:14:57,853-Speed 5463.77 samples/sec   Loss 3.7431   LearningRate 0.0367   Epoch: 13   Global Step: 138460   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 02:15:05,370-Speed 5449.76 samples/sec   Loss 3.6964   LearningRate 0.0367   Epoch: 13   Global Step: 138470   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:15:12,871-Speed 5461.61 samples/sec   Loss 3.6810   LearningRate 0.0367   Epoch: 13   Global Step: 138480   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:15:20,392-Speed 5447.06 samples/sec   Loss 3.6582   LearningRate 0.0367   Epoch: 13   Global Step: 138490   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:15:28,111-Speed 5306.83 samples/sec   Loss 3.7125   LearningRate 0.0367   Epoch: 13   Global Step: 138500   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:15:35,609-Speed 5463.47 samples/sec   Loss 3.7341   LearningRate 0.0367   Epoch: 13   Global Step: 138510   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:15:43,203-Speed 5394.08 samples/sec   Loss 3.7314   LearningRate 0.0367   Epoch: 13   Global Step: 138520   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:15:50,727-Speed 5445.16 samples/sec   Loss 3.6947   LearningRate 0.0366   Epoch: 13   Global Step: 138530   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:15:58,305-Speed 5405.62 samples/sec   Loss 3.7145   LearningRate 0.0366   Epoch: 13   Global Step: 138540   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:16:05,920-Speed 5379.12 samples/sec   Loss 3.6691   LearningRate 0.0366   Epoch: 13   Global Step: 138550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:16:13,588-Speed 5342.65 samples/sec   Loss 3.7316   LearningRate 0.0366   Epoch: 13   Global Step: 138560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:16:21,231-Speed 5360.46 samples/sec   Loss 3.6982   LearningRate 0.0366   Epoch: 13   Global Step: 138570   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 02:16:28,839-Speed 5384.31 samples/sec   Loss 3.7210   LearningRate 0.0366   Epoch: 13   Global Step: 138580   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 02:16:36,314-Speed 5480.55 samples/sec   Loss 3.7602   LearningRate 0.0366   Epoch: 13   Global Step: 138590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:16:43,897-Speed 5401.77 samples/sec   Loss 3.7083   LearningRate 0.0366   Epoch: 13   Global Step: 138600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:16:51,557-Speed 5348.53 samples/sec   Loss 3.7008   LearningRate 0.0366   Epoch: 13   Global Step: 138610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:16:59,146-Speed 5397.67 samples/sec   Loss 3.6501   LearningRate 0.0365   Epoch: 13   Global Step: 138620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:17:06,620-Speed 5481.52 samples/sec   Loss 3.6647   LearningRate 0.0365   Epoch: 13   Global Step: 138630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:17:14,143-Speed 5445.18 samples/sec   Loss 3.6740   LearningRate 0.0365   Epoch: 13   Global Step: 138640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:17:21,673-Speed 5439.74 samples/sec   Loss 3.7293   LearningRate 0.0365   Epoch: 13   Global Step: 138650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:17:29,188-Speed 5451.26 samples/sec   Loss 3.7128   LearningRate 0.0365   Epoch: 13   Global Step: 138660   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:17:36,699-Speed 5454.25 samples/sec   Loss 3.7042   LearningRate 0.0365   Epoch: 13   Global Step: 138670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:17:44,179-Speed 5475.93 samples/sec   Loss 3.7042   LearningRate 0.0365   Epoch: 13   Global Step: 138680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:17:51,646-Speed 5486.35 samples/sec   Loss 3.6834   LearningRate 0.0365   Epoch: 13   Global Step: 138690   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:17:59,111-Speed 5487.93 samples/sec   Loss 3.6884   LearningRate 0.0365   Epoch: 13   Global Step: 138700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:18:06,651-Speed 5432.77 samples/sec   Loss 3.7406   LearningRate 0.0364   Epoch: 13   Global Step: 138710   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:18:14,174-Speed 5445.61 samples/sec   Loss 3.6437   LearningRate 0.0364   Epoch: 13   Global Step: 138720   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:18:21,776-Speed 5388.08 samples/sec   Loss 3.6778   LearningRate 0.0364   Epoch: 13   Global Step: 138730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:18:29,299-Speed 5446.05 samples/sec   Loss 3.6349   LearningRate 0.0364   Epoch: 13   Global Step: 138740   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:18:36,820-Speed 5446.91 samples/sec   Loss 3.7022   LearningRate 0.0364   Epoch: 13   Global Step: 138750   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:18:44,282-Speed 5489.23 samples/sec   Loss 3.7077   LearningRate 0.0364   Epoch: 13   Global Step: 138760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:18:51,797-Speed 5451.21 samples/sec   Loss 3.6776   LearningRate 0.0364   Epoch: 13   Global Step: 138770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:18:59,425-Speed 5370.84 samples/sec   Loss 3.7194   LearningRate 0.0364   Epoch: 13   Global Step: 138780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:19:06,984-Speed 5419.06 samples/sec   Loss 3.6767   LearningRate 0.0364   Epoch: 13   Global Step: 138790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:19:14,504-Speed 5447.36 samples/sec   Loss 3.6582   LearningRate 0.0364   Epoch: 13   Global Step: 138800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:19:22,097-Speed 5395.45 samples/sec   Loss 3.6955   LearningRate 0.0363   Epoch: 13   Global Step: 138810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:19:29,579-Speed 5474.87 samples/sec   Loss 3.6985   LearningRate 0.0363   Epoch: 13   Global Step: 138820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:19:37,156-Speed 5407.39 samples/sec   Loss 3.7107   LearningRate 0.0363   Epoch: 13   Global Step: 138830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:19:44,743-Speed 5398.85 samples/sec   Loss 3.6884   LearningRate 0.0363   Epoch: 13   Global Step: 138840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:19:52,376-Speed 5366.94 samples/sec   Loss 3.6432   LearningRate 0.0363   Epoch: 13   Global Step: 138850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:20:00,101-Speed 5303.88 samples/sec   Loss 3.7118   LearningRate 0.0363   Epoch: 13   Global Step: 138860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:20:07,732-Speed 5368.08 samples/sec   Loss 3.6537   LearningRate 0.0363   Epoch: 13   Global Step: 138870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:20:15,204-Speed 5482.63 samples/sec   Loss 3.7023   LearningRate 0.0363   Epoch: 13   Global Step: 138880   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:20:23,044-Speed 5224.83 samples/sec   Loss 3.6495   LearningRate 0.0363   Epoch: 13   Global Step: 138890   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:20:30,580-Speed 5436.01 samples/sec   Loss 3.6841   LearningRate 0.0362   Epoch: 13   Global Step: 138900   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:20:38,088-Speed 5456.36 samples/sec   Loss 3.7245   LearningRate 0.0362   Epoch: 13   Global Step: 138910   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:20:45,603-Speed 5451.58 samples/sec   Loss 3.6884   LearningRate 0.0362   Epoch: 13   Global Step: 138920   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:20:53,081-Speed 5477.67 samples/sec   Loss 3.6495   LearningRate 0.0362   Epoch: 13   Global Step: 138930   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:21:00,570-Speed 5469.85 samples/sec   Loss 3.6801   LearningRate 0.0362   Epoch: 13   Global Step: 138940   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:21:08,091-Speed 5447.37 samples/sec   Loss 3.6789   LearningRate 0.0362   Epoch: 13   Global Step: 138950   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:21:15,634-Speed 5430.90 samples/sec   Loss 3.6556   LearningRate 0.0362   Epoch: 13   Global Step: 138960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:21:23,185-Speed 5424.95 samples/sec   Loss 3.6550   LearningRate 0.0362   Epoch: 13   Global Step: 138970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:21:30,781-Speed 5392.36 samples/sec   Loss 3.6544   LearningRate 0.0362   Epoch: 13   Global Step: 138980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:21:38,316-Speed 5437.01 samples/sec   Loss 3.7097   LearningRate 0.0362   Epoch: 13   Global Step: 138990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:21:45,828-Speed 5454.08 samples/sec   Loss 3.6797   LearningRate 0.0361   Epoch: 13   Global Step: 139000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:21:53,293-Speed 5486.79 samples/sec   Loss 3.6590   LearningRate 0.0361   Epoch: 13   Global Step: 139010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:22:00,862-Speed 5412.39 samples/sec   Loss 3.6523   LearningRate 0.0361   Epoch: 13   Global Step: 139020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:22:08,332-Speed 5484.52 samples/sec   Loss 3.6375   LearningRate 0.0361   Epoch: 13   Global Step: 139030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:22:15,785-Speed 5496.37 samples/sec   Loss 3.6762   LearningRate 0.0361   Epoch: 13   Global Step: 139040   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:22:23,235-Speed 5498.22 samples/sec   Loss 3.6848   LearningRate 0.0361   Epoch: 13   Global Step: 139050   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:22:30,667-Speed 5512.45 samples/sec   Loss 3.7098   LearningRate 0.0361   Epoch: 13   Global Step: 139060   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:22:38,161-Speed 5466.76 samples/sec   Loss 3.6633   LearningRate 0.0361   Epoch: 13   Global Step: 139070   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:22:45,767-Speed 5385.66 samples/sec   Loss 3.6805   LearningRate 0.0361   Epoch: 13   Global Step: 139080   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:22:53,303-Speed 5435.99 samples/sec   Loss 3.6914   LearningRate 0.0360   Epoch: 13   Global Step: 139090   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:23:00,728-Speed 5516.84 samples/sec   Loss 3.6400   LearningRate 0.0360   Epoch: 13   Global Step: 139100   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:23:08,274-Speed 5428.69 samples/sec   Loss 3.6983   LearningRate 0.0360   Epoch: 13   Global Step: 139110   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:23:15,762-Speed 5471.56 samples/sec   Loss 3.6653   LearningRate 0.0360   Epoch: 13   Global Step: 139120   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:23:23,361-Speed 5390.34 samples/sec   Loss 3.6721   LearningRate 0.0360   Epoch: 13   Global Step: 139130   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:23:30,958-Speed 5392.72 samples/sec   Loss 3.7068   LearningRate 0.0360   Epoch: 13   Global Step: 139140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:23:38,471-Speed 5452.63 samples/sec   Loss 3.6405   LearningRate 0.0360   Epoch: 13   Global Step: 139150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:23:45,957-Speed 5472.25 samples/sec   Loss 3.6292   LearningRate 0.0360   Epoch: 13   Global Step: 139160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:23:53,435-Speed 5478.69 samples/sec   Loss 3.7053   LearningRate 0.0360   Epoch: 13   Global Step: 139170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:24:00,981-Speed 5428.08 samples/sec   Loss 3.7123   LearningRate 0.0360   Epoch: 13   Global Step: 139180   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:24:08,493-Speed 5453.20 samples/sec   Loss 3.6664   LearningRate 0.0359   Epoch: 13   Global Step: 139190   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:24:16,006-Speed 5453.24 samples/sec   Loss 3.6988   LearningRate 0.0359   Epoch: 13   Global Step: 139200   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:24:23,600-Speed 5394.08 samples/sec   Loss 3.6448   LearningRate 0.0359   Epoch: 13   Global Step: 139210   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:24:31,051-Speed 5498.08 samples/sec   Loss 3.6762   LearningRate 0.0359   Epoch: 13   Global Step: 139220   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:24:38,456-Speed 5531.86 samples/sec   Loss 3.6188   LearningRate 0.0359   Epoch: 13   Global Step: 139230   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:24:45,911-Speed 5495.44 samples/sec   Loss 3.6852   LearningRate 0.0359   Epoch: 13   Global Step: 139240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 02:24:53,424-Speed 5452.90 samples/sec   Loss 3.6611   LearningRate 0.0359   Epoch: 13   Global Step: 139250   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:25:00,888-Speed 5487.96 samples/sec   Loss 3.6913   LearningRate 0.0359   Epoch: 13   Global Step: 139260   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:25:08,405-Speed 5450.04 samples/sec   Loss 3.6906   LearningRate 0.0359   Epoch: 13   Global Step: 139270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:25:15,967-Speed 5417.71 samples/sec   Loss 3.5977   LearningRate 0.0358   Epoch: 13   Global Step: 139280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:25:23,641-Speed 5338.18 samples/sec   Loss 3.6501   LearningRate 0.0358   Epoch: 13   Global Step: 139290   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:25:31,225-Speed 5401.44 samples/sec   Loss 3.6496   LearningRate 0.0358   Epoch: 13   Global Step: 139300   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:25:38,819-Speed 5394.48 samples/sec   Loss 3.6912   LearningRate 0.0358   Epoch: 13   Global Step: 139310   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:25:46,356-Speed 5435.36 samples/sec   Loss 3.6132   LearningRate 0.0358   Epoch: 13   Global Step: 139320   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:25:53,803-Speed 5501.04 samples/sec   Loss 3.6394   LearningRate 0.0358   Epoch: 13   Global Step: 139330   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:26:01,349-Speed 5428.60 samples/sec   Loss 3.6457   LearningRate 0.0358   Epoch: 13   Global Step: 139340   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:26:08,914-Speed 5415.28 samples/sec   Loss 3.6285   LearningRate 0.0358   Epoch: 13   Global Step: 139350   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:26:16,492-Speed 5405.94 samples/sec   Loss 3.6744   LearningRate 0.0358   Epoch: 13   Global Step: 139360   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:26:24,091-Speed 5390.58 samples/sec   Loss 3.6693   LearningRate 0.0358   Epoch: 13   Global Step: 139370   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:26:31,555-Speed 5488.98 samples/sec   Loss 3.6636   LearningRate 0.0357   Epoch: 13   Global Step: 139380   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:26:39,106-Speed 5424.68 samples/sec   Loss 3.6742   LearningRate 0.0357   Epoch: 13   Global Step: 139390   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:26:46,629-Speed 5446.03 samples/sec   Loss 3.6588   LearningRate 0.0357   Epoch: 13   Global Step: 139400   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:26:54,146-Speed 5449.58 samples/sec   Loss 3.6797   LearningRate 0.0357   Epoch: 13   Global Step: 139410   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:27:01,777-Speed 5367.79 samples/sec   Loss 3.6311   LearningRate 0.0357   Epoch: 13   Global Step: 139420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:27:09,316-Speed 5433.92 samples/sec   Loss 3.6340   LearningRate 0.0357   Epoch: 13   Global Step: 139430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:27:16,813-Speed 5464.24 samples/sec   Loss 3.6420   LearningRate 0.0357   Epoch: 13   Global Step: 139440   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:27:24,428-Speed 5379.41 samples/sec   Loss 3.6525   LearningRate 0.0357   Epoch: 13   Global Step: 139450   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:27:31,954-Speed 5442.94 samples/sec   Loss 3.6903   LearningRate 0.0357   Epoch: 13   Global Step: 139460   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:27:39,502-Speed 5427.13 samples/sec   Loss 3.6816   LearningRate 0.0356   Epoch: 13   Global Step: 139470   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-09 02:27:47,037-Speed 5437.19 samples/sec   Loss 3.6413   LearningRate 0.0356   Epoch: 13   Global Step: 139480   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:27:54,775-Speed 5294.00 samples/sec   Loss 3.6058   LearningRate 0.0356   Epoch: 13   Global Step: 139490   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:28:02,383-Speed 5384.59 samples/sec   Loss 3.5981   LearningRate 0.0356   Epoch: 13   Global Step: 139500   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:28:09,961-Speed 5405.62 samples/sec   Loss 3.6586   LearningRate 0.0356   Epoch: 13   Global Step: 139510   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:28:17,528-Speed 5414.15 samples/sec   Loss 3.6577   LearningRate 0.0356   Epoch: 13   Global Step: 139520   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:28:25,071-Speed 5430.88 samples/sec   Loss 3.6765   LearningRate 0.0356   Epoch: 13   Global Step: 139530   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:28:32,674-Speed 5388.28 samples/sec   Loss 3.6450   LearningRate 0.0356   Epoch: 13   Global Step: 139540   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:28:40,238-Speed 5415.54 samples/sec   Loss 3.6345   LearningRate 0.0356   Epoch: 13   Global Step: 139550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:28:47,777-Speed 5434.30 samples/sec   Loss 3.6188   LearningRate 0.0356   Epoch: 13   Global Step: 139560   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:28:55,441-Speed 5345.17 samples/sec   Loss 3.6627   LearningRate 0.0355   Epoch: 13   Global Step: 139570   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:29:02,962-Speed 5446.43 samples/sec   Loss 3.6482   LearningRate 0.0355   Epoch: 13   Global Step: 139580   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:29:10,542-Speed 5404.53 samples/sec   Loss 3.6234   LearningRate 0.0355   Epoch: 13   Global Step: 139590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:29:18,004-Speed 5490.44 samples/sec   Loss 3.6436   LearningRate 0.0355   Epoch: 13   Global Step: 139600   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:29:25,595-Speed 5396.44 samples/sec   Loss 3.6210   LearningRate 0.0355   Epoch: 13   Global Step: 139610   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:29:33,132-Speed 5435.24 samples/sec   Loss 3.6271   LearningRate 0.0355   Epoch: 13   Global Step: 139620   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:29:40,769-Speed 5363.61 samples/sec   Loss 3.6633   LearningRate 0.0355   Epoch: 13   Global Step: 139630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:29:48,204-Speed 5510.18 samples/sec   Loss 3.7020   LearningRate 0.0355   Epoch: 13   Global Step: 139640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:29:55,232-Speed 5829.17 samples/sec   Loss 3.6238   LearningRate 0.0355   Epoch: 13   Global Step: 139650   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-09 02:30:02,201-Speed 5877.87 samples/sec   Loss 3.6354   LearningRate 0.0354   Epoch: 13   Global Step: 139660   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:30:09,204-Speed 5849.69 samples/sec   Loss 3.6361   LearningRate 0.0354   Epoch: 13   Global Step: 139670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:30:16,202-Speed 5854.08 samples/sec   Loss 3.6708   LearningRate 0.0354   Epoch: 13   Global Step: 139680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-09 02:30:23,690-Speed 5471.16 samples/sec   Loss 3.6663   LearningRate 0.0354   Epoch: 13   Global Step: 139690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:30:31,163-Speed 5481.46 samples/sec   Loss 3.6083   LearningRate 0.0354   Epoch: 13   Global Step: 139700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:30:38,795-Speed 5367.38 samples/sec   Loss 3.6108   LearningRate 0.0354   Epoch: 13   Global Step: 139710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:30:46,360-Speed 5415.14 samples/sec   Loss 3.6475   LearningRate 0.0354   Epoch: 13   Global Step: 139720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:30:53,864-Speed 5459.32 samples/sec   Loss 3.6996   LearningRate 0.0354   Epoch: 13   Global Step: 139730   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:31:01,407-Speed 5430.94 samples/sec   Loss 3.6197   LearningRate 0.0354   Epoch: 13   Global Step: 139740   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:31:08,964-Speed 5420.63 samples/sec   Loss 3.6132   LearningRate 0.0354   Epoch: 13   Global Step: 139750   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:31:16,554-Speed 5397.42 samples/sec   Loss 3.5903   LearningRate 0.0353   Epoch: 13   Global Step: 139760   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:31:24,084-Speed 5440.00 samples/sec   Loss 3.6159   LearningRate 0.0353   Epoch: 13   Global Step: 139770   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:31:31,619-Speed 5437.20 samples/sec   Loss 3.6343   LearningRate 0.0353   Epoch: 13   Global Step: 139780   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:31:39,069-Speed 5498.11 samples/sec   Loss 3.6287   LearningRate 0.0353   Epoch: 13   Global Step: 139790   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:31:46,593-Speed 5445.00 samples/sec   Loss 3.6431   LearningRate 0.0353   Epoch: 13   Global Step: 139800   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:31:54,157-Speed 5416.28 samples/sec   Loss 3.6185   LearningRate 0.0353   Epoch: 13   Global Step: 139810   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:32:01,756-Speed 5390.31 samples/sec   Loss 3.6773   LearningRate 0.0353   Epoch: 13   Global Step: 139820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:32:09,597-Speed 5224.76 samples/sec   Loss 3.6337   LearningRate 0.0353   Epoch: 13   Global Step: 139830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:32:17,218-Speed 5375.07 samples/sec   Loss 3.6080   LearningRate 0.0353   Epoch: 13   Global Step: 139840   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:32:24,743-Speed 5444.17 samples/sec   Loss 3.6640   LearningRate 0.0352   Epoch: 13   Global Step: 139850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:32:32,264-Speed 5447.14 samples/sec   Loss 3.7000   LearningRate 0.0352   Epoch: 13   Global Step: 139860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:32:39,951-Speed 5328.88 samples/sec   Loss 3.6216   LearningRate 0.0352   Epoch: 13   Global Step: 139870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:32:47,430-Speed 5477.49 samples/sec   Loss 3.6282   LearningRate 0.0352   Epoch: 13   Global Step: 139880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:32:55,007-Speed 5406.66 samples/sec   Loss 3.6245   LearningRate 0.0352   Epoch: 13   Global Step: 139890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:33:02,481-Speed 5480.77 samples/sec   Loss 3.6218   LearningRate 0.0352   Epoch: 13   Global Step: 139900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:33:10,424-Speed 5157.40 samples/sec   Loss 3.6521   LearningRate 0.0352   Epoch: 13   Global Step: 139910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:33:18,057-Speed 5366.78 samples/sec   Loss 3.6190   LearningRate 0.0352   Epoch: 13   Global Step: 139920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:33:25,750-Speed 5325.26 samples/sec   Loss 3.5811   LearningRate 0.0352   Epoch: 13   Global Step: 139930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:33:33,341-Speed 5396.63 samples/sec   Loss 3.6096   LearningRate 0.0352   Epoch: 13   Global Step: 139940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:33:40,922-Speed 5403.61 samples/sec   Loss 3.6196   LearningRate 0.0351   Epoch: 13   Global Step: 139950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:33:48,504-Speed 5403.14 samples/sec   Loss 3.6301   LearningRate 0.0351   Epoch: 13   Global Step: 139960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:33:56,217-Speed 5310.52 samples/sec   Loss 3.6605   LearningRate 0.0351   Epoch: 13   Global Step: 139970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:34:03,837-Speed 5376.91 samples/sec   Loss 3.6638   LearningRate 0.0351   Epoch: 13   Global Step: 139980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:34:11,346-Speed 5455.78 samples/sec   Loss 3.6225   LearningRate 0.0351   Epoch: 13   Global Step: 139990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:34:18,878-Speed 5438.37 samples/sec   Loss 3.6519   LearningRate 0.0351   Epoch: 13   Global Step: 140000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:35:02,570-[lfw][140000]XNorm: 22.488085
Training: 2022-01-09 02:35:02,571-[lfw][140000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-01-09 02:35:02,572-[lfw][140000]Accuracy-Highest: 0.99817
Training: 2022-01-09 02:35:53,391-[cfp_fp][140000]XNorm: 21.077209
Training: 2022-01-09 02:35:53,392-[cfp_fp][140000]Accuracy-Flip: 0.99157+-0.00370
Training: 2022-01-09 02:35:53,392-[cfp_fp][140000]Accuracy-Highest: 0.99186
Training: 2022-01-09 02:36:37,207-[agedb_30][140000]XNorm: 22.369857
Training: 2022-01-09 02:36:37,208-[agedb_30][140000]Accuracy-Flip: 0.98067+-0.00620
Training: 2022-01-09 02:36:37,208-[agedb_30][140000]Accuracy-Highest: 0.98067
Training: 2022-01-09 02:36:44,854-Speed 280.60 samples/sec   Loss 3.6122   LearningRate 0.0351   Epoch: 13   Global Step: 140010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:36:52,447-Speed 5394.71 samples/sec   Loss 3.6320   LearningRate 0.0351   Epoch: 13   Global Step: 140020   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:37:00,028-Speed 5403.33 samples/sec   Loss 3.6185   LearningRate 0.0351   Epoch: 13   Global Step: 140030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:37:07,570-Speed 5432.50 samples/sec   Loss 3.6573   LearningRate 0.0350   Epoch: 13   Global Step: 140040   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:37:15,156-Speed 5399.59 samples/sec   Loss 3.6381   LearningRate 0.0350   Epoch: 13   Global Step: 140050   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:37:22,719-Speed 5416.79 samples/sec   Loss 3.6063   LearningRate 0.0350   Epoch: 13   Global Step: 140060   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:37:30,218-Speed 5462.45 samples/sec   Loss 3.6520   LearningRate 0.0350   Epoch: 13   Global Step: 140070   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:37:37,794-Speed 5407.15 samples/sec   Loss 3.5989   LearningRate 0.0350   Epoch: 13   Global Step: 140080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:37:45,336-Speed 5431.93 samples/sec   Loss 3.6559   LearningRate 0.0350   Epoch: 13   Global Step: 140090   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:37:52,790-Speed 5495.37 samples/sec   Loss 3.6150   LearningRate 0.0350   Epoch: 13   Global Step: 140100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:38:00,297-Speed 5457.21 samples/sec   Loss 3.5827   LearningRate 0.0350   Epoch: 13   Global Step: 140110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:38:07,790-Speed 5467.23 samples/sec   Loss 3.6030   LearningRate 0.0350   Epoch: 13   Global Step: 140120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:38:15,357-Speed 5413.41 samples/sec   Loss 3.6510   LearningRate 0.0350   Epoch: 13   Global Step: 140130   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-09 02:38:22,933-Speed 5407.49 samples/sec   Loss 3.6348   LearningRate 0.0349   Epoch: 13   Global Step: 140140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:38:30,494-Speed 5417.96 samples/sec   Loss 3.6276   LearningRate 0.0349   Epoch: 13   Global Step: 140150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:38:38,028-Speed 5437.27 samples/sec   Loss 3.5986   LearningRate 0.0349   Epoch: 13   Global Step: 140160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:38:45,511-Speed 5474.50 samples/sec   Loss 3.5898   LearningRate 0.0349   Epoch: 13   Global Step: 140170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:38:53,023-Speed 5453.69 samples/sec   Loss 3.5822   LearningRate 0.0349   Epoch: 13   Global Step: 140180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:39:00,496-Speed 5481.41 samples/sec   Loss 3.5871   LearningRate 0.0349   Epoch: 13   Global Step: 140190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:39:08,073-Speed 5406.49 samples/sec   Loss 3.6392   LearningRate 0.0349   Epoch: 13   Global Step: 140200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:39:15,617-Speed 5430.12 samples/sec   Loss 3.5882   LearningRate 0.0349   Epoch: 13   Global Step: 140210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:39:23,185-Speed 5413.03 samples/sec   Loss 3.6116   LearningRate 0.0349   Epoch: 13   Global Step: 140220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:39:30,710-Speed 5443.54 samples/sec   Loss 3.6193   LearningRate 0.0349   Epoch: 13   Global Step: 140230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:39:38,151-Speed 5505.93 samples/sec   Loss 3.6305   LearningRate 0.0348   Epoch: 13   Global Step: 140240   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-09 02:39:45,577-Speed 5516.22 samples/sec   Loss 3.5911   LearningRate 0.0348   Epoch: 13   Global Step: 140250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:39:53,162-Speed 5400.91 samples/sec   Loss 3.5735   LearningRate 0.0348   Epoch: 13   Global Step: 140260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:40:00,610-Speed 5500.20 samples/sec   Loss 3.5891   LearningRate 0.0348   Epoch: 13   Global Step: 140270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:40:08,229-Speed 5376.82 samples/sec   Loss 3.6176   LearningRate 0.0348   Epoch: 13   Global Step: 140280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:40:15,688-Speed 5491.53 samples/sec   Loss 3.6032   LearningRate 0.0348   Epoch: 13   Global Step: 140290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:40:23,348-Speed 5348.19 samples/sec   Loss 3.6157   LearningRate 0.0348   Epoch: 13   Global Step: 140300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:40:30,804-Speed 5494.75 samples/sec   Loss 3.6170   LearningRate 0.0348   Epoch: 13   Global Step: 140310   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:40:38,290-Speed 5472.22 samples/sec   Loss 3.6144   LearningRate 0.0348   Epoch: 13   Global Step: 140320   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:40:45,928-Speed 5363.25 samples/sec   Loss 3.6037   LearningRate 0.0347   Epoch: 13   Global Step: 140330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:40:53,412-Speed 5473.78 samples/sec   Loss 3.5988   LearningRate 0.0347   Epoch: 13   Global Step: 140340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:41:00,867-Speed 5494.81 samples/sec   Loss 3.6214   LearningRate 0.0347   Epoch: 13   Global Step: 140350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:41:08,423-Speed 5420.89 samples/sec   Loss 3.6007   LearningRate 0.0347   Epoch: 13   Global Step: 140360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:41:15,895-Speed 5483.02 samples/sec   Loss 3.6097   LearningRate 0.0347   Epoch: 13   Global Step: 140370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:41:23,390-Speed 5465.38 samples/sec   Loss 3.6195   LearningRate 0.0347   Epoch: 13   Global Step: 140380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:41:30,977-Speed 5399.49 samples/sec   Loss 3.6208   LearningRate 0.0347   Epoch: 13   Global Step: 140390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:41:38,449-Speed 5482.85 samples/sec   Loss 3.6001   LearningRate 0.0347   Epoch: 13   Global Step: 140400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:41:45,980-Speed 5439.27 samples/sec   Loss 3.5547   LearningRate 0.0347   Epoch: 13   Global Step: 140410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:41:53,536-Speed 5421.62 samples/sec   Loss 3.6147   LearningRate 0.0347   Epoch: 13   Global Step: 140420   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:42:01,085-Speed 5426.76 samples/sec   Loss 3.6561   LearningRate 0.0346   Epoch: 13   Global Step: 140430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:42:08,537-Speed 5497.12 samples/sec   Loss 3.6180   LearningRate 0.0346   Epoch: 13   Global Step: 140440   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:42:16,043-Speed 5457.20 samples/sec   Loss 3.6208   LearningRate 0.0346   Epoch: 13   Global Step: 140450   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:42:23,642-Speed 5391.43 samples/sec   Loss 3.6116   LearningRate 0.0346   Epoch: 13   Global Step: 140460   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:42:31,232-Speed 5397.40 samples/sec   Loss 3.5744   LearningRate 0.0346   Epoch: 13   Global Step: 140470   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:42:38,714-Speed 5474.61 samples/sec   Loss 3.6190   LearningRate 0.0346   Epoch: 13   Global Step: 140480   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:42:46,260-Speed 5428.99 samples/sec   Loss 3.6198   LearningRate 0.0346   Epoch: 13   Global Step: 140490   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:42:53,808-Speed 5427.74 samples/sec   Loss 3.6145   LearningRate 0.0346   Epoch: 13   Global Step: 140500   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:43:01,411-Speed 5387.80 samples/sec   Loss 3.5320   LearningRate 0.0346   Epoch: 13   Global Step: 140510   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:43:08,937-Speed 5443.07 samples/sec   Loss 3.6213   LearningRate 0.0346   Epoch: 13   Global Step: 140520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:43:16,453-Speed 5450.77 samples/sec   Loss 3.6161   LearningRate 0.0345   Epoch: 13   Global Step: 140530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:43:23,934-Speed 5476.11 samples/sec   Loss 3.5998   LearningRate 0.0345   Epoch: 13   Global Step: 140540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:43:31,496-Speed 5416.97 samples/sec   Loss 3.5696   LearningRate 0.0345   Epoch: 13   Global Step: 140550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:43:39,040-Speed 5430.46 samples/sec   Loss 3.6360   LearningRate 0.0345   Epoch: 13   Global Step: 140560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:43:46,572-Speed 5438.47 samples/sec   Loss 3.5942   LearningRate 0.0345   Epoch: 13   Global Step: 140570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:43:54,181-Speed 5384.03 samples/sec   Loss 3.5604   LearningRate 0.0345   Epoch: 13   Global Step: 140580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:44:01,711-Speed 5440.77 samples/sec   Loss 3.5962   LearningRate 0.0345   Epoch: 13   Global Step: 140590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:44:09,283-Speed 5410.35 samples/sec   Loss 3.6152   LearningRate 0.0345   Epoch: 13   Global Step: 140600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:44:16,724-Speed 5505.40 samples/sec   Loss 3.5913   LearningRate 0.0345   Epoch: 13   Global Step: 140610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:44:24,354-Speed 5368.68 samples/sec   Loss 3.5931   LearningRate 0.0344   Epoch: 13   Global Step: 140620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:44:31,986-Speed 5367.52 samples/sec   Loss 3.6037   LearningRate 0.0344   Epoch: 13   Global Step: 140630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:44:39,509-Speed 5445.73 samples/sec   Loss 3.5683   LearningRate 0.0344   Epoch: 13   Global Step: 140640   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:44:47,076-Speed 5413.64 samples/sec   Loss 3.5855   LearningRate 0.0344   Epoch: 13   Global Step: 140650   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:44:54,646-Speed 5411.47 samples/sec   Loss 3.5468   LearningRate 0.0344   Epoch: 13   Global Step: 140660   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:45:02,127-Speed 5476.20 samples/sec   Loss 3.6147   LearningRate 0.0344   Epoch: 13   Global Step: 140670   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:45:09,653-Speed 5443.15 samples/sec   Loss 3.6046   LearningRate 0.0344   Epoch: 13   Global Step: 140680   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:45:17,207-Speed 5422.71 samples/sec   Loss 3.5373   LearningRate 0.0344   Epoch: 13   Global Step: 140690   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:45:24,731-Speed 5445.32 samples/sec   Loss 3.5618   LearningRate 0.0344   Epoch: 13   Global Step: 140700   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:45:32,411-Speed 5333.88 samples/sec   Loss 3.6114   LearningRate 0.0344   Epoch: 13   Global Step: 140710   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:45:39,978-Speed 5413.98 samples/sec   Loss 3.5752   LearningRate 0.0343   Epoch: 13   Global Step: 140720   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:45:47,494-Speed 5450.40 samples/sec   Loss 3.6137   LearningRate 0.0343   Epoch: 13   Global Step: 140730   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:45:54,955-Speed 5490.59 samples/sec   Loss 3.6148   LearningRate 0.0343   Epoch: 13   Global Step: 140740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:46:02,457-Speed 5460.69 samples/sec   Loss 3.5832   LearningRate 0.0343   Epoch: 13   Global Step: 140750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:46:10,012-Speed 5422.38 samples/sec   Loss 3.6365   LearningRate 0.0343   Epoch: 13   Global Step: 140760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:46:17,597-Speed 5400.83 samples/sec   Loss 3.5817   LearningRate 0.0343   Epoch: 13   Global Step: 140770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:46:25,206-Speed 5383.70 samples/sec   Loss 3.5656   LearningRate 0.0343   Epoch: 13   Global Step: 140780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:46:32,723-Speed 5449.76 samples/sec   Loss 3.5691   LearningRate 0.0343   Epoch: 13   Global Step: 140790   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:46:40,415-Speed 5325.13 samples/sec   Loss 3.6508   LearningRate 0.0343   Epoch: 13   Global Step: 140800   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:46:47,925-Speed 5461.43 samples/sec   Loss 3.5793   LearningRate 0.0343   Epoch: 13   Global Step: 140810   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:46:55,494-Speed 5412.01 samples/sec   Loss 3.5897   LearningRate 0.0342   Epoch: 13   Global Step: 140820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:47:03,028-Speed 5437.17 samples/sec   Loss 3.5730   LearningRate 0.0342   Epoch: 13   Global Step: 140830   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:47:10,527-Speed 5462.75 samples/sec   Loss 3.5434   LearningRate 0.0342   Epoch: 13   Global Step: 140840   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:47:18,029-Speed 5460.79 samples/sec   Loss 3.5568   LearningRate 0.0342   Epoch: 13   Global Step: 140850   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:47:25,538-Speed 5455.25 samples/sec   Loss 3.5678   LearningRate 0.0342   Epoch: 13   Global Step: 140860   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:47:33,028-Speed 5469.25 samples/sec   Loss 3.5674   LearningRate 0.0342   Epoch: 13   Global Step: 140870   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:47:40,649-Speed 5375.72 samples/sec   Loss 3.5903   LearningRate 0.0342   Epoch: 13   Global Step: 140880   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:47:48,319-Speed 5340.89 samples/sec   Loss 3.5898   LearningRate 0.0342   Epoch: 13   Global Step: 140890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:47:55,824-Speed 5458.69 samples/sec   Loss 3.6212   LearningRate 0.0342   Epoch: 13   Global Step: 140900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:48:03,374-Speed 5425.52 samples/sec   Loss 3.5812   LearningRate 0.0342   Epoch: 13   Global Step: 140910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:48:10,838-Speed 5488.80 samples/sec   Loss 3.5941   LearningRate 0.0341   Epoch: 13   Global Step: 140920   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:48:18,325-Speed 5471.24 samples/sec   Loss 3.5575   LearningRate 0.0341   Epoch: 13   Global Step: 140930   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:48:25,849-Speed 5444.66 samples/sec   Loss 3.5843   LearningRate 0.0341   Epoch: 13   Global Step: 140940   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:48:33,364-Speed 5450.65 samples/sec   Loss 3.5423   LearningRate 0.0341   Epoch: 13   Global Step: 140950   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:48:40,882-Speed 5448.98 samples/sec   Loss 3.5813   LearningRate 0.0341   Epoch: 13   Global Step: 140960   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:48:48,422-Speed 5433.28 samples/sec   Loss 3.6218   LearningRate 0.0341   Epoch: 13   Global Step: 140970   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:48:55,958-Speed 5436.00 samples/sec   Loss 3.5705   LearningRate 0.0341   Epoch: 13   Global Step: 140980   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:49:03,541-Speed 5402.19 samples/sec   Loss 3.5951   LearningRate 0.0341   Epoch: 13   Global Step: 140990   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:49:11,101-Speed 5419.11 samples/sec   Loss 3.5620   LearningRate 0.0341   Epoch: 13   Global Step: 141000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:49:18,600-Speed 5462.18 samples/sec   Loss 3.5853   LearningRate 0.0340   Epoch: 13   Global Step: 141010   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:49:26,204-Speed 5387.70 samples/sec   Loss 3.5758   LearningRate 0.0340   Epoch: 13   Global Step: 141020   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:49:33,763-Speed 5419.38 samples/sec   Loss 3.5657   LearningRate 0.0340   Epoch: 13   Global Step: 141030   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:49:41,334-Speed 5411.10 samples/sec   Loss 3.5884   LearningRate 0.0340   Epoch: 13   Global Step: 141040   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:49:48,967-Speed 5366.87 samples/sec   Loss 3.5793   LearningRate 0.0340   Epoch: 13   Global Step: 141050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:49:56,607-Speed 5362.15 samples/sec   Loss 3.5461   LearningRate 0.0340   Epoch: 13   Global Step: 141060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:50:04,167-Speed 5418.66 samples/sec   Loss 3.5465   LearningRate 0.0340   Epoch: 13   Global Step: 141070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:50:11,683-Speed 5450.51 samples/sec   Loss 3.5537   LearningRate 0.0340   Epoch: 13   Global Step: 141080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:50:19,294-Speed 5382.27 samples/sec   Loss 3.6032   LearningRate 0.0340   Epoch: 13   Global Step: 141090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:50:26,870-Speed 5407.67 samples/sec   Loss 3.5372   LearningRate 0.0340   Epoch: 13   Global Step: 141100   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:50:34,437-Speed 5413.52 samples/sec   Loss 3.5995   LearningRate 0.0339   Epoch: 13   Global Step: 141110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:50:41,978-Speed 5432.59 samples/sec   Loss 3.5503   LearningRate 0.0339   Epoch: 13   Global Step: 141120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:50:49,437-Speed 5491.46 samples/sec   Loss 3.5666   LearningRate 0.0339   Epoch: 13   Global Step: 141130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:50:57,006-Speed 5412.67 samples/sec   Loss 3.5726   LearningRate 0.0339   Epoch: 13   Global Step: 141140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:51:04,618-Speed 5381.59 samples/sec   Loss 3.5855   LearningRate 0.0339   Epoch: 13   Global Step: 141150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:51:12,203-Speed 5401.13 samples/sec   Loss 3.5441   LearningRate 0.0339   Epoch: 13   Global Step: 141160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:51:19,888-Speed 5330.20 samples/sec   Loss 3.5664   LearningRate 0.0339   Epoch: 13   Global Step: 141170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:51:27,595-Speed 5315.12 samples/sec   Loss 3.5269   LearningRate 0.0339   Epoch: 13   Global Step: 141180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:51:35,113-Speed 5449.51 samples/sec   Loss 3.5829   LearningRate 0.0339   Epoch: 13   Global Step: 141190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:51:42,718-Speed 5386.64 samples/sec   Loss 3.5538   LearningRate 0.0339   Epoch: 13   Global Step: 141200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:51:50,243-Speed 5443.32 samples/sec   Loss 3.5877   LearningRate 0.0338   Epoch: 13   Global Step: 141210   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-09 02:51:57,731-Speed 5470.65 samples/sec   Loss 3.5719   LearningRate 0.0338   Epoch: 13   Global Step: 141220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:52:05,242-Speed 5454.77 samples/sec   Loss 3.5358   LearningRate 0.0338   Epoch: 13   Global Step: 141230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:52:12,752-Speed 5454.94 samples/sec   Loss 3.4967   LearningRate 0.0338   Epoch: 13   Global Step: 141240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:52:20,305-Speed 5422.96 samples/sec   Loss 3.5513   LearningRate 0.0338   Epoch: 13   Global Step: 141250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:52:27,791-Speed 5472.62 samples/sec   Loss 3.5200   LearningRate 0.0338   Epoch: 13   Global Step: 141260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:52:35,271-Speed 5477.08 samples/sec   Loss 3.5326   LearningRate 0.0338   Epoch: 13   Global Step: 141270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:52:42,792-Speed 5446.65 samples/sec   Loss 3.5183   LearningRate 0.0338   Epoch: 13   Global Step: 141280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:52:50,374-Speed 5402.48 samples/sec   Loss 3.5689   LearningRate 0.0338   Epoch: 13   Global Step: 141290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:52:57,981-Speed 5385.18 samples/sec   Loss 3.5440   LearningRate 0.0338   Epoch: 13   Global Step: 141300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:53:05,532-Speed 5425.63 samples/sec   Loss 3.5882   LearningRate 0.0337   Epoch: 13   Global Step: 141310   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:53:13,000-Speed 5484.93 samples/sec   Loss 3.5241   LearningRate 0.0337   Epoch: 13   Global Step: 141320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-09 02:53:20,636-Speed 5364.49 samples/sec   Loss 3.5723   LearningRate 0.0337   Epoch: 13   Global Step: 141330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:53:28,285-Speed 5356.40 samples/sec   Loss 3.5486   LearningRate 0.0337   Epoch: 13   Global Step: 141340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:53:35,759-Speed 5480.40 samples/sec   Loss 3.5685   LearningRate 0.0337   Epoch: 13   Global Step: 141350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:53:43,276-Speed 5450.05 samples/sec   Loss 3.5648   LearningRate 0.0337   Epoch: 13   Global Step: 141360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:53:50,748-Speed 5482.63 samples/sec   Loss 3.5176   LearningRate 0.0337   Epoch: 13   Global Step: 141370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:53:58,264-Speed 5450.63 samples/sec   Loss 3.5240   LearningRate 0.0337   Epoch: 13   Global Step: 141380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:54:05,840-Speed 5407.15 samples/sec   Loss 3.5246   LearningRate 0.0337   Epoch: 13   Global Step: 141390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:54:13,444-Speed 5387.41 samples/sec   Loss 3.5648   LearningRate 0.0336   Epoch: 13   Global Step: 141400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:54:20,999-Speed 5422.51 samples/sec   Loss 3.5240   LearningRate 0.0336   Epoch: 13   Global Step: 141410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:54:28,704-Speed 5316.41 samples/sec   Loss 3.5313   LearningRate 0.0336   Epoch: 13   Global Step: 141420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:54:36,386-Speed 5333.32 samples/sec   Loss 3.5193   LearningRate 0.0336   Epoch: 13   Global Step: 141430   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-09 02:54:43,996-Speed 5383.23 samples/sec   Loss 3.5222   LearningRate 0.0336   Epoch: 13   Global Step: 141440   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-09 02:54:51,527-Speed 5439.81 samples/sec   Loss 3.5224   LearningRate 0.0336   Epoch: 13   Global Step: 141450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:54:59,053-Speed 5442.69 samples/sec   Loss 3.5466   LearningRate 0.0336   Epoch: 13   Global Step: 141460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:55:06,538-Speed 5473.21 samples/sec   Loss 3.5010   LearningRate 0.0336   Epoch: 13   Global Step: 141470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:55:14,059-Speed 5446.85 samples/sec   Loss 3.5455   LearningRate 0.0336   Epoch: 13   Global Step: 141480   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:55:21,527-Speed 5485.87 samples/sec   Loss 3.5438   LearningRate 0.0336   Epoch: 13   Global Step: 141490   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:55:29,029-Speed 5460.40 samples/sec   Loss 3.5410   LearningRate 0.0335   Epoch: 13   Global Step: 141500   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:55:36,523-Speed 5465.54 samples/sec   Loss 3.5615   LearningRate 0.0335   Epoch: 13   Global Step: 141510   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:55:44,088-Speed 5415.90 samples/sec   Loss 3.5430   LearningRate 0.0335   Epoch: 13   Global Step: 141520   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:55:51,697-Speed 5384.05 samples/sec   Loss 3.5550   LearningRate 0.0335   Epoch: 13   Global Step: 141530   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:55:59,196-Speed 5462.87 samples/sec   Loss 3.5578   LearningRate 0.0335   Epoch: 13   Global Step: 141540   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:56:06,697-Speed 5461.01 samples/sec   Loss 3.5518   LearningRate 0.0335   Epoch: 13   Global Step: 141550   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:56:14,282-Speed 5401.36 samples/sec   Loss 3.5599   LearningRate 0.0335   Epoch: 13   Global Step: 141560   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-01-09 02:56:21,798-Speed 5450.24 samples/sec   Loss 3.5867   LearningRate 0.0335   Epoch: 13   Global Step: 141570   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-01-09 02:56:29,402-Speed 5386.88 samples/sec   Loss 3.5581   LearningRate 0.0335   Epoch: 13   Global Step: 141580   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-01-09 02:56:37,021-Speed 5376.84 samples/sec   Loss 3.5092   LearningRate 0.0335   Epoch: 13   Global Step: 141590   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-01-09 02:56:44,572-Speed 5425.70 samples/sec   Loss 3.4811   LearningRate 0.0334   Epoch: 13   Global Step: 141600   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-01-09 02:56:52,100-Speed 5441.86 samples/sec   Loss 3.5063   LearningRate 0.0334   Epoch: 13   Global Step: 141610   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-01-09 02:56:59,649-Speed 5426.02 samples/sec   Loss 3.6013   LearningRate 0.0334   Epoch: 13   Global Step: 141620   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-01-09 02:57:07,212-Speed 5416.46 samples/sec   Loss 3.5057   LearningRate 0.0334   Epoch: 13   Global Step: 141630   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-01-09 02:57:14,874-Speed 5346.90 samples/sec   Loss 3.5491   LearningRate 0.0334   Epoch: 13   Global Step: 141640   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-01-09 02:57:22,403-Speed 5441.29 samples/sec   Loss 3.5051   LearningRate 0.0334   Epoch: 13   Global Step: 141650   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-01-09 02:57:29,933-Speed 5439.89 samples/sec   Loss 3.5224   LearningRate 0.0334   Epoch: 13   Global Step: 141660   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:57:37,458-Speed 5443.95 samples/sec   Loss 3.5452   LearningRate 0.0334   Epoch: 13   Global Step: 141670   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:57:44,965-Speed 5456.96 samples/sec   Loss 3.5648   LearningRate 0.0334   Epoch: 13   Global Step: 141680   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:57:52,487-Speed 5446.19 samples/sec   Loss 3.5141   LearningRate 0.0334   Epoch: 13   Global Step: 141690   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:57:59,977-Speed 5469.39 samples/sec   Loss 3.5363   LearningRate 0.0333   Epoch: 13   Global Step: 141700   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:58:07,578-Speed 5388.86 samples/sec   Loss 3.4769   LearningRate 0.0333   Epoch: 13   Global Step: 141710   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:58:15,137-Speed 5419.89 samples/sec   Loss 3.5051   LearningRate 0.0333   Epoch: 13   Global Step: 141720   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:58:22,768-Speed 5368.41 samples/sec   Loss 3.4958   LearningRate 0.0333   Epoch: 13   Global Step: 141730   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:58:30,283-Speed 5450.67 samples/sec   Loss 3.4834   LearningRate 0.0333   Epoch: 13   Global Step: 141740   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:58:37,829-Speed 5428.91 samples/sec   Loss 3.5229   LearningRate 0.0333   Epoch: 13   Global Step: 141750   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 02:58:45,453-Speed 5373.44 samples/sec   Loss 3.5371   LearningRate 0.0333   Epoch: 13   Global Step: 141760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:58:52,932-Speed 5477.24 samples/sec   Loss 3.5380   LearningRate 0.0333   Epoch: 13   Global Step: 141770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:59:00,543-Speed 5382.53 samples/sec   Loss 3.5345   LearningRate 0.0333   Epoch: 13   Global Step: 141780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:59:08,103-Speed 5417.93 samples/sec   Loss 3.5355   LearningRate 0.0333   Epoch: 13   Global Step: 141790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:59:15,650-Speed 5428.45 samples/sec   Loss 3.5028   LearningRate 0.0332   Epoch: 13   Global Step: 141800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:59:23,177-Speed 5442.67 samples/sec   Loss 3.5285   LearningRate 0.0332   Epoch: 13   Global Step: 141810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:59:30,844-Speed 5342.97 samples/sec   Loss 3.4829   LearningRate 0.0332   Epoch: 13   Global Step: 141820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:59:38,369-Speed 5443.67 samples/sec   Loss 3.5406   LearningRate 0.0332   Epoch: 13   Global Step: 141830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:59:45,974-Speed 5386.85 samples/sec   Loss 3.5100   LearningRate 0.0332   Epoch: 13   Global Step: 141840   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 02:59:53,480-Speed 5457.74 samples/sec   Loss 3.5430   LearningRate 0.0332   Epoch: 13   Global Step: 141850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:00:00,978-Speed 5463.10 samples/sec   Loss 3.4469   LearningRate 0.0332   Epoch: 13   Global Step: 141860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:00:08,559-Speed 5403.55 samples/sec   Loss 3.5413   LearningRate 0.0332   Epoch: 13   Global Step: 141870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:00:16,104-Speed 5429.82 samples/sec   Loss 3.5100   LearningRate 0.0332   Epoch: 13   Global Step: 141880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:00:23,880-Speed 5268.36 samples/sec   Loss 3.5403   LearningRate 0.0332   Epoch: 13   Global Step: 141890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:00:31,451-Speed 5410.37 samples/sec   Loss 3.5009   LearningRate 0.0331   Epoch: 13   Global Step: 141900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:00:39,071-Speed 5376.12 samples/sec   Loss 3.5314   LearningRate 0.0331   Epoch: 13   Global Step: 141910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:00:46,674-Speed 5388.13 samples/sec   Loss 3.5337   LearningRate 0.0331   Epoch: 13   Global Step: 141920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:00:54,244-Speed 5411.87 samples/sec   Loss 3.4758   LearningRate 0.0331   Epoch: 13   Global Step: 141930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:01:01,764-Speed 5447.20 samples/sec   Loss 3.5342   LearningRate 0.0331   Epoch: 13   Global Step: 141940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:01:09,264-Speed 5462.45 samples/sec   Loss 3.5496   LearningRate 0.0331   Epoch: 13   Global Step: 141950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:01:16,793-Speed 5440.56 samples/sec   Loss 3.5011   LearningRate 0.0331   Epoch: 13   Global Step: 141960   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:01:24,307-Speed 5452.25 samples/sec   Loss 3.5256   LearningRate 0.0331   Epoch: 13   Global Step: 141970   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:01:31,898-Speed 5397.12 samples/sec   Loss 3.4880   LearningRate 0.0331   Epoch: 13   Global Step: 141980   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:01:39,455-Speed 5420.64 samples/sec   Loss 3.4975   LearningRate 0.0330   Epoch: 13   Global Step: 141990   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:01:47,040-Speed 5400.61 samples/sec   Loss 3.5195   LearningRate 0.0330   Epoch: 13   Global Step: 142000   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:02:31,146-[lfw][142000]XNorm: 22.636040
Training: 2022-01-09 03:02:31,147-[lfw][142000]Accuracy-Flip: 0.99817+-0.00229
Training: 2022-01-09 03:02:31,147-[lfw][142000]Accuracy-Highest: 0.99817
Training: 2022-01-09 03:03:23,131-[cfp_fp][142000]XNorm: 21.083420
Training: 2022-01-09 03:03:23,131-[cfp_fp][142000]Accuracy-Flip: 0.99129+-0.00445
Training: 2022-01-09 03:03:23,132-[cfp_fp][142000]Accuracy-Highest: 0.99186
Training: 2022-01-09 03:04:07,614-[agedb_30][142000]XNorm: 22.439980
Training: 2022-01-09 03:04:07,615-[agedb_30][142000]Accuracy-Flip: 0.98033+-0.00710
Training: 2022-01-09 03:04:07,615-[agedb_30][142000]Accuracy-Highest: 0.98067
Training: 2022-01-09 03:04:15,332-Speed 276.22 samples/sec   Loss 3.5313   LearningRate 0.0330   Epoch: 13   Global Step: 142010   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:04:22,941-Speed 5383.86 samples/sec   Loss 3.5241   LearningRate 0.0330   Epoch: 13   Global Step: 142020   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:04:30,489-Speed 5427.10 samples/sec   Loss 3.5563   LearningRate 0.0330   Epoch: 13   Global Step: 142030   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:04:38,105-Speed 5379.17 samples/sec   Loss 3.4765   LearningRate 0.0330   Epoch: 13   Global Step: 142040   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:04:45,654-Speed 5428.98 samples/sec   Loss 3.5020   LearningRate 0.0330   Epoch: 13   Global Step: 142050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:04:53,187-Speed 5438.34 samples/sec   Loss 3.5165   LearningRate 0.0330   Epoch: 13   Global Step: 142060   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:05:00,679-Speed 5468.07 samples/sec   Loss 3.5016   LearningRate 0.0330   Epoch: 13   Global Step: 142070   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:05:08,245-Speed 5414.33 samples/sec   Loss 3.5377   LearningRate 0.0330   Epoch: 13   Global Step: 142080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:05:15,817-Speed 5410.25 samples/sec   Loss 3.5269   LearningRate 0.0329   Epoch: 13   Global Step: 142090   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:05:23,453-Speed 5364.80 samples/sec   Loss 3.4744   LearningRate 0.0329   Epoch: 13   Global Step: 142100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:05:31,112-Speed 5348.27 samples/sec   Loss 3.4870   LearningRate 0.0329   Epoch: 13   Global Step: 142110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:05:38,588-Speed 5479.72 samples/sec   Loss 3.5061   LearningRate 0.0329   Epoch: 13   Global Step: 142120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:05:46,250-Speed 5346.36 samples/sec   Loss 3.5432   LearningRate 0.0329   Epoch: 13   Global Step: 142130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:05:53,877-Speed 5370.85 samples/sec   Loss 3.4939   LearningRate 0.0329   Epoch: 13   Global Step: 142140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:06:01,442-Speed 5415.33 samples/sec   Loss 3.5022   LearningRate 0.0329   Epoch: 13   Global Step: 142150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:06:09,089-Speed 5356.90 samples/sec   Loss 3.4962   LearningRate 0.0329   Epoch: 13   Global Step: 142160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:06:16,696-Speed 5385.37 samples/sec   Loss 3.5406   LearningRate 0.0329   Epoch: 13   Global Step: 142170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:06:24,215-Speed 5448.20 samples/sec   Loss 3.5298   LearningRate 0.0329   Epoch: 13   Global Step: 142180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:06:31,664-Speed 5499.84 samples/sec   Loss 3.5233   LearningRate 0.0328   Epoch: 13   Global Step: 142190   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:06:39,203-Speed 5433.59 samples/sec   Loss 3.5098   LearningRate 0.0328   Epoch: 13   Global Step: 142200   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:06:46,702-Speed 5462.59 samples/sec   Loss 3.5065   LearningRate 0.0328   Epoch: 13   Global Step: 142210   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:06:54,308-Speed 5386.04 samples/sec   Loss 3.5015   LearningRate 0.0328   Epoch: 13   Global Step: 142220   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:07:01,814-Speed 5458.09 samples/sec   Loss 3.4777   LearningRate 0.0328   Epoch: 13   Global Step: 142230   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:07:09,367-Speed 5424.00 samples/sec   Loss 3.5322   LearningRate 0.0328   Epoch: 13   Global Step: 142240   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:07:16,838-Speed 5482.61 samples/sec   Loss 3.5107   LearningRate 0.0328   Epoch: 13   Global Step: 142250   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:07:24,371-Speed 5438.84 samples/sec   Loss 3.4882   LearningRate 0.0328   Epoch: 13   Global Step: 142260   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:07:31,851-Speed 5476.76 samples/sec   Loss 3.5072   LearningRate 0.0328   Epoch: 13   Global Step: 142270   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:07:39,361-Speed 5454.53 samples/sec   Loss 3.5270   LearningRate 0.0328   Epoch: 13   Global Step: 142280   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:07:46,945-Speed 5401.21 samples/sec   Loss 3.4727   LearningRate 0.0327   Epoch: 13   Global Step: 142290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:07:54,480-Speed 5436.85 samples/sec   Loss 3.5006   LearningRate 0.0327   Epoch: 13   Global Step: 142300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:08:01,974-Speed 5466.81 samples/sec   Loss 3.4861   LearningRate 0.0327   Epoch: 13   Global Step: 142310   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:08:09,531-Speed 5420.65 samples/sec   Loss 3.5176   LearningRate 0.0327   Epoch: 13   Global Step: 142320   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:08:17,081-Speed 5426.06 samples/sec   Loss 3.4913   LearningRate 0.0327   Epoch: 13   Global Step: 142330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:08:24,592-Speed 5454.04 samples/sec   Loss 3.5148   LearningRate 0.0327   Epoch: 13   Global Step: 142340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:08:32,069-Speed 5478.85 samples/sec   Loss 3.5187   LearningRate 0.0327   Epoch: 13   Global Step: 142350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:08:39,545-Speed 5479.75 samples/sec   Loss 3.4566   LearningRate 0.0327   Epoch: 13   Global Step: 142360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:08:47,063-Speed 5448.38 samples/sec   Loss 3.5406   LearningRate 0.0327   Epoch: 13   Global Step: 142370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:08:54,596-Speed 5437.95 samples/sec   Loss 3.5229   LearningRate 0.0327   Epoch: 13   Global Step: 142380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:09:02,164-Speed 5413.47 samples/sec   Loss 3.4885   LearningRate 0.0326   Epoch: 13   Global Step: 142390   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-09 03:09:09,791-Speed 5371.30 samples/sec   Loss 3.4761   LearningRate 0.0326   Epoch: 13   Global Step: 142400   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-09 03:09:17,239-Speed 5500.01 samples/sec   Loss 3.4635   LearningRate 0.0326   Epoch: 13   Global Step: 142410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:09:24,671-Speed 5511.50 samples/sec   Loss 3.4529   LearningRate 0.0326   Epoch: 13   Global Step: 142420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:09:32,220-Speed 5426.85 samples/sec   Loss 3.4611   LearningRate 0.0326   Epoch: 13   Global Step: 142430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:09:39,757-Speed 5435.13 samples/sec   Loss 3.4836   LearningRate 0.0326   Epoch: 13   Global Step: 142440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:09:47,312-Speed 5422.26 samples/sec   Loss 3.4668   LearningRate 0.0326   Epoch: 13   Global Step: 142450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:09:54,903-Speed 5396.30 samples/sec   Loss 3.5194   LearningRate 0.0326   Epoch: 13   Global Step: 142460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:10:02,440-Speed 5435.17 samples/sec   Loss 3.5112   LearningRate 0.0326   Epoch: 13   Global Step: 142470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:10:09,994-Speed 5423.60 samples/sec   Loss 3.5039   LearningRate 0.0326   Epoch: 13   Global Step: 142480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:10:17,539-Speed 5429.18 samples/sec   Loss 3.4488   LearningRate 0.0325   Epoch: 13   Global Step: 142490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:10:25,077-Speed 5434.23 samples/sec   Loss 3.4953   LearningRate 0.0325   Epoch: 13   Global Step: 142500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:10:32,584-Speed 5457.36 samples/sec   Loss 3.4744   LearningRate 0.0325   Epoch: 13   Global Step: 142510   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-09 03:10:40,065-Speed 5475.63 samples/sec   Loss 3.5033   LearningRate 0.0325   Epoch: 13   Global Step: 142520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:10:47,536-Speed 5483.46 samples/sec   Loss 3.4426   LearningRate 0.0325   Epoch: 13   Global Step: 142530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:10:55,001-Speed 5487.30 samples/sec   Loss 3.4897   LearningRate 0.0325   Epoch: 13   Global Step: 142540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:11:02,473-Speed 5482.77 samples/sec   Loss 3.4300   LearningRate 0.0325   Epoch: 13   Global Step: 142550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:11:10,065-Speed 5396.38 samples/sec   Loss 3.4457   LearningRate 0.0325   Epoch: 13   Global Step: 142560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:11:17,700-Speed 5365.08 samples/sec   Loss 3.4843   LearningRate 0.0325   Epoch: 13   Global Step: 142570   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:11:25,214-Speed 5452.05 samples/sec   Loss 3.4545   LearningRate 0.0325   Epoch: 13   Global Step: 142580   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:11:32,876-Speed 5345.87 samples/sec   Loss 3.5238   LearningRate 0.0324   Epoch: 13   Global Step: 142590   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:11:40,347-Speed 5483.90 samples/sec   Loss 3.4852   LearningRate 0.0324   Epoch: 13   Global Step: 142600   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:11:47,871-Speed 5444.78 samples/sec   Loss 3.4509   LearningRate 0.0324   Epoch: 13   Global Step: 142610   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:11:55,339-Speed 5485.09 samples/sec   Loss 3.4417   LearningRate 0.0324   Epoch: 13   Global Step: 142620   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:12:02,874-Speed 5436.67 samples/sec   Loss 3.5090   LearningRate 0.0324   Epoch: 13   Global Step: 142630   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:12:10,466-Speed 5396.12 samples/sec   Loss 3.4766   LearningRate 0.0324   Epoch: 13   Global Step: 142640   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:12:17,941-Speed 5480.78 samples/sec   Loss 3.4925   LearningRate 0.0324   Epoch: 13   Global Step: 142650   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:12:25,482-Speed 5432.10 samples/sec   Loss 3.4340   LearningRate 0.0324   Epoch: 13   Global Step: 142660   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:12:33,023-Speed 5432.24 samples/sec   Loss 3.4770   LearningRate 0.0324   Epoch: 13   Global Step: 142670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:12:40,622-Speed 5391.45 samples/sec   Loss 3.4565   LearningRate 0.0324   Epoch: 13   Global Step: 142680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:12:48,089-Speed 5486.00 samples/sec   Loss 3.4687   LearningRate 0.0323   Epoch: 13   Global Step: 142690   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:12:55,573-Speed 5473.78 samples/sec   Loss 3.5069   LearningRate 0.0323   Epoch: 13   Global Step: 142700   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:13:03,231-Speed 5349.03 samples/sec   Loss 3.4947   LearningRate 0.0323   Epoch: 13   Global Step: 142710   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:13:10,899-Speed 5342.71 samples/sec   Loss 3.4872   LearningRate 0.0323   Epoch: 13   Global Step: 142720   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:13:18,392-Speed 5466.60 samples/sec   Loss 3.4969   LearningRate 0.0323   Epoch: 13   Global Step: 142730   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:13:25,931-Speed 5433.70 samples/sec   Loss 3.4282   LearningRate 0.0323   Epoch: 13   Global Step: 142740   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:13:33,449-Speed 5449.10 samples/sec   Loss 3.4640   LearningRate 0.0323   Epoch: 13   Global Step: 142750   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:13:40,924-Speed 5480.55 samples/sec   Loss 3.4203   LearningRate 0.0323   Epoch: 13   Global Step: 142760   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:13:48,408-Speed 5473.75 samples/sec   Loss 3.4907   LearningRate 0.0323   Epoch: 13   Global Step: 142770   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:13:55,956-Speed 5427.12 samples/sec   Loss 3.4920   LearningRate 0.0323   Epoch: 13   Global Step: 142780   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:14:03,508-Speed 5424.10 samples/sec   Loss 3.4577   LearningRate 0.0322   Epoch: 13   Global Step: 142790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:14:11,092-Speed 5402.26 samples/sec   Loss 3.4854   LearningRate 0.0322   Epoch: 13   Global Step: 142800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:14:18,590-Speed 5463.18 samples/sec   Loss 3.4785   LearningRate 0.0322   Epoch: 13   Global Step: 142810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:14:26,072-Speed 5475.09 samples/sec   Loss 3.4509   LearningRate 0.0322   Epoch: 13   Global Step: 142820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:14:33,549-Speed 5478.28 samples/sec   Loss 3.5226   LearningRate 0.0322   Epoch: 13   Global Step: 142830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:14:41,048-Speed 5463.85 samples/sec   Loss 3.4898   LearningRate 0.0322   Epoch: 13   Global Step: 142840   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:14:48,627-Speed 5404.72 samples/sec   Loss 3.5113   LearningRate 0.0322   Epoch: 13   Global Step: 142850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:14:56,224-Speed 5392.23 samples/sec   Loss 3.4912   LearningRate 0.0322   Epoch: 13   Global Step: 142860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:15:03,754-Speed 5440.40 samples/sec   Loss 3.4733   LearningRate 0.0322   Epoch: 13   Global Step: 142870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:15:11,236-Speed 5475.39 samples/sec   Loss 3.4200   LearningRate 0.0322   Epoch: 13   Global Step: 142880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:15:18,786-Speed 5426.26 samples/sec   Loss 3.4502   LearningRate 0.0321   Epoch: 13   Global Step: 142890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:15:26,271-Speed 5472.48 samples/sec   Loss 3.4730   LearningRate 0.0321   Epoch: 13   Global Step: 142900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:15:33,774-Speed 5459.50 samples/sec   Loss 3.4839   LearningRate 0.0321   Epoch: 13   Global Step: 142910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:15:41,276-Speed 5460.60 samples/sec   Loss 3.4557   LearningRate 0.0321   Epoch: 13   Global Step: 142920   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:15:48,771-Speed 5466.26 samples/sec   Loss 3.4291   LearningRate 0.0321   Epoch: 13   Global Step: 142930   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:15:56,284-Speed 5452.37 samples/sec   Loss 3.4423   LearningRate 0.0321   Epoch: 13   Global Step: 142940   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:16:03,914-Speed 5368.94 samples/sec   Loss 3.4215   LearningRate 0.0321   Epoch: 13   Global Step: 142950   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:16:11,386-Speed 5482.60 samples/sec   Loss 3.4391   LearningRate 0.0321   Epoch: 13   Global Step: 142960   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:16:18,845-Speed 5491.87 samples/sec   Loss 3.5081   LearningRate 0.0321   Epoch: 13   Global Step: 142970   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:16:26,423-Speed 5406.02 samples/sec   Loss 3.4811   LearningRate 0.0321   Epoch: 13   Global Step: 142980   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:16:33,959-Speed 5435.99 samples/sec   Loss 3.4696   LearningRate 0.0320   Epoch: 13   Global Step: 142990   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:16:41,598-Speed 5362.45 samples/sec   Loss 3.4411   LearningRate 0.0320   Epoch: 13   Global Step: 143000   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:16:49,234-Speed 5365.52 samples/sec   Loss 3.4388   LearningRate 0.0320   Epoch: 13   Global Step: 143010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:16:56,765-Speed 5439.92 samples/sec   Loss 3.4725   LearningRate 0.0320   Epoch: 13   Global Step: 143020   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:17:04,397-Speed 5367.15 samples/sec   Loss 3.4149   LearningRate 0.0320   Epoch: 13   Global Step: 143030   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:17:11,886-Speed 5470.52 samples/sec   Loss 3.4676   LearningRate 0.0320   Epoch: 13   Global Step: 143040   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:17:19,419-Speed 5438.28 samples/sec   Loss 3.4645   LearningRate 0.0320   Epoch: 13   Global Step: 143050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:17:26,991-Speed 5409.99 samples/sec   Loss 3.4993   LearningRate 0.0320   Epoch: 13   Global Step: 143060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:17:34,654-Speed 5346.15 samples/sec   Loss 3.4370   LearningRate 0.0320   Epoch: 13   Global Step: 143070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:17:42,239-Speed 5401.13 samples/sec   Loss 3.4581   LearningRate 0.0320   Epoch: 13   Global Step: 143080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:17:49,905-Speed 5343.87 samples/sec   Loss 3.4467   LearningRate 0.0319   Epoch: 13   Global Step: 143090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:17:57,409-Speed 5459.11 samples/sec   Loss 3.4551   LearningRate 0.0319   Epoch: 13   Global Step: 143100   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:18:04,885-Speed 5479.64 samples/sec   Loss 3.4367   LearningRate 0.0319   Epoch: 13   Global Step: 143110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:18:12,550-Speed 5344.39 samples/sec   Loss 3.4859   LearningRate 0.0319   Epoch: 13   Global Step: 143120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:18:20,046-Speed 5465.51 samples/sec   Loss 3.4491   LearningRate 0.0319   Epoch: 13   Global Step: 143130   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:18:27,612-Speed 5414.17 samples/sec   Loss 3.3960   LearningRate 0.0319   Epoch: 13   Global Step: 143140   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:18:35,125-Speed 5452.55 samples/sec   Loss 3.3988   LearningRate 0.0319   Epoch: 13   Global Step: 143150   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:18:42,681-Speed 5421.22 samples/sec   Loss 3.4279   LearningRate 0.0319   Epoch: 13   Global Step: 143160   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:18:50,341-Speed 5348.69 samples/sec   Loss 3.4788   LearningRate 0.0319   Epoch: 13   Global Step: 143170   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:18:57,949-Speed 5384.52 samples/sec   Loss 3.4476   LearningRate 0.0319   Epoch: 13   Global Step: 143180   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:19:05,558-Speed 5383.95 samples/sec   Loss 3.4308   LearningRate 0.0318   Epoch: 13   Global Step: 143190   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:19:13,033-Speed 5480.16 samples/sec   Loss 3.4111   LearningRate 0.0318   Epoch: 13   Global Step: 143200   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:19:20,580-Speed 5427.97 samples/sec   Loss 3.4317   LearningRate 0.0318   Epoch: 13   Global Step: 143210   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:19:28,037-Speed 5493.60 samples/sec   Loss 3.4621   LearningRate 0.0318   Epoch: 13   Global Step: 143220   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:19:35,604-Speed 5413.88 samples/sec   Loss 3.3980   LearningRate 0.0318   Epoch: 13   Global Step: 143230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:19:43,064-Speed 5491.65 samples/sec   Loss 3.4066   LearningRate 0.0318   Epoch: 13   Global Step: 143240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:19:50,647-Speed 5402.46 samples/sec   Loss 3.4572   LearningRate 0.0318   Epoch: 13   Global Step: 143250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:19:58,590-Speed 5157.14 samples/sec   Loss 3.4548   LearningRate 0.0318   Epoch: 13   Global Step: 143260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:20:06,074-Speed 5474.06 samples/sec   Loss 3.4458   LearningRate 0.0318   Epoch: 13   Global Step: 143270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:20:13,578-Speed 5459.35 samples/sec   Loss 3.4429   LearningRate 0.0318   Epoch: 13   Global Step: 143280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:20:21,159-Speed 5403.95 samples/sec   Loss 3.4141   LearningRate 0.0317   Epoch: 13   Global Step: 143290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:20:28,722-Speed 5416.14 samples/sec   Loss 3.4680   LearningRate 0.0317   Epoch: 13   Global Step: 143300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:20:36,226-Speed 5459.38 samples/sec   Loss 3.4433   LearningRate 0.0317   Epoch: 13   Global Step: 143310   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:20:43,697-Speed 5483.11 samples/sec   Loss 3.4450   LearningRate 0.0317   Epoch: 13   Global Step: 143320   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:20:51,160-Speed 5489.41 samples/sec   Loss 3.4210   LearningRate 0.0317   Epoch: 13   Global Step: 143330   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-09 03:20:58,628-Speed 5485.05 samples/sec   Loss 3.4469   LearningRate 0.0317   Epoch: 13   Global Step: 143340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:21:06,167-Speed 5433.85 samples/sec   Loss 3.4599   LearningRate 0.0317   Epoch: 13   Global Step: 143350   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:21:13,664-Speed 5464.63 samples/sec   Loss 3.4730   LearningRate 0.0317   Epoch: 13   Global Step: 143360   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:21:21,314-Speed 5354.94 samples/sec   Loss 3.4040   LearningRate 0.0317   Epoch: 13   Global Step: 143370   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:21:28,968-Speed 5352.02 samples/sec   Loss 3.4078   LearningRate 0.0317   Epoch: 13   Global Step: 143380   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:21:36,626-Speed 5349.58 samples/sec   Loss 3.4275   LearningRate 0.0316   Epoch: 13   Global Step: 143390   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:21:44,219-Speed 5395.24 samples/sec   Loss 3.4652   LearningRate 0.0316   Epoch: 13   Global Step: 143400   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:21:51,706-Speed 5471.42 samples/sec   Loss 3.4657   LearningRate 0.0316   Epoch: 13   Global Step: 143410   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:21:59,295-Speed 5398.31 samples/sec   Loss 3.4286   LearningRate 0.0316   Epoch: 13   Global Step: 143420   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:22:06,871-Speed 5407.34 samples/sec   Loss 3.4292   LearningRate 0.0316   Epoch: 13   Global Step: 143430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:22:14,506-Speed 5365.30 samples/sec   Loss 3.4239   LearningRate 0.0316   Epoch: 13   Global Step: 143440   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:22:22,126-Speed 5376.33 samples/sec   Loss 3.4595   LearningRate 0.0316   Epoch: 13   Global Step: 143450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:22:29,899-Speed 5270.44 samples/sec   Loss 3.4360   LearningRate 0.0316   Epoch: 13   Global Step: 143460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:22:37,544-Speed 5358.52 samples/sec   Loss 3.4638   LearningRate 0.0316   Epoch: 13   Global Step: 143470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:22:45,104-Speed 5418.47 samples/sec   Loss 3.4832   LearningRate 0.0316   Epoch: 13   Global Step: 143480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:22:52,667-Speed 5416.94 samples/sec   Loss 3.4714   LearningRate 0.0316   Epoch: 13   Global Step: 143490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:23:00,366-Speed 5321.12 samples/sec   Loss 3.4026   LearningRate 0.0315   Epoch: 13   Global Step: 143500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:23:07,923-Speed 5420.75 samples/sec   Loss 3.4390   LearningRate 0.0315   Epoch: 13   Global Step: 143510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:23:15,467-Speed 5429.85 samples/sec   Loss 3.3857   LearningRate 0.0315   Epoch: 13   Global Step: 143520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:23:23,043-Speed 5407.44 samples/sec   Loss 3.4520   LearningRate 0.0315   Epoch: 13   Global Step: 143530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:23:30,581-Speed 5434.64 samples/sec   Loss 3.3733   LearningRate 0.0315   Epoch: 13   Global Step: 143540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:23:38,092-Speed 5454.04 samples/sec   Loss 3.4200   LearningRate 0.0315   Epoch: 13   Global Step: 143550   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-09 03:23:45,556-Speed 5487.74 samples/sec   Loss 3.4071   LearningRate 0.0315   Epoch: 13   Global Step: 143560   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:23:53,068-Speed 5453.83 samples/sec   Loss 3.4300   LearningRate 0.0315   Epoch: 13   Global Step: 143570   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:24:00,557-Speed 5469.94 samples/sec   Loss 3.4789   LearningRate 0.0315   Epoch: 13   Global Step: 143580   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:24:08,114-Speed 5420.89 samples/sec   Loss 3.4345   LearningRate 0.0315   Epoch: 13   Global Step: 143590   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:24:15,629-Speed 5450.64 samples/sec   Loss 3.4035   LearningRate 0.0314   Epoch: 13   Global Step: 143600   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:24:23,204-Speed 5408.43 samples/sec   Loss 3.4065   LearningRate 0.0314   Epoch: 13   Global Step: 143610   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:24:30,783-Speed 5405.21 samples/sec   Loss 3.4595   LearningRate 0.0314   Epoch: 13   Global Step: 143620   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:24:38,285-Speed 5460.55 samples/sec   Loss 3.4158   LearningRate 0.0314   Epoch: 13   Global Step: 143630   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:24:45,886-Speed 5389.13 samples/sec   Loss 3.4445   LearningRate 0.0314   Epoch: 13   Global Step: 143640   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:24:53,469-Speed 5402.24 samples/sec   Loss 3.4437   LearningRate 0.0314   Epoch: 13   Global Step: 143650   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:25:00,906-Speed 5508.29 samples/sec   Loss 3.4482   LearningRate 0.0314   Epoch: 13   Global Step: 143660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:25:08,361-Speed 5495.33 samples/sec   Loss 3.4679   LearningRate 0.0314   Epoch: 13   Global Step: 143670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:25:15,879-Speed 5448.54 samples/sec   Loss 3.4121   LearningRate 0.0314   Epoch: 13   Global Step: 143680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:25:23,346-Speed 5486.56 samples/sec   Loss 3.4034   LearningRate 0.0314   Epoch: 13   Global Step: 143690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:25:30,865-Speed 5448.05 samples/sec   Loss 3.4507   LearningRate 0.0313   Epoch: 13   Global Step: 143700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:25:38,376-Speed 5454.50 samples/sec   Loss 3.4084   LearningRate 0.0313   Epoch: 13   Global Step: 143710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:25:45,881-Speed 5457.63 samples/sec   Loss 3.4588   LearningRate 0.0313   Epoch: 13   Global Step: 143720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:25:53,381-Speed 5462.23 samples/sec   Loss 3.4282   LearningRate 0.0313   Epoch: 13   Global Step: 143730   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:26:00,856-Speed 5480.39 samples/sec   Loss 3.4497   LearningRate 0.0313   Epoch: 13   Global Step: 143740   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:26:08,308-Speed 5497.23 samples/sec   Loss 3.4352   LearningRate 0.0313   Epoch: 13   Global Step: 143750   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-01-09 03:26:15,817-Speed 5455.66 samples/sec   Loss 3.4037   LearningRate 0.0313   Epoch: 13   Global Step: 143760   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-01-09 03:26:23,325-Speed 5455.77 samples/sec   Loss 3.4280   LearningRate 0.0313   Epoch: 13   Global Step: 143770   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-01-09 03:26:30,929-Speed 5387.55 samples/sec   Loss 3.4511   LearningRate 0.0313   Epoch: 13   Global Step: 143780   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-01-09 03:26:38,456-Speed 5442.40 samples/sec   Loss 3.4269   LearningRate 0.0313   Epoch: 13   Global Step: 143790   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-01-09 03:26:45,985-Speed 5440.64 samples/sec   Loss 3.4508   LearningRate 0.0312   Epoch: 13   Global Step: 143800   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-01-09 03:26:53,508-Speed 5445.45 samples/sec   Loss 3.4248   LearningRate 0.0312   Epoch: 13   Global Step: 143810   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-01-09 03:27:00,990-Speed 5475.39 samples/sec   Loss 3.4311   LearningRate 0.0312   Epoch: 13   Global Step: 143820   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-01-09 03:27:08,478-Speed 5470.44 samples/sec   Loss 3.3968   LearningRate 0.0312   Epoch: 13   Global Step: 143830   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-01-09 03:27:15,964-Speed 5472.56 samples/sec   Loss 3.4048   LearningRate 0.0312   Epoch: 13   Global Step: 143840   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-01-09 03:27:23,406-Speed 5504.92 samples/sec   Loss 3.4132   LearningRate 0.0312   Epoch: 13   Global Step: 143850   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:27:30,908-Speed 5460.11 samples/sec   Loss 3.3981   LearningRate 0.0312   Epoch: 13   Global Step: 143860   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:27:38,525-Speed 5378.35 samples/sec   Loss 3.3877   LearningRate 0.0312   Epoch: 13   Global Step: 143870   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:27:46,065-Speed 5432.69 samples/sec   Loss 3.3868   LearningRate 0.0312   Epoch: 13   Global Step: 143880   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:27:53,702-Speed 5364.39 samples/sec   Loss 3.3617   LearningRate 0.0312   Epoch: 13   Global Step: 143890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:28:01,170-Speed 5485.46 samples/sec   Loss 3.3935   LearningRate 0.0311   Epoch: 13   Global Step: 143900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:28:08,757-Speed 5399.10 samples/sec   Loss 3.4009   LearningRate 0.0311   Epoch: 13   Global Step: 143910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:28:16,291-Speed 5437.63 samples/sec   Loss 3.4047   LearningRate 0.0311   Epoch: 13   Global Step: 143920   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:28:23,909-Speed 5377.40 samples/sec   Loss 3.4320   LearningRate 0.0311   Epoch: 13   Global Step: 143930   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:28:31,442-Speed 5438.47 samples/sec   Loss 3.3987   LearningRate 0.0311   Epoch: 13   Global Step: 143940   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:28:39,182-Speed 5292.27 samples/sec   Loss 3.3920   LearningRate 0.0311   Epoch: 13   Global Step: 143950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:28:46,617-Speed 5510.15 samples/sec   Loss 3.4140   LearningRate 0.0311   Epoch: 13   Global Step: 143960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:28:54,173-Speed 5421.97 samples/sec   Loss 3.4013   LearningRate 0.0311   Epoch: 13   Global Step: 143970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:29:01,664-Speed 5468.41 samples/sec   Loss 3.4045   LearningRate 0.0311   Epoch: 13   Global Step: 143980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:29:09,139-Speed 5480.46 samples/sec   Loss 3.4071   LearningRate 0.0311   Epoch: 13   Global Step: 143990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:29:16,726-Speed 5399.78 samples/sec   Loss 3.3462   LearningRate 0.0310   Epoch: 13   Global Step: 144000   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:30:00,778-[lfw][144000]XNorm: 23.763988
Training: 2022-01-09 03:30:00,778-[lfw][144000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-01-09 03:30:00,779-[lfw][144000]Accuracy-Highest: 0.99817
Training: 2022-01-09 03:30:52,312-[cfp_fp][144000]XNorm: 22.170782
Training: 2022-01-09 03:30:52,313-[cfp_fp][144000]Accuracy-Flip: 0.99271+-0.00341
Training: 2022-01-09 03:30:52,314-[cfp_fp][144000]Accuracy-Highest: 0.99271
Training: 2022-01-09 03:31:36,814-[agedb_30][144000]XNorm: 23.898059
Training: 2022-01-09 03:31:36,815-[agedb_30][144000]Accuracy-Flip: 0.98033+-0.00823
Training: 2022-01-09 03:31:36,816-[agedb_30][144000]Accuracy-Highest: 0.98067
Training: 2022-01-09 03:31:44,454-Speed 277.27 samples/sec   Loss 3.3822   LearningRate 0.0310   Epoch: 13   Global Step: 144010   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:31:51,971-Speed 5449.91 samples/sec   Loss 3.4086   LearningRate 0.0310   Epoch: 13   Global Step: 144020   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:31:59,650-Speed 5334.95 samples/sec   Loss 3.4000   LearningRate 0.0310   Epoch: 13   Global Step: 144030   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:32:07,230-Speed 5404.03 samples/sec   Loss 3.3529   LearningRate 0.0310   Epoch: 13   Global Step: 144040   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:32:14,745-Speed 5451.75 samples/sec   Loss 3.4107   LearningRate 0.0310   Epoch: 13   Global Step: 144050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:32:22,298-Speed 5423.58 samples/sec   Loss 3.3985   LearningRate 0.0310   Epoch: 13   Global Step: 144060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:32:29,860-Speed 5417.15 samples/sec   Loss 3.3871   LearningRate 0.0310   Epoch: 13   Global Step: 144070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:32:37,397-Speed 5435.34 samples/sec   Loss 3.3837   LearningRate 0.0310   Epoch: 13   Global Step: 144080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:32:44,875-Speed 5478.59 samples/sec   Loss 3.3713   LearningRate 0.0310   Epoch: 13   Global Step: 144090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-09 03:32:52,472-Speed 5392.38 samples/sec   Loss 3.4575   LearningRate 0.0310   Epoch: 13   Global Step: 144100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:33:00,015-Speed 5430.41 samples/sec   Loss 3.4036   LearningRate 0.0309   Epoch: 13   Global Step: 144110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-09 03:33:07,521-Speed 5457.59 samples/sec   Loss 3.3861   LearningRate 0.0309   Epoch: 13   Global Step: 144120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:33:15,116-Speed 5394.39 samples/sec   Loss 3.3730   LearningRate 0.0309   Epoch: 13   Global Step: 144130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:33:22,673-Speed 5420.38 samples/sec   Loss 3.4039   LearningRate 0.0309   Epoch: 13   Global Step: 144140   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:33:30,203-Speed 5440.39 samples/sec   Loss 3.4242   LearningRate 0.0309   Epoch: 13   Global Step: 144150   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:33:37,666-Speed 5488.56 samples/sec   Loss 3.3926   LearningRate 0.0309   Epoch: 13   Global Step: 144160   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:33:45,129-Speed 5489.52 samples/sec   Loss 3.3838   LearningRate 0.0309   Epoch: 13   Global Step: 144170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:33:52,641-Speed 5453.43 samples/sec   Loss 3.4120   LearningRate 0.0309   Epoch: 13   Global Step: 144180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:34:00,074-Speed 5511.02 samples/sec   Loss 3.3582   LearningRate 0.0309   Epoch: 13   Global Step: 144190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:34:07,526-Speed 5497.47 samples/sec   Loss 3.3788   LearningRate 0.0309   Epoch: 13   Global Step: 144200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:34:15,020-Speed 5466.20 samples/sec   Loss 3.3859   LearningRate 0.0308   Epoch: 13   Global Step: 144210   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:34:22,514-Speed 5466.59 samples/sec   Loss 3.3972   LearningRate 0.0308   Epoch: 13   Global Step: 144220   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:34:30,041-Speed 5442.27 samples/sec   Loss 3.3889   LearningRate 0.0308   Epoch: 13   Global Step: 144230   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:34:37,644-Speed 5388.67 samples/sec   Loss 3.3942   LearningRate 0.0308   Epoch: 13   Global Step: 144240   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:34:45,190-Speed 5428.93 samples/sec   Loss 3.4177   LearningRate 0.0308   Epoch: 13   Global Step: 144250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:34:52,701-Speed 5454.41 samples/sec   Loss 3.4030   LearningRate 0.0308   Epoch: 13   Global Step: 144260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:35:00,234-Speed 5438.19 samples/sec   Loss 3.4114   LearningRate 0.0308   Epoch: 13   Global Step: 144270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:35:07,822-Speed 5398.20 samples/sec   Loss 3.3729   LearningRate 0.0308   Epoch: 13   Global Step: 144280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:35:15,407-Speed 5400.46 samples/sec   Loss 3.3828   LearningRate 0.0308   Epoch: 13   Global Step: 144290   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:35:22,868-Speed 5491.05 samples/sec   Loss 3.3699   LearningRate 0.0308   Epoch: 13   Global Step: 144300   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:35:30,366-Speed 5463.35 samples/sec   Loss 3.4134   LearningRate 0.0307   Epoch: 13   Global Step: 144310   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:35:37,870-Speed 5459.04 samples/sec   Loss 3.3540   LearningRate 0.0307   Epoch: 13   Global Step: 144320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:35:45,313-Speed 5504.01 samples/sec   Loss 3.3604   LearningRate 0.0307   Epoch: 13   Global Step: 144330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:35:52,833-Speed 5447.41 samples/sec   Loss 3.3682   LearningRate 0.0307   Epoch: 13   Global Step: 144340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:36:00,381-Speed 5427.01 samples/sec   Loss 3.3577   LearningRate 0.0307   Epoch: 13   Global Step: 144350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:36:07,914-Speed 5438.20 samples/sec   Loss 3.3945   LearningRate 0.0307   Epoch: 13   Global Step: 144360   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:36:15,431-Speed 5450.28 samples/sec   Loss 3.3516   LearningRate 0.0307   Epoch: 13   Global Step: 144370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:36:22,879-Speed 5500.19 samples/sec   Loss 3.3908   LearningRate 0.0307   Epoch: 13   Global Step: 144380   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:36:30,337-Speed 5492.77 samples/sec   Loss 3.3637   LearningRate 0.0307   Epoch: 13   Global Step: 144390   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:36:37,822-Speed 5472.91 samples/sec   Loss 3.3659   LearningRate 0.0307   Epoch: 13   Global Step: 144400   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:36:45,298-Speed 5479.41 samples/sec   Loss 3.4068   LearningRate 0.0306   Epoch: 13   Global Step: 144410   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:36:52,849-Speed 5425.98 samples/sec   Loss 3.3662   LearningRate 0.0306   Epoch: 13   Global Step: 144420   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:37:00,317-Speed 5485.26 samples/sec   Loss 3.3794   LearningRate 0.0306   Epoch: 13   Global Step: 144430   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:37:07,799-Speed 5475.18 samples/sec   Loss 3.3695   LearningRate 0.0306   Epoch: 13   Global Step: 144440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:37:15,237-Speed 5507.76 samples/sec   Loss 3.3885   LearningRate 0.0306   Epoch: 13   Global Step: 144450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:37:22,731-Speed 5466.26 samples/sec   Loss 3.3992   LearningRate 0.0306   Epoch: 13   Global Step: 144460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:37:30,220-Speed 5470.21 samples/sec   Loss 3.3791   LearningRate 0.0306   Epoch: 13   Global Step: 144470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:37:37,652-Speed 5512.22 samples/sec   Loss 3.3996   LearningRate 0.0306   Epoch: 13   Global Step: 144480   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:37:45,164-Speed 5453.33 samples/sec   Loss 3.3545   LearningRate 0.0306   Epoch: 13   Global Step: 144490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:37:52,615-Speed 5498.30 samples/sec   Loss 3.3687   LearningRate 0.0306   Epoch: 13   Global Step: 144500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:38:00,138-Speed 5445.72 samples/sec   Loss 3.3764   LearningRate 0.0306   Epoch: 13   Global Step: 144510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:38:07,738-Speed 5389.73 samples/sec   Loss 3.3639   LearningRate 0.0305   Epoch: 13   Global Step: 144520   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:38:15,174-Speed 5509.32 samples/sec   Loss 3.3827   LearningRate 0.0305   Epoch: 13   Global Step: 144530   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:38:22,743-Speed 5412.07 samples/sec   Loss 3.3922   LearningRate 0.0305   Epoch: 13   Global Step: 144540   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:38:30,195-Speed 5497.85 samples/sec   Loss 3.3619   LearningRate 0.0305   Epoch: 13   Global Step: 144550   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:38:37,726-Speed 5439.66 samples/sec   Loss 3.3630   LearningRate 0.0305   Epoch: 13   Global Step: 144560   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:38:45,343-Speed 5377.38 samples/sec   Loss 3.3852   LearningRate 0.0305   Epoch: 13   Global Step: 144570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:38:52,918-Speed 5408.21 samples/sec   Loss 3.3493   LearningRate 0.0305   Epoch: 13   Global Step: 144580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:39:00,517-Speed 5390.95 samples/sec   Loss 3.3658   LearningRate 0.0305   Epoch: 13   Global Step: 144590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:39:08,024-Speed 5457.43 samples/sec   Loss 3.3774   LearningRate 0.0305   Epoch: 13   Global Step: 144600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:39:15,525-Speed 5461.06 samples/sec   Loss 3.3991   LearningRate 0.0305   Epoch: 13   Global Step: 144610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:39:23,027-Speed 5460.58 samples/sec   Loss 3.3844   LearningRate 0.0304   Epoch: 13   Global Step: 144620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:39:30,504-Speed 5479.13 samples/sec   Loss 3.3801   LearningRate 0.0304   Epoch: 13   Global Step: 144630   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:39:38,066-Speed 5417.28 samples/sec   Loss 3.3715   LearningRate 0.0304   Epoch: 13   Global Step: 144640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:39:45,701-Speed 5365.24 samples/sec   Loss 3.4147   LearningRate 0.0304   Epoch: 13   Global Step: 144650   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:39:53,208-Speed 5457.71 samples/sec   Loss 3.3599   LearningRate 0.0304   Epoch: 13   Global Step: 144660   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:40:00,791-Speed 5401.97 samples/sec   Loss 3.3763   LearningRate 0.0304   Epoch: 13   Global Step: 144670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:40:08,409-Speed 5377.29 samples/sec   Loss 3.3218   LearningRate 0.0304   Epoch: 13   Global Step: 144680   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-09 03:40:15,875-Speed 5486.77 samples/sec   Loss 3.3724   LearningRate 0.0304   Epoch: 13   Global Step: 144690   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-09 03:40:23,470-Speed 5394.16 samples/sec   Loss 3.3750   LearningRate 0.0304   Epoch: 13   Global Step: 144700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:40:31,085-Speed 5380.35 samples/sec   Loss 3.3672   LearningRate 0.0304   Epoch: 13   Global Step: 144710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:40:38,571-Speed 5472.81 samples/sec   Loss 3.3605   LearningRate 0.0303   Epoch: 13   Global Step: 144720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:40:46,045-Speed 5480.34 samples/sec   Loss 3.3958   LearningRate 0.0303   Epoch: 13   Global Step: 144730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:40:53,577-Speed 5439.22 samples/sec   Loss 3.3786   LearningRate 0.0303   Epoch: 13   Global Step: 144740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:41:01,182-Speed 5386.51 samples/sec   Loss 3.3999   LearningRate 0.0303   Epoch: 13   Global Step: 144750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:41:08,586-Speed 5533.53 samples/sec   Loss 3.3749   LearningRate 0.0303   Epoch: 13   Global Step: 144760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:41:16,025-Speed 5506.32 samples/sec   Loss 3.3632   LearningRate 0.0303   Epoch: 13   Global Step: 144770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:41:23,730-Speed 5316.96 samples/sec   Loss 3.3907   LearningRate 0.0303   Epoch: 13   Global Step: 144780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:41:31,142-Speed 5526.79 samples/sec   Loss 3.3553   LearningRate 0.0303   Epoch: 13   Global Step: 144790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:41:38,713-Speed 5411.37 samples/sec   Loss 3.3351   LearningRate 0.0303   Epoch: 13   Global Step: 144800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:41:46,270-Speed 5420.84 samples/sec   Loss 3.3142   LearningRate 0.0303   Epoch: 13   Global Step: 144810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:41:53,767-Speed 5464.13 samples/sec   Loss 3.3450   LearningRate 0.0303   Epoch: 13   Global Step: 144820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:42:01,245-Speed 5477.78 samples/sec   Loss 3.3835   LearningRate 0.0302   Epoch: 13   Global Step: 144830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:42:08,703-Speed 5493.39 samples/sec   Loss 3.3394   LearningRate 0.0302   Epoch: 13   Global Step: 144840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:42:16,206-Speed 5459.97 samples/sec   Loss 3.3811   LearningRate 0.0302   Epoch: 13   Global Step: 144850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:42:23,721-Speed 5451.00 samples/sec   Loss 3.3822   LearningRate 0.0302   Epoch: 13   Global Step: 144860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:42:31,184-Speed 5488.96 samples/sec   Loss 3.3701   LearningRate 0.0302   Epoch: 13   Global Step: 144870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:42:38,795-Speed 5382.36 samples/sec   Loss 3.3425   LearningRate 0.0302   Epoch: 13   Global Step: 144880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:42:46,339-Speed 5430.60 samples/sec   Loss 3.3651   LearningRate 0.0302   Epoch: 13   Global Step: 144890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:42:53,859-Speed 5447.42 samples/sec   Loss 3.3808   LearningRate 0.0302   Epoch: 13   Global Step: 144900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:43:01,366-Speed 5457.41 samples/sec   Loss 3.3185   LearningRate 0.0302   Epoch: 13   Global Step: 144910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:43:08,841-Speed 5479.75 samples/sec   Loss 3.3863   LearningRate 0.0302   Epoch: 13   Global Step: 144920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:43:16,457-Speed 5379.11 samples/sec   Loss 3.3951   LearningRate 0.0301   Epoch: 13   Global Step: 144930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:43:24,073-Speed 5378.88 samples/sec   Loss 3.3457   LearningRate 0.0301   Epoch: 13   Global Step: 144940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:43:31,560-Speed 5471.69 samples/sec   Loss 3.4020   LearningRate 0.0301   Epoch: 13   Global Step: 144950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:43:39,052-Speed 5468.13 samples/sec   Loss 3.3679   LearningRate 0.0301   Epoch: 13   Global Step: 144960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:43:46,619-Speed 5413.71 samples/sec   Loss 3.4080   LearningRate 0.0301   Epoch: 13   Global Step: 144970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:43:54,097-Speed 5477.37 samples/sec   Loss 3.3568   LearningRate 0.0301   Epoch: 13   Global Step: 144980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:44:01,650-Speed 5423.86 samples/sec   Loss 3.4152   LearningRate 0.0301   Epoch: 13   Global Step: 144990   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:44:09,172-Speed 5445.96 samples/sec   Loss 3.3295   LearningRate 0.0301   Epoch: 13   Global Step: 145000   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:44:16,643-Speed 5483.75 samples/sec   Loss 3.3399   LearningRate 0.0301   Epoch: 13   Global Step: 145010   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:44:24,268-Speed 5372.54 samples/sec   Loss 3.3709   LearningRate 0.0301   Epoch: 13   Global Step: 145020   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:44:31,838-Speed 5411.31 samples/sec   Loss 3.3717   LearningRate 0.0300   Epoch: 13   Global Step: 145030   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:44:39,346-Speed 5456.35 samples/sec   Loss 3.3135   LearningRate 0.0300   Epoch: 13   Global Step: 145040   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:44:46,855-Speed 5455.94 samples/sec   Loss 3.3470   LearningRate 0.0300   Epoch: 13   Global Step: 145050   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:44:54,372-Speed 5449.25 samples/sec   Loss 3.3133   LearningRate 0.0300   Epoch: 13   Global Step: 145060   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:45:01,893-Speed 5446.77 samples/sec   Loss 3.3850   LearningRate 0.0300   Epoch: 13   Global Step: 145070   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:45:09,385-Speed 5468.15 samples/sec   Loss 3.3521   LearningRate 0.0300   Epoch: 13   Global Step: 145080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:45:16,930-Speed 5429.62 samples/sec   Loss 3.3380   LearningRate 0.0300   Epoch: 13   Global Step: 145090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:45:24,430-Speed 5461.69 samples/sec   Loss 3.3512   LearningRate 0.0300   Epoch: 13   Global Step: 145100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:45:32,029-Speed 5391.10 samples/sec   Loss 3.3441   LearningRate 0.0300   Epoch: 13   Global Step: 145110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:45:39,667-Speed 5363.35 samples/sec   Loss 3.3864   LearningRate 0.0300   Epoch: 13   Global Step: 145120   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:45:47,243-Speed 5407.38 samples/sec   Loss 3.3465   LearningRate 0.0300   Epoch: 13   Global Step: 145130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:45:54,770-Speed 5442.32 samples/sec   Loss 3.3863   LearningRate 0.0299   Epoch: 13   Global Step: 145140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:46:02,287-Speed 5449.92 samples/sec   Loss 3.3941   LearningRate 0.0299   Epoch: 13   Global Step: 145150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:46:09,776-Speed 5470.47 samples/sec   Loss 3.3928   LearningRate 0.0299   Epoch: 13   Global Step: 145160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:46:17,314-Speed 5434.45 samples/sec   Loss 3.3364   LearningRate 0.0299   Epoch: 13   Global Step: 145170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:46:40,053-Speed 1801.42 samples/sec   Loss 3.3820   LearningRate 0.0299   Epoch: 14   Global Step: 145180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:46:47,641-Speed 5398.66 samples/sec   Loss 3.3693   LearningRate 0.0299   Epoch: 14   Global Step: 145190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:46:55,129-Speed 5471.18 samples/sec   Loss 3.3467   LearningRate 0.0299   Epoch: 14   Global Step: 145200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:47:02,596-Speed 5486.66 samples/sec   Loss 3.3330   LearningRate 0.0299   Epoch: 14   Global Step: 145210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:47:10,114-Speed 5448.50 samples/sec   Loss 3.3657   LearningRate 0.0299   Epoch: 14   Global Step: 145220   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:47:17,618-Speed 5459.74 samples/sec   Loss 3.3513   LearningRate 0.0299   Epoch: 14   Global Step: 145230   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:47:25,245-Speed 5370.82 samples/sec   Loss 3.3500   LearningRate 0.0298   Epoch: 14   Global Step: 145240   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:47:32,716-Speed 5483.53 samples/sec   Loss 3.3277   LearningRate 0.0298   Epoch: 14   Global Step: 145250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:47:40,272-Speed 5421.58 samples/sec   Loss 3.3445   LearningRate 0.0298   Epoch: 14   Global Step: 145260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:47:47,741-Speed 5484.75 samples/sec   Loss 3.3326   LearningRate 0.0298   Epoch: 14   Global Step: 145270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:47:55,196-Speed 5494.87 samples/sec   Loss 3.3171   LearningRate 0.0298   Epoch: 14   Global Step: 145280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:48:08,750-Speed 3022.49 samples/sec   Loss 3.3114   LearningRate 0.0298   Epoch: 14   Global Step: 145290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:48:16,199-Speed 5499.77 samples/sec   Loss 3.3222   LearningRate 0.0298   Epoch: 14   Global Step: 145300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:48:23,694-Speed 5465.80 samples/sec   Loss 3.3691   LearningRate 0.0298   Epoch: 14   Global Step: 145310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:48:31,159-Speed 5487.55 samples/sec   Loss 3.3102   LearningRate 0.0298   Epoch: 14   Global Step: 145320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:48:38,658-Speed 5463.12 samples/sec   Loss 3.2955   LearningRate 0.0298   Epoch: 14   Global Step: 145330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:48:46,084-Speed 5516.03 samples/sec   Loss 3.3390   LearningRate 0.0297   Epoch: 14   Global Step: 145340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:48:53,614-Speed 5440.78 samples/sec   Loss 3.3390   LearningRate 0.0297   Epoch: 14   Global Step: 145350   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:49:01,185-Speed 5410.84 samples/sec   Loss 3.3108   LearningRate 0.0297   Epoch: 14   Global Step: 145360   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:49:08,632-Speed 5500.67 samples/sec   Loss 3.3130   LearningRate 0.0297   Epoch: 14   Global Step: 145370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:49:16,124-Speed 5467.90 samples/sec   Loss 3.2981   LearningRate 0.0297   Epoch: 14   Global Step: 145380   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:49:23,557-Speed 5511.37 samples/sec   Loss 3.3036   LearningRate 0.0297   Epoch: 14   Global Step: 145390   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:49:31,008-Speed 5498.10 samples/sec   Loss 3.3142   LearningRate 0.0297   Epoch: 14   Global Step: 145400   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:49:38,526-Speed 5449.28 samples/sec   Loss 3.2828   LearningRate 0.0297   Epoch: 14   Global Step: 145410   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:49:46,035-Speed 5455.19 samples/sec   Loss 3.2764   LearningRate 0.0297   Epoch: 14   Global Step: 145420   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:49:53,533-Speed 5464.14 samples/sec   Loss 3.3559   LearningRate 0.0297   Epoch: 14   Global Step: 145430   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:50:01,085-Speed 5424.56 samples/sec   Loss 3.3589   LearningRate 0.0297   Epoch: 14   Global Step: 145440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:50:08,689-Speed 5387.52 samples/sec   Loss 3.3043   LearningRate 0.0296   Epoch: 14   Global Step: 145450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:50:16,165-Speed 5479.79 samples/sec   Loss 3.3177   LearningRate 0.0296   Epoch: 14   Global Step: 145460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:50:23,731-Speed 5414.39 samples/sec   Loss 3.3223   LearningRate 0.0296   Epoch: 14   Global Step: 145470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:50:31,199-Speed 5485.13 samples/sec   Loss 3.3402   LearningRate 0.0296   Epoch: 14   Global Step: 145480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:50:38,686-Speed 5471.37 samples/sec   Loss 3.3303   LearningRate 0.0296   Epoch: 14   Global Step: 145490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:50:46,196-Speed 5454.99 samples/sec   Loss 3.3286   LearningRate 0.0296   Epoch: 14   Global Step: 145500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:50:53,676-Speed 5477.21 samples/sec   Loss 3.3241   LearningRate 0.0296   Epoch: 14   Global Step: 145510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:51:01,209-Speed 5437.91 samples/sec   Loss 3.3257   LearningRate 0.0296   Epoch: 14   Global Step: 145520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:51:08,791-Speed 5402.68 samples/sec   Loss 3.2891   LearningRate 0.0296   Epoch: 14   Global Step: 145530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:51:16,364-Speed 5409.61 samples/sec   Loss 3.3078   LearningRate 0.0296   Epoch: 14   Global Step: 145540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:51:23,920-Speed 5422.04 samples/sec   Loss 3.2878   LearningRate 0.0295   Epoch: 14   Global Step: 145550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:51:31,397-Speed 5478.71 samples/sec   Loss 3.2982   LearningRate 0.0295   Epoch: 14   Global Step: 145560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:51:38,809-Speed 5526.31 samples/sec   Loss 3.3362   LearningRate 0.0295   Epoch: 14   Global Step: 145570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:51:46,347-Speed 5435.15 samples/sec   Loss 3.3203   LearningRate 0.0295   Epoch: 14   Global Step: 145580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:51:53,869-Speed 5446.46 samples/sec   Loss 3.2790   LearningRate 0.0295   Epoch: 14   Global Step: 145590   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:52:01,404-Speed 5436.38 samples/sec   Loss 3.2714   LearningRate 0.0295   Epoch: 14   Global Step: 145600   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:52:08,896-Speed 5467.62 samples/sec   Loss 3.3203   LearningRate 0.0295   Epoch: 14   Global Step: 145610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:52:16,426-Speed 5440.74 samples/sec   Loss 3.3138   LearningRate 0.0295   Epoch: 14   Global Step: 145620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:52:24,125-Speed 5320.96 samples/sec   Loss 3.3314   LearningRate 0.0295   Epoch: 14   Global Step: 145630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:52:31,728-Speed 5388.01 samples/sec   Loss 3.2533   LearningRate 0.0295   Epoch: 14   Global Step: 145640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:52:39,157-Speed 5513.90 samples/sec   Loss 3.3369   LearningRate 0.0295   Epoch: 14   Global Step: 145650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:52:46,609-Speed 5497.77 samples/sec   Loss 3.2975   LearningRate 0.0294   Epoch: 14   Global Step: 145660   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 03:52:54,125-Speed 5450.33 samples/sec   Loss 3.2915   LearningRate 0.0294   Epoch: 14   Global Step: 145670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:53:01,643-Speed 5448.83 samples/sec   Loss 3.2847   LearningRate 0.0294   Epoch: 14   Global Step: 145680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:53:09,205-Speed 5417.34 samples/sec   Loss 3.3350   LearningRate 0.0294   Epoch: 14   Global Step: 145690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:53:16,673-Speed 5485.53 samples/sec   Loss 3.2757   LearningRate 0.0294   Epoch: 14   Global Step: 145700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:53:24,110-Speed 5508.22 samples/sec   Loss 3.3111   LearningRate 0.0294   Epoch: 14   Global Step: 145710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:53:31,597-Speed 5471.75 samples/sec   Loss 3.3528   LearningRate 0.0294   Epoch: 14   Global Step: 145720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:53:39,062-Speed 5487.65 samples/sec   Loss 3.3247   LearningRate 0.0294   Epoch: 14   Global Step: 145730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:53:46,669-Speed 5385.45 samples/sec   Loss 3.3269   LearningRate 0.0294   Epoch: 14   Global Step: 145740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:53:54,164-Speed 5465.99 samples/sec   Loss 3.3085   LearningRate 0.0294   Epoch: 14   Global Step: 145750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:54:01,581-Speed 5522.97 samples/sec   Loss 3.2934   LearningRate 0.0293   Epoch: 14   Global Step: 145760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:54:09,131-Speed 5426.02 samples/sec   Loss 3.3158   LearningRate 0.0293   Epoch: 14   Global Step: 145770   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-09 03:54:16,630-Speed 5462.46 samples/sec   Loss 3.3353   LearningRate 0.0293   Epoch: 14   Global Step: 145780   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:54:24,203-Speed 5409.53 samples/sec   Loss 3.3020   LearningRate 0.0293   Epoch: 14   Global Step: 145790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:54:31,695-Speed 5468.05 samples/sec   Loss 3.2848   LearningRate 0.0293   Epoch: 14   Global Step: 145800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:54:39,182-Speed 5471.42 samples/sec   Loss 3.3286   LearningRate 0.0293   Epoch: 14   Global Step: 145810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:54:46,662-Speed 5476.52 samples/sec   Loss 3.2908   LearningRate 0.0293   Epoch: 14   Global Step: 145820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:54:54,119-Speed 5494.03 samples/sec   Loss 3.3046   LearningRate 0.0293   Epoch: 14   Global Step: 145830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:55:01,704-Speed 5401.05 samples/sec   Loss 3.3225   LearningRate 0.0293   Epoch: 14   Global Step: 145840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:55:09,207-Speed 5460.01 samples/sec   Loss 3.3383   LearningRate 0.0293   Epoch: 14   Global Step: 145850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:55:16,687-Speed 5476.15 samples/sec   Loss 3.3273   LearningRate 0.0293   Epoch: 14   Global Step: 145860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:55:24,248-Speed 5418.41 samples/sec   Loss 3.2663   LearningRate 0.0292   Epoch: 14   Global Step: 145870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:55:31,720-Speed 5482.90 samples/sec   Loss 3.2845   LearningRate 0.0292   Epoch: 14   Global Step: 145880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:55:39,228-Speed 5456.20 samples/sec   Loss 3.3043   LearningRate 0.0292   Epoch: 14   Global Step: 145890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:55:46,759-Speed 5438.98 samples/sec   Loss 3.3417   LearningRate 0.0292   Epoch: 14   Global Step: 145900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:55:54,208-Speed 5499.87 samples/sec   Loss 3.2808   LearningRate 0.0292   Epoch: 14   Global Step: 145910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:56:01,638-Speed 5513.34 samples/sec   Loss 3.2794   LearningRate 0.0292   Epoch: 14   Global Step: 145920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:56:09,095-Speed 5493.39 samples/sec   Loss 3.2953   LearningRate 0.0292   Epoch: 14   Global Step: 145930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:56:16,548-Speed 5496.89 samples/sec   Loss 3.3184   LearningRate 0.0292   Epoch: 14   Global Step: 145940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:56:24,032-Speed 5473.41 samples/sec   Loss 3.3026   LearningRate 0.0292   Epoch: 14   Global Step: 145950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:56:31,518-Speed 5472.54 samples/sec   Loss 3.2449   LearningRate 0.0292   Epoch: 14   Global Step: 145960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:56:38,977-Speed 5492.54 samples/sec   Loss 3.3203   LearningRate 0.0291   Epoch: 14   Global Step: 145970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:56:46,485-Speed 5455.88 samples/sec   Loss 3.3001   LearningRate 0.0291   Epoch: 14   Global Step: 145980   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-09 03:56:53,958-Speed 5482.12 samples/sec   Loss 3.3064   LearningRate 0.0291   Epoch: 14   Global Step: 145990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:57:01,478-Speed 5447.63 samples/sec   Loss 3.3109   LearningRate 0.0291   Epoch: 14   Global Step: 146000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:57:45,345-[lfw][146000]XNorm: 23.394129
Training: 2022-01-09 03:57:45,346-[lfw][146000]Accuracy-Flip: 0.99767+-0.00300
Training: 2022-01-09 03:57:45,346-[lfw][146000]Accuracy-Highest: 0.99817
Training: 2022-01-09 03:58:36,498-[cfp_fp][146000]XNorm: 22.015470
Training: 2022-01-09 03:58:36,499-[cfp_fp][146000]Accuracy-Flip: 0.99214+-0.00405
Training: 2022-01-09 03:58:36,499-[cfp_fp][146000]Accuracy-Highest: 0.99271
Training: 2022-01-09 03:59:20,596-[agedb_30][146000]XNorm: 23.441551
Training: 2022-01-09 03:59:20,597-[agedb_30][146000]Accuracy-Flip: 0.98050+-0.00715
Training: 2022-01-09 03:59:20,597-[agedb_30][146000]Accuracy-Highest: 0.98067
Training: 2022-01-09 03:59:28,251-Speed 279.07 samples/sec   Loss 3.3025   LearningRate 0.0291   Epoch: 14   Global Step: 146010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:59:35,818-Speed 5413.82 samples/sec   Loss 3.2827   LearningRate 0.0291   Epoch: 14   Global Step: 146020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:59:43,299-Speed 5475.66 samples/sec   Loss 3.2648   LearningRate 0.0291   Epoch: 14   Global Step: 146030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:59:50,748-Speed 5500.00 samples/sec   Loss 3.3068   LearningRate 0.0291   Epoch: 14   Global Step: 146040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 03:59:58,197-Speed 5499.65 samples/sec   Loss 3.2924   LearningRate 0.0291   Epoch: 14   Global Step: 146050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:00:05,684-Speed 5470.94 samples/sec   Loss 3.3016   LearningRate 0.0291   Epoch: 14   Global Step: 146060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:00:13,229-Speed 5429.49 samples/sec   Loss 3.3212   LearningRate 0.0291   Epoch: 14   Global Step: 146070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:00:20,726-Speed 5464.53 samples/sec   Loss 3.3021   LearningRate 0.0290   Epoch: 14   Global Step: 146080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:00:28,276-Speed 5426.11 samples/sec   Loss 3.2941   LearningRate 0.0290   Epoch: 14   Global Step: 146090   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-09 04:00:35,725-Speed 5499.01 samples/sec   Loss 3.3128   LearningRate 0.0290   Epoch: 14   Global Step: 146100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:00:43,160-Speed 5509.88 samples/sec   Loss 3.3226   LearningRate 0.0290   Epoch: 14   Global Step: 146110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:00:50,622-Speed 5490.10 samples/sec   Loss 3.2993   LearningRate 0.0290   Epoch: 14   Global Step: 146120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:00:58,077-Speed 5495.77 samples/sec   Loss 3.2701   LearningRate 0.0290   Epoch: 14   Global Step: 146130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:01:05,577-Speed 5461.67 samples/sec   Loss 3.2905   LearningRate 0.0290   Epoch: 14   Global Step: 146140   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:01:13,064-Speed 5471.92 samples/sec   Loss 3.2582   LearningRate 0.0290   Epoch: 14   Global Step: 146150   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:01:20,530-Speed 5487.38 samples/sec   Loss 3.3260   LearningRate 0.0290   Epoch: 14   Global Step: 146160   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:01:28,005-Speed 5479.91 samples/sec   Loss 3.2651   LearningRate 0.0290   Epoch: 14   Global Step: 146170   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:01:35,497-Speed 5467.90 samples/sec   Loss 3.2841   LearningRate 0.0289   Epoch: 14   Global Step: 146180   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:01:42,958-Speed 5490.64 samples/sec   Loss 3.3017   LearningRate 0.0289   Epoch: 14   Global Step: 146190   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:01:50,414-Speed 5494.46 samples/sec   Loss 3.2621   LearningRate 0.0289   Epoch: 14   Global Step: 146200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:01:57,919-Speed 5458.56 samples/sec   Loss 3.2507   LearningRate 0.0289   Epoch: 14   Global Step: 146210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:02:05,528-Speed 5383.71 samples/sec   Loss 3.2737   LearningRate 0.0289   Epoch: 14   Global Step: 146220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:02:13,074-Speed 5429.00 samples/sec   Loss 3.3255   LearningRate 0.0289   Epoch: 14   Global Step: 146230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:02:20,558-Speed 5473.50 samples/sec   Loss 3.3166   LearningRate 0.0289   Epoch: 14   Global Step: 146240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:02:27,997-Speed 5507.07 samples/sec   Loss 3.2369   LearningRate 0.0289   Epoch: 14   Global Step: 146250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:02:35,565-Speed 5412.81 samples/sec   Loss 3.2778   LearningRate 0.0289   Epoch: 14   Global Step: 146260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:02:43,105-Speed 5432.71 samples/sec   Loss 3.2951   LearningRate 0.0289   Epoch: 14   Global Step: 146270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:02:50,582-Speed 5479.14 samples/sec   Loss 3.3055   LearningRate 0.0289   Epoch: 14   Global Step: 146280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:02:58,058-Speed 5479.61 samples/sec   Loss 3.2507   LearningRate 0.0288   Epoch: 14   Global Step: 146290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:03:05,532-Speed 5481.46 samples/sec   Loss 3.3003   LearningRate 0.0288   Epoch: 14   Global Step: 146300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:03:12,965-Speed 5511.10 samples/sec   Loss 3.2590   LearningRate 0.0288   Epoch: 14   Global Step: 146310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:03:20,494-Speed 5441.18 samples/sec   Loss 3.2875   LearningRate 0.0288   Epoch: 14   Global Step: 146320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:03:28,063-Speed 5412.38 samples/sec   Loss 3.2568   LearningRate 0.0288   Epoch: 14   Global Step: 146330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:03:35,478-Speed 5524.48 samples/sec   Loss 3.2656   LearningRate 0.0288   Epoch: 14   Global Step: 146340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:03:42,986-Speed 5456.46 samples/sec   Loss 3.2250   LearningRate 0.0288   Epoch: 14   Global Step: 146350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:03:50,453-Speed 5486.45 samples/sec   Loss 3.2988   LearningRate 0.0288   Epoch: 14   Global Step: 146360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:03:57,908-Speed 5495.63 samples/sec   Loss 3.3256   LearningRate 0.0288   Epoch: 14   Global Step: 146370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:04:05,368-Speed 5490.97 samples/sec   Loss 3.2759   LearningRate 0.0288   Epoch: 14   Global Step: 146380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:04:12,872-Speed 5459.23 samples/sec   Loss 3.2490   LearningRate 0.0288   Epoch: 14   Global Step: 146390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:04:20,415-Speed 5430.14 samples/sec   Loss 3.3011   LearningRate 0.0287   Epoch: 14   Global Step: 146400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:04:27,851-Speed 5509.51 samples/sec   Loss 3.3114   LearningRate 0.0287   Epoch: 14   Global Step: 146410   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:04:35,352-Speed 5461.85 samples/sec   Loss 3.2472   LearningRate 0.0287   Epoch: 14   Global Step: 146420   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:04:42,890-Speed 5433.96 samples/sec   Loss 3.2977   LearningRate 0.0287   Epoch: 14   Global Step: 146430   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:04:50,347-Speed 5493.89 samples/sec   Loss 3.2830   LearningRate 0.0287   Epoch: 14   Global Step: 146440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:04:57,850-Speed 5459.64 samples/sec   Loss 3.2747   LearningRate 0.0287   Epoch: 14   Global Step: 146450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:05:05,324-Speed 5481.35 samples/sec   Loss 3.2560   LearningRate 0.0287   Epoch: 14   Global Step: 146460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:05:12,910-Speed 5399.59 samples/sec   Loss 3.2400   LearningRate 0.0287   Epoch: 14   Global Step: 146470   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:05:20,424-Speed 5452.08 samples/sec   Loss 3.2658   LearningRate 0.0287   Epoch: 14   Global Step: 146480   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:05:27,875-Speed 5498.86 samples/sec   Loss 3.2698   LearningRate 0.0287   Epoch: 14   Global Step: 146490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:05:35,346-Speed 5483.27 samples/sec   Loss 3.2855   LearningRate 0.0286   Epoch: 14   Global Step: 146500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:05:42,844-Speed 5463.08 samples/sec   Loss 3.2609   LearningRate 0.0286   Epoch: 14   Global Step: 146510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:05:50,491-Speed 5357.34 samples/sec   Loss 3.3023   LearningRate 0.0286   Epoch: 14   Global Step: 146520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:05:57,913-Speed 5519.49 samples/sec   Loss 3.2646   LearningRate 0.0286   Epoch: 14   Global Step: 146530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:06:05,392-Speed 5477.37 samples/sec   Loss 3.2391   LearningRate 0.0286   Epoch: 14   Global Step: 146540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:06:12,833-Speed 5505.57 samples/sec   Loss 3.2539   LearningRate 0.0286   Epoch: 14   Global Step: 146550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:06:20,289-Speed 5493.85 samples/sec   Loss 3.2305   LearningRate 0.0286   Epoch: 14   Global Step: 146560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:06:27,759-Speed 5484.69 samples/sec   Loss 3.2471   LearningRate 0.0286   Epoch: 14   Global Step: 146570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:06:35,223-Speed 5488.05 samples/sec   Loss 3.2496   LearningRate 0.0286   Epoch: 14   Global Step: 146580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:06:42,685-Speed 5490.34 samples/sec   Loss 3.2496   LearningRate 0.0286   Epoch: 14   Global Step: 146590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:06:50,165-Speed 5476.28 samples/sec   Loss 3.2463   LearningRate 0.0286   Epoch: 14   Global Step: 146600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:06:57,688-Speed 5445.75 samples/sec   Loss 3.2570   LearningRate 0.0285   Epoch: 14   Global Step: 146610   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-09 04:07:05,145-Speed 5493.52 samples/sec   Loss 3.2900   LearningRate 0.0285   Epoch: 14   Global Step: 146620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:07:12,683-Speed 5434.29 samples/sec   Loss 3.2254   LearningRate 0.0285   Epoch: 14   Global Step: 146630   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:07:20,143-Speed 5491.14 samples/sec   Loss 3.2388   LearningRate 0.0285   Epoch: 14   Global Step: 146640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:07:27,694-Speed 5424.84 samples/sec   Loss 3.2583   LearningRate 0.0285   Epoch: 14   Global Step: 146650   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:07:35,146-Speed 5497.92 samples/sec   Loss 3.2322   LearningRate 0.0285   Epoch: 14   Global Step: 146660   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:07:42,709-Speed 5416.59 samples/sec   Loss 3.2549   LearningRate 0.0285   Epoch: 14   Global Step: 146670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:07:50,187-Speed 5477.95 samples/sec   Loss 3.2755   LearningRate 0.0285   Epoch: 14   Global Step: 146680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:07:57,705-Speed 5449.22 samples/sec   Loss 3.2477   LearningRate 0.0285   Epoch: 14   Global Step: 146690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:08:05,206-Speed 5461.40 samples/sec   Loss 3.2660   LearningRate 0.0285   Epoch: 14   Global Step: 146700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:08:12,716-Speed 5454.89 samples/sec   Loss 3.2607   LearningRate 0.0285   Epoch: 14   Global Step: 146710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:08:20,229-Speed 5452.16 samples/sec   Loss 3.2708   LearningRate 0.0284   Epoch: 14   Global Step: 146720   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-09 04:08:27,710-Speed 5475.82 samples/sec   Loss 3.2278   LearningRate 0.0284   Epoch: 14   Global Step: 146730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:08:35,205-Speed 5466.30 samples/sec   Loss 3.2370   LearningRate 0.0284   Epoch: 14   Global Step: 146740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:08:42,751-Speed 5429.14 samples/sec   Loss 3.2649   LearningRate 0.0284   Epoch: 14   Global Step: 146750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:08:50,325-Speed 5407.83 samples/sec   Loss 3.2446   LearningRate 0.0284   Epoch: 14   Global Step: 146760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:08:57,833-Speed 5456.31 samples/sec   Loss 3.2966   LearningRate 0.0284   Epoch: 14   Global Step: 146770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:09:05,391-Speed 5420.56 samples/sec   Loss 3.2428   LearningRate 0.0284   Epoch: 14   Global Step: 146780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:09:12,868-Speed 5478.94 samples/sec   Loss 3.2650   LearningRate 0.0284   Epoch: 14   Global Step: 146790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:09:20,362-Speed 5466.48 samples/sec   Loss 3.2748   LearningRate 0.0284   Epoch: 14   Global Step: 146800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:09:27,818-Speed 5494.52 samples/sec   Loss 3.2584   LearningRate 0.0284   Epoch: 14   Global Step: 146810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:09:35,286-Speed 5485.41 samples/sec   Loss 3.2714   LearningRate 0.0283   Epoch: 14   Global Step: 146820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:09:42,783-Speed 5464.38 samples/sec   Loss 3.2238   LearningRate 0.0283   Epoch: 14   Global Step: 146830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:09:50,478-Speed 5323.34 samples/sec   Loss 3.2593   LearningRate 0.0283   Epoch: 14   Global Step: 146840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:09:58,037-Speed 5419.48 samples/sec   Loss 3.2627   LearningRate 0.0283   Epoch: 14   Global Step: 146850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:10:05,566-Speed 5440.64 samples/sec   Loss 3.2801   LearningRate 0.0283   Epoch: 14   Global Step: 146860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:10:12,995-Speed 5514.39 samples/sec   Loss 3.2819   LearningRate 0.0283   Epoch: 14   Global Step: 146870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:10:20,465-Speed 5484.12 samples/sec   Loss 3.2542   LearningRate 0.0283   Epoch: 14   Global Step: 146880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:10:27,924-Speed 5491.67 samples/sec   Loss 3.2409   LearningRate 0.0283   Epoch: 14   Global Step: 146890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:10:35,416-Speed 5468.05 samples/sec   Loss 3.2168   LearningRate 0.0283   Epoch: 14   Global Step: 146900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:10:43,032-Speed 5378.68 samples/sec   Loss 3.2150   LearningRate 0.0283   Epoch: 14   Global Step: 146910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:10:50,537-Speed 5458.89 samples/sec   Loss 3.2460   LearningRate 0.0283   Epoch: 14   Global Step: 146920   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:10:58,114-Speed 5405.98 samples/sec   Loss 3.2570   LearningRate 0.0282   Epoch: 14   Global Step: 146930   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:11:05,633-Speed 5448.30 samples/sec   Loss 3.2576   LearningRate 0.0282   Epoch: 14   Global Step: 146940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:11:13,159-Speed 5443.83 samples/sec   Loss 3.2263   LearningRate 0.0282   Epoch: 14   Global Step: 146950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:11:20,665-Speed 5457.44 samples/sec   Loss 3.2449   LearningRate 0.0282   Epoch: 14   Global Step: 146960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:11:28,163-Speed 5463.20 samples/sec   Loss 3.2167   LearningRate 0.0282   Epoch: 14   Global Step: 146970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:11:35,632-Speed 5485.36 samples/sec   Loss 3.2038   LearningRate 0.0282   Epoch: 14   Global Step: 146980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:11:43,063-Speed 5513.06 samples/sec   Loss 3.2369   LearningRate 0.0282   Epoch: 14   Global Step: 146990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:11:50,614-Speed 5424.97 samples/sec   Loss 3.2503   LearningRate 0.0282   Epoch: 14   Global Step: 147000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:11:58,175-Speed 5417.84 samples/sec   Loss 3.2521   LearningRate 0.0282   Epoch: 14   Global Step: 147010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:12:05,701-Speed 5442.95 samples/sec   Loss 3.2224   LearningRate 0.0282   Epoch: 14   Global Step: 147020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:12:13,165-Speed 5488.70 samples/sec   Loss 3.2248   LearningRate 0.0282   Epoch: 14   Global Step: 147030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:12:20,668-Speed 5459.72 samples/sec   Loss 3.2207   LearningRate 0.0281   Epoch: 14   Global Step: 147040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:12:28,104-Speed 5509.39 samples/sec   Loss 3.2667   LearningRate 0.0281   Epoch: 14   Global Step: 147050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:12:35,538-Speed 5510.25 samples/sec   Loss 3.2480   LearningRate 0.0281   Epoch: 14   Global Step: 147060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:12:42,999-Speed 5491.19 samples/sec   Loss 3.2632   LearningRate 0.0281   Epoch: 14   Global Step: 147070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:12:50,507-Speed 5456.40 samples/sec   Loss 3.2427   LearningRate 0.0281   Epoch: 14   Global Step: 147080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:12:58,084-Speed 5405.78 samples/sec   Loss 3.2296   LearningRate 0.0281   Epoch: 14   Global Step: 147090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:13:05,566-Speed 5475.43 samples/sec   Loss 3.2649   LearningRate 0.0281   Epoch: 14   Global Step: 147100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:13:12,994-Speed 5515.35 samples/sec   Loss 3.2366   LearningRate 0.0281   Epoch: 14   Global Step: 147110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:13:20,472-Speed 5478.01 samples/sec   Loss 3.2256   LearningRate 0.0281   Epoch: 14   Global Step: 147120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:13:27,981-Speed 5455.38 samples/sec   Loss 3.2405   LearningRate 0.0281   Epoch: 14   Global Step: 147130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:13:35,461-Speed 5476.55 samples/sec   Loss 3.2450   LearningRate 0.0280   Epoch: 14   Global Step: 147140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:13:42,987-Speed 5443.94 samples/sec   Loss 3.1720   LearningRate 0.0280   Epoch: 14   Global Step: 147150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:13:50,531-Speed 5430.00 samples/sec   Loss 3.2505   LearningRate 0.0280   Epoch: 14   Global Step: 147160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:13:58,042-Speed 5454.15 samples/sec   Loss 3.2162   LearningRate 0.0280   Epoch: 14   Global Step: 147170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:14:05,531-Speed 5469.92 samples/sec   Loss 3.1787   LearningRate 0.0280   Epoch: 14   Global Step: 147180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:14:13,038-Speed 5456.98 samples/sec   Loss 3.2216   LearningRate 0.0280   Epoch: 14   Global Step: 147190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:14:20,564-Speed 5443.78 samples/sec   Loss 3.2881   LearningRate 0.0280   Epoch: 14   Global Step: 147200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:14:28,062-Speed 5463.69 samples/sec   Loss 3.2287   LearningRate 0.0280   Epoch: 14   Global Step: 147210   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:14:35,586-Speed 5443.97 samples/sec   Loss 3.2425   LearningRate 0.0280   Epoch: 14   Global Step: 147220   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:14:43,086-Speed 5462.54 samples/sec   Loss 3.1926   LearningRate 0.0280   Epoch: 14   Global Step: 147230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:14:50,643-Speed 5420.43 samples/sec   Loss 3.2267   LearningRate 0.0280   Epoch: 14   Global Step: 147240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:14:58,080-Speed 5508.42 samples/sec   Loss 3.2293   LearningRate 0.0279   Epoch: 14   Global Step: 147250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:15:05,620-Speed 5433.12 samples/sec   Loss 3.2562   LearningRate 0.0279   Epoch: 14   Global Step: 147260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:15:13,105-Speed 5472.81 samples/sec   Loss 3.2055   LearningRate 0.0279   Epoch: 14   Global Step: 147270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:15:20,611-Speed 5458.19 samples/sec   Loss 3.2041   LearningRate 0.0279   Epoch: 14   Global Step: 147280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:15:28,152-Speed 5432.01 samples/sec   Loss 3.2147   LearningRate 0.0279   Epoch: 14   Global Step: 147290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:15:35,679-Speed 5442.97 samples/sec   Loss 3.2193   LearningRate 0.0279   Epoch: 14   Global Step: 147300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:15:43,207-Speed 5441.33 samples/sec   Loss 3.2031   LearningRate 0.0279   Epoch: 14   Global Step: 147310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:15:50,733-Speed 5443.58 samples/sec   Loss 3.2323   LearningRate 0.0279   Epoch: 14   Global Step: 147320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:15:58,385-Speed 5353.44 samples/sec   Loss 3.2585   LearningRate 0.0279   Epoch: 14   Global Step: 147330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:16:05,904-Speed 5448.21 samples/sec   Loss 3.1961   LearningRate 0.0279   Epoch: 14   Global Step: 147340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:16:13,459-Speed 5422.10 samples/sec   Loss 3.2399   LearningRate 0.0279   Epoch: 14   Global Step: 147350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:16:20,997-Speed 5434.26 samples/sec   Loss 3.2355   LearningRate 0.0278   Epoch: 14   Global Step: 147360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:16:28,549-Speed 5425.10 samples/sec   Loss 3.1976   LearningRate 0.0278   Epoch: 14   Global Step: 147370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:16:36,067-Speed 5448.63 samples/sec   Loss 3.2093   LearningRate 0.0278   Epoch: 14   Global Step: 147380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:16:43,561-Speed 5466.36 samples/sec   Loss 3.1443   LearningRate 0.0278   Epoch: 14   Global Step: 147390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:16:51,062-Speed 5461.31 samples/sec   Loss 3.1905   LearningRate 0.0278   Epoch: 14   Global Step: 147400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:16:58,642-Speed 5404.53 samples/sec   Loss 3.2619   LearningRate 0.0278   Epoch: 14   Global Step: 147410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:17:06,105-Speed 5489.67 samples/sec   Loss 3.2659   LearningRate 0.0278   Epoch: 14   Global Step: 147420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:17:13,661-Speed 5421.55 samples/sec   Loss 3.2169   LearningRate 0.0278   Epoch: 14   Global Step: 147430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:17:21,229-Speed 5412.94 samples/sec   Loss 3.1839   LearningRate 0.0278   Epoch: 14   Global Step: 147440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:17:28,673-Speed 5503.05 samples/sec   Loss 3.2269   LearningRate 0.0278   Epoch: 14   Global Step: 147450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:17:36,149-Speed 5479.21 samples/sec   Loss 3.2197   LearningRate 0.0278   Epoch: 14   Global Step: 147460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:17:43,625-Speed 5479.90 samples/sec   Loss 3.1973   LearningRate 0.0277   Epoch: 14   Global Step: 147470   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-01-09 04:17:51,099-Speed 5480.82 samples/sec   Loss 3.2034   LearningRate 0.0277   Epoch: 14   Global Step: 147480   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-01-09 04:17:58,554-Speed 5494.50 samples/sec   Loss 3.2111   LearningRate 0.0277   Epoch: 14   Global Step: 147490   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-01-09 04:18:06,096-Speed 5432.09 samples/sec   Loss 3.2199   LearningRate 0.0277   Epoch: 14   Global Step: 147500   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-01-09 04:18:13,551-Speed 5495.21 samples/sec   Loss 3.2228   LearningRate 0.0277   Epoch: 14   Global Step: 147510   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-01-09 04:18:20,988-Speed 5507.91 samples/sec   Loss 3.2216   LearningRate 0.0277   Epoch: 14   Global Step: 147520   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-01-09 04:18:28,542-Speed 5423.49 samples/sec   Loss 3.2383   LearningRate 0.0277   Epoch: 14   Global Step: 147530   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-01-09 04:18:36,052-Speed 5454.21 samples/sec   Loss 3.1665   LearningRate 0.0277   Epoch: 14   Global Step: 147540   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-01-09 04:18:43,553-Speed 5461.68 samples/sec   Loss 3.1874   LearningRate 0.0277   Epoch: 14   Global Step: 147550   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-01-09 04:18:51,082-Speed 5440.83 samples/sec   Loss 3.2218   LearningRate 0.0277   Epoch: 14   Global Step: 147560   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-01-09 04:18:58,532-Speed 5499.11 samples/sec   Loss 3.1744   LearningRate 0.0276   Epoch: 14   Global Step: 147570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:19:05,994-Speed 5489.65 samples/sec   Loss 3.2009   LearningRate 0.0276   Epoch: 14   Global Step: 147580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:19:13,438-Speed 5503.12 samples/sec   Loss 3.1807   LearningRate 0.0276   Epoch: 14   Global Step: 147590   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:19:20,935-Speed 5464.88 samples/sec   Loss 3.1690   LearningRate 0.0276   Epoch: 14   Global Step: 147600   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:19:28,519-Speed 5401.04 samples/sec   Loss 3.2320   LearningRate 0.0276   Epoch: 14   Global Step: 147610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:19:36,057-Speed 5434.70 samples/sec   Loss 3.2028   LearningRate 0.0276   Epoch: 14   Global Step: 147620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:19:43,525-Speed 5485.29 samples/sec   Loss 3.1746   LearningRate 0.0276   Epoch: 14   Global Step: 147630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:19:51,063-Speed 5434.95 samples/sec   Loss 3.1917   LearningRate 0.0276   Epoch: 14   Global Step: 147640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:19:58,546-Speed 5474.41 samples/sec   Loss 3.1927   LearningRate 0.0276   Epoch: 14   Global Step: 147650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:20:06,053-Speed 5457.03 samples/sec   Loss 3.1979   LearningRate 0.0276   Epoch: 14   Global Step: 147660   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:20:13,623-Speed 5411.50 samples/sec   Loss 3.2121   LearningRate 0.0276   Epoch: 14   Global Step: 147670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:20:21,127-Speed 5459.21 samples/sec   Loss 3.2171   LearningRate 0.0275   Epoch: 14   Global Step: 147680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:20:28,619-Speed 5468.09 samples/sec   Loss 3.2113   LearningRate 0.0275   Epoch: 14   Global Step: 147690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:20:36,165-Speed 5428.30 samples/sec   Loss 3.2177   LearningRate 0.0275   Epoch: 14   Global Step: 147700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:20:43,673-Speed 5456.22 samples/sec   Loss 3.2298   LearningRate 0.0275   Epoch: 14   Global Step: 147710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:20:51,187-Speed 5452.29 samples/sec   Loss 3.2182   LearningRate 0.0275   Epoch: 14   Global Step: 147720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:20:58,665-Speed 5477.95 samples/sec   Loss 3.1551   LearningRate 0.0275   Epoch: 14   Global Step: 147730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:21:06,270-Speed 5386.50 samples/sec   Loss 3.2037   LearningRate 0.0275   Epoch: 14   Global Step: 147740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:21:13,742-Speed 5482.85 samples/sec   Loss 3.1839   LearningRate 0.0275   Epoch: 14   Global Step: 147750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:21:21,305-Speed 5416.75 samples/sec   Loss 3.2422   LearningRate 0.0275   Epoch: 14   Global Step: 147760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:21:28,791-Speed 5471.72 samples/sec   Loss 3.2439   LearningRate 0.0275   Epoch: 14   Global Step: 147770   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-09 04:21:36,305-Speed 5452.65 samples/sec   Loss 3.1794   LearningRate 0.0275   Epoch: 14   Global Step: 147780   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:21:43,838-Speed 5438.14 samples/sec   Loss 3.2477   LearningRate 0.0274   Epoch: 14   Global Step: 147790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:21:51,463-Speed 5372.50 samples/sec   Loss 3.2358   LearningRate 0.0274   Epoch: 14   Global Step: 147800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:21:59,015-Speed 5423.97 samples/sec   Loss 3.1524   LearningRate 0.0274   Epoch: 14   Global Step: 147810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:22:06,467-Speed 5498.00 samples/sec   Loss 3.1849   LearningRate 0.0274   Epoch: 14   Global Step: 147820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:22:13,963-Speed 5464.56 samples/sec   Loss 3.1883   LearningRate 0.0274   Epoch: 14   Global Step: 147830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:22:21,420-Speed 5493.31 samples/sec   Loss 3.1837   LearningRate 0.0274   Epoch: 14   Global Step: 147840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:22:28,858-Speed 5507.96 samples/sec   Loss 3.1883   LearningRate 0.0274   Epoch: 14   Global Step: 147850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:22:36,407-Speed 5426.48 samples/sec   Loss 3.2858   LearningRate 0.0274   Epoch: 14   Global Step: 147860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:22:43,990-Speed 5402.41 samples/sec   Loss 3.2200   LearningRate 0.0274   Epoch: 14   Global Step: 147870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:22:51,536-Speed 5428.34 samples/sec   Loss 3.1823   LearningRate 0.0274   Epoch: 14   Global Step: 147880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:22:59,050-Speed 5452.43 samples/sec   Loss 3.1987   LearningRate 0.0274   Epoch: 14   Global Step: 147890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:23:06,649-Speed 5390.79 samples/sec   Loss 3.2102   LearningRate 0.0273   Epoch: 14   Global Step: 147900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:23:14,153-Speed 5459.01 samples/sec   Loss 3.1636   LearningRate 0.0273   Epoch: 14   Global Step: 147910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:23:21,692-Speed 5433.66 samples/sec   Loss 3.1979   LearningRate 0.0273   Epoch: 14   Global Step: 147920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:23:29,163-Speed 5483.43 samples/sec   Loss 3.1929   LearningRate 0.0273   Epoch: 14   Global Step: 147930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:23:36,698-Speed 5436.31 samples/sec   Loss 3.1377   LearningRate 0.0273   Epoch: 14   Global Step: 147940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:23:44,199-Speed 5461.69 samples/sec   Loss 3.1630   LearningRate 0.0273   Epoch: 14   Global Step: 147950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:23:51,849-Speed 5355.17 samples/sec   Loss 3.2197   LearningRate 0.0273   Epoch: 14   Global Step: 147960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:23:59,485-Speed 5364.41 samples/sec   Loss 3.2178   LearningRate 0.0273   Epoch: 14   Global Step: 147970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:24:07,150-Speed 5344.65 samples/sec   Loss 3.1995   LearningRate 0.0273   Epoch: 14   Global Step: 147980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:24:14,682-Speed 5438.47 samples/sec   Loss 3.1938   LearningRate 0.0273   Epoch: 14   Global Step: 147990   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:24:22,289-Speed 5385.60 samples/sec   Loss 3.1860   LearningRate 0.0273   Epoch: 14   Global Step: 148000   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:25:05,994-[lfw][148000]XNorm: 23.198030
Training: 2022-01-09 04:25:05,995-[lfw][148000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-01-09 04:25:05,995-[lfw][148000]Accuracy-Highest: 0.99817
Training: 2022-01-09 04:25:57,120-[cfp_fp][148000]XNorm: 21.742158
Training: 2022-01-09 04:25:57,121-[cfp_fp][148000]Accuracy-Flip: 0.99229+-0.00333
Training: 2022-01-09 04:25:57,121-[cfp_fp][148000]Accuracy-Highest: 0.99271
Training: 2022-01-09 04:26:41,007-[agedb_30][148000]XNorm: 23.122083
Training: 2022-01-09 04:26:41,008-[agedb_30][148000]Accuracy-Flip: 0.98150+-0.00677
Training: 2022-01-09 04:26:41,008-[agedb_30][148000]Accuracy-Highest: 0.98150
Training: 2022-01-09 04:26:48,582-Speed 279.99 samples/sec   Loss 3.1586   LearningRate 0.0272   Epoch: 14   Global Step: 148010   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:26:56,092-Speed 5454.84 samples/sec   Loss 3.2374   LearningRate 0.0272   Epoch: 14   Global Step: 148020   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:27:03,721-Speed 5370.02 samples/sec   Loss 3.1988   LearningRate 0.0272   Epoch: 14   Global Step: 148030   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:27:11,280-Speed 5419.66 samples/sec   Loss 3.1977   LearningRate 0.0272   Epoch: 14   Global Step: 148040   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:27:18,796-Speed 5450.55 samples/sec   Loss 3.1662   LearningRate 0.0272   Epoch: 14   Global Step: 148050   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:27:26,318-Speed 5445.59 samples/sec   Loss 3.2259   LearningRate 0.0272   Epoch: 14   Global Step: 148060   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:27:33,757-Speed 5506.85 samples/sec   Loss 3.1468   LearningRate 0.0272   Epoch: 14   Global Step: 148070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:27:41,303-Speed 5428.92 samples/sec   Loss 3.1570   LearningRate 0.0272   Epoch: 14   Global Step: 148080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:27:48,820-Speed 5449.42 samples/sec   Loss 3.1679   LearningRate 0.0272   Epoch: 14   Global Step: 148090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:27:56,296-Speed 5479.56 samples/sec   Loss 3.1545   LearningRate 0.0272   Epoch: 14   Global Step: 148100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:28:03,797-Speed 5461.40 samples/sec   Loss 3.1549   LearningRate 0.0272   Epoch: 14   Global Step: 148110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:28:11,243-Speed 5501.85 samples/sec   Loss 3.1562   LearningRate 0.0271   Epoch: 14   Global Step: 148120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:28:18,739-Speed 5464.45 samples/sec   Loss 3.1950   LearningRate 0.0271   Epoch: 14   Global Step: 148130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:28:26,322-Speed 5402.43 samples/sec   Loss 3.1622   LearningRate 0.0271   Epoch: 14   Global Step: 148140   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:28:33,998-Speed 5336.75 samples/sec   Loss 3.1823   LearningRate 0.0271   Epoch: 14   Global Step: 148150   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:28:41,443-Speed 5502.48 samples/sec   Loss 3.1342   LearningRate 0.0271   Epoch: 14   Global Step: 148160   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:28:48,984-Speed 5432.38 samples/sec   Loss 3.1938   LearningRate 0.0271   Epoch: 14   Global Step: 148170   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-09 04:28:56,403-Speed 5522.00 samples/sec   Loss 3.1769   LearningRate 0.0271   Epoch: 14   Global Step: 148180   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-09 04:29:03,868-Speed 5487.96 samples/sec   Loss 3.1954   LearningRate 0.0271   Epoch: 14   Global Step: 148190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:29:11,340-Speed 5481.94 samples/sec   Loss 3.1845   LearningRate 0.0271   Epoch: 14   Global Step: 148200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:29:18,853-Speed 5452.83 samples/sec   Loss 3.1745   LearningRate 0.0271   Epoch: 14   Global Step: 148210   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:29:26,272-Speed 5521.47 samples/sec   Loss 3.1972   LearningRate 0.0271   Epoch: 14   Global Step: 148220   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:29:33,780-Speed 5456.62 samples/sec   Loss 3.1978   LearningRate 0.0270   Epoch: 14   Global Step: 148230   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:29:41,382-Speed 5388.28 samples/sec   Loss 3.1679   LearningRate 0.0270   Epoch: 14   Global Step: 148240   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:29:48,911-Speed 5441.21 samples/sec   Loss 3.2175   LearningRate 0.0270   Epoch: 14   Global Step: 148250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:29:56,430-Speed 5448.64 samples/sec   Loss 3.1814   LearningRate 0.0270   Epoch: 14   Global Step: 148260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:30:03,903-Speed 5481.44 samples/sec   Loss 3.1460   LearningRate 0.0270   Epoch: 14   Global Step: 148270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:30:11,433-Speed 5440.20 samples/sec   Loss 3.2015   LearningRate 0.0270   Epoch: 14   Global Step: 148280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:30:18,908-Speed 5480.56 samples/sec   Loss 3.1495   LearningRate 0.0270   Epoch: 14   Global Step: 148290   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:30:26,474-Speed 5414.19 samples/sec   Loss 3.1951   LearningRate 0.0270   Epoch: 14   Global Step: 148300   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:30:33,935-Speed 5490.95 samples/sec   Loss 3.1552   LearningRate 0.0270   Epoch: 14   Global Step: 148310   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:30:41,479-Speed 5429.90 samples/sec   Loss 3.1424   LearningRate 0.0270   Epoch: 14   Global Step: 148320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:30:49,106-Speed 5371.32 samples/sec   Loss 3.1654   LearningRate 0.0270   Epoch: 14   Global Step: 148330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:30:56,531-Speed 5517.00 samples/sec   Loss 3.1815   LearningRate 0.0269   Epoch: 14   Global Step: 148340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:31:03,991-Speed 5491.18 samples/sec   Loss 3.1822   LearningRate 0.0269   Epoch: 14   Global Step: 148350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:31:11,586-Speed 5394.22 samples/sec   Loss 3.1471   LearningRate 0.0269   Epoch: 14   Global Step: 148360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:31:19,002-Speed 5523.15 samples/sec   Loss 3.2062   LearningRate 0.0269   Epoch: 14   Global Step: 148370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:31:26,470-Speed 5485.97 samples/sec   Loss 3.2147   LearningRate 0.0269   Epoch: 14   Global Step: 148380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:31:33,952-Speed 5474.87 samples/sec   Loss 3.1807   LearningRate 0.0269   Epoch: 14   Global Step: 148390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:31:41,449-Speed 5464.24 samples/sec   Loss 3.1914   LearningRate 0.0269   Epoch: 14   Global Step: 148400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:31:48,912-Speed 5489.72 samples/sec   Loss 3.1464   LearningRate 0.0269   Epoch: 14   Global Step: 148410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:31:56,465-Speed 5423.52 samples/sec   Loss 3.1286   LearningRate 0.0269   Epoch: 14   Global Step: 148420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:32:04,022-Speed 5420.54 samples/sec   Loss 3.1671   LearningRate 0.0269   Epoch: 14   Global Step: 148430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:32:11,578-Speed 5421.63 samples/sec   Loss 3.1613   LearningRate 0.0269   Epoch: 14   Global Step: 148440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-09 04:32:19,037-Speed 5492.27 samples/sec   Loss 3.1407   LearningRate 0.0268   Epoch: 14   Global Step: 148450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:32:26,549-Speed 5453.05 samples/sec   Loss 3.1510   LearningRate 0.0268   Epoch: 14   Global Step: 148460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-09 04:32:34,018-Speed 5484.61 samples/sec   Loss 3.1396   LearningRate 0.0268   Epoch: 14   Global Step: 148470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:32:41,500-Speed 5475.18 samples/sec   Loss 3.1706   LearningRate 0.0268   Epoch: 14   Global Step: 148480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:32:48,947-Speed 5501.10 samples/sec   Loss 3.1487   LearningRate 0.0268   Epoch: 14   Global Step: 148490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:32:56,518-Speed 5410.66 samples/sec   Loss 3.1265   LearningRate 0.0268   Epoch: 14   Global Step: 148500   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:33:04,025-Speed 5472.00 samples/sec   Loss 3.1718   LearningRate 0.0268   Epoch: 14   Global Step: 148510   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:33:11,477-Speed 5497.38 samples/sec   Loss 3.1728   LearningRate 0.0268   Epoch: 14   Global Step: 148520   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:33:18,970-Speed 5467.23 samples/sec   Loss 3.1281   LearningRate 0.0268   Epoch: 14   Global Step: 148530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:33:26,498-Speed 5441.50 samples/sec   Loss 3.1848   LearningRate 0.0268   Epoch: 14   Global Step: 148540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:33:34,150-Speed 5353.63 samples/sec   Loss 3.1854   LearningRate 0.0268   Epoch: 14   Global Step: 148550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:33:41,754-Speed 5387.17 samples/sec   Loss 3.1117   LearningRate 0.0267   Epoch: 14   Global Step: 148560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:33:49,343-Speed 5398.53 samples/sec   Loss 3.1595   LearningRate 0.0267   Epoch: 14   Global Step: 148570   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:33:56,847-Speed 5459.03 samples/sec   Loss 3.1756   LearningRate 0.0267   Epoch: 14   Global Step: 148580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:34:04,397-Speed 5425.97 samples/sec   Loss 3.1288   LearningRate 0.0267   Epoch: 14   Global Step: 148590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:34:11,886-Speed 5470.25 samples/sec   Loss 3.0976   LearningRate 0.0267   Epoch: 14   Global Step: 148600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:34:19,365-Speed 5477.67 samples/sec   Loss 3.1288   LearningRate 0.0267   Epoch: 14   Global Step: 148610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:34:26,890-Speed 5443.84 samples/sec   Loss 3.1687   LearningRate 0.0267   Epoch: 14   Global Step: 148620   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:34:34,347-Speed 5493.77 samples/sec   Loss 3.1520   LearningRate 0.0267   Epoch: 14   Global Step: 148630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:34:41,936-Speed 5397.86 samples/sec   Loss 3.1390   LearningRate 0.0267   Epoch: 14   Global Step: 148640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:34:49,461-Speed 5443.59 samples/sec   Loss 3.2110   LearningRate 0.0267   Epoch: 14   Global Step: 148650   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:34:57,001-Speed 5433.80 samples/sec   Loss 3.1736   LearningRate 0.0267   Epoch: 14   Global Step: 148660   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:35:04,442-Speed 5505.18 samples/sec   Loss 3.1353   LearningRate 0.0266   Epoch: 14   Global Step: 148670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:35:11,953-Speed 5454.16 samples/sec   Loss 3.1034   LearningRate 0.0266   Epoch: 14   Global Step: 148680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:35:19,527-Speed 5408.49 samples/sec   Loss 3.1635   LearningRate 0.0266   Epoch: 14   Global Step: 148690   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:35:26,993-Speed 5487.01 samples/sec   Loss 3.1312   LearningRate 0.0266   Epoch: 14   Global Step: 148700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:35:34,543-Speed 5426.36 samples/sec   Loss 3.1461   LearningRate 0.0266   Epoch: 14   Global Step: 148710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:35:42,030-Speed 5471.18 samples/sec   Loss 3.1632   LearningRate 0.0266   Epoch: 14   Global Step: 148720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:35:49,506-Speed 5479.57 samples/sec   Loss 3.1531   LearningRate 0.0266   Epoch: 14   Global Step: 148730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:35:57,132-Speed 5372.32 samples/sec   Loss 3.1587   LearningRate 0.0266   Epoch: 14   Global Step: 148740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:36:04,634-Speed 5460.42 samples/sec   Loss 3.1827   LearningRate 0.0266   Epoch: 14   Global Step: 148750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:36:12,127-Speed 5467.54 samples/sec   Loss 3.1179   LearningRate 0.0266   Epoch: 14   Global Step: 148760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:36:19,611-Speed 5473.55 samples/sec   Loss 3.1606   LearningRate 0.0266   Epoch: 14   Global Step: 148770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:36:27,125-Speed 5452.20 samples/sec   Loss 3.1507   LearningRate 0.0265   Epoch: 14   Global Step: 148780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:36:34,635-Speed 5454.50 samples/sec   Loss 3.1970   LearningRate 0.0265   Epoch: 14   Global Step: 148790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:36:42,115-Speed 5477.28 samples/sec   Loss 3.1699   LearningRate 0.0265   Epoch: 14   Global Step: 148800   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:36:49,620-Speed 5458.31 samples/sec   Loss 3.1318   LearningRate 0.0265   Epoch: 14   Global Step: 148810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:36:57,152-Speed 5439.08 samples/sec   Loss 3.1218   LearningRate 0.0265   Epoch: 14   Global Step: 148820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:37:04,656-Speed 5459.32 samples/sec   Loss 3.1181   LearningRate 0.0265   Epoch: 14   Global Step: 148830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:37:12,184-Speed 5441.54 samples/sec   Loss 3.1332   LearningRate 0.0265   Epoch: 14   Global Step: 148840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:37:19,673-Speed 5470.44 samples/sec   Loss 3.1166   LearningRate 0.0265   Epoch: 14   Global Step: 148850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:37:27,157-Speed 5473.18 samples/sec   Loss 3.1184   LearningRate 0.0265   Epoch: 14   Global Step: 148860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:37:34,626-Speed 5485.12 samples/sec   Loss 3.1249   LearningRate 0.0265   Epoch: 14   Global Step: 148870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:37:42,130-Speed 5459.42 samples/sec   Loss 3.0967   LearningRate 0.0265   Epoch: 14   Global Step: 148880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:37:49,708-Speed 5405.88 samples/sec   Loss 3.1688   LearningRate 0.0264   Epoch: 14   Global Step: 148890   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:37:57,264-Speed 5421.18 samples/sec   Loss 3.1222   LearningRate 0.0264   Epoch: 14   Global Step: 148900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:38:04,900-Speed 5365.45 samples/sec   Loss 3.1422   LearningRate 0.0264   Epoch: 14   Global Step: 148910   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:38:12,450-Speed 5425.65 samples/sec   Loss 3.1483   LearningRate 0.0264   Epoch: 14   Global Step: 148920   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:38:19,984-Speed 5437.10 samples/sec   Loss 3.1241   LearningRate 0.0264   Epoch: 14   Global Step: 148930   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:38:27,439-Speed 5495.58 samples/sec   Loss 3.1518   LearningRate 0.0264   Epoch: 14   Global Step: 148940   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:38:34,931-Speed 5467.76 samples/sec   Loss 3.1293   LearningRate 0.0264   Epoch: 14   Global Step: 148950   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:38:42,532-Speed 5389.31 samples/sec   Loss 3.1261   LearningRate 0.0264   Epoch: 14   Global Step: 148960   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:38:50,048-Speed 5450.81 samples/sec   Loss 3.1270   LearningRate 0.0264   Epoch: 14   Global Step: 148970   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:38:57,728-Speed 5333.89 samples/sec   Loss 3.1344   LearningRate 0.0264   Epoch: 14   Global Step: 148980   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:39:05,202-Speed 5480.47 samples/sec   Loss 3.1331   LearningRate 0.0264   Epoch: 14   Global Step: 148990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:39:12,776-Speed 5409.18 samples/sec   Loss 3.1167   LearningRate 0.0263   Epoch: 14   Global Step: 149000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:39:20,300-Speed 5444.14 samples/sec   Loss 3.0583   LearningRate 0.0263   Epoch: 14   Global Step: 149010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:39:27,823-Speed 5445.60 samples/sec   Loss 3.1577   LearningRate 0.0263   Epoch: 14   Global Step: 149020   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:39:35,381-Speed 5420.22 samples/sec   Loss 3.1246   LearningRate 0.0263   Epoch: 14   Global Step: 149030   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:39:42,920-Speed 5433.58 samples/sec   Loss 3.0924   LearningRate 0.0263   Epoch: 14   Global Step: 149040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:39:50,419-Speed 5462.53 samples/sec   Loss 3.1168   LearningRate 0.0263   Epoch: 14   Global Step: 149050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:39:57,971-Speed 5425.08 samples/sec   Loss 3.0969   LearningRate 0.0263   Epoch: 14   Global Step: 149060   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:40:05,520-Speed 5426.45 samples/sec   Loss 3.1213   LearningRate 0.0263   Epoch: 14   Global Step: 149070   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:40:13,041-Speed 5446.89 samples/sec   Loss 3.1291   LearningRate 0.0263   Epoch: 14   Global Step: 149080   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:40:20,463-Speed 5519.35 samples/sec   Loss 3.1398   LearningRate 0.0263   Epoch: 14   Global Step: 149090   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:40:27,984-Speed 5446.92 samples/sec   Loss 3.1116   LearningRate 0.0263   Epoch: 14   Global Step: 149100   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:40:35,403-Speed 5521.60 samples/sec   Loss 3.1303   LearningRate 0.0262   Epoch: 14   Global Step: 149110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:40:42,942-Speed 5434.07 samples/sec   Loss 3.1050   LearningRate 0.0262   Epoch: 14   Global Step: 149120   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:40:50,486-Speed 5429.70 samples/sec   Loss 3.1273   LearningRate 0.0262   Epoch: 14   Global Step: 149130   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:40:57,971-Speed 5473.54 samples/sec   Loss 3.1280   LearningRate 0.0262   Epoch: 14   Global Step: 149140   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:41:05,394-Speed 5518.85 samples/sec   Loss 3.1363   LearningRate 0.0262   Epoch: 14   Global Step: 149150   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:41:12,910-Speed 5450.20 samples/sec   Loss 3.0962   LearningRate 0.0262   Epoch: 14   Global Step: 149160   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:41:20,484-Speed 5408.53 samples/sec   Loss 3.1245   LearningRate 0.0262   Epoch: 14   Global Step: 149170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:41:27,926-Speed 5505.13 samples/sec   Loss 3.1134   LearningRate 0.0262   Epoch: 14   Global Step: 149180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:41:35,447-Speed 5446.51 samples/sec   Loss 3.1600   LearningRate 0.0262   Epoch: 14   Global Step: 149190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:41:42,915-Speed 5485.25 samples/sec   Loss 3.1058   LearningRate 0.0262   Epoch: 14   Global Step: 149200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:41:50,468-Speed 5424.43 samples/sec   Loss 3.1269   LearningRate 0.0262   Epoch: 14   Global Step: 149210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:41:57,969-Speed 5461.13 samples/sec   Loss 3.0887   LearningRate 0.0261   Epoch: 14   Global Step: 149220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:42:05,428-Speed 5492.38 samples/sec   Loss 3.0943   LearningRate 0.0261   Epoch: 14   Global Step: 149230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:42:12,964-Speed 5435.43 samples/sec   Loss 3.1420   LearningRate 0.0261   Epoch: 14   Global Step: 149240   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:42:20,430-Speed 5487.13 samples/sec   Loss 3.1049   LearningRate 0.0261   Epoch: 14   Global Step: 149250   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:42:28,104-Speed 5338.27 samples/sec   Loss 3.1205   LearningRate 0.0261   Epoch: 14   Global Step: 149260   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:42:35,598-Speed 5466.57 samples/sec   Loss 3.1386   LearningRate 0.0261   Epoch: 14   Global Step: 149270   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:42:43,206-Speed 5384.55 samples/sec   Loss 3.1404   LearningRate 0.0261   Epoch: 14   Global Step: 149280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:42:50,780-Speed 5408.79 samples/sec   Loss 3.0897   LearningRate 0.0261   Epoch: 14   Global Step: 149290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:42:58,266-Speed 5471.92 samples/sec   Loss 3.1276   LearningRate 0.0261   Epoch: 14   Global Step: 149300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:43:05,832-Speed 5415.06 samples/sec   Loss 3.1310   LearningRate 0.0261   Epoch: 14   Global Step: 149310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:43:13,396-Speed 5415.74 samples/sec   Loss 3.1199   LearningRate 0.0261   Epoch: 14   Global Step: 149320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:43:20,893-Speed 5463.92 samples/sec   Loss 3.1088   LearningRate 0.0260   Epoch: 14   Global Step: 149330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:43:28,499-Speed 5386.03 samples/sec   Loss 3.0875   LearningRate 0.0260   Epoch: 14   Global Step: 149340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:43:36,011-Speed 5453.60 samples/sec   Loss 3.0945   LearningRate 0.0260   Epoch: 14   Global Step: 149350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:43:43,599-Speed 5398.19 samples/sec   Loss 3.1238   LearningRate 0.0260   Epoch: 14   Global Step: 149360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:43:51,089-Speed 5469.41 samples/sec   Loss 3.1467   LearningRate 0.0260   Epoch: 14   Global Step: 149370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:43:58,610-Speed 5446.47 samples/sec   Loss 3.1536   LearningRate 0.0260   Epoch: 14   Global Step: 149380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:44:06,189-Speed 5405.22 samples/sec   Loss 3.1121   LearningRate 0.0260   Epoch: 14   Global Step: 149390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:44:13,724-Speed 5437.12 samples/sec   Loss 3.0825   LearningRate 0.0260   Epoch: 14   Global Step: 149400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 04:44:21,235-Speed 5453.20 samples/sec   Loss 3.0867   LearningRate 0.0260   Epoch: 14   Global Step: 149410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:44:28,780-Speed 5429.55 samples/sec   Loss 3.0950   LearningRate 0.0260   Epoch: 14   Global Step: 149420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:44:36,304-Speed 5445.18 samples/sec   Loss 3.0877   LearningRate 0.0260   Epoch: 14   Global Step: 149430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:44:43,789-Speed 5473.12 samples/sec   Loss 3.0989   LearningRate 0.0259   Epoch: 14   Global Step: 149440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:44:51,316-Speed 5442.04 samples/sec   Loss 3.0866   LearningRate 0.0259   Epoch: 14   Global Step: 149450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:44:58,841-Speed 5443.26 samples/sec   Loss 3.1097   LearningRate 0.0259   Epoch: 14   Global Step: 149460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:45:06,379-Speed 5435.16 samples/sec   Loss 3.1168   LearningRate 0.0259   Epoch: 14   Global Step: 149470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:45:13,942-Speed 5416.45 samples/sec   Loss 3.0983   LearningRate 0.0259   Epoch: 14   Global Step: 149480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:45:21,454-Speed 5453.17 samples/sec   Loss 3.0872   LearningRate 0.0259   Epoch: 14   Global Step: 149490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:45:28,959-Speed 5458.32 samples/sec   Loss 3.0907   LearningRate 0.0259   Epoch: 14   Global Step: 149500   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:45:36,507-Speed 5427.99 samples/sec   Loss 3.0796   LearningRate 0.0259   Epoch: 14   Global Step: 149510   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:45:44,050-Speed 5430.98 samples/sec   Loss 3.0835   LearningRate 0.0259   Epoch: 14   Global Step: 149520   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:45:51,633-Speed 5401.70 samples/sec   Loss 3.1351   LearningRate 0.0259   Epoch: 14   Global Step: 149530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:45:59,147-Speed 5452.17 samples/sec   Loss 3.1217   LearningRate 0.0259   Epoch: 14   Global Step: 149540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:46:06,637-Speed 5469.88 samples/sec   Loss 3.1181   LearningRate 0.0258   Epoch: 14   Global Step: 149550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:46:14,178-Speed 5432.02 samples/sec   Loss 3.0910   LearningRate 0.0258   Epoch: 14   Global Step: 149560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:46:21,611-Speed 5511.36 samples/sec   Loss 3.1088   LearningRate 0.0258   Epoch: 14   Global Step: 149570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:46:29,108-Speed 5463.85 samples/sec   Loss 3.1216   LearningRate 0.0258   Epoch: 14   Global Step: 149580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:46:36,614-Speed 5458.42 samples/sec   Loss 3.0517   LearningRate 0.0258   Epoch: 14   Global Step: 149590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:46:44,173-Speed 5419.63 samples/sec   Loss 3.1067   LearningRate 0.0258   Epoch: 14   Global Step: 149600   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:46:51,612-Speed 5506.89 samples/sec   Loss 3.0836   LearningRate 0.0258   Epoch: 14   Global Step: 149610   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:46:59,118-Speed 5457.05 samples/sec   Loss 3.1149   LearningRate 0.0258   Epoch: 14   Global Step: 149620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:47:06,605-Speed 5472.14 samples/sec   Loss 3.0769   LearningRate 0.0258   Epoch: 14   Global Step: 149630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:47:14,136-Speed 5439.22 samples/sec   Loss 3.1122   LearningRate 0.0258   Epoch: 14   Global Step: 149640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:47:21,627-Speed 5468.88 samples/sec   Loss 3.0800   LearningRate 0.0258   Epoch: 14   Global Step: 149650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:47:29,085-Speed 5492.69 samples/sec   Loss 3.1079   LearningRate 0.0258   Epoch: 14   Global Step: 149660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:47:36,550-Speed 5488.04 samples/sec   Loss 3.0645   LearningRate 0.0257   Epoch: 14   Global Step: 149670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:47:44,003-Speed 5496.56 samples/sec   Loss 3.1258   LearningRate 0.0257   Epoch: 14   Global Step: 149680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:47:51,520-Speed 5449.75 samples/sec   Loss 3.0993   LearningRate 0.0257   Epoch: 14   Global Step: 149690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:47:59,008-Speed 5470.64 samples/sec   Loss 3.0804   LearningRate 0.0257   Epoch: 14   Global Step: 149700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:48:06,578-Speed 5411.77 samples/sec   Loss 3.0748   LearningRate 0.0257   Epoch: 14   Global Step: 149710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:48:14,135-Speed 5421.43 samples/sec   Loss 3.0703   LearningRate 0.0257   Epoch: 14   Global Step: 149720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:48:21,567-Speed 5511.73 samples/sec   Loss 3.0930   LearningRate 0.0257   Epoch: 14   Global Step: 149730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:48:29,096-Speed 5440.16 samples/sec   Loss 3.0914   LearningRate 0.0257   Epoch: 14   Global Step: 149740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:48:36,652-Speed 5437.00 samples/sec   Loss 3.0484   LearningRate 0.0257   Epoch: 14   Global Step: 149750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:48:46,257-Speed 5634.80 samples/sec   Loss 3.0867   LearningRate 0.0257   Epoch: 14   Global Step: 149760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:48:53,799-Speed 5431.87 samples/sec   Loss 3.1001   LearningRate 0.0257   Epoch: 14   Global Step: 149770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:49:01,307-Speed 5455.93 samples/sec   Loss 3.0437   LearningRate 0.0256   Epoch: 14   Global Step: 149780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:49:08,735-Speed 5515.74 samples/sec   Loss 3.1309   LearningRate 0.0256   Epoch: 14   Global Step: 149790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:49:16,309-Speed 5408.31 samples/sec   Loss 3.0607   LearningRate 0.0256   Epoch: 14   Global Step: 149800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:49:23,985-Speed 5337.00 samples/sec   Loss 3.0474   LearningRate 0.0256   Epoch: 14   Global Step: 149810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:49:31,559-Speed 5408.75 samples/sec   Loss 3.0946   LearningRate 0.0256   Epoch: 14   Global Step: 149820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:49:39,057-Speed 5463.93 samples/sec   Loss 3.1186   LearningRate 0.0256   Epoch: 14   Global Step: 149830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:49:46,638-Speed 5403.48 samples/sec   Loss 3.1036   LearningRate 0.0256   Epoch: 14   Global Step: 149840   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:49:54,138-Speed 5462.30 samples/sec   Loss 3.0799   LearningRate 0.0256   Epoch: 14   Global Step: 149850   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:50:01,722-Speed 5401.74 samples/sec   Loss 3.0784   LearningRate 0.0256   Epoch: 14   Global Step: 149860   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:50:09,169-Speed 5500.49 samples/sec   Loss 3.1115   LearningRate 0.0256   Epoch: 14   Global Step: 149870   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:50:16,589-Speed 5521.42 samples/sec   Loss 3.0575   LearningRate 0.0256   Epoch: 14   Global Step: 149880   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:50:24,111-Speed 5445.86 samples/sec   Loss 3.0867   LearningRate 0.0255   Epoch: 14   Global Step: 149890   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:50:31,668-Speed 5420.78 samples/sec   Loss 3.0571   LearningRate 0.0255   Epoch: 14   Global Step: 149900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:50:39,197-Speed 5440.99 samples/sec   Loss 3.0951   LearningRate 0.0255   Epoch: 14   Global Step: 149910   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:50:46,833-Speed 5365.32 samples/sec   Loss 3.0739   LearningRate 0.0255   Epoch: 14   Global Step: 149920   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:50:54,384-Speed 5424.50 samples/sec   Loss 3.0951   LearningRate 0.0255   Epoch: 14   Global Step: 149930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:51:01,890-Speed 5457.48 samples/sec   Loss 3.0795   LearningRate 0.0255   Epoch: 14   Global Step: 149940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:51:09,445-Speed 5422.93 samples/sec   Loss 3.0919   LearningRate 0.0255   Epoch: 14   Global Step: 149950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:51:16,975-Speed 5440.31 samples/sec   Loss 3.1340   LearningRate 0.0255   Epoch: 14   Global Step: 149960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:51:24,561-Speed 5399.88 samples/sec   Loss 3.0592   LearningRate 0.0255   Epoch: 14   Global Step: 149970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:51:32,059-Speed 5463.28 samples/sec   Loss 3.0804   LearningRate 0.0255   Epoch: 14   Global Step: 149980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:51:39,584-Speed 5444.07 samples/sec   Loss 3.0704   LearningRate 0.0255   Epoch: 14   Global Step: 149990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:51:47,021-Speed 5508.37 samples/sec   Loss 3.0772   LearningRate 0.0254   Epoch: 14   Global Step: 150000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:52:30,981-[lfw][150000]XNorm: 23.833620
Training: 2022-01-09 04:52:30,981-[lfw][150000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-01-09 04:52:30,982-[lfw][150000]Accuracy-Highest: 0.99817
Training: 2022-01-09 04:53:22,280-[cfp_fp][150000]XNorm: 22.481064
Training: 2022-01-09 04:53:22,281-[cfp_fp][150000]Accuracy-Flip: 0.99314+-0.00360
Training: 2022-01-09 04:53:22,281-[cfp_fp][150000]Accuracy-Highest: 0.99314
Training: 2022-01-09 04:54:06,466-[agedb_30][150000]XNorm: 23.921558
Training: 2022-01-09 04:54:06,467-[agedb_30][150000]Accuracy-Flip: 0.98017+-0.00790
Training: 2022-01-09 04:54:06,468-[agedb_30][150000]Accuracy-Highest: 0.98150
Training: 2022-01-09 04:54:14,083-Speed 278.52 samples/sec   Loss 3.0471   LearningRate 0.0254   Epoch: 14   Global Step: 150010   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:54:21,553-Speed 5484.20 samples/sec   Loss 3.0723   LearningRate 0.0254   Epoch: 14   Global Step: 150020   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:54:29,038-Speed 5473.18 samples/sec   Loss 3.0821   LearningRate 0.0254   Epoch: 14   Global Step: 150030   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:54:36,452-Speed 5525.18 samples/sec   Loss 3.0581   LearningRate 0.0254   Epoch: 14   Global Step: 150040   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:54:43,942-Speed 5468.92 samples/sec   Loss 3.0515   LearningRate 0.0254   Epoch: 14   Global Step: 150050   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:54:51,422-Speed 5477.00 samples/sec   Loss 3.0960   LearningRate 0.0254   Epoch: 14   Global Step: 150060   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:54:58,995-Speed 5409.36 samples/sec   Loss 3.0715   LearningRate 0.0254   Epoch: 14   Global Step: 150070   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:55:06,509-Speed 5451.81 samples/sec   Loss 3.1050   LearningRate 0.0254   Epoch: 14   Global Step: 150080   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:55:13,973-Speed 5488.38 samples/sec   Loss 3.0532   LearningRate 0.0254   Epoch: 14   Global Step: 150090   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:55:21,394-Speed 5520.31 samples/sec   Loss 3.0820   LearningRate 0.0254   Epoch: 14   Global Step: 150100   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:55:28,967-Speed 5410.27 samples/sec   Loss 3.0679   LearningRate 0.0254   Epoch: 14   Global Step: 150110   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-01-09 04:55:36,614-Speed 5356.59 samples/sec   Loss 3.0818   LearningRate 0.0253   Epoch: 14   Global Step: 150120   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-01-09 04:55:44,111-Speed 5464.20 samples/sec   Loss 3.1001   LearningRate 0.0253   Epoch: 14   Global Step: 150130   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-01-09 04:55:51,706-Speed 5393.46 samples/sec   Loss 3.0691   LearningRate 0.0253   Epoch: 14   Global Step: 150140   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-01-09 04:55:59,159-Speed 5497.19 samples/sec   Loss 3.0406   LearningRate 0.0253   Epoch: 14   Global Step: 150150   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-01-09 04:56:06,660-Speed 5460.99 samples/sec   Loss 3.1056   LearningRate 0.0253   Epoch: 14   Global Step: 150160   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-01-09 04:56:14,163-Speed 5459.23 samples/sec   Loss 3.1224   LearningRate 0.0253   Epoch: 14   Global Step: 150170   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-01-09 04:56:21,674-Speed 5454.42 samples/sec   Loss 3.0772   LearningRate 0.0253   Epoch: 14   Global Step: 150180   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-01-09 04:56:29,208-Speed 5437.93 samples/sec   Loss 3.0622   LearningRate 0.0253   Epoch: 14   Global Step: 150190   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-01-09 04:56:36,670-Speed 5489.69 samples/sec   Loss 3.0940   LearningRate 0.0253   Epoch: 14   Global Step: 150200   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-01-09 04:56:44,260-Speed 5397.20 samples/sec   Loss 3.0641   LearningRate 0.0253   Epoch: 14   Global Step: 150210   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:56:51,798-Speed 5434.31 samples/sec   Loss 3.1085   LearningRate 0.0253   Epoch: 14   Global Step: 150220   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:56:59,326-Speed 5441.89 samples/sec   Loss 3.0595   LearningRate 0.0252   Epoch: 14   Global Step: 150230   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:57:06,859-Speed 5438.46 samples/sec   Loss 3.0617   LearningRate 0.0252   Epoch: 14   Global Step: 150240   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:57:14,359-Speed 5461.76 samples/sec   Loss 3.0449   LearningRate 0.0252   Epoch: 14   Global Step: 150250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:57:21,968-Speed 5383.56 samples/sec   Loss 3.0767   LearningRate 0.0252   Epoch: 14   Global Step: 150260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:57:29,466-Speed 5463.28 samples/sec   Loss 3.0892   LearningRate 0.0252   Epoch: 14   Global Step: 150270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:57:36,968-Speed 5461.07 samples/sec   Loss 3.0478   LearningRate 0.0252   Epoch: 14   Global Step: 150280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:57:44,544-Speed 5406.86 samples/sec   Loss 3.0734   LearningRate 0.0252   Epoch: 14   Global Step: 150290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:57:52,118-Speed 5408.44 samples/sec   Loss 3.0408   LearningRate 0.0252   Epoch: 14   Global Step: 150300   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:57:59,628-Speed 5454.48 samples/sec   Loss 3.0705   LearningRate 0.0252   Epoch: 14   Global Step: 150310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:58:07,077-Speed 5500.50 samples/sec   Loss 3.0836   LearningRate 0.0252   Epoch: 14   Global Step: 150320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:58:14,552-Speed 5479.89 samples/sec   Loss 3.0720   LearningRate 0.0252   Epoch: 14   Global Step: 150330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:58:22,039-Speed 5471.23 samples/sec   Loss 3.0432   LearningRate 0.0251   Epoch: 14   Global Step: 150340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:58:29,529-Speed 5469.44 samples/sec   Loss 3.0507   LearningRate 0.0251   Epoch: 14   Global Step: 150350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:58:36,976-Speed 5501.84 samples/sec   Loss 3.0633   LearningRate 0.0251   Epoch: 14   Global Step: 150360   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:58:44,416-Speed 5505.30 samples/sec   Loss 3.0978   LearningRate 0.0251   Epoch: 14   Global Step: 150370   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:58:51,978-Speed 5417.98 samples/sec   Loss 3.0887   LearningRate 0.0251   Epoch: 14   Global Step: 150380   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:58:59,575-Speed 5392.26 samples/sec   Loss 3.0450   LearningRate 0.0251   Epoch: 14   Global Step: 150390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:59:07,196-Speed 5375.69 samples/sec   Loss 3.0520   LearningRate 0.0251   Epoch: 14   Global Step: 150400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:59:14,671-Speed 5480.53 samples/sec   Loss 3.0480   LearningRate 0.0251   Epoch: 14   Global Step: 150410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:59:22,190-Speed 5448.04 samples/sec   Loss 3.0698   LearningRate 0.0251   Epoch: 14   Global Step: 150420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:59:29,665-Speed 5480.11 samples/sec   Loss 3.0481   LearningRate 0.0251   Epoch: 14   Global Step: 150430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:59:37,172-Speed 5457.56 samples/sec   Loss 3.0580   LearningRate 0.0251   Epoch: 14   Global Step: 150440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:59:44,721-Speed 5426.19 samples/sec   Loss 3.0338   LearningRate 0.0251   Epoch: 14   Global Step: 150450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 04:59:52,178-Speed 5494.07 samples/sec   Loss 3.0266   LearningRate 0.0250   Epoch: 14   Global Step: 150460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 04:59:59,711-Speed 5437.93 samples/sec   Loss 3.0656   LearningRate 0.0250   Epoch: 14   Global Step: 150470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:00:07,274-Speed 5417.10 samples/sec   Loss 3.0416   LearningRate 0.0250   Epoch: 14   Global Step: 150480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:00:14,743-Speed 5484.23 samples/sec   Loss 3.0278   LearningRate 0.0250   Epoch: 14   Global Step: 150490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:00:22,227-Speed 5473.87 samples/sec   Loss 3.0773   LearningRate 0.0250   Epoch: 14   Global Step: 150500   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:00:29,750-Speed 5445.75 samples/sec   Loss 3.0703   LearningRate 0.0250   Epoch: 14   Global Step: 150510   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:00:37,256-Speed 5457.31 samples/sec   Loss 3.0128   LearningRate 0.0250   Epoch: 14   Global Step: 150520   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:00:44,695-Speed 5507.08 samples/sec   Loss 3.0191   LearningRate 0.0250   Epoch: 14   Global Step: 150530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:00:52,171-Speed 5479.42 samples/sec   Loss 3.0195   LearningRate 0.0250   Epoch: 14   Global Step: 150540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:00:59,634-Speed 5489.57 samples/sec   Loss 3.1068   LearningRate 0.0250   Epoch: 14   Global Step: 150550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:01:07,164-Speed 5440.18 samples/sec   Loss 3.0125   LearningRate 0.0250   Epoch: 14   Global Step: 150560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:01:14,670-Speed 5457.44 samples/sec   Loss 3.0257   LearningRate 0.0249   Epoch: 14   Global Step: 150570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:01:22,092-Speed 5520.28 samples/sec   Loss 3.0528   LearningRate 0.0249   Epoch: 14   Global Step: 150580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:01:29,554-Speed 5489.62 samples/sec   Loss 3.0094   LearningRate 0.0249   Epoch: 14   Global Step: 150590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:01:36,978-Speed 5518.04 samples/sec   Loss 3.0392   LearningRate 0.0249   Epoch: 14   Global Step: 150600   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:01:44,453-Speed 5480.25 samples/sec   Loss 3.0487   LearningRate 0.0249   Epoch: 14   Global Step: 150610   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:01:51,912-Speed 5492.44 samples/sec   Loss 3.0445   LearningRate 0.0249   Epoch: 14   Global Step: 150620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:01:59,329-Speed 5523.20 samples/sec   Loss 3.0584   LearningRate 0.0249   Epoch: 14   Global Step: 150630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:02:06,790-Speed 5490.43 samples/sec   Loss 3.0636   LearningRate 0.0249   Epoch: 14   Global Step: 150640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:02:14,258-Speed 5485.39 samples/sec   Loss 3.0849   LearningRate 0.0249   Epoch: 14   Global Step: 150650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:02:21,739-Speed 5476.58 samples/sec   Loss 3.0561   LearningRate 0.0249   Epoch: 14   Global Step: 150660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:02:29,213-Speed 5480.87 samples/sec   Loss 3.0264   LearningRate 0.0249   Epoch: 14   Global Step: 150670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:02:36,741-Speed 5441.30 samples/sec   Loss 3.0279   LearningRate 0.0248   Epoch: 14   Global Step: 150680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:02:44,209-Speed 5485.63 samples/sec   Loss 3.0318   LearningRate 0.0248   Epoch: 14   Global Step: 150690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:02:51,735-Speed 5443.36 samples/sec   Loss 3.0355   LearningRate 0.0248   Epoch: 14   Global Step: 150700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:02:59,352-Speed 5378.86 samples/sec   Loss 2.9945   LearningRate 0.0248   Epoch: 14   Global Step: 150710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:03:06,819-Speed 5486.13 samples/sec   Loss 3.0525   LearningRate 0.0248   Epoch: 14   Global Step: 150720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:03:14,367-Speed 5427.30 samples/sec   Loss 3.0386   LearningRate 0.0248   Epoch: 14   Global Step: 150730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:03:21,903-Speed 5436.22 samples/sec   Loss 3.0357   LearningRate 0.0248   Epoch: 14   Global Step: 150740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:03:29,488-Speed 5400.55 samples/sec   Loss 3.0483   LearningRate 0.0248   Epoch: 14   Global Step: 150750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:03:39,279-Speed 4184.25 samples/sec   Loss 3.0581   LearningRate 0.0248   Epoch: 14   Global Step: 150760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:03:46,834-Speed 5422.47 samples/sec   Loss 3.0384   LearningRate 0.0248   Epoch: 14   Global Step: 150770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:03:54,484-Speed 5354.81 samples/sec   Loss 3.0704   LearningRate 0.0248   Epoch: 14   Global Step: 150780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:04:01,905-Speed 5519.93 samples/sec   Loss 3.0164   LearningRate 0.0248   Epoch: 14   Global Step: 150790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:04:09,382-Speed 5479.11 samples/sec   Loss 3.0494   LearningRate 0.0247   Epoch: 14   Global Step: 150800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:04:16,803-Speed 5520.52 samples/sec   Loss 3.0084   LearningRate 0.0247   Epoch: 14   Global Step: 150810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:04:24,276-Speed 5481.52 samples/sec   Loss 3.0442   LearningRate 0.0247   Epoch: 14   Global Step: 150820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:04:31,773-Speed 5463.93 samples/sec   Loss 3.0267   LearningRate 0.0247   Epoch: 14   Global Step: 150830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:04:39,243-Speed 5483.87 samples/sec   Loss 3.0366   LearningRate 0.0247   Epoch: 14   Global Step: 150840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:04:46,707-Speed 5488.45 samples/sec   Loss 3.0330   LearningRate 0.0247   Epoch: 14   Global Step: 150850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:04:54,132-Speed 5517.44 samples/sec   Loss 3.0689   LearningRate 0.0247   Epoch: 14   Global Step: 150860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:05:01,619-Speed 5470.92 samples/sec   Loss 3.0092   LearningRate 0.0247   Epoch: 14   Global Step: 150870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:05:09,098-Speed 5478.06 samples/sec   Loss 3.0612   LearningRate 0.0247   Epoch: 14   Global Step: 150880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:05:16,558-Speed 5491.59 samples/sec   Loss 3.0089   LearningRate 0.0247   Epoch: 14   Global Step: 150890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:05:24,021-Speed 5488.57 samples/sec   Loss 3.0118   LearningRate 0.0247   Epoch: 14   Global Step: 150900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:05:31,442-Speed 5519.95 samples/sec   Loss 3.0264   LearningRate 0.0246   Epoch: 14   Global Step: 150910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:05:38,908-Speed 5487.38 samples/sec   Loss 3.0512   LearningRate 0.0246   Epoch: 14   Global Step: 150920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:05:46,356-Speed 5500.25 samples/sec   Loss 2.9892   LearningRate 0.0246   Epoch: 14   Global Step: 150930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:05:53,807-Speed 5497.71 samples/sec   Loss 3.0180   LearningRate 0.0246   Epoch: 14   Global Step: 150940   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:06:01,229-Speed 5519.22 samples/sec   Loss 3.0507   LearningRate 0.0246   Epoch: 14   Global Step: 150950   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:06:08,686-Speed 5493.85 samples/sec   Loss 3.0146   LearningRate 0.0246   Epoch: 14   Global Step: 150960   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:06:16,137-Speed 5498.20 samples/sec   Loss 3.0404   LearningRate 0.0246   Epoch: 14   Global Step: 150970   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:06:23,614-Speed 5478.98 samples/sec   Loss 3.0008   LearningRate 0.0246   Epoch: 14   Global Step: 150980   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-01-09 05:06:31,062-Speed 5499.44 samples/sec   Loss 3.0566   LearningRate 0.0246   Epoch: 14   Global Step: 150990   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-01-09 05:06:38,630-Speed 5413.35 samples/sec   Loss 3.0533   LearningRate 0.0246   Epoch: 14   Global Step: 151000   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-01-09 05:06:46,089-Speed 5492.62 samples/sec   Loss 3.0326   LearningRate 0.0246   Epoch: 14   Global Step: 151010   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-01-09 05:06:53,616-Speed 5442.05 samples/sec   Loss 3.0038   LearningRate 0.0246   Epoch: 14   Global Step: 151020   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-01-09 05:07:01,061-Speed 5501.99 samples/sec   Loss 3.0417   LearningRate 0.0245   Epoch: 14   Global Step: 151030   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-01-09 05:07:08,593-Speed 5438.97 samples/sec   Loss 3.0429   LearningRate 0.0245   Epoch: 14   Global Step: 151040   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-01-09 05:07:16,056-Speed 5489.38 samples/sec   Loss 3.0264   LearningRate 0.0245   Epoch: 14   Global Step: 151050   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-01-09 05:07:23,515-Speed 5492.08 samples/sec   Loss 3.0363   LearningRate 0.0245   Epoch: 14   Global Step: 151060   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-01-09 05:07:31,065-Speed 5425.49 samples/sec   Loss 3.0154   LearningRate 0.0245   Epoch: 14   Global Step: 151070   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-01-09 05:07:38,538-Speed 5482.33 samples/sec   Loss 3.0209   LearningRate 0.0245   Epoch: 14   Global Step: 151080   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:07:46,034-Speed 5465.27 samples/sec   Loss 3.0354   LearningRate 0.0245   Epoch: 14   Global Step: 151090   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:07:53,535-Speed 5461.31 samples/sec   Loss 3.0118   LearningRate 0.0245   Epoch: 14   Global Step: 151100   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:08:01,049-Speed 5451.54 samples/sec   Loss 3.0124   LearningRate 0.0245   Epoch: 14   Global Step: 151110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:08:08,482-Speed 5511.02 samples/sec   Loss 3.0340   LearningRate 0.0245   Epoch: 14   Global Step: 151120   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:08:16,031-Speed 5426.46 samples/sec   Loss 3.0395   LearningRate 0.0245   Epoch: 14   Global Step: 151130   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:08:23,560-Speed 5441.01 samples/sec   Loss 3.0412   LearningRate 0.0244   Epoch: 14   Global Step: 151140   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:08:31,047-Speed 5471.49 samples/sec   Loss 3.0191   LearningRate 0.0244   Epoch: 14   Global Step: 151150   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:08:38,560-Speed 5452.48 samples/sec   Loss 3.0085   LearningRate 0.0244   Epoch: 14   Global Step: 151160   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:08:46,104-Speed 5430.84 samples/sec   Loss 3.0401   LearningRate 0.0244   Epoch: 14   Global Step: 151170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:08:53,630-Speed 5443.02 samples/sec   Loss 2.9932   LearningRate 0.0244   Epoch: 14   Global Step: 151180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:09:01,124-Speed 5466.05 samples/sec   Loss 3.0023   LearningRate 0.0244   Epoch: 14   Global Step: 151190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:09:08,629-Speed 5457.98 samples/sec   Loss 3.0364   LearningRate 0.0244   Epoch: 14   Global Step: 151200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:09:16,178-Speed 5426.99 samples/sec   Loss 3.0410   LearningRate 0.0244   Epoch: 14   Global Step: 151210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:09:23,686-Speed 5456.20 samples/sec   Loss 3.0293   LearningRate 0.0244   Epoch: 14   Global Step: 151220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:09:31,189-Speed 5459.90 samples/sec   Loss 3.0196   LearningRate 0.0244   Epoch: 14   Global Step: 151230   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:09:38,720-Speed 5439.70 samples/sec   Loss 2.9674   LearningRate 0.0244   Epoch: 14   Global Step: 151240   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:09:46,221-Speed 5461.36 samples/sec   Loss 3.0328   LearningRate 0.0244   Epoch: 14   Global Step: 151250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:09:53,788-Speed 5413.96 samples/sec   Loss 3.0319   LearningRate 0.0243   Epoch: 14   Global Step: 151260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:10:01,365-Speed 5406.49 samples/sec   Loss 2.9892   LearningRate 0.0243   Epoch: 14   Global Step: 151270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:10:08,860-Speed 5465.88 samples/sec   Loss 3.0233   LearningRate 0.0243   Epoch: 14   Global Step: 151280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:10:16,366-Speed 5457.53 samples/sec   Loss 3.0167   LearningRate 0.0243   Epoch: 14   Global Step: 151290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:10:23,764-Speed 5537.59 samples/sec   Loss 2.9742   LearningRate 0.0243   Epoch: 14   Global Step: 151300   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:10:31,262-Speed 5463.18 samples/sec   Loss 3.0258   LearningRate 0.0243   Epoch: 14   Global Step: 151310   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:10:38,775-Speed 5452.81 samples/sec   Loss 2.9944   LearningRate 0.0243   Epoch: 14   Global Step: 151320   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:10:46,381-Speed 5385.67 samples/sec   Loss 3.0122   LearningRate 0.0243   Epoch: 14   Global Step: 151330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:10:53,858-Speed 5479.77 samples/sec   Loss 3.0327   LearningRate 0.0243   Epoch: 14   Global Step: 151340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:11:01,393-Speed 5436.46 samples/sec   Loss 2.9848   LearningRate 0.0243   Epoch: 14   Global Step: 151350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:11:08,862-Speed 5484.20 samples/sec   Loss 3.0258   LearningRate 0.0243   Epoch: 14   Global Step: 151360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:11:16,362-Speed 5462.44 samples/sec   Loss 3.0417   LearningRate 0.0242   Epoch: 14   Global Step: 151370   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:11:24,034-Speed 5340.06 samples/sec   Loss 3.0122   LearningRate 0.0242   Epoch: 14   Global Step: 151380   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:11:31,527-Speed 5466.40 samples/sec   Loss 2.9714   LearningRate 0.0242   Epoch: 14   Global Step: 151390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:11:39,070-Speed 5431.19 samples/sec   Loss 3.0040   LearningRate 0.0242   Epoch: 14   Global Step: 151400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:11:46,617-Speed 5427.56 samples/sec   Loss 2.9978   LearningRate 0.0242   Epoch: 14   Global Step: 151410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:11:54,253-Speed 5365.06 samples/sec   Loss 2.9965   LearningRate 0.0242   Epoch: 14   Global Step: 151420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:12:01,837-Speed 5401.65 samples/sec   Loss 3.0393   LearningRate 0.0242   Epoch: 14   Global Step: 151430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:12:09,371-Speed 5437.55 samples/sec   Loss 2.9668   LearningRate 0.0242   Epoch: 14   Global Step: 151440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:12:16,843-Speed 5482.47 samples/sec   Loss 2.9752   LearningRate 0.0242   Epoch: 14   Global Step: 151450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:12:24,286-Speed 5504.46 samples/sec   Loss 2.9716   LearningRate 0.0242   Epoch: 14   Global Step: 151460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:12:31,714-Speed 5514.57 samples/sec   Loss 3.0253   LearningRate 0.0242   Epoch: 14   Global Step: 151470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:12:39,195-Speed 5476.12 samples/sec   Loss 2.9771   LearningRate 0.0242   Epoch: 14   Global Step: 151480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:12:46,686-Speed 5468.72 samples/sec   Loss 3.0202   LearningRate 0.0241   Epoch: 14   Global Step: 151490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:12:54,227-Speed 5432.39 samples/sec   Loss 2.9922   LearningRate 0.0241   Epoch: 14   Global Step: 151500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:13:01,784-Speed 5421.08 samples/sec   Loss 3.0602   LearningRate 0.0241   Epoch: 14   Global Step: 151510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:13:09,291-Speed 5456.71 samples/sec   Loss 3.0142   LearningRate 0.0241   Epoch: 14   Global Step: 151520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:13:16,848-Speed 5420.64 samples/sec   Loss 2.9527   LearningRate 0.0241   Epoch: 14   Global Step: 151530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:13:24,398-Speed 5426.43 samples/sec   Loss 3.0080   LearningRate 0.0241   Epoch: 14   Global Step: 151540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:13:31,921-Speed 5445.06 samples/sec   Loss 2.9860   LearningRate 0.0241   Epoch: 14   Global Step: 151550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:13:39,530-Speed 5383.93 samples/sec   Loss 3.0131   LearningRate 0.0241   Epoch: 14   Global Step: 151560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:13:47,036-Speed 5457.35 samples/sec   Loss 2.9734   LearningRate 0.0241   Epoch: 14   Global Step: 151570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:13:54,587-Speed 5425.64 samples/sec   Loss 2.9889   LearningRate 0.0241   Epoch: 14   Global Step: 151580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:14:02,130-Speed 5430.67 samples/sec   Loss 2.9889   LearningRate 0.0241   Epoch: 14   Global Step: 151590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:14:09,649-Speed 5448.53 samples/sec   Loss 3.0005   LearningRate 0.0240   Epoch: 14   Global Step: 151600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:14:17,277-Speed 5370.07 samples/sec   Loss 2.9998   LearningRate 0.0240   Epoch: 14   Global Step: 151610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:14:24,870-Speed 5394.91 samples/sec   Loss 2.9926   LearningRate 0.0240   Epoch: 14   Global Step: 151620   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:14:32,460-Speed 5397.83 samples/sec   Loss 2.9685   LearningRate 0.0240   Epoch: 14   Global Step: 151630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:14:40,035-Speed 5407.61 samples/sec   Loss 2.9784   LearningRate 0.0240   Epoch: 14   Global Step: 151640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:14:47,575-Speed 5432.64 samples/sec   Loss 2.9488   LearningRate 0.0240   Epoch: 14   Global Step: 151650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:14:55,093-Speed 5449.66 samples/sec   Loss 2.9877   LearningRate 0.0240   Epoch: 14   Global Step: 151660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:15:02,673-Speed 5404.18 samples/sec   Loss 2.9734   LearningRate 0.0240   Epoch: 14   Global Step: 151670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:15:10,177-Speed 5459.03 samples/sec   Loss 2.9986   LearningRate 0.0240   Epoch: 14   Global Step: 151680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:15:17,721-Speed 5429.76 samples/sec   Loss 2.9480   LearningRate 0.0240   Epoch: 14   Global Step: 151690   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:15:25,210-Speed 5470.77 samples/sec   Loss 2.9631   LearningRate 0.0240   Epoch: 14   Global Step: 151700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:15:32,674-Speed 5487.90 samples/sec   Loss 2.9813   LearningRate 0.0240   Epoch: 14   Global Step: 151710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:15:40,197-Speed 5445.71 samples/sec   Loss 2.9841   LearningRate 0.0239   Epoch: 14   Global Step: 151720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:15:47,752-Speed 5422.02 samples/sec   Loss 3.0099   LearningRate 0.0239   Epoch: 14   Global Step: 151730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:15:55,225-Speed 5481.85 samples/sec   Loss 3.0079   LearningRate 0.0239   Epoch: 14   Global Step: 151740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:16:02,770-Speed 5429.73 samples/sec   Loss 2.9618   LearningRate 0.0239   Epoch: 14   Global Step: 151750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:16:10,456-Speed 5329.71 samples/sec   Loss 2.9662   LearningRate 0.0239   Epoch: 14   Global Step: 151760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:16:17,984-Speed 5441.38 samples/sec   Loss 2.9740   LearningRate 0.0239   Epoch: 14   Global Step: 151770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:16:25,492-Speed 5456.38 samples/sec   Loss 2.9812   LearningRate 0.0239   Epoch: 14   Global Step: 151780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:16:33,026-Speed 5437.51 samples/sec   Loss 3.0048   LearningRate 0.0239   Epoch: 14   Global Step: 151790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:16:40,504-Speed 5477.90 samples/sec   Loss 2.9917   LearningRate 0.0239   Epoch: 14   Global Step: 151800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:16:48,063-Speed 5419.55 samples/sec   Loss 2.9827   LearningRate 0.0239   Epoch: 14   Global Step: 151810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:16:55,553-Speed 5469.78 samples/sec   Loss 2.9719   LearningRate 0.0239   Epoch: 14   Global Step: 151820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:17:03,156-Speed 5388.34 samples/sec   Loss 3.0049   LearningRate 0.0239   Epoch: 14   Global Step: 151830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:17:10,800-Speed 5358.86 samples/sec   Loss 3.0169   LearningRate 0.0238   Epoch: 14   Global Step: 151840   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:17:18,330-Speed 5440.07 samples/sec   Loss 3.0167   LearningRate 0.0238   Epoch: 14   Global Step: 151850   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:17:25,813-Speed 5474.65 samples/sec   Loss 2.9875   LearningRate 0.0238   Epoch: 14   Global Step: 151860   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:17:33,345-Speed 5439.04 samples/sec   Loss 2.9719   LearningRate 0.0238   Epoch: 14   Global Step: 151870   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:17:40,874-Speed 5441.44 samples/sec   Loss 2.9589   LearningRate 0.0238   Epoch: 14   Global Step: 151880   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:17:48,434-Speed 5418.26 samples/sec   Loss 3.0141   LearningRate 0.0238   Epoch: 14   Global Step: 151890   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:17:55,967-Speed 5438.07 samples/sec   Loss 3.0063   LearningRate 0.0238   Epoch: 14   Global Step: 151900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:18:03,540-Speed 5409.37 samples/sec   Loss 2.9870   LearningRate 0.0238   Epoch: 14   Global Step: 151910   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:18:11,057-Speed 5449.40 samples/sec   Loss 2.9338   LearningRate 0.0238   Epoch: 14   Global Step: 151920   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:18:18,584-Speed 5442.47 samples/sec   Loss 2.9536   LearningRate 0.0238   Epoch: 14   Global Step: 151930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:18:26,115-Speed 5440.00 samples/sec   Loss 2.9510   LearningRate 0.0238   Epoch: 14   Global Step: 151940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:18:33,678-Speed 5416.07 samples/sec   Loss 3.0068   LearningRate 0.0237   Epoch: 14   Global Step: 151950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:18:41,204-Speed 5443.59 samples/sec   Loss 2.9605   LearningRate 0.0237   Epoch: 14   Global Step: 151960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:18:48,725-Speed 5446.63 samples/sec   Loss 2.9945   LearningRate 0.0237   Epoch: 14   Global Step: 151970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:18:56,266-Speed 5432.24 samples/sec   Loss 2.9767   LearningRate 0.0237   Epoch: 14   Global Step: 151980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:19:03,860-Speed 5394.85 samples/sec   Loss 2.9611   LearningRate 0.0237   Epoch: 14   Global Step: 151990   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:19:11,438-Speed 5405.93 samples/sec   Loss 2.9764   LearningRate 0.0237   Epoch: 14   Global Step: 152000   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:19:55,061-[lfw][152000]XNorm: 23.242325
Training: 2022-01-09 05:19:55,062-[lfw][152000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-01-09 05:19:55,062-[lfw][152000]Accuracy-Highest: 0.99817
Training: 2022-01-09 05:20:45,917-[cfp_fp][152000]XNorm: 21.870703
Training: 2022-01-09 05:20:45,918-[cfp_fp][152000]Accuracy-Flip: 0.99129+-0.00548
Training: 2022-01-09 05:20:45,919-[cfp_fp][152000]Accuracy-Highest: 0.99314
Training: 2022-01-09 05:21:29,704-[agedb_30][152000]XNorm: 23.497709
Training: 2022-01-09 05:21:29,705-[agedb_30][152000]Accuracy-Flip: 0.98217+-0.00837
Training: 2022-01-09 05:21:29,705-[agedb_30][152000]Accuracy-Highest: 0.98217
Training: 2022-01-09 05:21:37,310-Speed 280.80 samples/sec   Loss 2.9679   LearningRate 0.0237   Epoch: 14   Global Step: 152010   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:21:44,843-Speed 5438.32 samples/sec   Loss 2.9601   LearningRate 0.0237   Epoch: 14   Global Step: 152020   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:21:52,479-Speed 5364.51 samples/sec   Loss 2.9773   LearningRate 0.0237   Epoch: 14   Global Step: 152030   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:22:00,112-Speed 5366.89 samples/sec   Loss 2.9684   LearningRate 0.0237   Epoch: 14   Global Step: 152040   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:22:07,849-Speed 5295.23 samples/sec   Loss 2.9360   LearningRate 0.0237   Epoch: 14   Global Step: 152050   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:22:15,634-Speed 5261.27 samples/sec   Loss 2.9837   LearningRate 0.0237   Epoch: 14   Global Step: 152060   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:22:23,377-Speed 5291.05 samples/sec   Loss 2.9738   LearningRate 0.0236   Epoch: 14   Global Step: 152070   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:22:31,125-Speed 5287.70 samples/sec   Loss 2.9405   LearningRate 0.0236   Epoch: 14   Global Step: 152080   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:22:38,729-Speed 5386.76 samples/sec   Loss 2.9603   LearningRate 0.0236   Epoch: 14   Global Step: 152090   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:22:46,247-Speed 5449.14 samples/sec   Loss 2.9790   LearningRate 0.0236   Epoch: 14   Global Step: 152100   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:22:53,787-Speed 5432.85 samples/sec   Loss 2.9789   LearningRate 0.0236   Epoch: 14   Global Step: 152110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:23:01,289-Speed 5460.53 samples/sec   Loss 2.9021   LearningRate 0.0236   Epoch: 14   Global Step: 152120   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:23:08,815-Speed 5443.38 samples/sec   Loss 2.9383   LearningRate 0.0236   Epoch: 14   Global Step: 152130   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:23:16,412-Speed 5391.83 samples/sec   Loss 2.9524   LearningRate 0.0236   Epoch: 14   Global Step: 152140   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:23:23,988-Speed 5407.46 samples/sec   Loss 2.9324   LearningRate 0.0236   Epoch: 14   Global Step: 152150   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:23:31,518-Speed 5440.74 samples/sec   Loss 2.9650   LearningRate 0.0236   Epoch: 14   Global Step: 152160   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:23:39,096-Speed 5405.77 samples/sec   Loss 2.9489   LearningRate 0.0236   Epoch: 14   Global Step: 152170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:23:46,578-Speed 5474.99 samples/sec   Loss 2.9507   LearningRate 0.0236   Epoch: 14   Global Step: 152180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:23:54,151-Speed 5409.60 samples/sec   Loss 2.9701   LearningRate 0.0235   Epoch: 14   Global Step: 152190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:24:01,675-Speed 5444.25 samples/sec   Loss 2.9789   LearningRate 0.0235   Epoch: 14   Global Step: 152200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:24:09,242-Speed 5413.66 samples/sec   Loss 2.9266   LearningRate 0.0235   Epoch: 14   Global Step: 152210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:24:16,821-Speed 5404.97 samples/sec   Loss 2.9645   LearningRate 0.0235   Epoch: 14   Global Step: 152220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:24:24,366-Speed 5429.71 samples/sec   Loss 2.9949   LearningRate 0.0235   Epoch: 14   Global Step: 152230   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:24:32,023-Speed 5350.60 samples/sec   Loss 2.9422   LearningRate 0.0235   Epoch: 14   Global Step: 152240   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:24:39,545-Speed 5446.29 samples/sec   Loss 2.9371   LearningRate 0.0235   Epoch: 14   Global Step: 152250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:24:47,031-Speed 5472.07 samples/sec   Loss 2.9435   LearningRate 0.0235   Epoch: 14   Global Step: 152260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:24:54,558-Speed 5442.40 samples/sec   Loss 2.9694   LearningRate 0.0235   Epoch: 14   Global Step: 152270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:25:02,154-Speed 5393.16 samples/sec   Loss 2.9316   LearningRate 0.0235   Epoch: 14   Global Step: 152280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:25:09,695-Speed 5432.21 samples/sec   Loss 2.9750   LearningRate 0.0235   Epoch: 14   Global Step: 152290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:25:17,279-Speed 5401.95 samples/sec   Loss 2.9355   LearningRate 0.0234   Epoch: 14   Global Step: 152300   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:25:24,931-Speed 5353.32 samples/sec   Loss 2.9483   LearningRate 0.0234   Epoch: 14   Global Step: 152310   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:25:32,479-Speed 5427.51 samples/sec   Loss 2.9544   LearningRate 0.0234   Epoch: 14   Global Step: 152320   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:25:40,011-Speed 5439.38 samples/sec   Loss 2.9623   LearningRate 0.0234   Epoch: 14   Global Step: 152330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:25:47,558-Speed 5427.66 samples/sec   Loss 2.9340   LearningRate 0.0234   Epoch: 14   Global Step: 152340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:25:55,082-Speed 5444.70 samples/sec   Loss 2.9222   LearningRate 0.0234   Epoch: 14   Global Step: 152350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:26:02,654-Speed 5410.17 samples/sec   Loss 2.9252   LearningRate 0.0234   Epoch: 14   Global Step: 152360   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:26:10,151-Speed 5464.30 samples/sec   Loss 2.9371   LearningRate 0.0234   Epoch: 14   Global Step: 152370   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:26:17,673-Speed 5445.98 samples/sec   Loss 2.9587   LearningRate 0.0234   Epoch: 14   Global Step: 152380   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:26:25,222-Speed 5426.69 samples/sec   Loss 2.9476   LearningRate 0.0234   Epoch: 14   Global Step: 152390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:26:32,813-Speed 5396.74 samples/sec   Loss 2.9145   LearningRate 0.0234   Epoch: 14   Global Step: 152400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:26:40,298-Speed 5473.13 samples/sec   Loss 2.9566   LearningRate 0.0234   Epoch: 14   Global Step: 152410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:26:47,761-Speed 5488.96 samples/sec   Loss 2.9141   LearningRate 0.0233   Epoch: 14   Global Step: 152420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:26:55,280-Speed 5448.04 samples/sec   Loss 2.9173   LearningRate 0.0233   Epoch: 14   Global Step: 152430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:27:02,841-Speed 5418.87 samples/sec   Loss 2.9279   LearningRate 0.0233   Epoch: 14   Global Step: 152440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:27:10,338-Speed 5463.87 samples/sec   Loss 2.9165   LearningRate 0.0233   Epoch: 14   Global Step: 152450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:27:17,861-Speed 5445.57 samples/sec   Loss 2.9549   LearningRate 0.0233   Epoch: 14   Global Step: 152460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:27:25,394-Speed 5437.65 samples/sec   Loss 2.9507   LearningRate 0.0233   Epoch: 14   Global Step: 152470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:27:32,850-Speed 5494.74 samples/sec   Loss 2.9760   LearningRate 0.0233   Epoch: 14   Global Step: 152480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:27:40,391-Speed 5432.35 samples/sec   Loss 2.9515   LearningRate 0.0233   Epoch: 14   Global Step: 152490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:27:47,893-Speed 5460.68 samples/sec   Loss 2.8951   LearningRate 0.0233   Epoch: 14   Global Step: 152500   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:27:55,430-Speed 5435.22 samples/sec   Loss 2.9736   LearningRate 0.0233   Epoch: 14   Global Step: 152510   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:28:02,947-Speed 5449.84 samples/sec   Loss 2.9894   LearningRate 0.0233   Epoch: 14   Global Step: 152520   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:28:10,462-Speed 5451.52 samples/sec   Loss 2.9166   LearningRate 0.0233   Epoch: 14   Global Step: 152530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:28:17,964-Speed 5460.38 samples/sec   Loss 2.9492   LearningRate 0.0232   Epoch: 14   Global Step: 152540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:28:25,514-Speed 5426.10 samples/sec   Loss 2.9664   LearningRate 0.0232   Epoch: 14   Global Step: 152550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:28:33,023-Speed 5455.65 samples/sec   Loss 2.9634   LearningRate 0.0232   Epoch: 14   Global Step: 152560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:28:40,547-Speed 5444.57 samples/sec   Loss 2.9737   LearningRate 0.0232   Epoch: 14   Global Step: 152570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:28:48,082-Speed 5436.87 samples/sec   Loss 2.9343   LearningRate 0.0232   Epoch: 14   Global Step: 152580   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:28:55,608-Speed 5442.98 samples/sec   Loss 2.9281   LearningRate 0.0232   Epoch: 14   Global Step: 152590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:29:03,179-Speed 5410.83 samples/sec   Loss 2.9034   LearningRate 0.0232   Epoch: 14   Global Step: 152600   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:29:10,658-Speed 5477.72 samples/sec   Loss 2.9590   LearningRate 0.0232   Epoch: 14   Global Step: 152610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:29:18,190-Speed 5439.09 samples/sec   Loss 2.8729   LearningRate 0.0232   Epoch: 14   Global Step: 152620   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:29:25,742-Speed 5424.00 samples/sec   Loss 2.9042   LearningRate 0.0232   Epoch: 14   Global Step: 152630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:29:33,297-Speed 5423.08 samples/sec   Loss 2.9228   LearningRate 0.0232   Epoch: 14   Global Step: 152640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:29:40,825-Speed 5441.35 samples/sec   Loss 2.9200   LearningRate 0.0232   Epoch: 14   Global Step: 152650   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:29:48,364-Speed 5434.19 samples/sec   Loss 2.9789   LearningRate 0.0231   Epoch: 14   Global Step: 152660   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:29:55,912-Speed 5426.92 samples/sec   Loss 2.9356   LearningRate 0.0231   Epoch: 14   Global Step: 152670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:30:03,440-Speed 5441.60 samples/sec   Loss 2.9338   LearningRate 0.0231   Epoch: 14   Global Step: 152680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:30:11,002-Speed 5417.79 samples/sec   Loss 2.9492   LearningRate 0.0231   Epoch: 14   Global Step: 152690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:30:18,609-Speed 5385.31 samples/sec   Loss 2.9344   LearningRate 0.0231   Epoch: 14   Global Step: 152700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:30:26,108-Speed 5462.45 samples/sec   Loss 2.8950   LearningRate 0.0231   Epoch: 14   Global Step: 152710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:30:33,624-Speed 5450.87 samples/sec   Loss 2.9508   LearningRate 0.0231   Epoch: 14   Global Step: 152720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:30:41,111-Speed 5471.49 samples/sec   Loss 2.9452   LearningRate 0.0231   Epoch: 14   Global Step: 152730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:30:48,585-Speed 5481.00 samples/sec   Loss 2.8945   LearningRate 0.0231   Epoch: 14   Global Step: 152740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:30:56,101-Speed 5450.63 samples/sec   Loss 2.9491   LearningRate 0.0231   Epoch: 14   Global Step: 152750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:31:03,621-Speed 5447.65 samples/sec   Loss 2.9419   LearningRate 0.0231   Epoch: 14   Global Step: 152760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:31:11,105-Speed 5474.31 samples/sec   Loss 2.9522   LearningRate 0.0231   Epoch: 14   Global Step: 152770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 05:31:18,738-Speed 5366.65 samples/sec   Loss 2.9568   LearningRate 0.0230   Epoch: 14   Global Step: 152780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:31:26,292-Speed 5422.86 samples/sec   Loss 2.9295   LearningRate 0.0230   Epoch: 14   Global Step: 152790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:31:33,858-Speed 5414.46 samples/sec   Loss 2.9189   LearningRate 0.0230   Epoch: 14   Global Step: 152800   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:31:41,401-Speed 5431.13 samples/sec   Loss 2.9651   LearningRate 0.0230   Epoch: 14   Global Step: 152810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 05:31:48,945-Speed 5430.54 samples/sec   Loss 2.9344   LearningRate 0.0230   Epoch: 14   Global Step: 152820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:31:56,432-Speed 5471.40 samples/sec   Loss 2.9057   LearningRate 0.0230   Epoch: 14   Global Step: 152830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:32:03,902-Speed 5484.28 samples/sec   Loss 2.8979   LearningRate 0.0230   Epoch: 14   Global Step: 152840   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-01-09 05:32:11,393-Speed 5468.38 samples/sec   Loss 2.9022   LearningRate 0.0230   Epoch: 14   Global Step: 152850   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-01-09 05:32:18,938-Speed 5429.77 samples/sec   Loss 2.9222   LearningRate 0.0230   Epoch: 14   Global Step: 152860   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-01-09 05:32:26,442-Speed 5458.83 samples/sec   Loss 2.9249   LearningRate 0.0230   Epoch: 14   Global Step: 152870   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-01-09 05:32:33,985-Speed 5430.64 samples/sec   Loss 2.9094   LearningRate 0.0230   Epoch: 14   Global Step: 152880   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-01-09 05:32:41,532-Speed 5428.55 samples/sec   Loss 2.8908   LearningRate 0.0229   Epoch: 14   Global Step: 152890   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-01-09 05:32:49,016-Speed 5473.44 samples/sec   Loss 2.8975   LearningRate 0.0229   Epoch: 14   Global Step: 152900   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-01-09 05:32:56,515-Speed 5462.83 samples/sec   Loss 2.9140   LearningRate 0.0229   Epoch: 14   Global Step: 152910   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-01-09 05:33:04,028-Speed 5452.38 samples/sec   Loss 2.8990   LearningRate 0.0229   Epoch: 14   Global Step: 152920   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-01-09 05:33:11,535-Speed 5457.03 samples/sec   Loss 2.8958   LearningRate 0.0229   Epoch: 14   Global Step: 152930   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-01-09 05:33:19,040-Speed 5458.32 samples/sec   Loss 2.9024   LearningRate 0.0229   Epoch: 14   Global Step: 152940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:33:26,639-Speed 5391.07 samples/sec   Loss 2.8938   LearningRate 0.0229   Epoch: 14   Global Step: 152950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:33:34,188-Speed 5427.07 samples/sec   Loss 2.9022   LearningRate 0.0229   Epoch: 14   Global Step: 152960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:33:41,778-Speed 5396.69 samples/sec   Loss 2.9196   LearningRate 0.0229   Epoch: 14   Global Step: 152970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:33:49,321-Speed 5431.33 samples/sec   Loss 2.9217   LearningRate 0.0229   Epoch: 14   Global Step: 152980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:33:56,868-Speed 5427.52 samples/sec   Loss 2.9191   LearningRate 0.0229   Epoch: 14   Global Step: 152990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:34:04,341-Speed 5482.06 samples/sec   Loss 2.9387   LearningRate 0.0229   Epoch: 14   Global Step: 153000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:34:11,859-Speed 5448.68 samples/sec   Loss 2.9211   LearningRate 0.0228   Epoch: 14   Global Step: 153010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:34:19,353-Speed 5466.59 samples/sec   Loss 2.8966   LearningRate 0.0228   Epoch: 14   Global Step: 153020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:34:26,863-Speed 5454.65 samples/sec   Loss 2.9017   LearningRate 0.0228   Epoch: 14   Global Step: 153030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:34:34,391-Speed 5441.89 samples/sec   Loss 2.9233   LearningRate 0.0228   Epoch: 14   Global Step: 153040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:34:41,977-Speed 5400.21 samples/sec   Loss 2.8807   LearningRate 0.0228   Epoch: 14   Global Step: 153050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:34:49,510-Speed 5437.82 samples/sec   Loss 2.9071   LearningRate 0.0228   Epoch: 14   Global Step: 153060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:34:57,092-Speed 5403.33 samples/sec   Loss 2.9220   LearningRate 0.0228   Epoch: 14   Global Step: 153070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:35:04,603-Speed 5453.48 samples/sec   Loss 2.9106   LearningRate 0.0228   Epoch: 14   Global Step: 153080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:35:12,128-Speed 5443.94 samples/sec   Loss 2.8937   LearningRate 0.0228   Epoch: 14   Global Step: 153090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:35:19,664-Speed 5436.30 samples/sec   Loss 2.9104   LearningRate 0.0228   Epoch: 14   Global Step: 153100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:35:27,164-Speed 5461.70 samples/sec   Loss 2.9035   LearningRate 0.0228   Epoch: 14   Global Step: 153110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:35:34,689-Speed 5444.23 samples/sec   Loss 2.9128   LearningRate 0.0228   Epoch: 14   Global Step: 153120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:35:42,194-Speed 5458.46 samples/sec   Loss 2.9422   LearningRate 0.0227   Epoch: 14   Global Step: 153130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:35:49,736-Speed 5431.69 samples/sec   Loss 2.9034   LearningRate 0.0227   Epoch: 14   Global Step: 153140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:35:57,198-Speed 5489.89 samples/sec   Loss 2.9051   LearningRate 0.0227   Epoch: 14   Global Step: 153150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:36:04,748-Speed 5425.39 samples/sec   Loss 2.9093   LearningRate 0.0227   Epoch: 14   Global Step: 153160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:36:12,289-Speed 5432.46 samples/sec   Loss 2.9141   LearningRate 0.0227   Epoch: 14   Global Step: 153170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:36:19,764-Speed 5480.50 samples/sec   Loss 2.9198   LearningRate 0.0227   Epoch: 14   Global Step: 153180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:36:27,292-Speed 5441.82 samples/sec   Loss 2.9158   LearningRate 0.0227   Epoch: 14   Global Step: 153190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:36:34,782-Speed 5468.97 samples/sec   Loss 2.9163   LearningRate 0.0227   Epoch: 14   Global Step: 153200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:36:42,241-Speed 5492.18 samples/sec   Loss 2.8914   LearningRate 0.0227   Epoch: 14   Global Step: 153210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:36:49,784-Speed 5430.51 samples/sec   Loss 2.8869   LearningRate 0.0227   Epoch: 14   Global Step: 153220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:36:57,334-Speed 5426.14 samples/sec   Loss 2.8720   LearningRate 0.0227   Epoch: 14   Global Step: 153230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:37:04,839-Speed 5458.63 samples/sec   Loss 2.8879   LearningRate 0.0227   Epoch: 14   Global Step: 153240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:37:12,367-Speed 5442.01 samples/sec   Loss 2.9316   LearningRate 0.0226   Epoch: 14   Global Step: 153250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:37:19,907-Speed 5433.03 samples/sec   Loss 2.9304   LearningRate 0.0226   Epoch: 14   Global Step: 153260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:37:27,452-Speed 5429.49 samples/sec   Loss 2.8974   LearningRate 0.0226   Epoch: 14   Global Step: 153270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:37:34,941-Speed 5469.80 samples/sec   Loss 2.9040   LearningRate 0.0226   Epoch: 14   Global Step: 153280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:37:42,438-Speed 5464.59 samples/sec   Loss 2.9188   LearningRate 0.0226   Epoch: 14   Global Step: 153290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:37:49,958-Speed 5447.91 samples/sec   Loss 2.9086   LearningRate 0.0226   Epoch: 14   Global Step: 153300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 05:37:57,466-Speed 5455.99 samples/sec   Loss 2.8969   LearningRate 0.0226   Epoch: 14   Global Step: 153310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:38:05,121-Speed 5351.54 samples/sec   Loss 2.9514   LearningRate 0.0226   Epoch: 14   Global Step: 153320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:38:12,635-Speed 5451.24 samples/sec   Loss 2.9442   LearningRate 0.0226   Epoch: 14   Global Step: 153330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:38:20,230-Speed 5394.15 samples/sec   Loss 2.9168   LearningRate 0.0226   Epoch: 14   Global Step: 153340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:38:27,904-Speed 5337.87 samples/sec   Loss 2.8776   LearningRate 0.0226   Epoch: 14   Global Step: 153350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:38:35,528-Speed 5373.13 samples/sec   Loss 2.9130   LearningRate 0.0226   Epoch: 14   Global Step: 153360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:38:43,074-Speed 5428.59 samples/sec   Loss 2.8781   LearningRate 0.0225   Epoch: 14   Global Step: 153370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:38:50,579-Speed 5458.81 samples/sec   Loss 2.8688   LearningRate 0.0225   Epoch: 14   Global Step: 153380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:38:58,117-Speed 5434.43 samples/sec   Loss 2.8961   LearningRate 0.0225   Epoch: 14   Global Step: 153390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:39:05,688-Speed 5410.59 samples/sec   Loss 2.9129   LearningRate 0.0225   Epoch: 14   Global Step: 153400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:39:13,219-Speed 5440.04 samples/sec   Loss 2.9045   LearningRate 0.0225   Epoch: 14   Global Step: 153410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:39:20,772-Speed 5423.67 samples/sec   Loss 2.8734   LearningRate 0.0225   Epoch: 14   Global Step: 153420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:39:28,271-Speed 5463.01 samples/sec   Loss 2.8881   LearningRate 0.0225   Epoch: 14   Global Step: 153430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:39:35,840-Speed 5412.17 samples/sec   Loss 2.8826   LearningRate 0.0225   Epoch: 14   Global Step: 153440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:39:43,359-Speed 5448.33 samples/sec   Loss 2.9064   LearningRate 0.0225   Epoch: 14   Global Step: 153450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:39:50,833-Speed 5481.08 samples/sec   Loss 2.9065   LearningRate 0.0225   Epoch: 14   Global Step: 153460   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:39:58,331-Speed 5463.63 samples/sec   Loss 2.8508   LearningRate 0.0225   Epoch: 14   Global Step: 153470   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:40:05,827-Speed 5464.93 samples/sec   Loss 2.8709   LearningRate 0.0225   Epoch: 14   Global Step: 153480   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:40:13,342-Speed 5451.36 samples/sec   Loss 2.8889   LearningRate 0.0224   Epoch: 14   Global Step: 153490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:40:20,815-Speed 5481.39 samples/sec   Loss 2.9027   LearningRate 0.0224   Epoch: 14   Global Step: 153500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:40:28,288-Speed 5482.14 samples/sec   Loss 2.9256   LearningRate 0.0224   Epoch: 14   Global Step: 153510   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:40:35,713-Speed 5516.63 samples/sec   Loss 2.8942   LearningRate 0.0224   Epoch: 14   Global Step: 153520   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:40:43,135-Speed 5519.63 samples/sec   Loss 2.8897   LearningRate 0.0224   Epoch: 14   Global Step: 153530   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:40:50,635-Speed 5462.85 samples/sec   Loss 2.8708   LearningRate 0.0224   Epoch: 14   Global Step: 153540   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:40:58,171-Speed 5435.54 samples/sec   Loss 2.9126   LearningRate 0.0224   Epoch: 14   Global Step: 153550   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:41:05,696-Speed 5443.92 samples/sec   Loss 2.8780   LearningRate 0.0224   Epoch: 14   Global Step: 153560   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:41:13,218-Speed 5445.80 samples/sec   Loss 2.9057   LearningRate 0.0224   Epoch: 14   Global Step: 153570   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:41:20,729-Speed 5454.88 samples/sec   Loss 2.8755   LearningRate 0.0224   Epoch: 14   Global Step: 153580   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:41:28,273-Speed 5429.82 samples/sec   Loss 2.8811   LearningRate 0.0224   Epoch: 14   Global Step: 153590   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:41:35,821-Speed 5427.38 samples/sec   Loss 2.8467   LearningRate 0.0224   Epoch: 14   Global Step: 153600   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:41:43,352-Speed 5439.61 samples/sec   Loss 2.8735   LearningRate 0.0223   Epoch: 14   Global Step: 153610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:41:50,911-Speed 5419.65 samples/sec   Loss 2.8758   LearningRate 0.0223   Epoch: 14   Global Step: 153620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:41:58,367-Speed 5493.99 samples/sec   Loss 2.8432   LearningRate 0.0223   Epoch: 14   Global Step: 153630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:42:05,875-Speed 5456.44 samples/sec   Loss 2.9041   LearningRate 0.0223   Epoch: 14   Global Step: 153640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:42:13,407-Speed 5438.73 samples/sec   Loss 2.8708   LearningRate 0.0223   Epoch: 14   Global Step: 153650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:42:20,954-Speed 5428.24 samples/sec   Loss 2.8918   LearningRate 0.0223   Epoch: 14   Global Step: 153660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:42:28,468-Speed 5452.29 samples/sec   Loss 2.8790   LearningRate 0.0223   Epoch: 14   Global Step: 153670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:42:35,926-Speed 5492.18 samples/sec   Loss 2.8829   LearningRate 0.0223   Epoch: 14   Global Step: 153680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:42:43,377-Speed 5498.50 samples/sec   Loss 2.8556   LearningRate 0.0223   Epoch: 14   Global Step: 153690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:42:50,868-Speed 5468.51 samples/sec   Loss 2.8761   LearningRate 0.0223   Epoch: 14   Global Step: 153700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:42:58,357-Speed 5470.07 samples/sec   Loss 2.8409   LearningRate 0.0223   Epoch: 14   Global Step: 153710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:43:05,788-Speed 5512.54 samples/sec   Loss 2.8468   LearningRate 0.0223   Epoch: 14   Global Step: 153720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:43:13,335-Speed 5428.44 samples/sec   Loss 2.8817   LearningRate 0.0222   Epoch: 14   Global Step: 153730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:43:20,788-Speed 5496.20 samples/sec   Loss 2.8787   LearningRate 0.0222   Epoch: 14   Global Step: 153740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:43:28,293-Speed 5458.35 samples/sec   Loss 2.8934   LearningRate 0.0222   Epoch: 14   Global Step: 153750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:43:35,823-Speed 5440.96 samples/sec   Loss 2.8502   LearningRate 0.0222   Epoch: 14   Global Step: 153760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:43:43,349-Speed 5442.53 samples/sec   Loss 2.8695   LearningRate 0.0222   Epoch: 14   Global Step: 153770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:43:50,800-Speed 5498.57 samples/sec   Loss 2.8936   LearningRate 0.0222   Epoch: 14   Global Step: 153780   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:43:58,309-Speed 5455.64 samples/sec   Loss 2.8841   LearningRate 0.0222   Epoch: 14   Global Step: 153790   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:44:05,870-Speed 5417.39 samples/sec   Loss 2.8711   LearningRate 0.0222   Epoch: 14   Global Step: 153800   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:44:13,372-Speed 5460.66 samples/sec   Loss 2.8845   LearningRate 0.0222   Epoch: 14   Global Step: 153810   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:44:20,833-Speed 5490.77 samples/sec   Loss 2.8516   LearningRate 0.0222   Epoch: 14   Global Step: 153820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:44:28,319-Speed 5472.49 samples/sec   Loss 2.9002   LearningRate 0.0222   Epoch: 14   Global Step: 153830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:44:35,772-Speed 5496.43 samples/sec   Loss 2.8905   LearningRate 0.0222   Epoch: 14   Global Step: 153840   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:44:43,222-Speed 5498.66 samples/sec   Loss 2.8568   LearningRate 0.0221   Epoch: 14   Global Step: 153850   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:44:50,756-Speed 5438.01 samples/sec   Loss 2.9019   LearningRate 0.0221   Epoch: 14   Global Step: 153860   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:44:58,246-Speed 5469.61 samples/sec   Loss 2.8809   LearningRate 0.0221   Epoch: 14   Global Step: 153870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:45:05,807-Speed 5417.86 samples/sec   Loss 2.8422   LearningRate 0.0221   Epoch: 14   Global Step: 153880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:45:13,301-Speed 5466.12 samples/sec   Loss 2.8451   LearningRate 0.0221   Epoch: 14   Global Step: 153890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:45:20,802-Speed 5461.25 samples/sec   Loss 2.8952   LearningRate 0.0221   Epoch: 14   Global Step: 153900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:45:28,339-Speed 5435.74 samples/sec   Loss 2.8473   LearningRate 0.0221   Epoch: 14   Global Step: 153910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:45:35,887-Speed 5426.92 samples/sec   Loss 2.8918   LearningRate 0.0221   Epoch: 14   Global Step: 153920   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:45:43,467-Speed 5404.51 samples/sec   Loss 2.8439   LearningRate 0.0221   Epoch: 14   Global Step: 153930   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:45:51,159-Speed 5325.85 samples/sec   Loss 2.8845   LearningRate 0.0221   Epoch: 14   Global Step: 153940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:45:58,692-Speed 5438.23 samples/sec   Loss 2.8827   LearningRate 0.0221   Epoch: 14   Global Step: 153950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:46:06,288-Speed 5392.94 samples/sec   Loss 2.8644   LearningRate 0.0221   Epoch: 14   Global Step: 153960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:46:13,805-Speed 5449.99 samples/sec   Loss 2.8564   LearningRate 0.0220   Epoch: 14   Global Step: 153970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:46:21,301-Speed 5464.76 samples/sec   Loss 2.8887   LearningRate 0.0220   Epoch: 14   Global Step: 153980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:46:28,777-Speed 5479.52 samples/sec   Loss 2.8717   LearningRate 0.0220   Epoch: 14   Global Step: 153990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:46:36,322-Speed 5429.94 samples/sec   Loss 2.8760   LearningRate 0.0220   Epoch: 14   Global Step: 154000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:47:20,308-[lfw][154000]XNorm: 22.161902
Training: 2022-01-09 05:47:20,309-[lfw][154000]Accuracy-Flip: 0.99800+-0.00221
Training: 2022-01-09 05:47:20,310-[lfw][154000]Accuracy-Highest: 0.99817
Training: 2022-01-09 05:48:11,597-[cfp_fp][154000]XNorm: 21.247886
Training: 2022-01-09 05:48:11,598-[cfp_fp][154000]Accuracy-Flip: 0.99214+-0.00453
Training: 2022-01-09 05:48:11,599-[cfp_fp][154000]Accuracy-Highest: 0.99314
Training: 2022-01-09 05:48:55,760-[agedb_30][154000]XNorm: 22.316006
Training: 2022-01-09 05:48:55,761-[agedb_30][154000]Accuracy-Flip: 0.98217+-0.00615
Training: 2022-01-09 05:48:55,761-[agedb_30][154000]Accuracy-Highest: 0.98217
Training: 2022-01-09 05:49:03,445-Speed 278.41 samples/sec   Loss 2.8560   LearningRate 0.0220   Epoch: 14   Global Step: 154010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:49:10,981-Speed 5435.79 samples/sec   Loss 2.8659   LearningRate 0.0220   Epoch: 14   Global Step: 154020   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:49:18,521-Speed 5433.07 samples/sec   Loss 2.8748   LearningRate 0.0220   Epoch: 14   Global Step: 154030   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:49:26,033-Speed 5453.49 samples/sec   Loss 2.8628   LearningRate 0.0220   Epoch: 14   Global Step: 154040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:49:33,539-Speed 5457.61 samples/sec   Loss 2.8550   LearningRate 0.0220   Epoch: 14   Global Step: 154050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:49:41,070-Speed 5439.40 samples/sec   Loss 2.8940   LearningRate 0.0220   Epoch: 14   Global Step: 154060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:49:48,582-Speed 5453.49 samples/sec   Loss 2.8705   LearningRate 0.0220   Epoch: 14   Global Step: 154070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:49:56,149-Speed 5413.98 samples/sec   Loss 2.9256   LearningRate 0.0220   Epoch: 14   Global Step: 154080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:50:03,747-Speed 5391.54 samples/sec   Loss 2.8721   LearningRate 0.0219   Epoch: 14   Global Step: 154090   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-01-09 05:50:11,302-Speed 5422.64 samples/sec   Loss 2.8669   LearningRate 0.0219   Epoch: 14   Global Step: 154100   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-01-09 05:50:18,854-Speed 5424.06 samples/sec   Loss 2.8243   LearningRate 0.0219   Epoch: 14   Global Step: 154110   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-01-09 05:50:26,388-Speed 5437.23 samples/sec   Loss 2.8978   LearningRate 0.0219   Epoch: 14   Global Step: 154120   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-01-09 05:50:33,932-Speed 5431.21 samples/sec   Loss 2.8100   LearningRate 0.0219   Epoch: 14   Global Step: 154130   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-01-09 05:50:41,530-Speed 5391.32 samples/sec   Loss 2.8514   LearningRate 0.0219   Epoch: 14   Global Step: 154140   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-01-09 05:50:49,048-Speed 5449.15 samples/sec   Loss 2.8886   LearningRate 0.0219   Epoch: 14   Global Step: 154150   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-01-09 05:50:56,580-Speed 5438.76 samples/sec   Loss 2.8696   LearningRate 0.0219   Epoch: 14   Global Step: 154160   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-01-09 05:51:04,226-Speed 5358.12 samples/sec   Loss 2.8113   LearningRate 0.0219   Epoch: 14   Global Step: 154170   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-01-09 05:51:11,748-Speed 5445.86 samples/sec   Loss 2.8687   LearningRate 0.0219   Epoch: 14   Global Step: 154180   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-01-09 05:51:19,270-Speed 5445.84 samples/sec   Loss 2.8659   LearningRate 0.0219   Epoch: 14   Global Step: 154190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:51:26,822-Speed 5424.97 samples/sec   Loss 2.8464   LearningRate 0.0219   Epoch: 14   Global Step: 154200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:51:34,355-Speed 5438.59 samples/sec   Loss 2.8447   LearningRate 0.0219   Epoch: 14   Global Step: 154210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:51:41,900-Speed 5429.03 samples/sec   Loss 2.8125   LearningRate 0.0218   Epoch: 14   Global Step: 154220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:51:49,487-Speed 5399.44 samples/sec   Loss 2.8013   LearningRate 0.0218   Epoch: 14   Global Step: 154230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:51:57,071-Speed 5401.98 samples/sec   Loss 2.8354   LearningRate 0.0218   Epoch: 14   Global Step: 154240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:52:04,562-Speed 5468.34 samples/sec   Loss 2.8385   LearningRate 0.0218   Epoch: 14   Global Step: 154250   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:52:12,048-Speed 5471.91 samples/sec   Loss 2.8500   LearningRate 0.0218   Epoch: 14   Global Step: 154260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:52:19,613-Speed 5415.25 samples/sec   Loss 2.8642   LearningRate 0.0218   Epoch: 14   Global Step: 154270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:52:27,166-Speed 5423.94 samples/sec   Loss 2.8282   LearningRate 0.0218   Epoch: 14   Global Step: 154280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:52:34,651-Speed 5473.74 samples/sec   Loss 2.8697   LearningRate 0.0218   Epoch: 14   Global Step: 154290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:52:42,167-Speed 5450.60 samples/sec   Loss 2.8311   LearningRate 0.0218   Epoch: 14   Global Step: 154300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:52:49,699-Speed 5438.59 samples/sec   Loss 2.8528   LearningRate 0.0218   Epoch: 14   Global Step: 154310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:52:57,219-Speed 5447.45 samples/sec   Loss 2.8326   LearningRate 0.0218   Epoch: 14   Global Step: 154320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:53:04,717-Speed 5463.49 samples/sec   Loss 2.8494   LearningRate 0.0218   Epoch: 14   Global Step: 154330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:53:12,260-Speed 5430.67 samples/sec   Loss 2.8482   LearningRate 0.0217   Epoch: 14   Global Step: 154340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:53:19,778-Speed 5448.91 samples/sec   Loss 2.8323   LearningRate 0.0217   Epoch: 14   Global Step: 154350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:53:27,294-Speed 5450.77 samples/sec   Loss 2.8024   LearningRate 0.0217   Epoch: 14   Global Step: 154360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:53:34,934-Speed 5361.51 samples/sec   Loss 2.8340   LearningRate 0.0217   Epoch: 14   Global Step: 154370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:53:42,489-Speed 5422.58 samples/sec   Loss 2.8198   LearningRate 0.0217   Epoch: 14   Global Step: 154380   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:53:50,081-Speed 5395.56 samples/sec   Loss 2.8625   LearningRate 0.0217   Epoch: 14   Global Step: 154390   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:53:57,649-Speed 5412.67 samples/sec   Loss 2.8101   LearningRate 0.0217   Epoch: 14   Global Step: 154400   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:54:05,192-Speed 5431.66 samples/sec   Loss 2.7884   LearningRate 0.0217   Epoch: 14   Global Step: 154410   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:54:12,836-Speed 5359.13 samples/sec   Loss 2.8522   LearningRate 0.0217   Epoch: 14   Global Step: 154420   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:54:20,388-Speed 5423.62 samples/sec   Loss 2.8299   LearningRate 0.0217   Epoch: 14   Global Step: 154430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:54:27,945-Speed 5420.97 samples/sec   Loss 2.8686   LearningRate 0.0217   Epoch: 14   Global Step: 154440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:54:35,518-Speed 5409.64 samples/sec   Loss 2.8100   LearningRate 0.0217   Epoch: 14   Global Step: 154450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:54:43,082-Speed 5416.34 samples/sec   Loss 2.8510   LearningRate 0.0216   Epoch: 14   Global Step: 154460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:54:50,728-Speed 5357.11 samples/sec   Loss 2.8434   LearningRate 0.0216   Epoch: 14   Global Step: 154470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:54:58,302-Speed 5409.30 samples/sec   Loss 2.8526   LearningRate 0.0216   Epoch: 14   Global Step: 154480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:55:05,930-Speed 5370.08 samples/sec   Loss 2.8715   LearningRate 0.0216   Epoch: 14   Global Step: 154490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:55:13,481-Speed 5425.46 samples/sec   Loss 2.8417   LearningRate 0.0216   Epoch: 14   Global Step: 154500   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:55:21,049-Speed 5413.10 samples/sec   Loss 2.8512   LearningRate 0.0216   Epoch: 14   Global Step: 154510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:55:28,638-Speed 5397.74 samples/sec   Loss 2.8562   LearningRate 0.0216   Epoch: 14   Global Step: 154520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:55:36,227-Speed 5398.34 samples/sec   Loss 2.8624   LearningRate 0.0216   Epoch: 14   Global Step: 154530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:55:43,789-Speed 5417.79 samples/sec   Loss 2.8197   LearningRate 0.0216   Epoch: 14   Global Step: 154540   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:55:51,340-Speed 5424.48 samples/sec   Loss 2.8361   LearningRate 0.0216   Epoch: 14   Global Step: 154550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:55:58,905-Speed 5414.95 samples/sec   Loss 2.8227   LearningRate 0.0216   Epoch: 14   Global Step: 154560   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:56:06,505-Speed 5390.69 samples/sec   Loss 2.8198   LearningRate 0.0216   Epoch: 14   Global Step: 154570   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:56:14,048-Speed 5431.16 samples/sec   Loss 2.8375   LearningRate 0.0215   Epoch: 14   Global Step: 154580   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:56:21,579-Speed 5439.35 samples/sec   Loss 2.8283   LearningRate 0.0215   Epoch: 14   Global Step: 154590   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:56:29,136-Speed 5421.00 samples/sec   Loss 2.8495   LearningRate 0.0215   Epoch: 14   Global Step: 154600   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:56:36,706-Speed 5411.43 samples/sec   Loss 2.8051   LearningRate 0.0215   Epoch: 14   Global Step: 154610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:56:44,276-Speed 5412.02 samples/sec   Loss 2.8378   LearningRate 0.0215   Epoch: 14   Global Step: 154620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:56:51,923-Speed 5357.04 samples/sec   Loss 2.8252   LearningRate 0.0215   Epoch: 14   Global Step: 154630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:56:59,575-Speed 5353.45 samples/sec   Loss 2.7969   LearningRate 0.0215   Epoch: 14   Global Step: 154640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:57:07,245-Speed 5341.12 samples/sec   Loss 2.8245   LearningRate 0.0215   Epoch: 14   Global Step: 154650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:57:14,828-Speed 5402.36 samples/sec   Loss 2.8400   LearningRate 0.0215   Epoch: 14   Global Step: 154660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 05:57:22,537-Speed 5313.46 samples/sec   Loss 2.8255   LearningRate 0.0215   Epoch: 14   Global Step: 154670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:57:30,038-Speed 5461.19 samples/sec   Loss 2.8514   LearningRate 0.0215   Epoch: 14   Global Step: 154680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:57:37,563-Speed 5444.28 samples/sec   Loss 2.8358   LearningRate 0.0215   Epoch: 14   Global Step: 154690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:57:45,237-Speed 5338.84 samples/sec   Loss 2.8481   LearningRate 0.0215   Epoch: 14   Global Step: 154700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:57:52,799-Speed 5416.67 samples/sec   Loss 2.8418   LearningRate 0.0214   Epoch: 14   Global Step: 154710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:58:00,351-Speed 5423.99 samples/sec   Loss 2.8391   LearningRate 0.0214   Epoch: 14   Global Step: 154720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:58:07,892-Speed 5432.62 samples/sec   Loss 2.8223   LearningRate 0.0214   Epoch: 14   Global Step: 154730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:58:15,480-Speed 5398.96 samples/sec   Loss 2.7836   LearningRate 0.0214   Epoch: 14   Global Step: 154740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:58:23,098-Speed 5377.38 samples/sec   Loss 2.8131   LearningRate 0.0214   Epoch: 14   Global Step: 154750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:58:30,592-Speed 5466.38 samples/sec   Loss 2.8400   LearningRate 0.0214   Epoch: 14   Global Step: 154760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:58:38,216-Speed 5372.96 samples/sec   Loss 2.8037   LearningRate 0.0214   Epoch: 14   Global Step: 154770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:58:45,731-Speed 5451.70 samples/sec   Loss 2.8113   LearningRate 0.0214   Epoch: 14   Global Step: 154780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:58:53,306-Speed 5407.77 samples/sec   Loss 2.7919   LearningRate 0.0214   Epoch: 14   Global Step: 154790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:59:00,839-Speed 5437.71 samples/sec   Loss 2.8038   LearningRate 0.0214   Epoch: 14   Global Step: 154800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:59:08,448-Speed 5384.34 samples/sec   Loss 2.8286   LearningRate 0.0214   Epoch: 14   Global Step: 154810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:59:16,067-Speed 5376.47 samples/sec   Loss 2.8365   LearningRate 0.0214   Epoch: 14   Global Step: 154820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 05:59:23,578-Speed 5453.69 samples/sec   Loss 2.8272   LearningRate 0.0213   Epoch: 14   Global Step: 154830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:59:31,215-Speed 5364.21 samples/sec   Loss 2.7995   LearningRate 0.0213   Epoch: 14   Global Step: 154840   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:59:38,841-Speed 5371.89 samples/sec   Loss 2.7699   LearningRate 0.0213   Epoch: 14   Global Step: 154850   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:59:46,348-Speed 5457.24 samples/sec   Loss 2.8213   LearningRate 0.0213   Epoch: 14   Global Step: 154860   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 05:59:53,915-Speed 5413.68 samples/sec   Loss 2.8472   LearningRate 0.0213   Epoch: 14   Global Step: 154870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:00:01,497-Speed 5403.17 samples/sec   Loss 2.8213   LearningRate 0.0213   Epoch: 14   Global Step: 154880   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:00:09,067-Speed 5411.13 samples/sec   Loss 2.8441   LearningRate 0.0213   Epoch: 14   Global Step: 154890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:00:16,629-Speed 5417.61 samples/sec   Loss 2.7874   LearningRate 0.0213   Epoch: 14   Global Step: 154900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:00:24,208-Speed 5405.43 samples/sec   Loss 2.8521   LearningRate 0.0213   Epoch: 14   Global Step: 154910   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:00:31,730-Speed 5446.08 samples/sec   Loss 2.8405   LearningRate 0.0213   Epoch: 14   Global Step: 154920   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:00:39,369-Speed 5362.40 samples/sec   Loss 2.8112   LearningRate 0.0213   Epoch: 14   Global Step: 154930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:00:46,925-Speed 5422.01 samples/sec   Loss 2.8315   LearningRate 0.0213   Epoch: 14   Global Step: 154940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:00:54,481-Speed 5421.79 samples/sec   Loss 2.7878   LearningRate 0.0212   Epoch: 14   Global Step: 154950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:01:02,053-Speed 5409.82 samples/sec   Loss 2.8191   LearningRate 0.0212   Epoch: 14   Global Step: 154960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:01:09,706-Speed 5352.71 samples/sec   Loss 2.7607   LearningRate 0.0212   Epoch: 14   Global Step: 154970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:01:17,390-Speed 5331.53 samples/sec   Loss 2.8218   LearningRate 0.0212   Epoch: 14   Global Step: 154980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:01:24,943-Speed 5424.05 samples/sec   Loss 2.8284   LearningRate 0.0212   Epoch: 14   Global Step: 154990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:01:32,484-Speed 5431.94 samples/sec   Loss 2.8273   LearningRate 0.0212   Epoch: 14   Global Step: 155000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:01:40,015-Speed 5439.52 samples/sec   Loss 2.8137   LearningRate 0.0212   Epoch: 14   Global Step: 155010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:01:47,538-Speed 5445.35 samples/sec   Loss 2.8046   LearningRate 0.0212   Epoch: 14   Global Step: 155020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:01:55,246-Speed 5314.67 samples/sec   Loss 2.8317   LearningRate 0.0212   Epoch: 14   Global Step: 155030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:02:02,741-Speed 5465.49 samples/sec   Loss 2.8120   LearningRate 0.0212   Epoch: 14   Global Step: 155040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:02:10,262-Speed 5447.12 samples/sec   Loss 2.8298   LearningRate 0.0212   Epoch: 14   Global Step: 155050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:02:17,862-Speed 5389.90 samples/sec   Loss 2.8095   LearningRate 0.0212   Epoch: 14   Global Step: 155060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:02:25,422-Speed 5419.09 samples/sec   Loss 2.8153   LearningRate 0.0211   Epoch: 14   Global Step: 155070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:02:32,972-Speed 5425.74 samples/sec   Loss 2.8107   LearningRate 0.0211   Epoch: 14   Global Step: 155080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:02:40,507-Speed 5436.76 samples/sec   Loss 2.7785   LearningRate 0.0211   Epoch: 14   Global Step: 155090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:02:48,031-Speed 5444.71 samples/sec   Loss 2.7922   LearningRate 0.0211   Epoch: 14   Global Step: 155100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:02:55,570-Speed 5433.39 samples/sec   Loss 2.8293   LearningRate 0.0211   Epoch: 14   Global Step: 155110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:03:03,144-Speed 5408.80 samples/sec   Loss 2.8340   LearningRate 0.0211   Epoch: 14   Global Step: 155120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:03:10,678-Speed 5437.31 samples/sec   Loss 2.8214   LearningRate 0.0211   Epoch: 14   Global Step: 155130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:03:18,199-Speed 5446.73 samples/sec   Loss 2.7749   LearningRate 0.0211   Epoch: 14   Global Step: 155140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:03:25,785-Speed 5400.35 samples/sec   Loss 2.7814   LearningRate 0.0211   Epoch: 14   Global Step: 155150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:03:33,268-Speed 5474.75 samples/sec   Loss 2.7925   LearningRate 0.0211   Epoch: 14   Global Step: 155160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:03:40,835-Speed 5413.08 samples/sec   Loss 2.7873   LearningRate 0.0211   Epoch: 14   Global Step: 155170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:03:48,407-Speed 5410.96 samples/sec   Loss 2.8140   LearningRate 0.0211   Epoch: 14   Global Step: 155180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:03:55,982-Speed 5408.47 samples/sec   Loss 2.7842   LearningRate 0.0211   Epoch: 14   Global Step: 155190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:04:03,517-Speed 5436.14 samples/sec   Loss 2.7955   LearningRate 0.0210   Epoch: 14   Global Step: 155200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:04:11,033-Speed 5451.08 samples/sec   Loss 2.8157   LearningRate 0.0210   Epoch: 14   Global Step: 155210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:04:18,600-Speed 5413.54 samples/sec   Loss 2.8542   LearningRate 0.0210   Epoch: 14   Global Step: 155220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:04:26,203-Speed 5387.99 samples/sec   Loss 2.8024   LearningRate 0.0210   Epoch: 14   Global Step: 155230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:04:33,714-Speed 5454.00 samples/sec   Loss 2.8086   LearningRate 0.0210   Epoch: 14   Global Step: 155240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:04:41,365-Speed 5354.61 samples/sec   Loss 2.7545   LearningRate 0.0210   Epoch: 14   Global Step: 155250   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:04:48,906-Speed 5432.50 samples/sec   Loss 2.7718   LearningRate 0.0210   Epoch: 14   Global Step: 155260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:04:56,491-Speed 5400.59 samples/sec   Loss 2.7957   LearningRate 0.0210   Epoch: 14   Global Step: 155270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:05:04,024-Speed 5438.05 samples/sec   Loss 2.8094   LearningRate 0.0210   Epoch: 14   Global Step: 155280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:05:11,652-Speed 5370.51 samples/sec   Loss 2.8322   LearningRate 0.0210   Epoch: 14   Global Step: 155290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:05:19,272-Speed 5376.53 samples/sec   Loss 2.7873   LearningRate 0.0210   Epoch: 14   Global Step: 155300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:05:26,778-Speed 5457.23 samples/sec   Loss 2.8097   LearningRate 0.0210   Epoch: 14   Global Step: 155310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:05:34,331-Speed 5423.57 samples/sec   Loss 2.7812   LearningRate 0.0209   Epoch: 14   Global Step: 155320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:05:41,814-Speed 5474.67 samples/sec   Loss 2.8027   LearningRate 0.0209   Epoch: 14   Global Step: 155330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:05:49,373-Speed 5419.56 samples/sec   Loss 2.7989   LearningRate 0.0209   Epoch: 14   Global Step: 155340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:05:56,930-Speed 5420.76 samples/sec   Loss 2.7845   LearningRate 0.0209   Epoch: 14   Global Step: 155350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:06:04,462-Speed 5438.88 samples/sec   Loss 2.7883   LearningRate 0.0209   Epoch: 14   Global Step: 155360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:06:11,979-Speed 5449.17 samples/sec   Loss 2.7768   LearningRate 0.0209   Epoch: 14   Global Step: 155370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:06:19,560-Speed 5404.01 samples/sec   Loss 2.8180   LearningRate 0.0209   Epoch: 14   Global Step: 155380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:06:27,096-Speed 5436.34 samples/sec   Loss 2.7856   LearningRate 0.0209   Epoch: 14   Global Step: 155390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:06:34,561-Speed 5487.08 samples/sec   Loss 2.8145   LearningRate 0.0209   Epoch: 14   Global Step: 155400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:06:42,131-Speed 5411.40 samples/sec   Loss 2.7863   LearningRate 0.0209   Epoch: 14   Global Step: 155410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:06:49,648-Speed 5450.47 samples/sec   Loss 2.7581   LearningRate 0.0209   Epoch: 14   Global Step: 155420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:06:57,180-Speed 5438.71 samples/sec   Loss 2.7673   LearningRate 0.0209   Epoch: 14   Global Step: 155430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:07:04,780-Speed 5389.73 samples/sec   Loss 2.7376   LearningRate 0.0209   Epoch: 14   Global Step: 155440   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:07:12,368-Speed 5399.04 samples/sec   Loss 2.8083   LearningRate 0.0208   Epoch: 14   Global Step: 155450   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:07:19,876-Speed 5456.35 samples/sec   Loss 2.7851   LearningRate 0.0208   Epoch: 14   Global Step: 155460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:07:27,451-Speed 5407.38 samples/sec   Loss 2.8199   LearningRate 0.0208   Epoch: 14   Global Step: 155470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:07:35,040-Speed 5397.98 samples/sec   Loss 2.7929   LearningRate 0.0208   Epoch: 14   Global Step: 155480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:07:42,578-Speed 5434.41 samples/sec   Loss 2.7954   LearningRate 0.0208   Epoch: 14   Global Step: 155490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:07:50,193-Speed 5380.29 samples/sec   Loss 2.7978   LearningRate 0.0208   Epoch: 14   Global Step: 155500   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:07:57,855-Speed 5346.09 samples/sec   Loss 2.8038   LearningRate 0.0208   Epoch: 14   Global Step: 155510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:08:05,377-Speed 5446.10 samples/sec   Loss 2.8216   LearningRate 0.0208   Epoch: 14   Global Step: 155520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:08:12,848-Speed 5483.71 samples/sec   Loss 2.7433   LearningRate 0.0208   Epoch: 14   Global Step: 155530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:08:20,361-Speed 5452.53 samples/sec   Loss 2.7725   LearningRate 0.0208   Epoch: 14   Global Step: 155540   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:08:43,333-Speed 1783.18 samples/sec   Loss 2.7553   LearningRate 0.0208   Epoch: 15   Global Step: 155550   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:08:50,789-Speed 5493.65 samples/sec   Loss 2.7819   LearningRate 0.0208   Epoch: 15   Global Step: 155560   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:08:58,348-Speed 5419.82 samples/sec   Loss 2.7993   LearningRate 0.0207   Epoch: 15   Global Step: 155570   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:09:05,832-Speed 5474.03 samples/sec   Loss 2.7789   LearningRate 0.0207   Epoch: 15   Global Step: 155580   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:09:13,399-Speed 5413.20 samples/sec   Loss 2.7597   LearningRate 0.0207   Epoch: 15   Global Step: 155590   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:09:20,817-Speed 5522.88 samples/sec   Loss 2.7940   LearningRate 0.0207   Epoch: 15   Global Step: 155600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:09:28,317-Speed 5462.06 samples/sec   Loss 2.7686   LearningRate 0.0207   Epoch: 15   Global Step: 155610   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:09:35,788-Speed 5483.24 samples/sec   Loss 2.7728   LearningRate 0.0207   Epoch: 15   Global Step: 155620   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:09:43,250-Speed 5489.79 samples/sec   Loss 2.7965   LearningRate 0.0207   Epoch: 15   Global Step: 155630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:09:50,727-Speed 5479.50 samples/sec   Loss 2.7348   LearningRate 0.0207   Epoch: 15   Global Step: 155640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:09:58,211-Speed 5473.72 samples/sec   Loss 2.7638   LearningRate 0.0207   Epoch: 15   Global Step: 155650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:10:05,643-Speed 5511.61 samples/sec   Loss 2.7533   LearningRate 0.0207   Epoch: 15   Global Step: 155660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:10:13,065-Speed 5519.84 samples/sec   Loss 2.7431   LearningRate 0.0207   Epoch: 15   Global Step: 155670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:10:20,563-Speed 5463.41 samples/sec   Loss 2.7516   LearningRate 0.0207   Epoch: 15   Global Step: 155680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:10:28,277-Speed 5310.93 samples/sec   Loss 2.7894   LearningRate 0.0207   Epoch: 15   Global Step: 155690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:10:35,912-Speed 5365.39 samples/sec   Loss 2.7837   LearningRate 0.0206   Epoch: 15   Global Step: 155700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:10:43,589-Speed 5336.18 samples/sec   Loss 2.7722   LearningRate 0.0206   Epoch: 15   Global Step: 155710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:10:51,284-Speed 5323.78 samples/sec   Loss 2.7367   LearningRate 0.0206   Epoch: 15   Global Step: 155720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:10:58,983-Speed 5320.44 samples/sec   Loss 2.7543   LearningRate 0.0206   Epoch: 15   Global Step: 155730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:11:06,660-Speed 5336.68 samples/sec   Loss 2.7363   LearningRate 0.0206   Epoch: 15   Global Step: 155740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:11:14,349-Speed 5327.99 samples/sec   Loss 2.7522   LearningRate 0.0206   Epoch: 15   Global Step: 155750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:11:21,993-Speed 5359.27 samples/sec   Loss 2.7669   LearningRate 0.0206   Epoch: 15   Global Step: 155760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:11:29,642-Speed 5355.15 samples/sec   Loss 2.7644   LearningRate 0.0206   Epoch: 15   Global Step: 155770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:11:37,316-Speed 5338.18 samples/sec   Loss 2.7420   LearningRate 0.0206   Epoch: 15   Global Step: 155780   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:11:44,876-Speed 5418.86 samples/sec   Loss 2.7554   LearningRate 0.0206   Epoch: 15   Global Step: 155790   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:11:52,411-Speed 5436.44 samples/sec   Loss 2.7313   LearningRate 0.0206   Epoch: 15   Global Step: 155800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:11:59,910-Speed 5463.41 samples/sec   Loss 2.7181   LearningRate 0.0206   Epoch: 15   Global Step: 155810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:12:07,398-Speed 5470.70 samples/sec   Loss 2.7464   LearningRate 0.0205   Epoch: 15   Global Step: 155820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:12:14,898-Speed 5462.02 samples/sec   Loss 2.8014   LearningRate 0.0205   Epoch: 15   Global Step: 155830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:12:22,305-Speed 5530.47 samples/sec   Loss 2.7304   LearningRate 0.0205   Epoch: 15   Global Step: 155840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:12:29,749-Speed 5503.00 samples/sec   Loss 2.7878   LearningRate 0.0205   Epoch: 15   Global Step: 155850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:12:37,221-Speed 5482.58 samples/sec   Loss 2.7332   LearningRate 0.0205   Epoch: 15   Global Step: 155860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:12:44,715-Speed 5466.07 samples/sec   Loss 2.7669   LearningRate 0.0205   Epoch: 15   Global Step: 155870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:12:52,238-Speed 5445.78 samples/sec   Loss 2.7746   LearningRate 0.0205   Epoch: 15   Global Step: 155880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:12:59,678-Speed 5506.08 samples/sec   Loss 2.7324   LearningRate 0.0205   Epoch: 15   Global Step: 155890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:13:07,222-Speed 5429.97 samples/sec   Loss 2.7346   LearningRate 0.0205   Epoch: 15   Global Step: 155900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:13:14,721-Speed 5462.68 samples/sec   Loss 2.7598   LearningRate 0.0205   Epoch: 15   Global Step: 155910   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:13:22,233-Speed 5453.57 samples/sec   Loss 2.7314   LearningRate 0.0205   Epoch: 15   Global Step: 155920   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:13:29,703-Speed 5483.63 samples/sec   Loss 2.7623   LearningRate 0.0205   Epoch: 15   Global Step: 155930   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:13:37,221-Speed 5449.08 samples/sec   Loss 2.7830   LearningRate 0.0205   Epoch: 15   Global Step: 155940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:13:44,725-Speed 5459.14 samples/sec   Loss 2.7105   LearningRate 0.0204   Epoch: 15   Global Step: 155950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:13:52,174-Speed 5499.66 samples/sec   Loss 2.7639   LearningRate 0.0204   Epoch: 15   Global Step: 155960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:13:59,731-Speed 5420.74 samples/sec   Loss 2.7218   LearningRate 0.0204   Epoch: 15   Global Step: 155970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:14:07,410-Speed 5334.84 samples/sec   Loss 2.7481   LearningRate 0.0204   Epoch: 15   Global Step: 155980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:14:14,966-Speed 5421.44 samples/sec   Loss 2.7274   LearningRate 0.0204   Epoch: 15   Global Step: 155990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:14:22,539-Speed 5410.01 samples/sec   Loss 2.7169   LearningRate 0.0204   Epoch: 15   Global Step: 156000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:15:12,466-[lfw][156000]XNorm: 23.614685
Training: 2022-01-09 06:15:12,467-[lfw][156000]Accuracy-Flip: 0.99817+-0.00229
Training: 2022-01-09 06:15:12,467-[lfw][156000]Accuracy-Highest: 0.99817
Training: 2022-01-09 06:16:05,225-[cfp_fp][156000]XNorm: 22.312758
Training: 2022-01-09 06:16:05,226-[cfp_fp][156000]Accuracy-Flip: 0.99371+-0.00363
Training: 2022-01-09 06:16:05,226-[cfp_fp][156000]Accuracy-Highest: 0.99371
Training: 2022-01-09 06:16:49,927-[agedb_30][156000]XNorm: 23.813588
Training: 2022-01-09 06:16:49,928-[agedb_30][156000]Accuracy-Flip: 0.98150+-0.00724
Training: 2022-01-09 06:16:49,928-[agedb_30][156000]Accuracy-Highest: 0.98217
Training: 2022-01-09 06:16:57,544-Speed 264.25 samples/sec   Loss 2.7393   LearningRate 0.0204   Epoch: 15   Global Step: 156010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:17:05,143-Speed 5390.80 samples/sec   Loss 2.7506   LearningRate 0.0204   Epoch: 15   Global Step: 156020   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:17:12,707-Speed 5415.67 samples/sec   Loss 2.7186   LearningRate 0.0204   Epoch: 15   Global Step: 156030   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:17:20,206-Speed 5463.23 samples/sec   Loss 2.7567   LearningRate 0.0204   Epoch: 15   Global Step: 156040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:17:27,682-Speed 5479.66 samples/sec   Loss 2.7678   LearningRate 0.0204   Epoch: 15   Global Step: 156050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:17:35,164-Speed 5474.65 samples/sec   Loss 2.7411   LearningRate 0.0204   Epoch: 15   Global Step: 156060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:17:42,629-Speed 5488.21 samples/sec   Loss 2.7285   LearningRate 0.0203   Epoch: 15   Global Step: 156070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:17:50,159-Speed 5440.24 samples/sec   Loss 2.7535   LearningRate 0.0203   Epoch: 15   Global Step: 156080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:17:57,714-Speed 5422.38 samples/sec   Loss 2.7716   LearningRate 0.0203   Epoch: 15   Global Step: 156090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:18:05,228-Speed 5452.08 samples/sec   Loss 2.7780   LearningRate 0.0203   Epoch: 15   Global Step: 156100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:18:12,688-Speed 5491.28 samples/sec   Loss 2.7400   LearningRate 0.0203   Epoch: 15   Global Step: 156110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:18:20,250-Speed 5417.57 samples/sec   Loss 2.7909   LearningRate 0.0203   Epoch: 15   Global Step: 156120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:18:27,827-Speed 5406.50 samples/sec   Loss 2.7898   LearningRate 0.0203   Epoch: 15   Global Step: 156130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:18:35,389-Speed 5417.40 samples/sec   Loss 2.7670   LearningRate 0.0203   Epoch: 15   Global Step: 156140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:18:43,048-Speed 5348.46 samples/sec   Loss 2.7800   LearningRate 0.0203   Epoch: 15   Global Step: 156150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:18:50,691-Speed 5360.17 samples/sec   Loss 2.7460   LearningRate 0.0203   Epoch: 15   Global Step: 156160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:18:58,187-Speed 5464.64 samples/sec   Loss 2.7706   LearningRate 0.0203   Epoch: 15   Global Step: 156170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:19:05,766-Speed 5405.21 samples/sec   Loss 2.7644   LearningRate 0.0203   Epoch: 15   Global Step: 156180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:19:13,247-Speed 5475.35 samples/sec   Loss 2.7634   LearningRate 0.0203   Epoch: 15   Global Step: 156190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:19:20,809-Speed 5418.16 samples/sec   Loss 2.7410   LearningRate 0.0202   Epoch: 15   Global Step: 156200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:19:28,303-Speed 5466.42 samples/sec   Loss 2.7562   LearningRate 0.0202   Epoch: 15   Global Step: 156210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:19:35,798-Speed 5465.07 samples/sec   Loss 2.7772   LearningRate 0.0202   Epoch: 15   Global Step: 156220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:19:43,373-Speed 5407.86 samples/sec   Loss 2.7275   LearningRate 0.0202   Epoch: 15   Global Step: 156230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:19:51,177-Speed 5249.67 samples/sec   Loss 2.7534   LearningRate 0.0202   Epoch: 15   Global Step: 156240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:19:58,705-Speed 5441.96 samples/sec   Loss 2.7617   LearningRate 0.0202   Epoch: 15   Global Step: 156250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:20:06,298-Speed 5395.16 samples/sec   Loss 2.7311   LearningRate 0.0202   Epoch: 15   Global Step: 156260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:20:13,793-Speed 5465.37 samples/sec   Loss 2.7484   LearningRate 0.0202   Epoch: 15   Global Step: 156270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:20:21,321-Speed 5441.68 samples/sec   Loss 2.7393   LearningRate 0.0202   Epoch: 15   Global Step: 156280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:20:28,877-Speed 5421.94 samples/sec   Loss 2.7328   LearningRate 0.0202   Epoch: 15   Global Step: 156290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:20:36,435-Speed 5419.97 samples/sec   Loss 2.7588   LearningRate 0.0202   Epoch: 15   Global Step: 156300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:20:43,978-Speed 5430.97 samples/sec   Loss 2.7294   LearningRate 0.0202   Epoch: 15   Global Step: 156310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:20:51,528-Speed 5425.92 samples/sec   Loss 2.6783   LearningRate 0.0202   Epoch: 15   Global Step: 156320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:20:59,030-Speed 5460.90 samples/sec   Loss 2.7661   LearningRate 0.0201   Epoch: 15   Global Step: 156330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:21:06,507-Speed 5478.69 samples/sec   Loss 2.6876   LearningRate 0.0201   Epoch: 15   Global Step: 156340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:21:14,043-Speed 5435.48 samples/sec   Loss 2.7378   LearningRate 0.0201   Epoch: 15   Global Step: 156350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:21:21,557-Speed 5452.53 samples/sec   Loss 2.7621   LearningRate 0.0201   Epoch: 15   Global Step: 156360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:21:29,076-Speed 5448.27 samples/sec   Loss 2.7086   LearningRate 0.0201   Epoch: 15   Global Step: 156370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:21:36,693-Speed 5378.32 samples/sec   Loss 2.7099   LearningRate 0.0201   Epoch: 15   Global Step: 156380   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:21:44,369-Speed 5336.62 samples/sec   Loss 2.7486   LearningRate 0.0201   Epoch: 15   Global Step: 156390   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:21:51,879-Speed 5455.43 samples/sec   Loss 2.7710   LearningRate 0.0201   Epoch: 15   Global Step: 156400   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:21:59,412-Speed 5438.07 samples/sec   Loss 2.7691   LearningRate 0.0201   Epoch: 15   Global Step: 156410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:22:06,949-Speed 5435.49 samples/sec   Loss 2.7294   LearningRate 0.0201   Epoch: 15   Global Step: 156420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:22:14,497-Speed 5426.45 samples/sec   Loss 2.7407   LearningRate 0.0201   Epoch: 15   Global Step: 156430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:22:22,061-Speed 5416.18 samples/sec   Loss 2.7379   LearningRate 0.0201   Epoch: 15   Global Step: 156440   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:22:29,541-Speed 5477.10 samples/sec   Loss 2.7291   LearningRate 0.0200   Epoch: 15   Global Step: 156450   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:22:37,052-Speed 5454.08 samples/sec   Loss 2.7113   LearningRate 0.0200   Epoch: 15   Global Step: 156460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:22:44,550-Speed 5463.22 samples/sec   Loss 2.7201   LearningRate 0.0200   Epoch: 15   Global Step: 156470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:22:52,103-Speed 5423.43 samples/sec   Loss 2.7144   LearningRate 0.0200   Epoch: 15   Global Step: 156480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:22:59,643-Speed 5433.14 samples/sec   Loss 2.7454   LearningRate 0.0200   Epoch: 15   Global Step: 156490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:23:07,104-Speed 5490.75 samples/sec   Loss 2.7121   LearningRate 0.0200   Epoch: 15   Global Step: 156500   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:23:14,599-Speed 5465.46 samples/sec   Loss 2.7483   LearningRate 0.0200   Epoch: 15   Global Step: 156510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:23:22,120-Speed 5446.74 samples/sec   Loss 2.7207   LearningRate 0.0200   Epoch: 15   Global Step: 156520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:23:29,605-Speed 5473.42 samples/sec   Loss 2.7120   LearningRate 0.0200   Epoch: 15   Global Step: 156530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:23:37,166-Speed 5417.71 samples/sec   Loss 2.7115   LearningRate 0.0200   Epoch: 15   Global Step: 156540   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:23:44,734-Speed 5413.04 samples/sec   Loss 2.7219   LearningRate 0.0200   Epoch: 15   Global Step: 156550   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:23:52,220-Speed 5472.42 samples/sec   Loss 2.7475   LearningRate 0.0200   Epoch: 15   Global Step: 156560   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:23:59,707-Speed 5471.52 samples/sec   Loss 2.7449   LearningRate 0.0200   Epoch: 15   Global Step: 156570   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:24:07,230-Speed 5445.24 samples/sec   Loss 2.7367   LearningRate 0.0199   Epoch: 15   Global Step: 156580   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:24:14,772-Speed 5431.48 samples/sec   Loss 2.6993   LearningRate 0.0199   Epoch: 15   Global Step: 156590   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:24:22,238-Speed 5486.84 samples/sec   Loss 2.7451   LearningRate 0.0199   Epoch: 15   Global Step: 156600   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:24:29,725-Speed 5471.63 samples/sec   Loss 2.7287   LearningRate 0.0199   Epoch: 15   Global Step: 156610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:24:37,197-Speed 5482.32 samples/sec   Loss 2.7185   LearningRate 0.0199   Epoch: 15   Global Step: 156620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:24:44,660-Speed 5489.13 samples/sec   Loss 2.7468   LearningRate 0.0199   Epoch: 15   Global Step: 156630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:24:52,267-Speed 5384.87 samples/sec   Loss 2.6992   LearningRate 0.0199   Epoch: 15   Global Step: 156640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 06:24:59,758-Speed 5468.84 samples/sec   Loss 2.7067   LearningRate 0.0199   Epoch: 15   Global Step: 156650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:25:07,310-Speed 5424.26 samples/sec   Loss 2.7730   LearningRate 0.0199   Epoch: 15   Global Step: 156660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:25:14,815-Speed 5459.02 samples/sec   Loss 2.7010   LearningRate 0.0199   Epoch: 15   Global Step: 156670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:25:22,301-Speed 5472.07 samples/sec   Loss 2.7186   LearningRate 0.0199   Epoch: 15   Global Step: 156680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:25:29,830-Speed 5441.33 samples/sec   Loss 2.7122   LearningRate 0.0199   Epoch: 15   Global Step: 156690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:25:37,296-Speed 5486.21 samples/sec   Loss 2.6863   LearningRate 0.0199   Epoch: 15   Global Step: 156700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:25:44,859-Speed 5416.66 samples/sec   Loss 2.6995   LearningRate 0.0198   Epoch: 15   Global Step: 156710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:25:52,355-Speed 5465.38 samples/sec   Loss 2.7379   LearningRate 0.0198   Epoch: 15   Global Step: 156720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:26:00,010-Speed 5351.54 samples/sec   Loss 2.7128   LearningRate 0.0198   Epoch: 15   Global Step: 156730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:26:07,599-Speed 5397.80 samples/sec   Loss 2.6830   LearningRate 0.0198   Epoch: 15   Global Step: 156740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:26:15,130-Speed 5439.94 samples/sec   Loss 2.7022   LearningRate 0.0198   Epoch: 15   Global Step: 156750   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:26:22,689-Speed 5419.67 samples/sec   Loss 2.6907   LearningRate 0.0198   Epoch: 15   Global Step: 156760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:26:30,229-Speed 5433.14 samples/sec   Loss 2.7081   LearningRate 0.0198   Epoch: 15   Global Step: 156770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:26:37,731-Speed 5459.98 samples/sec   Loss 2.7084   LearningRate 0.0198   Epoch: 15   Global Step: 156780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:26:45,246-Speed 5451.52 samples/sec   Loss 2.7041   LearningRate 0.0198   Epoch: 15   Global Step: 156790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:26:52,793-Speed 5428.28 samples/sec   Loss 2.7370   LearningRate 0.0198   Epoch: 15   Global Step: 156800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:27:00,322-Speed 5441.17 samples/sec   Loss 2.7328   LearningRate 0.0198   Epoch: 15   Global Step: 156810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:27:07,799-Speed 5478.36 samples/sec   Loss 2.7235   LearningRate 0.0198   Epoch: 15   Global Step: 156820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:27:15,306-Speed 5457.15 samples/sec   Loss 2.7225   LearningRate 0.0198   Epoch: 15   Global Step: 156830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:27:22,850-Speed 5430.56 samples/sec   Loss 2.6845   LearningRate 0.0197   Epoch: 15   Global Step: 156840   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:27:30,417-Speed 5413.41 samples/sec   Loss 2.6895   LearningRate 0.0197   Epoch: 15   Global Step: 156850   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:27:37,884-Speed 5486.06 samples/sec   Loss 2.7145   LearningRate 0.0197   Epoch: 15   Global Step: 156860   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:27:45,476-Speed 5396.36 samples/sec   Loss 2.6971   LearningRate 0.0197   Epoch: 15   Global Step: 156870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:27:53,004-Speed 5441.80 samples/sec   Loss 2.7067   LearningRate 0.0197   Epoch: 15   Global Step: 156880   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:28:00,538-Speed 5437.25 samples/sec   Loss 2.7056   LearningRate 0.0197   Epoch: 15   Global Step: 156890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:28:08,045-Speed 5457.22 samples/sec   Loss 2.6676   LearningRate 0.0197   Epoch: 15   Global Step: 156900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:28:15,517-Speed 5482.61 samples/sec   Loss 2.6838   LearningRate 0.0197   Epoch: 15   Global Step: 156910   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:28:23,017-Speed 5461.66 samples/sec   Loss 2.6728   LearningRate 0.0197   Epoch: 15   Global Step: 156920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:28:30,610-Speed 5394.95 samples/sec   Loss 2.6844   LearningRate 0.0197   Epoch: 15   Global Step: 156930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:28:38,096-Speed 5472.67 samples/sec   Loss 2.7045   LearningRate 0.0197   Epoch: 15   Global Step: 156940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:28:45,573-Speed 5478.90 samples/sec   Loss 2.7089   LearningRate 0.0197   Epoch: 15   Global Step: 156950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:28:53,121-Speed 5427.70 samples/sec   Loss 2.6798   LearningRate 0.0196   Epoch: 15   Global Step: 156960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:29:00,720-Speed 5390.95 samples/sec   Loss 2.6932   LearningRate 0.0196   Epoch: 15   Global Step: 156970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:29:08,302-Speed 5402.87 samples/sec   Loss 2.6713   LearningRate 0.0196   Epoch: 15   Global Step: 156980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:29:15,858-Speed 5421.95 samples/sec   Loss 2.7151   LearningRate 0.0196   Epoch: 15   Global Step: 156990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:29:23,334-Speed 5479.00 samples/sec   Loss 2.6869   LearningRate 0.0196   Epoch: 15   Global Step: 157000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:29:30,822-Speed 5471.48 samples/sec   Loss 2.7259   LearningRate 0.0196   Epoch: 15   Global Step: 157010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:29:38,307-Speed 5472.77 samples/sec   Loss 2.7280   LearningRate 0.0196   Epoch: 15   Global Step: 157020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 06:29:45,899-Speed 5396.09 samples/sec   Loss 2.7156   LearningRate 0.0196   Epoch: 15   Global Step: 157030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 06:29:53,422-Speed 5445.44 samples/sec   Loss 2.6937   LearningRate 0.0196   Epoch: 15   Global Step: 157040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:30:00,888-Speed 5486.66 samples/sec   Loss 2.6860   LearningRate 0.0196   Epoch: 15   Global Step: 157050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:30:08,558-Speed 5341.12 samples/sec   Loss 2.6911   LearningRate 0.0196   Epoch: 15   Global Step: 157060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:30:16,033-Speed 5480.68 samples/sec   Loss 2.6842   LearningRate 0.0196   Epoch: 15   Global Step: 157070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:30:23,597-Speed 5415.32 samples/sec   Loss 2.7061   LearningRate 0.0196   Epoch: 15   Global Step: 157080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:30:31,228-Speed 5368.98 samples/sec   Loss 2.6925   LearningRate 0.0195   Epoch: 15   Global Step: 157090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 06:30:38,731-Speed 5459.32 samples/sec   Loss 2.6568   LearningRate 0.0195   Epoch: 15   Global Step: 157100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:30:46,184-Speed 5496.97 samples/sec   Loss 2.7137   LearningRate 0.0195   Epoch: 15   Global Step: 157110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:30:53,691-Speed 5456.94 samples/sec   Loss 2.7285   LearningRate 0.0195   Epoch: 15   Global Step: 157120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:31:01,167-Speed 5479.92 samples/sec   Loss 2.6652   LearningRate 0.0195   Epoch: 15   Global Step: 157130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:31:08,638-Speed 5482.73 samples/sec   Loss 2.7071   LearningRate 0.0195   Epoch: 15   Global Step: 157140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:31:16,113-Speed 5480.54 samples/sec   Loss 2.7020   LearningRate 0.0195   Epoch: 15   Global Step: 157150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:31:23,544-Speed 5513.39 samples/sec   Loss 2.6968   LearningRate 0.0195   Epoch: 15   Global Step: 157160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:31:31,138-Speed 5393.79 samples/sec   Loss 2.7156   LearningRate 0.0195   Epoch: 15   Global Step: 157170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 06:31:38,715-Speed 5407.09 samples/sec   Loss 2.6671   LearningRate 0.0195   Epoch: 15   Global Step: 157180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:31:46,229-Speed 5451.82 samples/sec   Loss 2.7296   LearningRate 0.0195   Epoch: 15   Global Step: 157190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:31:53,690-Speed 5490.49 samples/sec   Loss 2.6679   LearningRate 0.0195   Epoch: 15   Global Step: 157200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:32:01,187-Speed 5464.46 samples/sec   Loss 2.6810   LearningRate 0.0195   Epoch: 15   Global Step: 157210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:32:08,740-Speed 5423.43 samples/sec   Loss 2.6876   LearningRate 0.0194   Epoch: 15   Global Step: 157220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:32:16,307-Speed 5413.70 samples/sec   Loss 2.6649   LearningRate 0.0194   Epoch: 15   Global Step: 157230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:32:23,808-Speed 5461.37 samples/sec   Loss 2.6431   LearningRate 0.0194   Epoch: 15   Global Step: 157240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:32:31,414-Speed 5385.77 samples/sec   Loss 2.6608   LearningRate 0.0194   Epoch: 15   Global Step: 157250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:32:38,974-Speed 5418.84 samples/sec   Loss 2.7100   LearningRate 0.0194   Epoch: 15   Global Step: 157260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:32:46,482-Speed 5456.06 samples/sec   Loss 2.6841   LearningRate 0.0194   Epoch: 15   Global Step: 157270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:32:54,056-Speed 5408.90 samples/sec   Loss 2.6461   LearningRate 0.0194   Epoch: 15   Global Step: 157280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:33:01,594-Speed 5433.84 samples/sec   Loss 2.7110   LearningRate 0.0194   Epoch: 15   Global Step: 157290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:33:09,057-Speed 5489.86 samples/sec   Loss 2.6930   LearningRate 0.0194   Epoch: 15   Global Step: 157300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 06:33:16,590-Speed 5437.62 samples/sec   Loss 2.6955   LearningRate 0.0194   Epoch: 15   Global Step: 157310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 06:33:24,119-Speed 5441.19 samples/sec   Loss 2.6709   LearningRate 0.0194   Epoch: 15   Global Step: 157320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 06:33:31,630-Speed 5453.92 samples/sec   Loss 2.6685   LearningRate 0.0194   Epoch: 15   Global Step: 157330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 06:33:39,160-Speed 5440.35 samples/sec   Loss 2.6696   LearningRate 0.0194   Epoch: 15   Global Step: 157340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:33:46,726-Speed 5414.71 samples/sec   Loss 2.6795   LearningRate 0.0193   Epoch: 15   Global Step: 157350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:33:54,343-Speed 5378.23 samples/sec   Loss 2.6387   LearningRate 0.0193   Epoch: 15   Global Step: 157360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:34:01,842-Speed 5462.30 samples/sec   Loss 2.6927   LearningRate 0.0193   Epoch: 15   Global Step: 157370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:34:09,417-Speed 5408.33 samples/sec   Loss 2.7032   LearningRate 0.0193   Epoch: 15   Global Step: 157380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:34:16,988-Speed 5410.96 samples/sec   Loss 2.6454   LearningRate 0.0193   Epoch: 15   Global Step: 157390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:34:24,472-Speed 5473.97 samples/sec   Loss 2.6507   LearningRate 0.0193   Epoch: 15   Global Step: 157400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:34:31,981-Speed 5454.99 samples/sec   Loss 2.6542   LearningRate 0.0193   Epoch: 15   Global Step: 157410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:34:39,500-Speed 5448.76 samples/sec   Loss 2.6983   LearningRate 0.0193   Epoch: 15   Global Step: 157420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:34:46,959-Speed 5492.42 samples/sec   Loss 2.6530   LearningRate 0.0193   Epoch: 15   Global Step: 157430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:34:54,578-Speed 5376.20 samples/sec   Loss 2.6894   LearningRate 0.0193   Epoch: 15   Global Step: 157440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 06:35:02,105-Speed 5442.14 samples/sec   Loss 2.6531   LearningRate 0.0193   Epoch: 15   Global Step: 157450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 06:35:09,580-Speed 5480.41 samples/sec   Loss 2.6327   LearningRate 0.0193   Epoch: 15   Global Step: 157460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:35:17,140-Speed 5419.38 samples/sec   Loss 2.6231   LearningRate 0.0193   Epoch: 15   Global Step: 157470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:35:24,693-Speed 5423.42 samples/sec   Loss 2.7010   LearningRate 0.0192   Epoch: 15   Global Step: 157480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:35:32,253-Speed 5418.42 samples/sec   Loss 2.6861   LearningRate 0.0192   Epoch: 15   Global Step: 157490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:35:39,772-Speed 5448.88 samples/sec   Loss 2.6861   LearningRate 0.0192   Epoch: 15   Global Step: 157500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:35:47,358-Speed 5400.28 samples/sec   Loss 2.6850   LearningRate 0.0192   Epoch: 15   Global Step: 157510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:35:54,867-Speed 5455.19 samples/sec   Loss 2.6396   LearningRate 0.0192   Epoch: 15   Global Step: 157520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:36:02,288-Speed 5519.85 samples/sec   Loss 2.6783   LearningRate 0.0192   Epoch: 15   Global Step: 157530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:36:09,751-Speed 5489.70 samples/sec   Loss 2.6788   LearningRate 0.0192   Epoch: 15   Global Step: 157540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:36:17,261-Speed 5454.38 samples/sec   Loss 2.6458   LearningRate 0.0192   Epoch: 15   Global Step: 157550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:36:24,696-Speed 5509.86 samples/sec   Loss 2.6629   LearningRate 0.0192   Epoch: 15   Global Step: 157560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:36:32,230-Speed 5436.95 samples/sec   Loss 2.6904   LearningRate 0.0192   Epoch: 15   Global Step: 157570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:36:39,789-Speed 5419.95 samples/sec   Loss 2.6148   LearningRate 0.0192   Epoch: 15   Global Step: 157580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:36:47,230-Speed 5505.63 samples/sec   Loss 2.6591   LearningRate 0.0192   Epoch: 15   Global Step: 157590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:36:54,700-Speed 5483.68 samples/sec   Loss 2.6713   LearningRate 0.0192   Epoch: 15   Global Step: 157600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:37:02,222-Speed 5445.82 samples/sec   Loss 2.6625   LearningRate 0.0191   Epoch: 15   Global Step: 157610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:37:09,761-Speed 5433.15 samples/sec   Loss 2.6626   LearningRate 0.0191   Epoch: 15   Global Step: 157620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:37:17,221-Speed 5492.00 samples/sec   Loss 2.6529   LearningRate 0.0191   Epoch: 15   Global Step: 157630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:37:24,679-Speed 5492.81 samples/sec   Loss 2.6636   LearningRate 0.0191   Epoch: 15   Global Step: 157640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:37:32,150-Speed 5482.96 samples/sec   Loss 2.6425   LearningRate 0.0191   Epoch: 15   Global Step: 157650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:37:39,665-Speed 5450.72 samples/sec   Loss 2.6635   LearningRate 0.0191   Epoch: 15   Global Step: 157660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:37:47,167-Speed 5461.25 samples/sec   Loss 2.6435   LearningRate 0.0191   Epoch: 15   Global Step: 157670   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 06:37:54,681-Speed 5451.62 samples/sec   Loss 2.6509   LearningRate 0.0191   Epoch: 15   Global Step: 157680   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 06:38:02,153-Speed 5482.84 samples/sec   Loss 2.6523   LearningRate 0.0191   Epoch: 15   Global Step: 157690   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 06:38:09,694-Speed 5431.51 samples/sec   Loss 2.6868   LearningRate 0.0191   Epoch: 15   Global Step: 157700   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 06:38:17,213-Speed 5449.22 samples/sec   Loss 2.6719   LearningRate 0.0191   Epoch: 15   Global Step: 157710   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 06:38:24,783-Speed 5411.49 samples/sec   Loss 2.6649   LearningRate 0.0191   Epoch: 15   Global Step: 157720   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 06:38:32,290-Speed 5456.21 samples/sec   Loss 2.6921   LearningRate 0.0191   Epoch: 15   Global Step: 157730   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 06:38:39,781-Speed 5468.71 samples/sec   Loss 2.6698   LearningRate 0.0190   Epoch: 15   Global Step: 157740   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 06:38:47,292-Speed 5454.19 samples/sec   Loss 2.6034   LearningRate 0.0190   Epoch: 15   Global Step: 157750   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 06:38:54,899-Speed 5385.27 samples/sec   Loss 2.6890   LearningRate 0.0190   Epoch: 15   Global Step: 157760   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 06:39:02,419-Speed 5447.52 samples/sec   Loss 2.6482   LearningRate 0.0190   Epoch: 15   Global Step: 157770   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:39:09,893-Speed 5481.02 samples/sec   Loss 2.6940   LearningRate 0.0190   Epoch: 15   Global Step: 157780   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:39:17,387-Speed 5466.51 samples/sec   Loss 2.6821   LearningRate 0.0190   Epoch: 15   Global Step: 157790   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:39:24,841-Speed 5496.09 samples/sec   Loss 2.6425   LearningRate 0.0190   Epoch: 15   Global Step: 157800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:39:32,310-Speed 5484.52 samples/sec   Loss 2.6290   LearningRate 0.0190   Epoch: 15   Global Step: 157810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:39:39,815-Speed 5457.83 samples/sec   Loss 2.6664   LearningRate 0.0190   Epoch: 15   Global Step: 157820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:39:47,356-Speed 5432.71 samples/sec   Loss 2.6543   LearningRate 0.0190   Epoch: 15   Global Step: 157830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:39:54,862-Speed 5457.84 samples/sec   Loss 2.6452   LearningRate 0.0190   Epoch: 15   Global Step: 157840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:40:02,379-Speed 5449.55 samples/sec   Loss 2.6161   LearningRate 0.0190   Epoch: 15   Global Step: 157850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:40:09,815-Speed 5508.55 samples/sec   Loss 2.6655   LearningRate 0.0190   Epoch: 15   Global Step: 157860   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:40:17,381-Speed 5414.47 samples/sec   Loss 2.6310   LearningRate 0.0189   Epoch: 15   Global Step: 157870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:40:24,907-Speed 5443.54 samples/sec   Loss 2.6280   LearningRate 0.0189   Epoch: 15   Global Step: 157880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:40:32,450-Speed 5431.17 samples/sec   Loss 2.6671   LearningRate 0.0189   Epoch: 15   Global Step: 157890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:40:39,940-Speed 5468.83 samples/sec   Loss 2.6279   LearningRate 0.0189   Epoch: 15   Global Step: 157900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:40:47,415-Speed 5480.87 samples/sec   Loss 2.6593   LearningRate 0.0189   Epoch: 15   Global Step: 157910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:40:54,937-Speed 5445.56 samples/sec   Loss 2.6461   LearningRate 0.0189   Epoch: 15   Global Step: 157920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:41:02,486-Speed 5426.38 samples/sec   Loss 2.6625   LearningRate 0.0189   Epoch: 15   Global Step: 157930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:41:10,006-Speed 5447.85 samples/sec   Loss 2.6636   LearningRate 0.0189   Epoch: 15   Global Step: 157940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:41:17,524-Speed 5449.24 samples/sec   Loss 2.6397   LearningRate 0.0189   Epoch: 15   Global Step: 157950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:41:25,051-Speed 5442.80 samples/sec   Loss 2.6489   LearningRate 0.0189   Epoch: 15   Global Step: 157960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:41:32,640-Speed 5397.54 samples/sec   Loss 2.6957   LearningRate 0.0189   Epoch: 15   Global Step: 157970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:41:40,127-Speed 5471.51 samples/sec   Loss 2.6350   LearningRate 0.0189   Epoch: 15   Global Step: 157980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:41:47,813-Speed 5329.94 samples/sec   Loss 2.6114   LearningRate 0.0189   Epoch: 15   Global Step: 157990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:41:55,325-Speed 5453.37 samples/sec   Loss 2.6101   LearningRate 0.0188   Epoch: 15   Global Step: 158000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:42:39,136-[lfw][158000]XNorm: 23.097801
Training: 2022-01-09 06:42:39,136-[lfw][158000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 06:42:39,137-[lfw][158000]Accuracy-Highest: 0.99833
Training: 2022-01-09 06:43:29,973-[cfp_fp][158000]XNorm: 21.881999
Training: 2022-01-09 06:43:29,974-[cfp_fp][158000]Accuracy-Flip: 0.99129+-0.00463
Training: 2022-01-09 06:43:29,974-[cfp_fp][158000]Accuracy-Highest: 0.99371
Training: 2022-01-09 06:44:13,606-[agedb_30][158000]XNorm: 23.452830
Training: 2022-01-09 06:44:13,607-[agedb_30][158000]Accuracy-Flip: 0.98167+-0.00796
Training: 2022-01-09 06:44:13,607-[agedb_30][158000]Accuracy-Highest: 0.98217
Training: 2022-01-09 06:44:21,219-Speed 280.75 samples/sec   Loss 2.6181   LearningRate 0.0188   Epoch: 15   Global Step: 158010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:44:28,671-Speed 5497.01 samples/sec   Loss 2.6485   LearningRate 0.0188   Epoch: 15   Global Step: 158020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:44:36,145-Speed 5480.83 samples/sec   Loss 2.6247   LearningRate 0.0188   Epoch: 15   Global Step: 158030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:44:43,614-Speed 5485.20 samples/sec   Loss 2.6345   LearningRate 0.0188   Epoch: 15   Global Step: 158040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:44:51,201-Speed 5399.19 samples/sec   Loss 2.6478   LearningRate 0.0188   Epoch: 15   Global Step: 158050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:44:58,682-Speed 5476.08 samples/sec   Loss 2.6161   LearningRate 0.0188   Epoch: 15   Global Step: 158060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:45:06,278-Speed 5393.16 samples/sec   Loss 2.6512   LearningRate 0.0188   Epoch: 15   Global Step: 158070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:45:13,737-Speed 5491.89 samples/sec   Loss 2.6512   LearningRate 0.0188   Epoch: 15   Global Step: 158080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 06:45:21,259-Speed 5446.43 samples/sec   Loss 2.6361   LearningRate 0.0188   Epoch: 15   Global Step: 158090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:45:28,805-Speed 5428.79 samples/sec   Loss 2.6319   LearningRate 0.0188   Epoch: 15   Global Step: 158100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:45:36,277-Speed 5482.70 samples/sec   Loss 2.6444   LearningRate 0.0188   Epoch: 15   Global Step: 158110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:45:43,807-Speed 5440.10 samples/sec   Loss 2.6203   LearningRate 0.0188   Epoch: 15   Global Step: 158120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:45:51,289-Speed 5474.90 samples/sec   Loss 2.6280   LearningRate 0.0187   Epoch: 15   Global Step: 158130   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:45:58,813-Speed 5444.70 samples/sec   Loss 2.6721   LearningRate 0.0187   Epoch: 15   Global Step: 158140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:46:06,438-Speed 5372.85 samples/sec   Loss 2.6474   LearningRate 0.0187   Epoch: 15   Global Step: 158150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:46:14,059-Speed 5374.85 samples/sec   Loss 2.6596   LearningRate 0.0187   Epoch: 15   Global Step: 158160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:46:21,644-Speed 5400.60 samples/sec   Loss 2.6522   LearningRate 0.0187   Epoch: 15   Global Step: 158170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:46:29,148-Speed 5459.68 samples/sec   Loss 2.6298   LearningRate 0.0187   Epoch: 15   Global Step: 158180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:46:36,691-Speed 5430.54 samples/sec   Loss 2.6065   LearningRate 0.0187   Epoch: 15   Global Step: 158190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:46:44,150-Speed 5492.64 samples/sec   Loss 2.6023   LearningRate 0.0187   Epoch: 15   Global Step: 158200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:46:51,670-Speed 5447.30 samples/sec   Loss 2.6113   LearningRate 0.0187   Epoch: 15   Global Step: 158210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:46:59,191-Speed 5446.07 samples/sec   Loss 2.6411   LearningRate 0.0187   Epoch: 15   Global Step: 158220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:47:06,707-Speed 5450.64 samples/sec   Loss 2.6494   LearningRate 0.0187   Epoch: 15   Global Step: 158230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:47:14,271-Speed 5416.52 samples/sec   Loss 2.5733   LearningRate 0.0187   Epoch: 15   Global Step: 158240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:47:21,709-Speed 5507.53 samples/sec   Loss 2.6325   LearningRate 0.0187   Epoch: 15   Global Step: 158250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:47:29,201-Speed 5466.94 samples/sec   Loss 2.6233   LearningRate 0.0186   Epoch: 15   Global Step: 158260   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:47:36,796-Speed 5393.86 samples/sec   Loss 2.6176   LearningRate 0.0186   Epoch: 15   Global Step: 158270   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:47:44,367-Speed 5411.41 samples/sec   Loss 2.6728   LearningRate 0.0186   Epoch: 15   Global Step: 158280   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:47:51,837-Speed 5483.98 samples/sec   Loss 2.6393   LearningRate 0.0186   Epoch: 15   Global Step: 158290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:47:59,317-Speed 5476.70 samples/sec   Loss 2.6160   LearningRate 0.0186   Epoch: 15   Global Step: 158300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:48:06,780-Speed 5488.90 samples/sec   Loss 2.6011   LearningRate 0.0186   Epoch: 15   Global Step: 158310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:48:14,234-Speed 5496.21 samples/sec   Loss 2.6202   LearningRate 0.0186   Epoch: 15   Global Step: 158320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:48:21,702-Speed 5485.13 samples/sec   Loss 2.6365   LearningRate 0.0186   Epoch: 15   Global Step: 158330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:48:29,192-Speed 5469.32 samples/sec   Loss 2.6114   LearningRate 0.0186   Epoch: 15   Global Step: 158340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:48:36,687-Speed 5465.69 samples/sec   Loss 2.6074   LearningRate 0.0186   Epoch: 15   Global Step: 158350   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:48:44,165-Speed 5478.29 samples/sec   Loss 2.6065   LearningRate 0.0186   Epoch: 15   Global Step: 158360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:48:51,669-Speed 5459.10 samples/sec   Loss 2.6190   LearningRate 0.0186   Epoch: 15   Global Step: 158370   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:48:59,112-Speed 5503.66 samples/sec   Loss 2.6653   LearningRate 0.0186   Epoch: 15   Global Step: 158380   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:49:06,596-Speed 5473.76 samples/sec   Loss 2.6151   LearningRate 0.0186   Epoch: 15   Global Step: 158390   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:49:14,111-Speed 5451.40 samples/sec   Loss 2.5870   LearningRate 0.0185   Epoch: 15   Global Step: 158400   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:49:21,518-Speed 5530.85 samples/sec   Loss 2.6367   LearningRate 0.0185   Epoch: 15   Global Step: 158410   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:49:29,025-Speed 5456.71 samples/sec   Loss 2.5650   LearningRate 0.0185   Epoch: 15   Global Step: 158420   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:49:36,523-Speed 5463.93 samples/sec   Loss 2.6205   LearningRate 0.0185   Epoch: 15   Global Step: 158430   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:49:43,983-Speed 5491.31 samples/sec   Loss 2.5795   LearningRate 0.0185   Epoch: 15   Global Step: 158440   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:49:51,419-Speed 5508.91 samples/sec   Loss 2.6457   LearningRate 0.0185   Epoch: 15   Global Step: 158450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:49:58,885-Speed 5487.05 samples/sec   Loss 2.5945   LearningRate 0.0185   Epoch: 15   Global Step: 158460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:50:06,373-Speed 5470.99 samples/sec   Loss 2.6625   LearningRate 0.0185   Epoch: 15   Global Step: 158470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:50:13,857-Speed 5473.77 samples/sec   Loss 2.6173   LearningRate 0.0185   Epoch: 15   Global Step: 158480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:50:21,390-Speed 5438.51 samples/sec   Loss 2.6077   LearningRate 0.0185   Epoch: 15   Global Step: 158490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:50:28,978-Speed 5398.20 samples/sec   Loss 2.6242   LearningRate 0.0185   Epoch: 15   Global Step: 158500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:50:39,586-Speed 5491.85 samples/sec   Loss 2.5984   LearningRate 0.0185   Epoch: 15   Global Step: 158510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:50:47,144-Speed 5420.46 samples/sec   Loss 2.6044   LearningRate 0.0185   Epoch: 15   Global Step: 158520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:50:54,669-Speed 5443.54 samples/sec   Loss 2.6383   LearningRate 0.0184   Epoch: 15   Global Step: 158530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:51:02,252-Speed 5402.13 samples/sec   Loss 2.5823   LearningRate 0.0184   Epoch: 15   Global Step: 158540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:51:09,814-Speed 5417.02 samples/sec   Loss 2.6216   LearningRate 0.0184   Epoch: 15   Global Step: 158550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:51:17,311-Speed 5464.94 samples/sec   Loss 2.5893   LearningRate 0.0184   Epoch: 15   Global Step: 158560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:51:24,773-Speed 5489.78 samples/sec   Loss 2.6455   LearningRate 0.0184   Epoch: 15   Global Step: 158570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:51:32,354-Speed 5403.79 samples/sec   Loss 2.6006   LearningRate 0.0184   Epoch: 15   Global Step: 158580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:51:39,879-Speed 5443.89 samples/sec   Loss 2.5976   LearningRate 0.0184   Epoch: 15   Global Step: 158590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:51:47,453-Speed 5408.41 samples/sec   Loss 2.6152   LearningRate 0.0184   Epoch: 15   Global Step: 158600   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:51:54,923-Speed 5484.33 samples/sec   Loss 2.6183   LearningRate 0.0184   Epoch: 15   Global Step: 158610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:52:02,463-Speed 5432.55 samples/sec   Loss 2.5950   LearningRate 0.0184   Epoch: 15   Global Step: 158620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:52:09,956-Speed 5467.29 samples/sec   Loss 2.6590   LearningRate 0.0184   Epoch: 15   Global Step: 158630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:52:17,464-Speed 5456.15 samples/sec   Loss 2.6316   LearningRate 0.0184   Epoch: 15   Global Step: 158640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:52:24,979-Speed 5451.22 samples/sec   Loss 2.6218   LearningRate 0.0184   Epoch: 15   Global Step: 158650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:52:32,592-Speed 5380.91 samples/sec   Loss 2.6567   LearningRate 0.0183   Epoch: 15   Global Step: 158660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:52:40,109-Speed 5450.43 samples/sec   Loss 2.6563   LearningRate 0.0183   Epoch: 15   Global Step: 158670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:52:47,560-Speed 5497.56 samples/sec   Loss 2.6340   LearningRate 0.0183   Epoch: 15   Global Step: 158680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:52:55,047-Speed 5471.74 samples/sec   Loss 2.6436   LearningRate 0.0183   Epoch: 15   Global Step: 158690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:53:02,497-Speed 5498.53 samples/sec   Loss 2.6491   LearningRate 0.0183   Epoch: 15   Global Step: 158700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:53:10,010-Speed 5452.51 samples/sec   Loss 2.6120   LearningRate 0.0183   Epoch: 15   Global Step: 158710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:53:17,555-Speed 5429.43 samples/sec   Loss 2.6350   LearningRate 0.0183   Epoch: 15   Global Step: 158720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:53:25,034-Speed 5477.44 samples/sec   Loss 2.6494   LearningRate 0.0183   Epoch: 15   Global Step: 158730   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:53:32,161-Speed 5747.84 samples/sec   Loss 2.5689   LearningRate 0.0183   Epoch: 15   Global Step: 158740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:53:39,231-Speed 5794.78 samples/sec   Loss 2.5945   LearningRate 0.0183   Epoch: 15   Global Step: 158750   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:53:46,576-Speed 5577.31 samples/sec   Loss 2.6040   LearningRate 0.0183   Epoch: 15   Global Step: 158760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:53:54,246-Speed 5340.43 samples/sec   Loss 2.6404   LearningRate 0.0183   Epoch: 15   Global Step: 158770   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:54:01,814-Speed 5413.13 samples/sec   Loss 2.6040   LearningRate 0.0183   Epoch: 15   Global Step: 158780   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:54:09,503-Speed 5328.31 samples/sec   Loss 2.5993   LearningRate 0.0182   Epoch: 15   Global Step: 158790   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:54:17,017-Speed 5451.77 samples/sec   Loss 2.6203   LearningRate 0.0182   Epoch: 15   Global Step: 158800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:54:24,640-Speed 5373.62 samples/sec   Loss 2.5780   LearningRate 0.0182   Epoch: 15   Global Step: 158810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:54:32,162-Speed 5446.07 samples/sec   Loss 2.5981   LearningRate 0.0182   Epoch: 15   Global Step: 158820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:54:39,685-Speed 5445.87 samples/sec   Loss 2.6230   LearningRate 0.0182   Epoch: 15   Global Step: 158830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:54:47,251-Speed 5413.56 samples/sec   Loss 2.6034   LearningRate 0.0182   Epoch: 15   Global Step: 158840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:54:54,784-Speed 5438.78 samples/sec   Loss 2.5549   LearningRate 0.0182   Epoch: 15   Global Step: 158850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:55:02,241-Speed 5493.26 samples/sec   Loss 2.5928   LearningRate 0.0182   Epoch: 15   Global Step: 158860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:55:09,781-Speed 5432.86 samples/sec   Loss 2.6184   LearningRate 0.0182   Epoch: 15   Global Step: 158870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:55:17,245-Speed 5488.15 samples/sec   Loss 2.6160   LearningRate 0.0182   Epoch: 15   Global Step: 158880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:55:24,701-Speed 5494.88 samples/sec   Loss 2.5998   LearningRate 0.0182   Epoch: 15   Global Step: 158890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:55:32,184-Speed 5474.60 samples/sec   Loss 2.5483   LearningRate 0.0182   Epoch: 15   Global Step: 158900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:55:39,647-Speed 5489.12 samples/sec   Loss 2.5805   LearningRate 0.0182   Epoch: 15   Global Step: 158910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:55:47,117-Speed 5483.53 samples/sec   Loss 2.5780   LearningRate 0.0182   Epoch: 15   Global Step: 158920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:55:54,642-Speed 5443.54 samples/sec   Loss 2.5677   LearningRate 0.0181   Epoch: 15   Global Step: 158930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:56:02,175-Speed 5438.96 samples/sec   Loss 2.5641   LearningRate 0.0181   Epoch: 15   Global Step: 158940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:56:09,620-Speed 5501.73 samples/sec   Loss 2.5975   LearningRate 0.0181   Epoch: 15   Global Step: 158950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:56:17,176-Speed 5422.20 samples/sec   Loss 2.6245   LearningRate 0.0181   Epoch: 15   Global Step: 158960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:56:24,636-Speed 5491.22 samples/sec   Loss 2.6301   LearningRate 0.0181   Epoch: 15   Global Step: 158970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:56:32,064-Speed 5515.23 samples/sec   Loss 2.5671   LearningRate 0.0181   Epoch: 15   Global Step: 158980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:56:39,549-Speed 5472.76 samples/sec   Loss 2.6474   LearningRate 0.0181   Epoch: 15   Global Step: 158990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:56:47,037-Speed 5471.07 samples/sec   Loss 2.5845   LearningRate 0.0181   Epoch: 15   Global Step: 159000   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:56:54,523-Speed 5472.34 samples/sec   Loss 2.6017   LearningRate 0.0181   Epoch: 15   Global Step: 159010   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:57:01,996-Speed 5482.08 samples/sec   Loss 2.5777   LearningRate 0.0181   Epoch: 15   Global Step: 159020   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:57:09,446-Speed 5498.78 samples/sec   Loss 2.6006   LearningRate 0.0181   Epoch: 15   Global Step: 159030   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:57:16,936-Speed 5469.39 samples/sec   Loss 2.5700   LearningRate 0.0181   Epoch: 15   Global Step: 159040   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:57:24,524-Speed 5398.54 samples/sec   Loss 2.5792   LearningRate 0.0181   Epoch: 15   Global Step: 159050   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:57:32,082-Speed 5420.04 samples/sec   Loss 2.5553   LearningRate 0.0180   Epoch: 15   Global Step: 159060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:57:39,697-Speed 5380.06 samples/sec   Loss 2.5614   LearningRate 0.0180   Epoch: 15   Global Step: 159070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:57:47,263-Speed 5414.32 samples/sec   Loss 2.5815   LearningRate 0.0180   Epoch: 15   Global Step: 159080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:57:54,786-Speed 5445.76 samples/sec   Loss 2.5703   LearningRate 0.0180   Epoch: 15   Global Step: 159090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:58:02,248-Speed 5489.22 samples/sec   Loss 2.5887   LearningRate 0.0180   Epoch: 15   Global Step: 159100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:58:09,755-Speed 5457.22 samples/sec   Loss 2.5462   LearningRate 0.0180   Epoch: 15   Global Step: 159110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:58:17,254-Speed 5462.53 samples/sec   Loss 2.5856   LearningRate 0.0180   Epoch: 15   Global Step: 159120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:58:24,849-Speed 5394.37 samples/sec   Loss 2.5109   LearningRate 0.0180   Epoch: 15   Global Step: 159130   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:58:32,378-Speed 5440.67 samples/sec   Loss 2.5899   LearningRate 0.0180   Epoch: 15   Global Step: 159140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:58:39,910-Speed 5439.37 samples/sec   Loss 2.5509   LearningRate 0.0180   Epoch: 15   Global Step: 159150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:58:47,377-Speed 5486.15 samples/sec   Loss 2.6033   LearningRate 0.0180   Epoch: 15   Global Step: 159160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:58:54,845-Speed 5484.93 samples/sec   Loss 2.5759   LearningRate 0.0180   Epoch: 15   Global Step: 159170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:59:02,404-Speed 5419.35 samples/sec   Loss 2.5539   LearningRate 0.0180   Epoch: 15   Global Step: 159180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:59:09,904-Speed 5462.64 samples/sec   Loss 2.5827   LearningRate 0.0179   Epoch: 15   Global Step: 159190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:59:17,443-Speed 5433.29 samples/sec   Loss 2.5820   LearningRate 0.0179   Epoch: 15   Global Step: 159200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:59:24,962-Speed 5448.24 samples/sec   Loss 2.5497   LearningRate 0.0179   Epoch: 15   Global Step: 159210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 06:59:32,394-Speed 5512.50 samples/sec   Loss 2.5913   LearningRate 0.0179   Epoch: 15   Global Step: 159220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:59:39,869-Speed 5480.33 samples/sec   Loss 2.5776   LearningRate 0.0179   Epoch: 15   Global Step: 159230   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:59:47,361-Speed 5467.76 samples/sec   Loss 2.6055   LearningRate 0.0179   Epoch: 15   Global Step: 159240   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 06:59:54,904-Speed 5430.50 samples/sec   Loss 2.5779   LearningRate 0.0179   Epoch: 15   Global Step: 159250   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:00:02,405-Speed 5462.23 samples/sec   Loss 2.5373   LearningRate 0.0179   Epoch: 15   Global Step: 159260   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:00:10,083-Speed 5335.21 samples/sec   Loss 2.5892   LearningRate 0.0179   Epoch: 15   Global Step: 159270   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:00:17,554-Speed 5482.70 samples/sec   Loss 2.5850   LearningRate 0.0179   Epoch: 15   Global Step: 159280   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:00:25,046-Speed 5468.16 samples/sec   Loss 2.6143   LearningRate 0.0179   Epoch: 15   Global Step: 159290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:00:32,547-Speed 5461.87 samples/sec   Loss 2.5892   LearningRate 0.0179   Epoch: 15   Global Step: 159300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:00:40,051-Speed 5458.85 samples/sec   Loss 2.5767   LearningRate 0.0179   Epoch: 15   Global Step: 159310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:00:47,579-Speed 5441.72 samples/sec   Loss 2.6176   LearningRate 0.0179   Epoch: 15   Global Step: 159320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:00:55,392-Speed 5243.12 samples/sec   Loss 2.5821   LearningRate 0.0178   Epoch: 15   Global Step: 159330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:01:02,980-Speed 5398.77 samples/sec   Loss 2.6179   LearningRate 0.0178   Epoch: 15   Global Step: 159340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:01:10,476-Speed 5465.18 samples/sec   Loss 2.5338   LearningRate 0.0178   Epoch: 15   Global Step: 159350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:01:17,976-Speed 5462.03 samples/sec   Loss 2.5461   LearningRate 0.0178   Epoch: 15   Global Step: 159360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:01:25,515-Speed 5433.28 samples/sec   Loss 2.5435   LearningRate 0.0178   Epoch: 15   Global Step: 159370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:01:33,087-Speed 5410.19 samples/sec   Loss 2.5684   LearningRate 0.0178   Epoch: 15   Global Step: 159380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:01:40,585-Speed 5463.55 samples/sec   Loss 2.5574   LearningRate 0.0178   Epoch: 15   Global Step: 159390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:01:48,063-Speed 5478.41 samples/sec   Loss 2.5939   LearningRate 0.0178   Epoch: 15   Global Step: 159400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:01:55,493-Speed 5513.04 samples/sec   Loss 2.5558   LearningRate 0.0178   Epoch: 15   Global Step: 159410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:02:02,970-Speed 5478.80 samples/sec   Loss 2.5358   LearningRate 0.0178   Epoch: 15   Global Step: 159420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:02:10,418-Speed 5500.46 samples/sec   Loss 2.5737   LearningRate 0.0178   Epoch: 15   Global Step: 159430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:02:18,039-Speed 5375.10 samples/sec   Loss 2.5521   LearningRate 0.0178   Epoch: 15   Global Step: 159440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:02:25,543-Speed 5459.05 samples/sec   Loss 2.5683   LearningRate 0.0178   Epoch: 15   Global Step: 159450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:02:32,988-Speed 5502.98 samples/sec   Loss 2.5914   LearningRate 0.0177   Epoch: 15   Global Step: 159460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:02:40,538-Speed 5425.73 samples/sec   Loss 2.5597   LearningRate 0.0177   Epoch: 15   Global Step: 159470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:02:48,041-Speed 5459.83 samples/sec   Loss 2.5177   LearningRate 0.0177   Epoch: 15   Global Step: 159480   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:02:55,597-Speed 5421.54 samples/sec   Loss 2.5279   LearningRate 0.0177   Epoch: 15   Global Step: 159490   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 07:03:03,127-Speed 5440.54 samples/sec   Loss 2.5470   LearningRate 0.0177   Epoch: 15   Global Step: 159500   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 07:03:10,702-Speed 5408.07 samples/sec   Loss 2.5209   LearningRate 0.0177   Epoch: 15   Global Step: 159510   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 07:03:18,210-Speed 5456.49 samples/sec   Loss 2.5650   LearningRate 0.0177   Epoch: 15   Global Step: 159520   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 07:03:25,656-Speed 5501.10 samples/sec   Loss 2.5507   LearningRate 0.0177   Epoch: 15   Global Step: 159530   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 07:03:33,328-Speed 5339.20 samples/sec   Loss 2.5808   LearningRate 0.0177   Epoch: 15   Global Step: 159540   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 07:03:40,840-Speed 5454.03 samples/sec   Loss 2.5614   LearningRate 0.0177   Epoch: 15   Global Step: 159550   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 07:03:48,344-Speed 5458.91 samples/sec   Loss 2.5272   LearningRate 0.0177   Epoch: 15   Global Step: 159560   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 07:03:55,849-Speed 5458.19 samples/sec   Loss 2.5361   LearningRate 0.0177   Epoch: 15   Global Step: 159570   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 07:04:03,441-Speed 5395.52 samples/sec   Loss 2.6053   LearningRate 0.0177   Epoch: 15   Global Step: 159580   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 07:04:10,946-Speed 5458.72 samples/sec   Loss 2.5960   LearningRate 0.0177   Epoch: 15   Global Step: 159590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:04:18,458-Speed 5453.65 samples/sec   Loss 2.5378   LearningRate 0.0176   Epoch: 15   Global Step: 159600   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:04:25,966-Speed 5455.76 samples/sec   Loss 2.5592   LearningRate 0.0176   Epoch: 15   Global Step: 159610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:04:33,479-Speed 5452.18 samples/sec   Loss 2.5714   LearningRate 0.0176   Epoch: 15   Global Step: 159620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:04:41,067-Speed 5398.85 samples/sec   Loss 2.5590   LearningRate 0.0176   Epoch: 15   Global Step: 159630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:04:48,573-Speed 5457.68 samples/sec   Loss 2.5401   LearningRate 0.0176   Epoch: 15   Global Step: 159640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:04:56,015-Speed 5504.86 samples/sec   Loss 2.5467   LearningRate 0.0176   Epoch: 15   Global Step: 159650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:05:03,477-Speed 5489.38 samples/sec   Loss 2.5956   LearningRate 0.0176   Epoch: 15   Global Step: 159660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:05:10,974-Speed 5464.38 samples/sec   Loss 2.5997   LearningRate 0.0176   Epoch: 15   Global Step: 159670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:05:18,433-Speed 5492.53 samples/sec   Loss 2.5565   LearningRate 0.0176   Epoch: 15   Global Step: 159680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:05:25,898-Speed 5487.56 samples/sec   Loss 2.5556   LearningRate 0.0176   Epoch: 15   Global Step: 159690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:05:33,374-Speed 5478.78 samples/sec   Loss 2.5395   LearningRate 0.0176   Epoch: 15   Global Step: 159700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:05:40,857-Speed 5475.10 samples/sec   Loss 2.5733   LearningRate 0.0176   Epoch: 15   Global Step: 159710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:05:48,317-Speed 5491.31 samples/sec   Loss 2.5321   LearningRate 0.0176   Epoch: 15   Global Step: 159720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:05:55,818-Speed 5461.27 samples/sec   Loss 2.5093   LearningRate 0.0175   Epoch: 15   Global Step: 159730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:06:03,346-Speed 5441.64 samples/sec   Loss 2.5547   LearningRate 0.0175   Epoch: 15   Global Step: 159740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:06:10,889-Speed 5430.84 samples/sec   Loss 2.5153   LearningRate 0.0175   Epoch: 15   Global Step: 159750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:06:18,475-Speed 5400.53 samples/sec   Loss 2.5380   LearningRate 0.0175   Epoch: 15   Global Step: 159760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:06:25,950-Speed 5480.50 samples/sec   Loss 2.5244   LearningRate 0.0175   Epoch: 15   Global Step: 159770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:06:33,458-Speed 5455.75 samples/sec   Loss 2.5597   LearningRate 0.0175   Epoch: 15   Global Step: 159780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:06:40,904-Speed 5501.76 samples/sec   Loss 2.5485   LearningRate 0.0175   Epoch: 15   Global Step: 159790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:06:48,401-Speed 5464.19 samples/sec   Loss 2.5207   LearningRate 0.0175   Epoch: 15   Global Step: 159800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:06:55,845-Speed 5503.06 samples/sec   Loss 2.5861   LearningRate 0.0175   Epoch: 15   Global Step: 159810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:07:03,316-Speed 5483.57 samples/sec   Loss 2.5754   LearningRate 0.0175   Epoch: 15   Global Step: 159820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:07:10,729-Speed 5525.95 samples/sec   Loss 2.5337   LearningRate 0.0175   Epoch: 15   Global Step: 159830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:07:18,327-Speed 5391.80 samples/sec   Loss 2.5601   LearningRate 0.0175   Epoch: 15   Global Step: 159840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:07:25,910-Speed 5402.22 samples/sec   Loss 2.5623   LearningRate 0.0175   Epoch: 15   Global Step: 159850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:07:33,357-Speed 5500.89 samples/sec   Loss 2.5684   LearningRate 0.0175   Epoch: 15   Global Step: 159860   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:07:40,814-Speed 5493.59 samples/sec   Loss 2.5308   LearningRate 0.0174   Epoch: 15   Global Step: 159870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:07:48,298-Speed 5473.74 samples/sec   Loss 2.5709   LearningRate 0.0174   Epoch: 15   Global Step: 159880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:07:55,793-Speed 5466.15 samples/sec   Loss 2.5462   LearningRate 0.0174   Epoch: 15   Global Step: 159890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:08:03,412-Speed 5376.44 samples/sec   Loss 2.5624   LearningRate 0.0174   Epoch: 15   Global Step: 159900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:08:10,973-Speed 5417.89 samples/sec   Loss 2.5516   LearningRate 0.0174   Epoch: 15   Global Step: 159910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:08:18,595-Speed 5374.50 samples/sec   Loss 2.5693   LearningRate 0.0174   Epoch: 15   Global Step: 159920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:08:26,158-Speed 5416.88 samples/sec   Loss 2.5341   LearningRate 0.0174   Epoch: 15   Global Step: 159930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:08:33,712-Speed 5423.22 samples/sec   Loss 2.5428   LearningRate 0.0174   Epoch: 15   Global Step: 159940   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:08:41,334-Speed 5374.21 samples/sec   Loss 2.5422   LearningRate 0.0174   Epoch: 15   Global Step: 159950   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:08:48,879-Speed 5429.58 samples/sec   Loss 2.5074   LearningRate 0.0174   Epoch: 15   Global Step: 159960   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:08:56,400-Speed 5446.75 samples/sec   Loss 2.5452   LearningRate 0.0174   Epoch: 15   Global Step: 159970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:09:03,868-Speed 5485.70 samples/sec   Loss 2.5132   LearningRate 0.0174   Epoch: 15   Global Step: 159980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:09:11,420-Speed 5424.41 samples/sec   Loss 2.5474   LearningRate 0.0174   Epoch: 15   Global Step: 159990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:09:19,043-Speed 5373.88 samples/sec   Loss 2.5358   LearningRate 0.0174   Epoch: 15   Global Step: 160000   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:10:02,497-[lfw][160000]XNorm: 22.309747
Training: 2022-01-09 07:10:02,498-[lfw][160000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 07:10:02,498-[lfw][160000]Accuracy-Highest: 0.99833
Training: 2022-01-09 07:10:53,121-[cfp_fp][160000]XNorm: 21.000020
Training: 2022-01-09 07:10:53,122-[cfp_fp][160000]Accuracy-Flip: 0.99214+-0.00301
Training: 2022-01-09 07:10:53,122-[cfp_fp][160000]Accuracy-Highest: 0.99371
Training: 2022-01-09 07:11:36,776-[agedb_30][160000]XNorm: 22.272978
Training: 2022-01-09 07:11:36,776-[agedb_30][160000]Accuracy-Flip: 0.98133+-0.00670
Training: 2022-01-09 07:11:36,777-[agedb_30][160000]Accuracy-Highest: 0.98217
Training: 2022-01-09 07:11:44,052-Speed 282.47 samples/sec   Loss 2.5251   LearningRate 0.0173   Epoch: 15   Global Step: 160010   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:11:51,653-Speed 5389.71 samples/sec   Loss 2.5380   LearningRate 0.0173   Epoch: 15   Global Step: 160020   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:11:59,236-Speed 5402.58 samples/sec   Loss 2.5441   LearningRate 0.0173   Epoch: 15   Global Step: 160030   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:12:06,803-Speed 5413.29 samples/sec   Loss 2.5343   LearningRate 0.0173   Epoch: 15   Global Step: 160040   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:12:14,404-Speed 5389.75 samples/sec   Loss 2.5359   LearningRate 0.0173   Epoch: 15   Global Step: 160050   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:12:21,942-Speed 5434.37 samples/sec   Loss 2.5564   LearningRate 0.0173   Epoch: 15   Global Step: 160060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:12:29,587-Speed 5358.76 samples/sec   Loss 2.5517   LearningRate 0.0173   Epoch: 15   Global Step: 160070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:12:37,194-Speed 5385.66 samples/sec   Loss 2.4695   LearningRate 0.0173   Epoch: 15   Global Step: 160080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:12:44,765-Speed 5410.63 samples/sec   Loss 2.5183   LearningRate 0.0173   Epoch: 15   Global Step: 160090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:12:52,538-Speed 5270.31 samples/sec   Loss 2.4992   LearningRate 0.0173   Epoch: 15   Global Step: 160100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:13:00,033-Speed 5465.60 samples/sec   Loss 2.5386   LearningRate 0.0173   Epoch: 15   Global Step: 160110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:13:07,577-Speed 5430.05 samples/sec   Loss 2.5469   LearningRate 0.0173   Epoch: 15   Global Step: 160120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:13:15,090-Speed 5453.02 samples/sec   Loss 2.5226   LearningRate 0.0173   Epoch: 15   Global Step: 160130   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:13:22,764-Speed 5338.19 samples/sec   Loss 2.5490   LearningRate 0.0172   Epoch: 15   Global Step: 160140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:13:30,446-Speed 5332.80 samples/sec   Loss 2.5280   LearningRate 0.0172   Epoch: 15   Global Step: 160150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:13:38,023-Speed 5406.59 samples/sec   Loss 2.5181   LearningRate 0.0172   Epoch: 15   Global Step: 160160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:13:45,728-Speed 5316.81 samples/sec   Loss 2.5416   LearningRate 0.0172   Epoch: 15   Global Step: 160170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:13:53,372-Speed 5359.54 samples/sec   Loss 2.5259   LearningRate 0.0172   Epoch: 15   Global Step: 160180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:14:00,888-Speed 5450.12 samples/sec   Loss 2.5073   LearningRate 0.0172   Epoch: 15   Global Step: 160190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:14:08,449-Speed 5418.20 samples/sec   Loss 2.4935   LearningRate 0.0172   Epoch: 15   Global Step: 160200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:14:16,122-Speed 5338.58 samples/sec   Loss 2.5479   LearningRate 0.0172   Epoch: 15   Global Step: 160210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:14:23,753-Speed 5368.95 samples/sec   Loss 2.5334   LearningRate 0.0172   Epoch: 15   Global Step: 160220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:14:31,319-Speed 5414.21 samples/sec   Loss 2.5193   LearningRate 0.0172   Epoch: 15   Global Step: 160230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:14:38,918-Speed 5390.59 samples/sec   Loss 2.5009   LearningRate 0.0172   Epoch: 15   Global Step: 160240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:14:46,610-Speed 5325.20 samples/sec   Loss 2.5005   LearningRate 0.0172   Epoch: 15   Global Step: 160250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:14:54,269-Speed 5349.03 samples/sec   Loss 2.4973   LearningRate 0.0172   Epoch: 15   Global Step: 160260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:15:01,795-Speed 5443.21 samples/sec   Loss 2.5340   LearningRate 0.0172   Epoch: 15   Global Step: 160270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:15:09,338-Speed 5430.78 samples/sec   Loss 2.5314   LearningRate 0.0171   Epoch: 15   Global Step: 160280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:15:16,850-Speed 5453.55 samples/sec   Loss 2.5098   LearningRate 0.0171   Epoch: 15   Global Step: 160290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:15:24,404-Speed 5422.71 samples/sec   Loss 2.5203   LearningRate 0.0171   Epoch: 15   Global Step: 160300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 07:15:31,904-Speed 5462.05 samples/sec   Loss 2.4985   LearningRate 0.0171   Epoch: 15   Global Step: 160310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 07:15:39,350-Speed 5501.37 samples/sec   Loss 2.5263   LearningRate 0.0171   Epoch: 15   Global Step: 160320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:15:46,819-Speed 5484.84 samples/sec   Loss 2.5159   LearningRate 0.0171   Epoch: 15   Global Step: 160330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:15:54,337-Speed 5448.77 samples/sec   Loss 2.5110   LearningRate 0.0171   Epoch: 15   Global Step: 160340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:16:01,834-Speed 5464.90 samples/sec   Loss 2.4978   LearningRate 0.0171   Epoch: 15   Global Step: 160350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:16:09,361-Speed 5442.04 samples/sec   Loss 2.5523   LearningRate 0.0171   Epoch: 15   Global Step: 160360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:16:16,864-Speed 5459.77 samples/sec   Loss 2.5253   LearningRate 0.0171   Epoch: 15   Global Step: 160370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:16:24,279-Speed 5524.37 samples/sec   Loss 2.5095   LearningRate 0.0171   Epoch: 15   Global Step: 160380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:16:31,769-Speed 5469.61 samples/sec   Loss 2.4992   LearningRate 0.0171   Epoch: 15   Global Step: 160390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:16:39,216-Speed 5500.74 samples/sec   Loss 2.5486   LearningRate 0.0171   Epoch: 15   Global Step: 160400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:16:46,693-Speed 5479.06 samples/sec   Loss 2.5356   LearningRate 0.0171   Epoch: 15   Global Step: 160410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:16:54,238-Speed 5429.38 samples/sec   Loss 2.4796   LearningRate 0.0170   Epoch: 15   Global Step: 160420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 07:17:01,792-Speed 5422.99 samples/sec   Loss 2.5049   LearningRate 0.0170   Epoch: 15   Global Step: 160430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:17:09,269-Speed 5479.18 samples/sec   Loss 2.5046   LearningRate 0.0170   Epoch: 15   Global Step: 160440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:17:16,734-Speed 5487.76 samples/sec   Loss 2.5264   LearningRate 0.0170   Epoch: 15   Global Step: 160450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:17:24,325-Speed 5396.43 samples/sec   Loss 2.5331   LearningRate 0.0170   Epoch: 15   Global Step: 160460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:17:31,835-Speed 5455.45 samples/sec   Loss 2.5396   LearningRate 0.0170   Epoch: 15   Global Step: 160470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:17:39,441-Speed 5385.69 samples/sec   Loss 2.4968   LearningRate 0.0170   Epoch: 15   Global Step: 160480   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:17:46,996-Speed 5422.24 samples/sec   Loss 2.5324   LearningRate 0.0170   Epoch: 15   Global Step: 160490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:17:54,499-Speed 5459.92 samples/sec   Loss 2.4951   LearningRate 0.0170   Epoch: 15   Global Step: 160500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:18:01,982-Speed 5474.63 samples/sec   Loss 2.4606   LearningRate 0.0170   Epoch: 15   Global Step: 160510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:18:09,467-Speed 5472.52 samples/sec   Loss 2.5144   LearningRate 0.0170   Epoch: 15   Global Step: 160520   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:18:17,047-Speed 5404.76 samples/sec   Loss 2.4892   LearningRate 0.0170   Epoch: 15   Global Step: 160530   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:18:24,690-Speed 5359.37 samples/sec   Loss 2.5157   LearningRate 0.0170   Epoch: 15   Global Step: 160540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:18:32,217-Speed 5443.10 samples/sec   Loss 2.5137   LearningRate 0.0170   Epoch: 15   Global Step: 160550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:18:39,765-Speed 5427.23 samples/sec   Loss 2.5310   LearningRate 0.0169   Epoch: 15   Global Step: 160560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:18:47,260-Speed 5465.72 samples/sec   Loss 2.4695   LearningRate 0.0169   Epoch: 15   Global Step: 160570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:18:54,810-Speed 5425.59 samples/sec   Loss 2.5029   LearningRate 0.0169   Epoch: 15   Global Step: 160580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:19:02,376-Speed 5414.80 samples/sec   Loss 2.4934   LearningRate 0.0169   Epoch: 15   Global Step: 160590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:19:09,947-Speed 5410.58 samples/sec   Loss 2.5044   LearningRate 0.0169   Epoch: 15   Global Step: 160600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:19:17,537-Speed 5397.10 samples/sec   Loss 2.5122   LearningRate 0.0169   Epoch: 15   Global Step: 160610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:19:25,070-Speed 5437.60 samples/sec   Loss 2.4956   LearningRate 0.0169   Epoch: 15   Global Step: 160620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:19:32,600-Speed 5440.77 samples/sec   Loss 2.4946   LearningRate 0.0169   Epoch: 15   Global Step: 160630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:19:40,094-Speed 5466.40 samples/sec   Loss 2.5384   LearningRate 0.0169   Epoch: 15   Global Step: 160640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:19:47,570-Speed 5479.39 samples/sec   Loss 2.5268   LearningRate 0.0169   Epoch: 15   Global Step: 160650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:19:55,076-Speed 5457.33 samples/sec   Loss 2.5205   LearningRate 0.0169   Epoch: 15   Global Step: 160660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:20:02,529-Speed 5497.08 samples/sec   Loss 2.5077   LearningRate 0.0169   Epoch: 15   Global Step: 160670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:20:10,199-Speed 5341.11 samples/sec   Loss 2.4960   LearningRate 0.0169   Epoch: 15   Global Step: 160680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:20:17,713-Speed 5451.82 samples/sec   Loss 2.4930   LearningRate 0.0168   Epoch: 15   Global Step: 160690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:20:25,243-Speed 5440.27 samples/sec   Loss 2.4636   LearningRate 0.0168   Epoch: 15   Global Step: 160700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:20:32,709-Speed 5486.44 samples/sec   Loss 2.5302   LearningRate 0.0168   Epoch: 15   Global Step: 160710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:20:40,272-Speed 5417.38 samples/sec   Loss 2.5090   LearningRate 0.0168   Epoch: 15   Global Step: 160720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:20:47,766-Speed 5465.68 samples/sec   Loss 2.5093   LearningRate 0.0168   Epoch: 15   Global Step: 160730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:20:55,258-Speed 5468.08 samples/sec   Loss 2.5115   LearningRate 0.0168   Epoch: 15   Global Step: 160740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:21:02,834-Speed 5407.24 samples/sec   Loss 2.4925   LearningRate 0.0168   Epoch: 15   Global Step: 160750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:21:10,342-Speed 5456.46 samples/sec   Loss 2.4835   LearningRate 0.0168   Epoch: 15   Global Step: 160760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:21:17,901-Speed 5419.48 samples/sec   Loss 2.5025   LearningRate 0.0168   Epoch: 15   Global Step: 160770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:21:25,409-Speed 5456.04 samples/sec   Loss 2.5362   LearningRate 0.0168   Epoch: 15   Global Step: 160780   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:21:32,908-Speed 5462.14 samples/sec   Loss 2.4484   LearningRate 0.0168   Epoch: 15   Global Step: 160790   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:21:40,416-Speed 5456.91 samples/sec   Loss 2.5030   LearningRate 0.0168   Epoch: 15   Global Step: 160800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:21:47,899-Speed 5474.07 samples/sec   Loss 2.5094   LearningRate 0.0168   Epoch: 15   Global Step: 160810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:21:55,418-Speed 5448.37 samples/sec   Loss 2.4731   LearningRate 0.0168   Epoch: 15   Global Step: 160820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:22:03,036-Speed 5377.87 samples/sec   Loss 2.4856   LearningRate 0.0167   Epoch: 15   Global Step: 160830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:22:10,619-Speed 5402.74 samples/sec   Loss 2.4557   LearningRate 0.0167   Epoch: 15   Global Step: 160840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:22:18,132-Speed 5452.48 samples/sec   Loss 2.4952   LearningRate 0.0167   Epoch: 15   Global Step: 160850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:22:25,752-Speed 5375.58 samples/sec   Loss 2.5179   LearningRate 0.0167   Epoch: 15   Global Step: 160860   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:22:33,339-Speed 5399.86 samples/sec   Loss 2.4964   LearningRate 0.0167   Epoch: 15   Global Step: 160870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:22:40,830-Speed 5468.51 samples/sec   Loss 2.5049   LearningRate 0.0167   Epoch: 15   Global Step: 160880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:22:48,304-Speed 5481.18 samples/sec   Loss 2.5011   LearningRate 0.0167   Epoch: 15   Global Step: 160890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:22:55,738-Speed 5510.51 samples/sec   Loss 2.4763   LearningRate 0.0167   Epoch: 15   Global Step: 160900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:23:03,254-Speed 5450.42 samples/sec   Loss 2.4752   LearningRate 0.0167   Epoch: 15   Global Step: 160910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:23:10,831-Speed 5406.88 samples/sec   Loss 2.4879   LearningRate 0.0167   Epoch: 15   Global Step: 160920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:23:18,484-Speed 5353.10 samples/sec   Loss 2.4625   LearningRate 0.0167   Epoch: 15   Global Step: 160930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:23:26,035-Speed 5424.61 samples/sec   Loss 2.4504   LearningRate 0.0167   Epoch: 15   Global Step: 160940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:23:33,530-Speed 5465.50 samples/sec   Loss 2.4841   LearningRate 0.0167   Epoch: 15   Global Step: 160950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:23:41,216-Speed 5330.62 samples/sec   Loss 2.4404   LearningRate 0.0167   Epoch: 15   Global Step: 160960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:23:48,706-Speed 5469.55 samples/sec   Loss 2.4742   LearningRate 0.0166   Epoch: 15   Global Step: 160970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:23:56,382-Speed 5336.25 samples/sec   Loss 2.4415   LearningRate 0.0166   Epoch: 15   Global Step: 160980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:24:03,894-Speed 5453.16 samples/sec   Loss 2.4719   LearningRate 0.0166   Epoch: 15   Global Step: 160990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:24:11,468-Speed 5409.54 samples/sec   Loss 2.4544   LearningRate 0.0166   Epoch: 15   Global Step: 161000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:24:18,938-Speed 5483.90 samples/sec   Loss 2.4796   LearningRate 0.0166   Epoch: 15   Global Step: 161010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:24:26,453-Speed 5451.17 samples/sec   Loss 2.4472   LearningRate 0.0166   Epoch: 15   Global Step: 161020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:24:33,928-Speed 5480.23 samples/sec   Loss 2.5242   LearningRate 0.0166   Epoch: 15   Global Step: 161030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:24:41,421-Speed 5466.77 samples/sec   Loss 2.4572   LearningRate 0.0166   Epoch: 15   Global Step: 161040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:24:48,853-Speed 5512.22 samples/sec   Loss 2.4709   LearningRate 0.0166   Epoch: 15   Global Step: 161050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:24:56,310-Speed 5493.81 samples/sec   Loss 2.4966   LearningRate 0.0166   Epoch: 15   Global Step: 161060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:25:03,816-Speed 5457.28 samples/sec   Loss 2.4739   LearningRate 0.0166   Epoch: 15   Global Step: 161070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:25:11,385-Speed 5412.39 samples/sec   Loss 2.4455   LearningRate 0.0166   Epoch: 15   Global Step: 161080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:25:18,960-Speed 5408.60 samples/sec   Loss 2.4426   LearningRate 0.0166   Epoch: 15   Global Step: 161090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 07:25:26,461-Speed 5461.12 samples/sec   Loss 2.5267   LearningRate 0.0166   Epoch: 15   Global Step: 161100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:25:33,954-Speed 5466.97 samples/sec   Loss 2.5303   LearningRate 0.0165   Epoch: 15   Global Step: 161110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:25:41,489-Speed 5436.44 samples/sec   Loss 2.4528   LearningRate 0.0165   Epoch: 15   Global Step: 161120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:25:49,140-Speed 5354.48 samples/sec   Loss 2.4728   LearningRate 0.0165   Epoch: 15   Global Step: 161130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:25:56,730-Speed 5397.63 samples/sec   Loss 2.5051   LearningRate 0.0165   Epoch: 15   Global Step: 161140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:26:04,268-Speed 5434.64 samples/sec   Loss 2.4908   LearningRate 0.0165   Epoch: 15   Global Step: 161150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:26:11,758-Speed 5468.73 samples/sec   Loss 2.4509   LearningRate 0.0165   Epoch: 15   Global Step: 161160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:26:19,262-Speed 5459.69 samples/sec   Loss 2.5084   LearningRate 0.0165   Epoch: 15   Global Step: 161170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:26:26,749-Speed 5471.36 samples/sec   Loss 2.4715   LearningRate 0.0165   Epoch: 15   Global Step: 161180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:26:34,315-Speed 5414.20 samples/sec   Loss 2.4681   LearningRate 0.0165   Epoch: 15   Global Step: 161190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:26:41,838-Speed 5445.11 samples/sec   Loss 2.4784   LearningRate 0.0165   Epoch: 15   Global Step: 161200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:26:49,413-Speed 5408.23 samples/sec   Loss 2.4514   LearningRate 0.0165   Epoch: 15   Global Step: 161210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:26:56,971-Speed 5420.20 samples/sec   Loss 2.5010   LearningRate 0.0165   Epoch: 15   Global Step: 161220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:27:04,406-Speed 5509.89 samples/sec   Loss 2.4815   LearningRate 0.0165   Epoch: 15   Global Step: 161230   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:27:11,873-Speed 5485.65 samples/sec   Loss 2.4739   LearningRate 0.0165   Epoch: 15   Global Step: 161240   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:27:19,385-Speed 5453.42 samples/sec   Loss 2.4869   LearningRate 0.0164   Epoch: 15   Global Step: 161250   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:27:26,860-Speed 5481.17 samples/sec   Loss 2.4823   LearningRate 0.0164   Epoch: 15   Global Step: 161260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:27:34,335-Speed 5479.62 samples/sec   Loss 2.4661   LearningRate 0.0164   Epoch: 15   Global Step: 161270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:27:41,764-Speed 5514.29 samples/sec   Loss 2.4991   LearningRate 0.0164   Epoch: 15   Global Step: 161280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:27:49,374-Speed 5383.41 samples/sec   Loss 2.4526   LearningRate 0.0164   Epoch: 15   Global Step: 161290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:27:56,811-Speed 5508.06 samples/sec   Loss 2.5219   LearningRate 0.0164   Epoch: 15   Global Step: 161300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:28:04,381-Speed 5411.78 samples/sec   Loss 2.4406   LearningRate 0.0164   Epoch: 15   Global Step: 161310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:28:11,877-Speed 5464.34 samples/sec   Loss 2.4652   LearningRate 0.0164   Epoch: 15   Global Step: 161320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:28:19,430-Speed 5423.48 samples/sec   Loss 2.4511   LearningRate 0.0164   Epoch: 15   Global Step: 161330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:28:26,954-Speed 5445.11 samples/sec   Loss 2.4804   LearningRate 0.0164   Epoch: 15   Global Step: 161340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:28:34,404-Speed 5498.58 samples/sec   Loss 2.4858   LearningRate 0.0164   Epoch: 15   Global Step: 161350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:28:41,819-Speed 5524.51 samples/sec   Loss 2.4822   LearningRate 0.0164   Epoch: 15   Global Step: 161360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:28:49,318-Speed 5462.41 samples/sec   Loss 2.4642   LearningRate 0.0164   Epoch: 15   Global Step: 161370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 07:28:56,776-Speed 5493.31 samples/sec   Loss 2.4794   LearningRate 0.0164   Epoch: 15   Global Step: 161380   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:29:04,296-Speed 5448.00 samples/sec   Loss 2.4741   LearningRate 0.0163   Epoch: 15   Global Step: 161390   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:29:11,822-Speed 5442.61 samples/sec   Loss 2.4569   LearningRate 0.0163   Epoch: 15   Global Step: 161400   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:29:19,444-Speed 5374.38 samples/sec   Loss 2.4528   LearningRate 0.0163   Epoch: 15   Global Step: 161410   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:29:26,976-Speed 5439.63 samples/sec   Loss 2.4646   LearningRate 0.0163   Epoch: 15   Global Step: 161420   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:29:34,480-Speed 5458.73 samples/sec   Loss 2.4846   LearningRate 0.0163   Epoch: 15   Global Step: 161430   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 07:29:42,031-Speed 5425.00 samples/sec   Loss 2.4528   LearningRate 0.0163   Epoch: 15   Global Step: 161440   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 07:29:49,546-Speed 5451.07 samples/sec   Loss 2.4604   LearningRate 0.0163   Epoch: 15   Global Step: 161450   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 07:29:57,089-Speed 5431.29 samples/sec   Loss 2.4759   LearningRate 0.0163   Epoch: 15   Global Step: 161460   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 07:30:04,689-Speed 5389.85 samples/sec   Loss 2.4412   LearningRate 0.0163   Epoch: 15   Global Step: 161470   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 07:30:12,311-Speed 5375.17 samples/sec   Loss 2.4726   LearningRate 0.0163   Epoch: 15   Global Step: 161480   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 07:30:19,860-Speed 5425.89 samples/sec   Loss 2.4797   LearningRate 0.0163   Epoch: 15   Global Step: 161490   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 07:30:27,329-Speed 5484.72 samples/sec   Loss 2.4447   LearningRate 0.0163   Epoch: 15   Global Step: 161500   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 07:30:34,853-Speed 5445.05 samples/sec   Loss 2.4327   LearningRate 0.0163   Epoch: 15   Global Step: 161510   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 07:30:42,415-Speed 5417.15 samples/sec   Loss 2.4449   LearningRate 0.0163   Epoch: 15   Global Step: 161520   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-01-09 07:30:49,936-Speed 5446.65 samples/sec   Loss 2.4369   LearningRate 0.0162   Epoch: 15   Global Step: 161530   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 07:30:57,430-Speed 5466.72 samples/sec   Loss 2.4805   LearningRate 0.0162   Epoch: 15   Global Step: 161540   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:31:04,938-Speed 5456.07 samples/sec   Loss 2.4674   LearningRate 0.0162   Epoch: 15   Global Step: 161550   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:31:12,491-Speed 5424.03 samples/sec   Loss 2.4505   LearningRate 0.0162   Epoch: 15   Global Step: 161560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:31:20,052-Speed 5417.64 samples/sec   Loss 2.4550   LearningRate 0.0162   Epoch: 15   Global Step: 161570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:31:27,511-Speed 5491.97 samples/sec   Loss 2.4703   LearningRate 0.0162   Epoch: 15   Global Step: 161580   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 07:31:35,046-Speed 5436.85 samples/sec   Loss 2.4590   LearningRate 0.0162   Epoch: 15   Global Step: 161590   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 07:31:42,615-Speed 5412.38 samples/sec   Loss 2.4680   LearningRate 0.0162   Epoch: 15   Global Step: 161600   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 07:31:50,095-Speed 5476.62 samples/sec   Loss 2.4775   LearningRate 0.0162   Epoch: 15   Global Step: 161610   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 07:31:57,600-Speed 5458.15 samples/sec   Loss 2.4687   LearningRate 0.0162   Epoch: 15   Global Step: 161620   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 07:32:05,191-Speed 5397.07 samples/sec   Loss 2.4488   LearningRate 0.0162   Epoch: 15   Global Step: 161630   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 07:32:12,752-Speed 5418.07 samples/sec   Loss 2.4410   LearningRate 0.0162   Epoch: 15   Global Step: 161640   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 07:32:20,327-Speed 5407.31 samples/sec   Loss 2.4528   LearningRate 0.0162   Epoch: 15   Global Step: 161650   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 07:32:27,799-Speed 5482.97 samples/sec   Loss 2.4253   LearningRate 0.0162   Epoch: 15   Global Step: 161660   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 07:32:35,443-Speed 5358.75 samples/sec   Loss 2.4483   LearningRate 0.0161   Epoch: 15   Global Step: 161670   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 07:32:43,108-Speed 5344.52 samples/sec   Loss 2.4494   LearningRate 0.0161   Epoch: 15   Global Step: 161680   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:32:50,804-Speed 5323.36 samples/sec   Loss 2.4677   LearningRate 0.0161   Epoch: 15   Global Step: 161690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:32:58,320-Speed 5449.98 samples/sec   Loss 2.4431   LearningRate 0.0161   Epoch: 15   Global Step: 161700   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:33:05,815-Speed 5465.92 samples/sec   Loss 2.4559   LearningRate 0.0161   Epoch: 15   Global Step: 161710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:33:13,253-Speed 5507.41 samples/sec   Loss 2.4450   LearningRate 0.0161   Epoch: 15   Global Step: 161720   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:33:20,841-Speed 5398.37 samples/sec   Loss 2.4665   LearningRate 0.0161   Epoch: 15   Global Step: 161730   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:33:28,390-Speed 5426.88 samples/sec   Loss 2.4004   LearningRate 0.0161   Epoch: 15   Global Step: 161740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:33:35,987-Speed 5392.68 samples/sec   Loss 2.4143   LearningRate 0.0161   Epoch: 15   Global Step: 161750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:33:43,514-Speed 5442.74 samples/sec   Loss 2.4499   LearningRate 0.0161   Epoch: 15   Global Step: 161760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:33:51,013-Speed 5462.47 samples/sec   Loss 2.4439   LearningRate 0.0161   Epoch: 15   Global Step: 161770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:33:58,557-Speed 5430.28 samples/sec   Loss 2.4473   LearningRate 0.0161   Epoch: 15   Global Step: 161780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:34:06,137-Speed 5404.55 samples/sec   Loss 2.3968   LearningRate 0.0161   Epoch: 15   Global Step: 161790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:34:13,598-Speed 5490.95 samples/sec   Loss 2.4481   LearningRate 0.0161   Epoch: 15   Global Step: 161800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:34:21,091-Speed 5467.00 samples/sec   Loss 2.4414   LearningRate 0.0161   Epoch: 15   Global Step: 161810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:34:28,614-Speed 5445.83 samples/sec   Loss 2.4412   LearningRate 0.0160   Epoch: 15   Global Step: 161820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:34:36,124-Speed 5454.64 samples/sec   Loss 2.4356   LearningRate 0.0160   Epoch: 15   Global Step: 161830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:34:43,731-Speed 5385.35 samples/sec   Loss 2.4591   LearningRate 0.0160   Epoch: 15   Global Step: 161840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:34:51,249-Speed 5448.70 samples/sec   Loss 2.4281   LearningRate 0.0160   Epoch: 15   Global Step: 161850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:34:58,888-Speed 5362.54 samples/sec   Loss 2.4179   LearningRate 0.0160   Epoch: 15   Global Step: 161860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:35:06,575-Speed 5329.81 samples/sec   Loss 2.4465   LearningRate 0.0160   Epoch: 15   Global Step: 161870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:35:14,160-Speed 5400.55 samples/sec   Loss 2.4244   LearningRate 0.0160   Epoch: 15   Global Step: 161880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:35:21,726-Speed 5414.76 samples/sec   Loss 2.4059   LearningRate 0.0160   Epoch: 15   Global Step: 161890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:35:29,304-Speed 5405.58 samples/sec   Loss 2.4717   LearningRate 0.0160   Epoch: 15   Global Step: 161900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:35:36,850-Speed 5428.89 samples/sec   Loss 2.4208   LearningRate 0.0160   Epoch: 15   Global Step: 161910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:35:44,475-Speed 5372.31 samples/sec   Loss 2.4335   LearningRate 0.0160   Epoch: 15   Global Step: 161920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:35:52,007-Speed 5438.89 samples/sec   Loss 2.4470   LearningRate 0.0160   Epoch: 15   Global Step: 161930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:35:59,472-Speed 5487.84 samples/sec   Loss 2.4295   LearningRate 0.0160   Epoch: 15   Global Step: 161940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:36:06,992-Speed 5447.98 samples/sec   Loss 2.4362   LearningRate 0.0160   Epoch: 15   Global Step: 161950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:36:14,524-Speed 5438.75 samples/sec   Loss 2.4228   LearningRate 0.0159   Epoch: 15   Global Step: 161960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:36:22,102-Speed 5405.97 samples/sec   Loss 2.4450   LearningRate 0.0159   Epoch: 15   Global Step: 161970   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:36:29,724-Speed 5374.34 samples/sec   Loss 2.4210   LearningRate 0.0159   Epoch: 15   Global Step: 161980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:36:37,358-Speed 5366.70 samples/sec   Loss 2.4492   LearningRate 0.0159   Epoch: 15   Global Step: 161990   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:36:45,014-Speed 5350.45 samples/sec   Loss 2.3755   LearningRate 0.0159   Epoch: 15   Global Step: 162000   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:37:28,674-[lfw][162000]XNorm: 22.860283
Training: 2022-01-09 07:37:28,674-[lfw][162000]Accuracy-Flip: 0.99833+-0.00197
Training: 2022-01-09 07:37:28,675-[lfw][162000]Accuracy-Highest: 0.99833
Training: 2022-01-09 07:38:19,599-[cfp_fp][162000]XNorm: 21.701475
Training: 2022-01-09 07:38:19,600-[cfp_fp][162000]Accuracy-Flip: 0.99286+-0.00409
Training: 2022-01-09 07:38:19,601-[cfp_fp][162000]Accuracy-Highest: 0.99371
Training: 2022-01-09 07:39:03,439-[agedb_30][162000]XNorm: 23.361640
Training: 2022-01-09 07:39:03,440-[agedb_30][162000]Accuracy-Flip: 0.98333+-0.00695
Training: 2022-01-09 07:39:03,440-[agedb_30][162000]Accuracy-Highest: 0.98333
Training: 2022-01-09 07:39:11,145-Speed 280.30 samples/sec   Loss 2.4089   LearningRate 0.0159   Epoch: 15   Global Step: 162010   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:39:18,725-Speed 5404.74 samples/sec   Loss 2.4423   LearningRate 0.0159   Epoch: 15   Global Step: 162020   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:39:26,220-Speed 5465.44 samples/sec   Loss 2.4469   LearningRate 0.0159   Epoch: 15   Global Step: 162030   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:39:33,673-Speed 5496.32 samples/sec   Loss 2.4311   LearningRate 0.0159   Epoch: 15   Global Step: 162040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:39:41,194-Speed 5446.97 samples/sec   Loss 2.3981   LearningRate 0.0159   Epoch: 15   Global Step: 162050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:39:48,684-Speed 5469.08 samples/sec   Loss 2.4072   LearningRate 0.0159   Epoch: 15   Global Step: 162060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:39:56,241-Speed 5420.76 samples/sec   Loss 2.4085   LearningRate 0.0159   Epoch: 15   Global Step: 162070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:40:03,721-Speed 5477.00 samples/sec   Loss 2.4214   LearningRate 0.0159   Epoch: 15   Global Step: 162080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:40:11,217-Speed 5465.32 samples/sec   Loss 2.4149   LearningRate 0.0159   Epoch: 15   Global Step: 162090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:40:18,793-Speed 5407.05 samples/sec   Loss 2.4055   LearningRate 0.0158   Epoch: 15   Global Step: 162100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:40:26,312-Speed 5448.23 samples/sec   Loss 2.4283   LearningRate 0.0158   Epoch: 15   Global Step: 162110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:40:33,794-Speed 5475.31 samples/sec   Loss 2.4221   LearningRate 0.0158   Epoch: 15   Global Step: 162120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:40:41,267-Speed 5481.70 samples/sec   Loss 2.4253   LearningRate 0.0158   Epoch: 15   Global Step: 162130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:40:48,751-Speed 5474.06 samples/sec   Loss 2.4479   LearningRate 0.0158   Epoch: 15   Global Step: 162140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:40:56,248-Speed 5464.03 samples/sec   Loss 2.3833   LearningRate 0.0158   Epoch: 15   Global Step: 162150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:41:03,794-Speed 5428.98 samples/sec   Loss 2.3920   LearningRate 0.0158   Epoch: 15   Global Step: 162160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:41:11,486-Speed 5325.98 samples/sec   Loss 2.3871   LearningRate 0.0158   Epoch: 15   Global Step: 162170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:41:19,070-Speed 5401.11 samples/sec   Loss 2.4197   LearningRate 0.0158   Epoch: 15   Global Step: 162180   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 07:41:26,640-Speed 5411.63 samples/sec   Loss 2.4068   LearningRate 0.0158   Epoch: 15   Global Step: 162190   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 07:41:34,203-Speed 5416.80 samples/sec   Loss 2.4266   LearningRate 0.0158   Epoch: 15   Global Step: 162200   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 07:41:41,915-Speed 5312.21 samples/sec   Loss 2.4063   LearningRate 0.0158   Epoch: 15   Global Step: 162210   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 07:41:49,559-Speed 5359.07 samples/sec   Loss 2.4206   LearningRate 0.0158   Epoch: 15   Global Step: 162220   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 07:41:57,169-Speed 5382.67 samples/sec   Loss 2.4484   LearningRate 0.0158   Epoch: 15   Global Step: 162230   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 07:42:04,693-Speed 5445.18 samples/sec   Loss 2.4205   LearningRate 0.0157   Epoch: 15   Global Step: 162240   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 07:42:12,193-Speed 5461.94 samples/sec   Loss 2.3922   LearningRate 0.0157   Epoch: 15   Global Step: 162250   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 07:42:19,759-Speed 5414.31 samples/sec   Loss 2.4415   LearningRate 0.0157   Epoch: 15   Global Step: 162260   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 07:42:27,251-Speed 5467.75 samples/sec   Loss 2.3855   LearningRate 0.0157   Epoch: 15   Global Step: 162270   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 07:42:34,763-Speed 5453.88 samples/sec   Loss 2.4254   LearningRate 0.0157   Epoch: 15   Global Step: 162280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:42:42,286-Speed 5445.70 samples/sec   Loss 2.3889   LearningRate 0.0157   Epoch: 15   Global Step: 162290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:42:49,861-Speed 5407.19 samples/sec   Loss 2.4149   LearningRate 0.0157   Epoch: 15   Global Step: 162300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:42:57,326-Speed 5487.85 samples/sec   Loss 2.4434   LearningRate 0.0157   Epoch: 15   Global Step: 162310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:43:04,865-Speed 5433.91 samples/sec   Loss 2.3915   LearningRate 0.0157   Epoch: 15   Global Step: 162320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:43:12,364-Speed 5463.26 samples/sec   Loss 2.4476   LearningRate 0.0157   Epoch: 15   Global Step: 162330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:43:19,826-Speed 5489.62 samples/sec   Loss 2.4205   LearningRate 0.0157   Epoch: 15   Global Step: 162340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:43:27,268-Speed 5504.58 samples/sec   Loss 2.4334   LearningRate 0.0157   Epoch: 15   Global Step: 162350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:43:34,791-Speed 5446.09 samples/sec   Loss 2.4230   LearningRate 0.0157   Epoch: 15   Global Step: 162360   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:43:42,376-Speed 5400.77 samples/sec   Loss 2.4225   LearningRate 0.0157   Epoch: 15   Global Step: 162370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:43:49,885-Speed 5455.73 samples/sec   Loss 2.3964   LearningRate 0.0157   Epoch: 15   Global Step: 162380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:43:57,316-Speed 5512.31 samples/sec   Loss 2.3939   LearningRate 0.0156   Epoch: 15   Global Step: 162390   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:44:04,766-Speed 5498.85 samples/sec   Loss 2.3978   LearningRate 0.0156   Epoch: 15   Global Step: 162400   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:44:12,361-Speed 5393.59 samples/sec   Loss 2.4029   LearningRate 0.0156   Epoch: 15   Global Step: 162410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:44:19,938-Speed 5406.65 samples/sec   Loss 2.3987   LearningRate 0.0156   Epoch: 15   Global Step: 162420   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:44:27,619-Speed 5333.18 samples/sec   Loss 2.4412   LearningRate 0.0156   Epoch: 15   Global Step: 162430   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:44:35,078-Speed 5492.26 samples/sec   Loss 2.3898   LearningRate 0.0156   Epoch: 15   Global Step: 162440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:44:42,489-Speed 5528.00 samples/sec   Loss 2.4135   LearningRate 0.0156   Epoch: 15   Global Step: 162450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:44:50,055-Speed 5414.74 samples/sec   Loss 2.4024   LearningRate 0.0156   Epoch: 15   Global Step: 162460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:44:57,607-Speed 5424.25 samples/sec   Loss 2.3969   LearningRate 0.0156   Epoch: 15   Global Step: 162470   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:45:05,201-Speed 5394.29 samples/sec   Loss 2.4020   LearningRate 0.0156   Epoch: 15   Global Step: 162480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:45:12,678-Speed 5478.76 samples/sec   Loss 2.3910   LearningRate 0.0156   Epoch: 15   Global Step: 162490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:45:20,175-Speed 5464.61 samples/sec   Loss 2.4142   LearningRate 0.0156   Epoch: 15   Global Step: 162500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:45:27,721-Speed 5428.45 samples/sec   Loss 2.4122   LearningRate 0.0156   Epoch: 15   Global Step: 162510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:45:35,265-Speed 5429.77 samples/sec   Loss 2.3944   LearningRate 0.0156   Epoch: 15   Global Step: 162520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:45:42,868-Speed 5388.68 samples/sec   Loss 2.4050   LearningRate 0.0155   Epoch: 15   Global Step: 162530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:45:50,394-Speed 5443.77 samples/sec   Loss 2.4092   LearningRate 0.0155   Epoch: 15   Global Step: 162540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:45:57,837-Speed 5503.46 samples/sec   Loss 2.4073   LearningRate 0.0155   Epoch: 15   Global Step: 162550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:46:05,289-Speed 5496.83 samples/sec   Loss 2.4266   LearningRate 0.0155   Epoch: 15   Global Step: 162560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:46:12,933-Speed 5359.38 samples/sec   Loss 2.3968   LearningRate 0.0155   Epoch: 15   Global Step: 162570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:46:20,424-Speed 5468.88 samples/sec   Loss 2.3781   LearningRate 0.0155   Epoch: 15   Global Step: 162580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:46:27,984-Speed 5418.04 samples/sec   Loss 2.3754   LearningRate 0.0155   Epoch: 15   Global Step: 162590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:46:35,462-Speed 5477.96 samples/sec   Loss 2.4121   LearningRate 0.0155   Epoch: 15   Global Step: 162600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:46:42,943-Speed 5475.89 samples/sec   Loss 2.3932   LearningRate 0.0155   Epoch: 15   Global Step: 162610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:46:50,421-Speed 5478.45 samples/sec   Loss 2.4383   LearningRate 0.0155   Epoch: 15   Global Step: 162620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:46:57,915-Speed 5466.66 samples/sec   Loss 2.4526   LearningRate 0.0155   Epoch: 15   Global Step: 162630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:47:05,504-Speed 5397.71 samples/sec   Loss 2.4040   LearningRate 0.0155   Epoch: 15   Global Step: 162640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:47:12,998-Speed 5466.91 samples/sec   Loss 2.4216   LearningRate 0.0155   Epoch: 15   Global Step: 162650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:47:20,444-Speed 5501.48 samples/sec   Loss 2.3758   LearningRate 0.0155   Epoch: 15   Global Step: 162660   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:47:27,903-Speed 5492.09 samples/sec   Loss 2.3581   LearningRate 0.0155   Epoch: 15   Global Step: 162670   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:47:35,391-Speed 5470.93 samples/sec   Loss 2.3715   LearningRate 0.0154   Epoch: 15   Global Step: 162680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:47:42,979-Speed 5398.79 samples/sec   Loss 2.4405   LearningRate 0.0154   Epoch: 15   Global Step: 162690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:47:50,488-Speed 5455.02 samples/sec   Loss 2.3958   LearningRate 0.0154   Epoch: 15   Global Step: 162700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:47:58,094-Speed 5386.25 samples/sec   Loss 2.4170   LearningRate 0.0154   Epoch: 15   Global Step: 162710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:48:05,717-Speed 5374.28 samples/sec   Loss 2.3892   LearningRate 0.0154   Epoch: 15   Global Step: 162720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:48:13,305-Speed 5398.65 samples/sec   Loss 2.4140   LearningRate 0.0154   Epoch: 15   Global Step: 162730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:48:20,818-Speed 5452.16 samples/sec   Loss 2.3638   LearningRate 0.0154   Epoch: 15   Global Step: 162740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:48:28,367-Speed 5426.50 samples/sec   Loss 2.3830   LearningRate 0.0154   Epoch: 15   Global Step: 162750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:48:35,835-Speed 5485.95 samples/sec   Loss 2.4039   LearningRate 0.0154   Epoch: 15   Global Step: 162760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:48:43,356-Speed 5446.92 samples/sec   Loss 2.3871   LearningRate 0.0154   Epoch: 15   Global Step: 162770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:48:50,831-Speed 5479.54 samples/sec   Loss 2.3947   LearningRate 0.0154   Epoch: 15   Global Step: 162780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:48:58,329-Speed 5463.62 samples/sec   Loss 2.3612   LearningRate 0.0154   Epoch: 15   Global Step: 162790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:49:05,796-Speed 5486.08 samples/sec   Loss 2.4042   LearningRate 0.0154   Epoch: 15   Global Step: 162800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:49:13,448-Speed 5354.21 samples/sec   Loss 2.3906   LearningRate 0.0154   Epoch: 15   Global Step: 162810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:49:21,087-Speed 5362.39 samples/sec   Loss 2.3910   LearningRate 0.0153   Epoch: 15   Global Step: 162820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:49:28,644-Speed 5421.54 samples/sec   Loss 2.3674   LearningRate 0.0153   Epoch: 15   Global Step: 162830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:49:36,335-Speed 5326.14 samples/sec   Loss 2.3930   LearningRate 0.0153   Epoch: 15   Global Step: 162840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:49:43,837-Speed 5460.79 samples/sec   Loss 2.3973   LearningRate 0.0153   Epoch: 15   Global Step: 162850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:49:51,417-Speed 5404.33 samples/sec   Loss 2.3860   LearningRate 0.0153   Epoch: 15   Global Step: 162860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:49:59,170-Speed 5283.85 samples/sec   Loss 2.4045   LearningRate 0.0153   Epoch: 15   Global Step: 162870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:50:06,804-Speed 5366.13 samples/sec   Loss 2.3957   LearningRate 0.0153   Epoch: 15   Global Step: 162880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:50:14,290-Speed 5471.99 samples/sec   Loss 2.3823   LearningRate 0.0153   Epoch: 15   Global Step: 162890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:50:21,728-Speed 5507.57 samples/sec   Loss 2.3701   LearningRate 0.0153   Epoch: 15   Global Step: 162900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:50:29,209-Speed 5476.31 samples/sec   Loss 2.3761   LearningRate 0.0153   Epoch: 15   Global Step: 162910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:50:36,765-Speed 5421.45 samples/sec   Loss 2.3953   LearningRate 0.0153   Epoch: 15   Global Step: 162920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:50:44,277-Speed 5453.08 samples/sec   Loss 2.3774   LearningRate 0.0153   Epoch: 15   Global Step: 162930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:50:51,789-Speed 5453.33 samples/sec   Loss 2.3723   LearningRate 0.0153   Epoch: 15   Global Step: 162940   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:50:59,327-Speed 5434.79 samples/sec   Loss 2.3916   LearningRate 0.0153   Epoch: 15   Global Step: 162950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:51:06,915-Speed 5398.44 samples/sec   Loss 2.3463   LearningRate 0.0153   Epoch: 15   Global Step: 162960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:51:14,429-Speed 5451.81 samples/sec   Loss 2.3706   LearningRate 0.0152   Epoch: 15   Global Step: 162970   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:51:22,041-Speed 5382.00 samples/sec   Loss 2.3899   LearningRate 0.0152   Epoch: 15   Global Step: 162980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:51:29,506-Speed 5487.19 samples/sec   Loss 2.3624   LearningRate 0.0152   Epoch: 15   Global Step: 162990   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:51:37,044-Speed 5434.73 samples/sec   Loss 2.3135   LearningRate 0.0152   Epoch: 15   Global Step: 163000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:51:44,522-Speed 5478.49 samples/sec   Loss 2.3674   LearningRate 0.0152   Epoch: 15   Global Step: 163010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:51:52,010-Speed 5470.08 samples/sec   Loss 2.3620   LearningRate 0.0152   Epoch: 15   Global Step: 163020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:51:59,566-Speed 5422.13 samples/sec   Loss 2.3911   LearningRate 0.0152   Epoch: 15   Global Step: 163030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:52:07,030-Speed 5487.98 samples/sec   Loss 2.3611   LearningRate 0.0152   Epoch: 15   Global Step: 163040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:52:14,680-Speed 5354.93 samples/sec   Loss 2.3467   LearningRate 0.0152   Epoch: 15   Global Step: 163050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:52:22,252-Speed 5410.00 samples/sec   Loss 2.4262   LearningRate 0.0152   Epoch: 15   Global Step: 163060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:52:29,766-Speed 5452.28 samples/sec   Loss 2.3608   LearningRate 0.0152   Epoch: 15   Global Step: 163070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:52:37,366-Speed 5390.17 samples/sec   Loss 2.3484   LearningRate 0.0152   Epoch: 15   Global Step: 163080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:52:44,872-Speed 5457.93 samples/sec   Loss 2.3593   LearningRate 0.0152   Epoch: 15   Global Step: 163090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:52:52,594-Speed 5305.00 samples/sec   Loss 2.3327   LearningRate 0.0152   Epoch: 15   Global Step: 163100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:53:00,258-Speed 5344.87 samples/sec   Loss 2.3665   LearningRate 0.0151   Epoch: 15   Global Step: 163110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:53:07,876-Speed 5377.52 samples/sec   Loss 2.3539   LearningRate 0.0151   Epoch: 15   Global Step: 163120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:53:15,480-Speed 5388.01 samples/sec   Loss 2.3879   LearningRate 0.0151   Epoch: 15   Global Step: 163130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:53:23,166-Speed 5330.10 samples/sec   Loss 2.3470   LearningRate 0.0151   Epoch: 15   Global Step: 163140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:53:30,729-Speed 5416.40 samples/sec   Loss 2.3628   LearningRate 0.0151   Epoch: 15   Global Step: 163150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:53:38,257-Speed 5441.96 samples/sec   Loss 2.3838   LearningRate 0.0151   Epoch: 15   Global Step: 163160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:53:45,816-Speed 5419.63 samples/sec   Loss 2.3760   LearningRate 0.0151   Epoch: 15   Global Step: 163170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:53:53,378-Speed 5416.96 samples/sec   Loss 2.3960   LearningRate 0.0151   Epoch: 15   Global Step: 163180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:54:01,027-Speed 5355.71 samples/sec   Loss 2.3905   LearningRate 0.0151   Epoch: 15   Global Step: 163190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:54:08,601-Speed 5409.25 samples/sec   Loss 2.3587   LearningRate 0.0151   Epoch: 15   Global Step: 163200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:54:16,255-Speed 5352.02 samples/sec   Loss 2.3985   LearningRate 0.0151   Epoch: 15   Global Step: 163210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:54:23,817-Speed 5417.07 samples/sec   Loss 2.3527   LearningRate 0.0151   Epoch: 15   Global Step: 163220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:54:31,276-Speed 5491.74 samples/sec   Loss 2.3498   LearningRate 0.0151   Epoch: 15   Global Step: 163230   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:54:38,782-Speed 5457.84 samples/sec   Loss 2.3570   LearningRate 0.0151   Epoch: 15   Global Step: 163240   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:54:46,268-Speed 5472.62 samples/sec   Loss 2.3485   LearningRate 0.0151   Epoch: 15   Global Step: 163250   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:54:53,793-Speed 5443.49 samples/sec   Loss 2.3677   LearningRate 0.0150   Epoch: 15   Global Step: 163260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:55:01,375-Speed 5403.01 samples/sec   Loss 2.3802   LearningRate 0.0150   Epoch: 15   Global Step: 163270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:55:08,896-Speed 5446.59 samples/sec   Loss 2.3202   LearningRate 0.0150   Epoch: 15   Global Step: 163280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:55:16,434-Speed 5435.20 samples/sec   Loss 2.3533   LearningRate 0.0150   Epoch: 15   Global Step: 163290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:55:23,962-Speed 5441.19 samples/sec   Loss 2.3775   LearningRate 0.0150   Epoch: 15   Global Step: 163300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:55:31,478-Speed 5450.46 samples/sec   Loss 2.3203   LearningRate 0.0150   Epoch: 15   Global Step: 163310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:55:38,967-Speed 5469.74 samples/sec   Loss 2.3474   LearningRate 0.0150   Epoch: 15   Global Step: 163320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:55:46,473-Speed 5458.30 samples/sec   Loss 2.3576   LearningRate 0.0150   Epoch: 15   Global Step: 163330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:55:53,902-Speed 5513.57 samples/sec   Loss 2.3698   LearningRate 0.0150   Epoch: 15   Global Step: 163340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:56:01,390-Speed 5471.10 samples/sec   Loss 2.3188   LearningRate 0.0150   Epoch: 15   Global Step: 163350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:56:08,889-Speed 5462.69 samples/sec   Loss 2.3276   LearningRate 0.0150   Epoch: 15   Global Step: 163360   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:56:16,396-Speed 5457.24 samples/sec   Loss 2.3583   LearningRate 0.0150   Epoch: 15   Global Step: 163370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:56:23,944-Speed 5427.25 samples/sec   Loss 2.2903   LearningRate 0.0150   Epoch: 15   Global Step: 163380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:56:31,522-Speed 5405.71 samples/sec   Loss 2.3518   LearningRate 0.0150   Epoch: 15   Global Step: 163390   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:56:39,059-Speed 5434.66 samples/sec   Loss 2.3628   LearningRate 0.0150   Epoch: 15   Global Step: 163400   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:56:46,574-Speed 5451.60 samples/sec   Loss 2.3608   LearningRate 0.0149   Epoch: 15   Global Step: 163410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:56:54,047-Speed 5481.34 samples/sec   Loss 2.3604   LearningRate 0.0149   Epoch: 15   Global Step: 163420   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:57:01,566-Speed 5448.04 samples/sec   Loss 2.3728   LearningRate 0.0149   Epoch: 15   Global Step: 163430   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:57:09,067-Speed 5461.89 samples/sec   Loss 2.3706   LearningRate 0.0149   Epoch: 15   Global Step: 163440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:57:16,545-Speed 5477.83 samples/sec   Loss 2.3397   LearningRate 0.0149   Epoch: 15   Global Step: 163450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:57:24,028-Speed 5474.78 samples/sec   Loss 2.3610   LearningRate 0.0149   Epoch: 15   Global Step: 163460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:57:31,455-Speed 5515.27 samples/sec   Loss 2.3479   LearningRate 0.0149   Epoch: 15   Global Step: 163470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:57:39,007-Speed 5424.66 samples/sec   Loss 2.3380   LearningRate 0.0149   Epoch: 15   Global Step: 163480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:57:46,556-Speed 5426.23 samples/sec   Loss 2.3367   LearningRate 0.0149   Epoch: 15   Global Step: 163490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:57:54,019-Speed 5489.26 samples/sec   Loss 2.3597   LearningRate 0.0149   Epoch: 15   Global Step: 163500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:58:01,583-Speed 5415.85 samples/sec   Loss 2.3567   LearningRate 0.0149   Epoch: 15   Global Step: 163510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:58:09,060-Speed 5478.68 samples/sec   Loss 2.3387   LearningRate 0.0149   Epoch: 15   Global Step: 163520   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:58:16,666-Speed 5386.13 samples/sec   Loss 2.3412   LearningRate 0.0149   Epoch: 15   Global Step: 163530   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:58:24,170-Speed 5459.58 samples/sec   Loss 2.3473   LearningRate 0.0149   Epoch: 15   Global Step: 163540   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:58:31,738-Speed 5412.71 samples/sec   Loss 2.3214   LearningRate 0.0148   Epoch: 15   Global Step: 163550   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:58:39,270-Speed 5438.18 samples/sec   Loss 2.3682   LearningRate 0.0148   Epoch: 15   Global Step: 163560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:58:46,759-Speed 5470.01 samples/sec   Loss 2.3233   LearningRate 0.0148   Epoch: 15   Global Step: 163570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:58:54,234-Speed 5481.30 samples/sec   Loss 2.3385   LearningRate 0.0148   Epoch: 15   Global Step: 163580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:59:01,732-Speed 5462.96 samples/sec   Loss 2.3372   LearningRate 0.0148   Epoch: 15   Global Step: 163590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:59:09,302-Speed 5411.05 samples/sec   Loss 2.3526   LearningRate 0.0148   Epoch: 15   Global Step: 163600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:59:16,851-Speed 5426.79 samples/sec   Loss 2.3310   LearningRate 0.0148   Epoch: 15   Global Step: 163610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 07:59:24,382-Speed 5439.72 samples/sec   Loss 2.3647   LearningRate 0.0148   Epoch: 15   Global Step: 163620   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:59:31,841-Speed 5492.14 samples/sec   Loss 2.3493   LearningRate 0.0148   Epoch: 15   Global Step: 163630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:59:39,303-Speed 5489.46 samples/sec   Loss 2.3673   LearningRate 0.0148   Epoch: 15   Global Step: 163640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:59:46,771-Speed 5486.02 samples/sec   Loss 2.3880   LearningRate 0.0148   Epoch: 15   Global Step: 163650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 07:59:54,240-Speed 5484.74 samples/sec   Loss 2.3629   LearningRate 0.0148   Epoch: 15   Global Step: 163660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:00:01,696-Speed 5494.20 samples/sec   Loss 2.3572   LearningRate 0.0148   Epoch: 15   Global Step: 163670   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:00:09,189-Speed 5467.08 samples/sec   Loss 2.3581   LearningRate 0.0148   Epoch: 15   Global Step: 163680   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:00:16,724-Speed 5436.65 samples/sec   Loss 2.3915   LearningRate 0.0148   Epoch: 15   Global Step: 163690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:00:24,218-Speed 5466.40 samples/sec   Loss 2.3027   LearningRate 0.0147   Epoch: 15   Global Step: 163700   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:00:31,707-Speed 5470.08 samples/sec   Loss 2.3387   LearningRate 0.0147   Epoch: 15   Global Step: 163710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:00:39,280-Speed 5409.25 samples/sec   Loss 2.3212   LearningRate 0.0147   Epoch: 15   Global Step: 163720   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:00:46,777-Speed 5463.84 samples/sec   Loss 2.3333   LearningRate 0.0147   Epoch: 15   Global Step: 163730   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:00:54,312-Speed 5437.30 samples/sec   Loss 2.3417   LearningRate 0.0147   Epoch: 15   Global Step: 163740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:01:01,885-Speed 5408.95 samples/sec   Loss 2.3181   LearningRate 0.0147   Epoch: 15   Global Step: 163750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:01:09,412-Speed 5442.67 samples/sec   Loss 2.3527   LearningRate 0.0147   Epoch: 15   Global Step: 163760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:01:16,994-Speed 5403.18 samples/sec   Loss 2.3167   LearningRate 0.0147   Epoch: 15   Global Step: 163770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:01:24,470-Speed 5479.54 samples/sec   Loss 2.3334   LearningRate 0.0147   Epoch: 15   Global Step: 163780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:01:31,969-Speed 5462.53 samples/sec   Loss 2.3696   LearningRate 0.0147   Epoch: 15   Global Step: 163790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:01:39,498-Speed 5440.66 samples/sec   Loss 2.3006   LearningRate 0.0147   Epoch: 15   Global Step: 163800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:01:47,151-Speed 5353.16 samples/sec   Loss 2.3355   LearningRate 0.0147   Epoch: 15   Global Step: 163810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:01:54,700-Speed 5427.13 samples/sec   Loss 2.3424   LearningRate 0.0147   Epoch: 15   Global Step: 163820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:02:02,178-Speed 5478.17 samples/sec   Loss 2.3342   LearningRate 0.0147   Epoch: 15   Global Step: 163830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:02:09,649-Speed 5483.03 samples/sec   Loss 2.3533   LearningRate 0.0147   Epoch: 15   Global Step: 163840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:02:17,180-Speed 5439.38 samples/sec   Loss 2.3416   LearningRate 0.0146   Epoch: 15   Global Step: 163850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:02:24,675-Speed 5465.90 samples/sec   Loss 2.3389   LearningRate 0.0146   Epoch: 15   Global Step: 163860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:02:32,183-Speed 5456.53 samples/sec   Loss 2.3035   LearningRate 0.0146   Epoch: 15   Global Step: 163870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:02:39,661-Speed 5477.71 samples/sec   Loss 2.3197   LearningRate 0.0146   Epoch: 15   Global Step: 163880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:02:47,356-Speed 5323.82 samples/sec   Loss 2.2983   LearningRate 0.0146   Epoch: 15   Global Step: 163890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:02:54,855-Speed 5463.16 samples/sec   Loss 2.3598   LearningRate 0.0146   Epoch: 15   Global Step: 163900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:03:02,379-Speed 5444.86 samples/sec   Loss 2.3486   LearningRate 0.0146   Epoch: 15   Global Step: 163910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:03:09,843-Speed 5488.02 samples/sec   Loss 2.3322   LearningRate 0.0146   Epoch: 15   Global Step: 163920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:03:17,269-Speed 5516.49 samples/sec   Loss 2.3096   LearningRate 0.0146   Epoch: 15   Global Step: 163930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:03:24,766-Speed 5464.66 samples/sec   Loss 2.3229   LearningRate 0.0146   Epoch: 15   Global Step: 163940   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:03:32,171-Speed 5532.01 samples/sec   Loss 2.3159   LearningRate 0.0146   Epoch: 15   Global Step: 163950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:03:39,645-Speed 5480.50 samples/sec   Loss 2.3003   LearningRate 0.0146   Epoch: 15   Global Step: 163960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:03:47,112-Speed 5486.98 samples/sec   Loss 2.3121   LearningRate 0.0146   Epoch: 15   Global Step: 163970   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:03:54,681-Speed 5412.01 samples/sec   Loss 2.3092   LearningRate 0.0146   Epoch: 15   Global Step: 163980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:04:02,182-Speed 5461.83 samples/sec   Loss 2.3304   LearningRate 0.0146   Epoch: 15   Global Step: 163990   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:04:09,776-Speed 5394.34 samples/sec   Loss 2.3588   LearningRate 0.0145   Epoch: 15   Global Step: 164000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:04:54,381-[lfw][164000]XNorm: 22.614009
Training: 2022-01-09 08:04:54,382-[lfw][164000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 08:04:54,382-[lfw][164000]Accuracy-Highest: 0.99833
Training: 2022-01-09 08:05:46,446-[cfp_fp][164000]XNorm: 21.485216
Training: 2022-01-09 08:05:46,447-[cfp_fp][164000]Accuracy-Flip: 0.99214+-0.00430
Training: 2022-01-09 08:05:46,447-[cfp_fp][164000]Accuracy-Highest: 0.99371
Training: 2022-01-09 08:06:30,871-[agedb_30][164000]XNorm: 22.797398
Training: 2022-01-09 08:06:30,872-[agedb_30][164000]Accuracy-Flip: 0.98183+-0.00751
Training: 2022-01-09 08:06:30,872-[agedb_30][164000]Accuracy-Highest: 0.98333
Training: 2022-01-09 08:06:38,523-Speed 275.37 samples/sec   Loss 2.3075   LearningRate 0.0145   Epoch: 15   Global Step: 164010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:06:46,030-Speed 5456.70 samples/sec   Loss 2.3199   LearningRate 0.0145   Epoch: 15   Global Step: 164020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:06:53,572-Speed 5431.89 samples/sec   Loss 2.3144   LearningRate 0.0145   Epoch: 15   Global Step: 164030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:07:01,113-Speed 5432.27 samples/sec   Loss 2.3459   LearningRate 0.0145   Epoch: 15   Global Step: 164040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:07:08,672-Speed 5419.05 samples/sec   Loss 2.3401   LearningRate 0.0145   Epoch: 15   Global Step: 164050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:07:16,155-Speed 5474.50 samples/sec   Loss 2.3077   LearningRate 0.0145   Epoch: 15   Global Step: 164060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:07:23,607-Speed 5497.39 samples/sec   Loss 2.3058   LearningRate 0.0145   Epoch: 15   Global Step: 164070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:07:31,211-Speed 5387.55 samples/sec   Loss 2.3156   LearningRate 0.0145   Epoch: 15   Global Step: 164080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:07:38,690-Speed 5476.93 samples/sec   Loss 2.3164   LearningRate 0.0145   Epoch: 15   Global Step: 164090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:07:46,201-Speed 5454.30 samples/sec   Loss 2.3094   LearningRate 0.0145   Epoch: 15   Global Step: 164100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:07:53,703-Speed 5460.21 samples/sec   Loss 2.3398   LearningRate 0.0145   Epoch: 15   Global Step: 164110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:08:01,319-Speed 5379.43 samples/sec   Loss 2.3160   LearningRate 0.0145   Epoch: 15   Global Step: 164120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:08:08,811-Speed 5467.84 samples/sec   Loss 2.3292   LearningRate 0.0145   Epoch: 15   Global Step: 164130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:08:16,364-Speed 5423.33 samples/sec   Loss 2.3070   LearningRate 0.0145   Epoch: 15   Global Step: 164140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:08:23,950-Speed 5400.03 samples/sec   Loss 2.3193   LearningRate 0.0144   Epoch: 15   Global Step: 164150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:08:31,582-Speed 5367.43 samples/sec   Loss 2.3433   LearningRate 0.0144   Epoch: 15   Global Step: 164160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:08:39,075-Speed 5466.96 samples/sec   Loss 2.2948   LearningRate 0.0144   Epoch: 15   Global Step: 164170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:08:46,525-Speed 5498.80 samples/sec   Loss 2.3279   LearningRate 0.0144   Epoch: 15   Global Step: 164180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:08:54,021-Speed 5465.09 samples/sec   Loss 2.3414   LearningRate 0.0144   Epoch: 15   Global Step: 164190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:09:01,592-Speed 5411.34 samples/sec   Loss 2.3320   LearningRate 0.0144   Epoch: 15   Global Step: 164200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:09:09,200-Speed 5384.40 samples/sec   Loss 2.3177   LearningRate 0.0144   Epoch: 15   Global Step: 164210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:09:16,823-Speed 5373.44 samples/sec   Loss 2.3048   LearningRate 0.0144   Epoch: 15   Global Step: 164220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:09:24,481-Speed 5349.74 samples/sec   Loss 2.3072   LearningRate 0.0144   Epoch: 15   Global Step: 164230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:09:32,094-Speed 5381.29 samples/sec   Loss 2.3149   LearningRate 0.0144   Epoch: 15   Global Step: 164240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:09:39,722-Speed 5369.90 samples/sec   Loss 2.3116   LearningRate 0.0144   Epoch: 15   Global Step: 164250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:09:47,231-Speed 5455.29 samples/sec   Loss 2.2914   LearningRate 0.0144   Epoch: 15   Global Step: 164260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:09:54,801-Speed 5412.15 samples/sec   Loss 2.2647   LearningRate 0.0144   Epoch: 15   Global Step: 164270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:10:02,424-Speed 5373.44 samples/sec   Loss 2.2985   LearningRate 0.0144   Epoch: 15   Global Step: 164280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:10:09,951-Speed 5442.48 samples/sec   Loss 2.3100   LearningRate 0.0144   Epoch: 15   Global Step: 164290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:10:17,483-Speed 5439.18 samples/sec   Loss 2.3140   LearningRate 0.0143   Epoch: 15   Global Step: 164300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:10:24,977-Speed 5466.57 samples/sec   Loss 2.2965   LearningRate 0.0143   Epoch: 15   Global Step: 164310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:10:32,538-Speed 5417.80 samples/sec   Loss 2.2793   LearningRate 0.0143   Epoch: 15   Global Step: 164320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:10:40,103-Speed 5414.98 samples/sec   Loss 2.3119   LearningRate 0.0143   Epoch: 15   Global Step: 164330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:10:47,596-Speed 5467.39 samples/sec   Loss 2.2913   LearningRate 0.0143   Epoch: 15   Global Step: 164340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:10:55,078-Speed 5474.85 samples/sec   Loss 2.3134   LearningRate 0.0143   Epoch: 15   Global Step: 164350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:11:02,614-Speed 5436.30 samples/sec   Loss 2.2949   LearningRate 0.0143   Epoch: 15   Global Step: 164360   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 08:11:13,478-Speed 3770.33 samples/sec   Loss 2.2940   LearningRate 0.0143   Epoch: 15   Global Step: 164370   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 08:11:21,028-Speed 5426.00 samples/sec   Loss 2.3177   LearningRate 0.0143   Epoch: 15   Global Step: 164380   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 08:11:28,526-Speed 5463.77 samples/sec   Loss 2.3167   LearningRate 0.0143   Epoch: 15   Global Step: 164390   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 08:11:36,049-Speed 5444.89 samples/sec   Loss 2.3438   LearningRate 0.0143   Epoch: 15   Global Step: 164400   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 08:11:43,576-Speed 5442.57 samples/sec   Loss 2.3423   LearningRate 0.0143   Epoch: 15   Global Step: 164410   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 08:11:51,126-Speed 5426.44 samples/sec   Loss 2.3673   LearningRate 0.0143   Epoch: 15   Global Step: 164420   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 08:11:58,651-Speed 5443.63 samples/sec   Loss 2.3258   LearningRate 0.0143   Epoch: 15   Global Step: 164430   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 08:12:06,176-Speed 5443.68 samples/sec   Loss 2.3085   LearningRate 0.0143   Epoch: 15   Global Step: 164440   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 08:12:13,670-Speed 5466.57 samples/sec   Loss 2.2811   LearningRate 0.0142   Epoch: 15   Global Step: 164450   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-01-09 08:12:21,171-Speed 5461.34 samples/sec   Loss 2.3177   LearningRate 0.0142   Epoch: 15   Global Step: 164460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:12:28,667-Speed 5465.28 samples/sec   Loss 2.2982   LearningRate 0.0142   Epoch: 15   Global Step: 164470   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:12:36,129-Speed 5489.93 samples/sec   Loss 2.3074   LearningRate 0.0142   Epoch: 15   Global Step: 164480   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:12:43,613-Speed 5473.55 samples/sec   Loss 2.3122   LearningRate 0.0142   Epoch: 15   Global Step: 164490   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:12:51,184-Speed 5410.87 samples/sec   Loss 2.3064   LearningRate 0.0142   Epoch: 15   Global Step: 164500   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:12:58,724-Speed 5433.76 samples/sec   Loss 2.2706   LearningRate 0.0142   Epoch: 15   Global Step: 164510   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:13:06,254-Speed 5440.04 samples/sec   Loss 2.2742   LearningRate 0.0142   Epoch: 15   Global Step: 164520   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:13:13,958-Speed 5317.63 samples/sec   Loss 2.2922   LearningRate 0.0142   Epoch: 15   Global Step: 164530   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:13:21,453-Speed 5465.26 samples/sec   Loss 2.2795   LearningRate 0.0142   Epoch: 15   Global Step: 164540   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:13:28,959-Speed 5458.59 samples/sec   Loss 2.2812   LearningRate 0.0142   Epoch: 15   Global Step: 164550   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:13:36,497-Speed 5433.77 samples/sec   Loss 2.2658   LearningRate 0.0142   Epoch: 15   Global Step: 164560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:13:43,924-Speed 5516.03 samples/sec   Loss 2.3081   LearningRate 0.0142   Epoch: 15   Global Step: 164570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:13:51,534-Speed 5382.85 samples/sec   Loss 2.3252   LearningRate 0.0142   Epoch: 15   Global Step: 164580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:13:59,056-Speed 5446.96 samples/sec   Loss 2.3067   LearningRate 0.0142   Epoch: 15   Global Step: 164590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:14:06,564-Speed 5455.66 samples/sec   Loss 2.3034   LearningRate 0.0141   Epoch: 15   Global Step: 164600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:14:14,116-Speed 5424.18 samples/sec   Loss 2.2610   LearningRate 0.0141   Epoch: 15   Global Step: 164610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:14:21,644-Speed 5442.29 samples/sec   Loss 2.2952   LearningRate 0.0141   Epoch: 15   Global Step: 164620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:14:29,231-Speed 5399.21 samples/sec   Loss 2.2842   LearningRate 0.0141   Epoch: 15   Global Step: 164630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:14:36,748-Speed 5449.85 samples/sec   Loss 2.2588   LearningRate 0.0141   Epoch: 15   Global Step: 164640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:14:44,315-Speed 5413.09 samples/sec   Loss 2.2833   LearningRate 0.0141   Epoch: 15   Global Step: 164650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:14:51,836-Speed 5447.13 samples/sec   Loss 2.2755   LearningRate 0.0141   Epoch: 15   Global Step: 164660   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:14:59,531-Speed 5323.66 samples/sec   Loss 2.2916   LearningRate 0.0141   Epoch: 15   Global Step: 164670   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:15:07,159-Speed 5370.46 samples/sec   Loss 2.2741   LearningRate 0.0141   Epoch: 15   Global Step: 164680   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:15:14,686-Speed 5441.85 samples/sec   Loss 2.2925   LearningRate 0.0141   Epoch: 15   Global Step: 164690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:15:22,166-Speed 5476.46 samples/sec   Loss 2.2677   LearningRate 0.0141   Epoch: 15   Global Step: 164700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:15:29,767-Speed 5390.32 samples/sec   Loss 2.2903   LearningRate 0.0141   Epoch: 15   Global Step: 164710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:15:37,287-Speed 5447.29 samples/sec   Loss 2.3011   LearningRate 0.0141   Epoch: 15   Global Step: 164720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:15:44,741-Speed 5495.12 samples/sec   Loss 2.2701   LearningRate 0.0141   Epoch: 15   Global Step: 164730   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:15:52,289-Speed 5427.89 samples/sec   Loss 2.3066   LearningRate 0.0141   Epoch: 15   Global Step: 164740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:15:59,759-Speed 5483.88 samples/sec   Loss 2.2971   LearningRate 0.0140   Epoch: 15   Global Step: 164750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:16:07,275-Speed 5450.63 samples/sec   Loss 2.2775   LearningRate 0.0140   Epoch: 15   Global Step: 164760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:16:14,834-Speed 5418.68 samples/sec   Loss 2.2661   LearningRate 0.0140   Epoch: 15   Global Step: 164770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:16:22,352-Speed 5449.00 samples/sec   Loss 2.2719   LearningRate 0.0140   Epoch: 15   Global Step: 164780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:16:29,897-Speed 5429.73 samples/sec   Loss 2.2942   LearningRate 0.0140   Epoch: 15   Global Step: 164790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:16:37,453-Speed 5421.54 samples/sec   Loss 2.2819   LearningRate 0.0140   Epoch: 15   Global Step: 164800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:16:44,912-Speed 5492.07 samples/sec   Loss 2.3035   LearningRate 0.0140   Epoch: 15   Global Step: 164810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:16:52,608-Speed 5323.16 samples/sec   Loss 2.2923   LearningRate 0.0140   Epoch: 15   Global Step: 164820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:17:00,127-Speed 5448.12 samples/sec   Loss 2.2822   LearningRate 0.0140   Epoch: 15   Global Step: 164830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:17:07,653-Speed 5442.84 samples/sec   Loss 2.2710   LearningRate 0.0140   Epoch: 15   Global Step: 164840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:17:15,192-Speed 5434.30 samples/sec   Loss 2.2497   LearningRate 0.0140   Epoch: 15   Global Step: 164850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:17:22,777-Speed 5400.41 samples/sec   Loss 2.3055   LearningRate 0.0140   Epoch: 15   Global Step: 164860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:17:30,242-Speed 5488.54 samples/sec   Loss 2.3096   LearningRate 0.0140   Epoch: 15   Global Step: 164870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:17:37,844-Speed 5388.52 samples/sec   Loss 2.2809   LearningRate 0.0140   Epoch: 15   Global Step: 164880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:17:45,492-Speed 5356.05 samples/sec   Loss 2.3119   LearningRate 0.0140   Epoch: 15   Global Step: 164890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:17:53,115-Speed 5374.22 samples/sec   Loss 2.2447   LearningRate 0.0139   Epoch: 15   Global Step: 164900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:18:00,695-Speed 5404.11 samples/sec   Loss 2.2698   LearningRate 0.0139   Epoch: 15   Global Step: 164910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:18:08,231-Speed 5436.73 samples/sec   Loss 2.2949   LearningRate 0.0139   Epoch: 15   Global Step: 164920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:18:15,905-Speed 5337.81 samples/sec   Loss 2.2726   LearningRate 0.0139   Epoch: 15   Global Step: 164930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:18:23,482-Speed 5406.67 samples/sec   Loss 2.2792   LearningRate 0.0139   Epoch: 15   Global Step: 164940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:18:31,099-Speed 5377.87 samples/sec   Loss 2.2714   LearningRate 0.0139   Epoch: 15   Global Step: 164950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:18:38,570-Speed 5483.85 samples/sec   Loss 2.2796   LearningRate 0.0139   Epoch: 15   Global Step: 164960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:18:46,117-Speed 5427.59 samples/sec   Loss 2.2940   LearningRate 0.0139   Epoch: 15   Global Step: 164970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:18:53,758-Speed 5361.53 samples/sec   Loss 2.2566   LearningRate 0.0139   Epoch: 15   Global Step: 164980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:19:01,335-Speed 5406.74 samples/sec   Loss 2.2460   LearningRate 0.0139   Epoch: 15   Global Step: 164990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:19:08,880-Speed 5429.76 samples/sec   Loss 2.2477   LearningRate 0.0139   Epoch: 15   Global Step: 165000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:19:16,681-Speed 5251.00 samples/sec   Loss 2.2238   LearningRate 0.0139   Epoch: 15   Global Step: 165010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:19:24,276-Speed 5393.84 samples/sec   Loss 2.3060   LearningRate 0.0139   Epoch: 15   Global Step: 165020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:19:31,804-Speed 5441.24 samples/sec   Loss 2.2709   LearningRate 0.0139   Epoch: 15   Global Step: 165030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:19:39,296-Speed 5468.34 samples/sec   Loss 2.2596   LearningRate 0.0139   Epoch: 15   Global Step: 165040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 08:19:46,710-Speed 5525.64 samples/sec   Loss 2.2809   LearningRate 0.0138   Epoch: 15   Global Step: 165050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:19:54,223-Speed 5451.87 samples/sec   Loss 2.2809   LearningRate 0.0138   Epoch: 15   Global Step: 165060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:20:01,808-Speed 5400.84 samples/sec   Loss 2.2666   LearningRate 0.0138   Epoch: 15   Global Step: 165070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:20:09,296-Speed 5471.19 samples/sec   Loss 2.2893   LearningRate 0.0138   Epoch: 15   Global Step: 165080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:20:16,835-Speed 5434.28 samples/sec   Loss 2.2437   LearningRate 0.0138   Epoch: 15   Global Step: 165090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:20:24,389-Speed 5422.15 samples/sec   Loss 2.3016   LearningRate 0.0138   Epoch: 15   Global Step: 165100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:20:31,934-Speed 5429.30 samples/sec   Loss 2.2633   LearningRate 0.0138   Epoch: 15   Global Step: 165110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:20:39,367-Speed 5511.46 samples/sec   Loss 2.2889   LearningRate 0.0138   Epoch: 15   Global Step: 165120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:20:46,989-Speed 5375.08 samples/sec   Loss 2.2779   LearningRate 0.0138   Epoch: 15   Global Step: 165130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:20:54,574-Speed 5400.70 samples/sec   Loss 2.2685   LearningRate 0.0138   Epoch: 15   Global Step: 165140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:21:02,088-Speed 5451.96 samples/sec   Loss 2.2144   LearningRate 0.0138   Epoch: 15   Global Step: 165150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:21:09,598-Speed 5454.27 samples/sec   Loss 2.2763   LearningRate 0.0138   Epoch: 15   Global Step: 165160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:21:17,231-Speed 5367.08 samples/sec   Loss 2.2578   LearningRate 0.0138   Epoch: 15   Global Step: 165170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:21:24,740-Speed 5456.04 samples/sec   Loss 2.2182   LearningRate 0.0138   Epoch: 15   Global Step: 165180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:21:32,370-Speed 5368.56 samples/sec   Loss 2.2535   LearningRate 0.0138   Epoch: 15   Global Step: 165190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:21:39,863-Speed 5467.08 samples/sec   Loss 2.3030   LearningRate 0.0138   Epoch: 15   Global Step: 165200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:21:47,406-Speed 5431.15 samples/sec   Loss 2.2580   LearningRate 0.0137   Epoch: 15   Global Step: 165210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:21:54,876-Speed 5483.92 samples/sec   Loss 2.2639   LearningRate 0.0137   Epoch: 15   Global Step: 165220   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:22:02,393-Speed 5449.74 samples/sec   Loss 2.2375   LearningRate 0.0137   Epoch: 15   Global Step: 165230   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:22:09,873-Speed 5477.02 samples/sec   Loss 2.2310   LearningRate 0.0137   Epoch: 15   Global Step: 165240   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:22:17,374-Speed 5461.70 samples/sec   Loss 2.2755   LearningRate 0.0137   Epoch: 15   Global Step: 165250   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:22:24,847-Speed 5481.82 samples/sec   Loss 2.2426   LearningRate 0.0137   Epoch: 15   Global Step: 165260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:22:32,280-Speed 5510.79 samples/sec   Loss 2.2664   LearningRate 0.0137   Epoch: 15   Global Step: 165270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:22:39,884-Speed 5387.57 samples/sec   Loss 2.2991   LearningRate 0.0137   Epoch: 15   Global Step: 165280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:22:47,382-Speed 5463.32 samples/sec   Loss 2.2188   LearningRate 0.0137   Epoch: 15   Global Step: 165290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:22:54,871-Speed 5470.44 samples/sec   Loss 2.2672   LearningRate 0.0137   Epoch: 15   Global Step: 165300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:23:02,366-Speed 5465.14 samples/sec   Loss 2.2513   LearningRate 0.0137   Epoch: 15   Global Step: 165310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:23:09,916-Speed 5426.57 samples/sec   Loss 2.2374   LearningRate 0.0137   Epoch: 15   Global Step: 165320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:23:17,489-Speed 5409.40 samples/sec   Loss 2.2429   LearningRate 0.0137   Epoch: 15   Global Step: 165330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:23:25,144-Speed 5351.41 samples/sec   Loss 2.2924   LearningRate 0.0137   Epoch: 15   Global Step: 165340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:23:32,713-Speed 5412.41 samples/sec   Loss 2.2581   LearningRate 0.0137   Epoch: 15   Global Step: 165350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:23:40,257-Speed 5429.87 samples/sec   Loss 2.2635   LearningRate 0.0136   Epoch: 15   Global Step: 165360   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:23:47,729-Speed 5482.21 samples/sec   Loss 2.2578   LearningRate 0.0136   Epoch: 15   Global Step: 165370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:23:55,202-Speed 5481.82 samples/sec   Loss 2.2465   LearningRate 0.0136   Epoch: 15   Global Step: 165380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:24:02,690-Speed 5470.80 samples/sec   Loss 2.2547   LearningRate 0.0136   Epoch: 15   Global Step: 165390   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:24:10,277-Speed 5399.71 samples/sec   Loss 2.2781   LearningRate 0.0136   Epoch: 15   Global Step: 165400   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:24:17,764-Speed 5471.34 samples/sec   Loss 2.2675   LearningRate 0.0136   Epoch: 15   Global Step: 165410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:24:25,241-Speed 5478.65 samples/sec   Loss 2.2409   LearningRate 0.0136   Epoch: 15   Global Step: 165420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:24:32,914-Speed 5339.16 samples/sec   Loss 2.2458   LearningRate 0.0136   Epoch: 15   Global Step: 165430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:24:40,492-Speed 5405.56 samples/sec   Loss 2.2698   LearningRate 0.0136   Epoch: 15   Global Step: 165440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:24:47,986-Speed 5466.23 samples/sec   Loss 2.2128   LearningRate 0.0136   Epoch: 15   Global Step: 165450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:24:55,477-Speed 5469.33 samples/sec   Loss 2.2337   LearningRate 0.0136   Epoch: 15   Global Step: 165460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:25:02,974-Speed 5464.13 samples/sec   Loss 2.2667   LearningRate 0.0136   Epoch: 15   Global Step: 165470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:25:10,527-Speed 5423.69 samples/sec   Loss 2.2467   LearningRate 0.0136   Epoch: 15   Global Step: 165480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:25:18,086-Speed 5419.54 samples/sec   Loss 2.2220   LearningRate 0.0136   Epoch: 15   Global Step: 165490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:25:25,594-Speed 5456.08 samples/sec   Loss 2.2552   LearningRate 0.0136   Epoch: 15   Global Step: 165500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:25:33,106-Speed 5452.99 samples/sec   Loss 2.2271   LearningRate 0.0136   Epoch: 15   Global Step: 165510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:25:40,602-Speed 5465.37 samples/sec   Loss 2.2577   LearningRate 0.0135   Epoch: 15   Global Step: 165520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 08:25:48,125-Speed 5445.05 samples/sec   Loss 2.2166   LearningRate 0.0135   Epoch: 15   Global Step: 165530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:25:55,598-Speed 5481.32 samples/sec   Loss 2.2438   LearningRate 0.0135   Epoch: 15   Global Step: 165540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:26:03,142-Speed 5430.70 samples/sec   Loss 2.2413   LearningRate 0.0135   Epoch: 15   Global Step: 165550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:26:10,662-Speed 5447.64 samples/sec   Loss 2.2685   LearningRate 0.0135   Epoch: 15   Global Step: 165560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:26:18,162-Speed 5462.18 samples/sec   Loss 2.2547   LearningRate 0.0135   Epoch: 15   Global Step: 165570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:26:25,648-Speed 5472.09 samples/sec   Loss 2.2419   LearningRate 0.0135   Epoch: 15   Global Step: 165580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:26:33,185-Speed 5435.54 samples/sec   Loss 2.2337   LearningRate 0.0135   Epoch: 15   Global Step: 165590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:26:40,796-Speed 5381.97 samples/sec   Loss 2.2312   LearningRate 0.0135   Epoch: 15   Global Step: 165600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:26:48,286-Speed 5469.40 samples/sec   Loss 2.2401   LearningRate 0.0135   Epoch: 15   Global Step: 165610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:26:55,741-Speed 5495.71 samples/sec   Loss 2.2841   LearningRate 0.0135   Epoch: 15   Global Step: 165620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:27:03,351-Speed 5382.70 samples/sec   Loss 2.2469   LearningRate 0.0135   Epoch: 15   Global Step: 165630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:27:10,904-Speed 5423.44 samples/sec   Loss 2.2658   LearningRate 0.0135   Epoch: 15   Global Step: 165640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:27:18,416-Speed 5454.09 samples/sec   Loss 2.2525   LearningRate 0.0135   Epoch: 15   Global Step: 165650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:27:25,922-Speed 5457.03 samples/sec   Loss 2.2847   LearningRate 0.0135   Epoch: 15   Global Step: 165660   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:27:33,514-Speed 5395.69 samples/sec   Loss 2.2453   LearningRate 0.0134   Epoch: 15   Global Step: 165670   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:27:41,068-Speed 5422.67 samples/sec   Loss 2.2384   LearningRate 0.0134   Epoch: 15   Global Step: 165680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:27:48,557-Speed 5470.27 samples/sec   Loss 2.2253   LearningRate 0.0134   Epoch: 15   Global Step: 165690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:27:56,010-Speed 5497.29 samples/sec   Loss 2.2296   LearningRate 0.0134   Epoch: 15   Global Step: 165700   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:28:03,668-Speed 5348.85 samples/sec   Loss 2.2355   LearningRate 0.0134   Epoch: 15   Global Step: 165710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:28:11,227-Speed 5419.36 samples/sec   Loss 2.2464   LearningRate 0.0134   Epoch: 15   Global Step: 165720   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:28:18,840-Speed 5380.81 samples/sec   Loss 2.2414   LearningRate 0.0134   Epoch: 15   Global Step: 165730   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:28:26,381-Speed 5432.64 samples/sec   Loss 2.2507   LearningRate 0.0134   Epoch: 15   Global Step: 165740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:28:33,887-Speed 5457.35 samples/sec   Loss 2.2141   LearningRate 0.0134   Epoch: 15   Global Step: 165750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:28:41,354-Speed 5486.24 samples/sec   Loss 2.2425   LearningRate 0.0134   Epoch: 15   Global Step: 165760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:28:48,795-Speed 5506.39 samples/sec   Loss 2.2145   LearningRate 0.0134   Epoch: 15   Global Step: 165770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:28:56,333-Speed 5434.21 samples/sec   Loss 2.2443   LearningRate 0.0134   Epoch: 15   Global Step: 165780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:29:03,923-Speed 5397.29 samples/sec   Loss 2.2001   LearningRate 0.0134   Epoch: 15   Global Step: 165790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:29:11,396-Speed 5481.40 samples/sec   Loss 2.2242   LearningRate 0.0134   Epoch: 15   Global Step: 165800   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 08:29:18,897-Speed 5461.22 samples/sec   Loss 2.2807   LearningRate 0.0134   Epoch: 15   Global Step: 165810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:29:26,477-Speed 5404.66 samples/sec   Loss 2.2608   LearningRate 0.0134   Epoch: 15   Global Step: 165820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:29:34,173-Speed 5322.99 samples/sec   Loss 2.2108   LearningRate 0.0133   Epoch: 15   Global Step: 165830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:29:41,685-Speed 5453.40 samples/sec   Loss 2.2353   LearningRate 0.0133   Epoch: 15   Global Step: 165840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:29:49,187-Speed 5459.80 samples/sec   Loss 2.2713   LearningRate 0.0133   Epoch: 15   Global Step: 165850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:29:56,668-Speed 5476.48 samples/sec   Loss 2.2404   LearningRate 0.0133   Epoch: 15   Global Step: 165860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:30:04,238-Speed 5411.80 samples/sec   Loss 2.2219   LearningRate 0.0133   Epoch: 15   Global Step: 165870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:30:11,780-Speed 5431.01 samples/sec   Loss 2.2585   LearningRate 0.0133   Epoch: 15   Global Step: 165880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:30:19,331-Speed 5425.12 samples/sec   Loss 2.2447   LearningRate 0.0133   Epoch: 15   Global Step: 165890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 08:30:26,802-Speed 5483.36 samples/sec   Loss 2.2248   LearningRate 0.0133   Epoch: 15   Global Step: 165900   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:30:34,334-Speed 5439.30 samples/sec   Loss 2.2243   LearningRate 0.0133   Epoch: 15   Global Step: 165910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:30:56,880-Speed 1816.77 samples/sec   Loss 2.2215   LearningRate 0.0133   Epoch: 16   Global Step: 165920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:31:04,344-Speed 5488.79 samples/sec   Loss 2.1997   LearningRate 0.0133   Epoch: 16   Global Step: 165930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:31:11,757-Speed 5526.14 samples/sec   Loss 2.2263   LearningRate 0.0133   Epoch: 16   Global Step: 165940   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:31:19,177-Speed 5520.82 samples/sec   Loss 2.2306   LearningRate 0.0133   Epoch: 16   Global Step: 165950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:31:26,602-Speed 5517.43 samples/sec   Loss 2.2228   LearningRate 0.0133   Epoch: 16   Global Step: 165960   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:31:33,982-Speed 5550.89 samples/sec   Loss 2.2080   LearningRate 0.0133   Epoch: 16   Global Step: 165970   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:31:41,436-Speed 5496.10 samples/sec   Loss 2.2253   LearningRate 0.0132   Epoch: 16   Global Step: 165980   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:31:48,832-Speed 5538.74 samples/sec   Loss 2.2085   LearningRate 0.0132   Epoch: 16   Global Step: 165990   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:31:56,228-Speed 5539.03 samples/sec   Loss 2.2101   LearningRate 0.0132   Epoch: 16   Global Step: 166000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:32:40,156-[lfw][166000]XNorm: 22.020784
Training: 2022-01-09 08:32:40,156-[lfw][166000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 08:32:40,157-[lfw][166000]Accuracy-Highest: 0.99833
Training: 2022-01-09 08:33:31,765-[cfp_fp][166000]XNorm: 21.145533
Training: 2022-01-09 08:33:31,766-[cfp_fp][166000]Accuracy-Flip: 0.99214+-0.00508
Training: 2022-01-09 08:33:31,766-[cfp_fp][166000]Accuracy-Highest: 0.99371
Training: 2022-01-09 08:34:15,727-[agedb_30][166000]XNorm: 22.243697
Training: 2022-01-09 08:34:15,728-[agedb_30][166000]Accuracy-Flip: 0.98317+-0.00724
Training: 2022-01-09 08:34:15,728-[agedb_30][166000]Accuracy-Highest: 0.98333
Training: 2022-01-09 08:34:23,058-Speed 278.96 samples/sec   Loss 2.1959   LearningRate 0.0132   Epoch: 16   Global Step: 166010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:34:30,585-Speed 5442.74 samples/sec   Loss 2.2154   LearningRate 0.0132   Epoch: 16   Global Step: 166020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:34:38,045-Speed 5490.82 samples/sec   Loss 2.2333   LearningRate 0.0132   Epoch: 16   Global Step: 166030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:34:45,505-Speed 5491.62 samples/sec   Loss 2.2557   LearningRate 0.0132   Epoch: 16   Global Step: 166040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:34:52,965-Speed 5492.13 samples/sec   Loss 2.2308   LearningRate 0.0132   Epoch: 16   Global Step: 166050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:35:00,420-Speed 5494.33 samples/sec   Loss 2.1811   LearningRate 0.0132   Epoch: 16   Global Step: 166060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:35:07,913-Speed 5467.22 samples/sec   Loss 2.1791   LearningRate 0.0132   Epoch: 16   Global Step: 166070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:35:15,412-Speed 5463.65 samples/sec   Loss 2.2115   LearningRate 0.0132   Epoch: 16   Global Step: 166080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:35:22,883-Speed 5482.99 samples/sec   Loss 2.2223   LearningRate 0.0132   Epoch: 16   Global Step: 166090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:35:30,397-Speed 5451.54 samples/sec   Loss 2.2211   LearningRate 0.0132   Epoch: 16   Global Step: 166100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:35:37,869-Speed 5482.64 samples/sec   Loss 2.1495   LearningRate 0.0132   Epoch: 16   Global Step: 166110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:35:45,341-Speed 5482.91 samples/sec   Loss 2.2062   LearningRate 0.0132   Epoch: 16   Global Step: 166120   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:35:52,835-Speed 5466.76 samples/sec   Loss 2.2347   LearningRate 0.0132   Epoch: 16   Global Step: 166130   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:36:00,268-Speed 5511.07 samples/sec   Loss 2.1751   LearningRate 0.0131   Epoch: 16   Global Step: 166140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:36:07,780-Speed 5453.20 samples/sec   Loss 2.1991   LearningRate 0.0131   Epoch: 16   Global Step: 166150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:36:15,277-Speed 5464.96 samples/sec   Loss 2.2094   LearningRate 0.0131   Epoch: 16   Global Step: 166160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:36:22,920-Speed 5359.73 samples/sec   Loss 2.2128   LearningRate 0.0131   Epoch: 16   Global Step: 166170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:36:30,474-Speed 5422.64 samples/sec   Loss 2.1799   LearningRate 0.0131   Epoch: 16   Global Step: 166180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:36:38,045-Speed 5410.86 samples/sec   Loss 2.2027   LearningRate 0.0131   Epoch: 16   Global Step: 166190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:36:45,521-Speed 5479.84 samples/sec   Loss 2.2183   LearningRate 0.0131   Epoch: 16   Global Step: 166200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:36:53,061-Speed 5432.74 samples/sec   Loss 2.2234   LearningRate 0.0131   Epoch: 16   Global Step: 166210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:37:00,516-Speed 5495.30 samples/sec   Loss 2.1805   LearningRate 0.0131   Epoch: 16   Global Step: 166220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:37:07,989-Speed 5481.70 samples/sec   Loss 2.2037   LearningRate 0.0131   Epoch: 16   Global Step: 166230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:37:15,524-Speed 5436.76 samples/sec   Loss 2.1948   LearningRate 0.0131   Epoch: 16   Global Step: 166240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:37:23,011-Speed 5471.59 samples/sec   Loss 2.1553   LearningRate 0.0131   Epoch: 16   Global Step: 166250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:37:30,430-Speed 5521.36 samples/sec   Loss 2.1657   LearningRate 0.0131   Epoch: 16   Global Step: 166260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:37:37,905-Speed 5480.59 samples/sec   Loss 2.1964   LearningRate 0.0131   Epoch: 16   Global Step: 166270   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:37:45,427-Speed 5445.64 samples/sec   Loss 2.2094   LearningRate 0.0131   Epoch: 16   Global Step: 166280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:37:52,846-Speed 5522.63 samples/sec   Loss 2.2165   LearningRate 0.0131   Epoch: 16   Global Step: 166290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:38:00,378-Speed 5438.97 samples/sec   Loss 2.1979   LearningRate 0.0130   Epoch: 16   Global Step: 166300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:38:07,853-Speed 5479.94 samples/sec   Loss 2.1687   LearningRate 0.0130   Epoch: 16   Global Step: 166310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:38:15,361-Speed 5455.96 samples/sec   Loss 2.2148   LearningRate 0.0130   Epoch: 16   Global Step: 166320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:38:22,930-Speed 5412.53 samples/sec   Loss 2.1638   LearningRate 0.0130   Epoch: 16   Global Step: 166330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:38:30,428-Speed 5463.66 samples/sec   Loss 2.1973   LearningRate 0.0130   Epoch: 16   Global Step: 166340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:38:37,940-Speed 5453.47 samples/sec   Loss 2.1796   LearningRate 0.0130   Epoch: 16   Global Step: 166350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:38:45,482-Speed 5431.84 samples/sec   Loss 2.2122   LearningRate 0.0130   Epoch: 16   Global Step: 166360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:38:53,039-Speed 5421.01 samples/sec   Loss 2.1762   LearningRate 0.0130   Epoch: 16   Global Step: 166370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:39:00,523-Speed 5474.24 samples/sec   Loss 2.2133   LearningRate 0.0130   Epoch: 16   Global Step: 166380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:39:08,048-Speed 5443.48 samples/sec   Loss 2.1917   LearningRate 0.0130   Epoch: 16   Global Step: 166390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:39:15,512-Speed 5488.36 samples/sec   Loss 2.1936   LearningRate 0.0130   Epoch: 16   Global Step: 166400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:39:22,939-Speed 5516.42 samples/sec   Loss 2.2211   LearningRate 0.0130   Epoch: 16   Global Step: 166410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:39:30,428-Speed 5469.94 samples/sec   Loss 2.2365   LearningRate 0.0130   Epoch: 16   Global Step: 166420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:39:37,929-Speed 5460.81 samples/sec   Loss 2.2044   LearningRate 0.0130   Epoch: 16   Global Step: 166430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:39:45,423-Speed 5466.88 samples/sec   Loss 2.2037   LearningRate 0.0130   Epoch: 16   Global Step: 166440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:39:52,913-Speed 5469.12 samples/sec   Loss 2.2340   LearningRate 0.0129   Epoch: 16   Global Step: 166450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:40:00,316-Speed 5533.84 samples/sec   Loss 2.1877   LearningRate 0.0129   Epoch: 16   Global Step: 166460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:40:07,843-Speed 5442.66 samples/sec   Loss 2.1991   LearningRate 0.0129   Epoch: 16   Global Step: 166470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:40:15,371-Speed 5441.85 samples/sec   Loss 2.1952   LearningRate 0.0129   Epoch: 16   Global Step: 166480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:40:22,963-Speed 5395.70 samples/sec   Loss 2.1876   LearningRate 0.0129   Epoch: 16   Global Step: 166490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:40:30,430-Speed 5486.51 samples/sec   Loss 2.2331   LearningRate 0.0129   Epoch: 16   Global Step: 166500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:40:37,885-Speed 5495.04 samples/sec   Loss 2.1899   LearningRate 0.0129   Epoch: 16   Global Step: 166510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:40:45,401-Speed 5449.81 samples/sec   Loss 2.1949   LearningRate 0.0129   Epoch: 16   Global Step: 166520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:40:52,899-Speed 5464.10 samples/sec   Loss 2.1893   LearningRate 0.0129   Epoch: 16   Global Step: 166530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:41:00,454-Speed 5422.38 samples/sec   Loss 2.2195   LearningRate 0.0129   Epoch: 16   Global Step: 166540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:41:07,942-Speed 5470.82 samples/sec   Loss 2.1667   LearningRate 0.0129   Epoch: 16   Global Step: 166550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:41:15,430-Speed 5470.19 samples/sec   Loss 2.1732   LearningRate 0.0129   Epoch: 16   Global Step: 166560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:41:22,914-Speed 5473.97 samples/sec   Loss 2.1971   LearningRate 0.0129   Epoch: 16   Global Step: 166570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:41:30,356-Speed 5504.89 samples/sec   Loss 2.1702   LearningRate 0.0129   Epoch: 16   Global Step: 166580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:41:37,876-Speed 5447.55 samples/sec   Loss 2.2122   LearningRate 0.0129   Epoch: 16   Global Step: 166590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:41:45,388-Speed 5453.15 samples/sec   Loss 2.1957   LearningRate 0.0129   Epoch: 16   Global Step: 166600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:41:52,994-Speed 5386.12 samples/sec   Loss 2.1840   LearningRate 0.0128   Epoch: 16   Global Step: 166610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:42:00,641-Speed 5357.59 samples/sec   Loss 2.1979   LearningRate 0.0128   Epoch: 16   Global Step: 166620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:42:08,235-Speed 5394.34 samples/sec   Loss 2.2118   LearningRate 0.0128   Epoch: 16   Global Step: 166630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:42:15,920-Speed 5330.23 samples/sec   Loss 2.2081   LearningRate 0.0128   Epoch: 16   Global Step: 166640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:42:23,423-Speed 5460.49 samples/sec   Loss 2.1699   LearningRate 0.0128   Epoch: 16   Global Step: 166650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:42:30,825-Speed 5534.06 samples/sec   Loss 2.2002   LearningRate 0.0128   Epoch: 16   Global Step: 166660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:42:38,404-Speed 5405.10 samples/sec   Loss 2.1704   LearningRate 0.0128   Epoch: 16   Global Step: 166670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:42:46,060-Speed 5350.43 samples/sec   Loss 2.2162   LearningRate 0.0128   Epoch: 16   Global Step: 166680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:42:53,704-Speed 5359.14 samples/sec   Loss 2.1745   LearningRate 0.0128   Epoch: 16   Global Step: 166690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:43:01,125-Speed 5520.37 samples/sec   Loss 2.1866   LearningRate 0.0128   Epoch: 16   Global Step: 166700   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:43:08,642-Speed 5449.33 samples/sec   Loss 2.1979   LearningRate 0.0128   Epoch: 16   Global Step: 166710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:43:16,206-Speed 5416.36 samples/sec   Loss 2.2273   LearningRate 0.0128   Epoch: 16   Global Step: 166720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:43:23,792-Speed 5399.90 samples/sec   Loss 2.1552   LearningRate 0.0128   Epoch: 16   Global Step: 166730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:43:31,297-Speed 5458.71 samples/sec   Loss 2.2059   LearningRate 0.0128   Epoch: 16   Global Step: 166740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:43:38,837-Speed 5432.76 samples/sec   Loss 2.2059   LearningRate 0.0128   Epoch: 16   Global Step: 166750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:43:46,397-Speed 5418.72 samples/sec   Loss 2.1957   LearningRate 0.0128   Epoch: 16   Global Step: 166760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:43:53,906-Speed 5455.09 samples/sec   Loss 2.1941   LearningRate 0.0127   Epoch: 16   Global Step: 166770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:44:01,371-Speed 5488.19 samples/sec   Loss 2.1832   LearningRate 0.0127   Epoch: 16   Global Step: 166780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:44:08,887-Speed 5450.45 samples/sec   Loss 2.1508   LearningRate 0.0127   Epoch: 16   Global Step: 166790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:44:16,407-Speed 5447.42 samples/sec   Loss 2.1380   LearningRate 0.0127   Epoch: 16   Global Step: 166800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:44:23,922-Speed 5451.29 samples/sec   Loss 2.1674   LearningRate 0.0127   Epoch: 16   Global Step: 166810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 08:44:31,478-Speed 5422.00 samples/sec   Loss 2.1770   LearningRate 0.0127   Epoch: 16   Global Step: 166820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:44:38,964-Speed 5471.91 samples/sec   Loss 2.1719   LearningRate 0.0127   Epoch: 16   Global Step: 166830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:44:46,724-Speed 5278.74 samples/sec   Loss 2.1850   LearningRate 0.0127   Epoch: 16   Global Step: 166840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:44:54,345-Speed 5375.63 samples/sec   Loss 2.1559   LearningRate 0.0127   Epoch: 16   Global Step: 166850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:45:01,911-Speed 5414.55 samples/sec   Loss 2.1680   LearningRate 0.0127   Epoch: 16   Global Step: 166860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:45:09,420-Speed 5455.30 samples/sec   Loss 2.1920   LearningRate 0.0127   Epoch: 16   Global Step: 166870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:45:17,031-Speed 5382.44 samples/sec   Loss 2.2027   LearningRate 0.0127   Epoch: 16   Global Step: 166880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:45:24,495-Speed 5488.07 samples/sec   Loss 2.1799   LearningRate 0.0127   Epoch: 16   Global Step: 166890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:45:32,030-Speed 5437.15 samples/sec   Loss 2.1393   LearningRate 0.0127   Epoch: 16   Global Step: 166900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:45:39,565-Speed 5436.31 samples/sec   Loss 2.1786   LearningRate 0.0127   Epoch: 16   Global Step: 166910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:45:47,151-Speed 5399.93 samples/sec   Loss 2.1896   LearningRate 0.0127   Epoch: 16   Global Step: 166920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 08:45:54,645-Speed 5466.60 samples/sec   Loss 2.1532   LearningRate 0.0126   Epoch: 16   Global Step: 166930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:46:02,174-Speed 5440.86 samples/sec   Loss 2.1582   LearningRate 0.0126   Epoch: 16   Global Step: 166940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:46:09,778-Speed 5387.68 samples/sec   Loss 2.1727   LearningRate 0.0126   Epoch: 16   Global Step: 166950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:46:17,496-Speed 5307.47 samples/sec   Loss 2.1619   LearningRate 0.0126   Epoch: 16   Global Step: 166960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:46:25,035-Speed 5434.03 samples/sec   Loss 2.1603   LearningRate 0.0126   Epoch: 16   Global Step: 166970   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:46:32,571-Speed 5436.12 samples/sec   Loss 2.1952   LearningRate 0.0126   Epoch: 16   Global Step: 166980   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:46:40,010-Speed 5506.08 samples/sec   Loss 2.1642   LearningRate 0.0126   Epoch: 16   Global Step: 166990   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:46:47,592-Speed 5403.23 samples/sec   Loss 2.1727   LearningRate 0.0126   Epoch: 16   Global Step: 167000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:46:55,112-Speed 5447.85 samples/sec   Loss 2.1717   LearningRate 0.0126   Epoch: 16   Global Step: 167010   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:47:02,654-Speed 5431.25 samples/sec   Loss 2.1570   LearningRate 0.0126   Epoch: 16   Global Step: 167020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:47:10,149-Speed 5465.48 samples/sec   Loss 2.1536   LearningRate 0.0126   Epoch: 16   Global Step: 167030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:47:17,589-Speed 5506.14 samples/sec   Loss 2.1629   LearningRate 0.0126   Epoch: 16   Global Step: 167040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:47:25,142-Speed 5424.10 samples/sec   Loss 2.1527   LearningRate 0.0126   Epoch: 16   Global Step: 167050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:47:32,594-Speed 5497.42 samples/sec   Loss 2.1930   LearningRate 0.0126   Epoch: 16   Global Step: 167060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:47:40,042-Speed 5500.29 samples/sec   Loss 2.1404   LearningRate 0.0126   Epoch: 16   Global Step: 167070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:47:47,576-Speed 5437.25 samples/sec   Loss 2.1443   LearningRate 0.0126   Epoch: 16   Global Step: 167080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:47:55,062-Speed 5472.69 samples/sec   Loss 2.1400   LearningRate 0.0125   Epoch: 16   Global Step: 167090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:48:02,542-Speed 5476.77 samples/sec   Loss 2.1762   LearningRate 0.0125   Epoch: 16   Global Step: 167100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:48:10,159-Speed 5377.63 samples/sec   Loss 2.1757   LearningRate 0.0125   Epoch: 16   Global Step: 167110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:48:17,740-Speed 5404.24 samples/sec   Loss 2.1389   LearningRate 0.0125   Epoch: 16   Global Step: 167120   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:48:25,189-Speed 5499.44 samples/sec   Loss 2.1682   LearningRate 0.0125   Epoch: 16   Global Step: 167130   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:48:32,623-Speed 5510.84 samples/sec   Loss 2.1743   LearningRate 0.0125   Epoch: 16   Global Step: 167140   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:48:40,093-Speed 5483.52 samples/sec   Loss 2.1954   LearningRate 0.0125   Epoch: 16   Global Step: 167150   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:48:47,528-Speed 5509.63 samples/sec   Loss 2.1457   LearningRate 0.0125   Epoch: 16   Global Step: 167160   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:48:55,108-Speed 5404.21 samples/sec   Loss 2.1503   LearningRate 0.0125   Epoch: 16   Global Step: 167170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:49:02,577-Speed 5485.55 samples/sec   Loss 2.1490   LearningRate 0.0125   Epoch: 16   Global Step: 167180   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-01-09 08:49:10,040-Speed 5489.29 samples/sec   Loss 2.1644   LearningRate 0.0125   Epoch: 16   Global Step: 167190   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-01-09 08:49:17,527-Speed 5471.30 samples/sec   Loss 2.1432   LearningRate 0.0125   Epoch: 16   Global Step: 167200   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-01-09 08:49:25,013-Speed 5471.94 samples/sec   Loss 2.1652   LearningRate 0.0125   Epoch: 16   Global Step: 167210   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-01-09 08:49:32,493-Speed 5476.82 samples/sec   Loss 2.1568   LearningRate 0.0125   Epoch: 16   Global Step: 167220   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-01-09 08:49:40,010-Speed 5450.16 samples/sec   Loss 2.1705   LearningRate 0.0125   Epoch: 16   Global Step: 167230   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-01-09 08:49:47,637-Speed 5370.56 samples/sec   Loss 2.1639   LearningRate 0.0125   Epoch: 16   Global Step: 167240   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-01-09 08:49:55,197-Speed 5418.62 samples/sec   Loss 2.1369   LearningRate 0.0124   Epoch: 16   Global Step: 167250   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-01-09 08:50:02,770-Speed 5409.94 samples/sec   Loss 2.1526   LearningRate 0.0124   Epoch: 16   Global Step: 167260   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-01-09 08:50:10,313-Speed 5431.24 samples/sec   Loss 2.1364   LearningRate 0.0124   Epoch: 16   Global Step: 167270   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-01-09 08:50:17,858-Speed 5428.77 samples/sec   Loss 2.1515   LearningRate 0.0124   Epoch: 16   Global Step: 167280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:50:25,463-Speed 5387.30 samples/sec   Loss 2.1340   LearningRate 0.0124   Epoch: 16   Global Step: 167290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:50:33,002-Speed 5433.59 samples/sec   Loss 2.1249   LearningRate 0.0124   Epoch: 16   Global Step: 167300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:50:40,564-Speed 5418.00 samples/sec   Loss 2.1590   LearningRate 0.0124   Epoch: 16   Global Step: 167310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:50:48,099-Speed 5435.84 samples/sec   Loss 2.1069   LearningRate 0.0124   Epoch: 16   Global Step: 167320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:50:55,695-Speed 5393.08 samples/sec   Loss 2.1301   LearningRate 0.0124   Epoch: 16   Global Step: 167330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:51:03,336-Speed 5361.48 samples/sec   Loss 2.1427   LearningRate 0.0124   Epoch: 16   Global Step: 167340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:51:11,026-Speed 5327.27 samples/sec   Loss 2.1443   LearningRate 0.0124   Epoch: 16   Global Step: 167350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:51:18,734-Speed 5314.08 samples/sec   Loss 2.1044   LearningRate 0.0124   Epoch: 16   Global Step: 167360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:51:26,276-Speed 5431.72 samples/sec   Loss 2.1529   LearningRate 0.0124   Epoch: 16   Global Step: 167370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:51:33,832-Speed 5421.35 samples/sec   Loss 2.1764   LearningRate 0.0124   Epoch: 16   Global Step: 167380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:51:41,379-Speed 5428.60 samples/sec   Loss 2.1470   LearningRate 0.0124   Epoch: 16   Global Step: 167390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:51:49,007-Speed 5370.28 samples/sec   Loss 2.1048   LearningRate 0.0124   Epoch: 16   Global Step: 167400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:51:56,538-Speed 5439.67 samples/sec   Loss 2.1191   LearningRate 0.0123   Epoch: 16   Global Step: 167410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:52:04,110-Speed 5410.32 samples/sec   Loss 2.1176   LearningRate 0.0123   Epoch: 16   Global Step: 167420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:52:11,784-Speed 5338.42 samples/sec   Loss 2.1698   LearningRate 0.0123   Epoch: 16   Global Step: 167430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:52:19,285-Speed 5460.99 samples/sec   Loss 2.1178   LearningRate 0.0123   Epoch: 16   Global Step: 167440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:52:26,821-Speed 5435.36 samples/sec   Loss 2.1076   LearningRate 0.0123   Epoch: 16   Global Step: 167450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:52:34,277-Speed 5494.90 samples/sec   Loss 2.1446   LearningRate 0.0123   Epoch: 16   Global Step: 167460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:52:41,750-Speed 5481.52 samples/sec   Loss 2.1380   LearningRate 0.0123   Epoch: 16   Global Step: 167470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:52:49,308-Speed 5420.39 samples/sec   Loss 2.1689   LearningRate 0.0123   Epoch: 16   Global Step: 167480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:52:56,808-Speed 5462.07 samples/sec   Loss 2.1544   LearningRate 0.0123   Epoch: 16   Global Step: 167490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:53:04,339-Speed 5439.13 samples/sec   Loss 2.1096   LearningRate 0.0123   Epoch: 16   Global Step: 167500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:53:11,790-Speed 5497.93 samples/sec   Loss 2.1739   LearningRate 0.0123   Epoch: 16   Global Step: 167510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:53:19,232-Speed 5504.74 samples/sec   Loss 2.1415   LearningRate 0.0123   Epoch: 16   Global Step: 167520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:53:26,747-Speed 5451.26 samples/sec   Loss 2.1296   LearningRate 0.0123   Epoch: 16   Global Step: 167530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:53:34,203-Speed 5494.16 samples/sec   Loss 2.1598   LearningRate 0.0123   Epoch: 16   Global Step: 167540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:53:41,721-Speed 5449.18 samples/sec   Loss 2.1430   LearningRate 0.0123   Epoch: 16   Global Step: 167550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:53:49,411-Speed 5326.89 samples/sec   Loss 2.1760   LearningRate 0.0123   Epoch: 16   Global Step: 167560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:53:56,868-Speed 5493.61 samples/sec   Loss 2.1304   LearningRate 0.0122   Epoch: 16   Global Step: 167570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:54:04,470-Speed 5388.63 samples/sec   Loss 2.1295   LearningRate 0.0122   Epoch: 16   Global Step: 167580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:54:11,955-Speed 5473.36 samples/sec   Loss 2.1257   LearningRate 0.0122   Epoch: 16   Global Step: 167590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:54:19,496-Speed 5432.28 samples/sec   Loss 2.1241   LearningRate 0.0122   Epoch: 16   Global Step: 167600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:54:27,331-Speed 5228.21 samples/sec   Loss 2.0967   LearningRate 0.0122   Epoch: 16   Global Step: 167610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:54:34,988-Speed 5350.49 samples/sec   Loss 2.1481   LearningRate 0.0122   Epoch: 16   Global Step: 167620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:54:42,536-Speed 5426.92 samples/sec   Loss 2.1614   LearningRate 0.0122   Epoch: 16   Global Step: 167630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:54:50,012-Speed 5479.25 samples/sec   Loss 2.1335   LearningRate 0.0122   Epoch: 16   Global Step: 167640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:54:57,670-Speed 5349.34 samples/sec   Loss 2.1406   LearningRate 0.0122   Epoch: 16   Global Step: 167650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:55:05,173-Speed 5460.18 samples/sec   Loss 2.1368   LearningRate 0.0122   Epoch: 16   Global Step: 167660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:55:12,718-Speed 5429.28 samples/sec   Loss 2.1504   LearningRate 0.0122   Epoch: 16   Global Step: 167670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:55:20,154-Speed 5509.32 samples/sec   Loss 2.1290   LearningRate 0.0122   Epoch: 16   Global Step: 167680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:55:27,657-Speed 5460.11 samples/sec   Loss 2.1467   LearningRate 0.0122   Epoch: 16   Global Step: 167690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:55:35,262-Speed 5386.05 samples/sec   Loss 2.1165   LearningRate 0.0122   Epoch: 16   Global Step: 167700   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:55:42,876-Speed 5380.72 samples/sec   Loss 2.1296   LearningRate 0.0122   Epoch: 16   Global Step: 167710   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:55:50,504-Speed 5370.72 samples/sec   Loss 2.1181   LearningRate 0.0122   Epoch: 16   Global Step: 167720   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:55:58,087-Speed 5401.81 samples/sec   Loss 2.1338   LearningRate 0.0122   Epoch: 16   Global Step: 167730   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:56:05,573-Speed 5472.69 samples/sec   Loss 2.1188   LearningRate 0.0121   Epoch: 16   Global Step: 167740   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:56:12,972-Speed 5536.79 samples/sec   Loss 2.1342   LearningRate 0.0121   Epoch: 16   Global Step: 167750   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:56:20,407-Speed 5509.32 samples/sec   Loss 2.1225   LearningRate 0.0121   Epoch: 16   Global Step: 167760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:56:27,952-Speed 5430.05 samples/sec   Loss 2.1402   LearningRate 0.0121   Epoch: 16   Global Step: 167770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:56:35,478-Speed 5442.82 samples/sec   Loss 2.1125   LearningRate 0.0121   Epoch: 16   Global Step: 167780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:56:43,060-Speed 5403.13 samples/sec   Loss 2.1285   LearningRate 0.0121   Epoch: 16   Global Step: 167790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:56:50,643-Speed 5402.04 samples/sec   Loss 2.1120   LearningRate 0.0121   Epoch: 16   Global Step: 167800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:56:58,200-Speed 5421.46 samples/sec   Loss 2.0975   LearningRate 0.0121   Epoch: 16   Global Step: 167810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:57:05,748-Speed 5426.77 samples/sec   Loss 2.1372   LearningRate 0.0121   Epoch: 16   Global Step: 167820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:57:13,262-Speed 5452.20 samples/sec   Loss 2.1145   LearningRate 0.0121   Epoch: 16   Global Step: 167830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:57:20,796-Speed 5437.68 samples/sec   Loss 2.1185   LearningRate 0.0121   Epoch: 16   Global Step: 167840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:57:28,478-Speed 5332.93 samples/sec   Loss 2.1176   LearningRate 0.0121   Epoch: 16   Global Step: 167850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:57:36,009-Speed 5439.64 samples/sec   Loss 2.0911   LearningRate 0.0121   Epoch: 16   Global Step: 167860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:57:43,575-Speed 5414.58 samples/sec   Loss 2.1158   LearningRate 0.0121   Epoch: 16   Global Step: 167870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:57:51,100-Speed 5443.55 samples/sec   Loss 2.1276   LearningRate 0.0121   Epoch: 16   Global Step: 167880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:57:58,566-Speed 5487.46 samples/sec   Loss 2.0963   LearningRate 0.0121   Epoch: 16   Global Step: 167890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:58:06,054-Speed 5470.40 samples/sec   Loss 2.1436   LearningRate 0.0120   Epoch: 16   Global Step: 167900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:58:13,642-Speed 5399.35 samples/sec   Loss 2.0828   LearningRate 0.0120   Epoch: 16   Global Step: 167910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:58:21,142-Speed 5462.10 samples/sec   Loss 2.1309   LearningRate 0.0120   Epoch: 16   Global Step: 167920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:58:28,732-Speed 5397.26 samples/sec   Loss 2.1153   LearningRate 0.0120   Epoch: 16   Global Step: 167930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:58:36,182-Speed 5498.23 samples/sec   Loss 2.1227   LearningRate 0.0120   Epoch: 16   Global Step: 167940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 08:58:43,697-Speed 5451.65 samples/sec   Loss 2.1229   LearningRate 0.0120   Epoch: 16   Global Step: 167950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:58:51,191-Speed 5466.43 samples/sec   Loss 2.1044   LearningRate 0.0120   Epoch: 16   Global Step: 167960   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:58:58,731-Speed 5432.75 samples/sec   Loss 2.0950   LearningRate 0.0120   Epoch: 16   Global Step: 167970   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:59:06,259-Speed 5441.31 samples/sec   Loss 2.1492   LearningRate 0.0120   Epoch: 16   Global Step: 167980   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:59:13,769-Speed 5455.55 samples/sec   Loss 2.1429   LearningRate 0.0120   Epoch: 16   Global Step: 167990   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 08:59:21,384-Speed 5379.26 samples/sec   Loss 2.0970   LearningRate 0.0120   Epoch: 16   Global Step: 168000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:00:05,742-[lfw][168000]XNorm: 23.593189
Training: 2022-01-09 09:00:05,743-[lfw][168000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 09:00:05,743-[lfw][168000]Accuracy-Highest: 0.99833
Training: 2022-01-09 09:00:57,019-[cfp_fp][168000]XNorm: 22.541390
Training: 2022-01-09 09:00:57,020-[cfp_fp][168000]Accuracy-Flip: 0.99271+-0.00391
Training: 2022-01-09 09:00:57,020-[cfp_fp][168000]Accuracy-Highest: 0.99371
Training: 2022-01-09 09:01:40,995-[agedb_30][168000]XNorm: 23.996785
Training: 2022-01-09 09:01:40,995-[agedb_30][168000]Accuracy-Flip: 0.98217+-0.00803
Training: 2022-01-09 09:01:40,996-[agedb_30][168000]Accuracy-Highest: 0.98333
Training: 2022-01-09 09:01:48,657-Speed 278.13 samples/sec   Loss 2.1274   LearningRate 0.0120   Epoch: 16   Global Step: 168010   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:01:56,236-Speed 5404.91 samples/sec   Loss 2.1151   LearningRate 0.0120   Epoch: 16   Global Step: 168020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:02:03,732-Speed 5464.74 samples/sec   Loss 2.0981   LearningRate 0.0120   Epoch: 16   Global Step: 168030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:02:11,201-Speed 5484.96 samples/sec   Loss 2.0805   LearningRate 0.0120   Epoch: 16   Global Step: 168040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:02:18,693-Speed 5467.65 samples/sec   Loss 2.1023   LearningRate 0.0120   Epoch: 16   Global Step: 168050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:02:26,293-Speed 5390.47 samples/sec   Loss 2.1019   LearningRate 0.0119   Epoch: 16   Global Step: 168060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:02:33,783-Speed 5469.24 samples/sec   Loss 2.1162   LearningRate 0.0119   Epoch: 16   Global Step: 168070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:02:41,258-Speed 5480.32 samples/sec   Loss 2.1127   LearningRate 0.0119   Epoch: 16   Global Step: 168080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:02:48,794-Speed 5436.26 samples/sec   Loss 2.0928   LearningRate 0.0119   Epoch: 16   Global Step: 168090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:02:56,301-Speed 5456.69 samples/sec   Loss 2.1078   LearningRate 0.0119   Epoch: 16   Global Step: 168100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:03:03,983-Speed 5332.83 samples/sec   Loss 2.1430   LearningRate 0.0119   Epoch: 16   Global Step: 168110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:03:11,473-Speed 5469.31 samples/sec   Loss 2.1102   LearningRate 0.0119   Epoch: 16   Global Step: 168120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:03:19,011-Speed 5434.15 samples/sec   Loss 2.1275   LearningRate 0.0119   Epoch: 16   Global Step: 168130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:03:26,552-Speed 5432.45 samples/sec   Loss 2.1056   LearningRate 0.0119   Epoch: 16   Global Step: 168140   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:03:34,095-Speed 5431.13 samples/sec   Loss 2.1403   LearningRate 0.0119   Epoch: 16   Global Step: 168150   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:03:41,681-Speed 5400.08 samples/sec   Loss 2.0743   LearningRate 0.0119   Epoch: 16   Global Step: 168160   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:03:49,168-Speed 5470.98 samples/sec   Loss 2.1284   LearningRate 0.0119   Epoch: 16   Global Step: 168170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:03:56,726-Speed 5420.56 samples/sec   Loss 2.0917   LearningRate 0.0119   Epoch: 16   Global Step: 168180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:04:04,364-Speed 5363.19 samples/sec   Loss 2.1063   LearningRate 0.0119   Epoch: 16   Global Step: 168190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:04:11,935-Speed 5411.28 samples/sec   Loss 2.0988   LearningRate 0.0119   Epoch: 16   Global Step: 168200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:04:19,540-Speed 5385.95 samples/sec   Loss 2.1092   LearningRate 0.0119   Epoch: 16   Global Step: 168210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:04:27,064-Speed 5444.81 samples/sec   Loss 2.0712   LearningRate 0.0119   Epoch: 16   Global Step: 168220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:04:34,627-Speed 5416.34 samples/sec   Loss 2.1121   LearningRate 0.0118   Epoch: 16   Global Step: 168230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:04:42,133-Speed 5458.27 samples/sec   Loss 2.1064   LearningRate 0.0118   Epoch: 16   Global Step: 168240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:04:49,598-Speed 5487.23 samples/sec   Loss 2.1175   LearningRate 0.0118   Epoch: 16   Global Step: 168250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:04:57,096-Speed 5463.30 samples/sec   Loss 2.0790   LearningRate 0.0118   Epoch: 16   Global Step: 168260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:05:04,599-Speed 5460.33 samples/sec   Loss 2.1041   LearningRate 0.0118   Epoch: 16   Global Step: 168270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:05:11,993-Speed 5540.49 samples/sec   Loss 2.0947   LearningRate 0.0118   Epoch: 16   Global Step: 168280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:05:19,452-Speed 5491.91 samples/sec   Loss 2.1124   LearningRate 0.0118   Epoch: 16   Global Step: 168290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:05:27,082-Speed 5369.25 samples/sec   Loss 2.1021   LearningRate 0.0118   Epoch: 16   Global Step: 168300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:05:34,524-Speed 5504.77 samples/sec   Loss 2.1137   LearningRate 0.0118   Epoch: 16   Global Step: 168310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:05:42,107-Speed 5401.90 samples/sec   Loss 2.1359   LearningRate 0.0118   Epoch: 16   Global Step: 168320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:05:49,599-Speed 5467.93 samples/sec   Loss 2.0650   LearningRate 0.0118   Epoch: 16   Global Step: 168330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:05:57,155-Speed 5421.31 samples/sec   Loss 2.0514   LearningRate 0.0118   Epoch: 16   Global Step: 168340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:06:04,743-Speed 5399.65 samples/sec   Loss 2.1059   LearningRate 0.0118   Epoch: 16   Global Step: 168350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:06:12,406-Speed 5346.06 samples/sec   Loss 2.0999   LearningRate 0.0118   Epoch: 16   Global Step: 168360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:06:19,943-Speed 5434.66 samples/sec   Loss 2.1000   LearningRate 0.0118   Epoch: 16   Global Step: 168370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:06:27,507-Speed 5416.36 samples/sec   Loss 2.1240   LearningRate 0.0118   Epoch: 16   Global Step: 168380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:06:34,931-Speed 5517.82 samples/sec   Loss 2.0879   LearningRate 0.0118   Epoch: 16   Global Step: 168390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:06:42,366-Speed 5509.84 samples/sec   Loss 2.0952   LearningRate 0.0117   Epoch: 16   Global Step: 168400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:06:49,916-Speed 5426.36 samples/sec   Loss 2.0971   LearningRate 0.0117   Epoch: 16   Global Step: 168410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:06:57,347-Speed 5512.55 samples/sec   Loss 2.1009   LearningRate 0.0117   Epoch: 16   Global Step: 168420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:07:04,767-Speed 5520.64 samples/sec   Loss 2.0818   LearningRate 0.0117   Epoch: 16   Global Step: 168430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:07:12,166-Speed 5536.16 samples/sec   Loss 2.0974   LearningRate 0.0117   Epoch: 16   Global Step: 168440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:07:19,678-Speed 5453.78 samples/sec   Loss 2.1236   LearningRate 0.0117   Epoch: 16   Global Step: 168450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:07:27,109-Speed 5513.03 samples/sec   Loss 2.1292   LearningRate 0.0117   Epoch: 16   Global Step: 168460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:07:34,549-Speed 5505.68 samples/sec   Loss 2.0606   LearningRate 0.0117   Epoch: 16   Global Step: 168470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:07:42,038-Speed 5469.99 samples/sec   Loss 2.1088   LearningRate 0.0117   Epoch: 16   Global Step: 168480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:07:49,605-Speed 5414.23 samples/sec   Loss 2.1005   LearningRate 0.0117   Epoch: 16   Global Step: 168490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:07:57,044-Speed 5506.89 samples/sec   Loss 2.0795   LearningRate 0.0117   Epoch: 16   Global Step: 168500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:08:04,621-Speed 5406.37 samples/sec   Loss 2.0786   LearningRate 0.0117   Epoch: 16   Global Step: 168510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:08:12,114-Speed 5466.92 samples/sec   Loss 2.0911   LearningRate 0.0117   Epoch: 16   Global Step: 168520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:08:19,577-Speed 5489.36 samples/sec   Loss 2.0754   LearningRate 0.0117   Epoch: 16   Global Step: 168530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:08:27,049-Speed 5482.89 samples/sec   Loss 2.0998   LearningRate 0.0117   Epoch: 16   Global Step: 168540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:08:34,557-Speed 5457.34 samples/sec   Loss 2.0709   LearningRate 0.0117   Epoch: 16   Global Step: 168550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:08:42,027-Speed 5483.73 samples/sec   Loss 2.0557   LearningRate 0.0116   Epoch: 16   Global Step: 168560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:08:49,461-Speed 5510.57 samples/sec   Loss 2.0803   LearningRate 0.0116   Epoch: 16   Global Step: 168570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:08:56,903-Speed 5504.41 samples/sec   Loss 2.0591   LearningRate 0.0116   Epoch: 16   Global Step: 168580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:09:04,411-Speed 5456.74 samples/sec   Loss 2.0815   LearningRate 0.0116   Epoch: 16   Global Step: 168590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:09:11,969-Speed 5420.37 samples/sec   Loss 2.0624   LearningRate 0.0116   Epoch: 16   Global Step: 168600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:09:19,598-Speed 5369.86 samples/sec   Loss 2.0946   LearningRate 0.0116   Epoch: 16   Global Step: 168610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:09:27,131-Speed 5437.88 samples/sec   Loss 2.0616   LearningRate 0.0116   Epoch: 16   Global Step: 168620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:09:34,701-Speed 5411.76 samples/sec   Loss 2.0718   LearningRate 0.0116   Epoch: 16   Global Step: 168630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:09:42,202-Speed 5461.36 samples/sec   Loss 2.0682   LearningRate 0.0116   Epoch: 16   Global Step: 168640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:09:49,741-Speed 5434.08 samples/sec   Loss 2.0669   LearningRate 0.0116   Epoch: 16   Global Step: 168650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:09:57,222-Speed 5475.09 samples/sec   Loss 2.0598   LearningRate 0.0116   Epoch: 16   Global Step: 168660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:10:04,829-Speed 5385.84 samples/sec   Loss 2.1066   LearningRate 0.0116   Epoch: 16   Global Step: 168670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:10:12,365-Speed 5435.49 samples/sec   Loss 2.1029   LearningRate 0.0116   Epoch: 16   Global Step: 168680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:10:19,857-Speed 5467.85 samples/sec   Loss 2.1125   LearningRate 0.0116   Epoch: 16   Global Step: 168690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:10:27,417-Speed 5419.04 samples/sec   Loss 2.0947   LearningRate 0.0116   Epoch: 16   Global Step: 168700   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:10:34,906-Speed 5469.34 samples/sec   Loss 2.0877   LearningRate 0.0116   Epoch: 16   Global Step: 168710   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:10:42,442-Speed 5436.01 samples/sec   Loss 2.1104   LearningRate 0.0116   Epoch: 16   Global Step: 168720   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:10:49,953-Speed 5454.65 samples/sec   Loss 2.0695   LearningRate 0.0115   Epoch: 16   Global Step: 168730   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:10:57,466-Speed 5452.18 samples/sec   Loss 2.0608   LearningRate 0.0115   Epoch: 16   Global Step: 168740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:11:05,036-Speed 5411.53 samples/sec   Loss 2.0798   LearningRate 0.0115   Epoch: 16   Global Step: 168750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:11:12,558-Speed 5445.49 samples/sec   Loss 2.1203   LearningRate 0.0115   Epoch: 16   Global Step: 168760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:11:20,057-Speed 5463.71 samples/sec   Loss 2.0469   LearningRate 0.0115   Epoch: 16   Global Step: 168770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:11:27,566-Speed 5455.59 samples/sec   Loss 2.0825   LearningRate 0.0115   Epoch: 16   Global Step: 168780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:11:41,286-Speed 2985.47 samples/sec   Loss 2.1013   LearningRate 0.0115   Epoch: 16   Global Step: 168790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:11:48,870-Speed 5402.05 samples/sec   Loss 2.0613   LearningRate 0.0115   Epoch: 16   Global Step: 168800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:11:56,434-Speed 5416.17 samples/sec   Loss 2.0983   LearningRate 0.0115   Epoch: 16   Global Step: 168810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:12:03,988-Speed 5422.73 samples/sec   Loss 2.0535   LearningRate 0.0115   Epoch: 16   Global Step: 168820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:12:11,518-Speed 5440.01 samples/sec   Loss 2.0435   LearningRate 0.0115   Epoch: 16   Global Step: 168830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:12:19,055-Speed 5434.96 samples/sec   Loss 2.0975   LearningRate 0.0115   Epoch: 16   Global Step: 168840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:12:26,575-Speed 5448.00 samples/sec   Loss 2.0696   LearningRate 0.0115   Epoch: 16   Global Step: 168850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:12:34,099-Speed 5444.83 samples/sec   Loss 2.0743   LearningRate 0.0115   Epoch: 16   Global Step: 168860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:12:41,648-Speed 5426.58 samples/sec   Loss 2.0846   LearningRate 0.0115   Epoch: 16   Global Step: 168870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:12:49,212-Speed 5415.73 samples/sec   Loss 2.0522   LearningRate 0.0115   Epoch: 16   Global Step: 168880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:12:56,703-Speed 5469.23 samples/sec   Loss 2.0845   LearningRate 0.0115   Epoch: 16   Global Step: 168890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:13:04,234-Speed 5438.83 samples/sec   Loss 2.0800   LearningRate 0.0114   Epoch: 16   Global Step: 168900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:13:11,808-Speed 5408.87 samples/sec   Loss 2.1041   LearningRate 0.0114   Epoch: 16   Global Step: 168910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:13:19,249-Speed 5505.37 samples/sec   Loss 2.0527   LearningRate 0.0114   Epoch: 16   Global Step: 168920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:13:26,830-Speed 5403.69 samples/sec   Loss 2.0712   LearningRate 0.0114   Epoch: 16   Global Step: 168930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:13:34,488-Speed 5349.12 samples/sec   Loss 2.0201   LearningRate 0.0114   Epoch: 16   Global Step: 168940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:13:42,057-Speed 5412.92 samples/sec   Loss 2.0603   LearningRate 0.0114   Epoch: 16   Global Step: 168950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:13:49,676-Speed 5376.17 samples/sec   Loss 2.0734   LearningRate 0.0114   Epoch: 16   Global Step: 168960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:13:57,391-Speed 5310.10 samples/sec   Loss 2.0802   LearningRate 0.0114   Epoch: 16   Global Step: 168970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:14:05,131-Speed 5292.65 samples/sec   Loss 2.1039   LearningRate 0.0114   Epoch: 16   Global Step: 168980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:14:12,893-Speed 5277.49 samples/sec   Loss 2.0814   LearningRate 0.0114   Epoch: 16   Global Step: 168990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:14:20,417-Speed 5444.50 samples/sec   Loss 2.0562   LearningRate 0.0114   Epoch: 16   Global Step: 169000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 09:14:27,909-Speed 5467.79 samples/sec   Loss 2.0768   LearningRate 0.0114   Epoch: 16   Global Step: 169010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:14:35,365-Speed 5494.63 samples/sec   Loss 2.0718   LearningRate 0.0114   Epoch: 16   Global Step: 169020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:14:42,893-Speed 5441.73 samples/sec   Loss 2.1098   LearningRate 0.0114   Epoch: 16   Global Step: 169030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:14:50,373-Speed 5476.11 samples/sec   Loss 2.0412   LearningRate 0.0114   Epoch: 16   Global Step: 169040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:14:57,800-Speed 5516.65 samples/sec   Loss 2.0956   LearningRate 0.0114   Epoch: 16   Global Step: 169050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:15:05,279-Speed 5477.16 samples/sec   Loss 2.0740   LearningRate 0.0113   Epoch: 16   Global Step: 169060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:15:12,842-Speed 5416.50 samples/sec   Loss 2.0730   LearningRate 0.0113   Epoch: 16   Global Step: 169070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:15:20,317-Speed 5479.91 samples/sec   Loss 2.0551   LearningRate 0.0113   Epoch: 16   Global Step: 169080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:15:27,755-Speed 5507.90 samples/sec   Loss 2.1004   LearningRate 0.0113   Epoch: 16   Global Step: 169090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:15:35,204-Speed 5499.23 samples/sec   Loss 2.0587   LearningRate 0.0113   Epoch: 16   Global Step: 169100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:15:42,626-Speed 5519.25 samples/sec   Loss 2.0784   LearningRate 0.0113   Epoch: 16   Global Step: 169110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:15:50,033-Speed 5530.66 samples/sec   Loss 2.0575   LearningRate 0.0113   Epoch: 16   Global Step: 169120   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:15:57,678-Speed 5359.08 samples/sec   Loss 2.0589   LearningRate 0.0113   Epoch: 16   Global Step: 169130   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:16:05,294-Speed 5379.03 samples/sec   Loss 2.0758   LearningRate 0.0113   Epoch: 16   Global Step: 169140   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:16:12,822-Speed 5441.40 samples/sec   Loss 2.0750   LearningRate 0.0113   Epoch: 16   Global Step: 169150   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:16:20,328-Speed 5457.28 samples/sec   Loss 2.0682   LearningRate 0.0113   Epoch: 16   Global Step: 169160   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:16:27,791-Speed 5489.82 samples/sec   Loss 2.0553   LearningRate 0.0113   Epoch: 16   Global Step: 169170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:16:35,290-Speed 5462.59 samples/sec   Loss 2.0671   LearningRate 0.0113   Epoch: 16   Global Step: 169180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:16:42,793-Speed 5459.41 samples/sec   Loss 2.0318   LearningRate 0.0113   Epoch: 16   Global Step: 169190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:16:50,382-Speed 5397.99 samples/sec   Loss 2.0563   LearningRate 0.0113   Epoch: 16   Global Step: 169200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:16:57,828-Speed 5502.24 samples/sec   Loss 2.0141   LearningRate 0.0113   Epoch: 16   Global Step: 169210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:17:05,419-Speed 5396.70 samples/sec   Loss 2.0384   LearningRate 0.0113   Epoch: 16   Global Step: 169220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:17:12,850-Speed 5512.90 samples/sec   Loss 2.0441   LearningRate 0.0112   Epoch: 16   Global Step: 169230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:17:20,397-Speed 5427.29 samples/sec   Loss 2.0759   LearningRate 0.0112   Epoch: 16   Global Step: 169240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:17:27,825-Speed 5515.35 samples/sec   Loss 2.0586   LearningRate 0.0112   Epoch: 16   Global Step: 169250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:17:35,280-Speed 5495.46 samples/sec   Loss 2.0337   LearningRate 0.0112   Epoch: 16   Global Step: 169260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:17:42,851-Speed 5410.87 samples/sec   Loss 2.0712   LearningRate 0.0112   Epoch: 16   Global Step: 169270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:17:50,485-Speed 5366.06 samples/sec   Loss 2.0381   LearningRate 0.0112   Epoch: 16   Global Step: 169280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:17:58,347-Speed 5210.70 samples/sec   Loss 2.0413   LearningRate 0.0112   Epoch: 16   Global Step: 169290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:18:05,943-Speed 5393.00 samples/sec   Loss 2.0265   LearningRate 0.0112   Epoch: 16   Global Step: 169300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:18:13,538-Speed 5393.94 samples/sec   Loss 2.0329   LearningRate 0.0112   Epoch: 16   Global Step: 169310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:18:21,027-Speed 5470.18 samples/sec   Loss 2.0626   LearningRate 0.0112   Epoch: 16   Global Step: 169320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:18:28,665-Speed 5363.18 samples/sec   Loss 2.0117   LearningRate 0.0112   Epoch: 16   Global Step: 169330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:18:36,197-Speed 5438.40 samples/sec   Loss 2.0525   LearningRate 0.0112   Epoch: 16   Global Step: 169340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:18:43,771-Speed 5408.69 samples/sec   Loss 2.0268   LearningRate 0.0112   Epoch: 16   Global Step: 169350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:18:51,259-Speed 5470.84 samples/sec   Loss 2.0279   LearningRate 0.0112   Epoch: 16   Global Step: 169360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:18:58,698-Speed 5506.83 samples/sec   Loss 2.0279   LearningRate 0.0112   Epoch: 16   Global Step: 169370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:19:06,165-Speed 5486.45 samples/sec   Loss 2.0162   LearningRate 0.0112   Epoch: 16   Global Step: 169380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:19:13,666-Speed 5461.06 samples/sec   Loss 2.0504   LearningRate 0.0112   Epoch: 16   Global Step: 169390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:19:21,122-Speed 5494.41 samples/sec   Loss 2.0830   LearningRate 0.0111   Epoch: 16   Global Step: 169400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:19:28,748-Speed 5372.08 samples/sec   Loss 2.0226   LearningRate 0.0111   Epoch: 16   Global Step: 169410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:19:36,167-Speed 5521.40 samples/sec   Loss 1.9912   LearningRate 0.0111   Epoch: 16   Global Step: 169420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:19:43,707-Speed 5433.06 samples/sec   Loss 2.0554   LearningRate 0.0111   Epoch: 16   Global Step: 169430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:19:51,229-Speed 5446.04 samples/sec   Loss 2.0747   LearningRate 0.0111   Epoch: 16   Global Step: 169440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:19:58,682-Speed 5496.16 samples/sec   Loss 2.0188   LearningRate 0.0111   Epoch: 16   Global Step: 169450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:20:06,132-Speed 5498.87 samples/sec   Loss 2.0491   LearningRate 0.0111   Epoch: 16   Global Step: 169460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:20:13,669-Speed 5435.44 samples/sec   Loss 2.0543   LearningRate 0.0111   Epoch: 16   Global Step: 169470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:20:21,145-Speed 5479.30 samples/sec   Loss 2.0226   LearningRate 0.0111   Epoch: 16   Global Step: 169480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:20:28,608-Speed 5489.18 samples/sec   Loss 2.0903   LearningRate 0.0111   Epoch: 16   Global Step: 169490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:20:36,129-Speed 5446.79 samples/sec   Loss 2.0478   LearningRate 0.0111   Epoch: 16   Global Step: 169500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:20:43,726-Speed 5392.63 samples/sec   Loss 2.0416   LearningRate 0.0111   Epoch: 16   Global Step: 169510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:20:51,418-Speed 5325.51 samples/sec   Loss 2.0429   LearningRate 0.0111   Epoch: 16   Global Step: 169520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:20:58,883-Speed 5487.87 samples/sec   Loss 2.0484   LearningRate 0.0111   Epoch: 16   Global Step: 169530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:21:06,314-Speed 5518.38 samples/sec   Loss 2.0011   LearningRate 0.0111   Epoch: 16   Global Step: 169540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:21:13,770-Speed 5494.31 samples/sec   Loss 2.0429   LearningRate 0.0111   Epoch: 16   Global Step: 169550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:21:21,294-Speed 5444.39 samples/sec   Loss 2.0080   LearningRate 0.0111   Epoch: 16   Global Step: 169560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:21:28,786-Speed 5468.48 samples/sec   Loss 2.0490   LearningRate 0.0110   Epoch: 16   Global Step: 169570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:21:36,214-Speed 5515.00 samples/sec   Loss 2.0493   LearningRate 0.0110   Epoch: 16   Global Step: 169580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:21:43,677-Speed 5488.95 samples/sec   Loss 2.0369   LearningRate 0.0110   Epoch: 16   Global Step: 169590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:21:51,145-Speed 5485.92 samples/sec   Loss 2.0699   LearningRate 0.0110   Epoch: 16   Global Step: 169600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:21:58,702-Speed 5420.64 samples/sec   Loss 2.0364   LearningRate 0.0110   Epoch: 16   Global Step: 169610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:22:06,242-Speed 5432.95 samples/sec   Loss 2.0374   LearningRate 0.0110   Epoch: 16   Global Step: 169620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:22:13,663-Speed 5521.00 samples/sec   Loss 2.0392   LearningRate 0.0110   Epoch: 16   Global Step: 169630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:22:21,257-Speed 5394.44 samples/sec   Loss 2.0241   LearningRate 0.0110   Epoch: 16   Global Step: 169640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:22:28,810-Speed 5423.95 samples/sec   Loss 2.0409   LearningRate 0.0110   Epoch: 16   Global Step: 169650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:22:36,406-Speed 5392.61 samples/sec   Loss 2.0178   LearningRate 0.0110   Epoch: 16   Global Step: 169660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:22:43,994-Speed 5398.58 samples/sec   Loss 1.9938   LearningRate 0.0110   Epoch: 16   Global Step: 169670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 09:22:51,518-Speed 5444.58 samples/sec   Loss 2.0338   LearningRate 0.0110   Epoch: 16   Global Step: 169680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:22:59,064-Speed 5429.28 samples/sec   Loss 2.0461   LearningRate 0.0110   Epoch: 16   Global Step: 169690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:23:06,670-Speed 5385.85 samples/sec   Loss 2.0566   LearningRate 0.0110   Epoch: 16   Global Step: 169700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:23:14,116-Speed 5501.39 samples/sec   Loss 2.0576   LearningRate 0.0110   Epoch: 16   Global Step: 169710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:23:21,554-Speed 5507.76 samples/sec   Loss 2.0471   LearningRate 0.0110   Epoch: 16   Global Step: 169720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:23:29,122-Speed 5413.32 samples/sec   Loss 1.9983   LearningRate 0.0110   Epoch: 16   Global Step: 169730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:23:36,564-Speed 5504.59 samples/sec   Loss 2.0206   LearningRate 0.0110   Epoch: 16   Global Step: 169740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:23:44,055-Speed 5468.84 samples/sec   Loss 1.9975   LearningRate 0.0109   Epoch: 16   Global Step: 169750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:23:51,544-Speed 5470.05 samples/sec   Loss 1.9987   LearningRate 0.0109   Epoch: 16   Global Step: 169760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:23:59,063-Speed 5448.72 samples/sec   Loss 2.0237   LearningRate 0.0109   Epoch: 16   Global Step: 169770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:24:06,571-Speed 5455.82 samples/sec   Loss 2.0404   LearningRate 0.0109   Epoch: 16   Global Step: 169780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:24:14,064-Speed 5467.35 samples/sec   Loss 2.0490   LearningRate 0.0109   Epoch: 16   Global Step: 169790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:24:21,580-Speed 5450.32 samples/sec   Loss 2.0457   LearningRate 0.0109   Epoch: 16   Global Step: 169800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:24:29,056-Speed 5480.44 samples/sec   Loss 1.9761   LearningRate 0.0109   Epoch: 16   Global Step: 169810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:24:36,608-Speed 5424.19 samples/sec   Loss 2.0341   LearningRate 0.0109   Epoch: 16   Global Step: 169820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:24:44,161-Speed 5423.52 samples/sec   Loss 2.0307   LearningRate 0.0109   Epoch: 16   Global Step: 169830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:24:51,831-Speed 5340.51 samples/sec   Loss 2.0383   LearningRate 0.0109   Epoch: 16   Global Step: 169840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:24:59,329-Speed 5463.90 samples/sec   Loss 2.0333   LearningRate 0.0109   Epoch: 16   Global Step: 169850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:25:06,831-Speed 5460.88 samples/sec   Loss 1.9931   LearningRate 0.0109   Epoch: 16   Global Step: 169860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:25:14,303-Speed 5482.24 samples/sec   Loss 2.0122   LearningRate 0.0109   Epoch: 16   Global Step: 169870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:25:21,849-Speed 5428.50 samples/sec   Loss 2.0106   LearningRate 0.0109   Epoch: 16   Global Step: 169880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:25:29,352-Speed 5460.41 samples/sec   Loss 2.0376   LearningRate 0.0109   Epoch: 16   Global Step: 169890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:25:36,955-Speed 5388.30 samples/sec   Loss 2.0324   LearningRate 0.0109   Epoch: 16   Global Step: 169900   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:25:44,495-Speed 5433.00 samples/sec   Loss 2.0201   LearningRate 0.0109   Epoch: 16   Global Step: 169910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:25:51,948-Speed 5496.36 samples/sec   Loss 2.0014   LearningRate 0.0108   Epoch: 16   Global Step: 169920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:25:59,498-Speed 5425.99 samples/sec   Loss 2.0113   LearningRate 0.0108   Epoch: 16   Global Step: 169930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:26:06,990-Speed 5468.19 samples/sec   Loss 2.0069   LearningRate 0.0108   Epoch: 16   Global Step: 169940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:26:14,490-Speed 5461.82 samples/sec   Loss 2.0007   LearningRate 0.0108   Epoch: 16   Global Step: 169950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:26:21,908-Speed 5522.24 samples/sec   Loss 2.0230   LearningRate 0.0108   Epoch: 16   Global Step: 169960   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:26:29,470-Speed 5417.34 samples/sec   Loss 2.0136   LearningRate 0.0108   Epoch: 16   Global Step: 169970   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:26:36,997-Speed 5442.77 samples/sec   Loss 2.0244   LearningRate 0.0108   Epoch: 16   Global Step: 169980   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:26:44,559-Speed 5416.65 samples/sec   Loss 2.0390   LearningRate 0.0108   Epoch: 16   Global Step: 169990   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:26:52,006-Speed 5500.92 samples/sec   Loss 2.0178   LearningRate 0.0108   Epoch: 16   Global Step: 170000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:27:35,747-[lfw][170000]XNorm: 22.682219
Training: 2022-01-09 09:27:35,747-[lfw][170000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 09:27:35,748-[lfw][170000]Accuracy-Highest: 0.99833
Training: 2022-01-09 09:28:26,944-[cfp_fp][170000]XNorm: 21.675479
Training: 2022-01-09 09:28:26,945-[cfp_fp][170000]Accuracy-Flip: 0.99257+-0.00360
Training: 2022-01-09 09:28:26,945-[cfp_fp][170000]Accuracy-Highest: 0.99371
Training: 2022-01-09 09:29:10,895-[agedb_30][170000]XNorm: 23.066802
Training: 2022-01-09 09:29:10,896-[agedb_30][170000]Accuracy-Flip: 0.98117+-0.00663
Training: 2022-01-09 09:29:10,897-[agedb_30][170000]Accuracy-Highest: 0.98333
Training: 2022-01-09 09:29:18,498-Speed 279.61 samples/sec   Loss 2.0017   LearningRate 0.0108   Epoch: 16   Global Step: 170010   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:29:25,968-Speed 5484.20 samples/sec   Loss 2.0038   LearningRate 0.0108   Epoch: 16   Global Step: 170020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:29:33,494-Speed 5443.54 samples/sec   Loss 2.0244   LearningRate 0.0108   Epoch: 16   Global Step: 170030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:29:40,973-Speed 5477.50 samples/sec   Loss 2.0279   LearningRate 0.0108   Epoch: 16   Global Step: 170040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:29:48,496-Speed 5445.08 samples/sec   Loss 2.0393   LearningRate 0.0108   Epoch: 16   Global Step: 170050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:29:55,996-Speed 5462.05 samples/sec   Loss 1.9943   LearningRate 0.0108   Epoch: 16   Global Step: 170060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:30:03,530-Speed 5437.70 samples/sec   Loss 2.0359   LearningRate 0.0108   Epoch: 16   Global Step: 170070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:30:11,013-Speed 5473.92 samples/sec   Loss 1.9854   LearningRate 0.0108   Epoch: 16   Global Step: 170080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:30:18,481-Speed 5485.18 samples/sec   Loss 2.0309   LearningRate 0.0107   Epoch: 16   Global Step: 170090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:30:25,920-Speed 5507.02 samples/sec   Loss 2.0189   LearningRate 0.0107   Epoch: 16   Global Step: 170100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:30:33,426-Speed 5458.11 samples/sec   Loss 2.0028   LearningRate 0.0107   Epoch: 16   Global Step: 170110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:30:40,976-Speed 5425.82 samples/sec   Loss 1.9790   LearningRate 0.0107   Epoch: 16   Global Step: 170120   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:30:48,528-Speed 5424.78 samples/sec   Loss 2.0058   LearningRate 0.0107   Epoch: 16   Global Step: 170130   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:30:56,090-Speed 5417.24 samples/sec   Loss 1.9700   LearningRate 0.0107   Epoch: 16   Global Step: 170140   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:31:03,643-Speed 5423.74 samples/sec   Loss 1.9915   LearningRate 0.0107   Epoch: 16   Global Step: 170150   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:31:11,207-Speed 5416.05 samples/sec   Loss 2.0456   LearningRate 0.0107   Epoch: 16   Global Step: 170160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:31:18,734-Speed 5442.01 samples/sec   Loss 1.9847   LearningRate 0.0107   Epoch: 16   Global Step: 170170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:31:26,373-Speed 5362.56 samples/sec   Loss 2.0005   LearningRate 0.0107   Epoch: 16   Global Step: 170180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:31:33,951-Speed 5406.74 samples/sec   Loss 2.0075   LearningRate 0.0107   Epoch: 16   Global Step: 170190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:31:41,512-Speed 5417.54 samples/sec   Loss 2.0058   LearningRate 0.0107   Epoch: 16   Global Step: 170200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:31:49,050-Speed 5434.25 samples/sec   Loss 2.0077   LearningRate 0.0107   Epoch: 16   Global Step: 170210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:31:56,628-Speed 5405.75 samples/sec   Loss 1.9956   LearningRate 0.0107   Epoch: 16   Global Step: 170220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:32:04,192-Speed 5416.57 samples/sec   Loss 2.0417   LearningRate 0.0107   Epoch: 16   Global Step: 170230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:32:11,767-Speed 5407.56 samples/sec   Loss 2.0093   LearningRate 0.0107   Epoch: 16   Global Step: 170240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 09:32:19,386-Speed 5376.57 samples/sec   Loss 1.9600   LearningRate 0.0107   Epoch: 16   Global Step: 170250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:32:26,879-Speed 5467.16 samples/sec   Loss 2.0073   LearningRate 0.0107   Epoch: 16   Global Step: 170260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:32:34,364-Speed 5473.51 samples/sec   Loss 1.9850   LearningRate 0.0106   Epoch: 16   Global Step: 170270   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:32:41,900-Speed 5435.54 samples/sec   Loss 1.9949   LearningRate 0.0106   Epoch: 16   Global Step: 170280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:32:49,454-Speed 5422.81 samples/sec   Loss 2.0084   LearningRate 0.0106   Epoch: 16   Global Step: 170290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 09:32:57,041-Speed 5399.63 samples/sec   Loss 1.9952   LearningRate 0.0106   Epoch: 16   Global Step: 170300   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:33:04,665-Speed 5373.53 samples/sec   Loss 2.0087   LearningRate 0.0106   Epoch: 16   Global Step: 170310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:33:12,228-Speed 5416.74 samples/sec   Loss 1.9723   LearningRate 0.0106   Epoch: 16   Global Step: 170320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:33:19,740-Speed 5453.19 samples/sec   Loss 2.0079   LearningRate 0.0106   Epoch: 16   Global Step: 170330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:33:27,374-Speed 5366.05 samples/sec   Loss 2.0209   LearningRate 0.0106   Epoch: 16   Global Step: 170340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:33:35,102-Speed 5300.99 samples/sec   Loss 1.9805   LearningRate 0.0106   Epoch: 16   Global Step: 170350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:33:42,652-Speed 5426.23 samples/sec   Loss 1.9731   LearningRate 0.0106   Epoch: 16   Global Step: 170360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:33:50,243-Speed 5396.83 samples/sec   Loss 1.9977   LearningRate 0.0106   Epoch: 16   Global Step: 170370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:33:57,785-Speed 5431.57 samples/sec   Loss 1.9954   LearningRate 0.0106   Epoch: 16   Global Step: 170380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:34:05,317-Speed 5438.72 samples/sec   Loss 2.0418   LearningRate 0.0106   Epoch: 16   Global Step: 170390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:34:12,836-Speed 5449.07 samples/sec   Loss 1.9684   LearningRate 0.0106   Epoch: 16   Global Step: 170400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:34:20,372-Speed 5435.37 samples/sec   Loss 1.9983   LearningRate 0.0106   Epoch: 16   Global Step: 170410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:34:27,903-Speed 5439.98 samples/sec   Loss 1.9831   LearningRate 0.0106   Epoch: 16   Global Step: 170420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:34:35,420-Speed 5449.64 samples/sec   Loss 1.9796   LearningRate 0.0106   Epoch: 16   Global Step: 170430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:34:42,945-Speed 5444.14 samples/sec   Loss 1.9820   LearningRate 0.0105   Epoch: 16   Global Step: 170440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:34:50,528-Speed 5401.85 samples/sec   Loss 1.9999   LearningRate 0.0105   Epoch: 16   Global Step: 170450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:34:58,045-Speed 5449.39 samples/sec   Loss 1.9697   LearningRate 0.0105   Epoch: 16   Global Step: 170460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:35:05,563-Speed 5449.30 samples/sec   Loss 1.9804   LearningRate 0.0105   Epoch: 16   Global Step: 170470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:35:13,093-Speed 5440.64 samples/sec   Loss 1.9600   LearningRate 0.0105   Epoch: 16   Global Step: 170480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:35:20,592-Speed 5463.49 samples/sec   Loss 1.9694   LearningRate 0.0105   Epoch: 16   Global Step: 170490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:35:28,071-Speed 5477.10 samples/sec   Loss 1.9845   LearningRate 0.0105   Epoch: 16   Global Step: 170500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:35:35,557-Speed 5472.30 samples/sec   Loss 2.0031   LearningRate 0.0105   Epoch: 16   Global Step: 170510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:35:43,069-Speed 5453.49 samples/sec   Loss 1.9885   LearningRate 0.0105   Epoch: 16   Global Step: 170520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:35:50,598-Speed 5441.64 samples/sec   Loss 1.9615   LearningRate 0.0105   Epoch: 16   Global Step: 170530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:35:58,212-Speed 5379.47 samples/sec   Loss 1.9625   LearningRate 0.0105   Epoch: 16   Global Step: 170540   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:36:05,748-Speed 5436.04 samples/sec   Loss 1.9839   LearningRate 0.0105   Epoch: 16   Global Step: 170550   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:36:13,234-Speed 5472.30 samples/sec   Loss 1.9884   LearningRate 0.0105   Epoch: 16   Global Step: 170560   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:36:20,689-Speed 5495.27 samples/sec   Loss 1.9737   LearningRate 0.0105   Epoch: 16   Global Step: 170570   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:36:28,216-Speed 5441.82 samples/sec   Loss 1.9570   LearningRate 0.0105   Epoch: 16   Global Step: 170580   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:36:35,729-Speed 5452.55 samples/sec   Loss 1.9824   LearningRate 0.0105   Epoch: 16   Global Step: 170590   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:36:43,226-Speed 5464.70 samples/sec   Loss 1.9771   LearningRate 0.0105   Epoch: 16   Global Step: 170600   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:36:50,714-Speed 5471.09 samples/sec   Loss 1.9727   LearningRate 0.0105   Epoch: 16   Global Step: 170610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:36:58,275-Speed 5417.21 samples/sec   Loss 1.9932   LearningRate 0.0104   Epoch: 16   Global Step: 170620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:37:05,870-Speed 5394.02 samples/sec   Loss 2.0212   LearningRate 0.0104   Epoch: 16   Global Step: 170630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:37:13,365-Speed 5465.39 samples/sec   Loss 1.9820   LearningRate 0.0104   Epoch: 16   Global Step: 170640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:37:20,908-Speed 5431.08 samples/sec   Loss 2.0011   LearningRate 0.0104   Epoch: 16   Global Step: 170650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:37:28,382-Speed 5481.53 samples/sec   Loss 1.9603   LearningRate 0.0104   Epoch: 16   Global Step: 170660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:37:35,927-Speed 5429.08 samples/sec   Loss 1.9935   LearningRate 0.0104   Epoch: 16   Global Step: 170670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:37:43,584-Speed 5350.33 samples/sec   Loss 1.9993   LearningRate 0.0104   Epoch: 16   Global Step: 170680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:37:51,269-Speed 5330.26 samples/sec   Loss 2.0194   LearningRate 0.0104   Epoch: 16   Global Step: 170690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:37:58,818-Speed 5426.92 samples/sec   Loss 1.9858   LearningRate 0.0104   Epoch: 16   Global Step: 170700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:38:06,397-Speed 5405.41 samples/sec   Loss 1.9955   LearningRate 0.0104   Epoch: 16   Global Step: 170710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:38:13,948-Speed 5424.90 samples/sec   Loss 1.9801   LearningRate 0.0104   Epoch: 16   Global Step: 170720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:38:21,542-Speed 5394.90 samples/sec   Loss 1.9858   LearningRate 0.0104   Epoch: 16   Global Step: 170730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:38:29,115-Speed 5409.70 samples/sec   Loss 1.9797   LearningRate 0.0104   Epoch: 16   Global Step: 170740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:38:36,611-Speed 5464.46 samples/sec   Loss 1.9915   LearningRate 0.0104   Epoch: 16   Global Step: 170750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:38:44,071-Speed 5491.24 samples/sec   Loss 1.9665   LearningRate 0.0104   Epoch: 16   Global Step: 170760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:38:51,677-Speed 5385.96 samples/sec   Loss 1.9606   LearningRate 0.0104   Epoch: 16   Global Step: 170770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:38:59,100-Speed 5519.42 samples/sec   Loss 1.9901   LearningRate 0.0104   Epoch: 16   Global Step: 170780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:39:06,649-Speed 5426.23 samples/sec   Loss 1.9535   LearningRate 0.0103   Epoch: 16   Global Step: 170790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:39:14,084-Speed 5509.84 samples/sec   Loss 1.9681   LearningRate 0.0103   Epoch: 16   Global Step: 170800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:39:21,531-Speed 5501.35 samples/sec   Loss 1.9854   LearningRate 0.0103   Epoch: 16   Global Step: 170810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:39:29,079-Speed 5426.88 samples/sec   Loss 1.9919   LearningRate 0.0103   Epoch: 16   Global Step: 170820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:39:36,623-Speed 5430.54 samples/sec   Loss 1.9583   LearningRate 0.0103   Epoch: 16   Global Step: 170830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 09:39:44,049-Speed 5516.78 samples/sec   Loss 1.9640   LearningRate 0.0103   Epoch: 16   Global Step: 170840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:39:51,592-Speed 5430.70 samples/sec   Loss 2.0002   LearningRate 0.0103   Epoch: 16   Global Step: 170850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:39:59,124-Speed 5439.01 samples/sec   Loss 1.9388   LearningRate 0.0103   Epoch: 16   Global Step: 170860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:40:06,710-Speed 5399.41 samples/sec   Loss 1.9647   LearningRate 0.0103   Epoch: 16   Global Step: 170870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:40:14,257-Speed 5428.55 samples/sec   Loss 1.9879   LearningRate 0.0103   Epoch: 16   Global Step: 170880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:40:21,721-Speed 5488.58 samples/sec   Loss 1.9709   LearningRate 0.0103   Epoch: 16   Global Step: 170890   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:40:29,262-Speed 5432.48 samples/sec   Loss 1.9884   LearningRate 0.0103   Epoch: 16   Global Step: 170900   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:40:36,771-Speed 5455.33 samples/sec   Loss 1.9826   LearningRate 0.0103   Epoch: 16   Global Step: 170910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:40:44,200-Speed 5514.37 samples/sec   Loss 1.9383   LearningRate 0.0103   Epoch: 16   Global Step: 170920   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:40:51,750-Speed 5425.94 samples/sec   Loss 1.9586   LearningRate 0.0103   Epoch: 16   Global Step: 170930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:40:59,255-Speed 5458.36 samples/sec   Loss 1.9814   LearningRate 0.0103   Epoch: 16   Global Step: 170940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:41:06,904-Speed 5355.48 samples/sec   Loss 1.9653   LearningRate 0.0103   Epoch: 16   Global Step: 170950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:41:14,468-Speed 5416.13 samples/sec   Loss 1.9698   LearningRate 0.0103   Epoch: 16   Global Step: 170960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:41:21,978-Speed 5454.90 samples/sec   Loss 1.9758   LearningRate 0.0102   Epoch: 16   Global Step: 170970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:41:29,486-Speed 5455.98 samples/sec   Loss 1.9844   LearningRate 0.0102   Epoch: 16   Global Step: 170980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:41:37,004-Speed 5449.31 samples/sec   Loss 1.9610   LearningRate 0.0102   Epoch: 16   Global Step: 170990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:41:44,585-Speed 5403.15 samples/sec   Loss 1.9716   LearningRate 0.0102   Epoch: 16   Global Step: 171000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:41:52,085-Speed 5462.18 samples/sec   Loss 1.9212   LearningRate 0.0102   Epoch: 16   Global Step: 171010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:41:59,678-Speed 5395.22 samples/sec   Loss 1.9626   LearningRate 0.0102   Epoch: 16   Global Step: 171020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:42:07,239-Speed 5417.60 samples/sec   Loss 1.9625   LearningRate 0.0102   Epoch: 16   Global Step: 171030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:42:14,819-Speed 5404.45 samples/sec   Loss 1.9410   LearningRate 0.0102   Epoch: 16   Global Step: 171040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:42:22,346-Speed 5442.64 samples/sec   Loss 1.9873   LearningRate 0.0102   Epoch: 16   Global Step: 171050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:42:29,900-Speed 5423.28 samples/sec   Loss 1.9581   LearningRate 0.0102   Epoch: 16   Global Step: 171060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:42:37,437-Speed 5434.68 samples/sec   Loss 1.9465   LearningRate 0.0102   Epoch: 16   Global Step: 171070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:42:44,946-Speed 5455.47 samples/sec   Loss 2.0050   LearningRate 0.0102   Epoch: 16   Global Step: 171080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:42:52,476-Speed 5440.74 samples/sec   Loss 1.9275   LearningRate 0.0102   Epoch: 16   Global Step: 171090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:42:59,971-Speed 5465.69 samples/sec   Loss 1.9848   LearningRate 0.0102   Epoch: 16   Global Step: 171100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:43:07,532-Speed 5417.79 samples/sec   Loss 1.9607   LearningRate 0.0102   Epoch: 16   Global Step: 171110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:43:14,989-Speed 5493.74 samples/sec   Loss 1.9523   LearningRate 0.0102   Epoch: 16   Global Step: 171120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:43:22,439-Speed 5498.41 samples/sec   Loss 1.9218   LearningRate 0.0102   Epoch: 16   Global Step: 171130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:43:29,888-Speed 5499.54 samples/sec   Loss 1.9779   LearningRate 0.0102   Epoch: 16   Global Step: 171140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:43:37,438-Speed 5426.24 samples/sec   Loss 1.9627   LearningRate 0.0101   Epoch: 16   Global Step: 171150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:43:44,920-Speed 5475.16 samples/sec   Loss 1.9629   LearningRate 0.0101   Epoch: 16   Global Step: 171160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:43:52,559-Speed 5362.01 samples/sec   Loss 1.9930   LearningRate 0.0101   Epoch: 16   Global Step: 171170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:44:00,218-Speed 5348.95 samples/sec   Loss 1.9359   LearningRate 0.0101   Epoch: 16   Global Step: 171180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:44:07,804-Speed 5400.18 samples/sec   Loss 1.9123   LearningRate 0.0101   Epoch: 16   Global Step: 171190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:44:15,282-Speed 5477.66 samples/sec   Loss 1.9301   LearningRate 0.0101   Epoch: 16   Global Step: 171200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:44:22,840-Speed 5420.72 samples/sec   Loss 1.9813   LearningRate 0.0101   Epoch: 16   Global Step: 171210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:44:30,332-Speed 5468.23 samples/sec   Loss 1.9495   LearningRate 0.0101   Epoch: 16   Global Step: 171220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:44:37,881-Speed 5426.16 samples/sec   Loss 1.9274   LearningRate 0.0101   Epoch: 16   Global Step: 171230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:44:45,398-Speed 5449.77 samples/sec   Loss 1.9545   LearningRate 0.0101   Epoch: 16   Global Step: 171240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:44:52,931-Speed 5438.15 samples/sec   Loss 1.9473   LearningRate 0.0101   Epoch: 16   Global Step: 171250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:45:00,465-Speed 5437.86 samples/sec   Loss 1.9590   LearningRate 0.0101   Epoch: 16   Global Step: 171260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:45:07,960-Speed 5465.26 samples/sec   Loss 1.9990   LearningRate 0.0101   Epoch: 16   Global Step: 171270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:45:15,496-Speed 5436.22 samples/sec   Loss 1.9468   LearningRate 0.0101   Epoch: 16   Global Step: 171280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:45:23,059-Speed 5416.75 samples/sec   Loss 1.9202   LearningRate 0.0101   Epoch: 16   Global Step: 171290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:45:30,595-Speed 5435.80 samples/sec   Loss 1.9508   LearningRate 0.0101   Epoch: 16   Global Step: 171300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:45:38,143-Speed 5427.24 samples/sec   Loss 1.9353   LearningRate 0.0101   Epoch: 16   Global Step: 171310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:45:45,576-Speed 5510.71 samples/sec   Loss 1.9206   LearningRate 0.0101   Epoch: 16   Global Step: 171320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:45:53,054-Speed 5478.67 samples/sec   Loss 1.9476   LearningRate 0.0100   Epoch: 16   Global Step: 171330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:46:00,536-Speed 5474.95 samples/sec   Loss 1.9392   LearningRate 0.0100   Epoch: 16   Global Step: 171340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:46:08,208-Speed 5339.44 samples/sec   Loss 1.9427   LearningRate 0.0100   Epoch: 16   Global Step: 171350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:46:15,733-Speed 5444.22 samples/sec   Loss 1.8994   LearningRate 0.0100   Epoch: 16   Global Step: 171360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:46:23,178-Speed 5502.88 samples/sec   Loss 1.9325   LearningRate 0.0100   Epoch: 16   Global Step: 171370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:46:30,672-Speed 5466.38 samples/sec   Loss 1.9272   LearningRate 0.0100   Epoch: 16   Global Step: 171380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:46:38,126-Speed 5495.10 samples/sec   Loss 1.9245   LearningRate 0.0100   Epoch: 16   Global Step: 171390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:46:45,616-Speed 5469.79 samples/sec   Loss 1.9600   LearningRate 0.0100   Epoch: 16   Global Step: 171400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:46:53,089-Speed 5482.14 samples/sec   Loss 1.9320   LearningRate 0.0100   Epoch: 16   Global Step: 171410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:47:00,575-Speed 5471.52 samples/sec   Loss 1.9184   LearningRate 0.0100   Epoch: 16   Global Step: 171420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:47:08,134-Speed 5419.83 samples/sec   Loss 1.9325   LearningRate 0.0100   Epoch: 16   Global Step: 171430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:47:15,609-Speed 5480.45 samples/sec   Loss 1.9104   LearningRate 0.0100   Epoch: 16   Global Step: 171440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:47:23,132-Speed 5445.64 samples/sec   Loss 1.9537   LearningRate 0.0100   Epoch: 16   Global Step: 171450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:47:30,694-Speed 5417.34 samples/sec   Loss 1.9247   LearningRate 0.0100   Epoch: 16   Global Step: 171460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:47:38,264-Speed 5411.23 samples/sec   Loss 1.9604   LearningRate 0.0100   Epoch: 16   Global Step: 171470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:47:45,779-Speed 5451.24 samples/sec   Loss 1.9120   LearningRate 0.0100   Epoch: 16   Global Step: 171480   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:47:53,367-Speed 5399.06 samples/sec   Loss 1.9777   LearningRate 0.0100   Epoch: 16   Global Step: 171490   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:48:00,874-Speed 5456.45 samples/sec   Loss 1.9454   LearningRate 0.0100   Epoch: 16   Global Step: 171500   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:48:08,377-Speed 5459.56 samples/sec   Loss 1.9261   LearningRate 0.0099   Epoch: 16   Global Step: 171510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:48:15,853-Speed 5480.18 samples/sec   Loss 1.9468   LearningRate 0.0099   Epoch: 16   Global Step: 171520   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 09:48:23,426-Speed 5409.32 samples/sec   Loss 1.9264   LearningRate 0.0099   Epoch: 16   Global Step: 171530   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 09:48:30,950-Speed 5444.51 samples/sec   Loss 1.9322   LearningRate 0.0099   Epoch: 16   Global Step: 171540   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 09:48:38,468-Speed 5449.11 samples/sec   Loss 1.9557   LearningRate 0.0099   Epoch: 16   Global Step: 171550   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 09:48:46,029-Speed 5418.09 samples/sec   Loss 1.9477   LearningRate 0.0099   Epoch: 16   Global Step: 171560   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 09:48:53,579-Speed 5425.55 samples/sec   Loss 1.9392   LearningRate 0.0099   Epoch: 16   Global Step: 171570   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 09:49:01,126-Speed 5427.95 samples/sec   Loss 1.8962   LearningRate 0.0099   Epoch: 16   Global Step: 171580   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 09:49:08,672-Speed 5429.07 samples/sec   Loss 1.9561   LearningRate 0.0099   Epoch: 16   Global Step: 171590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 09:49:16,213-Speed 5431.91 samples/sec   Loss 1.9295   LearningRate 0.0099   Epoch: 16   Global Step: 171600   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 09:49:23,827-Speed 5380.95 samples/sec   Loss 1.9149   LearningRate 0.0099   Epoch: 16   Global Step: 171610   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 09:49:31,430-Speed 5387.77 samples/sec   Loss 1.9312   LearningRate 0.0099   Epoch: 16   Global Step: 171620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:49:39,022-Speed 5396.05 samples/sec   Loss 1.9294   LearningRate 0.0099   Epoch: 16   Global Step: 171630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:49:46,578-Speed 5421.68 samples/sec   Loss 1.9485   LearningRate 0.0099   Epoch: 16   Global Step: 171640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:49:54,205-Speed 5371.04 samples/sec   Loss 1.9301   LearningRate 0.0099   Epoch: 16   Global Step: 171650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:50:01,828-Speed 5373.76 samples/sec   Loss 1.9217   LearningRate 0.0099   Epoch: 16   Global Step: 171660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:50:09,439-Speed 5382.57 samples/sec   Loss 1.9105   LearningRate 0.0099   Epoch: 16   Global Step: 171670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:50:17,046-Speed 5385.09 samples/sec   Loss 1.9231   LearningRate 0.0099   Epoch: 16   Global Step: 171680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:50:24,595-Speed 5426.38 samples/sec   Loss 1.9530   LearningRate 0.0098   Epoch: 16   Global Step: 171690   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 09:50:32,209-Speed 5380.53 samples/sec   Loss 1.9439   LearningRate 0.0098   Epoch: 16   Global Step: 171700   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 09:50:39,928-Speed 5306.94 samples/sec   Loss 1.9464   LearningRate 0.0098   Epoch: 16   Global Step: 171710   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 09:50:47,541-Speed 5380.56 samples/sec   Loss 1.9552   LearningRate 0.0098   Epoch: 16   Global Step: 171720   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 09:50:55,074-Speed 5438.21 samples/sec   Loss 1.9553   LearningRate 0.0098   Epoch: 16   Global Step: 171730   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 09:51:02,643-Speed 5412.40 samples/sec   Loss 1.9395   LearningRate 0.0098   Epoch: 16   Global Step: 171740   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 09:51:10,076-Speed 5511.71 samples/sec   Loss 1.8964   LearningRate 0.0098   Epoch: 16   Global Step: 171750   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 09:51:17,587-Speed 5453.54 samples/sec   Loss 1.9389   LearningRate 0.0098   Epoch: 16   Global Step: 171760   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 09:51:25,098-Speed 5454.42 samples/sec   Loss 1.9413   LearningRate 0.0098   Epoch: 16   Global Step: 171770   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 09:51:32,567-Speed 5484.66 samples/sec   Loss 1.9839   LearningRate 0.0098   Epoch: 16   Global Step: 171780   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 09:51:40,127-Speed 5418.77 samples/sec   Loss 1.9448   LearningRate 0.0098   Epoch: 16   Global Step: 171790   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:51:47,611-Speed 5473.21 samples/sec   Loss 1.9151   LearningRate 0.0098   Epoch: 16   Global Step: 171800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:51:55,077-Speed 5487.13 samples/sec   Loss 1.9123   LearningRate 0.0098   Epoch: 16   Global Step: 171810   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:52:02,545-Speed 5485.40 samples/sec   Loss 1.9231   LearningRate 0.0098   Epoch: 16   Global Step: 171820   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:52:10,050-Speed 5458.38 samples/sec   Loss 1.9195   LearningRate 0.0098   Epoch: 16   Global Step: 171830   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:52:17,640-Speed 5397.46 samples/sec   Loss 1.9055   LearningRate 0.0098   Epoch: 16   Global Step: 171840   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:52:25,146-Speed 5457.54 samples/sec   Loss 1.9345   LearningRate 0.0098   Epoch: 16   Global Step: 171850   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:52:32,723-Speed 5407.15 samples/sec   Loss 1.9157   LearningRate 0.0098   Epoch: 16   Global Step: 171860   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:52:40,197-Speed 5480.84 samples/sec   Loss 1.9245   LearningRate 0.0097   Epoch: 16   Global Step: 171870   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:52:47,739-Speed 5431.82 samples/sec   Loss 1.9287   LearningRate 0.0097   Epoch: 16   Global Step: 171880   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:52:55,294-Speed 5421.62 samples/sec   Loss 1.9470   LearningRate 0.0097   Epoch: 16   Global Step: 171890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:53:02,910-Speed 5379.16 samples/sec   Loss 1.9358   LearningRate 0.0097   Epoch: 16   Global Step: 171900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:53:10,516-Speed 5386.55 samples/sec   Loss 1.9036   LearningRate 0.0097   Epoch: 16   Global Step: 171910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:53:18,036-Speed 5447.20 samples/sec   Loss 1.9180   LearningRate 0.0097   Epoch: 16   Global Step: 171920   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:53:25,548-Speed 5453.02 samples/sec   Loss 1.8878   LearningRate 0.0097   Epoch: 16   Global Step: 171930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:53:33,116-Speed 5413.06 samples/sec   Loss 1.9166   LearningRate 0.0097   Epoch: 16   Global Step: 171940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:53:40,595-Speed 5494.58 samples/sec   Loss 1.9689   LearningRate 0.0097   Epoch: 16   Global Step: 171950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:53:48,118-Speed 5444.89 samples/sec   Loss 1.9047   LearningRate 0.0097   Epoch: 16   Global Step: 171960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:53:55,590-Speed 5482.59 samples/sec   Loss 1.9300   LearningRate 0.0097   Epoch: 16   Global Step: 171970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:54:03,107-Speed 5449.61 samples/sec   Loss 1.8914   LearningRate 0.0097   Epoch: 16   Global Step: 171980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:54:10,554-Speed 5501.57 samples/sec   Loss 1.9313   LearningRate 0.0097   Epoch: 16   Global Step: 171990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:54:18,087-Speed 5437.81 samples/sec   Loss 1.9355   LearningRate 0.0097   Epoch: 16   Global Step: 172000   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:55:01,955-[lfw][172000]XNorm: 22.950953
Training: 2022-01-09 09:55:01,956-[lfw][172000]Accuracy-Flip: 0.99817+-0.00229
Training: 2022-01-09 09:55:01,957-[lfw][172000]Accuracy-Highest: 0.99833
Training: 2022-01-09 09:55:53,022-[cfp_fp][172000]XNorm: 22.024559
Training: 2022-01-09 09:55:53,023-[cfp_fp][172000]Accuracy-Flip: 0.99314+-0.00349
Training: 2022-01-09 09:55:53,023-[cfp_fp][172000]Accuracy-Highest: 0.99371
Training: 2022-01-09 09:56:36,909-[agedb_30][172000]XNorm: 23.269388
Training: 2022-01-09 09:56:36,910-[agedb_30][172000]Accuracy-Flip: 0.98433+-0.00588
Training: 2022-01-09 09:56:36,911-[agedb_30][172000]Accuracy-Highest: 0.98433
Training: 2022-01-09 09:56:44,492-Speed 279.77 samples/sec   Loss 1.9287   LearningRate 0.0097   Epoch: 16   Global Step: 172010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:56:52,032-Speed 5433.06 samples/sec   Loss 1.9253   LearningRate 0.0097   Epoch: 16   Global Step: 172020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:56:59,582-Speed 5425.40 samples/sec   Loss 1.9117   LearningRate 0.0097   Epoch: 16   Global Step: 172030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:57:07,221-Speed 5362.97 samples/sec   Loss 1.9259   LearningRate 0.0097   Epoch: 16   Global Step: 172040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:57:14,794-Speed 5409.33 samples/sec   Loss 1.9040   LearningRate 0.0096   Epoch: 16   Global Step: 172050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:57:22,311-Speed 5449.76 samples/sec   Loss 1.8938   LearningRate 0.0096   Epoch: 16   Global Step: 172060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:57:29,769-Speed 5492.59 samples/sec   Loss 1.8862   LearningRate 0.0096   Epoch: 16   Global Step: 172070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:57:37,236-Speed 5486.79 samples/sec   Loss 1.9024   LearningRate 0.0096   Epoch: 16   Global Step: 172080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:57:44,680-Speed 5502.37 samples/sec   Loss 1.8841   LearningRate 0.0096   Epoch: 16   Global Step: 172090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:57:52,211-Speed 5440.16 samples/sec   Loss 1.9504   LearningRate 0.0096   Epoch: 16   Global Step: 172100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:57:59,719-Speed 5455.90 samples/sec   Loss 1.9525   LearningRate 0.0096   Epoch: 16   Global Step: 172110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:58:07,218-Speed 5463.02 samples/sec   Loss 1.8947   LearningRate 0.0096   Epoch: 16   Global Step: 172120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:58:14,756-Speed 5434.19 samples/sec   Loss 1.9134   LearningRate 0.0096   Epoch: 16   Global Step: 172130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:58:22,308-Speed 5424.69 samples/sec   Loss 1.9096   LearningRate 0.0096   Epoch: 16   Global Step: 172140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:58:29,827-Speed 5447.71 samples/sec   Loss 1.9211   LearningRate 0.0096   Epoch: 16   Global Step: 172150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:58:37,406-Speed 5405.22 samples/sec   Loss 1.9243   LearningRate 0.0096   Epoch: 16   Global Step: 172160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:58:44,888-Speed 5475.13 samples/sec   Loss 1.9135   LearningRate 0.0096   Epoch: 16   Global Step: 172170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 09:58:54,107-Speed 4443.47 samples/sec   Loss 1.8898   LearningRate 0.0096   Epoch: 16   Global Step: 172180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:59:01,657-Speed 5426.33 samples/sec   Loss 1.9212   LearningRate 0.0096   Epoch: 16   Global Step: 172190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:59:09,168-Speed 5453.45 samples/sec   Loss 1.9279   LearningRate 0.0096   Epoch: 16   Global Step: 172200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:59:16,648-Speed 5476.80 samples/sec   Loss 1.8914   LearningRate 0.0096   Epoch: 16   Global Step: 172210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:59:24,219-Speed 5411.12 samples/sec   Loss 1.9230   LearningRate 0.0096   Epoch: 16   Global Step: 172220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:59:31,737-Speed 5448.96 samples/sec   Loss 1.8948   LearningRate 0.0095   Epoch: 16   Global Step: 172230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:59:39,408-Speed 5340.28 samples/sec   Loss 1.8833   LearningRate 0.0095   Epoch: 16   Global Step: 172240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:59:46,852-Speed 5503.24 samples/sec   Loss 1.9100   LearningRate 0.0095   Epoch: 16   Global Step: 172250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 09:59:54,373-Speed 5446.59 samples/sec   Loss 1.8972   LearningRate 0.0095   Epoch: 16   Global Step: 172260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:00:01,896-Speed 5445.59 samples/sec   Loss 1.8724   LearningRate 0.0095   Epoch: 16   Global Step: 172270   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:00:09,329-Speed 5511.57 samples/sec   Loss 1.9232   LearningRate 0.0095   Epoch: 16   Global Step: 172280   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:00:16,975-Speed 5357.22 samples/sec   Loss 1.8919   LearningRate 0.0095   Epoch: 16   Global Step: 172290   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:00:24,465-Speed 5469.54 samples/sec   Loss 1.8771   LearningRate 0.0095   Epoch: 16   Global Step: 172300   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:00:31,975-Speed 5455.35 samples/sec   Loss 1.9008   LearningRate 0.0095   Epoch: 16   Global Step: 172310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:00:39,491-Speed 5450.40 samples/sec   Loss 1.8925   LearningRate 0.0095   Epoch: 16   Global Step: 172320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:00:47,033-Speed 5431.50 samples/sec   Loss 1.9073   LearningRate 0.0095   Epoch: 16   Global Step: 172330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:00:54,684-Speed 5354.45 samples/sec   Loss 1.8654   LearningRate 0.0095   Epoch: 16   Global Step: 172340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:01:02,201-Speed 5449.99 samples/sec   Loss 1.9014   LearningRate 0.0095   Epoch: 16   Global Step: 172350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:01:09,860-Speed 5348.69 samples/sec   Loss 1.8833   LearningRate 0.0095   Epoch: 16   Global Step: 172360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:01:17,418-Speed 5419.62 samples/sec   Loss 1.9120   LearningRate 0.0095   Epoch: 16   Global Step: 172370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:01:24,952-Speed 5437.97 samples/sec   Loss 1.9149   LearningRate 0.0095   Epoch: 16   Global Step: 172380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:01:32,477-Speed 5444.22 samples/sec   Loss 1.9061   LearningRate 0.0095   Epoch: 16   Global Step: 172390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:01:40,077-Speed 5389.60 samples/sec   Loss 1.9034   LearningRate 0.0095   Epoch: 16   Global Step: 172400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:01:47,560-Speed 5475.19 samples/sec   Loss 1.8914   LearningRate 0.0095   Epoch: 16   Global Step: 172410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:01:55,046-Speed 5472.16 samples/sec   Loss 1.9006   LearningRate 0.0094   Epoch: 16   Global Step: 172420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:02:02,644-Speed 5391.35 samples/sec   Loss 1.8803   LearningRate 0.0094   Epoch: 16   Global Step: 172430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:02:10,190-Speed 5428.90 samples/sec   Loss 1.8689   LearningRate 0.0094   Epoch: 16   Global Step: 172440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:02:17,812-Speed 5374.43 samples/sec   Loss 1.8734   LearningRate 0.0094   Epoch: 16   Global Step: 172450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:02:25,378-Speed 5414.68 samples/sec   Loss 1.8787   LearningRate 0.0094   Epoch: 16   Global Step: 172460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:02:32,937-Speed 5419.65 samples/sec   Loss 1.8908   LearningRate 0.0094   Epoch: 16   Global Step: 172470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:02:40,463-Speed 5443.03 samples/sec   Loss 1.9228   LearningRate 0.0094   Epoch: 16   Global Step: 172480   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:02:48,069-Speed 5386.26 samples/sec   Loss 1.8762   LearningRate 0.0094   Epoch: 16   Global Step: 172490   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:02:55,529-Speed 5491.09 samples/sec   Loss 1.8816   LearningRate 0.0094   Epoch: 16   Global Step: 172500   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:03:03,110-Speed 5403.48 samples/sec   Loss 1.8855   LearningRate 0.0094   Epoch: 16   Global Step: 172510   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:03:10,617-Speed 5457.03 samples/sec   Loss 1.8763   LearningRate 0.0094   Epoch: 16   Global Step: 172520   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:03:18,169-Speed 5424.90 samples/sec   Loss 1.8931   LearningRate 0.0094   Epoch: 16   Global Step: 172530   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:03:25,649-Speed 5476.61 samples/sec   Loss 1.8897   LearningRate 0.0094   Epoch: 16   Global Step: 172540   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:03:33,201-Speed 5424.80 samples/sec   Loss 1.8738   LearningRate 0.0094   Epoch: 16   Global Step: 172550   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:03:40,710-Speed 5455.25 samples/sec   Loss 1.8840   LearningRate 0.0094   Epoch: 16   Global Step: 172560   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:03:48,226-Speed 5450.13 samples/sec   Loss 1.8545   LearningRate 0.0094   Epoch: 16   Global Step: 172570   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:03:55,741-Speed 5451.74 samples/sec   Loss 1.8934   LearningRate 0.0094   Epoch: 16   Global Step: 172580   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:04:03,329-Speed 5398.24 samples/sec   Loss 1.8765   LearningRate 0.0094   Epoch: 16   Global Step: 172590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:04:10,857-Speed 5442.08 samples/sec   Loss 1.8650   LearningRate 0.0093   Epoch: 16   Global Step: 172600   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:04:18,343-Speed 5472.21 samples/sec   Loss 1.9203   LearningRate 0.0093   Epoch: 16   Global Step: 172610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:04:25,856-Speed 5452.35 samples/sec   Loss 1.8738   LearningRate 0.0093   Epoch: 16   Global Step: 172620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:04:33,360-Speed 5458.97 samples/sec   Loss 1.8903   LearningRate 0.0093   Epoch: 16   Global Step: 172630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:04:40,834-Speed 5481.60 samples/sec   Loss 1.8747   LearningRate 0.0093   Epoch: 16   Global Step: 172640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:04:48,380-Speed 5428.36 samples/sec   Loss 1.8764   LearningRate 0.0093   Epoch: 16   Global Step: 172650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:04:55,867-Speed 5471.68 samples/sec   Loss 1.8394   LearningRate 0.0093   Epoch: 16   Global Step: 172660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:05:03,409-Speed 5431.52 samples/sec   Loss 1.8829   LearningRate 0.0093   Epoch: 16   Global Step: 172670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:05:10,895-Speed 5472.63 samples/sec   Loss 1.8351   LearningRate 0.0093   Epoch: 16   Global Step: 172680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:05:18,431-Speed 5435.92 samples/sec   Loss 1.8904   LearningRate 0.0093   Epoch: 16   Global Step: 172690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:05:25,933-Speed 5460.86 samples/sec   Loss 1.8662   LearningRate 0.0093   Epoch: 16   Global Step: 172700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:05:33,417-Speed 5473.72 samples/sec   Loss 1.8729   LearningRate 0.0093   Epoch: 16   Global Step: 172710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:05:40,961-Speed 5430.30 samples/sec   Loss 1.8690   LearningRate 0.0093   Epoch: 16   Global Step: 172720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:05:48,490-Speed 5440.91 samples/sec   Loss 1.8776   LearningRate 0.0093   Epoch: 16   Global Step: 172730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:05:56,085-Speed 5393.89 samples/sec   Loss 1.8757   LearningRate 0.0093   Epoch: 16   Global Step: 172740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:06:03,557-Speed 5483.75 samples/sec   Loss 1.9126   LearningRate 0.0093   Epoch: 16   Global Step: 172750   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:06:11,307-Speed 5286.28 samples/sec   Loss 1.8668   LearningRate 0.0093   Epoch: 16   Global Step: 172760   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:06:18,941-Speed 5366.32 samples/sec   Loss 1.8819   LearningRate 0.0093   Epoch: 16   Global Step: 172770   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:06:26,524-Speed 5401.51 samples/sec   Loss 1.8490   LearningRate 0.0093   Epoch: 16   Global Step: 172780   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:06:34,083-Speed 5419.26 samples/sec   Loss 1.9166   LearningRate 0.0092   Epoch: 16   Global Step: 172790   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:06:41,612-Speed 5441.08 samples/sec   Loss 1.8373   LearningRate 0.0092   Epoch: 16   Global Step: 172800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:06:49,116-Speed 5459.76 samples/sec   Loss 1.8725   LearningRate 0.0092   Epoch: 16   Global Step: 172810   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:06:56,665-Speed 5425.92 samples/sec   Loss 1.8766   LearningRate 0.0092   Epoch: 16   Global Step: 172820   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:07:04,082-Speed 5523.38 samples/sec   Loss 1.8710   LearningRate 0.0092   Epoch: 16   Global Step: 172830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:07:11,626-Speed 5430.68 samples/sec   Loss 1.8985   LearningRate 0.0092   Epoch: 16   Global Step: 172840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:07:19,111-Speed 5473.22 samples/sec   Loss 1.8548   LearningRate 0.0092   Epoch: 16   Global Step: 172850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:07:26,665-Speed 5422.61 samples/sec   Loss 1.8193   LearningRate 0.0092   Epoch: 16   Global Step: 172860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:07:34,152-Speed 5471.85 samples/sec   Loss 1.8589   LearningRate 0.0092   Epoch: 16   Global Step: 172870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:07:41,758-Speed 5386.15 samples/sec   Loss 1.8631   LearningRate 0.0092   Epoch: 16   Global Step: 172880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:07:49,325-Speed 5414.07 samples/sec   Loss 1.8354   LearningRate 0.0092   Epoch: 16   Global Step: 172890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:07:56,896-Speed 5410.18 samples/sec   Loss 1.8745   LearningRate 0.0092   Epoch: 16   Global Step: 172900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:08:04,432-Speed 5435.92 samples/sec   Loss 1.8457   LearningRate 0.0092   Epoch: 16   Global Step: 172910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:08:12,070-Speed 5363.25 samples/sec   Loss 1.8965   LearningRate 0.0092   Epoch: 16   Global Step: 172920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:08:19,557-Speed 5472.18 samples/sec   Loss 1.8634   LearningRate 0.0092   Epoch: 16   Global Step: 172930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:08:27,086-Speed 5440.55 samples/sec   Loss 1.8677   LearningRate 0.0092   Epoch: 16   Global Step: 172940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:08:34,561-Speed 5480.05 samples/sec   Loss 1.8642   LearningRate 0.0092   Epoch: 16   Global Step: 172950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:08:42,096-Speed 5436.90 samples/sec   Loss 1.8649   LearningRate 0.0092   Epoch: 16   Global Step: 172960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:08:49,603-Speed 5457.54 samples/sec   Loss 1.8458   LearningRate 0.0092   Epoch: 16   Global Step: 172970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:08:57,139-Speed 5436.03 samples/sec   Loss 1.8450   LearningRate 0.0091   Epoch: 16   Global Step: 172980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:09:04,674-Speed 5436.46 samples/sec   Loss 1.8568   LearningRate 0.0091   Epoch: 16   Global Step: 172990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:09:12,258-Speed 5400.73 samples/sec   Loss 1.8697   LearningRate 0.0091   Epoch: 16   Global Step: 173000   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:09:19,811-Speed 5424.29 samples/sec   Loss 1.8702   LearningRate 0.0091   Epoch: 16   Global Step: 173010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:09:27,374-Speed 5416.27 samples/sec   Loss 1.8246   LearningRate 0.0091   Epoch: 16   Global Step: 173020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:09:34,903-Speed 5441.29 samples/sec   Loss 1.8832   LearningRate 0.0091   Epoch: 16   Global Step: 173030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:09:42,418-Speed 5450.43 samples/sec   Loss 1.8677   LearningRate 0.0091   Epoch: 16   Global Step: 173040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:09:49,965-Speed 5428.74 samples/sec   Loss 1.8728   LearningRate 0.0091   Epoch: 16   Global Step: 173050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:09:57,576-Speed 5382.14 samples/sec   Loss 1.8484   LearningRate 0.0091   Epoch: 16   Global Step: 173060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:10:05,212-Speed 5364.51 samples/sec   Loss 1.8674   LearningRate 0.0091   Epoch: 16   Global Step: 173070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:10:12,799-Speed 5399.56 samples/sec   Loss 1.8449   LearningRate 0.0091   Epoch: 16   Global Step: 173080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:10:20,199-Speed 5535.93 samples/sec   Loss 1.8616   LearningRate 0.0091   Epoch: 16   Global Step: 173090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:10:27,781-Speed 5403.50 samples/sec   Loss 1.8523   LearningRate 0.0091   Epoch: 16   Global Step: 173100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:10:35,337-Speed 5420.95 samples/sec   Loss 1.8600   LearningRate 0.0091   Epoch: 16   Global Step: 173110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:10:42,858-Speed 5446.76 samples/sec   Loss 1.8626   LearningRate 0.0091   Epoch: 16   Global Step: 173120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:10:50,373-Speed 5451.63 samples/sec   Loss 1.8591   LearningRate 0.0091   Epoch: 16   Global Step: 173130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:10:57,902-Speed 5441.24 samples/sec   Loss 1.8710   LearningRate 0.0091   Epoch: 16   Global Step: 173140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:11:05,423-Speed 5446.27 samples/sec   Loss 1.8641   LearningRate 0.0091   Epoch: 16   Global Step: 173150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:11:12,972-Speed 5426.57 samples/sec   Loss 1.8812   LearningRate 0.0091   Epoch: 16   Global Step: 173160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:11:20,481-Speed 5456.20 samples/sec   Loss 1.8817   LearningRate 0.0090   Epoch: 16   Global Step: 173170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:11:28,077-Speed 5393.02 samples/sec   Loss 1.8538   LearningRate 0.0090   Epoch: 16   Global Step: 173180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:11:35,574-Speed 5464.63 samples/sec   Loss 1.8680   LearningRate 0.0090   Epoch: 16   Global Step: 173190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:11:43,019-Speed 5501.88 samples/sec   Loss 1.8231   LearningRate 0.0090   Epoch: 16   Global Step: 173200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:11:50,629-Speed 5382.98 samples/sec   Loss 1.8513   LearningRate 0.0090   Epoch: 16   Global Step: 173210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:11:58,152-Speed 5445.96 samples/sec   Loss 1.8393   LearningRate 0.0090   Epoch: 16   Global Step: 173220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:12:05,710-Speed 5419.76 samples/sec   Loss 1.8497   LearningRate 0.0090   Epoch: 16   Global Step: 173230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:12:13,263-Speed 5423.37 samples/sec   Loss 1.8360   LearningRate 0.0090   Epoch: 16   Global Step: 173240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:12:20,789-Speed 5443.71 samples/sec   Loss 1.8256   LearningRate 0.0090   Epoch: 16   Global Step: 173250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:12:28,400-Speed 5382.72 samples/sec   Loss 1.8429   LearningRate 0.0090   Epoch: 16   Global Step: 173260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:12:35,987-Speed 5399.69 samples/sec   Loss 1.8684   LearningRate 0.0090   Epoch: 16   Global Step: 173270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:12:43,532-Speed 5428.84 samples/sec   Loss 1.8515   LearningRate 0.0090   Epoch: 16   Global Step: 173280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:12:51,036-Speed 5459.88 samples/sec   Loss 1.8582   LearningRate 0.0090   Epoch: 16   Global Step: 173290   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:12:58,530-Speed 5466.50 samples/sec   Loss 1.8367   LearningRate 0.0090   Epoch: 16   Global Step: 173300   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:13:05,994-Speed 5487.85 samples/sec   Loss 1.8431   LearningRate 0.0090   Epoch: 16   Global Step: 173310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:13:13,430-Speed 5509.43 samples/sec   Loss 1.8565   LearningRate 0.0090   Epoch: 16   Global Step: 173320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:13:20,896-Speed 5486.88 samples/sec   Loss 1.8434   LearningRate 0.0090   Epoch: 16   Global Step: 173330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:13:28,438-Speed 5432.09 samples/sec   Loss 1.8629   LearningRate 0.0090   Epoch: 16   Global Step: 173340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:13:35,984-Speed 5428.66 samples/sec   Loss 1.8560   LearningRate 0.0090   Epoch: 16   Global Step: 173350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:13:43,418-Speed 5509.59 samples/sec   Loss 1.8570   LearningRate 0.0089   Epoch: 16   Global Step: 173360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:13:50,912-Speed 5467.28 samples/sec   Loss 1.8548   LearningRate 0.0089   Epoch: 16   Global Step: 173370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:13:58,440-Speed 5441.54 samples/sec   Loss 1.8676   LearningRate 0.0089   Epoch: 16   Global Step: 173380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:14:06,046-Speed 5386.30 samples/sec   Loss 1.8481   LearningRate 0.0089   Epoch: 16   Global Step: 173390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:14:13,548-Speed 5460.23 samples/sec   Loss 1.8453   LearningRate 0.0089   Epoch: 16   Global Step: 173400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:14:21,014-Speed 5486.78 samples/sec   Loss 1.8289   LearningRate 0.0089   Epoch: 16   Global Step: 173410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:14:28,520-Speed 5458.45 samples/sec   Loss 1.8212   LearningRate 0.0089   Epoch: 16   Global Step: 173420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:14:36,076-Speed 5420.97 samples/sec   Loss 1.8668   LearningRate 0.0089   Epoch: 16   Global Step: 173430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:14:43,684-Speed 5384.28 samples/sec   Loss 1.8619   LearningRate 0.0089   Epoch: 16   Global Step: 173440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:14:51,161-Speed 5479.31 samples/sec   Loss 1.8322   LearningRate 0.0089   Epoch: 16   Global Step: 173450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:14:58,593-Speed 5512.37 samples/sec   Loss 1.8296   LearningRate 0.0089   Epoch: 16   Global Step: 173460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:15:06,144-Speed 5425.21 samples/sec   Loss 1.8364   LearningRate 0.0089   Epoch: 16   Global Step: 173470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:15:13,692-Speed 5426.90 samples/sec   Loss 1.8217   LearningRate 0.0089   Epoch: 16   Global Step: 173480   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:15:21,248-Speed 5421.94 samples/sec   Loss 1.8402   LearningRate 0.0089   Epoch: 16   Global Step: 173490   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:15:28,829-Speed 5403.96 samples/sec   Loss 1.8210   LearningRate 0.0089   Epoch: 16   Global Step: 173500   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:15:36,328-Speed 5462.39 samples/sec   Loss 1.8282   LearningRate 0.0089   Epoch: 16   Global Step: 173510   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:15:43,846-Speed 5449.20 samples/sec   Loss 1.8075   LearningRate 0.0089   Epoch: 16   Global Step: 173520   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:15:51,390-Speed 5429.66 samples/sec   Loss 1.8395   LearningRate 0.0089   Epoch: 16   Global Step: 173530   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:15:59,005-Speed 5379.88 samples/sec   Loss 1.8402   LearningRate 0.0089   Epoch: 16   Global Step: 173540   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:16:06,575-Speed 5411.60 samples/sec   Loss 1.8251   LearningRate 0.0088   Epoch: 16   Global Step: 173550   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:16:14,158-Speed 5401.98 samples/sec   Loss 1.8442   LearningRate 0.0088   Epoch: 16   Global Step: 173560   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:16:21,637-Speed 5477.61 samples/sec   Loss 1.8266   LearningRate 0.0088   Epoch: 16   Global Step: 173570   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:16:29,163-Speed 5443.38 samples/sec   Loss 1.8505   LearningRate 0.0088   Epoch: 16   Global Step: 173580   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:16:36,764-Speed 5389.52 samples/sec   Loss 1.8052   LearningRate 0.0088   Epoch: 16   Global Step: 173590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:16:44,301-Speed 5435.10 samples/sec   Loss 1.8634   LearningRate 0.0088   Epoch: 16   Global Step: 173600   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:16:51,911-Speed 5382.96 samples/sec   Loss 1.8216   LearningRate 0.0088   Epoch: 16   Global Step: 173610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:16:59,419-Speed 5456.79 samples/sec   Loss 1.8201   LearningRate 0.0088   Epoch: 16   Global Step: 173620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:17:06,984-Speed 5415.30 samples/sec   Loss 1.8407   LearningRate 0.0088   Epoch: 16   Global Step: 173630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:17:14,584-Speed 5389.89 samples/sec   Loss 1.8238   LearningRate 0.0088   Epoch: 16   Global Step: 173640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:17:22,165-Speed 5403.41 samples/sec   Loss 1.8711   LearningRate 0.0088   Epoch: 16   Global Step: 173650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:17:29,705-Speed 5433.52 samples/sec   Loss 1.8176   LearningRate 0.0088   Epoch: 16   Global Step: 173660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:17:37,195-Speed 5469.26 samples/sec   Loss 1.8232   LearningRate 0.0088   Epoch: 16   Global Step: 173670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:17:44,716-Speed 5447.11 samples/sec   Loss 1.8367   LearningRate 0.0088   Epoch: 16   Global Step: 173680   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:17:52,303-Speed 5398.95 samples/sec   Loss 1.8031   LearningRate 0.0088   Epoch: 16   Global Step: 173690   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:17:59,887-Speed 5401.65 samples/sec   Loss 1.8490   LearningRate 0.0088   Epoch: 16   Global Step: 173700   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:18:07,386-Speed 5462.42 samples/sec   Loss 1.7967   LearningRate 0.0088   Epoch: 16   Global Step: 173710   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:18:14,915-Speed 5441.15 samples/sec   Loss 1.8011   LearningRate 0.0088   Epoch: 16   Global Step: 173720   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:18:22,446-Speed 5439.42 samples/sec   Loss 1.8561   LearningRate 0.0088   Epoch: 16   Global Step: 173730   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:18:30,019-Speed 5409.46 samples/sec   Loss 1.8148   LearningRate 0.0087   Epoch: 16   Global Step: 173740   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:18:37,558-Speed 5433.99 samples/sec   Loss 1.8453   LearningRate 0.0087   Epoch: 16   Global Step: 173750   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:18:45,011-Speed 5496.86 samples/sec   Loss 1.8241   LearningRate 0.0087   Epoch: 16   Global Step: 173760   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:18:52,563-Speed 5423.77 samples/sec   Loss 1.8031   LearningRate 0.0087   Epoch: 16   Global Step: 173770   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 10:19:00,029-Speed 5487.00 samples/sec   Loss 1.8341   LearningRate 0.0087   Epoch: 16   Global Step: 173780   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:19:07,546-Speed 5450.29 samples/sec   Loss 1.8183   LearningRate 0.0087   Epoch: 16   Global Step: 173790   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:19:15,051-Speed 5458.17 samples/sec   Loss 1.8048   LearningRate 0.0087   Epoch: 16   Global Step: 173800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:19:22,564-Speed 5452.33 samples/sec   Loss 1.7886   LearningRate 0.0087   Epoch: 16   Global Step: 173810   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:19:30,076-Speed 5452.97 samples/sec   Loss 1.8031   LearningRate 0.0087   Epoch: 16   Global Step: 173820   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:19:37,528-Speed 5497.64 samples/sec   Loss 1.8424   LearningRate 0.0087   Epoch: 16   Global Step: 173830   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:19:45,015-Speed 5471.60 samples/sec   Loss 1.7829   LearningRate 0.0087   Epoch: 16   Global Step: 173840   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:19:52,511-Speed 5464.58 samples/sec   Loss 1.7876   LearningRate 0.0087   Epoch: 16   Global Step: 173850   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:20:00,020-Speed 5455.17 samples/sec   Loss 1.8489   LearningRate 0.0087   Epoch: 16   Global Step: 173860   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:20:07,612-Speed 5396.34 samples/sec   Loss 1.8207   LearningRate 0.0087   Epoch: 16   Global Step: 173870   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:20:15,099-Speed 5471.52 samples/sec   Loss 1.8430   LearningRate 0.0087   Epoch: 16   Global Step: 173880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:20:22,854-Speed 5282.55 samples/sec   Loss 1.8460   LearningRate 0.0087   Epoch: 16   Global Step: 173890   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:20:30,410-Speed 5421.17 samples/sec   Loss 1.8295   LearningRate 0.0087   Epoch: 16   Global Step: 173900   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:20:38,145-Speed 5296.32 samples/sec   Loss 1.8637   LearningRate 0.0087   Epoch: 16   Global Step: 173910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:20:45,665-Speed 5447.44 samples/sec   Loss 1.8523   LearningRate 0.0087   Epoch: 16   Global Step: 173920   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:20:53,212-Speed 5428.11 samples/sec   Loss 1.8195   LearningRate 0.0086   Epoch: 16   Global Step: 173930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:21:00,755-Speed 5431.04 samples/sec   Loss 1.8143   LearningRate 0.0086   Epoch: 16   Global Step: 173940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:21:08,244-Speed 5470.39 samples/sec   Loss 1.8175   LearningRate 0.0086   Epoch: 16   Global Step: 173950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:21:15,742-Speed 5463.45 samples/sec   Loss 1.7952   LearningRate 0.0086   Epoch: 16   Global Step: 173960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:21:23,283-Speed 5432.31 samples/sec   Loss 1.8256   LearningRate 0.0086   Epoch: 16   Global Step: 173970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:21:30,815-Speed 5438.44 samples/sec   Loss 1.8187   LearningRate 0.0086   Epoch: 16   Global Step: 173980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:21:38,301-Speed 5472.63 samples/sec   Loss 1.8395   LearningRate 0.0086   Epoch: 16   Global Step: 173990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:21:45,818-Speed 5450.37 samples/sec   Loss 1.8341   LearningRate 0.0086   Epoch: 16   Global Step: 174000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:22:30,344-[lfw][174000]XNorm: 23.097489
Training: 2022-01-09 10:22:30,345-[lfw][174000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 10:22:30,345-[lfw][174000]Accuracy-Highest: 0.99833
Training: 2022-01-09 10:23:22,116-[cfp_fp][174000]XNorm: 22.366889
Training: 2022-01-09 10:23:22,117-[cfp_fp][174000]Accuracy-Flip: 0.99300+-0.00364
Training: 2022-01-09 10:23:22,117-[cfp_fp][174000]Accuracy-Highest: 0.99371
Training: 2022-01-09 10:24:06,609-[agedb_30][174000]XNorm: 23.369141
Training: 2022-01-09 10:24:06,610-[agedb_30][174000]Accuracy-Flip: 0.98400+-0.00484
Training: 2022-01-09 10:24:06,610-[agedb_30][174000]Accuracy-Highest: 0.98433
Training: 2022-01-09 10:24:14,202-Speed 276.04 samples/sec   Loss 1.8180   LearningRate 0.0086   Epoch: 16   Global Step: 174010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:24:21,648-Speed 5501.23 samples/sec   Loss 1.7896   LearningRate 0.0086   Epoch: 16   Global Step: 174020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:24:29,252-Speed 5387.35 samples/sec   Loss 1.8389   LearningRate 0.0086   Epoch: 16   Global Step: 174030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:24:36,732-Speed 5476.78 samples/sec   Loss 1.7926   LearningRate 0.0086   Epoch: 16   Global Step: 174040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:24:44,236-Speed 5459.27 samples/sec   Loss 1.7933   LearningRate 0.0086   Epoch: 16   Global Step: 174050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:24:51,817-Speed 5403.40 samples/sec   Loss 1.8024   LearningRate 0.0086   Epoch: 16   Global Step: 174060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:24:59,371-Speed 5423.41 samples/sec   Loss 1.8198   LearningRate 0.0086   Epoch: 16   Global Step: 174070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:25:06,864-Speed 5467.31 samples/sec   Loss 1.8213   LearningRate 0.0086   Epoch: 16   Global Step: 174080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:25:14,426-Speed 5417.77 samples/sec   Loss 1.8393   LearningRate 0.0086   Epoch: 16   Global Step: 174090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:25:21,907-Speed 5475.21 samples/sec   Loss 1.7604   LearningRate 0.0086   Epoch: 16   Global Step: 174100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:25:29,498-Speed 5396.78 samples/sec   Loss 1.8315   LearningRate 0.0086   Epoch: 16   Global Step: 174110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:25:37,116-Speed 5377.78 samples/sec   Loss 1.7939   LearningRate 0.0086   Epoch: 16   Global Step: 174120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:25:44,600-Speed 5473.56 samples/sec   Loss 1.8154   LearningRate 0.0085   Epoch: 16   Global Step: 174130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:25:52,134-Speed 5437.08 samples/sec   Loss 1.8408   LearningRate 0.0085   Epoch: 16   Global Step: 174140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:25:59,683-Speed 5426.83 samples/sec   Loss 1.7906   LearningRate 0.0085   Epoch: 16   Global Step: 174150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:26:07,189-Speed 5458.12 samples/sec   Loss 1.8014   LearningRate 0.0085   Epoch: 16   Global Step: 174160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:26:14,680-Speed 5468.66 samples/sec   Loss 1.8105   LearningRate 0.0085   Epoch: 16   Global Step: 174170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:26:22,189-Speed 5455.13 samples/sec   Loss 1.8059   LearningRate 0.0085   Epoch: 16   Global Step: 174180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:26:29,726-Speed 5435.26 samples/sec   Loss 1.8148   LearningRate 0.0085   Epoch: 16   Global Step: 174190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:26:37,255-Speed 5441.43 samples/sec   Loss 1.7982   LearningRate 0.0085   Epoch: 16   Global Step: 174200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:26:44,714-Speed 5491.53 samples/sec   Loss 1.8119   LearningRate 0.0085   Epoch: 16   Global Step: 174210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:26:52,288-Speed 5408.73 samples/sec   Loss 1.7890   LearningRate 0.0085   Epoch: 16   Global Step: 174220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:26:59,800-Speed 5453.02 samples/sec   Loss 1.8170   LearningRate 0.0085   Epoch: 16   Global Step: 174230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:27:07,356-Speed 5421.68 samples/sec   Loss 1.8114   LearningRate 0.0085   Epoch: 16   Global Step: 174240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:27:14,865-Speed 5456.19 samples/sec   Loss 1.7917   LearningRate 0.0085   Epoch: 16   Global Step: 174250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:27:22,344-Speed 5477.27 samples/sec   Loss 1.8507   LearningRate 0.0085   Epoch: 16   Global Step: 174260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:27:29,888-Speed 5429.90 samples/sec   Loss 1.7751   LearningRate 0.0085   Epoch: 16   Global Step: 174270   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:27:37,528-Speed 5361.58 samples/sec   Loss 1.7905   LearningRate 0.0085   Epoch: 16   Global Step: 174280   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:27:45,302-Speed 5269.88 samples/sec   Loss 1.8028   LearningRate 0.0085   Epoch: 16   Global Step: 174290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:27:53,062-Speed 5279.12 samples/sec   Loss 1.7854   LearningRate 0.0085   Epoch: 16   Global Step: 174300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:28:00,733-Speed 5339.82 samples/sec   Loss 1.7773   LearningRate 0.0085   Epoch: 16   Global Step: 174310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:28:08,328-Speed 5393.72 samples/sec   Loss 1.7932   LearningRate 0.0084   Epoch: 16   Global Step: 174320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:28:16,052-Speed 5304.14 samples/sec   Loss 1.7609   LearningRate 0.0084   Epoch: 16   Global Step: 174330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:28:23,714-Speed 5346.89 samples/sec   Loss 1.8010   LearningRate 0.0084   Epoch: 16   Global Step: 174340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:28:31,306-Speed 5395.04 samples/sec   Loss 1.7990   LearningRate 0.0084   Epoch: 16   Global Step: 174350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:28:38,851-Speed 5429.74 samples/sec   Loss 1.7837   LearningRate 0.0084   Epoch: 16   Global Step: 174360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:28:46,374-Speed 5445.80 samples/sec   Loss 1.8038   LearningRate 0.0084   Epoch: 16   Global Step: 174370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:28:53,885-Speed 5453.85 samples/sec   Loss 1.7973   LearningRate 0.0084   Epoch: 16   Global Step: 174380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:29:01,424-Speed 5433.78 samples/sec   Loss 1.7999   LearningRate 0.0084   Epoch: 16   Global Step: 174390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:29:08,984-Speed 5418.61 samples/sec   Loss 1.8023   LearningRate 0.0084   Epoch: 16   Global Step: 174400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:29:16,509-Speed 5444.36 samples/sec   Loss 1.7888   LearningRate 0.0084   Epoch: 16   Global Step: 174410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:29:24,010-Speed 5461.16 samples/sec   Loss 1.7957   LearningRate 0.0084   Epoch: 16   Global Step: 174420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:29:31,496-Speed 5471.61 samples/sec   Loss 1.7551   LearningRate 0.0084   Epoch: 16   Global Step: 174430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:29:39,266-Speed 5272.20 samples/sec   Loss 1.8252   LearningRate 0.0084   Epoch: 16   Global Step: 174440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:29:46,835-Speed 5412.51 samples/sec   Loss 1.7962   LearningRate 0.0084   Epoch: 16   Global Step: 174450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:29:54,295-Speed 5491.07 samples/sec   Loss 1.7636   LearningRate 0.0084   Epoch: 16   Global Step: 174460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:30:01,831-Speed 5436.13 samples/sec   Loss 1.8236   LearningRate 0.0084   Epoch: 16   Global Step: 174470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:30:09,363-Speed 5438.60 samples/sec   Loss 1.7867   LearningRate 0.0084   Epoch: 16   Global Step: 174480   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:30:16,847-Speed 5473.94 samples/sec   Loss 1.7743   LearningRate 0.0084   Epoch: 16   Global Step: 174490   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:30:24,340-Speed 5466.94 samples/sec   Loss 1.7711   LearningRate 0.0084   Epoch: 16   Global Step: 174500   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:30:31,836-Speed 5464.72 samples/sec   Loss 1.7881   LearningRate 0.0084   Epoch: 16   Global Step: 174510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:30:39,453-Speed 5378.78 samples/sec   Loss 1.7908   LearningRate 0.0083   Epoch: 16   Global Step: 174520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:30:46,890-Speed 5508.55 samples/sec   Loss 1.8044   LearningRate 0.0083   Epoch: 16   Global Step: 174530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:30:54,383-Speed 5467.26 samples/sec   Loss 1.7900   LearningRate 0.0083   Epoch: 16   Global Step: 174540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:31:02,071-Speed 5328.18 samples/sec   Loss 1.7710   LearningRate 0.0083   Epoch: 16   Global Step: 174550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:31:09,659-Speed 5398.65 samples/sec   Loss 1.7598   LearningRate 0.0083   Epoch: 16   Global Step: 174560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:31:17,173-Speed 5452.55 samples/sec   Loss 1.7748   LearningRate 0.0083   Epoch: 16   Global Step: 174570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:31:24,831-Speed 5349.03 samples/sec   Loss 1.8093   LearningRate 0.0083   Epoch: 16   Global Step: 174580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:31:32,291-Speed 5491.24 samples/sec   Loss 1.7855   LearningRate 0.0083   Epoch: 16   Global Step: 174590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:31:39,778-Speed 5471.93 samples/sec   Loss 1.7889   LearningRate 0.0083   Epoch: 16   Global Step: 174600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 10:31:47,278-Speed 5462.32 samples/sec   Loss 1.7902   LearningRate 0.0083   Epoch: 16   Global Step: 174610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:31:54,778-Speed 5461.73 samples/sec   Loss 1.7765   LearningRate 0.0083   Epoch: 16   Global Step: 174620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:32:02,303-Speed 5443.60 samples/sec   Loss 1.7530   LearningRate 0.0083   Epoch: 16   Global Step: 174630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:32:09,918-Speed 5379.77 samples/sec   Loss 1.7676   LearningRate 0.0083   Epoch: 16   Global Step: 174640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 10:32:17,445-Speed 5442.30 samples/sec   Loss 1.7818   LearningRate 0.0083   Epoch: 16   Global Step: 174650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:32:24,976-Speed 5439.27 samples/sec   Loss 1.7690   LearningRate 0.0083   Epoch: 16   Global Step: 174660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:32:32,670-Speed 5324.73 samples/sec   Loss 1.8112   LearningRate 0.0083   Epoch: 16   Global Step: 174670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:32:40,147-Speed 5478.81 samples/sec   Loss 1.8018   LearningRate 0.0083   Epoch: 16   Global Step: 174680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:32:47,669-Speed 5445.92 samples/sec   Loss 1.7931   LearningRate 0.0083   Epoch: 16   Global Step: 174690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:32:55,165-Speed 5464.80 samples/sec   Loss 1.7614   LearningRate 0.0083   Epoch: 16   Global Step: 174700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:33:02,616-Speed 5498.44 samples/sec   Loss 1.8115   LearningRate 0.0082   Epoch: 16   Global Step: 174710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:33:10,198-Speed 5402.87 samples/sec   Loss 1.7688   LearningRate 0.0082   Epoch: 16   Global Step: 174720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:33:17,726-Speed 5441.74 samples/sec   Loss 1.7761   LearningRate 0.0082   Epoch: 16   Global Step: 174730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:33:25,230-Speed 5458.95 samples/sec   Loss 1.7923   LearningRate 0.0082   Epoch: 16   Global Step: 174740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:33:32,779-Speed 5426.38 samples/sec   Loss 1.7902   LearningRate 0.0082   Epoch: 16   Global Step: 174750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:33:40,312-Speed 5438.40 samples/sec   Loss 1.7900   LearningRate 0.0082   Epoch: 16   Global Step: 174760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:33:47,824-Speed 5452.91 samples/sec   Loss 1.8205   LearningRate 0.0082   Epoch: 16   Global Step: 174770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:33:55,381-Speed 5420.89 samples/sec   Loss 1.7993   LearningRate 0.0082   Epoch: 16   Global Step: 174780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:34:02,900-Speed 5448.75 samples/sec   Loss 1.7928   LearningRate 0.0082   Epoch: 16   Global Step: 174790   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:34:10,361-Speed 5490.01 samples/sec   Loss 1.7983   LearningRate 0.0082   Epoch: 16   Global Step: 174800   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:34:17,892-Speed 5440.27 samples/sec   Loss 1.7428   LearningRate 0.0082   Epoch: 16   Global Step: 174810   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:34:25,424-Speed 5438.41 samples/sec   Loss 1.7809   LearningRate 0.0082   Epoch: 16   Global Step: 174820   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:34:33,032-Speed 5384.71 samples/sec   Loss 1.7552   LearningRate 0.0082   Epoch: 16   Global Step: 174830   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:34:40,557-Speed 5443.44 samples/sec   Loss 1.7862   LearningRate 0.0082   Epoch: 16   Global Step: 174840   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:34:48,505-Speed 5154.17 samples/sec   Loss 1.7752   LearningRate 0.0082   Epoch: 16   Global Step: 174850   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:34:56,088-Speed 5402.38 samples/sec   Loss 1.7562   LearningRate 0.0082   Epoch: 16   Global Step: 174860   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:35:03,608-Speed 5447.68 samples/sec   Loss 1.7813   LearningRate 0.0082   Epoch: 16   Global Step: 174870   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:35:11,167-Speed 5419.07 samples/sec   Loss 1.7638   LearningRate 0.0082   Epoch: 16   Global Step: 174880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:35:18,685-Speed 5448.71 samples/sec   Loss 1.7755   LearningRate 0.0082   Epoch: 16   Global Step: 174890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:35:26,222-Speed 5436.00 samples/sec   Loss 1.7707   LearningRate 0.0082   Epoch: 16   Global Step: 174900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:35:33,787-Speed 5414.51 samples/sec   Loss 1.7571   LearningRate 0.0081   Epoch: 16   Global Step: 174910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:35:41,422-Speed 5365.78 samples/sec   Loss 1.7439   LearningRate 0.0081   Epoch: 16   Global Step: 174920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:35:48,947-Speed 5444.48 samples/sec   Loss 1.7475   LearningRate 0.0081   Epoch: 16   Global Step: 174930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:35:56,590-Speed 5359.86 samples/sec   Loss 1.7863   LearningRate 0.0081   Epoch: 16   Global Step: 174940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:36:04,168-Speed 5405.95 samples/sec   Loss 1.7868   LearningRate 0.0081   Epoch: 16   Global Step: 174950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:36:11,793-Speed 5372.54 samples/sec   Loss 1.7416   LearningRate 0.0081   Epoch: 16   Global Step: 174960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:36:19,302-Speed 5455.57 samples/sec   Loss 1.7814   LearningRate 0.0081   Epoch: 16   Global Step: 174970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:36:26,842-Speed 5432.64 samples/sec   Loss 1.8023   LearningRate 0.0081   Epoch: 16   Global Step: 174980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:36:34,483-Speed 5361.66 samples/sec   Loss 1.7966   LearningRate 0.0081   Epoch: 16   Global Step: 174990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:36:42,045-Speed 5416.81 samples/sec   Loss 1.7565   LearningRate 0.0081   Epoch: 16   Global Step: 175000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:36:49,709-Speed 5345.38 samples/sec   Loss 1.7665   LearningRate 0.0081   Epoch: 16   Global Step: 175010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:36:57,302-Speed 5395.25 samples/sec   Loss 1.7502   LearningRate 0.0081   Epoch: 16   Global Step: 175020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:37:04,899-Speed 5391.93 samples/sec   Loss 1.7646   LearningRate 0.0081   Epoch: 16   Global Step: 175030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:37:12,493-Speed 5394.25 samples/sec   Loss 1.7911   LearningRate 0.0081   Epoch: 16   Global Step: 175040   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:37:19,977-Speed 5473.88 samples/sec   Loss 1.7670   LearningRate 0.0081   Epoch: 16   Global Step: 175050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:37:27,515-Speed 5434.68 samples/sec   Loss 1.7677   LearningRate 0.0081   Epoch: 16   Global Step: 175060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:37:35,137-Speed 5374.52 samples/sec   Loss 1.7544   LearningRate 0.0081   Epoch: 16   Global Step: 175070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:37:42,753-Speed 5378.17 samples/sec   Loss 1.7681   LearningRate 0.0081   Epoch: 16   Global Step: 175080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:37:50,359-Speed 5386.61 samples/sec   Loss 1.7422   LearningRate 0.0081   Epoch: 16   Global Step: 175090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:37:57,995-Speed 5364.48 samples/sec   Loss 1.7778   LearningRate 0.0081   Epoch: 16   Global Step: 175100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:38:05,579-Speed 5401.59 samples/sec   Loss 1.7747   LearningRate 0.0080   Epoch: 16   Global Step: 175110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:38:13,169-Speed 5396.87 samples/sec   Loss 1.7409   LearningRate 0.0080   Epoch: 16   Global Step: 175120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:38:20,743-Speed 5408.72 samples/sec   Loss 1.7836   LearningRate 0.0080   Epoch: 16   Global Step: 175130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:38:28,291-Speed 5427.53 samples/sec   Loss 1.7648   LearningRate 0.0080   Epoch: 16   Global Step: 175140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:38:35,780-Speed 5469.75 samples/sec   Loss 1.7626   LearningRate 0.0080   Epoch: 16   Global Step: 175150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:38:43,420-Speed 5362.21 samples/sec   Loss 1.7577   LearningRate 0.0080   Epoch: 16   Global Step: 175160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:38:51,073-Speed 5353.04 samples/sec   Loss 1.7710   LearningRate 0.0080   Epoch: 16   Global Step: 175170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:38:58,621-Speed 5427.26 samples/sec   Loss 1.7790   LearningRate 0.0080   Epoch: 16   Global Step: 175180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:39:06,117-Speed 5464.75 samples/sec   Loss 1.7853   LearningRate 0.0080   Epoch: 16   Global Step: 175190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:39:13,764-Speed 5356.94 samples/sec   Loss 1.7567   LearningRate 0.0080   Epoch: 16   Global Step: 175200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:39:21,306-Speed 5432.04 samples/sec   Loss 1.7517   LearningRate 0.0080   Epoch: 16   Global Step: 175210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:39:28,883-Speed 5406.38 samples/sec   Loss 1.7440   LearningRate 0.0080   Epoch: 16   Global Step: 175220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:39:36,414-Speed 5439.43 samples/sec   Loss 1.7744   LearningRate 0.0080   Epoch: 16   Global Step: 175230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:39:44,009-Speed 5394.00 samples/sec   Loss 1.7538   LearningRate 0.0080   Epoch: 16   Global Step: 175240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:39:51,642-Speed 5366.48 samples/sec   Loss 1.7324   LearningRate 0.0080   Epoch: 16   Global Step: 175250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:39:59,149-Speed 5457.43 samples/sec   Loss 1.7445   LearningRate 0.0080   Epoch: 16   Global Step: 175260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:40:06,878-Speed 5300.01 samples/sec   Loss 1.7490   LearningRate 0.0080   Epoch: 16   Global Step: 175270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:40:14,486-Speed 5384.70 samples/sec   Loss 1.7765   LearningRate 0.0080   Epoch: 16   Global Step: 175280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:40:22,056-Speed 5411.67 samples/sec   Loss 1.7673   LearningRate 0.0080   Epoch: 16   Global Step: 175290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:40:29,599-Speed 5430.78 samples/sec   Loss 1.7582   LearningRate 0.0080   Epoch: 16   Global Step: 175300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:40:37,079-Speed 5476.78 samples/sec   Loss 1.7696   LearningRate 0.0079   Epoch: 16   Global Step: 175310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:40:44,651-Speed 5410.04 samples/sec   Loss 1.7747   LearningRate 0.0079   Epoch: 16   Global Step: 175320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:40:52,129-Speed 5477.80 samples/sec   Loss 1.7622   LearningRate 0.0079   Epoch: 16   Global Step: 175330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:40:59,617-Speed 5470.97 samples/sec   Loss 1.7237   LearningRate 0.0079   Epoch: 16   Global Step: 175340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:41:07,220-Speed 5388.22 samples/sec   Loss 1.7293   LearningRate 0.0079   Epoch: 16   Global Step: 175350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:41:14,843-Speed 5373.84 samples/sec   Loss 1.7405   LearningRate 0.0079   Epoch: 16   Global Step: 175360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:41:22,607-Speed 5275.59 samples/sec   Loss 1.7545   LearningRate 0.0079   Epoch: 16   Global Step: 175370   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:41:30,131-Speed 5445.51 samples/sec   Loss 1.7367   LearningRate 0.0079   Epoch: 16   Global Step: 175380   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:41:37,645-Speed 5451.74 samples/sec   Loss 1.7698   LearningRate 0.0079   Epoch: 16   Global Step: 175390   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:41:45,193-Speed 5426.76 samples/sec   Loss 1.7594   LearningRate 0.0079   Epoch: 16   Global Step: 175400   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:41:52,799-Speed 5385.64 samples/sec   Loss 1.7495   LearningRate 0.0079   Epoch: 16   Global Step: 175410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:42:00,283-Speed 5474.04 samples/sec   Loss 1.7428   LearningRate 0.0079   Epoch: 16   Global Step: 175420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:42:07,758-Speed 5480.53 samples/sec   Loss 1.7392   LearningRate 0.0079   Epoch: 16   Global Step: 175430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:42:15,266-Speed 5456.05 samples/sec   Loss 1.7458   LearningRate 0.0079   Epoch: 16   Global Step: 175440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:42:22,819-Speed 5423.36 samples/sec   Loss 1.7591   LearningRate 0.0079   Epoch: 16   Global Step: 175450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:42:30,397-Speed 5406.55 samples/sec   Loss 1.7204   LearningRate 0.0079   Epoch: 16   Global Step: 175460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:42:37,918-Speed 5446.85 samples/sec   Loss 1.7431   LearningRate 0.0079   Epoch: 16   Global Step: 175470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:42:45,358-Speed 5505.28 samples/sec   Loss 1.7229   LearningRate 0.0079   Epoch: 16   Global Step: 175480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:42:52,874-Speed 5450.08 samples/sec   Loss 1.7426   LearningRate 0.0079   Epoch: 16   Global Step: 175490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:43:00,503-Speed 5370.18 samples/sec   Loss 1.7385   LearningRate 0.0079   Epoch: 16   Global Step: 175500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:43:08,038-Speed 5436.82 samples/sec   Loss 1.7575   LearningRate 0.0079   Epoch: 16   Global Step: 175510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 10:43:15,585-Speed 5428.16 samples/sec   Loss 1.7260   LearningRate 0.0078   Epoch: 16   Global Step: 175520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 10:43:23,202-Speed 5377.46 samples/sec   Loss 1.7582   LearningRate 0.0078   Epoch: 16   Global Step: 175530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:43:30,679-Speed 5479.05 samples/sec   Loss 1.7314   LearningRate 0.0078   Epoch: 16   Global Step: 175540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:43:38,158-Speed 5477.70 samples/sec   Loss 1.7331   LearningRate 0.0078   Epoch: 16   Global Step: 175550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:43:45,669-Speed 5453.89 samples/sec   Loss 1.7525   LearningRate 0.0078   Epoch: 16   Global Step: 175560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:43:53,238-Speed 5411.60 samples/sec   Loss 1.7275   LearningRate 0.0078   Epoch: 16   Global Step: 175570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:44:00,744-Speed 5458.31 samples/sec   Loss 1.7451   LearningRate 0.0078   Epoch: 16   Global Step: 175580   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:44:08,249-Speed 5458.73 samples/sec   Loss 1.7328   LearningRate 0.0078   Epoch: 16   Global Step: 175590   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:44:15,829-Speed 5403.86 samples/sec   Loss 1.7648   LearningRate 0.0078   Epoch: 16   Global Step: 175600   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:44:23,298-Speed 5484.45 samples/sec   Loss 1.7473   LearningRate 0.0078   Epoch: 16   Global Step: 175610   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:44:30,789-Speed 5469.10 samples/sec   Loss 1.7303   LearningRate 0.0078   Epoch: 16   Global Step: 175620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:44:38,356-Speed 5413.52 samples/sec   Loss 1.7624   LearningRate 0.0078   Epoch: 16   Global Step: 175630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:44:45,907-Speed 5424.70 samples/sec   Loss 1.7483   LearningRate 0.0078   Epoch: 16   Global Step: 175640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:44:53,421-Speed 5452.00 samples/sec   Loss 1.7422   LearningRate 0.0078   Epoch: 16   Global Step: 175650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:45:00,923-Speed 5460.75 samples/sec   Loss 1.7495   LearningRate 0.0078   Epoch: 16   Global Step: 175660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:45:08,510-Speed 5399.84 samples/sec   Loss 1.7492   LearningRate 0.0078   Epoch: 16   Global Step: 175670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:45:16,071-Speed 5417.14 samples/sec   Loss 1.7595   LearningRate 0.0078   Epoch: 16   Global Step: 175680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:45:23,602-Speed 5439.48 samples/sec   Loss 1.7466   LearningRate 0.0078   Epoch: 16   Global Step: 175690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:45:31,152-Speed 5426.10 samples/sec   Loss 1.7349   LearningRate 0.0078   Epoch: 16   Global Step: 175700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:45:38,664-Speed 5453.81 samples/sec   Loss 1.7097   LearningRate 0.0078   Epoch: 16   Global Step: 175710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:45:46,122-Speed 5492.59 samples/sec   Loss 1.7477   LearningRate 0.0077   Epoch: 16   Global Step: 175720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:45:53,734-Speed 5382.03 samples/sec   Loss 1.7384   LearningRate 0.0077   Epoch: 16   Global Step: 175730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:46:01,283-Speed 5426.01 samples/sec   Loss 1.7176   LearningRate 0.0077   Epoch: 16   Global Step: 175740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:46:08,969-Speed 5330.59 samples/sec   Loss 1.7406   LearningRate 0.0077   Epoch: 16   Global Step: 175750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:46:16,524-Speed 5422.88 samples/sec   Loss 1.7097   LearningRate 0.0077   Epoch: 16   Global Step: 175760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:46:24,100-Speed 5406.39 samples/sec   Loss 1.7278   LearningRate 0.0077   Epoch: 16   Global Step: 175770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:46:31,535-Speed 5509.87 samples/sec   Loss 1.7547   LearningRate 0.0077   Epoch: 16   Global Step: 175780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:46:39,086-Speed 5426.10 samples/sec   Loss 1.7384   LearningRate 0.0077   Epoch: 16   Global Step: 175790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:46:46,666-Speed 5403.93 samples/sec   Loss 1.7453   LearningRate 0.0077   Epoch: 16   Global Step: 175800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:46:54,154-Speed 5470.95 samples/sec   Loss 1.7260   LearningRate 0.0077   Epoch: 16   Global Step: 175810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:47:01,671-Speed 5449.33 samples/sec   Loss 1.7370   LearningRate 0.0077   Epoch: 16   Global Step: 175820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:47:09,233-Speed 5417.94 samples/sec   Loss 1.7279   LearningRate 0.0077   Epoch: 16   Global Step: 175830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:47:16,668-Speed 5509.74 samples/sec   Loss 1.7560   LearningRate 0.0077   Epoch: 16   Global Step: 175840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:47:24,146-Speed 5477.82 samples/sec   Loss 1.7088   LearningRate 0.0077   Epoch: 16   Global Step: 175850   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:47:31,610-Speed 5488.35 samples/sec   Loss 1.7078   LearningRate 0.0077   Epoch: 16   Global Step: 175860   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:47:39,075-Speed 5488.43 samples/sec   Loss 1.6979   LearningRate 0.0077   Epoch: 16   Global Step: 175870   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:47:46,645-Speed 5411.77 samples/sec   Loss 1.7620   LearningRate 0.0077   Epoch: 16   Global Step: 175880   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:47:54,079-Speed 5510.14 samples/sec   Loss 1.6937   LearningRate 0.0077   Epoch: 16   Global Step: 175890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:48:01,691-Speed 5381.33 samples/sec   Loss 1.7746   LearningRate 0.0077   Epoch: 16   Global Step: 175900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:48:09,183-Speed 5468.38 samples/sec   Loss 1.7078   LearningRate 0.0077   Epoch: 16   Global Step: 175910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:48:16,617-Speed 5511.05 samples/sec   Loss 1.7296   LearningRate 0.0076   Epoch: 16   Global Step: 175920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:48:24,144-Speed 5441.99 samples/sec   Loss 1.7379   LearningRate 0.0076   Epoch: 16   Global Step: 175930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:48:31,678-Speed 5437.12 samples/sec   Loss 1.7263   LearningRate 0.0076   Epoch: 16   Global Step: 175940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:48:39,194-Speed 5451.13 samples/sec   Loss 1.7214   LearningRate 0.0076   Epoch: 16   Global Step: 175950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:48:46,739-Speed 5429.67 samples/sec   Loss 1.7189   LearningRate 0.0076   Epoch: 16   Global Step: 175960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:48:54,333-Speed 5393.84 samples/sec   Loss 1.7133   LearningRate 0.0076   Epoch: 16   Global Step: 175970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:49:01,795-Speed 5490.00 samples/sec   Loss 1.7091   LearningRate 0.0076   Epoch: 16   Global Step: 175980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:49:09,292-Speed 5464.40 samples/sec   Loss 1.7324   LearningRate 0.0076   Epoch: 16   Global Step: 175990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:49:16,805-Speed 5453.21 samples/sec   Loss 1.7108   LearningRate 0.0076   Epoch: 16   Global Step: 176000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:50:00,680-[lfw][176000]XNorm: 23.440188
Training: 2022-01-09 10:50:00,681-[lfw][176000]Accuracy-Flip: 0.99817+-0.00229
Training: 2022-01-09 10:50:00,681-[lfw][176000]Accuracy-Highest: 0.99833
Training: 2022-01-09 10:50:51,874-[cfp_fp][176000]XNorm: 22.570946
Training: 2022-01-09 10:50:51,874-[cfp_fp][176000]Accuracy-Flip: 0.99300+-0.00431
Training: 2022-01-09 10:50:51,875-[cfp_fp][176000]Accuracy-Highest: 0.99371
Training: 2022-01-09 10:51:35,833-[agedb_30][176000]XNorm: 23.694088
Training: 2022-01-09 10:51:35,834-[agedb_30][176000]Accuracy-Flip: 0.98333+-0.00654
Training: 2022-01-09 10:51:35,834-[agedb_30][176000]Accuracy-Highest: 0.98433
Training: 2022-01-09 10:51:43,414-Speed 279.38 samples/sec   Loss 1.7228   LearningRate 0.0076   Epoch: 16   Global Step: 176010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:51:50,879-Speed 5487.53 samples/sec   Loss 1.7199   LearningRate 0.0076   Epoch: 16   Global Step: 176020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:51:58,384-Speed 5458.07 samples/sec   Loss 1.7259   LearningRate 0.0076   Epoch: 16   Global Step: 176030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:52:05,958-Speed 5409.03 samples/sec   Loss 1.7140   LearningRate 0.0076   Epoch: 16   Global Step: 176040   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:52:13,453-Speed 5465.85 samples/sec   Loss 1.7026   LearningRate 0.0076   Epoch: 16   Global Step: 176050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:52:21,004-Speed 5425.14 samples/sec   Loss 1.6922   LearningRate 0.0076   Epoch: 16   Global Step: 176060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:52:28,564-Speed 5418.18 samples/sec   Loss 1.7109   LearningRate 0.0076   Epoch: 16   Global Step: 176070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:52:36,064-Speed 5462.65 samples/sec   Loss 1.7116   LearningRate 0.0076   Epoch: 16   Global Step: 176080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:52:43,590-Speed 5443.09 samples/sec   Loss 1.7314   LearningRate 0.0076   Epoch: 16   Global Step: 176090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:52:51,263-Speed 5339.07 samples/sec   Loss 1.7416   LearningRate 0.0076   Epoch: 16   Global Step: 176100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:52:58,783-Speed 5447.28 samples/sec   Loss 1.7159   LearningRate 0.0076   Epoch: 16   Global Step: 176110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:53:06,241-Speed 5492.29 samples/sec   Loss 1.7595   LearningRate 0.0076   Epoch: 16   Global Step: 176120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:53:13,801-Speed 5419.08 samples/sec   Loss 1.7343   LearningRate 0.0075   Epoch: 16   Global Step: 176130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:53:21,327-Speed 5443.13 samples/sec   Loss 1.6993   LearningRate 0.0075   Epoch: 16   Global Step: 176140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:53:28,881-Speed 5422.83 samples/sec   Loss 1.7546   LearningRate 0.0075   Epoch: 16   Global Step: 176150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:53:36,401-Speed 5447.63 samples/sec   Loss 1.7170   LearningRate 0.0075   Epoch: 16   Global Step: 176160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:53:44,057-Speed 5350.60 samples/sec   Loss 1.7430   LearningRate 0.0075   Epoch: 16   Global Step: 176170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:53:51,587-Speed 5440.71 samples/sec   Loss 1.7393   LearningRate 0.0075   Epoch: 16   Global Step: 176180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:53:59,157-Speed 5411.54 samples/sec   Loss 1.7388   LearningRate 0.0075   Epoch: 16   Global Step: 176190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:54:06,669-Speed 5452.84 samples/sec   Loss 1.7148   LearningRate 0.0075   Epoch: 16   Global Step: 176200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:54:14,209-Speed 5433.36 samples/sec   Loss 1.6912   LearningRate 0.0075   Epoch: 16   Global Step: 176210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:54:21,747-Speed 5434.24 samples/sec   Loss 1.7338   LearningRate 0.0075   Epoch: 16   Global Step: 176220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:54:29,261-Speed 5451.78 samples/sec   Loss 1.7174   LearningRate 0.0075   Epoch: 16   Global Step: 176230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:54:36,872-Speed 5382.37 samples/sec   Loss 1.7074   LearningRate 0.0075   Epoch: 16   Global Step: 176240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:54:44,428-Speed 5421.74 samples/sec   Loss 1.7047   LearningRate 0.0075   Epoch: 16   Global Step: 176250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:54:52,009-Speed 5403.58 samples/sec   Loss 1.7274   LearningRate 0.0075   Epoch: 16   Global Step: 176260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 10:54:59,404-Speed 5539.26 samples/sec   Loss 1.7173   LearningRate 0.0075   Epoch: 16   Global Step: 176270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:55:06,873-Speed 5484.77 samples/sec   Loss 1.7401   LearningRate 0.0075   Epoch: 16   Global Step: 176280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:55:29,458-Speed 1813.71 samples/sec   Loss 1.6794   LearningRate 0.0075   Epoch: 17   Global Step: 176290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:55:36,903-Speed 5502.31 samples/sec   Loss 1.7063   LearningRate 0.0075   Epoch: 17   Global Step: 176300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:55:44,371-Speed 5485.58 samples/sec   Loss 1.7185   LearningRate 0.0075   Epoch: 17   Global Step: 176310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:55:51,872-Speed 5461.52 samples/sec   Loss 1.7203   LearningRate 0.0075   Epoch: 17   Global Step: 176320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:55:59,340-Speed 5484.91 samples/sec   Loss 1.6664   LearningRate 0.0075   Epoch: 17   Global Step: 176330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:56:06,860-Speed 5447.82 samples/sec   Loss 1.6837   LearningRate 0.0074   Epoch: 17   Global Step: 176340   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:56:14,287-Speed 5515.75 samples/sec   Loss 1.6731   LearningRate 0.0074   Epoch: 17   Global Step: 176350   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:56:21,688-Speed 5535.30 samples/sec   Loss 1.7235   LearningRate 0.0074   Epoch: 17   Global Step: 176360   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:56:29,138-Speed 5497.83 samples/sec   Loss 1.6918   LearningRate 0.0074   Epoch: 17   Global Step: 176370   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:56:36,700-Speed 5418.18 samples/sec   Loss 1.7223   LearningRate 0.0074   Epoch: 17   Global Step: 176380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:56:44,159-Speed 5492.25 samples/sec   Loss 1.6944   LearningRate 0.0074   Epoch: 17   Global Step: 176390   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:56:51,612-Speed 5496.81 samples/sec   Loss 1.7073   LearningRate 0.0074   Epoch: 17   Global Step: 176400   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:56:59,038-Speed 5515.85 samples/sec   Loss 1.7267   LearningRate 0.0074   Epoch: 17   Global Step: 176410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:57:06,523-Speed 5473.28 samples/sec   Loss 1.6989   LearningRate 0.0074   Epoch: 17   Global Step: 176420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:57:14,118-Speed 5393.86 samples/sec   Loss 1.6944   LearningRate 0.0074   Epoch: 17   Global Step: 176430   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:57:21,834-Speed 5309.14 samples/sec   Loss 1.6814   LearningRate 0.0074   Epoch: 17   Global Step: 176440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:57:29,442-Speed 5383.96 samples/sec   Loss 1.6816   LearningRate 0.0074   Epoch: 17   Global Step: 176450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:57:37,075-Speed 5366.95 samples/sec   Loss 1.6841   LearningRate 0.0074   Epoch: 17   Global Step: 176460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:57:44,753-Speed 5335.24 samples/sec   Loss 1.6780   LearningRate 0.0074   Epoch: 17   Global Step: 176470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:57:52,390-Speed 5364.37 samples/sec   Loss 1.6845   LearningRate 0.0074   Epoch: 17   Global Step: 176480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 10:58:00,057-Speed 5343.07 samples/sec   Loss 1.6782   LearningRate 0.0074   Epoch: 17   Global Step: 176490   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:58:07,752-Speed 5323.41 samples/sec   Loss 1.7011   LearningRate 0.0074   Epoch: 17   Global Step: 176500   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:58:15,491-Speed 5293.98 samples/sec   Loss 1.7188   LearningRate 0.0074   Epoch: 17   Global Step: 176510   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:58:23,159-Speed 5342.32 samples/sec   Loss 1.6771   LearningRate 0.0074   Epoch: 17   Global Step: 176520   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:58:30,791-Speed 5367.42 samples/sec   Loss 1.7081   LearningRate 0.0074   Epoch: 17   Global Step: 176530   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:58:38,451-Speed 5347.87 samples/sec   Loss 1.6765   LearningRate 0.0074   Epoch: 17   Global Step: 176540   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:58:45,905-Speed 5495.68 samples/sec   Loss 1.7187   LearningRate 0.0073   Epoch: 17   Global Step: 176550   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:58:53,545-Speed 5361.92 samples/sec   Loss 1.6947   LearningRate 0.0073   Epoch: 17   Global Step: 176560   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:59:01,049-Speed 5459.44 samples/sec   Loss 1.6800   LearningRate 0.0073   Epoch: 17   Global Step: 176570   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:59:08,553-Speed 5459.20 samples/sec   Loss 1.6850   LearningRate 0.0073   Epoch: 17   Global Step: 176580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:59:16,093-Speed 5433.15 samples/sec   Loss 1.6811   LearningRate 0.0073   Epoch: 17   Global Step: 176590   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:59:23,905-Speed 5243.68 samples/sec   Loss 1.7103   LearningRate 0.0073   Epoch: 17   Global Step: 176600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:59:31,360-Speed 5495.37 samples/sec   Loss 1.6766   LearningRate 0.0073   Epoch: 17   Global Step: 176610   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:59:38,806-Speed 5501.73 samples/sec   Loss 1.6957   LearningRate 0.0073   Epoch: 17   Global Step: 176620   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:59:46,312-Speed 5458.12 samples/sec   Loss 1.7022   LearningRate 0.0073   Epoch: 17   Global Step: 176630   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 10:59:53,789-Speed 5478.40 samples/sec   Loss 1.6931   LearningRate 0.0073   Epoch: 17   Global Step: 176640   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:00:01,206-Speed 5523.20 samples/sec   Loss 1.6976   LearningRate 0.0073   Epoch: 17   Global Step: 176650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:00:08,722-Speed 5451.06 samples/sec   Loss 1.6719   LearningRate 0.0073   Epoch: 17   Global Step: 176660   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:00:16,291-Speed 5412.10 samples/sec   Loss 1.6387   LearningRate 0.0073   Epoch: 17   Global Step: 176670   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:00:23,728-Speed 5508.56 samples/sec   Loss 1.6692   LearningRate 0.0073   Epoch: 17   Global Step: 176680   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:00:31,168-Speed 5506.07 samples/sec   Loss 1.6860   LearningRate 0.0073   Epoch: 17   Global Step: 176690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:00:38,759-Speed 5396.65 samples/sec   Loss 1.6830   LearningRate 0.0073   Epoch: 17   Global Step: 176700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:00:46,179-Speed 5521.37 samples/sec   Loss 1.6853   LearningRate 0.0073   Epoch: 17   Global Step: 176710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:00:53,613-Speed 5510.18 samples/sec   Loss 1.6625   LearningRate 0.0073   Epoch: 17   Global Step: 176720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:01:01,047-Speed 5510.66 samples/sec   Loss 1.6875   LearningRate 0.0073   Epoch: 17   Global Step: 176730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:01:08,557-Speed 5454.75 samples/sec   Loss 1.6758   LearningRate 0.0073   Epoch: 17   Global Step: 176740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:01:16,063-Speed 5457.89 samples/sec   Loss 1.6588   LearningRate 0.0073   Epoch: 17   Global Step: 176750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:01:23,592-Speed 5440.85 samples/sec   Loss 1.6753   LearningRate 0.0072   Epoch: 17   Global Step: 176760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:01:31,146-Speed 5423.20 samples/sec   Loss 1.6675   LearningRate 0.0072   Epoch: 17   Global Step: 176770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:01:38,798-Speed 5353.13 samples/sec   Loss 1.6676   LearningRate 0.0072   Epoch: 17   Global Step: 176780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:01:46,334-Speed 5436.85 samples/sec   Loss 1.6746   LearningRate 0.0072   Epoch: 17   Global Step: 176790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:01:53,796-Speed 5489.59 samples/sec   Loss 1.6809   LearningRate 0.0072   Epoch: 17   Global Step: 176800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:02:01,290-Speed 5466.30 samples/sec   Loss 1.6819   LearningRate 0.0072   Epoch: 17   Global Step: 176810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:02:08,869-Speed 5404.83 samples/sec   Loss 1.6741   LearningRate 0.0072   Epoch: 17   Global Step: 176820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:02:16,393-Speed 5445.01 samples/sec   Loss 1.6773   LearningRate 0.0072   Epoch: 17   Global Step: 176830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:02:23,923-Speed 5440.30 samples/sec   Loss 1.6913   LearningRate 0.0072   Epoch: 17   Global Step: 176840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:02:31,424-Speed 5460.66 samples/sec   Loss 1.6893   LearningRate 0.0072   Epoch: 17   Global Step: 176850   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:02:38,955-Speed 5440.12 samples/sec   Loss 1.6857   LearningRate 0.0072   Epoch: 17   Global Step: 176860   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:02:46,483-Speed 5441.38 samples/sec   Loss 1.6892   LearningRate 0.0072   Epoch: 17   Global Step: 176870   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:02:54,004-Speed 5447.12 samples/sec   Loss 1.6781   LearningRate 0.0072   Epoch: 17   Global Step: 176880   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:03:01,557-Speed 5423.53 samples/sec   Loss 1.7021   LearningRate 0.0072   Epoch: 17   Global Step: 176890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:03:09,073-Speed 5450.44 samples/sec   Loss 1.6576   LearningRate 0.0072   Epoch: 17   Global Step: 176900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:03:16,544-Speed 5483.29 samples/sec   Loss 1.6515   LearningRate 0.0072   Epoch: 17   Global Step: 176910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:03:24,001-Speed 5493.53 samples/sec   Loss 1.7008   LearningRate 0.0072   Epoch: 17   Global Step: 176920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:03:31,407-Speed 5531.42 samples/sec   Loss 1.6975   LearningRate 0.0072   Epoch: 17   Global Step: 176930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:03:38,865-Speed 5492.90 samples/sec   Loss 1.6776   LearningRate 0.0072   Epoch: 17   Global Step: 176940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:03:46,308-Speed 5504.25 samples/sec   Loss 1.6767   LearningRate 0.0072   Epoch: 17   Global Step: 176950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:03:53,794-Speed 5472.06 samples/sec   Loss 1.6847   LearningRate 0.0072   Epoch: 17   Global Step: 176960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:04:01,303-Speed 5455.18 samples/sec   Loss 1.6773   LearningRate 0.0071   Epoch: 17   Global Step: 176970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:04:08,788-Speed 5473.22 samples/sec   Loss 1.6711   LearningRate 0.0071   Epoch: 17   Global Step: 176980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:04:16,318-Speed 5440.25 samples/sec   Loss 1.6396   LearningRate 0.0071   Epoch: 17   Global Step: 176990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:04:23,965-Speed 5357.31 samples/sec   Loss 1.6933   LearningRate 0.0071   Epoch: 17   Global Step: 177000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:04:31,473-Speed 5456.10 samples/sec   Loss 1.6905   LearningRate 0.0071   Epoch: 17   Global Step: 177010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:04:39,051-Speed 5405.67 samples/sec   Loss 1.6321   LearningRate 0.0071   Epoch: 17   Global Step: 177020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:04:46,546-Speed 5465.95 samples/sec   Loss 1.6572   LearningRate 0.0071   Epoch: 17   Global Step: 177030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:04:54,075-Speed 5440.93 samples/sec   Loss 1.6874   LearningRate 0.0071   Epoch: 17   Global Step: 177040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:05:01,544-Speed 5484.88 samples/sec   Loss 1.6736   LearningRate 0.0071   Epoch: 17   Global Step: 177050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:05:09,116-Speed 5410.14 samples/sec   Loss 1.6880   LearningRate 0.0071   Epoch: 17   Global Step: 177060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:05:16,550-Speed 5510.19 samples/sec   Loss 1.6806   LearningRate 0.0071   Epoch: 17   Global Step: 177070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:05:24,047-Speed 5464.63 samples/sec   Loss 1.6653   LearningRate 0.0071   Epoch: 17   Global Step: 177080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:05:31,519-Speed 5482.22 samples/sec   Loss 1.6685   LearningRate 0.0071   Epoch: 17   Global Step: 177090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:05:39,195-Speed 5337.03 samples/sec   Loss 1.6729   LearningRate 0.0071   Epoch: 17   Global Step: 177100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:05:46,761-Speed 5414.04 samples/sec   Loss 1.6328   LearningRate 0.0071   Epoch: 17   Global Step: 177110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:05:54,266-Speed 5459.24 samples/sec   Loss 1.6821   LearningRate 0.0071   Epoch: 17   Global Step: 177120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:06:01,808-Speed 5431.41 samples/sec   Loss 1.6950   LearningRate 0.0071   Epoch: 17   Global Step: 177130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:06:09,292-Speed 5473.09 samples/sec   Loss 1.6667   LearningRate 0.0071   Epoch: 17   Global Step: 177140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:06:16,764-Speed 5482.86 samples/sec   Loss 1.6841   LearningRate 0.0071   Epoch: 17   Global Step: 177150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:06:24,232-Speed 5485.51 samples/sec   Loss 1.7159   LearningRate 0.0071   Epoch: 17   Global Step: 177160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:06:31,788-Speed 5421.57 samples/sec   Loss 1.6452   LearningRate 0.0071   Epoch: 17   Global Step: 177170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:06:39,253-Speed 5487.65 samples/sec   Loss 1.6526   LearningRate 0.0070   Epoch: 17   Global Step: 177180   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:06:46,753-Speed 5461.91 samples/sec   Loss 1.6796   LearningRate 0.0070   Epoch: 17   Global Step: 177190   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:06:54,272-Speed 5448.78 samples/sec   Loss 1.7040   LearningRate 0.0070   Epoch: 17   Global Step: 177200   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:07:01,839-Speed 5413.32 samples/sec   Loss 1.6607   LearningRate 0.0070   Epoch: 17   Global Step: 177210   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:07:09,390-Speed 5424.43 samples/sec   Loss 1.6687   LearningRate 0.0070   Epoch: 17   Global Step: 177220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:07:17,101-Speed 5313.05 samples/sec   Loss 1.6580   LearningRate 0.0070   Epoch: 17   Global Step: 177230   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:07:24,622-Speed 5446.62 samples/sec   Loss 1.6666   LearningRate 0.0070   Epoch: 17   Global Step: 177240   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:07:32,170-Speed 5427.26 samples/sec   Loss 1.6466   LearningRate 0.0070   Epoch: 17   Global Step: 177250   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:07:39,662-Speed 5467.83 samples/sec   Loss 1.6401   LearningRate 0.0070   Epoch: 17   Global Step: 177260   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:07:47,094-Speed 5512.23 samples/sec   Loss 1.6538   LearningRate 0.0070   Epoch: 17   Global Step: 177270   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:07:54,634-Speed 5433.14 samples/sec   Loss 1.6925   LearningRate 0.0070   Epoch: 17   Global Step: 177280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:08:02,092-Speed 5493.26 samples/sec   Loss 1.6415   LearningRate 0.0070   Epoch: 17   Global Step: 177290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:08:09,577-Speed 5473.27 samples/sec   Loss 1.6502   LearningRate 0.0070   Epoch: 17   Global Step: 177300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:08:17,154-Speed 5405.99 samples/sec   Loss 1.6834   LearningRate 0.0070   Epoch: 17   Global Step: 177310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:08:24,681-Speed 5442.43 samples/sec   Loss 1.6541   LearningRate 0.0070   Epoch: 17   Global Step: 177320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:08:32,240-Speed 5419.30 samples/sec   Loss 1.6439   LearningRate 0.0070   Epoch: 17   Global Step: 177330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:08:39,790-Speed 5426.30 samples/sec   Loss 1.6646   LearningRate 0.0070   Epoch: 17   Global Step: 177340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:08:47,311-Speed 5446.53 samples/sec   Loss 1.6756   LearningRate 0.0070   Epoch: 17   Global Step: 177350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:08:54,827-Speed 5450.35 samples/sec   Loss 1.6671   LearningRate 0.0070   Epoch: 17   Global Step: 177360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:09:02,316-Speed 5470.46 samples/sec   Loss 1.6523   LearningRate 0.0070   Epoch: 17   Global Step: 177370   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:09:09,793-Speed 5478.45 samples/sec   Loss 1.6384   LearningRate 0.0070   Epoch: 17   Global Step: 177380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:09:17,398-Speed 5386.66 samples/sec   Loss 1.6640   LearningRate 0.0070   Epoch: 17   Global Step: 177390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:09:24,871-Speed 5481.73 samples/sec   Loss 1.6914   LearningRate 0.0069   Epoch: 17   Global Step: 177400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:09:32,351-Speed 5477.31 samples/sec   Loss 1.6777   LearningRate 0.0069   Epoch: 17   Global Step: 177410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:09:39,858-Speed 5456.22 samples/sec   Loss 1.6639   LearningRate 0.0069   Epoch: 17   Global Step: 177420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:09:47,358-Speed 5462.07 samples/sec   Loss 1.6511   LearningRate 0.0069   Epoch: 17   Global Step: 177430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:09:54,914-Speed 5422.23 samples/sec   Loss 1.6511   LearningRate 0.0069   Epoch: 17   Global Step: 177440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:10:02,450-Speed 5435.91 samples/sec   Loss 1.6157   LearningRate 0.0069   Epoch: 17   Global Step: 177450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:10:09,994-Speed 5429.99 samples/sec   Loss 1.6401   LearningRate 0.0069   Epoch: 17   Global Step: 177460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:10:17,551-Speed 5420.78 samples/sec   Loss 1.6647   LearningRate 0.0069   Epoch: 17   Global Step: 177470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:10:25,064-Speed 5452.87 samples/sec   Loss 1.6731   LearningRate 0.0069   Epoch: 17   Global Step: 177480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:10:32,654-Speed 5397.37 samples/sec   Loss 1.6722   LearningRate 0.0069   Epoch: 17   Global Step: 177490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:10:40,240-Speed 5399.53 samples/sec   Loss 1.6641   LearningRate 0.0069   Epoch: 17   Global Step: 177500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:10:47,748-Speed 5456.07 samples/sec   Loss 1.6469   LearningRate 0.0069   Epoch: 17   Global Step: 177510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:10:55,231-Speed 5474.84 samples/sec   Loss 1.6571   LearningRate 0.0069   Epoch: 17   Global Step: 177520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:11:02,848-Speed 5378.28 samples/sec   Loss 1.6670   LearningRate 0.0069   Epoch: 17   Global Step: 177530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:11:10,319-Speed 5483.22 samples/sec   Loss 1.6778   LearningRate 0.0069   Epoch: 17   Global Step: 177540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:11:17,818-Speed 5462.64 samples/sec   Loss 1.6421   LearningRate 0.0069   Epoch: 17   Global Step: 177550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:11:25,448-Speed 5369.35 samples/sec   Loss 1.6200   LearningRate 0.0069   Epoch: 17   Global Step: 177560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:11:33,013-Speed 5414.89 samples/sec   Loss 1.6395   LearningRate 0.0069   Epoch: 17   Global Step: 177570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:11:40,588-Speed 5407.27 samples/sec   Loss 1.6032   LearningRate 0.0069   Epoch: 17   Global Step: 177580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:11:48,131-Speed 5431.37 samples/sec   Loss 1.6499   LearningRate 0.0069   Epoch: 17   Global Step: 177590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:11:55,656-Speed 5443.79 samples/sec   Loss 1.6280   LearningRate 0.0069   Epoch: 17   Global Step: 177600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:12:03,237-Speed 5403.58 samples/sec   Loss 1.6454   LearningRate 0.0069   Epoch: 17   Global Step: 177610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:12:10,751-Speed 5451.95 samples/sec   Loss 1.6326   LearningRate 0.0068   Epoch: 17   Global Step: 177620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:12:18,315-Speed 5416.13 samples/sec   Loss 1.6377   LearningRate 0.0068   Epoch: 17   Global Step: 177630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:12:25,981-Speed 5343.55 samples/sec   Loss 1.6358   LearningRate 0.0068   Epoch: 17   Global Step: 177640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:12:33,588-Speed 5385.30 samples/sec   Loss 1.6462   LearningRate 0.0068   Epoch: 17   Global Step: 177650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:12:41,148-Speed 5419.14 samples/sec   Loss 1.6300   LearningRate 0.0068   Epoch: 17   Global Step: 177660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:12:48,635-Speed 5471.58 samples/sec   Loss 1.6598   LearningRate 0.0068   Epoch: 17   Global Step: 177670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:12:56,162-Speed 5442.75 samples/sec   Loss 1.6171   LearningRate 0.0068   Epoch: 17   Global Step: 177680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:13:03,758-Speed 5392.85 samples/sec   Loss 1.6185   LearningRate 0.0068   Epoch: 17   Global Step: 177690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:13:11,283-Speed 5443.92 samples/sec   Loss 1.6433   LearningRate 0.0068   Epoch: 17   Global Step: 177700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:13:18,778-Speed 5465.38 samples/sec   Loss 1.6013   LearningRate 0.0068   Epoch: 17   Global Step: 177710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:13:26,238-Speed 5491.79 samples/sec   Loss 1.5836   LearningRate 0.0068   Epoch: 17   Global Step: 177720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:13:33,804-Speed 5413.90 samples/sec   Loss 1.6510   LearningRate 0.0068   Epoch: 17   Global Step: 177730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:13:41,367-Speed 5417.05 samples/sec   Loss 1.6181   LearningRate 0.0068   Epoch: 17   Global Step: 177740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:13:48,993-Speed 5371.15 samples/sec   Loss 1.6312   LearningRate 0.0068   Epoch: 17   Global Step: 177750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:13:56,587-Speed 5394.77 samples/sec   Loss 1.6235   LearningRate 0.0068   Epoch: 17   Global Step: 177760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:14:04,130-Speed 5431.14 samples/sec   Loss 1.6257   LearningRate 0.0068   Epoch: 17   Global Step: 177770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:14:11,618-Speed 5470.23 samples/sec   Loss 1.6510   LearningRate 0.0068   Epoch: 17   Global Step: 177780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:14:19,045-Speed 5515.48 samples/sec   Loss 1.6430   LearningRate 0.0068   Epoch: 17   Global Step: 177790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:14:26,603-Speed 5420.56 samples/sec   Loss 1.6329   LearningRate 0.0068   Epoch: 17   Global Step: 177800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:14:34,184-Speed 5404.31 samples/sec   Loss 1.6450   LearningRate 0.0068   Epoch: 17   Global Step: 177810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:14:41,823-Speed 5361.88 samples/sec   Loss 1.6353   LearningRate 0.0068   Epoch: 17   Global Step: 177820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:14:49,351-Speed 5441.48 samples/sec   Loss 1.6333   LearningRate 0.0067   Epoch: 17   Global Step: 177830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:14:57,056-Speed 5317.56 samples/sec   Loss 1.6398   LearningRate 0.0067   Epoch: 17   Global Step: 177840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:15:04,673-Speed 5377.66 samples/sec   Loss 1.6243   LearningRate 0.0067   Epoch: 17   Global Step: 177850   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:15:12,169-Speed 5464.86 samples/sec   Loss 1.6318   LearningRate 0.0067   Epoch: 17   Global Step: 177860   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:15:19,687-Speed 5449.00 samples/sec   Loss 1.6524   LearningRate 0.0067   Epoch: 17   Global Step: 177870   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:15:27,215-Speed 5441.68 samples/sec   Loss 1.6295   LearningRate 0.0067   Epoch: 17   Global Step: 177880   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:15:34,766-Speed 5425.17 samples/sec   Loss 1.6304   LearningRate 0.0067   Epoch: 17   Global Step: 177890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:15:42,337-Speed 5410.97 samples/sec   Loss 1.6673   LearningRate 0.0067   Epoch: 17   Global Step: 177900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:15:49,987-Speed 5354.77 samples/sec   Loss 1.6695   LearningRate 0.0067   Epoch: 17   Global Step: 177910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:15:57,586-Speed 5390.55 samples/sec   Loss 1.6637   LearningRate 0.0067   Epoch: 17   Global Step: 177920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:16:05,182-Speed 5393.64 samples/sec   Loss 1.6211   LearningRate 0.0067   Epoch: 17   Global Step: 177930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:16:12,932-Speed 5285.37 samples/sec   Loss 1.6243   LearningRate 0.0067   Epoch: 17   Global Step: 177940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:16:20,588-Speed 5350.37 samples/sec   Loss 1.6237   LearningRate 0.0067   Epoch: 17   Global Step: 177950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:16:28,352-Speed 5277.04 samples/sec   Loss 1.6374   LearningRate 0.0067   Epoch: 17   Global Step: 177960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:16:36,010-Speed 5349.14 samples/sec   Loss 1.6524   LearningRate 0.0067   Epoch: 17   Global Step: 177970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:16:43,664-Speed 5351.85 samples/sec   Loss 1.6377   LearningRate 0.0067   Epoch: 17   Global Step: 177980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:16:51,429-Speed 5275.63 samples/sec   Loss 1.6006   LearningRate 0.0067   Epoch: 17   Global Step: 177990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:16:58,960-Speed 5439.94 samples/sec   Loss 1.6289   LearningRate 0.0067   Epoch: 17   Global Step: 178000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:17:43,353-[lfw][178000]XNorm: 22.461704
Training: 2022-01-09 11:17:43,354-[lfw][178000]Accuracy-Flip: 0.99833+-0.00197
Training: 2022-01-09 11:17:43,354-[lfw][178000]Accuracy-Highest: 0.99833
Training: 2022-01-09 11:18:35,079-[cfp_fp][178000]XNorm: 21.921971
Training: 2022-01-09 11:18:35,080-[cfp_fp][178000]Accuracy-Flip: 0.99257+-0.00343
Training: 2022-01-09 11:18:35,081-[cfp_fp][178000]Accuracy-Highest: 0.99371
Training: 2022-01-09 11:19:19,827-[agedb_30][178000]XNorm: 23.023412
Training: 2022-01-09 11:19:19,828-[agedb_30][178000]Accuracy-Flip: 0.98367+-0.00557
Training: 2022-01-09 11:19:19,828-[agedb_30][178000]Accuracy-Highest: 0.98433
Training: 2022-01-09 11:19:27,453-Speed 275.84 samples/sec   Loss 1.6437   LearningRate 0.0067   Epoch: 17   Global Step: 178010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:19:34,991-Speed 5434.45 samples/sec   Loss 1.5900   LearningRate 0.0067   Epoch: 17   Global Step: 178020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:19:42,549-Speed 5420.14 samples/sec   Loss 1.6305   LearningRate 0.0067   Epoch: 17   Global Step: 178030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:19:49,992-Speed 5503.76 samples/sec   Loss 1.6233   LearningRate 0.0067   Epoch: 17   Global Step: 178040   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:20:01,486-Speed 3564.01 samples/sec   Loss 1.5903   LearningRate 0.0066   Epoch: 17   Global Step: 178050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:20:08,814-Speed 5590.04 samples/sec   Loss 1.6485   LearningRate 0.0066   Epoch: 17   Global Step: 178060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:20:16,370-Speed 5421.78 samples/sec   Loss 1.6403   LearningRate 0.0066   Epoch: 17   Global Step: 178070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:20:23,900-Speed 5440.13 samples/sec   Loss 1.6214   LearningRate 0.0066   Epoch: 17   Global Step: 178080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:20:31,442-Speed 5432.03 samples/sec   Loss 1.6650   LearningRate 0.0066   Epoch: 17   Global Step: 178090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:20:39,015-Speed 5409.07 samples/sec   Loss 1.6277   LearningRate 0.0066   Epoch: 17   Global Step: 178100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:20:46,620-Speed 5386.61 samples/sec   Loss 1.6316   LearningRate 0.0066   Epoch: 17   Global Step: 178110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:20:54,120-Speed 5462.35 samples/sec   Loss 1.6306   LearningRate 0.0066   Epoch: 17   Global Step: 178120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:21:01,593-Speed 5482.35 samples/sec   Loss 1.6107   LearningRate 0.0066   Epoch: 17   Global Step: 178130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:21:09,122-Speed 5440.84 samples/sec   Loss 1.6251   LearningRate 0.0066   Epoch: 17   Global Step: 178140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:21:16,677-Speed 5421.64 samples/sec   Loss 1.6225   LearningRate 0.0066   Epoch: 17   Global Step: 178150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:21:24,142-Speed 5487.88 samples/sec   Loss 1.6104   LearningRate 0.0066   Epoch: 17   Global Step: 178160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:21:31,656-Speed 5452.23 samples/sec   Loss 1.6226   LearningRate 0.0066   Epoch: 17   Global Step: 178170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:21:39,120-Speed 5488.25 samples/sec   Loss 1.5986   LearningRate 0.0066   Epoch: 17   Global Step: 178180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:21:46,635-Speed 5451.31 samples/sec   Loss 1.6319   LearningRate 0.0066   Epoch: 17   Global Step: 178190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:21:54,143-Speed 5456.67 samples/sec   Loss 1.5839   LearningRate 0.0066   Epoch: 17   Global Step: 178200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:22:01,736-Speed 5394.91 samples/sec   Loss 1.6130   LearningRate 0.0066   Epoch: 17   Global Step: 178210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:22:09,195-Speed 5492.25 samples/sec   Loss 1.6107   LearningRate 0.0066   Epoch: 17   Global Step: 178220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:22:16,687-Speed 5467.67 samples/sec   Loss 1.5847   LearningRate 0.0066   Epoch: 17   Global Step: 178230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:22:24,224-Speed 5435.44 samples/sec   Loss 1.6394   LearningRate 0.0066   Epoch: 17   Global Step: 178240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:22:31,711-Speed 5471.75 samples/sec   Loss 1.6198   LearningRate 0.0066   Epoch: 17   Global Step: 178250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:22:39,156-Speed 5502.46 samples/sec   Loss 1.6144   LearningRate 0.0066   Epoch: 17   Global Step: 178260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:22:46,624-Speed 5485.44 samples/sec   Loss 1.5861   LearningRate 0.0065   Epoch: 17   Global Step: 178270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:22:54,093-Speed 5485.33 samples/sec   Loss 1.6105   LearningRate 0.0065   Epoch: 17   Global Step: 178280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:23:01,559-Speed 5487.00 samples/sec   Loss 1.6402   LearningRate 0.0065   Epoch: 17   Global Step: 178290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:23:09,051-Speed 5467.58 samples/sec   Loss 1.6367   LearningRate 0.0065   Epoch: 17   Global Step: 178300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:23:16,576-Speed 5444.13 samples/sec   Loss 1.6331   LearningRate 0.0065   Epoch: 17   Global Step: 178310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:23:24,093-Speed 5449.74 samples/sec   Loss 1.6097   LearningRate 0.0065   Epoch: 17   Global Step: 178320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:23:31,613-Speed 5447.96 samples/sec   Loss 1.6032   LearningRate 0.0065   Epoch: 17   Global Step: 178330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:23:39,183-Speed 5411.06 samples/sec   Loss 1.6107   LearningRate 0.0065   Epoch: 17   Global Step: 178340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:23:46,755-Speed 5410.18 samples/sec   Loss 1.5644   LearningRate 0.0065   Epoch: 17   Global Step: 178350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:23:54,274-Speed 5448.38 samples/sec   Loss 1.5824   LearningRate 0.0065   Epoch: 17   Global Step: 178360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:24:01,799-Speed 5444.35 samples/sec   Loss 1.5958   LearningRate 0.0065   Epoch: 17   Global Step: 178370   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:24:09,386-Speed 5399.47 samples/sec   Loss 1.5966   LearningRate 0.0065   Epoch: 17   Global Step: 178380   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:24:16,831-Speed 5501.95 samples/sec   Loss 1.6122   LearningRate 0.0065   Epoch: 17   Global Step: 178390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:24:24,389-Speed 5420.20 samples/sec   Loss 1.5909   LearningRate 0.0065   Epoch: 17   Global Step: 178400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:24:31,922-Speed 5438.67 samples/sec   Loss 1.5904   LearningRate 0.0065   Epoch: 17   Global Step: 178410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:24:39,436-Speed 5451.56 samples/sec   Loss 1.5808   LearningRate 0.0065   Epoch: 17   Global Step: 178420   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:24:46,996-Speed 5418.65 samples/sec   Loss 1.5960   LearningRate 0.0065   Epoch: 17   Global Step: 178430   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:24:54,477-Speed 5475.96 samples/sec   Loss 1.6241   LearningRate 0.0065   Epoch: 17   Global Step: 178440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:25:02,019-Speed 5431.69 samples/sec   Loss 1.6153   LearningRate 0.0065   Epoch: 17   Global Step: 178450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:25:09,584-Speed 5415.24 samples/sec   Loss 1.5940   LearningRate 0.0065   Epoch: 17   Global Step: 178460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:25:17,079-Speed 5465.93 samples/sec   Loss 1.5993   LearningRate 0.0065   Epoch: 17   Global Step: 178470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:25:24,603-Speed 5444.60 samples/sec   Loss 1.6180   LearningRate 0.0065   Epoch: 17   Global Step: 178480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:25:32,140-Speed 5435.47 samples/sec   Loss 1.6216   LearningRate 0.0065   Epoch: 17   Global Step: 178490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:25:39,616-Speed 5478.87 samples/sec   Loss 1.6120   LearningRate 0.0064   Epoch: 17   Global Step: 178500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:25:47,086-Speed 5484.10 samples/sec   Loss 1.6080   LearningRate 0.0064   Epoch: 17   Global Step: 178510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:25:54,667-Speed 5403.86 samples/sec   Loss 1.6122   LearningRate 0.0064   Epoch: 17   Global Step: 178520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:26:02,120-Speed 5497.12 samples/sec   Loss 1.6294   LearningRate 0.0064   Epoch: 17   Global Step: 178530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:26:09,562-Speed 5503.98 samples/sec   Loss 1.6119   LearningRate 0.0064   Epoch: 17   Global Step: 178540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:26:17,059-Speed 5464.42 samples/sec   Loss 1.6139   LearningRate 0.0064   Epoch: 17   Global Step: 178550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:26:24,538-Speed 5477.88 samples/sec   Loss 1.6110   LearningRate 0.0064   Epoch: 17   Global Step: 178560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:26:32,013-Speed 5479.81 samples/sec   Loss 1.5978   LearningRate 0.0064   Epoch: 17   Global Step: 178570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:26:39,483-Speed 5484.18 samples/sec   Loss 1.6054   LearningRate 0.0064   Epoch: 17   Global Step: 178580   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:26:47,028-Speed 5429.71 samples/sec   Loss 1.5872   LearningRate 0.0064   Epoch: 17   Global Step: 178590   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:26:54,541-Speed 5453.10 samples/sec   Loss 1.6153   LearningRate 0.0064   Epoch: 17   Global Step: 178600   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:27:02,077-Speed 5436.63 samples/sec   Loss 1.6086   LearningRate 0.0064   Epoch: 17   Global Step: 178610   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:27:09,690-Speed 5380.74 samples/sec   Loss 1.6030   LearningRate 0.0064   Epoch: 17   Global Step: 178620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:27:17,295-Speed 5386.38 samples/sec   Loss 1.5795   LearningRate 0.0064   Epoch: 17   Global Step: 178630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:27:24,881-Speed 5400.27 samples/sec   Loss 1.6175   LearningRate 0.0064   Epoch: 17   Global Step: 178640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:27:32,443-Speed 5417.77 samples/sec   Loss 1.5989   LearningRate 0.0064   Epoch: 17   Global Step: 178650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:27:40,008-Speed 5414.66 samples/sec   Loss 1.5895   LearningRate 0.0064   Epoch: 17   Global Step: 178660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:27:47,577-Speed 5412.41 samples/sec   Loss 1.6029   LearningRate 0.0064   Epoch: 17   Global Step: 178670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:27:55,139-Speed 5417.34 samples/sec   Loss 1.5782   LearningRate 0.0064   Epoch: 17   Global Step: 178680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:28:02,672-Speed 5437.85 samples/sec   Loss 1.6197   LearningRate 0.0064   Epoch: 17   Global Step: 178690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:28:10,175-Speed 5459.90 samples/sec   Loss 1.6109   LearningRate 0.0064   Epoch: 17   Global Step: 178700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:28:17,780-Speed 5386.95 samples/sec   Loss 1.6337   LearningRate 0.0064   Epoch: 17   Global Step: 178710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:28:25,205-Speed 5516.65 samples/sec   Loss 1.5752   LearningRate 0.0063   Epoch: 17   Global Step: 178720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:28:32,689-Speed 5474.31 samples/sec   Loss 1.5803   LearningRate 0.0063   Epoch: 17   Global Step: 178730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:28:40,163-Speed 5480.54 samples/sec   Loss 1.5738   LearningRate 0.0063   Epoch: 17   Global Step: 178740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:28:47,701-Speed 5434.39 samples/sec   Loss 1.5964   LearningRate 0.0063   Epoch: 17   Global Step: 178750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:28:55,189-Speed 5471.50 samples/sec   Loss 1.5610   LearningRate 0.0063   Epoch: 17   Global Step: 178760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:29:02,690-Speed 5461.05 samples/sec   Loss 1.5842   LearningRate 0.0063   Epoch: 17   Global Step: 178770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:29:10,240-Speed 5426.07 samples/sec   Loss 1.5836   LearningRate 0.0063   Epoch: 17   Global Step: 178780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:29:17,754-Speed 5451.66 samples/sec   Loss 1.6007   LearningRate 0.0063   Epoch: 17   Global Step: 178790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:29:25,233-Speed 5477.42 samples/sec   Loss 1.5860   LearningRate 0.0063   Epoch: 17   Global Step: 178800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:29:32,712-Speed 5477.56 samples/sec   Loss 1.5863   LearningRate 0.0063   Epoch: 17   Global Step: 178810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 11:29:40,234-Speed 5446.06 samples/sec   Loss 1.6029   LearningRate 0.0063   Epoch: 17   Global Step: 178820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:29:47,760-Speed 5443.09 samples/sec   Loss 1.6134   LearningRate 0.0063   Epoch: 17   Global Step: 178830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:29:55,232-Speed 5482.29 samples/sec   Loss 1.5815   LearningRate 0.0063   Epoch: 17   Global Step: 178840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:30:02,777-Speed 5429.79 samples/sec   Loss 1.5978   LearningRate 0.0063   Epoch: 17   Global Step: 178850   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:30:10,320-Speed 5431.35 samples/sec   Loss 1.5808   LearningRate 0.0063   Epoch: 17   Global Step: 178860   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:30:17,847-Speed 5441.66 samples/sec   Loss 1.5976   LearningRate 0.0063   Epoch: 17   Global Step: 178870   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:30:25,344-Speed 5464.47 samples/sec   Loss 1.5819   LearningRate 0.0063   Epoch: 17   Global Step: 178880   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:30:32,839-Speed 5466.24 samples/sec   Loss 1.5696   LearningRate 0.0063   Epoch: 17   Global Step: 178890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:30:40,310-Speed 5483.34 samples/sec   Loss 1.5933   LearningRate 0.0063   Epoch: 17   Global Step: 178900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 11:30:47,763-Speed 5495.89 samples/sec   Loss 1.5708   LearningRate 0.0063   Epoch: 17   Global Step: 178910   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:30:55,202-Speed 5506.45 samples/sec   Loss 1.5859   LearningRate 0.0063   Epoch: 17   Global Step: 178920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:31:02,633-Speed 5513.68 samples/sec   Loss 1.6077   LearningRate 0.0063   Epoch: 17   Global Step: 178930   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:31:10,114-Speed 5476.17 samples/sec   Loss 1.5838   LearningRate 0.0063   Epoch: 17   Global Step: 178940   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:31:17,562-Speed 5499.96 samples/sec   Loss 1.5593   LearningRate 0.0062   Epoch: 17   Global Step: 178950   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:31:25,080-Speed 5448.22 samples/sec   Loss 1.6080   LearningRate 0.0062   Epoch: 17   Global Step: 178960   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:31:32,542-Speed 5490.19 samples/sec   Loss 1.5981   LearningRate 0.0062   Epoch: 17   Global Step: 178970   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:31:40,028-Speed 5472.22 samples/sec   Loss 1.5839   LearningRate 0.0062   Epoch: 17   Global Step: 178980   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:31:47,636-Speed 5384.49 samples/sec   Loss 1.5753   LearningRate 0.0062   Epoch: 17   Global Step: 178990   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:31:55,841-Speed 5509.73 samples/sec   Loss 1.5624   LearningRate 0.0062   Epoch: 17   Global Step: 179000   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-01-09 11:32:03,399-Speed 5420.76 samples/sec   Loss 1.5821   LearningRate 0.0062   Epoch: 17   Global Step: 179010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:32:10,863-Speed 5488.83 samples/sec   Loss 1.5446   LearningRate 0.0062   Epoch: 17   Global Step: 179020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:32:18,449-Speed 5399.67 samples/sec   Loss 1.5619   LearningRate 0.0062   Epoch: 17   Global Step: 179030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:32:26,004-Speed 5422.45 samples/sec   Loss 1.5809   LearningRate 0.0062   Epoch: 17   Global Step: 179040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:32:33,485-Speed 5476.37 samples/sec   Loss 1.5868   LearningRate 0.0062   Epoch: 17   Global Step: 179050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:32:41,099-Speed 5380.46 samples/sec   Loss 1.5694   LearningRate 0.0062   Epoch: 17   Global Step: 179060   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:32:48,650-Speed 5424.69 samples/sec   Loss 1.5783   LearningRate 0.0062   Epoch: 17   Global Step: 179070   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:32:56,168-Speed 5448.97 samples/sec   Loss 1.5749   LearningRate 0.0062   Epoch: 17   Global Step: 179080   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:33:03,703-Speed 5436.73 samples/sec   Loss 1.5858   LearningRate 0.0062   Epoch: 17   Global Step: 179090   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:33:11,233-Speed 5440.10 samples/sec   Loss 1.5839   LearningRate 0.0062   Epoch: 17   Global Step: 179100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:33:18,793-Speed 5419.17 samples/sec   Loss 1.5644   LearningRate 0.0062   Epoch: 17   Global Step: 179110   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:33:26,364-Speed 5410.67 samples/sec   Loss 1.5957   LearningRate 0.0062   Epoch: 17   Global Step: 179120   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:33:33,890-Speed 5442.89 samples/sec   Loss 1.6085   LearningRate 0.0062   Epoch: 17   Global Step: 179130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:33:41,388-Speed 5463.61 samples/sec   Loss 1.5714   LearningRate 0.0062   Epoch: 17   Global Step: 179140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:33:48,954-Speed 5414.35 samples/sec   Loss 1.5624   LearningRate 0.0062   Epoch: 17   Global Step: 179150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:33:56,495-Speed 5432.69 samples/sec   Loss 1.5594   LearningRate 0.0062   Epoch: 17   Global Step: 179160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:34:04,067-Speed 5409.96 samples/sec   Loss 1.5967   LearningRate 0.0062   Epoch: 17   Global Step: 179170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:34:11,559-Speed 5467.82 samples/sec   Loss 1.5845   LearningRate 0.0061   Epoch: 17   Global Step: 179180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:34:19,003-Speed 5503.02 samples/sec   Loss 1.5683   LearningRate 0.0061   Epoch: 17   Global Step: 179190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:34:26,502-Speed 5462.36 samples/sec   Loss 1.6064   LearningRate 0.0061   Epoch: 17   Global Step: 179200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:34:33,985-Speed 5474.97 samples/sec   Loss 1.5876   LearningRate 0.0061   Epoch: 17   Global Step: 179210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:34:41,489-Speed 5458.87 samples/sec   Loss 1.5604   LearningRate 0.0061   Epoch: 17   Global Step: 179220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:34:48,946-Speed 5493.50 samples/sec   Loss 1.5670   LearningRate 0.0061   Epoch: 17   Global Step: 179230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:34:56,474-Speed 5441.72 samples/sec   Loss 1.5507   LearningRate 0.0061   Epoch: 17   Global Step: 179240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:35:04,020-Speed 5428.95 samples/sec   Loss 1.5854   LearningRate 0.0061   Epoch: 17   Global Step: 179250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:35:11,580-Speed 5418.68 samples/sec   Loss 1.5724   LearningRate 0.0061   Epoch: 17   Global Step: 179260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:35:19,077-Speed 5464.53 samples/sec   Loss 1.5693   LearningRate 0.0061   Epoch: 17   Global Step: 179270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:35:26,589-Speed 5452.67 samples/sec   Loss 1.5699   LearningRate 0.0061   Epoch: 17   Global Step: 179280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:35:34,040-Speed 5498.50 samples/sec   Loss 1.5672   LearningRate 0.0061   Epoch: 17   Global Step: 179290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:35:41,545-Speed 5457.82 samples/sec   Loss 1.5783   LearningRate 0.0061   Epoch: 17   Global Step: 179300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:35:49,069-Speed 5445.12 samples/sec   Loss 1.5756   LearningRate 0.0061   Epoch: 17   Global Step: 179310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:35:56,644-Speed 5408.05 samples/sec   Loss 1.5673   LearningRate 0.0061   Epoch: 17   Global Step: 179320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:36:04,213-Speed 5412.15 samples/sec   Loss 1.5681   LearningRate 0.0061   Epoch: 17   Global Step: 179330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:36:11,662-Speed 5498.81 samples/sec   Loss 1.5521   LearningRate 0.0061   Epoch: 17   Global Step: 179340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:36:19,196-Speed 5437.97 samples/sec   Loss 1.5818   LearningRate 0.0061   Epoch: 17   Global Step: 179350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:36:26,754-Speed 5420.31 samples/sec   Loss 1.5914   LearningRate 0.0061   Epoch: 17   Global Step: 179360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:36:34,261-Speed 5456.98 samples/sec   Loss 1.5656   LearningRate 0.0061   Epoch: 17   Global Step: 179370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:36:41,738-Speed 5478.62 samples/sec   Loss 1.5612   LearningRate 0.0061   Epoch: 17   Global Step: 179380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:36:49,294-Speed 5421.92 samples/sec   Loss 1.5776   LearningRate 0.0061   Epoch: 17   Global Step: 179390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:36:56,721-Speed 5515.87 samples/sec   Loss 1.5674   LearningRate 0.0061   Epoch: 17   Global Step: 179400   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:37:04,220-Speed 5462.30 samples/sec   Loss 1.5628   LearningRate 0.0060   Epoch: 17   Global Step: 179410   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:37:11,742-Speed 5446.49 samples/sec   Loss 1.5736   LearningRate 0.0060   Epoch: 17   Global Step: 179420   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:37:19,241-Speed 5462.46 samples/sec   Loss 1.6077   LearningRate 0.0060   Epoch: 17   Global Step: 179430   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:37:26,750-Speed 5455.63 samples/sec   Loss 1.5743   LearningRate 0.0060   Epoch: 17   Global Step: 179440   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:37:34,199-Speed 5499.21 samples/sec   Loss 1.5447   LearningRate 0.0060   Epoch: 17   Global Step: 179450   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:37:41,718-Speed 5448.47 samples/sec   Loss 1.5976   LearningRate 0.0060   Epoch: 17   Global Step: 179460   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:37:49,205-Speed 5471.77 samples/sec   Loss 1.5844   LearningRate 0.0060   Epoch: 17   Global Step: 179470   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-01-09 11:37:56,706-Speed 5461.33 samples/sec   Loss 1.5389   LearningRate 0.0060   Epoch: 17   Global Step: 179480   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-01-09 11:38:04,193-Speed 5471.36 samples/sec   Loss 1.5671   LearningRate 0.0060   Epoch: 17   Global Step: 179490   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-01-09 11:38:11,741-Speed 5427.87 samples/sec   Loss 1.5762   LearningRate 0.0060   Epoch: 17   Global Step: 179500   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-01-09 11:38:19,310-Speed 5411.79 samples/sec   Loss 1.5688   LearningRate 0.0060   Epoch: 17   Global Step: 179510   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-01-09 11:38:26,786-Speed 5479.89 samples/sec   Loss 1.5555   LearningRate 0.0060   Epoch: 17   Global Step: 179520   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-01-09 11:38:34,307-Speed 5446.95 samples/sec   Loss 1.5869   LearningRate 0.0060   Epoch: 17   Global Step: 179530   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-01-09 11:38:41,803-Speed 5464.67 samples/sec   Loss 1.5482   LearningRate 0.0060   Epoch: 17   Global Step: 179540   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-01-09 11:38:49,295-Speed 5468.17 samples/sec   Loss 1.5718   LearningRate 0.0060   Epoch: 17   Global Step: 179550   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-01-09 11:38:56,732-Speed 5508.83 samples/sec   Loss 1.5469   LearningRate 0.0060   Epoch: 17   Global Step: 179560   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-01-09 11:39:04,296-Speed 5415.51 samples/sec   Loss 1.5979   LearningRate 0.0060   Epoch: 17   Global Step: 179570   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:39:11,785-Speed 5470.28 samples/sec   Loss 1.5567   LearningRate 0.0060   Epoch: 17   Global Step: 179580   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:39:19,271-Speed 5472.16 samples/sec   Loss 1.5826   LearningRate 0.0060   Epoch: 17   Global Step: 179590   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:39:26,827-Speed 5421.35 samples/sec   Loss 1.5698   LearningRate 0.0060   Epoch: 17   Global Step: 179600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:39:34,402-Speed 5408.47 samples/sec   Loss 1.5509   LearningRate 0.0060   Epoch: 17   Global Step: 179610   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:39:42,037-Speed 5365.56 samples/sec   Loss 1.5532   LearningRate 0.0060   Epoch: 17   Global Step: 179620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:39:49,694-Speed 5349.71 samples/sec   Loss 1.5279   LearningRate 0.0060   Epoch: 17   Global Step: 179630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:39:57,254-Speed 5419.09 samples/sec   Loss 1.5578   LearningRate 0.0059   Epoch: 17   Global Step: 179640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:40:04,880-Speed 5371.72 samples/sec   Loss 1.5594   LearningRate 0.0059   Epoch: 17   Global Step: 179650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:40:12,550-Speed 5340.54 samples/sec   Loss 1.5550   LearningRate 0.0059   Epoch: 17   Global Step: 179660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:40:20,101-Speed 5425.00 samples/sec   Loss 1.5478   LearningRate 0.0059   Epoch: 17   Global Step: 179670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:40:27,821-Speed 5306.61 samples/sec   Loss 1.5359   LearningRate 0.0059   Epoch: 17   Global Step: 179680   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:40:35,541-Speed 5306.22 samples/sec   Loss 1.5421   LearningRate 0.0059   Epoch: 17   Global Step: 179690   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:40:43,143-Speed 5388.84 samples/sec   Loss 1.5459   LearningRate 0.0059   Epoch: 17   Global Step: 179700   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:40:50,650-Speed 5457.12 samples/sec   Loss 1.5206   LearningRate 0.0059   Epoch: 17   Global Step: 179710   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:40:58,146-Speed 5465.04 samples/sec   Loss 1.5426   LearningRate 0.0059   Epoch: 17   Global Step: 179720   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:41:05,748-Speed 5389.24 samples/sec   Loss 1.5333   LearningRate 0.0059   Epoch: 17   Global Step: 179730   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:41:13,270-Speed 5446.26 samples/sec   Loss 1.5528   LearningRate 0.0059   Epoch: 17   Global Step: 179740   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:41:20,855-Speed 5400.41 samples/sec   Loss 1.5570   LearningRate 0.0059   Epoch: 17   Global Step: 179750   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:41:28,394-Speed 5433.85 samples/sec   Loss 1.5286   LearningRate 0.0059   Epoch: 17   Global Step: 179760   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:41:35,865-Speed 5483.18 samples/sec   Loss 1.5430   LearningRate 0.0059   Epoch: 17   Global Step: 179770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:41:43,440-Speed 5408.72 samples/sec   Loss 1.5436   LearningRate 0.0059   Epoch: 17   Global Step: 179780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:41:50,931-Speed 5468.44 samples/sec   Loss 1.5328   LearningRate 0.0059   Epoch: 17   Global Step: 179790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:41:58,461-Speed 5440.05 samples/sec   Loss 1.5142   LearningRate 0.0059   Epoch: 17   Global Step: 179800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:42:06,002-Speed 5431.88 samples/sec   Loss 1.5392   LearningRate 0.0059   Epoch: 17   Global Step: 179810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:42:13,712-Speed 5314.10 samples/sec   Loss 1.5374   LearningRate 0.0059   Epoch: 17   Global Step: 179820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:42:21,191-Speed 5477.24 samples/sec   Loss 1.5243   LearningRate 0.0059   Epoch: 17   Global Step: 179830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:42:28,722-Speed 5439.21 samples/sec   Loss 1.5468   LearningRate 0.0059   Epoch: 17   Global Step: 179840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:42:36,334-Speed 5381.79 samples/sec   Loss 1.5559   LearningRate 0.0059   Epoch: 17   Global Step: 179850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:42:43,885-Speed 5425.03 samples/sec   Loss 1.5131   LearningRate 0.0059   Epoch: 17   Global Step: 179860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:42:51,367-Speed 5476.03 samples/sec   Loss 1.5636   LearningRate 0.0058   Epoch: 17   Global Step: 179870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:42:58,921-Speed 5422.13 samples/sec   Loss 1.5717   LearningRate 0.0058   Epoch: 17   Global Step: 179880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:43:06,453-Speed 5439.08 samples/sec   Loss 1.5433   LearningRate 0.0058   Epoch: 17   Global Step: 179890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:43:14,046-Speed 5395.42 samples/sec   Loss 1.5311   LearningRate 0.0058   Epoch: 17   Global Step: 179900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:43:21,589-Speed 5431.13 samples/sec   Loss 1.5508   LearningRate 0.0058   Epoch: 17   Global Step: 179910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:43:29,201-Speed 5381.46 samples/sec   Loss 1.5287   LearningRate 0.0058   Epoch: 17   Global Step: 179920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:43:36,752-Speed 5424.66 samples/sec   Loss 1.5593   LearningRate 0.0058   Epoch: 17   Global Step: 179930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:43:44,255-Speed 5460.16 samples/sec   Loss 1.5272   LearningRate 0.0058   Epoch: 17   Global Step: 179940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:43:51,777-Speed 5445.91 samples/sec   Loss 1.5587   LearningRate 0.0058   Epoch: 17   Global Step: 179950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:43:59,334-Speed 5420.62 samples/sec   Loss 1.5286   LearningRate 0.0058   Epoch: 17   Global Step: 179960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:44:06,905-Speed 5410.89 samples/sec   Loss 1.5477   LearningRate 0.0058   Epoch: 17   Global Step: 179970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:44:14,393-Speed 5470.61 samples/sec   Loss 1.5467   LearningRate 0.0058   Epoch: 17   Global Step: 179980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:44:21,935-Speed 5432.33 samples/sec   Loss 1.5359   LearningRate 0.0058   Epoch: 17   Global Step: 179990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:44:29,503-Speed 5412.81 samples/sec   Loss 1.5487   LearningRate 0.0058   Epoch: 17   Global Step: 180000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:45:13,738-[lfw][180000]XNorm: 23.220754
Training: 2022-01-09 11:45:13,739-[lfw][180000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 11:45:13,739-[lfw][180000]Accuracy-Highest: 0.99833
Training: 2022-01-09 11:46:05,256-[cfp_fp][180000]XNorm: 22.335384
Training: 2022-01-09 11:46:05,257-[cfp_fp][180000]Accuracy-Flip: 0.99314+-0.00408
Training: 2022-01-09 11:46:05,258-[cfp_fp][180000]Accuracy-Highest: 0.99371
Training: 2022-01-09 11:46:49,575-[agedb_30][180000]XNorm: 23.338926
Training: 2022-01-09 11:46:49,576-[agedb_30][180000]Accuracy-Flip: 0.98300+-0.00600
Training: 2022-01-09 11:46:49,576-[agedb_30][180000]Accuracy-Highest: 0.98433
Training: 2022-01-09 11:46:57,151-Speed 277.42 samples/sec   Loss 1.5445   LearningRate 0.0058   Epoch: 17   Global Step: 180010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:47:04,658-Speed 5457.01 samples/sec   Loss 1.5341   LearningRate 0.0058   Epoch: 17   Global Step: 180020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:47:12,207-Speed 5426.20 samples/sec   Loss 1.5508   LearningRate 0.0058   Epoch: 17   Global Step: 180030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:47:19,772-Speed 5415.18 samples/sec   Loss 1.5559   LearningRate 0.0058   Epoch: 17   Global Step: 180040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:47:27,238-Speed 5487.41 samples/sec   Loss 1.5583   LearningRate 0.0058   Epoch: 17   Global Step: 180050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:47:34,720-Speed 5475.11 samples/sec   Loss 1.5257   LearningRate 0.0058   Epoch: 17   Global Step: 180060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:47:42,254-Speed 5437.40 samples/sec   Loss 1.5279   LearningRate 0.0058   Epoch: 17   Global Step: 180070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:47:49,805-Speed 5424.57 samples/sec   Loss 1.5514   LearningRate 0.0058   Epoch: 17   Global Step: 180080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:47:57,453-Speed 5356.30 samples/sec   Loss 1.5230   LearningRate 0.0058   Epoch: 17   Global Step: 180090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:48:05,265-Speed 5243.93 samples/sec   Loss 1.5343   LearningRate 0.0058   Epoch: 17   Global Step: 180100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:48:12,741-Speed 5479.49 samples/sec   Loss 1.5048   LearningRate 0.0057   Epoch: 17   Global Step: 180110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:48:20,206-Speed 5487.99 samples/sec   Loss 1.5288   LearningRate 0.0057   Epoch: 17   Global Step: 180120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:48:27,734-Speed 5441.51 samples/sec   Loss 1.5151   LearningRate 0.0057   Epoch: 17   Global Step: 180130   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:48:35,295-Speed 5417.86 samples/sec   Loss 1.5254   LearningRate 0.0057   Epoch: 17   Global Step: 180140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:48:42,828-Speed 5438.13 samples/sec   Loss 1.5319   LearningRate 0.0057   Epoch: 17   Global Step: 180150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:48:50,363-Speed 5436.80 samples/sec   Loss 1.5109   LearningRate 0.0057   Epoch: 17   Global Step: 180160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:48:57,888-Speed 5444.06 samples/sec   Loss 1.5248   LearningRate 0.0057   Epoch: 17   Global Step: 180170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:49:05,401-Speed 5452.13 samples/sec   Loss 1.5341   LearningRate 0.0057   Epoch: 17   Global Step: 180180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:49:12,919-Speed 5448.96 samples/sec   Loss 1.5302   LearningRate 0.0057   Epoch: 17   Global Step: 180190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:49:20,424-Speed 5458.81 samples/sec   Loss 1.5239   LearningRate 0.0057   Epoch: 17   Global Step: 180200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:49:27,911-Speed 5471.23 samples/sec   Loss 1.5222   LearningRate 0.0057   Epoch: 17   Global Step: 180210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:49:35,363-Speed 5497.23 samples/sec   Loss 1.5200   LearningRate 0.0057   Epoch: 17   Global Step: 180220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:49:42,922-Speed 5419.55 samples/sec   Loss 1.5426   LearningRate 0.0057   Epoch: 17   Global Step: 180230   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:49:50,411-Speed 5469.85 samples/sec   Loss 1.5198   LearningRate 0.0057   Epoch: 17   Global Step: 180240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:49:57,851-Speed 5505.95 samples/sec   Loss 1.5110   LearningRate 0.0057   Epoch: 17   Global Step: 180250   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:50:05,319-Speed 5485.41 samples/sec   Loss 1.5142   LearningRate 0.0057   Epoch: 17   Global Step: 180260   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:50:12,797-Speed 5478.09 samples/sec   Loss 1.5536   LearningRate 0.0057   Epoch: 17   Global Step: 180270   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:50:20,257-Speed 5491.42 samples/sec   Loss 1.5113   LearningRate 0.0057   Epoch: 17   Global Step: 180280   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:50:27,762-Speed 5458.31 samples/sec   Loss 1.5239   LearningRate 0.0057   Epoch: 17   Global Step: 180290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:50:35,247-Speed 5472.87 samples/sec   Loss 1.5016   LearningRate 0.0057   Epoch: 17   Global Step: 180300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:50:42,754-Speed 5457.08 samples/sec   Loss 1.5279   LearningRate 0.0057   Epoch: 17   Global Step: 180310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 11:50:50,246-Speed 5468.19 samples/sec   Loss 1.5544   LearningRate 0.0057   Epoch: 17   Global Step: 180320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:50:57,718-Speed 5482.05 samples/sec   Loss 1.5210   LearningRate 0.0057   Epoch: 17   Global Step: 180330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:51:05,233-Speed 5451.74 samples/sec   Loss 1.5443   LearningRate 0.0057   Epoch: 17   Global Step: 180340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:51:12,723-Speed 5468.73 samples/sec   Loss 1.5146   LearningRate 0.0056   Epoch: 17   Global Step: 180350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:51:20,223-Speed 5462.71 samples/sec   Loss 1.5113   LearningRate 0.0056   Epoch: 17   Global Step: 180360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:51:27,748-Speed 5443.85 samples/sec   Loss 1.5207   LearningRate 0.0056   Epoch: 17   Global Step: 180370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:51:35,211-Speed 5488.65 samples/sec   Loss 1.5312   LearningRate 0.0056   Epoch: 17   Global Step: 180380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:51:42,740-Speed 5441.64 samples/sec   Loss 1.5441   LearningRate 0.0056   Epoch: 17   Global Step: 180390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:51:50,349-Speed 5383.87 samples/sec   Loss 1.5414   LearningRate 0.0056   Epoch: 17   Global Step: 180400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:51:57,859-Speed 5454.47 samples/sec   Loss 1.5397   LearningRate 0.0056   Epoch: 17   Global Step: 180410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:52:05,405-Speed 5429.09 samples/sec   Loss 1.5191   LearningRate 0.0056   Epoch: 17   Global Step: 180420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:52:12,918-Speed 5452.20 samples/sec   Loss 1.5249   LearningRate 0.0056   Epoch: 17   Global Step: 180430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:52:20,467-Speed 5426.54 samples/sec   Loss 1.5223   LearningRate 0.0056   Epoch: 17   Global Step: 180440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:52:28,100-Speed 5367.37 samples/sec   Loss 1.4995   LearningRate 0.0056   Epoch: 17   Global Step: 180450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:52:35,617-Speed 5449.37 samples/sec   Loss 1.5344   LearningRate 0.0056   Epoch: 17   Global Step: 180460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:52:43,175-Speed 5420.86 samples/sec   Loss 1.4781   LearningRate 0.0056   Epoch: 17   Global Step: 180470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:52:50,718-Speed 5430.33 samples/sec   Loss 1.5179   LearningRate 0.0056   Epoch: 17   Global Step: 180480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:52:58,221-Speed 5460.21 samples/sec   Loss 1.5233   LearningRate 0.0056   Epoch: 17   Global Step: 180490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:53:05,778-Speed 5420.98 samples/sec   Loss 1.5079   LearningRate 0.0056   Epoch: 17   Global Step: 180500   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:53:13,240-Speed 5490.05 samples/sec   Loss 1.5151   LearningRate 0.0056   Epoch: 17   Global Step: 180510   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:53:20,726-Speed 5472.38 samples/sec   Loss 1.5064   LearningRate 0.0056   Epoch: 17   Global Step: 180520   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:53:28,245-Speed 5448.33 samples/sec   Loss 1.5129   LearningRate 0.0056   Epoch: 17   Global Step: 180530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:53:35,733-Speed 5471.01 samples/sec   Loss 1.5171   LearningRate 0.0056   Epoch: 17   Global Step: 180540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:53:43,180-Speed 5500.66 samples/sec   Loss 1.5077   LearningRate 0.0056   Epoch: 17   Global Step: 180550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:53:50,647-Speed 5486.12 samples/sec   Loss 1.5015   LearningRate 0.0056   Epoch: 17   Global Step: 180560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:53:58,152-Speed 5458.60 samples/sec   Loss 1.4983   LearningRate 0.0056   Epoch: 17   Global Step: 180570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:54:05,651-Speed 5463.06 samples/sec   Loss 1.5211   LearningRate 0.0056   Epoch: 17   Global Step: 180580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:54:13,226-Speed 5407.60 samples/sec   Loss 1.5133   LearningRate 0.0055   Epoch: 17   Global Step: 180590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:54:20,720-Speed 5466.48 samples/sec   Loss 1.5183   LearningRate 0.0055   Epoch: 17   Global Step: 180600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:54:28,223-Speed 5459.76 samples/sec   Loss 1.5110   LearningRate 0.0055   Epoch: 17   Global Step: 180610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:54:35,724-Speed 5461.60 samples/sec   Loss 1.4808   LearningRate 0.0055   Epoch: 17   Global Step: 180620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:54:43,223-Speed 5463.21 samples/sec   Loss 1.5304   LearningRate 0.0055   Epoch: 17   Global Step: 180630   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:54:50,738-Speed 5451.15 samples/sec   Loss 1.5132   LearningRate 0.0055   Epoch: 17   Global Step: 180640   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:54:58,162-Speed 5518.20 samples/sec   Loss 1.4941   LearningRate 0.0055   Epoch: 17   Global Step: 180650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:55:05,724-Speed 5416.95 samples/sec   Loss 1.5266   LearningRate 0.0055   Epoch: 17   Global Step: 180660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:55:13,163-Speed 5507.06 samples/sec   Loss 1.5231   LearningRate 0.0055   Epoch: 17   Global Step: 180670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:55:20,739-Speed 5407.49 samples/sec   Loss 1.4984   LearningRate 0.0055   Epoch: 17   Global Step: 180680   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:55:28,248-Speed 5454.68 samples/sec   Loss 1.5320   LearningRate 0.0055   Epoch: 17   Global Step: 180690   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:55:35,716-Speed 5485.85 samples/sec   Loss 1.5007   LearningRate 0.0055   Epoch: 17   Global Step: 180700   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:55:43,289-Speed 5409.67 samples/sec   Loss 1.5043   LearningRate 0.0055   Epoch: 17   Global Step: 180710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:55:50,972-Speed 5332.07 samples/sec   Loss 1.5024   LearningRate 0.0055   Epoch: 17   Global Step: 180720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:55:58,516-Speed 5429.68 samples/sec   Loss 1.4950   LearningRate 0.0055   Epoch: 17   Global Step: 180730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:56:06,067-Speed 5426.09 samples/sec   Loss 1.5110   LearningRate 0.0055   Epoch: 17   Global Step: 180740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:56:13,585-Speed 5448.47 samples/sec   Loss 1.5242   LearningRate 0.0055   Epoch: 17   Global Step: 180750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:56:21,062-Speed 5479.37 samples/sec   Loss 1.4949   LearningRate 0.0055   Epoch: 17   Global Step: 180760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:56:28,649-Speed 5398.97 samples/sec   Loss 1.4938   LearningRate 0.0055   Epoch: 17   Global Step: 180770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:56:36,180-Speed 5439.35 samples/sec   Loss 1.4742   LearningRate 0.0055   Epoch: 17   Global Step: 180780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:56:43,665-Speed 5473.53 samples/sec   Loss 1.5129   LearningRate 0.0055   Epoch: 17   Global Step: 180790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:56:51,150-Speed 5472.36 samples/sec   Loss 1.5015   LearningRate 0.0055   Epoch: 17   Global Step: 180800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:56:58,663-Speed 5453.14 samples/sec   Loss 1.4833   LearningRate 0.0055   Epoch: 17   Global Step: 180810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:57:06,258-Speed 5393.55 samples/sec   Loss 1.5272   LearningRate 0.0055   Epoch: 17   Global Step: 180820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:57:13,725-Speed 5486.00 samples/sec   Loss 1.5154   LearningRate 0.0054   Epoch: 17   Global Step: 180830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:57:21,266-Speed 5432.26 samples/sec   Loss 1.5157   LearningRate 0.0054   Epoch: 17   Global Step: 180840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:57:28,789-Speed 5445.42 samples/sec   Loss 1.5143   LearningRate 0.0054   Epoch: 17   Global Step: 180850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:57:36,313-Speed 5444.41 samples/sec   Loss 1.4930   LearningRate 0.0054   Epoch: 17   Global Step: 180860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:57:43,847-Speed 5437.57 samples/sec   Loss 1.4936   LearningRate 0.0054   Epoch: 17   Global Step: 180870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:57:51,353-Speed 5457.47 samples/sec   Loss 1.5083   LearningRate 0.0054   Epoch: 17   Global Step: 180880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:57:58,893-Speed 5433.44 samples/sec   Loss 1.5389   LearningRate 0.0054   Epoch: 17   Global Step: 180890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:58:06,435-Speed 5431.26 samples/sec   Loss 1.5193   LearningRate 0.0054   Epoch: 17   Global Step: 180900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:58:13,971-Speed 5436.50 samples/sec   Loss 1.4901   LearningRate 0.0054   Epoch: 17   Global Step: 180910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:58:21,459-Speed 5470.42 samples/sec   Loss 1.4966   LearningRate 0.0054   Epoch: 17   Global Step: 180920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:58:28,985-Speed 5443.87 samples/sec   Loss 1.4889   LearningRate 0.0054   Epoch: 17   Global Step: 180930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:58:36,573-Speed 5398.55 samples/sec   Loss 1.5209   LearningRate 0.0054   Epoch: 17   Global Step: 180940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:58:44,098-Speed 5443.84 samples/sec   Loss 1.4986   LearningRate 0.0054   Epoch: 17   Global Step: 180950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:58:51,576-Speed 5478.41 samples/sec   Loss 1.4677   LearningRate 0.0054   Epoch: 17   Global Step: 180960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:58:59,136-Speed 5418.69 samples/sec   Loss 1.5049   LearningRate 0.0054   Epoch: 17   Global Step: 180970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:59:06,659-Speed 5445.54 samples/sec   Loss 1.4903   LearningRate 0.0054   Epoch: 17   Global Step: 180980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:59:14,193-Speed 5436.97 samples/sec   Loss 1.5124   LearningRate 0.0054   Epoch: 17   Global Step: 180990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:59:21,694-Speed 5461.16 samples/sec   Loss 1.4982   LearningRate 0.0054   Epoch: 17   Global Step: 181000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:59:29,267-Speed 5409.23 samples/sec   Loss 1.4826   LearningRate 0.0054   Epoch: 17   Global Step: 181010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 11:59:36,754-Speed 5471.91 samples/sec   Loss 1.4969   LearningRate 0.0054   Epoch: 17   Global Step: 181020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:59:44,343-Speed 5397.48 samples/sec   Loss 1.4976   LearningRate 0.0054   Epoch: 17   Global Step: 181030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:59:51,903-Speed 5418.96 samples/sec   Loss 1.5006   LearningRate 0.0054   Epoch: 17   Global Step: 181040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 11:59:59,442-Speed 5433.77 samples/sec   Loss 1.4871   LearningRate 0.0054   Epoch: 17   Global Step: 181050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:00:06,958-Speed 5450.69 samples/sec   Loss 1.5009   LearningRate 0.0054   Epoch: 17   Global Step: 181060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:00:14,432-Speed 5480.32 samples/sec   Loss 1.5000   LearningRate 0.0054   Epoch: 17   Global Step: 181070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:00:22,040-Speed 5384.85 samples/sec   Loss 1.4773   LearningRate 0.0053   Epoch: 17   Global Step: 181080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:00:29,540-Speed 5462.05 samples/sec   Loss 1.4723   LearningRate 0.0053   Epoch: 17   Global Step: 181090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:00:37,089-Speed 5426.62 samples/sec   Loss 1.5081   LearningRate 0.0053   Epoch: 17   Global Step: 181100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:00:44,589-Speed 5462.04 samples/sec   Loss 1.5025   LearningRate 0.0053   Epoch: 17   Global Step: 181110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:00:52,061-Speed 5483.06 samples/sec   Loss 1.4710   LearningRate 0.0053   Epoch: 17   Global Step: 181120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:00:59,567-Speed 5457.19 samples/sec   Loss 1.4856   LearningRate 0.0053   Epoch: 17   Global Step: 181130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:01:07,112-Speed 5429.77 samples/sec   Loss 1.4954   LearningRate 0.0053   Epoch: 17   Global Step: 181140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:01:14,664-Speed 5424.33 samples/sec   Loss 1.5160   LearningRate 0.0053   Epoch: 17   Global Step: 181150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:01:22,245-Speed 5403.95 samples/sec   Loss 1.4747   LearningRate 0.0053   Epoch: 17   Global Step: 181160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:01:29,860-Speed 5379.68 samples/sec   Loss 1.4813   LearningRate 0.0053   Epoch: 17   Global Step: 181170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:01:37,409-Speed 5426.22 samples/sec   Loss 1.4928   LearningRate 0.0053   Epoch: 17   Global Step: 181180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:01:44,955-Speed 5428.45 samples/sec   Loss 1.5119   LearningRate 0.0053   Epoch: 17   Global Step: 181190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:01:52,493-Speed 5435.22 samples/sec   Loss 1.4788   LearningRate 0.0053   Epoch: 17   Global Step: 181200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:02:00,025-Speed 5439.02 samples/sec   Loss 1.5025   LearningRate 0.0053   Epoch: 17   Global Step: 181210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:02:07,522-Speed 5464.01 samples/sec   Loss 1.5041   LearningRate 0.0053   Epoch: 17   Global Step: 181220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:02:15,075-Speed 5423.57 samples/sec   Loss 1.4775   LearningRate 0.0053   Epoch: 17   Global Step: 181230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:02:22,554-Speed 5477.41 samples/sec   Loss 1.4894   LearningRate 0.0053   Epoch: 17   Global Step: 181240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:02:30,088-Speed 5438.22 samples/sec   Loss 1.4646   LearningRate 0.0053   Epoch: 17   Global Step: 181250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:02:37,598-Speed 5454.15 samples/sec   Loss 1.4649   LearningRate 0.0053   Epoch: 17   Global Step: 181260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:02:45,148-Speed 5425.85 samples/sec   Loss 1.4598   LearningRate 0.0053   Epoch: 17   Global Step: 181270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:02:52,636-Speed 5471.31 samples/sec   Loss 1.4990   LearningRate 0.0053   Epoch: 17   Global Step: 181280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:03:00,130-Speed 5466.49 samples/sec   Loss 1.4857   LearningRate 0.0053   Epoch: 17   Global Step: 181290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:03:07,693-Speed 5416.41 samples/sec   Loss 1.4863   LearningRate 0.0053   Epoch: 17   Global Step: 181300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:03:15,199-Speed 5457.88 samples/sec   Loss 1.4647   LearningRate 0.0053   Epoch: 17   Global Step: 181310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:03:22,646-Speed 5500.96 samples/sec   Loss 1.4707   LearningRate 0.0052   Epoch: 17   Global Step: 181320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:03:30,166-Speed 5447.74 samples/sec   Loss 1.5003   LearningRate 0.0052   Epoch: 17   Global Step: 181330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:03:37,793-Speed 5370.77 samples/sec   Loss 1.4891   LearningRate 0.0052   Epoch: 17   Global Step: 181340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:03:45,331-Speed 5434.98 samples/sec   Loss 1.4922   LearningRate 0.0052   Epoch: 17   Global Step: 181350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:03:52,886-Speed 5422.46 samples/sec   Loss 1.4634   LearningRate 0.0052   Epoch: 17   Global Step: 181360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:04:00,502-Speed 5379.20 samples/sec   Loss 1.4629   LearningRate 0.0052   Epoch: 17   Global Step: 181370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:04:07,938-Speed 5508.92 samples/sec   Loss 1.4753   LearningRate 0.0052   Epoch: 17   Global Step: 181380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:04:15,482-Speed 5429.77 samples/sec   Loss 1.4783   LearningRate 0.0052   Epoch: 17   Global Step: 181390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:04:23,163-Speed 5333.98 samples/sec   Loss 1.5082   LearningRate 0.0052   Epoch: 17   Global Step: 181400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:04:30,694-Speed 5439.94 samples/sec   Loss 1.4706   LearningRate 0.0052   Epoch: 17   Global Step: 181410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:04:38,223-Speed 5440.72 samples/sec   Loss 1.4920   LearningRate 0.0052   Epoch: 17   Global Step: 181420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:04:45,691-Speed 5485.02 samples/sec   Loss 1.4576   LearningRate 0.0052   Epoch: 17   Global Step: 181430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:04:53,204-Speed 5453.11 samples/sec   Loss 1.4962   LearningRate 0.0052   Epoch: 17   Global Step: 181440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:05:00,710-Speed 5457.87 samples/sec   Loss 1.4937   LearningRate 0.0052   Epoch: 17   Global Step: 181450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:05:08,260-Speed 5425.91 samples/sec   Loss 1.5062   LearningRate 0.0052   Epoch: 17   Global Step: 181460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:05:15,773-Speed 5452.35 samples/sec   Loss 1.4863   LearningRate 0.0052   Epoch: 17   Global Step: 181470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:05:23,290-Speed 5449.25 samples/sec   Loss 1.4824   LearningRate 0.0052   Epoch: 17   Global Step: 181480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:05:30,871-Speed 5403.96 samples/sec   Loss 1.4832   LearningRate 0.0052   Epoch: 17   Global Step: 181490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:05:38,480-Speed 5384.17 samples/sec   Loss 1.4557   LearningRate 0.0052   Epoch: 17   Global Step: 181500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:05:46,104-Speed 5373.19 samples/sec   Loss 1.4848   LearningRate 0.0052   Epoch: 17   Global Step: 181510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:05:53,587-Speed 5474.27 samples/sec   Loss 1.4870   LearningRate 0.0052   Epoch: 17   Global Step: 181520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:06:01,207-Speed 5376.28 samples/sec   Loss 1.4831   LearningRate 0.0052   Epoch: 17   Global Step: 181530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:06:08,706-Speed 5462.86 samples/sec   Loss 1.4815   LearningRate 0.0052   Epoch: 17   Global Step: 181540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:06:16,296-Speed 5397.89 samples/sec   Loss 1.4754   LearningRate 0.0052   Epoch: 17   Global Step: 181550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:06:23,885-Speed 5397.58 samples/sec   Loss 1.4686   LearningRate 0.0052   Epoch: 17   Global Step: 181560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:06:31,381-Speed 5465.21 samples/sec   Loss 1.4546   LearningRate 0.0051   Epoch: 17   Global Step: 181570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:06:38,925-Speed 5430.44 samples/sec   Loss 1.4688   LearningRate 0.0051   Epoch: 17   Global Step: 181580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:06:46,450-Speed 5443.77 samples/sec   Loss 1.4693   LearningRate 0.0051   Epoch: 17   Global Step: 181590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:06:53,933-Speed 5474.06 samples/sec   Loss 1.4738   LearningRate 0.0051   Epoch: 17   Global Step: 181600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:07:01,388-Speed 5494.95 samples/sec   Loss 1.4521   LearningRate 0.0051   Epoch: 17   Global Step: 181610   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:07:08,894-Speed 5457.94 samples/sec   Loss 1.4630   LearningRate 0.0051   Epoch: 17   Global Step: 181620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:07:16,411-Speed 5449.73 samples/sec   Loss 1.4778   LearningRate 0.0051   Epoch: 17   Global Step: 181630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:07:23,943-Speed 5438.45 samples/sec   Loss 1.4398   LearningRate 0.0051   Epoch: 17   Global Step: 181640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:07:31,547-Speed 5387.64 samples/sec   Loss 1.4753   LearningRate 0.0051   Epoch: 17   Global Step: 181650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:07:39,071-Speed 5444.47 samples/sec   Loss 1.4788   LearningRate 0.0051   Epoch: 17   Global Step: 181660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:07:46,542-Speed 5483.92 samples/sec   Loss 1.4575   LearningRate 0.0051   Epoch: 17   Global Step: 181670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:07:54,053-Speed 5453.75 samples/sec   Loss 1.4760   LearningRate 0.0051   Epoch: 17   Global Step: 181680   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:08:01,606-Speed 5423.37 samples/sec   Loss 1.4714   LearningRate 0.0051   Epoch: 17   Global Step: 181690   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:08:09,122-Speed 5450.99 samples/sec   Loss 1.4670   LearningRate 0.0051   Epoch: 17   Global Step: 181700   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:08:16,664-Speed 5431.22 samples/sec   Loss 1.4475   LearningRate 0.0051   Epoch: 17   Global Step: 181710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:08:24,154-Speed 5469.30 samples/sec   Loss 1.4702   LearningRate 0.0051   Epoch: 17   Global Step: 181720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:08:31,666-Speed 5452.79 samples/sec   Loss 1.4609   LearningRate 0.0051   Epoch: 17   Global Step: 181730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:08:39,159-Speed 5468.04 samples/sec   Loss 1.4601   LearningRate 0.0051   Epoch: 17   Global Step: 181740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:08:46,597-Speed 5507.44 samples/sec   Loss 1.4579   LearningRate 0.0051   Epoch: 17   Global Step: 181750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:08:54,069-Speed 5482.72 samples/sec   Loss 1.4751   LearningRate 0.0051   Epoch: 17   Global Step: 181760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:09:01,481-Speed 5526.82 samples/sec   Loss 1.4617   LearningRate 0.0051   Epoch: 17   Global Step: 181770   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:09:08,961-Speed 5476.39 samples/sec   Loss 1.4499   LearningRate 0.0051   Epoch: 17   Global Step: 181780   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:09:16,417-Speed 5494.72 samples/sec   Loss 1.4512   LearningRate 0.0051   Epoch: 17   Global Step: 181790   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:09:23,907-Speed 5469.00 samples/sec   Loss 1.4583   LearningRate 0.0051   Epoch: 17   Global Step: 181800   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:09:31,331-Speed 5518.03 samples/sec   Loss 1.4534   LearningRate 0.0051   Epoch: 17   Global Step: 181810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:09:38,857-Speed 5442.85 samples/sec   Loss 1.4580   LearningRate 0.0050   Epoch: 17   Global Step: 181820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:09:46,377-Speed 5448.02 samples/sec   Loss 1.4739   LearningRate 0.0050   Epoch: 17   Global Step: 181830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:09:53,832-Speed 5495.66 samples/sec   Loss 1.4452   LearningRate 0.0050   Epoch: 17   Global Step: 181840   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:10:01,334-Speed 5460.13 samples/sec   Loss 1.4611   LearningRate 0.0050   Epoch: 17   Global Step: 181850   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:10:08,867-Speed 5438.25 samples/sec   Loss 1.4759   LearningRate 0.0050   Epoch: 17   Global Step: 181860   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:10:16,421-Speed 5422.69 samples/sec   Loss 1.4549   LearningRate 0.0050   Epoch: 17   Global Step: 181870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:10:23,847-Speed 5516.59 samples/sec   Loss 1.4785   LearningRate 0.0050   Epoch: 17   Global Step: 181880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:10:31,295-Speed 5500.74 samples/sec   Loss 1.4630   LearningRate 0.0050   Epoch: 17   Global Step: 181890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:10:38,801-Speed 5456.82 samples/sec   Loss 1.4753   LearningRate 0.0050   Epoch: 17   Global Step: 181900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:10:46,298-Speed 5464.45 samples/sec   Loss 1.4559   LearningRate 0.0050   Epoch: 17   Global Step: 181910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:10:53,732-Speed 5511.08 samples/sec   Loss 1.4405   LearningRate 0.0050   Epoch: 17   Global Step: 181920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:11:01,228-Speed 5464.70 samples/sec   Loss 1.4766   LearningRate 0.0050   Epoch: 17   Global Step: 181930   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:11:08,663-Speed 5509.50 samples/sec   Loss 1.4551   LearningRate 0.0050   Epoch: 17   Global Step: 181940   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:11:16,171-Speed 5456.79 samples/sec   Loss 1.4628   LearningRate 0.0050   Epoch: 17   Global Step: 181950   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:11:23,676-Speed 5458.63 samples/sec   Loss 1.4530   LearningRate 0.0050   Epoch: 17   Global Step: 181960   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:11:31,132-Speed 5494.23 samples/sec   Loss 1.4383   LearningRate 0.0050   Epoch: 17   Global Step: 181970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:11:38,606-Speed 5481.01 samples/sec   Loss 1.4407   LearningRate 0.0050   Epoch: 17   Global Step: 181980   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:11:46,099-Speed 5467.20 samples/sec   Loss 1.4602   LearningRate 0.0050   Epoch: 17   Global Step: 181990   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:11:53,755-Speed 5350.82 samples/sec   Loss 1.4646   LearningRate 0.0050   Epoch: 17   Global Step: 182000   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:12:38,309-[lfw][182000]XNorm: 22.457532
Training: 2022-01-09 12:12:38,310-[lfw][182000]Accuracy-Flip: 0.99850+-0.00203
Training: 2022-01-09 12:12:38,311-[lfw][182000]Accuracy-Highest: 0.99850
Training: 2022-01-09 12:13:30,232-[cfp_fp][182000]XNorm: 21.930279
Training: 2022-01-09 12:13:30,233-[cfp_fp][182000]Accuracy-Flip: 0.99357+-0.00346
Training: 2022-01-09 12:13:30,233-[cfp_fp][182000]Accuracy-Highest: 0.99371
Training: 2022-01-09 12:14:14,765-[agedb_30][182000]XNorm: 22.879698
Training: 2022-01-09 12:14:14,766-[agedb_30][182000]Accuracy-Flip: 0.98283+-0.00568
Training: 2022-01-09 12:14:14,767-[agedb_30][182000]Accuracy-Highest: 0.98433
Training: 2022-01-09 12:14:22,289-Speed 275.76 samples/sec   Loss 1.4693   LearningRate 0.0050   Epoch: 17   Global Step: 182010   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:14:29,785-Speed 5464.51 samples/sec   Loss 1.4609   LearningRate 0.0050   Epoch: 17   Global Step: 182020   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:14:37,313-Speed 5441.54 samples/sec   Loss 1.4732   LearningRate 0.0050   Epoch: 17   Global Step: 182030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:14:44,901-Speed 5398.74 samples/sec   Loss 1.4625   LearningRate 0.0050   Epoch: 17   Global Step: 182040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:14:52,625-Speed 5304.03 samples/sec   Loss 1.4690   LearningRate 0.0050   Epoch: 17   Global Step: 182050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:15:00,207-Speed 5402.53 samples/sec   Loss 1.4610   LearningRate 0.0050   Epoch: 17   Global Step: 182060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:15:07,702-Speed 5465.67 samples/sec   Loss 1.4425   LearningRate 0.0050   Epoch: 17   Global Step: 182070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:15:15,211-Speed 5455.56 samples/sec   Loss 1.4419   LearningRate 0.0049   Epoch: 17   Global Step: 182080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:15:22,807-Speed 5393.02 samples/sec   Loss 1.4523   LearningRate 0.0049   Epoch: 17   Global Step: 182090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:15:30,428-Speed 5375.03 samples/sec   Loss 1.4682   LearningRate 0.0049   Epoch: 17   Global Step: 182100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:15:37,935-Speed 5457.46 samples/sec   Loss 1.4547   LearningRate 0.0049   Epoch: 17   Global Step: 182110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:15:45,351-Speed 5523.42 samples/sec   Loss 1.4499   LearningRate 0.0049   Epoch: 17   Global Step: 182120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:15:52,846-Speed 5465.74 samples/sec   Loss 1.4492   LearningRate 0.0049   Epoch: 17   Global Step: 182130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:16:00,372-Speed 5443.16 samples/sec   Loss 1.4434   LearningRate 0.0049   Epoch: 17   Global Step: 182140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:16:07,822-Speed 5499.03 samples/sec   Loss 1.4613   LearningRate 0.0049   Epoch: 17   Global Step: 182150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:16:15,289-Speed 5486.15 samples/sec   Loss 1.4464   LearningRate 0.0049   Epoch: 17   Global Step: 182160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:16:22,790-Speed 5461.09 samples/sec   Loss 1.4571   LearningRate 0.0049   Epoch: 17   Global Step: 182170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:16:30,345-Speed 5422.27 samples/sec   Loss 1.4588   LearningRate 0.0049   Epoch: 17   Global Step: 182180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:16:37,833-Speed 5471.21 samples/sec   Loss 1.4454   LearningRate 0.0049   Epoch: 17   Global Step: 182190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:16:45,317-Speed 5473.38 samples/sec   Loss 1.4423   LearningRate 0.0049   Epoch: 17   Global Step: 182200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:16:52,801-Speed 5473.81 samples/sec   Loss 1.4431   LearningRate 0.0049   Epoch: 17   Global Step: 182210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:17:00,234-Speed 5511.28 samples/sec   Loss 1.4525   LearningRate 0.0049   Epoch: 17   Global Step: 182220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:17:07,728-Speed 5466.70 samples/sec   Loss 1.4412   LearningRate 0.0049   Epoch: 17   Global Step: 182230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:17:15,166-Speed 5507.41 samples/sec   Loss 1.4671   LearningRate 0.0049   Epoch: 17   Global Step: 182240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:17:22,613-Speed 5501.06 samples/sec   Loss 1.4540   LearningRate 0.0049   Epoch: 17   Global Step: 182250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:17:30,054-Speed 5504.96 samples/sec   Loss 1.4312   LearningRate 0.0049   Epoch: 17   Global Step: 182260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:17:37,606-Speed 5423.95 samples/sec   Loss 1.4667   LearningRate 0.0049   Epoch: 17   Global Step: 182270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:17:45,046-Speed 5506.22 samples/sec   Loss 1.4470   LearningRate 0.0049   Epoch: 17   Global Step: 182280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:17:52,540-Speed 5466.31 samples/sec   Loss 1.4402   LearningRate 0.0049   Epoch: 17   Global Step: 182290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:18:00,068-Speed 5441.78 samples/sec   Loss 1.4390   LearningRate 0.0049   Epoch: 17   Global Step: 182300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:18:07,520-Speed 5497.47 samples/sec   Loss 1.4320   LearningRate 0.0049   Epoch: 17   Global Step: 182310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:18:14,988-Speed 5485.41 samples/sec   Loss 1.4421   LearningRate 0.0049   Epoch: 17   Global Step: 182320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:18:22,476-Speed 5470.65 samples/sec   Loss 1.4458   LearningRate 0.0049   Epoch: 17   Global Step: 182330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:18:30,029-Speed 5423.88 samples/sec   Loss 1.4620   LearningRate 0.0048   Epoch: 17   Global Step: 182340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:18:37,635-Speed 5386.35 samples/sec   Loss 1.4447   LearningRate 0.0048   Epoch: 17   Global Step: 182350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:18:45,211-Speed 5407.20 samples/sec   Loss 1.4623   LearningRate 0.0048   Epoch: 17   Global Step: 182360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:18:52,775-Speed 5416.29 samples/sec   Loss 1.4465   LearningRate 0.0048   Epoch: 17   Global Step: 182370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:19:00,272-Speed 5463.64 samples/sec   Loss 1.4357   LearningRate 0.0048   Epoch: 17   Global Step: 182380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:19:07,896-Speed 5373.08 samples/sec   Loss 1.4138   LearningRate 0.0048   Epoch: 17   Global Step: 182390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:19:15,400-Speed 5458.81 samples/sec   Loss 1.4525   LearningRate 0.0048   Epoch: 17   Global Step: 182400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:19:23,018-Speed 5377.73 samples/sec   Loss 1.4320   LearningRate 0.0048   Epoch: 17   Global Step: 182410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:19:30,613-Speed 5393.48 samples/sec   Loss 1.4408   LearningRate 0.0048   Epoch: 17   Global Step: 182420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:19:38,263-Speed 5355.11 samples/sec   Loss 1.4163   LearningRate 0.0048   Epoch: 17   Global Step: 182430   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:19:45,805-Speed 5431.68 samples/sec   Loss 1.4329   LearningRate 0.0048   Epoch: 17   Global Step: 182440   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:19:53,319-Speed 5452.27 samples/sec   Loss 1.4207   LearningRate 0.0048   Epoch: 17   Global Step: 182450   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:20:00,889-Speed 5411.08 samples/sec   Loss 1.4451   LearningRate 0.0048   Epoch: 17   Global Step: 182460   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:20:08,370-Speed 5475.33 samples/sec   Loss 1.4301   LearningRate 0.0048   Epoch: 17   Global Step: 182470   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:20:15,922-Speed 5425.01 samples/sec   Loss 1.4424   LearningRate 0.0048   Epoch: 17   Global Step: 182480   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:20:23,340-Speed 5522.73 samples/sec   Loss 1.4502   LearningRate 0.0048   Epoch: 17   Global Step: 182490   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:20:30,833-Speed 5466.95 samples/sec   Loss 1.4508   LearningRate 0.0048   Epoch: 17   Global Step: 182500   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:20:38,365-Speed 5438.13 samples/sec   Loss 1.4320   LearningRate 0.0048   Epoch: 17   Global Step: 182510   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:20:46,034-Speed 5342.19 samples/sec   Loss 1.4190   LearningRate 0.0048   Epoch: 17   Global Step: 182520   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:20:53,538-Speed 5459.06 samples/sec   Loss 1.4417   LearningRate 0.0048   Epoch: 17   Global Step: 182530   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:21:00,990-Speed 5497.55 samples/sec   Loss 1.4499   LearningRate 0.0048   Epoch: 17   Global Step: 182540   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:21:08,458-Speed 5485.05 samples/sec   Loss 1.4110   LearningRate 0.0048   Epoch: 17   Global Step: 182550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:21:15,984-Speed 5443.23 samples/sec   Loss 1.4065   LearningRate 0.0048   Epoch: 17   Global Step: 182560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:21:23,365-Speed 5552.30 samples/sec   Loss 1.4406   LearningRate 0.0048   Epoch: 17   Global Step: 182570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:21:30,912-Speed 5427.88 samples/sec   Loss 1.4047   LearningRate 0.0048   Epoch: 17   Global Step: 182580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:21:38,459-Speed 5427.44 samples/sec   Loss 1.4333   LearningRate 0.0047   Epoch: 17   Global Step: 182590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:21:46,055-Speed 5393.00 samples/sec   Loss 1.4607   LearningRate 0.0047   Epoch: 17   Global Step: 182600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:21:53,523-Speed 5485.85 samples/sec   Loss 1.4600   LearningRate 0.0047   Epoch: 17   Global Step: 182610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:22:00,992-Speed 5484.89 samples/sec   Loss 1.4307   LearningRate 0.0047   Epoch: 17   Global Step: 182620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:22:08,472-Speed 5476.44 samples/sec   Loss 1.4341   LearningRate 0.0047   Epoch: 17   Global Step: 182630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:22:15,920-Speed 5499.88 samples/sec   Loss 1.4368   LearningRate 0.0047   Epoch: 17   Global Step: 182640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:22:23,418-Speed 5463.62 samples/sec   Loss 1.3987   LearningRate 0.0047   Epoch: 17   Global Step: 182650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:22:31,068-Speed 5355.30 samples/sec   Loss 1.4358   LearningRate 0.0047   Epoch: 17   Global Step: 182660   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:22:38,635-Speed 5412.85 samples/sec   Loss 1.4324   LearningRate 0.0047   Epoch: 17   Global Step: 182670   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:22:46,156-Speed 5447.34 samples/sec   Loss 1.4159   LearningRate 0.0047   Epoch: 17   Global Step: 182680   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:22:53,673-Speed 5449.42 samples/sec   Loss 1.3919   LearningRate 0.0047   Epoch: 17   Global Step: 182690   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:23:01,163-Speed 5469.18 samples/sec   Loss 1.4277   LearningRate 0.0047   Epoch: 17   Global Step: 182700   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:23:08,630-Speed 5486.26 samples/sec   Loss 1.4276   LearningRate 0.0047   Epoch: 17   Global Step: 182710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:23:16,109-Speed 5477.14 samples/sec   Loss 1.4231   LearningRate 0.0047   Epoch: 17   Global Step: 182720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:23:23,631-Speed 5446.52 samples/sec   Loss 1.4346   LearningRate 0.0047   Epoch: 17   Global Step: 182730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:23:31,126-Speed 5465.72 samples/sec   Loss 1.4126   LearningRate 0.0047   Epoch: 17   Global Step: 182740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:23:38,683-Speed 5420.35 samples/sec   Loss 1.4154   LearningRate 0.0047   Epoch: 17   Global Step: 182750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:23:46,222-Speed 5433.93 samples/sec   Loss 1.4210   LearningRate 0.0047   Epoch: 17   Global Step: 182760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:23:53,711-Speed 5470.13 samples/sec   Loss 1.4044   LearningRate 0.0047   Epoch: 17   Global Step: 182770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:24:01,226-Speed 5451.15 samples/sec   Loss 1.4413   LearningRate 0.0047   Epoch: 17   Global Step: 182780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:24:08,738-Speed 5453.21 samples/sec   Loss 1.3911   LearningRate 0.0047   Epoch: 17   Global Step: 182790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:24:16,240-Speed 5460.91 samples/sec   Loss 1.4264   LearningRate 0.0047   Epoch: 17   Global Step: 182800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:24:23,775-Speed 5436.75 samples/sec   Loss 1.4213   LearningRate 0.0047   Epoch: 17   Global Step: 182810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:24:31,280-Speed 5458.44 samples/sec   Loss 1.4251   LearningRate 0.0047   Epoch: 17   Global Step: 182820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:24:38,883-Speed 5388.03 samples/sec   Loss 1.4282   LearningRate 0.0047   Epoch: 17   Global Step: 182830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:24:46,355-Speed 5482.29 samples/sec   Loss 1.4351   LearningRate 0.0047   Epoch: 17   Global Step: 182840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:24:53,935-Speed 5404.64 samples/sec   Loss 1.3935   LearningRate 0.0047   Epoch: 17   Global Step: 182850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:25:01,434-Speed 5463.36 samples/sec   Loss 1.4155   LearningRate 0.0046   Epoch: 17   Global Step: 182860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:25:08,918-Speed 5473.07 samples/sec   Loss 1.4313   LearningRate 0.0046   Epoch: 17   Global Step: 182870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:25:16,454-Speed 5435.88 samples/sec   Loss 1.4268   LearningRate 0.0046   Epoch: 17   Global Step: 182880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:25:23,962-Speed 5456.76 samples/sec   Loss 1.4204   LearningRate 0.0046   Epoch: 17   Global Step: 182890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:25:31,536-Speed 5408.64 samples/sec   Loss 1.4098   LearningRate 0.0046   Epoch: 17   Global Step: 182900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:25:39,024-Speed 5470.57 samples/sec   Loss 1.4075   LearningRate 0.0046   Epoch: 17   Global Step: 182910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:25:46,580-Speed 5421.91 samples/sec   Loss 1.3994   LearningRate 0.0046   Epoch: 17   Global Step: 182920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:25:54,016-Speed 5509.27 samples/sec   Loss 1.4124   LearningRate 0.0046   Epoch: 17   Global Step: 182930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:26:01,446-Speed 5513.90 samples/sec   Loss 1.3914   LearningRate 0.0046   Epoch: 17   Global Step: 182940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:26:09,081-Speed 5364.87 samples/sec   Loss 1.4161   LearningRate 0.0046   Epoch: 17   Global Step: 182950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:26:16,577-Speed 5465.10 samples/sec   Loss 1.4188   LearningRate 0.0046   Epoch: 17   Global Step: 182960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:26:24,029-Speed 5497.44 samples/sec   Loss 1.3993   LearningRate 0.0046   Epoch: 17   Global Step: 182970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:26:31,481-Speed 5497.40 samples/sec   Loss 1.4144   LearningRate 0.0046   Epoch: 17   Global Step: 182980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:26:38,984-Speed 5459.99 samples/sec   Loss 1.4116   LearningRate 0.0046   Epoch: 17   Global Step: 182990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:26:46,438-Speed 5495.55 samples/sec   Loss 1.4163   LearningRate 0.0046   Epoch: 17   Global Step: 183000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:26:54,006-Speed 5412.93 samples/sec   Loss 1.4164   LearningRate 0.0046   Epoch: 17   Global Step: 183010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:27:01,550-Speed 5430.62 samples/sec   Loss 1.4063   LearningRate 0.0046   Epoch: 17   Global Step: 183020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:27:09,155-Speed 5386.61 samples/sec   Loss 1.4232   LearningRate 0.0046   Epoch: 17   Global Step: 183030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:27:16,827-Speed 5339.49 samples/sec   Loss 1.4014   LearningRate 0.0046   Epoch: 17   Global Step: 183040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:27:24,372-Speed 5429.62 samples/sec   Loss 1.4120   LearningRate 0.0046   Epoch: 17   Global Step: 183050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:27:31,844-Speed 5482.83 samples/sec   Loss 1.4212   LearningRate 0.0046   Epoch: 17   Global Step: 183060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:27:39,419-Speed 5407.78 samples/sec   Loss 1.4342   LearningRate 0.0046   Epoch: 17   Global Step: 183070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:27:46,905-Speed 5472.45 samples/sec   Loss 1.4048   LearningRate 0.0046   Epoch: 17   Global Step: 183080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:27:54,492-Speed 5399.46 samples/sec   Loss 1.4115   LearningRate 0.0046   Epoch: 17   Global Step: 183090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:28:02,064-Speed 5410.06 samples/sec   Loss 1.4037   LearningRate 0.0046   Epoch: 17   Global Step: 183100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 12:28:09,603-Speed 5434.12 samples/sec   Loss 1.3857   LearningRate 0.0046   Epoch: 17   Global Step: 183110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:28:17,183-Speed 5404.53 samples/sec   Loss 1.3980   LearningRate 0.0045   Epoch: 17   Global Step: 183120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:28:24,801-Speed 5376.85 samples/sec   Loss 1.4186   LearningRate 0.0045   Epoch: 17   Global Step: 183130   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:28:32,468-Speed 5343.73 samples/sec   Loss 1.4067   LearningRate 0.0045   Epoch: 17   Global Step: 183140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:28:40,017-Speed 5426.88 samples/sec   Loss 1.4150   LearningRate 0.0045   Epoch: 17   Global Step: 183150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:28:47,650-Speed 5366.57 samples/sec   Loss 1.4188   LearningRate 0.0045   Epoch: 17   Global Step: 183160   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:28:55,317-Speed 5343.20 samples/sec   Loss 1.3997   LearningRate 0.0045   Epoch: 17   Global Step: 183170   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:29:02,867-Speed 5426.16 samples/sec   Loss 1.4177   LearningRate 0.0045   Epoch: 17   Global Step: 183180   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:29:10,455-Speed 5398.28 samples/sec   Loss 1.3866   LearningRate 0.0045   Epoch: 17   Global Step: 183190   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:29:18,050-Speed 5393.80 samples/sec   Loss 1.4070   LearningRate 0.0045   Epoch: 17   Global Step: 183200   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:29:25,628-Speed 5405.92 samples/sec   Loss 1.4011   LearningRate 0.0045   Epoch: 17   Global Step: 183210   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:29:33,199-Speed 5411.19 samples/sec   Loss 1.4014   LearningRate 0.0045   Epoch: 17   Global Step: 183220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:29:40,717-Speed 5448.89 samples/sec   Loss 1.4206   LearningRate 0.0045   Epoch: 17   Global Step: 183230   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:29:48,371-Speed 5351.90 samples/sec   Loss 1.4151   LearningRate 0.0045   Epoch: 17   Global Step: 183240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:29:55,911-Speed 5433.23 samples/sec   Loss 1.3951   LearningRate 0.0045   Epoch: 17   Global Step: 183250   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:30:03,494-Speed 5402.14 samples/sec   Loss 1.3847   LearningRate 0.0045   Epoch: 17   Global Step: 183260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:30:11,090-Speed 5393.47 samples/sec   Loss 1.3966   LearningRate 0.0045   Epoch: 17   Global Step: 183270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:30:18,568-Speed 5478.05 samples/sec   Loss 1.3934   LearningRate 0.0045   Epoch: 17   Global Step: 183280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:30:26,218-Speed 5355.12 samples/sec   Loss 1.3681   LearningRate 0.0045   Epoch: 17   Global Step: 183290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:30:33,930-Speed 5311.76 samples/sec   Loss 1.3785   LearningRate 0.0045   Epoch: 17   Global Step: 183300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 12:30:41,480-Speed 5426.01 samples/sec   Loss 1.3893   LearningRate 0.0045   Epoch: 17   Global Step: 183310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:30:49,129-Speed 5355.27 samples/sec   Loss 1.4116   LearningRate 0.0045   Epoch: 17   Global Step: 183320   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:30:56,762-Speed 5366.63 samples/sec   Loss 1.4045   LearningRate 0.0045   Epoch: 17   Global Step: 183330   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:31:04,474-Speed 5312.60 samples/sec   Loss 1.4019   LearningRate 0.0045   Epoch: 17   Global Step: 183340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:31:12,039-Speed 5415.27 samples/sec   Loss 1.4224   LearningRate 0.0045   Epoch: 17   Global Step: 183350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:31:19,615-Speed 5407.06 samples/sec   Loss 1.4079   LearningRate 0.0045   Epoch: 17   Global Step: 183360   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 12:31:27,098-Speed 5473.82 samples/sec   Loss 1.3982   LearningRate 0.0045   Epoch: 17   Global Step: 183370   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:31:34,654-Speed 5421.75 samples/sec   Loss 1.3862   LearningRate 0.0045   Epoch: 17   Global Step: 183380   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:31:42,288-Speed 5366.72 samples/sec   Loss 1.3975   LearningRate 0.0044   Epoch: 17   Global Step: 183390   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:31:49,850-Speed 5416.95 samples/sec   Loss 1.4057   LearningRate 0.0044   Epoch: 17   Global Step: 183400   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:31:57,301-Speed 5497.48 samples/sec   Loss 1.4173   LearningRate 0.0044   Epoch: 17   Global Step: 183410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:32:04,773-Speed 5482.44 samples/sec   Loss 1.4017   LearningRate 0.0044   Epoch: 17   Global Step: 183420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:32:12,256-Speed 5474.86 samples/sec   Loss 1.3675   LearningRate 0.0044   Epoch: 17   Global Step: 183430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:32:19,804-Speed 5427.08 samples/sec   Loss 1.3973   LearningRate 0.0044   Epoch: 17   Global Step: 183440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:32:27,452-Speed 5356.38 samples/sec   Loss 1.4055   LearningRate 0.0044   Epoch: 17   Global Step: 183450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:32:34,976-Speed 5444.80 samples/sec   Loss 1.3877   LearningRate 0.0044   Epoch: 17   Global Step: 183460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:32:42,525-Speed 5426.93 samples/sec   Loss 1.4116   LearningRate 0.0044   Epoch: 17   Global Step: 183470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:32:50,054-Speed 5440.73 samples/sec   Loss 1.4269   LearningRate 0.0044   Epoch: 17   Global Step: 183480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:32:57,643-Speed 5397.99 samples/sec   Loss 1.3950   LearningRate 0.0044   Epoch: 17   Global Step: 183490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:33:05,183-Speed 5433.08 samples/sec   Loss 1.3969   LearningRate 0.0044   Epoch: 17   Global Step: 183500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:33:12,687-Speed 5459.19 samples/sec   Loss 1.3872   LearningRate 0.0044   Epoch: 17   Global Step: 183510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:33:20,134-Speed 5501.20 samples/sec   Loss 1.3830   LearningRate 0.0044   Epoch: 17   Global Step: 183520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:33:27,607-Speed 5481.62 samples/sec   Loss 1.3750   LearningRate 0.0044   Epoch: 17   Global Step: 183530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:33:35,134-Speed 5441.96 samples/sec   Loss 1.3739   LearningRate 0.0044   Epoch: 17   Global Step: 183540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:33:42,796-Speed 5347.27 samples/sec   Loss 1.3895   LearningRate 0.0044   Epoch: 17   Global Step: 183550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:33:50,472-Speed 5336.49 samples/sec   Loss 1.3947   LearningRate 0.0044   Epoch: 17   Global Step: 183560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:33:58,016-Speed 5429.92 samples/sec   Loss 1.3600   LearningRate 0.0044   Epoch: 17   Global Step: 183570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:34:05,544-Speed 5441.98 samples/sec   Loss 1.3955   LearningRate 0.0044   Epoch: 17   Global Step: 183580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:34:13,142-Speed 5391.40 samples/sec   Loss 1.3814   LearningRate 0.0044   Epoch: 17   Global Step: 183590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:34:20,620-Speed 5478.36 samples/sec   Loss 1.4128   LearningRate 0.0044   Epoch: 17   Global Step: 183600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:34:28,181-Speed 5417.26 samples/sec   Loss 1.3700   LearningRate 0.0044   Epoch: 17   Global Step: 183610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:34:35,688-Speed 5457.33 samples/sec   Loss 1.4034   LearningRate 0.0044   Epoch: 17   Global Step: 183620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:34:43,210-Speed 5446.33 samples/sec   Loss 1.3859   LearningRate 0.0044   Epoch: 17   Global Step: 183630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:34:50,829-Speed 5377.15 samples/sec   Loss 1.3998   LearningRate 0.0044   Epoch: 17   Global Step: 183640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:34:58,365-Speed 5435.19 samples/sec   Loss 1.4052   LearningRate 0.0044   Epoch: 17   Global Step: 183650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:35:05,874-Speed 5455.61 samples/sec   Loss 1.3808   LearningRate 0.0043   Epoch: 17   Global Step: 183660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:35:13,416-Speed 5432.45 samples/sec   Loss 1.3789   LearningRate 0.0043   Epoch: 17   Global Step: 183670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:35:20,925-Speed 5455.77 samples/sec   Loss 1.3692   LearningRate 0.0043   Epoch: 17   Global Step: 183680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:35:28,574-Speed 5355.00 samples/sec   Loss 1.3794   LearningRate 0.0043   Epoch: 17   Global Step: 183690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:35:36,134-Speed 5418.70 samples/sec   Loss 1.3746   LearningRate 0.0043   Epoch: 17   Global Step: 183700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:35:43,653-Speed 5448.47 samples/sec   Loss 1.3992   LearningRate 0.0043   Epoch: 17   Global Step: 183710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:35:51,224-Speed 5410.83 samples/sec   Loss 1.3715   LearningRate 0.0043   Epoch: 17   Global Step: 183720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:35:58,794-Speed 5411.29 samples/sec   Loss 1.3734   LearningRate 0.0043   Epoch: 17   Global Step: 183730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:36:06,441-Speed 5357.44 samples/sec   Loss 1.3874   LearningRate 0.0043   Epoch: 17   Global Step: 183740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:36:13,971-Speed 5440.73 samples/sec   Loss 1.3855   LearningRate 0.0043   Epoch: 17   Global Step: 183750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:36:21,461-Speed 5469.09 samples/sec   Loss 1.3720   LearningRate 0.0043   Epoch: 17   Global Step: 183760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:36:29,036-Speed 5408.04 samples/sec   Loss 1.3688   LearningRate 0.0043   Epoch: 17   Global Step: 183770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:36:36,643-Speed 5385.20 samples/sec   Loss 1.3648   LearningRate 0.0043   Epoch: 17   Global Step: 183780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:36:44,305-Speed 5346.87 samples/sec   Loss 1.3705   LearningRate 0.0043   Epoch: 17   Global Step: 183790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:36:51,823-Speed 5449.29 samples/sec   Loss 1.3665   LearningRate 0.0043   Epoch: 17   Global Step: 183800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:36:59,391-Speed 5412.78 samples/sec   Loss 1.3668   LearningRate 0.0043   Epoch: 17   Global Step: 183810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:37:06,927-Speed 5435.72 samples/sec   Loss 1.3800   LearningRate 0.0043   Epoch: 17   Global Step: 183820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:37:14,454-Speed 5442.59 samples/sec   Loss 1.3902   LearningRate 0.0043   Epoch: 17   Global Step: 183830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:37:21,955-Speed 5461.94 samples/sec   Loss 1.3589   LearningRate 0.0043   Epoch: 17   Global Step: 183840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:37:29,421-Speed 5486.56 samples/sec   Loss 1.3634   LearningRate 0.0043   Epoch: 17   Global Step: 183850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:37:37,125-Speed 5317.30 samples/sec   Loss 1.3795   LearningRate 0.0043   Epoch: 17   Global Step: 183860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:37:44,651-Speed 5442.99 samples/sec   Loss 1.3561   LearningRate 0.0043   Epoch: 17   Global Step: 183870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:37:52,122-Speed 5484.06 samples/sec   Loss 1.3594   LearningRate 0.0043   Epoch: 17   Global Step: 183880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:37:59,682-Speed 5418.02 samples/sec   Loss 1.3734   LearningRate 0.0043   Epoch: 17   Global Step: 183890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:38:07,150-Speed 5485.68 samples/sec   Loss 1.3836   LearningRate 0.0043   Epoch: 17   Global Step: 183900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:38:14,636-Speed 5472.39 samples/sec   Loss 1.3799   LearningRate 0.0043   Epoch: 17   Global Step: 183910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:38:22,279-Speed 5359.78 samples/sec   Loss 1.3771   LearningRate 0.0043   Epoch: 17   Global Step: 183920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:38:29,756-Speed 5478.81 samples/sec   Loss 1.3675   LearningRate 0.0043   Epoch: 17   Global Step: 183930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:38:37,349-Speed 5394.49 samples/sec   Loss 1.3797   LearningRate 0.0042   Epoch: 17   Global Step: 183940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:38:44,868-Speed 5449.07 samples/sec   Loss 1.3978   LearningRate 0.0042   Epoch: 17   Global Step: 183950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:38:52,325-Speed 5493.15 samples/sec   Loss 1.3873   LearningRate 0.0042   Epoch: 17   Global Step: 183960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:38:59,852-Speed 5442.63 samples/sec   Loss 1.3522   LearningRate 0.0042   Epoch: 17   Global Step: 183970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:39:07,336-Speed 5473.11 samples/sec   Loss 1.3622   LearningRate 0.0042   Epoch: 17   Global Step: 183980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:39:14,895-Speed 5420.30 samples/sec   Loss 1.3788   LearningRate 0.0042   Epoch: 17   Global Step: 183990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:39:22,361-Speed 5486.72 samples/sec   Loss 1.3674   LearningRate 0.0042   Epoch: 17   Global Step: 184000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:40:06,627-[lfw][184000]XNorm: 22.478137
Training: 2022-01-09 12:40:06,628-[lfw][184000]Accuracy-Flip: 0.99817+-0.00229
Training: 2022-01-09 12:40:06,628-[lfw][184000]Accuracy-Highest: 0.99850
Training: 2022-01-09 12:40:58,147-[cfp_fp][184000]XNorm: 22.006548
Training: 2022-01-09 12:40:58,148-[cfp_fp][184000]Accuracy-Flip: 0.99343+-0.00395
Training: 2022-01-09 12:40:58,148-[cfp_fp][184000]Accuracy-Highest: 0.99371
Training: 2022-01-09 12:41:42,445-[agedb_30][184000]XNorm: 22.937021
Training: 2022-01-09 12:41:42,446-[agedb_30][184000]Accuracy-Flip: 0.98500+-0.00587
Training: 2022-01-09 12:41:42,446-[agedb_30][184000]Accuracy-Highest: 0.98500
Training: 2022-01-09 12:41:49,952-Speed 277.53 samples/sec   Loss 1.3797   LearningRate 0.0042   Epoch: 17   Global Step: 184010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:41:57,380-Speed 5515.04 samples/sec   Loss 1.3692   LearningRate 0.0042   Epoch: 17   Global Step: 184020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:42:04,996-Speed 5378.68 samples/sec   Loss 1.3430   LearningRate 0.0042   Epoch: 17   Global Step: 184030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:42:12,500-Speed 5459.06 samples/sec   Loss 1.3531   LearningRate 0.0042   Epoch: 17   Global Step: 184040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:42:19,897-Speed 5537.91 samples/sec   Loss 1.3564   LearningRate 0.0042   Epoch: 17   Global Step: 184050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:42:27,349-Speed 5497.77 samples/sec   Loss 1.3929   LearningRate 0.0042   Epoch: 17   Global Step: 184060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:42:34,739-Speed 5543.22 samples/sec   Loss 1.3655   LearningRate 0.0042   Epoch: 17   Global Step: 184070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:42:42,198-Speed 5491.66 samples/sec   Loss 1.3527   LearningRate 0.0042   Epoch: 17   Global Step: 184080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:42:49,727-Speed 5441.60 samples/sec   Loss 1.3556   LearningRate 0.0042   Epoch: 17   Global Step: 184090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:42:57,209-Speed 5475.55 samples/sec   Loss 1.3741   LearningRate 0.0042   Epoch: 17   Global Step: 184100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:43:04,712-Speed 5459.59 samples/sec   Loss 1.3666   LearningRate 0.0042   Epoch: 17   Global Step: 184110   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:43:12,204-Speed 5467.41 samples/sec   Loss 1.3646   LearningRate 0.0042   Epoch: 17   Global Step: 184120   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:43:19,672-Speed 5486.17 samples/sec   Loss 1.3524   LearningRate 0.0042   Epoch: 17   Global Step: 184130   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:43:27,173-Speed 5461.58 samples/sec   Loss 1.3868   LearningRate 0.0042   Epoch: 17   Global Step: 184140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:43:34,645-Speed 5481.94 samples/sec   Loss 1.3699   LearningRate 0.0042   Epoch: 17   Global Step: 184150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:43:42,203-Speed 5420.30 samples/sec   Loss 1.3473   LearningRate 0.0042   Epoch: 17   Global Step: 184160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:43:49,661-Speed 5492.85 samples/sec   Loss 1.3535   LearningRate 0.0042   Epoch: 17   Global Step: 184170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:43:57,268-Speed 5385.29 samples/sec   Loss 1.3306   LearningRate 0.0042   Epoch: 17   Global Step: 184180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:44:04,797-Speed 5441.22 samples/sec   Loss 1.3438   LearningRate 0.0042   Epoch: 17   Global Step: 184190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:44:12,422-Speed 5372.00 samples/sec   Loss 1.3803   LearningRate 0.0042   Epoch: 17   Global Step: 184200   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:44:19,881-Speed 5493.04 samples/sec   Loss 1.3743   LearningRate 0.0041   Epoch: 17   Global Step: 184210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:44:27,343-Speed 5489.69 samples/sec   Loss 1.3645   LearningRate 0.0041   Epoch: 17   Global Step: 184220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:44:34,820-Speed 5478.76 samples/sec   Loss 1.3753   LearningRate 0.0041   Epoch: 17   Global Step: 184230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:44:42,323-Speed 5460.05 samples/sec   Loss 1.3783   LearningRate 0.0041   Epoch: 17   Global Step: 184240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:44:49,873-Speed 5425.67 samples/sec   Loss 1.3999   LearningRate 0.0041   Epoch: 17   Global Step: 184250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:44:57,355-Speed 5475.49 samples/sec   Loss 1.3597   LearningRate 0.0041   Epoch: 17   Global Step: 184260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:45:04,781-Speed 5516.71 samples/sec   Loss 1.3632   LearningRate 0.0041   Epoch: 17   Global Step: 184270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:45:12,237-Speed 5494.15 samples/sec   Loss 1.3597   LearningRate 0.0041   Epoch: 17   Global Step: 184280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:45:19,661-Speed 5518.60 samples/sec   Loss 1.3740   LearningRate 0.0041   Epoch: 17   Global Step: 184290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:45:27,159-Speed 5463.31 samples/sec   Loss 1.3535   LearningRate 0.0041   Epoch: 17   Global Step: 184300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:45:34,632-Speed 5481.76 samples/sec   Loss 1.3616   LearningRate 0.0041   Epoch: 17   Global Step: 184310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:45:42,097-Speed 5487.52 samples/sec   Loss 1.3516   LearningRate 0.0041   Epoch: 17   Global Step: 184320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:45:49,513-Speed 5523.81 samples/sec   Loss 1.3971   LearningRate 0.0041   Epoch: 17   Global Step: 184330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:45:56,945-Speed 5512.06 samples/sec   Loss 1.3534   LearningRate 0.0041   Epoch: 17   Global Step: 184340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:46:04,493-Speed 5427.50 samples/sec   Loss 1.3674   LearningRate 0.0041   Epoch: 17   Global Step: 184350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:46:11,930-Speed 5508.52 samples/sec   Loss 1.3429   LearningRate 0.0041   Epoch: 17   Global Step: 184360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:46:19,377-Speed 5500.61 samples/sec   Loss 1.3376   LearningRate 0.0041   Epoch: 17   Global Step: 184370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:46:26,958-Speed 5404.17 samples/sec   Loss 1.3708   LearningRate 0.0041   Epoch: 17   Global Step: 184380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:46:34,544-Speed 5400.53 samples/sec   Loss 1.3731   LearningRate 0.0041   Epoch: 17   Global Step: 184390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:46:42,148-Speed 5387.02 samples/sec   Loss 1.3554   LearningRate 0.0041   Epoch: 17   Global Step: 184400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:46:49,690-Speed 5431.30 samples/sec   Loss 1.3519   LearningRate 0.0041   Epoch: 17   Global Step: 184410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:46:57,188-Speed 5463.90 samples/sec   Loss 1.3434   LearningRate 0.0041   Epoch: 17   Global Step: 184420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:47:04,813-Speed 5373.00 samples/sec   Loss 1.3551   LearningRate 0.0041   Epoch: 17   Global Step: 184430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:47:12,402-Speed 5397.88 samples/sec   Loss 1.3593   LearningRate 0.0041   Epoch: 17   Global Step: 184440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:47:19,992-Speed 5396.52 samples/sec   Loss 1.3653   LearningRate 0.0041   Epoch: 17   Global Step: 184450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:47:27,483-Speed 5469.21 samples/sec   Loss 1.3386   LearningRate 0.0041   Epoch: 17   Global Step: 184460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:47:35,031-Speed 5427.47 samples/sec   Loss 1.3440   LearningRate 0.0041   Epoch: 17   Global Step: 184470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:47:42,532-Speed 5460.81 samples/sec   Loss 1.3589   LearningRate 0.0041   Epoch: 17   Global Step: 184480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:47:50,015-Speed 5474.21 samples/sec   Loss 1.3675   LearningRate 0.0040   Epoch: 17   Global Step: 184490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:47:57,576-Speed 5418.57 samples/sec   Loss 1.3306   LearningRate 0.0040   Epoch: 17   Global Step: 184500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:48:05,065-Speed 5469.30 samples/sec   Loss 1.3666   LearningRate 0.0040   Epoch: 17   Global Step: 184510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:48:12,668-Speed 5388.51 samples/sec   Loss 1.3589   LearningRate 0.0040   Epoch: 17   Global Step: 184520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:48:20,381-Speed 5310.96 samples/sec   Loss 1.3236   LearningRate 0.0040   Epoch: 17   Global Step: 184530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:48:27,846-Speed 5487.75 samples/sec   Loss 1.3435   LearningRate 0.0040   Epoch: 17   Global Step: 184540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:48:35,298-Speed 5497.28 samples/sec   Loss 1.3597   LearningRate 0.0040   Epoch: 17   Global Step: 184550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:48:42,850-Speed 5424.66 samples/sec   Loss 1.3551   LearningRate 0.0040   Epoch: 17   Global Step: 184560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:48:50,370-Speed 5447.46 samples/sec   Loss 1.3470   LearningRate 0.0040   Epoch: 17   Global Step: 184570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:48:57,966-Speed 5392.82 samples/sec   Loss 1.3228   LearningRate 0.0040   Epoch: 17   Global Step: 184580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:49:05,454-Speed 5471.37 samples/sec   Loss 1.3516   LearningRate 0.0040   Epoch: 17   Global Step: 184590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:49:12,945-Speed 5468.03 samples/sec   Loss 1.3572   LearningRate 0.0040   Epoch: 17   Global Step: 184600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:49:20,624-Speed 5335.15 samples/sec   Loss 1.3421   LearningRate 0.0040   Epoch: 17   Global Step: 184610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:49:28,237-Speed 5380.69 samples/sec   Loss 1.3739   LearningRate 0.0040   Epoch: 17   Global Step: 184620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:49:35,765-Speed 5441.97 samples/sec   Loss 1.3592   LearningRate 0.0040   Epoch: 17   Global Step: 184630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:49:43,572-Speed 5247.10 samples/sec   Loss 1.3250   LearningRate 0.0040   Epoch: 17   Global Step: 184640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:49:51,130-Speed 5419.95 samples/sec   Loss 1.3341   LearningRate 0.0040   Epoch: 17   Global Step: 184650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:49:58,655-Speed 5444.37 samples/sec   Loss 1.3560   LearningRate 0.0040   Epoch: 17   Global Step: 184660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:50:06,211-Speed 5421.49 samples/sec   Loss 1.3220   LearningRate 0.0040   Epoch: 17   Global Step: 184670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:50:13,691-Speed 5476.20 samples/sec   Loss 1.3649   LearningRate 0.0040   Epoch: 17   Global Step: 184680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:50:21,279-Speed 5399.16 samples/sec   Loss 1.3431   LearningRate 0.0040   Epoch: 17   Global Step: 184690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:50:28,730-Speed 5498.27 samples/sec   Loss 1.3484   LearningRate 0.0040   Epoch: 17   Global Step: 184700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:50:36,227-Speed 5464.07 samples/sec   Loss 1.3343   LearningRate 0.0040   Epoch: 17   Global Step: 184710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:50:43,710-Speed 5474.41 samples/sec   Loss 1.3588   LearningRate 0.0040   Epoch: 17   Global Step: 184720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:50:51,236-Speed 5443.15 samples/sec   Loss 1.3311   LearningRate 0.0040   Epoch: 17   Global Step: 184730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:50:58,791-Speed 5422.27 samples/sec   Loss 1.3439   LearningRate 0.0040   Epoch: 17   Global Step: 184740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:51:06,311-Speed 5447.46 samples/sec   Loss 1.3394   LearningRate 0.0040   Epoch: 17   Global Step: 184750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:51:13,783-Speed 5482.20 samples/sec   Loss 1.3393   LearningRate 0.0040   Epoch: 17   Global Step: 184760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:51:21,329-Speed 5428.94 samples/sec   Loss 1.3337   LearningRate 0.0040   Epoch: 17   Global Step: 184770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:51:28,803-Speed 5481.00 samples/sec   Loss 1.3337   LearningRate 0.0039   Epoch: 17   Global Step: 184780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:51:36,379-Speed 5407.10 samples/sec   Loss 1.3393   LearningRate 0.0039   Epoch: 17   Global Step: 184790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:51:43,978-Speed 5390.85 samples/sec   Loss 1.3440   LearningRate 0.0039   Epoch: 17   Global Step: 184800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:51:51,638-Speed 5348.26 samples/sec   Loss 1.3423   LearningRate 0.0039   Epoch: 17   Global Step: 184810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:51:59,197-Speed 5419.00 samples/sec   Loss 1.3614   LearningRate 0.0039   Epoch: 17   Global Step: 184820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:52:06,841-Speed 5359.40 samples/sec   Loss 1.3443   LearningRate 0.0039   Epoch: 17   Global Step: 184830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:52:14,460-Speed 5376.18 samples/sec   Loss 1.3408   LearningRate 0.0039   Epoch: 17   Global Step: 184840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:52:22,027-Speed 5413.78 samples/sec   Loss 1.3563   LearningRate 0.0039   Epoch: 17   Global Step: 184850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:52:29,655-Speed 5371.15 samples/sec   Loss 1.3290   LearningRate 0.0039   Epoch: 17   Global Step: 184860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:52:37,169-Speed 5451.08 samples/sec   Loss 1.3646   LearningRate 0.0039   Epoch: 17   Global Step: 184870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:52:44,648-Speed 5477.48 samples/sec   Loss 1.3570   LearningRate 0.0039   Epoch: 17   Global Step: 184880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:52:52,156-Speed 5456.56 samples/sec   Loss 1.3281   LearningRate 0.0039   Epoch: 17   Global Step: 184890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:52:59,607-Speed 5498.46 samples/sec   Loss 1.3111   LearningRate 0.0039   Epoch: 17   Global Step: 184900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:53:07,083-Speed 5479.49 samples/sec   Loss 1.3255   LearningRate 0.0039   Epoch: 17   Global Step: 184910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:53:14,565-Speed 5474.99 samples/sec   Loss 1.3246   LearningRate 0.0039   Epoch: 17   Global Step: 184920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:53:22,080-Speed 5450.71 samples/sec   Loss 1.3297   LearningRate 0.0039   Epoch: 17   Global Step: 184930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:53:29,766-Speed 5330.58 samples/sec   Loss 1.3373   LearningRate 0.0039   Epoch: 17   Global Step: 184940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:53:37,276-Speed 5455.30 samples/sec   Loss 1.3603   LearningRate 0.0039   Epoch: 17   Global Step: 184950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:53:44,865-Speed 5397.69 samples/sec   Loss 1.3492   LearningRate 0.0039   Epoch: 17   Global Step: 184960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:53:52,458-Speed 5395.00 samples/sec   Loss 1.3386   LearningRate 0.0039   Epoch: 17   Global Step: 184970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:54:00,078-Speed 5376.25 samples/sec   Loss 1.3255   LearningRate 0.0039   Epoch: 17   Global Step: 184980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:54:07,641-Speed 5416.93 samples/sec   Loss 1.3128   LearningRate 0.0039   Epoch: 17   Global Step: 184990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:54:15,183-Speed 5431.36 samples/sec   Loss 1.3363   LearningRate 0.0039   Epoch: 17   Global Step: 185000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:54:22,717-Speed 5436.97 samples/sec   Loss 1.3557   LearningRate 0.0039   Epoch: 17   Global Step: 185010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:54:30,191-Speed 5481.14 samples/sec   Loss 1.3573   LearningRate 0.0039   Epoch: 17   Global Step: 185020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:54:37,622-Speed 5513.24 samples/sec   Loss 1.3496   LearningRate 0.0039   Epoch: 17   Global Step: 185030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:54:45,111-Speed 5469.65 samples/sec   Loss 1.3103   LearningRate 0.0039   Epoch: 17   Global Step: 185040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:54:52,599-Speed 5470.55 samples/sec   Loss 1.3215   LearningRate 0.0039   Epoch: 17   Global Step: 185050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:55:00,079-Speed 5476.74 samples/sec   Loss 1.3344   LearningRate 0.0039   Epoch: 17   Global Step: 185060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:55:07,627-Speed 5427.63 samples/sec   Loss 1.3531   LearningRate 0.0038   Epoch: 17   Global Step: 185070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:55:15,192-Speed 5414.87 samples/sec   Loss 1.3514   LearningRate 0.0038   Epoch: 17   Global Step: 185080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:55:22,717-Speed 5444.39 samples/sec   Loss 1.3454   LearningRate 0.0038   Epoch: 17   Global Step: 185090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:55:30,195-Speed 5478.08 samples/sec   Loss 1.3214   LearningRate 0.0038   Epoch: 17   Global Step: 185100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:55:37,652-Speed 5493.99 samples/sec   Loss 1.3355   LearningRate 0.0038   Epoch: 17   Global Step: 185110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:55:45,170-Speed 5448.64 samples/sec   Loss 1.3168   LearningRate 0.0038   Epoch: 17   Global Step: 185120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:55:52,823-Speed 5353.50 samples/sec   Loss 1.3567   LearningRate 0.0038   Epoch: 17   Global Step: 185130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:56:00,411-Speed 5398.29 samples/sec   Loss 1.3366   LearningRate 0.0038   Epoch: 17   Global Step: 185140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:56:07,936-Speed 5444.67 samples/sec   Loss 1.3221   LearningRate 0.0038   Epoch: 17   Global Step: 185150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:56:15,403-Speed 5485.83 samples/sec   Loss 1.3639   LearningRate 0.0038   Epoch: 17   Global Step: 185160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:56:22,898-Speed 5466.28 samples/sec   Loss 1.3080   LearningRate 0.0038   Epoch: 17   Global Step: 185170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:56:30,471-Speed 5409.30 samples/sec   Loss 1.3419   LearningRate 0.0038   Epoch: 17   Global Step: 185180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:56:38,022-Speed 5425.04 samples/sec   Loss 1.3438   LearningRate 0.0038   Epoch: 17   Global Step: 185190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:56:45,617-Speed 5393.90 samples/sec   Loss 1.3081   LearningRate 0.0038   Epoch: 17   Global Step: 185200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:56:53,181-Speed 5415.71 samples/sec   Loss 1.3292   LearningRate 0.0038   Epoch: 17   Global Step: 185210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 12:57:00,717-Speed 5435.75 samples/sec   Loss 1.3082   LearningRate 0.0038   Epoch: 17   Global Step: 185220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:57:08,335-Speed 5377.85 samples/sec   Loss 1.3309   LearningRate 0.0038   Epoch: 17   Global Step: 185230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:57:15,842-Speed 5456.87 samples/sec   Loss 1.3095   LearningRate 0.0038   Epoch: 17   Global Step: 185240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:57:23,304-Speed 5489.70 samples/sec   Loss 1.3371   LearningRate 0.0038   Epoch: 17   Global Step: 185250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:57:30,761-Speed 5493.77 samples/sec   Loss 1.2951   LearningRate 0.0038   Epoch: 17   Global Step: 185260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:57:38,205-Speed 5503.63 samples/sec   Loss 1.3187   LearningRate 0.0038   Epoch: 17   Global Step: 185270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:57:45,671-Speed 5487.01 samples/sec   Loss 1.3432   LearningRate 0.0038   Epoch: 17   Global Step: 185280   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:57:53,114-Speed 5503.50 samples/sec   Loss 1.3264   LearningRate 0.0038   Epoch: 17   Global Step: 185290   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:58:00,621-Speed 5457.07 samples/sec   Loss 1.3164   LearningRate 0.0038   Epoch: 17   Global Step: 185300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:58:08,097-Speed 5479.75 samples/sec   Loss 1.3252   LearningRate 0.0038   Epoch: 17   Global Step: 185310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:58:15,605-Speed 5456.14 samples/sec   Loss 1.3329   LearningRate 0.0038   Epoch: 17   Global Step: 185320   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:58:23,104-Speed 5462.99 samples/sec   Loss 1.2934   LearningRate 0.0038   Epoch: 17   Global Step: 185330   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:58:30,626-Speed 5446.48 samples/sec   Loss 1.2881   LearningRate 0.0038   Epoch: 17   Global Step: 185340   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:58:38,089-Speed 5489.51 samples/sec   Loss 1.2781   LearningRate 0.0038   Epoch: 17   Global Step: 185350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:58:45,586-Speed 5464.17 samples/sec   Loss 1.3550   LearningRate 0.0037   Epoch: 17   Global Step: 185360   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:58:53,079-Speed 5467.43 samples/sec   Loss 1.3045   LearningRate 0.0037   Epoch: 17   Global Step: 185370   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 12:59:00,589-Speed 5454.79 samples/sec   Loss 1.3042   LearningRate 0.0037   Epoch: 17   Global Step: 185380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:59:08,072-Speed 5474.45 samples/sec   Loss 1.3151   LearningRate 0.0037   Epoch: 17   Global Step: 185390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:59:15,660-Speed 5398.67 samples/sec   Loss 1.3168   LearningRate 0.0037   Epoch: 17   Global Step: 185400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:59:23,141-Speed 5476.12 samples/sec   Loss 1.3269   LearningRate 0.0037   Epoch: 17   Global Step: 185410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:59:30,650-Speed 5455.52 samples/sec   Loss 1.3047   LearningRate 0.0037   Epoch: 17   Global Step: 185420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:59:38,165-Speed 5450.93 samples/sec   Loss 1.3227   LearningRate 0.0037   Epoch: 17   Global Step: 185430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:59:45,592-Speed 5516.01 samples/sec   Loss 1.3330   LearningRate 0.0037   Epoch: 17   Global Step: 185440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 12:59:53,172-Speed 5404.45 samples/sec   Loss 1.3164   LearningRate 0.0037   Epoch: 17   Global Step: 185450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:00:00,738-Speed 5414.71 samples/sec   Loss 1.2986   LearningRate 0.0037   Epoch: 17   Global Step: 185460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:00:08,236-Speed 5463.53 samples/sec   Loss 1.3255   LearningRate 0.0037   Epoch: 17   Global Step: 185470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:00:15,765-Speed 5441.10 samples/sec   Loss 1.3176   LearningRate 0.0037   Epoch: 17   Global Step: 185480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:00:23,289-Speed 5444.28 samples/sec   Loss 1.3254   LearningRate 0.0037   Epoch: 17   Global Step: 185490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:00:30,752-Speed 5489.63 samples/sec   Loss 1.3476   LearningRate 0.0037   Epoch: 17   Global Step: 185500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:00:38,189-Speed 5508.57 samples/sec   Loss 1.2933   LearningRate 0.0037   Epoch: 17   Global Step: 185510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:00:45,696-Speed 5456.93 samples/sec   Loss 1.2675   LearningRate 0.0037   Epoch: 17   Global Step: 185520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:00:53,358-Speed 5346.73 samples/sec   Loss 1.3344   LearningRate 0.0037   Epoch: 17   Global Step: 185530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:01:00,841-Speed 5473.82 samples/sec   Loss 1.2997   LearningRate 0.0037   Epoch: 17   Global Step: 185540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:01:08,397-Speed 5421.31 samples/sec   Loss 1.3263   LearningRate 0.0037   Epoch: 17   Global Step: 185550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:01:15,945-Speed 5428.11 samples/sec   Loss 1.2920   LearningRate 0.0037   Epoch: 17   Global Step: 185560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:01:23,495-Speed 5426.04 samples/sec   Loss 1.3008   LearningRate 0.0037   Epoch: 17   Global Step: 185570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:01:31,018-Speed 5444.70 samples/sec   Loss 1.3279   LearningRate 0.0037   Epoch: 17   Global Step: 185580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:01:38,538-Speed 5447.73 samples/sec   Loss 1.3187   LearningRate 0.0037   Epoch: 17   Global Step: 185590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:01:46,240-Speed 5318.24 samples/sec   Loss 1.3200   LearningRate 0.0037   Epoch: 17   Global Step: 185600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:01:53,833-Speed 5395.38 samples/sec   Loss 1.3220   LearningRate 0.0037   Epoch: 17   Global Step: 185610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:02:01,436-Speed 5388.33 samples/sec   Loss 1.3073   LearningRate 0.0037   Epoch: 17   Global Step: 185620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:02:08,959-Speed 5445.14 samples/sec   Loss 1.3221   LearningRate 0.0037   Epoch: 17   Global Step: 185630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:02:16,482-Speed 5445.00 samples/sec   Loss 1.3277   LearningRate 0.0037   Epoch: 17   Global Step: 185640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:02:23,988-Speed 5458.03 samples/sec   Loss 1.3059   LearningRate 0.0036   Epoch: 17   Global Step: 185650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:02:31,582-Speed 5394.18 samples/sec   Loss 1.2885   LearningRate 0.0036   Epoch: 17   Global Step: 185660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:02:39,173-Speed 5396.76 samples/sec   Loss 1.3143   LearningRate 0.0036   Epoch: 17   Global Step: 185670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:02:46,667-Speed 5467.40 samples/sec   Loss 1.3228   LearningRate 0.0036   Epoch: 17   Global Step: 185680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:02:54,221-Speed 5422.79 samples/sec   Loss 1.3098   LearningRate 0.0036   Epoch: 17   Global Step: 185690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:03:01,731-Speed 5455.16 samples/sec   Loss 1.2859   LearningRate 0.0036   Epoch: 17   Global Step: 185700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:03:09,147-Speed 5524.08 samples/sec   Loss 1.2796   LearningRate 0.0036   Epoch: 17   Global Step: 185710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:03:16,692-Speed 5429.59 samples/sec   Loss 1.3175   LearningRate 0.0036   Epoch: 17   Global Step: 185720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:03:24,227-Speed 5436.37 samples/sec   Loss 1.3018   LearningRate 0.0036   Epoch: 17   Global Step: 185730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:03:31,757-Speed 5440.46 samples/sec   Loss 1.3157   LearningRate 0.0036   Epoch: 17   Global Step: 185740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:03:39,229-Speed 5482.74 samples/sec   Loss 1.3347   LearningRate 0.0036   Epoch: 17   Global Step: 185750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:03:46,744-Speed 5451.23 samples/sec   Loss 1.3277   LearningRate 0.0036   Epoch: 17   Global Step: 185760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:03:54,239-Speed 5465.42 samples/sec   Loss 1.2939   LearningRate 0.0036   Epoch: 17   Global Step: 185770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:04:01,754-Speed 5451.46 samples/sec   Loss 1.3016   LearningRate 0.0036   Epoch: 17   Global Step: 185780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:04:09,270-Speed 5450.52 samples/sec   Loss 1.3331   LearningRate 0.0036   Epoch: 17   Global Step: 185790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:04:16,761-Speed 5468.88 samples/sec   Loss 1.3204   LearningRate 0.0036   Epoch: 17   Global Step: 185800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:04:24,298-Speed 5434.70 samples/sec   Loss 1.3103   LearningRate 0.0036   Epoch: 17   Global Step: 185810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:04:31,838-Speed 5433.00 samples/sec   Loss 1.3054   LearningRate 0.0036   Epoch: 17   Global Step: 185820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:04:39,433-Speed 5394.48 samples/sec   Loss 1.3043   LearningRate 0.0036   Epoch: 17   Global Step: 185830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:04:46,956-Speed 5445.06 samples/sec   Loss 1.2684   LearningRate 0.0036   Epoch: 17   Global Step: 185840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:04:54,416-Speed 5490.96 samples/sec   Loss 1.2922   LearningRate 0.0036   Epoch: 17   Global Step: 185850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:05:01,934-Speed 5449.62 samples/sec   Loss 1.3026   LearningRate 0.0036   Epoch: 17   Global Step: 185860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:05:09,402-Speed 5485.49 samples/sec   Loss 1.2786   LearningRate 0.0036   Epoch: 17   Global Step: 185870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:05:16,903-Speed 5461.17 samples/sec   Loss 1.2749   LearningRate 0.0036   Epoch: 17   Global Step: 185880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:05:24,348-Speed 5502.00 samples/sec   Loss 1.2669   LearningRate 0.0036   Epoch: 17   Global Step: 185890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:05:31,819-Speed 5484.18 samples/sec   Loss 1.3208   LearningRate 0.0036   Epoch: 17   Global Step: 185900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:05:39,327-Speed 5456.01 samples/sec   Loss 1.3001   LearningRate 0.0036   Epoch: 17   Global Step: 185910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:05:46,824-Speed 5464.63 samples/sec   Loss 1.3018   LearningRate 0.0036   Epoch: 17   Global Step: 185920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:05:54,343-Speed 5447.78 samples/sec   Loss 1.2913   LearningRate 0.0036   Epoch: 17   Global Step: 185930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:06:01,879-Speed 5436.00 samples/sec   Loss 1.2729   LearningRate 0.0036   Epoch: 17   Global Step: 185940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:06:09,378-Speed 5463.08 samples/sec   Loss 1.3149   LearningRate 0.0035   Epoch: 17   Global Step: 185950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:06:16,855-Speed 5478.83 samples/sec   Loss 1.3022   LearningRate 0.0035   Epoch: 17   Global Step: 185960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:06:24,294-Speed 5506.84 samples/sec   Loss 1.3238   LearningRate 0.0035   Epoch: 17   Global Step: 185970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:06:31,967-Speed 5338.95 samples/sec   Loss 1.2875   LearningRate 0.0035   Epoch: 17   Global Step: 185980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:06:39,466-Speed 5462.77 samples/sec   Loss 1.2759   LearningRate 0.0035   Epoch: 17   Global Step: 185990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:06:46,943-Speed 5479.08 samples/sec   Loss 1.2913   LearningRate 0.0035   Epoch: 17   Global Step: 186000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:07:31,268-[lfw][186000]XNorm: 22.746796
Training: 2022-01-09 13:07:31,269-[lfw][186000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 13:07:31,269-[lfw][186000]Accuracy-Highest: 0.99850
Training: 2022-01-09 13:08:22,818-[cfp_fp][186000]XNorm: 22.144995
Training: 2022-01-09 13:08:22,819-[cfp_fp][186000]Accuracy-Flip: 0.99371+-0.00345
Training: 2022-01-09 13:08:22,820-[cfp_fp][186000]Accuracy-Highest: 0.99371
Training: 2022-01-09 13:09:07,030-[agedb_30][186000]XNorm: 23.323391
Training: 2022-01-09 13:09:07,030-[agedb_30][186000]Accuracy-Flip: 0.98450+-0.00543
Training: 2022-01-09 13:09:07,031-[agedb_30][186000]Accuracy-Highest: 0.98500
Training: 2022-01-09 13:09:14,073-Speed 278.40 samples/sec   Loss 1.3107   LearningRate 0.0035   Epoch: 17   Global Step: 186010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:09:21,497-Speed 5518.28 samples/sec   Loss 1.3049   LearningRate 0.0035   Epoch: 17   Global Step: 186020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:09:29,070-Speed 5409.61 samples/sec   Loss 1.3192   LearningRate 0.0035   Epoch: 17   Global Step: 186030   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:09:36,615-Speed 5429.86 samples/sec   Loss 1.2869   LearningRate 0.0035   Epoch: 17   Global Step: 186040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:09:44,134-Speed 5448.23 samples/sec   Loss 1.3060   LearningRate 0.0035   Epoch: 17   Global Step: 186050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:09:51,645-Speed 5453.84 samples/sec   Loss 1.3040   LearningRate 0.0035   Epoch: 17   Global Step: 186060   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:10:00,362-Speed 5436.61 samples/sec   Loss 1.3067   LearningRate 0.0035   Epoch: 17   Global Step: 186070   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:10:07,888-Speed 5443.25 samples/sec   Loss 1.2899   LearningRate 0.0035   Epoch: 17   Global Step: 186080   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:10:15,426-Speed 5434.09 samples/sec   Loss 1.3157   LearningRate 0.0035   Epoch: 17   Global Step: 186090   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:10:22,982-Speed 5421.93 samples/sec   Loss 1.2918   LearningRate 0.0035   Epoch: 17   Global Step: 186100   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:10:30,527-Speed 5429.62 samples/sec   Loss 1.2794   LearningRate 0.0035   Epoch: 17   Global Step: 186110   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:10:38,043-Speed 5449.84 samples/sec   Loss 1.3039   LearningRate 0.0035   Epoch: 17   Global Step: 186120   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:10:45,557-Speed 5451.69 samples/sec   Loss 1.2906   LearningRate 0.0035   Epoch: 17   Global Step: 186130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:10:53,093-Speed 5436.29 samples/sec   Loss 1.2889   LearningRate 0.0035   Epoch: 17   Global Step: 186140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:11:00,576-Speed 5474.74 samples/sec   Loss 1.2948   LearningRate 0.0035   Epoch: 17   Global Step: 186150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:11:08,118-Speed 5431.81 samples/sec   Loss 1.2806   LearningRate 0.0035   Epoch: 17   Global Step: 186160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:11:15,681-Speed 5415.93 samples/sec   Loss 1.3006   LearningRate 0.0035   Epoch: 17   Global Step: 186170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:11:23,232-Speed 5425.13 samples/sec   Loss 1.2836   LearningRate 0.0035   Epoch: 17   Global Step: 186180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:11:30,734-Speed 5461.14 samples/sec   Loss 1.2861   LearningRate 0.0035   Epoch: 17   Global Step: 186190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:11:38,225-Speed 5468.68 samples/sec   Loss 1.2768   LearningRate 0.0035   Epoch: 17   Global Step: 186200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:11:45,726-Speed 5460.92 samples/sec   Loss 1.3004   LearningRate 0.0035   Epoch: 17   Global Step: 186210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:11:53,256-Speed 5440.60 samples/sec   Loss 1.3030   LearningRate 0.0035   Epoch: 17   Global Step: 186220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:12:00,793-Speed 5435.23 samples/sec   Loss 1.3037   LearningRate 0.0035   Epoch: 17   Global Step: 186230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:12:08,397-Speed 5387.08 samples/sec   Loss 1.2758   LearningRate 0.0035   Epoch: 17   Global Step: 186240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:12:15,899-Speed 5460.61 samples/sec   Loss 1.2797   LearningRate 0.0035   Epoch: 17   Global Step: 186250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:12:23,406-Speed 5457.19 samples/sec   Loss 1.2875   LearningRate 0.0034   Epoch: 17   Global Step: 186260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:12:30,935-Speed 5441.06 samples/sec   Loss 1.2891   LearningRate 0.0034   Epoch: 17   Global Step: 186270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:12:38,422-Speed 5471.86 samples/sec   Loss 1.2855   LearningRate 0.0034   Epoch: 17   Global Step: 186280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:12:45,911-Speed 5470.01 samples/sec   Loss 1.2945   LearningRate 0.0034   Epoch: 17   Global Step: 186290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:12:53,567-Speed 5350.64 samples/sec   Loss 1.2796   LearningRate 0.0034   Epoch: 17   Global Step: 186300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:13:01,079-Speed 5453.29 samples/sec   Loss 1.2811   LearningRate 0.0034   Epoch: 17   Global Step: 186310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:13:08,694-Speed 5379.98 samples/sec   Loss 1.2982   LearningRate 0.0034   Epoch: 17   Global Step: 186320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:13:16,319-Speed 5372.36 samples/sec   Loss 1.3031   LearningRate 0.0034   Epoch: 17   Global Step: 186330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:13:23,850-Speed 5439.71 samples/sec   Loss 1.2760   LearningRate 0.0034   Epoch: 17   Global Step: 186340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:13:31,381-Speed 5439.35 samples/sec   Loss 1.2820   LearningRate 0.0034   Epoch: 17   Global Step: 186350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:13:39,289-Speed 5180.30 samples/sec   Loss 1.3006   LearningRate 0.0034   Epoch: 17   Global Step: 186360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:13:46,955-Speed 5343.60 samples/sec   Loss 1.2792   LearningRate 0.0034   Epoch: 17   Global Step: 186370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:13:54,464-Speed 5455.50 samples/sec   Loss 1.2902   LearningRate 0.0034   Epoch: 17   Global Step: 186380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:14:02,012-Speed 5427.41 samples/sec   Loss 1.2970   LearningRate 0.0034   Epoch: 17   Global Step: 186390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:14:09,511-Speed 5463.10 samples/sec   Loss 1.2779   LearningRate 0.0034   Epoch: 17   Global Step: 186400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:14:16,959-Speed 5500.58 samples/sec   Loss 1.2712   LearningRate 0.0034   Epoch: 17   Global Step: 186410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:14:24,462-Speed 5459.90 samples/sec   Loss 1.3120   LearningRate 0.0034   Epoch: 17   Global Step: 186420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:14:32,005-Speed 5430.57 samples/sec   Loss 1.2707   LearningRate 0.0034   Epoch: 17   Global Step: 186430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:14:39,675-Speed 5340.92 samples/sec   Loss 1.2993   LearningRate 0.0034   Epoch: 17   Global Step: 186440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:14:47,232-Speed 5421.36 samples/sec   Loss 1.2898   LearningRate 0.0034   Epoch: 17   Global Step: 186450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:14:54,760-Speed 5441.78 samples/sec   Loss 1.2681   LearningRate 0.0034   Epoch: 17   Global Step: 186460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:15:02,330-Speed 5410.88 samples/sec   Loss 1.2991   LearningRate 0.0034   Epoch: 17   Global Step: 186470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:15:09,847-Speed 5449.88 samples/sec   Loss 1.2900   LearningRate 0.0034   Epoch: 17   Global Step: 186480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:15:17,451-Speed 5387.70 samples/sec   Loss 1.2979   LearningRate 0.0034   Epoch: 17   Global Step: 186490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:15:24,958-Speed 5457.17 samples/sec   Loss 1.2595   LearningRate 0.0034   Epoch: 17   Global Step: 186500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:15:32,477-Speed 5448.16 samples/sec   Loss 1.2820   LearningRate 0.0034   Epoch: 17   Global Step: 186510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:15:39,987-Speed 5455.08 samples/sec   Loss 1.2902   LearningRate 0.0034   Epoch: 17   Global Step: 186520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:15:47,515-Speed 5441.16 samples/sec   Loss 1.3065   LearningRate 0.0034   Epoch: 17   Global Step: 186530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:15:55,034-Speed 5448.51 samples/sec   Loss 1.2813   LearningRate 0.0034   Epoch: 17   Global Step: 186540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:16:02,638-Speed 5387.74 samples/sec   Loss 1.2693   LearningRate 0.0034   Epoch: 17   Global Step: 186550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:16:10,176-Speed 5434.45 samples/sec   Loss 1.2722   LearningRate 0.0034   Epoch: 17   Global Step: 186560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:16:17,727-Speed 5425.38 samples/sec   Loss 1.2775   LearningRate 0.0033   Epoch: 17   Global Step: 186570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:16:25,258-Speed 5439.11 samples/sec   Loss 1.2888   LearningRate 0.0033   Epoch: 17   Global Step: 186580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:16:32,768-Speed 5454.68 samples/sec   Loss 1.2547   LearningRate 0.0033   Epoch: 17   Global Step: 186590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:16:40,297-Speed 5441.52 samples/sec   Loss 1.3167   LearningRate 0.0033   Epoch: 17   Global Step: 186600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:16:47,808-Speed 5453.53 samples/sec   Loss 1.2833   LearningRate 0.0033   Epoch: 17   Global Step: 186610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:16:55,329-Speed 5446.80 samples/sec   Loss 1.2723   LearningRate 0.0033   Epoch: 17   Global Step: 186620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:17:02,832-Speed 5460.05 samples/sec   Loss 1.2780   LearningRate 0.0033   Epoch: 17   Global Step: 186630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:17:10,345-Speed 5452.33 samples/sec   Loss 1.2730   LearningRate 0.0033   Epoch: 17   Global Step: 186640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:17:17,878-Speed 5438.72 samples/sec   Loss 1.3029   LearningRate 0.0033   Epoch: 17   Global Step: 186650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:17:41,772-Speed 1714.30 samples/sec   Loss 1.2811   LearningRate 0.0033   Epoch: 18   Global Step: 186660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:17:49,281-Speed 5455.38 samples/sec   Loss 1.2775   LearningRate 0.0033   Epoch: 18   Global Step: 186670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:17:56,752-Speed 5482.85 samples/sec   Loss 1.2878   LearningRate 0.0033   Epoch: 18   Global Step: 186680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:18:04,239-Speed 5471.78 samples/sec   Loss 1.2927   LearningRate 0.0033   Epoch: 18   Global Step: 186690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:18:11,709-Speed 5484.20 samples/sec   Loss 1.2463   LearningRate 0.0033   Epoch: 18   Global Step: 186700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:18:19,205-Speed 5464.62 samples/sec   Loss 1.2600   LearningRate 0.0033   Epoch: 18   Global Step: 186710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:18:26,692-Speed 5470.90 samples/sec   Loss 1.2806   LearningRate 0.0033   Epoch: 18   Global Step: 186720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:18:34,193-Speed 5461.76 samples/sec   Loss 1.2637   LearningRate 0.0033   Epoch: 18   Global Step: 186730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:18:41,667-Speed 5481.67 samples/sec   Loss 1.2533   LearningRate 0.0033   Epoch: 18   Global Step: 186740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:18:49,163-Speed 5464.54 samples/sec   Loss 1.2585   LearningRate 0.0033   Epoch: 18   Global Step: 186750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:18:56,620-Speed 5493.37 samples/sec   Loss 1.2508   LearningRate 0.0033   Epoch: 18   Global Step: 186760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:19:04,130-Speed 5455.44 samples/sec   Loss 1.2687   LearningRate 0.0033   Epoch: 18   Global Step: 186770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:19:11,673-Speed 5431.42 samples/sec   Loss 1.2729   LearningRate 0.0033   Epoch: 18   Global Step: 186780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:19:19,208-Speed 5436.31 samples/sec   Loss 1.2827   LearningRate 0.0033   Epoch: 18   Global Step: 186790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:19:26,706-Speed 5463.50 samples/sec   Loss 1.2781   LearningRate 0.0033   Epoch: 18   Global Step: 186800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:19:34,246-Speed 5433.04 samples/sec   Loss 1.2431   LearningRate 0.0033   Epoch: 18   Global Step: 186810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:19:41,761-Speed 5451.53 samples/sec   Loss 1.2525   LearningRate 0.0033   Epoch: 18   Global Step: 186820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:19:49,246-Speed 5472.74 samples/sec   Loss 1.2479   LearningRate 0.0033   Epoch: 18   Global Step: 186830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:19:56,821-Speed 5407.88 samples/sec   Loss 1.2575   LearningRate 0.0033   Epoch: 18   Global Step: 186840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:20:04,260-Speed 5507.20 samples/sec   Loss 1.2883   LearningRate 0.0033   Epoch: 18   Global Step: 186850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:20:11,759-Speed 5462.83 samples/sec   Loss 1.2801   LearningRate 0.0033   Epoch: 18   Global Step: 186860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:20:19,250-Speed 5468.43 samples/sec   Loss 1.2599   LearningRate 0.0033   Epoch: 18   Global Step: 186870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:20:26,759-Speed 5455.68 samples/sec   Loss 1.2474   LearningRate 0.0032   Epoch: 18   Global Step: 186880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:20:34,344-Speed 5400.79 samples/sec   Loss 1.2482   LearningRate 0.0032   Epoch: 18   Global Step: 186890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:20:41,855-Speed 5454.47 samples/sec   Loss 1.2809   LearningRate 0.0032   Epoch: 18   Global Step: 186900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:20:49,362-Speed 5456.80 samples/sec   Loss 1.2480   LearningRate 0.0032   Epoch: 18   Global Step: 186910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:20:56,855-Speed 5467.29 samples/sec   Loss 1.2668   LearningRate 0.0032   Epoch: 18   Global Step: 186920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:21:04,429-Speed 5409.07 samples/sec   Loss 1.2541   LearningRate 0.0032   Epoch: 18   Global Step: 186930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:21:12,131-Speed 5319.20 samples/sec   Loss 1.2730   LearningRate 0.0032   Epoch: 18   Global Step: 186940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:21:19,810-Speed 5334.44 samples/sec   Loss 1.2393   LearningRate 0.0032   Epoch: 18   Global Step: 186950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:21:27,351-Speed 5432.39 samples/sec   Loss 1.2583   LearningRate 0.0032   Epoch: 18   Global Step: 186960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:21:34,869-Speed 5449.38 samples/sec   Loss 1.2824   LearningRate 0.0032   Epoch: 18   Global Step: 186970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:21:42,419-Speed 5426.08 samples/sec   Loss 1.2403   LearningRate 0.0032   Epoch: 18   Global Step: 186980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:21:50,133-Speed 5309.75 samples/sec   Loss 1.2457   LearningRate 0.0032   Epoch: 18   Global Step: 186990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:21:57,672-Speed 5434.72 samples/sec   Loss 1.2418   LearningRate 0.0032   Epoch: 18   Global Step: 187000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:22:05,181-Speed 5455.69 samples/sec   Loss 1.2544   LearningRate 0.0032   Epoch: 18   Global Step: 187010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:22:12,707-Speed 5443.42 samples/sec   Loss 1.2578   LearningRate 0.0032   Epoch: 18   Global Step: 187020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:22:20,211-Speed 5459.36 samples/sec   Loss 1.2749   LearningRate 0.0032   Epoch: 18   Global Step: 187030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:22:27,706-Speed 5465.24 samples/sec   Loss 1.2566   LearningRate 0.0032   Epoch: 18   Global Step: 187040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:22:35,207-Speed 5461.06 samples/sec   Loss 1.2397   LearningRate 0.0032   Epoch: 18   Global Step: 187050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:22:42,859-Speed 5354.51 samples/sec   Loss 1.2535   LearningRate 0.0032   Epoch: 18   Global Step: 187060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:22:50,372-Speed 5451.76 samples/sec   Loss 1.2776   LearningRate 0.0032   Epoch: 18   Global Step: 187070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:22:57,915-Speed 5431.33 samples/sec   Loss 1.2593   LearningRate 0.0032   Epoch: 18   Global Step: 187080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:23:05,420-Speed 5458.70 samples/sec   Loss 1.2363   LearningRate 0.0032   Epoch: 18   Global Step: 187090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:23:12,926-Speed 5457.72 samples/sec   Loss 1.2849   LearningRate 0.0032   Epoch: 18   Global Step: 187100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:23:20,448-Speed 5446.07 samples/sec   Loss 1.2450   LearningRate 0.0032   Epoch: 18   Global Step: 187110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:23:27,971-Speed 5445.02 samples/sec   Loss 1.2426   LearningRate 0.0032   Epoch: 18   Global Step: 187120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:23:35,508-Speed 5435.52 samples/sec   Loss 1.2350   LearningRate 0.0032   Epoch: 18   Global Step: 187130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:23:42,987-Speed 5477.83 samples/sec   Loss 1.2587   LearningRate 0.0032   Epoch: 18   Global Step: 187140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:23:50,508-Speed 5446.51 samples/sec   Loss 1.2260   LearningRate 0.0032   Epoch: 18   Global Step: 187150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:23:58,052-Speed 5429.90 samples/sec   Loss 1.2520   LearningRate 0.0032   Epoch: 18   Global Step: 187160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:24:05,582-Speed 5440.44 samples/sec   Loss 1.2577   LearningRate 0.0032   Epoch: 18   Global Step: 187170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:24:13,151-Speed 5412.62 samples/sec   Loss 1.2629   LearningRate 0.0032   Epoch: 18   Global Step: 187180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:24:20,735-Speed 5401.15 samples/sec   Loss 1.2699   LearningRate 0.0032   Epoch: 18   Global Step: 187190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:24:28,317-Speed 5403.03 samples/sec   Loss 1.2465   LearningRate 0.0031   Epoch: 18   Global Step: 187200   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:24:35,905-Speed 5398.45 samples/sec   Loss 1.2376   LearningRate 0.0031   Epoch: 18   Global Step: 187210   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:24:43,405-Speed 5462.47 samples/sec   Loss 1.2670   LearningRate 0.0031   Epoch: 18   Global Step: 187220   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:24:50,957-Speed 5424.95 samples/sec   Loss 1.2342   LearningRate 0.0031   Epoch: 18   Global Step: 187230   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:24:58,441-Speed 5472.99 samples/sec   Loss 1.2685   LearningRate 0.0031   Epoch: 18   Global Step: 187240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:25:06,061-Speed 5375.93 samples/sec   Loss 1.2526   LearningRate 0.0031   Epoch: 18   Global Step: 187250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:25:13,588-Speed 5442.78 samples/sec   Loss 1.2444   LearningRate 0.0031   Epoch: 18   Global Step: 187260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:25:21,190-Speed 5388.99 samples/sec   Loss 1.2508   LearningRate 0.0031   Epoch: 18   Global Step: 187270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:25:28,713-Speed 5445.43 samples/sec   Loss 1.2465   LearningRate 0.0031   Epoch: 18   Global Step: 187280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:25:36,461-Speed 5286.47 samples/sec   Loss 1.2517   LearningRate 0.0031   Epoch: 18   Global Step: 187290   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:25:44,080-Speed 5377.03 samples/sec   Loss 1.2345   LearningRate 0.0031   Epoch: 18   Global Step: 187300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:25:51,609-Speed 5441.37 samples/sec   Loss 1.2564   LearningRate 0.0031   Epoch: 18   Global Step: 187310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:25:59,144-Speed 5436.40 samples/sec   Loss 1.2871   LearningRate 0.0031   Epoch: 18   Global Step: 187320   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:26:06,635-Speed 5468.14 samples/sec   Loss 1.2712   LearningRate 0.0031   Epoch: 18   Global Step: 187330   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:26:14,101-Speed 5487.32 samples/sec   Loss 1.2418   LearningRate 0.0031   Epoch: 18   Global Step: 187340   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:26:21,595-Speed 5466.32 samples/sec   Loss 1.2473   LearningRate 0.0031   Epoch: 18   Global Step: 187350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:26:29,182-Speed 5399.18 samples/sec   Loss 1.2617   LearningRate 0.0031   Epoch: 18   Global Step: 187360   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:26:36,761-Speed 5405.38 samples/sec   Loss 1.2343   LearningRate 0.0031   Epoch: 18   Global Step: 187370   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:26:44,370-Speed 5383.49 samples/sec   Loss 1.2418   LearningRate 0.0031   Epoch: 18   Global Step: 187380   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 13:26:51,943-Speed 5409.66 samples/sec   Loss 1.2473   LearningRate 0.0031   Epoch: 18   Global Step: 187390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:26:59,463-Speed 5447.90 samples/sec   Loss 1.2247   LearningRate 0.0031   Epoch: 18   Global Step: 187400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:27:06,986-Speed 5444.98 samples/sec   Loss 1.2342   LearningRate 0.0031   Epoch: 18   Global Step: 187410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:27:14,537-Speed 5425.13 samples/sec   Loss 1.2487   LearningRate 0.0031   Epoch: 18   Global Step: 187420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:27:22,081-Speed 5430.10 samples/sec   Loss 1.1981   LearningRate 0.0031   Epoch: 18   Global Step: 187430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:27:29,621-Speed 5432.87 samples/sec   Loss 1.2546   LearningRate 0.0031   Epoch: 18   Global Step: 187440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:27:37,183-Speed 5417.12 samples/sec   Loss 1.2478   LearningRate 0.0031   Epoch: 18   Global Step: 187450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:27:44,750-Speed 5413.60 samples/sec   Loss 1.2549   LearningRate 0.0031   Epoch: 18   Global Step: 187460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:27:52,271-Speed 5446.76 samples/sec   Loss 1.2290   LearningRate 0.0031   Epoch: 18   Global Step: 187470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:27:59,813-Speed 5431.67 samples/sec   Loss 1.2524   LearningRate 0.0031   Epoch: 18   Global Step: 187480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:28:07,351-Speed 5434.59 samples/sec   Loss 1.2160   LearningRate 0.0031   Epoch: 18   Global Step: 187490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:28:14,851-Speed 5461.87 samples/sec   Loss 1.2613   LearningRate 0.0031   Epoch: 18   Global Step: 187500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:28:22,403-Speed 5424.88 samples/sec   Loss 1.2437   LearningRate 0.0031   Epoch: 18   Global Step: 187510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:28:29,952-Speed 5426.07 samples/sec   Loss 1.2403   LearningRate 0.0030   Epoch: 18   Global Step: 187520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:28:37,516-Speed 5415.88 samples/sec   Loss 1.2352   LearningRate 0.0030   Epoch: 18   Global Step: 187530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:28:45,081-Speed 5414.96 samples/sec   Loss 1.2487   LearningRate 0.0030   Epoch: 18   Global Step: 187540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:28:52,807-Speed 5302.31 samples/sec   Loss 1.2564   LearningRate 0.0030   Epoch: 18   Global Step: 187550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:29:00,376-Speed 5412.96 samples/sec   Loss 1.2473   LearningRate 0.0030   Epoch: 18   Global Step: 187560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:29:07,883-Speed 5456.03 samples/sec   Loss 1.2549   LearningRate 0.0030   Epoch: 18   Global Step: 187570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:29:15,447-Speed 5416.09 samples/sec   Loss 1.2333   LearningRate 0.0030   Epoch: 18   Global Step: 187580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:29:23,134-Speed 5329.07 samples/sec   Loss 1.2451   LearningRate 0.0030   Epoch: 18   Global Step: 187590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 13:29:30,642-Speed 5456.78 samples/sec   Loss 1.2656   LearningRate 0.0030   Epoch: 18   Global Step: 187600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:29:38,190-Speed 5426.57 samples/sec   Loss 1.2475   LearningRate 0.0030   Epoch: 18   Global Step: 187610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:29:45,716-Speed 5443.62 samples/sec   Loss 1.2470   LearningRate 0.0030   Epoch: 18   Global Step: 187620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:29:53,270-Speed 5423.23 samples/sec   Loss 1.2622   LearningRate 0.0030   Epoch: 18   Global Step: 187630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:30:00,827-Speed 5420.47 samples/sec   Loss 1.2540   LearningRate 0.0030   Epoch: 18   Global Step: 187640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:30:08,377-Speed 5425.91 samples/sec   Loss 1.2486   LearningRate 0.0030   Epoch: 18   Global Step: 187650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:30:15,950-Speed 5409.07 samples/sec   Loss 1.2335   LearningRate 0.0030   Epoch: 18   Global Step: 187660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:30:23,543-Speed 5395.58 samples/sec   Loss 1.2203   LearningRate 0.0030   Epoch: 18   Global Step: 187670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:30:31,053-Speed 5454.55 samples/sec   Loss 1.2339   LearningRate 0.0030   Epoch: 18   Global Step: 187680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:30:38,570-Speed 5449.89 samples/sec   Loss 1.2666   LearningRate 0.0030   Epoch: 18   Global Step: 187690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:30:46,022-Speed 5497.42 samples/sec   Loss 1.2349   LearningRate 0.0030   Epoch: 18   Global Step: 187700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:30:53,545-Speed 5445.59 samples/sec   Loss 1.2347   LearningRate 0.0030   Epoch: 18   Global Step: 187710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:31:01,147-Speed 5388.73 samples/sec   Loss 1.2476   LearningRate 0.0030   Epoch: 18   Global Step: 187720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:31:08,695-Speed 5426.68 samples/sec   Loss 1.2271   LearningRate 0.0030   Epoch: 18   Global Step: 187730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 13:31:16,172-Speed 5479.22 samples/sec   Loss 1.2463   LearningRate 0.0030   Epoch: 18   Global Step: 187740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:31:23,719-Speed 5428.34 samples/sec   Loss 1.2281   LearningRate 0.0030   Epoch: 18   Global Step: 187750   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:31:31,226-Speed 5457.43 samples/sec   Loss 1.2188   LearningRate 0.0030   Epoch: 18   Global Step: 187760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:31:38,738-Speed 5452.59 samples/sec   Loss 1.2420   LearningRate 0.0030   Epoch: 18   Global Step: 187770   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-01-09 13:31:46,242-Speed 5458.96 samples/sec   Loss 1.2636   LearningRate 0.0030   Epoch: 18   Global Step: 187780   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-01-09 13:31:53,852-Speed 5383.66 samples/sec   Loss 1.2061   LearningRate 0.0030   Epoch: 18   Global Step: 187790   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-01-09 13:32:01,506-Speed 5352.31 samples/sec   Loss 1.2212   LearningRate 0.0030   Epoch: 18   Global Step: 187800   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-01-09 13:32:09,070-Speed 5415.36 samples/sec   Loss 1.2139   LearningRate 0.0030   Epoch: 18   Global Step: 187810   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-01-09 13:32:16,626-Speed 5422.16 samples/sec   Loss 1.1994   LearningRate 0.0030   Epoch: 18   Global Step: 187820   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-01-09 13:32:24,247-Speed 5375.14 samples/sec   Loss 1.2275   LearningRate 0.0030   Epoch: 18   Global Step: 187830   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-01-09 13:32:31,896-Speed 5356.23 samples/sec   Loss 1.2231   LearningRate 0.0030   Epoch: 18   Global Step: 187840   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-01-09 13:32:39,596-Speed 5319.92 samples/sec   Loss 1.2339   LearningRate 0.0029   Epoch: 18   Global Step: 187850   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-01-09 13:32:47,131-Speed 5436.18 samples/sec   Loss 1.2437   LearningRate 0.0029   Epoch: 18   Global Step: 187860   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-01-09 13:32:54,707-Speed 5407.89 samples/sec   Loss 1.2077   LearningRate 0.0029   Epoch: 18   Global Step: 187870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:33:02,347-Speed 5361.69 samples/sec   Loss 1.2411   LearningRate 0.0029   Epoch: 18   Global Step: 187880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:33:09,887-Speed 5433.17 samples/sec   Loss 1.2302   LearningRate 0.0029   Epoch: 18   Global Step: 187890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:33:17,381-Speed 5465.76 samples/sec   Loss 1.2209   LearningRate 0.0029   Epoch: 18   Global Step: 187900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:33:24,840-Speed 5492.37 samples/sec   Loss 1.2395   LearningRate 0.0029   Epoch: 18   Global Step: 187910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:33:32,331-Speed 5468.89 samples/sec   Loss 1.2342   LearningRate 0.0029   Epoch: 18   Global Step: 187920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:33:39,909-Speed 5405.51 samples/sec   Loss 1.2279   LearningRate 0.0029   Epoch: 18   Global Step: 187930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:33:47,496-Speed 5400.14 samples/sec   Loss 1.2090   LearningRate 0.0029   Epoch: 18   Global Step: 187940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:33:55,042-Speed 5428.84 samples/sec   Loss 1.2116   LearningRate 0.0029   Epoch: 18   Global Step: 187950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:34:02,628-Speed 5400.42 samples/sec   Loss 1.2289   LearningRate 0.0029   Epoch: 18   Global Step: 187960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:34:10,303-Speed 5337.08 samples/sec   Loss 1.2245   LearningRate 0.0029   Epoch: 18   Global Step: 187970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:34:17,767-Speed 5488.30 samples/sec   Loss 1.2020   LearningRate 0.0029   Epoch: 18   Global Step: 187980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:34:25,319-Speed 5424.79 samples/sec   Loss 1.2178   LearningRate 0.0029   Epoch: 18   Global Step: 187990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:34:32,745-Speed 5516.40 samples/sec   Loss 1.2125   LearningRate 0.0029   Epoch: 18   Global Step: 188000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:35:16,728-[lfw][188000]XNorm: 21.914746
Training: 2022-01-09 13:35:16,729-[lfw][188000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 13:35:16,730-[lfw][188000]Accuracy-Highest: 0.99850
Training: 2022-01-09 13:36:07,931-[cfp_fp][188000]XNorm: 21.509004
Training: 2022-01-09 13:36:07,931-[cfp_fp][188000]Accuracy-Flip: 0.99371+-0.00357
Training: 2022-01-09 13:36:07,932-[cfp_fp][188000]Accuracy-Highest: 0.99371
Training: 2022-01-09 13:36:51,967-[agedb_30][188000]XNorm: 22.512485
Training: 2022-01-09 13:36:51,968-[agedb_30][188000]Accuracy-Flip: 0.98467+-0.00600
Training: 2022-01-09 13:36:51,968-[agedb_30][188000]Accuracy-Highest: 0.98500
Training: 2022-01-09 13:36:59,706-Speed 278.72 samples/sec   Loss 1.2108   LearningRate 0.0029   Epoch: 18   Global Step: 188010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:37:07,247-Speed 5432.53 samples/sec   Loss 1.2158   LearningRate 0.0029   Epoch: 18   Global Step: 188020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:37:14,678-Speed 5512.59 samples/sec   Loss 1.2291   LearningRate 0.0029   Epoch: 18   Global Step: 188030   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:37:22,224-Speed 5428.89 samples/sec   Loss 1.2058   LearningRate 0.0029   Epoch: 18   Global Step: 188040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:37:29,902-Speed 5335.64 samples/sec   Loss 1.2241   LearningRate 0.0029   Epoch: 18   Global Step: 188050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:37:37,437-Speed 5436.32 samples/sec   Loss 1.2333   LearningRate 0.0029   Epoch: 18   Global Step: 188060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:37:44,940-Speed 5460.17 samples/sec   Loss 1.2066   LearningRate 0.0029   Epoch: 18   Global Step: 188070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:37:52,512-Speed 5409.86 samples/sec   Loss 1.2298   LearningRate 0.0029   Epoch: 18   Global Step: 188080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:38:00,119-Speed 5385.73 samples/sec   Loss 1.2320   LearningRate 0.0029   Epoch: 18   Global Step: 188090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:38:07,607-Speed 5470.29 samples/sec   Loss 1.2329   LearningRate 0.0029   Epoch: 18   Global Step: 188100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:38:15,150-Speed 5430.87 samples/sec   Loss 1.2272   LearningRate 0.0029   Epoch: 18   Global Step: 188110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:38:22,685-Speed 5436.75 samples/sec   Loss 1.2110   LearningRate 0.0029   Epoch: 18   Global Step: 188120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:38:30,403-Speed 5308.03 samples/sec   Loss 1.2306   LearningRate 0.0029   Epoch: 18   Global Step: 188130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:38:38,067-Speed 5345.40 samples/sec   Loss 1.2485   LearningRate 0.0029   Epoch: 18   Global Step: 188140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:38:45,585-Speed 5448.85 samples/sec   Loss 1.2001   LearningRate 0.0029   Epoch: 18   Global Step: 188150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:38:53,165-Speed 5404.40 samples/sec   Loss 1.2133   LearningRate 0.0029   Epoch: 18   Global Step: 188160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:39:00,756-Speed 5396.46 samples/sec   Loss 1.2016   LearningRate 0.0029   Epoch: 18   Global Step: 188170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:39:08,245-Speed 5470.37 samples/sec   Loss 1.1981   LearningRate 0.0028   Epoch: 18   Global Step: 188180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:39:15,783-Speed 5434.13 samples/sec   Loss 1.2058   LearningRate 0.0028   Epoch: 18   Global Step: 188190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:39:23,332-Speed 5426.60 samples/sec   Loss 1.2264   LearningRate 0.0028   Epoch: 18   Global Step: 188200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:39:31,051-Speed 5307.64 samples/sec   Loss 1.1930   LearningRate 0.0028   Epoch: 18   Global Step: 188210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:39:38,625-Speed 5408.69 samples/sec   Loss 1.1937   LearningRate 0.0028   Epoch: 18   Global Step: 188220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:39:46,202-Speed 5406.38 samples/sec   Loss 1.1940   LearningRate 0.0028   Epoch: 18   Global Step: 188230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 13:39:53,699-Speed 5464.11 samples/sec   Loss 1.2178   LearningRate 0.0028   Epoch: 18   Global Step: 188240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:40:01,259-Speed 5418.94 samples/sec   Loss 1.2359   LearningRate 0.0028   Epoch: 18   Global Step: 188250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:40:08,814-Speed 5421.95 samples/sec   Loss 1.2120   LearningRate 0.0028   Epoch: 18   Global Step: 188260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:40:16,374-Speed 5418.75 samples/sec   Loss 1.1983   LearningRate 0.0028   Epoch: 18   Global Step: 188270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:40:23,991-Speed 5378.09 samples/sec   Loss 1.2231   LearningRate 0.0028   Epoch: 18   Global Step: 188280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:40:31,506-Speed 5451.07 samples/sec   Loss 1.2146   LearningRate 0.0028   Epoch: 18   Global Step: 188290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:40:39,084-Speed 5405.72 samples/sec   Loss 1.2329   LearningRate 0.0028   Epoch: 18   Global Step: 188300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:40:46,606-Speed 5446.00 samples/sec   Loss 1.1942   LearningRate 0.0028   Epoch: 18   Global Step: 188310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:40:54,148-Speed 5431.21 samples/sec   Loss 1.2276   LearningRate 0.0028   Epoch: 18   Global Step: 188320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:41:01,728-Speed 5404.97 samples/sec   Loss 1.2174   LearningRate 0.0028   Epoch: 18   Global Step: 188330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:41:09,343-Speed 5379.79 samples/sec   Loss 1.1844   LearningRate 0.0028   Epoch: 18   Global Step: 188340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 13:41:16,929-Speed 5400.14 samples/sec   Loss 1.2313   LearningRate 0.0028   Epoch: 18   Global Step: 188350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 13:41:24,416-Speed 5471.20 samples/sec   Loss 1.1943   LearningRate 0.0028   Epoch: 18   Global Step: 188360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 13:41:31,978-Speed 5416.99 samples/sec   Loss 1.2151   LearningRate 0.0028   Epoch: 18   Global Step: 188370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:41:39,413-Speed 5510.25 samples/sec   Loss 1.2125   LearningRate 0.0028   Epoch: 18   Global Step: 188380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:41:46,981-Speed 5412.94 samples/sec   Loss 1.2445   LearningRate 0.0028   Epoch: 18   Global Step: 188390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:41:54,538-Speed 5420.15 samples/sec   Loss 1.2029   LearningRate 0.0028   Epoch: 18   Global Step: 188400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:42:02,060-Speed 5446.45 samples/sec   Loss 1.2282   LearningRate 0.0028   Epoch: 18   Global Step: 188410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:42:09,618-Speed 5420.32 samples/sec   Loss 1.1919   LearningRate 0.0028   Epoch: 18   Global Step: 188420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:42:17,191-Speed 5409.09 samples/sec   Loss 1.2322   LearningRate 0.0028   Epoch: 18   Global Step: 188430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:42:24,721-Speed 5440.17 samples/sec   Loss 1.2145   LearningRate 0.0028   Epoch: 18   Global Step: 188440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:42:32,299-Speed 5406.30 samples/sec   Loss 1.2169   LearningRate 0.0028   Epoch: 18   Global Step: 188450   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:42:39,779-Speed 5476.72 samples/sec   Loss 1.1809   LearningRate 0.0028   Epoch: 18   Global Step: 188460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:42:47,300-Speed 5446.74 samples/sec   Loss 1.2198   LearningRate 0.0028   Epoch: 18   Global Step: 188470   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:42:54,749-Speed 5499.67 samples/sec   Loss 1.1979   LearningRate 0.0028   Epoch: 18   Global Step: 188480   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:43:02,248-Speed 5463.18 samples/sec   Loss 1.2136   LearningRate 0.0028   Epoch: 18   Global Step: 188490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:43:09,690-Speed 5504.75 samples/sec   Loss 1.2035   LearningRate 0.0028   Epoch: 18   Global Step: 188500   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:43:17,131-Speed 5505.09 samples/sec   Loss 1.2102   LearningRate 0.0028   Epoch: 18   Global Step: 188510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:43:24,686-Speed 5422.28 samples/sec   Loss 1.2206   LearningRate 0.0027   Epoch: 18   Global Step: 188520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:43:32,155-Speed 5484.79 samples/sec   Loss 1.2044   LearningRate 0.0027   Epoch: 18   Global Step: 188530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:43:39,818-Speed 5345.79 samples/sec   Loss 1.1906   LearningRate 0.0027   Epoch: 18   Global Step: 188540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:43:47,366-Speed 5427.37 samples/sec   Loss 1.1821   LearningRate 0.0027   Epoch: 18   Global Step: 188550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:43:54,828-Speed 5489.97 samples/sec   Loss 1.2107   LearningRate 0.0027   Epoch: 18   Global Step: 188560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:44:02,353-Speed 5444.07 samples/sec   Loss 1.1680   LearningRate 0.0027   Epoch: 18   Global Step: 188570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:44:09,858-Speed 5458.07 samples/sec   Loss 1.2023   LearningRate 0.0027   Epoch: 18   Global Step: 188580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:44:17,329-Speed 5483.48 samples/sec   Loss 1.2010   LearningRate 0.0027   Epoch: 18   Global Step: 188590   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:44:24,792-Speed 5489.10 samples/sec   Loss 1.1916   LearningRate 0.0027   Epoch: 18   Global Step: 188600   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:44:32,301-Speed 5455.07 samples/sec   Loss 1.1697   LearningRate 0.0027   Epoch: 18   Global Step: 188610   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:44:39,763-Speed 5489.93 samples/sec   Loss 1.1843   LearningRate 0.0027   Epoch: 18   Global Step: 188620   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:44:47,244-Speed 5476.69 samples/sec   Loss 1.2113   LearningRate 0.0027   Epoch: 18   Global Step: 188630   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:44:54,818-Speed 5408.47 samples/sec   Loss 1.1933   LearningRate 0.0027   Epoch: 18   Global Step: 188640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:45:02,316-Speed 5463.64 samples/sec   Loss 1.2007   LearningRate 0.0027   Epoch: 18   Global Step: 188650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:45:09,838-Speed 5446.02 samples/sec   Loss 1.1925   LearningRate 0.0027   Epoch: 18   Global Step: 188660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:45:17,336-Speed 5463.97 samples/sec   Loss 1.2063   LearningRate 0.0027   Epoch: 18   Global Step: 188670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:45:24,894-Speed 5419.81 samples/sec   Loss 1.1964   LearningRate 0.0027   Epoch: 18   Global Step: 188680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:45:32,313-Speed 5521.46 samples/sec   Loss 1.2002   LearningRate 0.0027   Epoch: 18   Global Step: 188690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:45:39,821-Speed 5456.58 samples/sec   Loss 1.2049   LearningRate 0.0027   Epoch: 18   Global Step: 188700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:45:47,387-Speed 5414.74 samples/sec   Loss 1.1936   LearningRate 0.0027   Epoch: 18   Global Step: 188710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:45:54,909-Speed 5446.24 samples/sec   Loss 1.1955   LearningRate 0.0027   Epoch: 18   Global Step: 188720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:46:02,365-Speed 5494.23 samples/sec   Loss 1.1741   LearningRate 0.0027   Epoch: 18   Global Step: 188730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:46:09,878-Speed 5452.12 samples/sec   Loss 1.1837   LearningRate 0.0027   Epoch: 18   Global Step: 188740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:46:17,377-Speed 5463.17 samples/sec   Loss 1.1830   LearningRate 0.0027   Epoch: 18   Global Step: 188750   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:46:24,914-Speed 5435.61 samples/sec   Loss 1.1920   LearningRate 0.0027   Epoch: 18   Global Step: 188760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:46:32,447-Speed 5437.66 samples/sec   Loss 1.1837   LearningRate 0.0027   Epoch: 18   Global Step: 188770   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:46:39,953-Speed 5457.50 samples/sec   Loss 1.1847   LearningRate 0.0027   Epoch: 18   Global Step: 188780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:46:47,452-Speed 5462.84 samples/sec   Loss 1.2165   LearningRate 0.0027   Epoch: 18   Global Step: 188790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:46:54,934-Speed 5475.62 samples/sec   Loss 1.1897   LearningRate 0.0027   Epoch: 18   Global Step: 188800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:47:02,397-Speed 5488.87 samples/sec   Loss 1.1796   LearningRate 0.0027   Epoch: 18   Global Step: 188810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:47:09,853-Speed 5494.34 samples/sec   Loss 1.2264   LearningRate 0.0027   Epoch: 18   Global Step: 188820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:47:17,360-Speed 5457.20 samples/sec   Loss 1.2067   LearningRate 0.0027   Epoch: 18   Global Step: 188830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:47:24,858-Speed 5464.06 samples/sec   Loss 1.1970   LearningRate 0.0027   Epoch: 18   Global Step: 188840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:47:32,410-Speed 5424.48 samples/sec   Loss 1.1924   LearningRate 0.0027   Epoch: 18   Global Step: 188850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:47:39,850-Speed 5505.42 samples/sec   Loss 1.2123   LearningRate 0.0027   Epoch: 18   Global Step: 188860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:47:47,317-Speed 5486.58 samples/sec   Loss 1.2017   LearningRate 0.0026   Epoch: 18   Global Step: 188870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:47:54,773-Speed 5494.56 samples/sec   Loss 1.1935   LearningRate 0.0026   Epoch: 18   Global Step: 188880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:48:02,213-Speed 5505.63 samples/sec   Loss 1.2093   LearningRate 0.0026   Epoch: 18   Global Step: 188890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:48:09,629-Speed 5523.92 samples/sec   Loss 1.1971   LearningRate 0.0026   Epoch: 18   Global Step: 188900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:48:17,118-Speed 5469.96 samples/sec   Loss 1.1973   LearningRate 0.0026   Epoch: 18   Global Step: 188910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:48:24,541-Speed 5519.55 samples/sec   Loss 1.1885   LearningRate 0.0026   Epoch: 18   Global Step: 188920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:48:31,997-Speed 5494.17 samples/sec   Loss 1.1759   LearningRate 0.0026   Epoch: 18   Global Step: 188930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 13:48:39,447-Speed 5498.43 samples/sec   Loss 1.1862   LearningRate 0.0026   Epoch: 18   Global Step: 188940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:48:46,889-Speed 5504.83 samples/sec   Loss 1.1972   LearningRate 0.0026   Epoch: 18   Global Step: 188950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:48:54,335-Speed 5501.57 samples/sec   Loss 1.2034   LearningRate 0.0026   Epoch: 18   Global Step: 188960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:49:01,801-Speed 5486.83 samples/sec   Loss 1.1854   LearningRate 0.0026   Epoch: 18   Global Step: 188970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:49:09,372-Speed 5411.03 samples/sec   Loss 1.1938   LearningRate 0.0026   Epoch: 18   Global Step: 188980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:49:16,942-Speed 5411.54 samples/sec   Loss 1.1926   LearningRate 0.0026   Epoch: 18   Global Step: 188990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:49:24,393-Speed 5498.45 samples/sec   Loss 1.1827   LearningRate 0.0026   Epoch: 18   Global Step: 189000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:49:31,914-Speed 5446.74 samples/sec   Loss 1.2073   LearningRate 0.0026   Epoch: 18   Global Step: 189010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:49:39,394-Speed 5476.47 samples/sec   Loss 1.2093   LearningRate 0.0026   Epoch: 18   Global Step: 189020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:49:46,816-Speed 5519.35 samples/sec   Loss 1.1873   LearningRate 0.0026   Epoch: 18   Global Step: 189030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:49:54,230-Speed 5525.50 samples/sec   Loss 1.1927   LearningRate 0.0026   Epoch: 18   Global Step: 189040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:50:01,750-Speed 5447.22 samples/sec   Loss 1.1789   LearningRate 0.0026   Epoch: 18   Global Step: 189050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:50:09,296-Speed 5429.13 samples/sec   Loss 1.1847   LearningRate 0.0026   Epoch: 18   Global Step: 189060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:50:16,743-Speed 5500.48 samples/sec   Loss 1.2068   LearningRate 0.0026   Epoch: 18   Global Step: 189070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:50:24,231-Speed 5470.99 samples/sec   Loss 1.1993   LearningRate 0.0026   Epoch: 18   Global Step: 189080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:50:31,647-Speed 5524.40 samples/sec   Loss 1.1931   LearningRate 0.0026   Epoch: 18   Global Step: 189090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:50:39,204-Speed 5420.91 samples/sec   Loss 1.1836   LearningRate 0.0026   Epoch: 18   Global Step: 189100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:50:46,660-Speed 5494.11 samples/sec   Loss 1.1868   LearningRate 0.0026   Epoch: 18   Global Step: 189110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:50:54,134-Speed 5481.14 samples/sec   Loss 1.1646   LearningRate 0.0026   Epoch: 18   Global Step: 189120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:51:01,644-Speed 5454.31 samples/sec   Loss 1.1896   LearningRate 0.0026   Epoch: 18   Global Step: 189130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:51:09,054-Speed 5528.80 samples/sec   Loss 1.1735   LearningRate 0.0026   Epoch: 18   Global Step: 189140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 13:51:16,531-Speed 5478.98 samples/sec   Loss 1.1667   LearningRate 0.0026   Epoch: 18   Global Step: 189150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 13:51:23,995-Speed 5488.30 samples/sec   Loss 1.1276   LearningRate 0.0026   Epoch: 18   Global Step: 189160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 13:51:31,466-Speed 5482.95 samples/sec   Loss 1.1795   LearningRate 0.0026   Epoch: 18   Global Step: 189170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 13:51:39,000-Speed 5437.99 samples/sec   Loss 1.2113   LearningRate 0.0026   Epoch: 18   Global Step: 189180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 13:51:46,435-Speed 5509.77 samples/sec   Loss 1.1864   LearningRate 0.0026   Epoch: 18   Global Step: 189190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:51:53,933-Speed 5463.41 samples/sec   Loss 1.1931   LearningRate 0.0026   Epoch: 18   Global Step: 189200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:52:01,421-Speed 5470.85 samples/sec   Loss 1.2093   LearningRate 0.0026   Epoch: 18   Global Step: 189210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:52:08,884-Speed 5489.36 samples/sec   Loss 1.1929   LearningRate 0.0025   Epoch: 18   Global Step: 189220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:52:16,280-Speed 5539.02 samples/sec   Loss 1.1935   LearningRate 0.0025   Epoch: 18   Global Step: 189230   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:52:23,738-Speed 5492.25 samples/sec   Loss 1.1898   LearningRate 0.0025   Epoch: 18   Global Step: 189240   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:52:31,190-Speed 5497.77 samples/sec   Loss 1.1789   LearningRate 0.0025   Epoch: 18   Global Step: 189250   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:52:38,695-Speed 5458.24 samples/sec   Loss 1.1837   LearningRate 0.0025   Epoch: 18   Global Step: 189260   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:52:46,247-Speed 5424.64 samples/sec   Loss 1.1673   LearningRate 0.0025   Epoch: 18   Global Step: 189270   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:52:53,668-Speed 5520.11 samples/sec   Loss 1.1837   LearningRate 0.0025   Epoch: 18   Global Step: 189280   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:53:01,089-Speed 5520.72 samples/sec   Loss 1.1778   LearningRate 0.0025   Epoch: 18   Global Step: 189290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:53:08,562-Speed 5481.83 samples/sec   Loss 1.1883   LearningRate 0.0025   Epoch: 18   Global Step: 189300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:53:16,015-Speed 5495.95 samples/sec   Loss 1.1723   LearningRate 0.0025   Epoch: 18   Global Step: 189310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:53:23,579-Speed 5415.76 samples/sec   Loss 1.1871   LearningRate 0.0025   Epoch: 18   Global Step: 189320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:53:31,035-Speed 5494.52 samples/sec   Loss 1.1815   LearningRate 0.0025   Epoch: 18   Global Step: 189330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:53:38,465-Speed 5513.77 samples/sec   Loss 1.1410   LearningRate 0.0025   Epoch: 18   Global Step: 189340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:53:45,897-Speed 5512.43 samples/sec   Loss 1.1454   LearningRate 0.0025   Epoch: 18   Global Step: 189350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:53:53,327-Speed 5513.29 samples/sec   Loss 1.1744   LearningRate 0.0025   Epoch: 18   Global Step: 189360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:54:00,897-Speed 5411.62 samples/sec   Loss 1.1582   LearningRate 0.0025   Epoch: 18   Global Step: 189370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:54:08,380-Speed 5473.97 samples/sec   Loss 1.1864   LearningRate 0.0025   Epoch: 18   Global Step: 189380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:54:15,869-Speed 5469.97 samples/sec   Loss 1.1884   LearningRate 0.0025   Epoch: 18   Global Step: 189390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:54:23,393-Speed 5445.05 samples/sec   Loss 1.1772   LearningRate 0.0025   Epoch: 18   Global Step: 189400   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:54:30,906-Speed 5452.76 samples/sec   Loss 1.1678   LearningRate 0.0025   Epoch: 18   Global Step: 189410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:54:38,324-Speed 5522.23 samples/sec   Loss 1.1664   LearningRate 0.0025   Epoch: 18   Global Step: 189420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:54:45,732-Speed 5529.92 samples/sec   Loss 1.1859   LearningRate 0.0025   Epoch: 18   Global Step: 189430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:54:53,111-Speed 5551.66 samples/sec   Loss 1.1721   LearningRate 0.0025   Epoch: 18   Global Step: 189440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:55:00,561-Speed 5499.15 samples/sec   Loss 1.1759   LearningRate 0.0025   Epoch: 18   Global Step: 189450   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:55:08,150-Speed 5397.86 samples/sec   Loss 1.1622   LearningRate 0.0025   Epoch: 18   Global Step: 189460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:55:15,633-Speed 5474.50 samples/sec   Loss 1.1709   LearningRate 0.0025   Epoch: 18   Global Step: 189470   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:55:23,061-Speed 5515.14 samples/sec   Loss 1.1945   LearningRate 0.0025   Epoch: 18   Global Step: 189480   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:55:30,491-Speed 5513.59 samples/sec   Loss 1.1953   LearningRate 0.0025   Epoch: 18   Global Step: 189490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:55:37,916-Speed 5517.20 samples/sec   Loss 1.1509   LearningRate 0.0025   Epoch: 18   Global Step: 189500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:55:45,350-Speed 5510.81 samples/sec   Loss 1.1878   LearningRate 0.0025   Epoch: 18   Global Step: 189510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:55:52,841-Speed 5468.65 samples/sec   Loss 1.1665   LearningRate 0.0025   Epoch: 18   Global Step: 189520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:56:00,283-Speed 5504.63 samples/sec   Loss 1.1582   LearningRate 0.0025   Epoch: 18   Global Step: 189530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:56:07,747-Speed 5488.33 samples/sec   Loss 1.1805   LearningRate 0.0025   Epoch: 18   Global Step: 189540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:56:15,620-Speed 5203.53 samples/sec   Loss 1.1526   LearningRate 0.0025   Epoch: 18   Global Step: 189550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:56:23,132-Speed 5453.03 samples/sec   Loss 1.1832   LearningRate 0.0025   Epoch: 18   Global Step: 189560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:56:30,757-Speed 5372.76 samples/sec   Loss 1.1665   LearningRate 0.0025   Epoch: 18   Global Step: 189570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:56:38,257-Speed 5461.82 samples/sec   Loss 1.1705   LearningRate 0.0024   Epoch: 18   Global Step: 189580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:56:45,746-Speed 5470.18 samples/sec   Loss 1.1869   LearningRate 0.0024   Epoch: 18   Global Step: 189590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:56:53,243-Speed 5464.41 samples/sec   Loss 1.1701   LearningRate 0.0024   Epoch: 18   Global Step: 189600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 13:57:00,842-Speed 5390.84 samples/sec   Loss 1.1747   LearningRate 0.0024   Epoch: 18   Global Step: 189610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 13:57:08,333-Speed 5469.10 samples/sec   Loss 1.1696   LearningRate 0.0024   Epoch: 18   Global Step: 189620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 13:57:15,787-Speed 5495.52 samples/sec   Loss 1.1925   LearningRate 0.0024   Epoch: 18   Global Step: 189630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 13:57:23,229-Speed 5504.40 samples/sec   Loss 1.1733   LearningRate 0.0024   Epoch: 18   Global Step: 189640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 13:57:30,723-Speed 5466.68 samples/sec   Loss 1.1724   LearningRate 0.0024   Epoch: 18   Global Step: 189650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 13:57:38,215-Speed 5468.00 samples/sec   Loss 1.1585   LearningRate 0.0024   Epoch: 18   Global Step: 189660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 13:57:45,719-Speed 5459.02 samples/sec   Loss 1.1755   LearningRate 0.0024   Epoch: 18   Global Step: 189670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:57:53,266-Speed 5428.16 samples/sec   Loss 1.1910   LearningRate 0.0024   Epoch: 18   Global Step: 189680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:58:00,743-Speed 5478.63 samples/sec   Loss 1.1532   LearningRate 0.0024   Epoch: 18   Global Step: 189690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:58:08,196-Speed 5496.85 samples/sec   Loss 1.1583   LearningRate 0.0024   Epoch: 18   Global Step: 189700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:58:15,695-Speed 5462.68 samples/sec   Loss 1.1591   LearningRate 0.0024   Epoch: 18   Global Step: 189710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:58:23,281-Speed 5400.21 samples/sec   Loss 1.1573   LearningRate 0.0024   Epoch: 18   Global Step: 189720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:58:30,852-Speed 5411.00 samples/sec   Loss 1.1744   LearningRate 0.0024   Epoch: 18   Global Step: 189730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:58:38,448-Speed 5393.39 samples/sec   Loss 1.1749   LearningRate 0.0024   Epoch: 18   Global Step: 189740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:58:45,962-Speed 5451.54 samples/sec   Loss 1.1952   LearningRate 0.0024   Epoch: 18   Global Step: 189750   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:58:53,373-Speed 5527.63 samples/sec   Loss 1.1739   LearningRate 0.0024   Epoch: 18   Global Step: 189760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 13:59:00,831-Speed 5493.13 samples/sec   Loss 1.1677   LearningRate 0.0024   Epoch: 18   Global Step: 189770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:59:08,311-Speed 5476.72 samples/sec   Loss 1.1841   LearningRate 0.0024   Epoch: 18   Global Step: 189780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:59:16,033-Speed 5304.85 samples/sec   Loss 1.1554   LearningRate 0.0024   Epoch: 18   Global Step: 189790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:59:23,604-Speed 5410.77 samples/sec   Loss 1.1527   LearningRate 0.0024   Epoch: 18   Global Step: 189800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:59:31,212-Speed 5384.84 samples/sec   Loss 1.1660   LearningRate 0.0024   Epoch: 18   Global Step: 189810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:59:39,003-Speed 5257.74 samples/sec   Loss 1.1685   LearningRate 0.0024   Epoch: 18   Global Step: 189820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:59:46,762-Speed 5279.70 samples/sec   Loss 1.1752   LearningRate 0.0024   Epoch: 18   Global Step: 189830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 13:59:54,307-Speed 5429.36 samples/sec   Loss 1.1638   LearningRate 0.0024   Epoch: 18   Global Step: 189840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:00:01,814-Speed 5457.21 samples/sec   Loss 1.1842   LearningRate 0.0024   Epoch: 18   Global Step: 189850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:00:09,396-Speed 5403.27 samples/sec   Loss 1.1748   LearningRate 0.0024   Epoch: 18   Global Step: 189860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:00:17,147-Speed 5285.04 samples/sec   Loss 1.1651   LearningRate 0.0024   Epoch: 18   Global Step: 189870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 14:00:24,810-Speed 5346.16 samples/sec   Loss 1.1851   LearningRate 0.0024   Epoch: 18   Global Step: 189880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 14:00:32,370-Speed 5418.57 samples/sec   Loss 1.1603   LearningRate 0.0024   Epoch: 18   Global Step: 189890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 14:00:39,788-Speed 5522.49 samples/sec   Loss 1.1775   LearningRate 0.0024   Epoch: 18   Global Step: 189900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 14:00:47,203-Speed 5525.08 samples/sec   Loss 1.1382   LearningRate 0.0024   Epoch: 18   Global Step: 189910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 14:00:54,644-Speed 5505.39 samples/sec   Loss 1.1635   LearningRate 0.0024   Epoch: 18   Global Step: 189920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:01:02,138-Speed 5466.21 samples/sec   Loss 1.1793   LearningRate 0.0024   Epoch: 18   Global Step: 189930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:01:09,784-Speed 5358.52 samples/sec   Loss 1.1690   LearningRate 0.0024   Epoch: 18   Global Step: 189940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:01:17,273-Speed 5469.16 samples/sec   Loss 1.1495   LearningRate 0.0023   Epoch: 18   Global Step: 189950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:01:24,885-Speed 5382.37 samples/sec   Loss 1.1254   LearningRate 0.0023   Epoch: 18   Global Step: 189960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:01:32,406-Speed 5446.14 samples/sec   Loss 1.1638   LearningRate 0.0023   Epoch: 18   Global Step: 189970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:01:39,894-Speed 5470.94 samples/sec   Loss 1.1779   LearningRate 0.0023   Epoch: 18   Global Step: 189980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:01:47,367-Speed 5481.94 samples/sec   Loss 1.1447   LearningRate 0.0023   Epoch: 18   Global Step: 189990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:01:54,843-Speed 5479.46 samples/sec   Loss 1.1486   LearningRate 0.0023   Epoch: 18   Global Step: 190000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:02:38,322-[lfw][190000]XNorm: 22.212047
Training: 2022-01-09 14:02:38,323-[lfw][190000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 14:02:38,323-[lfw][190000]Accuracy-Highest: 0.99850
Training: 2022-01-09 14:03:29,070-[cfp_fp][190000]XNorm: 21.831634
Training: 2022-01-09 14:03:29,071-[cfp_fp][190000]Accuracy-Flip: 0.99343+-0.00420
Training: 2022-01-09 14:03:29,072-[cfp_fp][190000]Accuracy-Highest: 0.99371
Training: 2022-01-09 14:04:12,595-[agedb_30][190000]XNorm: 22.882939
Training: 2022-01-09 14:04:12,596-[agedb_30][190000]Accuracy-Flip: 0.98483+-0.00626
Training: 2022-01-09 14:04:12,596-[agedb_30][190000]Accuracy-Highest: 0.98500
Training: 2022-01-09 14:04:20,135-Speed 281.92 samples/sec   Loss 1.1583   LearningRate 0.0023   Epoch: 18   Global Step: 190010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:04:27,623-Speed 5471.14 samples/sec   Loss 1.1748   LearningRate 0.0023   Epoch: 18   Global Step: 190020   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:04:35,064-Speed 5504.66 samples/sec   Loss 1.1609   LearningRate 0.0023   Epoch: 18   Global Step: 190030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:04:42,656-Speed 5396.59 samples/sec   Loss 1.1825   LearningRate 0.0023   Epoch: 18   Global Step: 190040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:04:50,309-Speed 5352.18 samples/sec   Loss 1.1518   LearningRate 0.0023   Epoch: 18   Global Step: 190050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:04:57,813-Speed 5459.87 samples/sec   Loss 1.1557   LearningRate 0.0023   Epoch: 18   Global Step: 190060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:05:05,322-Speed 5455.31 samples/sec   Loss 1.1467   LearningRate 0.0023   Epoch: 18   Global Step: 190070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:05:12,782-Speed 5491.50 samples/sec   Loss 1.1464   LearningRate 0.0023   Epoch: 18   Global Step: 190080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:05:20,205-Speed 5518.14 samples/sec   Loss 1.1572   LearningRate 0.0023   Epoch: 18   Global Step: 190090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:05:27,633-Speed 5515.45 samples/sec   Loss 1.1499   LearningRate 0.0023   Epoch: 18   Global Step: 190100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:05:35,090-Speed 5493.40 samples/sec   Loss 1.1637   LearningRate 0.0023   Epoch: 18   Global Step: 190110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:05:42,626-Speed 5435.89 samples/sec   Loss 1.1666   LearningRate 0.0023   Epoch: 18   Global Step: 190120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:05:50,242-Speed 5378.75 samples/sec   Loss 1.1357   LearningRate 0.0023   Epoch: 18   Global Step: 190130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:05:57,805-Speed 5416.92 samples/sec   Loss 1.1449   LearningRate 0.0023   Epoch: 18   Global Step: 190140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:06:05,328-Speed 5445.11 samples/sec   Loss 1.1563   LearningRate 0.0023   Epoch: 18   Global Step: 190150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:06:12,756-Speed 5514.97 samples/sec   Loss 1.1337   LearningRate 0.0023   Epoch: 18   Global Step: 190160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:06:20,177-Speed 5520.23 samples/sec   Loss 1.1592   LearningRate 0.0023   Epoch: 18   Global Step: 190170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:06:27,639-Speed 5490.51 samples/sec   Loss 1.1311   LearningRate 0.0023   Epoch: 18   Global Step: 190180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:06:35,165-Speed 5443.00 samples/sec   Loss 1.1361   LearningRate 0.0023   Epoch: 18   Global Step: 190190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:06:42,613-Speed 5500.43 samples/sec   Loss 1.1636   LearningRate 0.0023   Epoch: 18   Global Step: 190200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:06:50,091-Speed 5477.85 samples/sec   Loss 1.1446   LearningRate 0.0023   Epoch: 18   Global Step: 190210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:06:57,565-Speed 5481.35 samples/sec   Loss 1.1492   LearningRate 0.0023   Epoch: 18   Global Step: 190220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:07:05,113-Speed 5427.21 samples/sec   Loss 1.1308   LearningRate 0.0023   Epoch: 18   Global Step: 190230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:07:12,610-Speed 5464.09 samples/sec   Loss 1.1480   LearningRate 0.0023   Epoch: 18   Global Step: 190240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:07:20,150-Speed 5433.15 samples/sec   Loss 1.1569   LearningRate 0.0023   Epoch: 18   Global Step: 190250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:07:27,675-Speed 5444.20 samples/sec   Loss 1.1485   LearningRate 0.0023   Epoch: 18   Global Step: 190260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:07:35,137-Speed 5489.94 samples/sec   Loss 1.1682   LearningRate 0.0023   Epoch: 18   Global Step: 190270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:07:42,663-Speed 5443.37 samples/sec   Loss 1.1531   LearningRate 0.0023   Epoch: 18   Global Step: 190280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 14:07:50,161-Speed 5463.87 samples/sec   Loss 1.1474   LearningRate 0.0023   Epoch: 18   Global Step: 190290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 14:07:57,592-Speed 5513.06 samples/sec   Loss 1.1465   LearningRate 0.0023   Epoch: 18   Global Step: 190300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:08:05,100-Speed 5455.89 samples/sec   Loss 1.1458   LearningRate 0.0023   Epoch: 18   Global Step: 190310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:08:12,592-Speed 5467.74 samples/sec   Loss 1.1583   LearningRate 0.0022   Epoch: 18   Global Step: 190320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:08:20,049-Speed 5494.24 samples/sec   Loss 1.1733   LearningRate 0.0022   Epoch: 18   Global Step: 190330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:08:27,628-Speed 5404.88 samples/sec   Loss 1.1216   LearningRate 0.0022   Epoch: 18   Global Step: 190340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:08:35,204-Speed 5407.24 samples/sec   Loss 1.1451   LearningRate 0.0022   Epoch: 18   Global Step: 190350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:08:42,699-Speed 5465.81 samples/sec   Loss 1.1443   LearningRate 0.0022   Epoch: 18   Global Step: 190360   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:08:50,356-Speed 5350.32 samples/sec   Loss 1.1204   LearningRate 0.0022   Epoch: 18   Global Step: 190370   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:08:57,938-Speed 5402.69 samples/sec   Loss 1.1375   LearningRate 0.0022   Epoch: 18   Global Step: 190380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:09:05,371-Speed 5511.20 samples/sec   Loss 1.1260   LearningRate 0.0022   Epoch: 18   Global Step: 190390   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:09:12,827-Speed 5494.39 samples/sec   Loss 1.1700   LearningRate 0.0022   Epoch: 18   Global Step: 190400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:09:20,266-Speed 5507.10 samples/sec   Loss 1.1488   LearningRate 0.0022   Epoch: 18   Global Step: 190410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:09:27,755-Speed 5469.55 samples/sec   Loss 1.1704   LearningRate 0.0022   Epoch: 18   Global Step: 190420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:09:35,296-Speed 5432.63 samples/sec   Loss 1.1655   LearningRate 0.0022   Epoch: 18   Global Step: 190430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:09:42,820-Speed 5444.93 samples/sec   Loss 1.1488   LearningRate 0.0022   Epoch: 18   Global Step: 190440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:09:50,349-Speed 5440.94 samples/sec   Loss 1.1335   LearningRate 0.0022   Epoch: 18   Global Step: 190450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:09:57,894-Speed 5428.77 samples/sec   Loss 1.1312   LearningRate 0.0022   Epoch: 18   Global Step: 190460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:10:05,348-Speed 5496.43 samples/sec   Loss 1.1532   LearningRate 0.0022   Epoch: 18   Global Step: 190470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:10:12,876-Speed 5441.88 samples/sec   Loss 1.1589   LearningRate 0.0022   Epoch: 18   Global Step: 190480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:10:20,337-Speed 5490.24 samples/sec   Loss 1.1320   LearningRate 0.0022   Epoch: 18   Global Step: 190490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:10:27,816-Speed 5477.62 samples/sec   Loss 1.1612   LearningRate 0.0022   Epoch: 18   Global Step: 190500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 14:10:35,340-Speed 5444.13 samples/sec   Loss 1.1548   LearningRate 0.0022   Epoch: 18   Global Step: 190510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 14:10:42,908-Speed 5412.91 samples/sec   Loss 1.1522   LearningRate 0.0022   Epoch: 18   Global Step: 190520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:10:50,466-Speed 5420.80 samples/sec   Loss 1.1263   LearningRate 0.0022   Epoch: 18   Global Step: 190530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:10:57,937-Speed 5482.94 samples/sec   Loss 1.1632   LearningRate 0.0022   Epoch: 18   Global Step: 190540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:11:05,406-Speed 5484.36 samples/sec   Loss 1.1616   LearningRate 0.0022   Epoch: 18   Global Step: 190550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:11:12,951-Speed 5430.01 samples/sec   Loss 1.1432   LearningRate 0.0022   Epoch: 18   Global Step: 190560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:11:20,485-Speed 5437.70 samples/sec   Loss 1.1134   LearningRate 0.0022   Epoch: 18   Global Step: 190570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:11:27,977-Speed 5467.75 samples/sec   Loss 1.1476   LearningRate 0.0022   Epoch: 18   Global Step: 190580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:11:35,437-Speed 5490.69 samples/sec   Loss 1.1476   LearningRate 0.0022   Epoch: 18   Global Step: 190590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:11:42,955-Speed 5449.62 samples/sec   Loss 1.1246   LearningRate 0.0022   Epoch: 18   Global Step: 190600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:11:50,431-Speed 5479.13 samples/sec   Loss 1.1536   LearningRate 0.0022   Epoch: 18   Global Step: 190610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:11:57,983-Speed 5425.07 samples/sec   Loss 1.1222   LearningRate 0.0022   Epoch: 18   Global Step: 190620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 14:12:05,517-Speed 5437.19 samples/sec   Loss 1.1527   LearningRate 0.0022   Epoch: 18   Global Step: 190630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 14:12:12,973-Speed 5493.99 samples/sec   Loss 1.1421   LearningRate 0.0022   Epoch: 18   Global Step: 190640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 14:12:20,501-Speed 5441.56 samples/sec   Loss 1.1212   LearningRate 0.0022   Epoch: 18   Global Step: 190650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:12:27,982-Speed 5476.19 samples/sec   Loss 1.1330   LearningRate 0.0022   Epoch: 18   Global Step: 190660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:12:35,587-Speed 5386.82 samples/sec   Loss 1.1361   LearningRate 0.0022   Epoch: 18   Global Step: 190670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:12:43,220-Speed 5367.09 samples/sec   Loss 1.1305   LearningRate 0.0022   Epoch: 18   Global Step: 190680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:12:50,849-Speed 5369.08 samples/sec   Loss 1.1210   LearningRate 0.0022   Epoch: 18   Global Step: 190690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:12:58,292-Speed 5504.14 samples/sec   Loss 1.1534   LearningRate 0.0022   Epoch: 18   Global Step: 190700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:13:05,862-Speed 5411.67 samples/sec   Loss 1.1326   LearningRate 0.0021   Epoch: 18   Global Step: 190710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:13:13,240-Speed 5552.67 samples/sec   Loss 1.1257   LearningRate 0.0021   Epoch: 18   Global Step: 190720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:13:20,772-Speed 5439.19 samples/sec   Loss 1.1500   LearningRate 0.0021   Epoch: 18   Global Step: 190730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:13:28,313-Speed 5431.70 samples/sec   Loss 1.1379   LearningRate 0.0021   Epoch: 18   Global Step: 190740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:13:36,128-Speed 5242.01 samples/sec   Loss 1.1208   LearningRate 0.0021   Epoch: 18   Global Step: 190750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 14:13:43,584-Speed 5494.99 samples/sec   Loss 1.1230   LearningRate 0.0021   Epoch: 18   Global Step: 190760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:13:51,047-Speed 5488.72 samples/sec   Loss 1.1317   LearningRate 0.0021   Epoch: 18   Global Step: 190770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:13:58,603-Speed 5421.90 samples/sec   Loss 1.1231   LearningRate 0.0021   Epoch: 18   Global Step: 190780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:14:06,180-Speed 5406.85 samples/sec   Loss 1.1356   LearningRate 0.0021   Epoch: 18   Global Step: 190790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:14:13,742-Speed 5416.74 samples/sec   Loss 1.1629   LearningRate 0.0021   Epoch: 18   Global Step: 190800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:14:21,172-Speed 5513.66 samples/sec   Loss 1.1228   LearningRate 0.0021   Epoch: 18   Global Step: 190810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:14:28,648-Speed 5479.45 samples/sec   Loss 1.1555   LearningRate 0.0021   Epoch: 18   Global Step: 190820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:14:36,137-Speed 5470.33 samples/sec   Loss 1.1359   LearningRate 0.0021   Epoch: 18   Global Step: 190830   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:14:43,741-Speed 5387.59 samples/sec   Loss 1.1221   LearningRate 0.0021   Epoch: 18   Global Step: 190840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:14:51,209-Speed 5485.49 samples/sec   Loss 1.1289   LearningRate 0.0021   Epoch: 18   Global Step: 190850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:14:58,672-Speed 5488.97 samples/sec   Loss 1.1280   LearningRate 0.0021   Epoch: 18   Global Step: 190860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:15:06,226-Speed 5423.43 samples/sec   Loss 1.1428   LearningRate 0.0021   Epoch: 18   Global Step: 190870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:15:13,909-Speed 5331.28 samples/sec   Loss 1.1178   LearningRate 0.0021   Epoch: 18   Global Step: 190880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:15:21,535-Speed 5371.88 samples/sec   Loss 1.1398   LearningRate 0.0021   Epoch: 18   Global Step: 190890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:15:29,025-Speed 5469.14 samples/sec   Loss 1.1182   LearningRate 0.0021   Epoch: 18   Global Step: 190900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:15:36,634-Speed 5384.57 samples/sec   Loss 1.1454   LearningRate 0.0021   Epoch: 18   Global Step: 190910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:15:44,150-Speed 5450.05 samples/sec   Loss 1.1163   LearningRate 0.0021   Epoch: 18   Global Step: 190920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:15:51,611-Speed 5490.69 samples/sec   Loss 1.1108   LearningRate 0.0021   Epoch: 18   Global Step: 190930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:15:59,071-Speed 5491.19 samples/sec   Loss 1.1152   LearningRate 0.0021   Epoch: 18   Global Step: 190940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:16:06,793-Speed 5305.26 samples/sec   Loss 1.1276   LearningRate 0.0021   Epoch: 18   Global Step: 190950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:16:14,357-Speed 5416.22 samples/sec   Loss 1.1311   LearningRate 0.0021   Epoch: 18   Global Step: 190960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:16:21,879-Speed 5446.00 samples/sec   Loss 1.1168   LearningRate 0.0021   Epoch: 18   Global Step: 190970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:16:29,361-Speed 5475.05 samples/sec   Loss 1.1254   LearningRate 0.0021   Epoch: 18   Global Step: 190980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:16:36,861-Speed 5461.85 samples/sec   Loss 1.1192   LearningRate 0.0021   Epoch: 18   Global Step: 190990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:16:44,311-Speed 5499.17 samples/sec   Loss 1.1380   LearningRate 0.0021   Epoch: 18   Global Step: 191000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:16:51,762-Speed 5497.50 samples/sec   Loss 1.1279   LearningRate 0.0021   Epoch: 18   Global Step: 191010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:16:59,370-Speed 5384.65 samples/sec   Loss 1.1576   LearningRate 0.0021   Epoch: 18   Global Step: 191020   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:17:06,922-Speed 5424.70 samples/sec   Loss 1.1240   LearningRate 0.0021   Epoch: 18   Global Step: 191030   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:17:14,409-Speed 5471.29 samples/sec   Loss 1.1208   LearningRate 0.0021   Epoch: 18   Global Step: 191040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:17:21,929-Speed 5447.52 samples/sec   Loss 1.1246   LearningRate 0.0021   Epoch: 18   Global Step: 191050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:17:29,437-Speed 5456.54 samples/sec   Loss 1.1034   LearningRate 0.0021   Epoch: 18   Global Step: 191060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:17:36,964-Speed 5442.18 samples/sec   Loss 1.0985   LearningRate 0.0021   Epoch: 18   Global Step: 191070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:17:44,404-Speed 5506.69 samples/sec   Loss 1.1462   LearningRate 0.0021   Epoch: 18   Global Step: 191080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:17:51,892-Speed 5470.84 samples/sec   Loss 1.1475   LearningRate 0.0021   Epoch: 18   Global Step: 191090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:17:59,530-Speed 5363.03 samples/sec   Loss 1.1230   LearningRate 0.0020   Epoch: 18   Global Step: 191100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:18:07,188-Speed 5349.29 samples/sec   Loss 1.1065   LearningRate 0.0020   Epoch: 18   Global Step: 191110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:18:14,723-Speed 5436.48 samples/sec   Loss 1.1052   LearningRate 0.0020   Epoch: 18   Global Step: 191120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:18:22,245-Speed 5446.61 samples/sec   Loss 1.1048   LearningRate 0.0020   Epoch: 18   Global Step: 191130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:18:29,698-Speed 5496.42 samples/sec   Loss 1.1314   LearningRate 0.0020   Epoch: 18   Global Step: 191140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:18:37,181-Speed 5474.53 samples/sec   Loss 1.1207   LearningRate 0.0020   Epoch: 18   Global Step: 191150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:18:44,770-Speed 5397.59 samples/sec   Loss 1.1205   LearningRate 0.0020   Epoch: 18   Global Step: 191160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:18:52,266-Speed 5464.83 samples/sec   Loss 1.1223   LearningRate 0.0020   Epoch: 18   Global Step: 191170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:18:59,746-Speed 5476.82 samples/sec   Loss 1.1237   LearningRate 0.0020   Epoch: 18   Global Step: 191180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 14:19:07,250-Speed 5459.64 samples/sec   Loss 1.1053   LearningRate 0.0020   Epoch: 18   Global Step: 191190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 14:19:14,889-Speed 5361.99 samples/sec   Loss 1.1205   LearningRate 0.0020   Epoch: 18   Global Step: 191200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 14:19:22,329-Speed 5506.64 samples/sec   Loss 1.1432   LearningRate 0.0020   Epoch: 18   Global Step: 191210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:19:29,846-Speed 5450.66 samples/sec   Loss 1.1026   LearningRate 0.0020   Epoch: 18   Global Step: 191220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:19:37,318-Speed 5482.37 samples/sec   Loss 1.1378   LearningRate 0.0020   Epoch: 18   Global Step: 191230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:19:44,860-Speed 5431.77 samples/sec   Loss 1.1261   LearningRate 0.0020   Epoch: 18   Global Step: 191240   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:19:52,352-Speed 5467.63 samples/sec   Loss 1.1103   LearningRate 0.0020   Epoch: 18   Global Step: 191250   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:19:59,885-Speed 5437.77 samples/sec   Loss 1.1224   LearningRate 0.0020   Epoch: 18   Global Step: 191260   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:20:07,537-Speed 5354.18 samples/sec   Loss 1.1052   LearningRate 0.0020   Epoch: 18   Global Step: 191270   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:20:15,168-Speed 5367.97 samples/sec   Loss 1.1235   LearningRate 0.0020   Epoch: 18   Global Step: 191280   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:20:22,855-Speed 5329.05 samples/sec   Loss 1.1323   LearningRate 0.0020   Epoch: 18   Global Step: 191290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:20:30,546-Speed 5326.77 samples/sec   Loss 1.1287   LearningRate 0.0020   Epoch: 18   Global Step: 191300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:20:38,095-Speed 5426.28 samples/sec   Loss 1.1129   LearningRate 0.0020   Epoch: 18   Global Step: 191310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:20:45,689-Speed 5394.48 samples/sec   Loss 1.0883   LearningRate 0.0020   Epoch: 18   Global Step: 191320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:20:53,128-Speed 5506.90 samples/sec   Loss 1.1107   LearningRate 0.0020   Epoch: 18   Global Step: 191330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:21:00,530-Speed 5534.03 samples/sec   Loss 1.1124   LearningRate 0.0020   Epoch: 18   Global Step: 191340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:21:07,980-Speed 5498.61 samples/sec   Loss 1.1069   LearningRate 0.0020   Epoch: 18   Global Step: 191350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:21:15,717-Speed 5295.31 samples/sec   Loss 1.1257   LearningRate 0.0020   Epoch: 18   Global Step: 191360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:21:23,259-Speed 5431.38 samples/sec   Loss 1.1150   LearningRate 0.0020   Epoch: 18   Global Step: 191370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:21:30,859-Speed 5390.16 samples/sec   Loss 1.1039   LearningRate 0.0020   Epoch: 18   Global Step: 191380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:21:38,456-Speed 5392.33 samples/sec   Loss 1.1138   LearningRate 0.0020   Epoch: 18   Global Step: 191390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:21:45,964-Speed 5456.35 samples/sec   Loss 1.1153   LearningRate 0.0020   Epoch: 18   Global Step: 191400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:21:53,447-Speed 5474.38 samples/sec   Loss 1.1321   LearningRate 0.0020   Epoch: 18   Global Step: 191410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:22:01,138-Speed 5326.74 samples/sec   Loss 1.1273   LearningRate 0.0020   Epoch: 18   Global Step: 191420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:22:08,645-Speed 5456.46 samples/sec   Loss 1.1034   LearningRate 0.0020   Epoch: 18   Global Step: 191430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:22:16,201-Speed 5422.30 samples/sec   Loss 1.0974   LearningRate 0.0020   Epoch: 18   Global Step: 191440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:22:23,707-Speed 5457.09 samples/sec   Loss 1.1105   LearningRate 0.0020   Epoch: 18   Global Step: 191450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:22:31,283-Speed 5407.27 samples/sec   Loss 1.1051   LearningRate 0.0020   Epoch: 18   Global Step: 191460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:22:38,763-Speed 5476.73 samples/sec   Loss 1.1358   LearningRate 0.0020   Epoch: 18   Global Step: 191470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:22:46,199-Speed 5509.71 samples/sec   Loss 1.1340   LearningRate 0.0020   Epoch: 18   Global Step: 191480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:22:53,705-Speed 5457.46 samples/sec   Loss 1.1262   LearningRate 0.0020   Epoch: 18   Global Step: 191490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:23:01,251-Speed 5429.01 samples/sec   Loss 1.1401   LearningRate 0.0019   Epoch: 18   Global Step: 191500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:23:08,805-Speed 5422.36 samples/sec   Loss 1.1166   LearningRate 0.0019   Epoch: 18   Global Step: 191510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:23:16,416-Speed 5383.00 samples/sec   Loss 1.1267   LearningRate 0.0019   Epoch: 18   Global Step: 191520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:23:23,923-Speed 5457.17 samples/sec   Loss 1.1185   LearningRate 0.0019   Epoch: 18   Global Step: 191530   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:23:31,569-Speed 5357.77 samples/sec   Loss 1.1207   LearningRate 0.0019   Epoch: 18   Global Step: 191540   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:23:39,217-Speed 5356.59 samples/sec   Loss 1.0921   LearningRate 0.0019   Epoch: 18   Global Step: 191550   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:23:46,779-Speed 5417.17 samples/sec   Loss 1.1105   LearningRate 0.0019   Epoch: 18   Global Step: 191560   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:23:54,297-Speed 5449.51 samples/sec   Loss 1.0997   LearningRate 0.0019   Epoch: 18   Global Step: 191570   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:24:01,865-Speed 5412.94 samples/sec   Loss 1.1138   LearningRate 0.0019   Epoch: 18   Global Step: 191580   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:24:09,416-Speed 5424.48 samples/sec   Loss 1.1102   LearningRate 0.0019   Epoch: 18   Global Step: 191590   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:24:16,964-Speed 5427.95 samples/sec   Loss 1.1132   LearningRate 0.0019   Epoch: 18   Global Step: 191600   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:24:24,505-Speed 5431.98 samples/sec   Loss 1.1008   LearningRate 0.0019   Epoch: 18   Global Step: 191610   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:24:32,125-Speed 5376.48 samples/sec   Loss 1.0865   LearningRate 0.0019   Epoch: 18   Global Step: 191620   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:24:39,590-Speed 5486.98 samples/sec   Loss 1.1082   LearningRate 0.0019   Epoch: 18   Global Step: 191630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:24:47,084-Speed 5466.82 samples/sec   Loss 1.1281   LearningRate 0.0019   Epoch: 18   Global Step: 191640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:24:54,715-Speed 5368.51 samples/sec   Loss 1.1111   LearningRate 0.0019   Epoch: 18   Global Step: 191650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:25:02,229-Speed 5451.39 samples/sec   Loss 1.1078   LearningRate 0.0019   Epoch: 18   Global Step: 191660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:25:09,740-Speed 5454.44 samples/sec   Loss 1.1176   LearningRate 0.0019   Epoch: 18   Global Step: 191670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:25:17,225-Speed 5472.81 samples/sec   Loss 1.1123   LearningRate 0.0019   Epoch: 18   Global Step: 191680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:25:24,687-Speed 5490.45 samples/sec   Loss 1.1105   LearningRate 0.0019   Epoch: 18   Global Step: 191690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:25:32,092-Speed 5531.98 samples/sec   Loss 1.1084   LearningRate 0.0019   Epoch: 18   Global Step: 191700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:25:39,601-Speed 5455.28 samples/sec   Loss 1.1022   LearningRate 0.0019   Epoch: 18   Global Step: 191710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:25:47,182-Speed 5403.54 samples/sec   Loss 1.0970   LearningRate 0.0019   Epoch: 18   Global Step: 191720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:25:54,650-Speed 5485.85 samples/sec   Loss 1.1174   LearningRate 0.0019   Epoch: 18   Global Step: 191730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:26:02,225-Speed 5408.07 samples/sec   Loss 1.1100   LearningRate 0.0019   Epoch: 18   Global Step: 191740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:26:09,999-Speed 5268.91 samples/sec   Loss 1.1221   LearningRate 0.0019   Epoch: 18   Global Step: 191750   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:26:17,459-Speed 5491.80 samples/sec   Loss 1.0991   LearningRate 0.0019   Epoch: 18   Global Step: 191760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:26:24,994-Speed 5437.14 samples/sec   Loss 1.1236   LearningRate 0.0019   Epoch: 18   Global Step: 191770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:26:32,539-Speed 5429.05 samples/sec   Loss 1.0831   LearningRate 0.0019   Epoch: 18   Global Step: 191780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:26:40,074-Speed 5437.34 samples/sec   Loss 1.1271   LearningRate 0.0019   Epoch: 18   Global Step: 191790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:26:47,456-Speed 5548.74 samples/sec   Loss 1.1310   LearningRate 0.0019   Epoch: 18   Global Step: 191800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:26:55,052-Speed 5393.12 samples/sec   Loss 1.0900   LearningRate 0.0019   Epoch: 18   Global Step: 191810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:27:02,769-Speed 5308.54 samples/sec   Loss 1.1146   LearningRate 0.0019   Epoch: 18   Global Step: 191820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:27:10,221-Speed 5497.40 samples/sec   Loss 1.1011   LearningRate 0.0019   Epoch: 18   Global Step: 191830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:27:17,703-Speed 5475.07 samples/sec   Loss 1.0953   LearningRate 0.0019   Epoch: 18   Global Step: 191840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:27:25,088-Speed 5547.76 samples/sec   Loss 1.1217   LearningRate 0.0019   Epoch: 18   Global Step: 191850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:27:32,534-Speed 5501.36 samples/sec   Loss 1.1190   LearningRate 0.0019   Epoch: 18   Global Step: 191860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:27:40,015-Speed 5476.11 samples/sec   Loss 1.0998   LearningRate 0.0019   Epoch: 18   Global Step: 191870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 14:27:47,468-Speed 5496.18 samples/sec   Loss 1.0964   LearningRate 0.0019   Epoch: 18   Global Step: 191880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 14:27:54,938-Speed 5484.15 samples/sec   Loss 1.0967   LearningRate 0.0019   Epoch: 18   Global Step: 191890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 14:28:02,428-Speed 5469.55 samples/sec   Loss 1.0922   LearningRate 0.0019   Epoch: 18   Global Step: 191900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:28:09,905-Speed 5479.03 samples/sec   Loss 1.1068   LearningRate 0.0018   Epoch: 18   Global Step: 191910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:28:17,381-Speed 5479.39 samples/sec   Loss 1.0809   LearningRate 0.0018   Epoch: 18   Global Step: 191920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:28:24,834-Speed 5496.31 samples/sec   Loss 1.0961   LearningRate 0.0018   Epoch: 18   Global Step: 191930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:28:32,334-Speed 5462.27 samples/sec   Loss 1.1162   LearningRate 0.0018   Epoch: 18   Global Step: 191940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:28:39,762-Speed 5515.01 samples/sec   Loss 1.0912   LearningRate 0.0018   Epoch: 18   Global Step: 191950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:28:47,216-Speed 5495.55 samples/sec   Loss 1.0973   LearningRate 0.0018   Epoch: 18   Global Step: 191960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:28:54,848-Speed 5367.79 samples/sec   Loss 1.1096   LearningRate 0.0018   Epoch: 18   Global Step: 191970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:29:02,361-Speed 5452.74 samples/sec   Loss 1.0868   LearningRate 0.0018   Epoch: 18   Global Step: 191980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:29:09,863-Speed 5460.83 samples/sec   Loss 1.0998   LearningRate 0.0018   Epoch: 18   Global Step: 191990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:29:17,337-Speed 5481.28 samples/sec   Loss 1.0940   LearningRate 0.0018   Epoch: 18   Global Step: 192000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:30:01,117-[lfw][192000]XNorm: 22.344807
Training: 2022-01-09 14:30:01,118-[lfw][192000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 14:30:01,118-[lfw][192000]Accuracy-Highest: 0.99850
Training: 2022-01-09 14:30:51,921-[cfp_fp][192000]XNorm: 22.083677
Training: 2022-01-09 14:30:51,922-[cfp_fp][192000]Accuracy-Flip: 0.99443+-0.00341
Training: 2022-01-09 14:30:51,922-[cfp_fp][192000]Accuracy-Highest: 0.99443
Training: 2022-01-09 14:31:35,781-[agedb_30][192000]XNorm: 22.929586
Training: 2022-01-09 14:31:35,781-[agedb_30][192000]Accuracy-Flip: 0.98617+-0.00619
Training: 2022-01-09 14:31:35,782-[agedb_30][192000]Accuracy-Highest: 0.98617
Training: 2022-01-09 14:31:43,305-Speed 280.61 samples/sec   Loss 1.0934   LearningRate 0.0018   Epoch: 18   Global Step: 192010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:31:50,859-Speed 5422.79 samples/sec   Loss 1.0835   LearningRate 0.0018   Epoch: 18   Global Step: 192020   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:31:58,388-Speed 5441.54 samples/sec   Loss 1.0993   LearningRate 0.0018   Epoch: 18   Global Step: 192030   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:32:05,951-Speed 5416.40 samples/sec   Loss 1.0793   LearningRate 0.0018   Epoch: 18   Global Step: 192040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:32:13,553-Speed 5388.22 samples/sec   Loss 1.0838   LearningRate 0.0018   Epoch: 18   Global Step: 192050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:32:21,054-Speed 5461.29 samples/sec   Loss 1.0829   LearningRate 0.0018   Epoch: 18   Global Step: 192060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:32:28,686-Speed 5368.00 samples/sec   Loss 1.0745   LearningRate 0.0018   Epoch: 18   Global Step: 192070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:32:36,169-Speed 5474.52 samples/sec   Loss 1.0968   LearningRate 0.0018   Epoch: 18   Global Step: 192080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 14:32:43,774-Speed 5386.64 samples/sec   Loss 1.0893   LearningRate 0.0018   Epoch: 18   Global Step: 192090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:32:51,243-Speed 5484.66 samples/sec   Loss 1.1081   LearningRate 0.0018   Epoch: 18   Global Step: 192100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 14:32:58,718-Speed 5480.65 samples/sec   Loss 1.0916   LearningRate 0.0018   Epoch: 18   Global Step: 192110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:33:06,375-Speed 5349.45 samples/sec   Loss 1.0764   LearningRate 0.0018   Epoch: 18   Global Step: 192120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:33:14,000-Speed 5372.19 samples/sec   Loss 1.0854   LearningRate 0.0018   Epoch: 18   Global Step: 192130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:33:21,622-Speed 5374.98 samples/sec   Loss 1.0836   LearningRate 0.0018   Epoch: 18   Global Step: 192140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:33:29,072-Speed 5498.99 samples/sec   Loss 1.0895   LearningRate 0.0018   Epoch: 18   Global Step: 192150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:33:36,513-Speed 5504.72 samples/sec   Loss 1.1269   LearningRate 0.0018   Epoch: 18   Global Step: 192160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:33:44,020-Speed 5457.27 samples/sec   Loss 1.0903   LearningRate 0.0018   Epoch: 18   Global Step: 192170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:33:51,563-Speed 5430.34 samples/sec   Loss 1.1141   LearningRate 0.0018   Epoch: 18   Global Step: 192180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:33:59,088-Speed 5444.62 samples/sec   Loss 1.0849   LearningRate 0.0018   Epoch: 18   Global Step: 192190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 14:34:06,601-Speed 5452.19 samples/sec   Loss 1.0990   LearningRate 0.0018   Epoch: 18   Global Step: 192200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 14:34:14,140-Speed 5433.61 samples/sec   Loss 1.0858   LearningRate 0.0018   Epoch: 18   Global Step: 192210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 14:34:21,691-Speed 5424.97 samples/sec   Loss 1.0946   LearningRate 0.0018   Epoch: 18   Global Step: 192220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 14:34:29,196-Speed 5458.85 samples/sec   Loss 1.1040   LearningRate 0.0018   Epoch: 18   Global Step: 192230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:34:36,662-Speed 5487.16 samples/sec   Loss 1.0866   LearningRate 0.0018   Epoch: 18   Global Step: 192240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:34:44,168-Speed 5457.27 samples/sec   Loss 1.0879   LearningRate 0.0018   Epoch: 18   Global Step: 192250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:34:51,658-Speed 5469.22 samples/sec   Loss 1.1032   LearningRate 0.0018   Epoch: 18   Global Step: 192260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:34:59,108-Speed 5499.35 samples/sec   Loss 1.0756   LearningRate 0.0018   Epoch: 18   Global Step: 192270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:35:06,630-Speed 5445.91 samples/sec   Loss 1.0972   LearningRate 0.0018   Epoch: 18   Global Step: 192280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:35:14,200-Speed 5411.39 samples/sec   Loss 1.0825   LearningRate 0.0018   Epoch: 18   Global Step: 192290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:35:21,674-Speed 5481.04 samples/sec   Loss 1.1086   LearningRate 0.0018   Epoch: 18   Global Step: 192300   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:35:29,162-Speed 5470.81 samples/sec   Loss 1.0938   LearningRate 0.0018   Epoch: 18   Global Step: 192310   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:35:36,699-Speed 5434.82 samples/sec   Loss 1.0949   LearningRate 0.0018   Epoch: 18   Global Step: 192320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:35:44,180-Speed 5476.10 samples/sec   Loss 1.0854   LearningRate 0.0018   Epoch: 18   Global Step: 192330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:35:51,786-Speed 5385.59 samples/sec   Loss 1.0835   LearningRate 0.0017   Epoch: 18   Global Step: 192340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:35:59,513-Speed 5301.94 samples/sec   Loss 1.0833   LearningRate 0.0017   Epoch: 18   Global Step: 192350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:36:07,049-Speed 5435.63 samples/sec   Loss 1.0999   LearningRate 0.0017   Epoch: 18   Global Step: 192360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:36:14,636-Speed 5399.25 samples/sec   Loss 1.1032   LearningRate 0.0017   Epoch: 18   Global Step: 192370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:36:22,106-Speed 5484.47 samples/sec   Loss 1.1002   LearningRate 0.0017   Epoch: 18   Global Step: 192380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:36:29,585-Speed 5477.27 samples/sec   Loss 1.0821   LearningRate 0.0017   Epoch: 18   Global Step: 192390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:36:37,079-Speed 5466.76 samples/sec   Loss 1.0838   LearningRate 0.0017   Epoch: 18   Global Step: 192400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:36:44,692-Speed 5380.59 samples/sec   Loss 1.1032   LearningRate 0.0017   Epoch: 18   Global Step: 192410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:36:52,177-Speed 5472.72 samples/sec   Loss 1.1005   LearningRate 0.0017   Epoch: 18   Global Step: 192420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:36:59,666-Speed 5470.20 samples/sec   Loss 1.0899   LearningRate 0.0017   Epoch: 18   Global Step: 192430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:37:07,287-Speed 5375.26 samples/sec   Loss 1.0816   LearningRate 0.0017   Epoch: 18   Global Step: 192440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:37:14,848-Speed 5418.22 samples/sec   Loss 1.0950   LearningRate 0.0017   Epoch: 18   Global Step: 192450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:37:22,370-Speed 5445.97 samples/sec   Loss 1.0899   LearningRate 0.0017   Epoch: 18   Global Step: 192460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:37:29,864-Speed 5467.12 samples/sec   Loss 1.1230   LearningRate 0.0017   Epoch: 18   Global Step: 192470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:37:37,387-Speed 5445.30 samples/sec   Loss 1.0760   LearningRate 0.0017   Epoch: 18   Global Step: 192480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:37:44,962-Speed 5407.71 samples/sec   Loss 1.0897   LearningRate 0.0017   Epoch: 18   Global Step: 192490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:37:52,507-Speed 5429.70 samples/sec   Loss 1.0877   LearningRate 0.0017   Epoch: 18   Global Step: 192500   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:38:00,130-Speed 5373.71 samples/sec   Loss 1.0754   LearningRate 0.0017   Epoch: 18   Global Step: 192510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:38:07,730-Speed 5390.34 samples/sec   Loss 1.0932   LearningRate 0.0017   Epoch: 18   Global Step: 192520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:38:15,250-Speed 5448.03 samples/sec   Loss 1.0915   LearningRate 0.0017   Epoch: 18   Global Step: 192530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:38:22,691-Speed 5505.10 samples/sec   Loss 1.0800   LearningRate 0.0017   Epoch: 18   Global Step: 192540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:38:30,200-Speed 5455.77 samples/sec   Loss 1.0936   LearningRate 0.0017   Epoch: 18   Global Step: 192550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:38:37,682-Speed 5475.02 samples/sec   Loss 1.0810   LearningRate 0.0017   Epoch: 18   Global Step: 192560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:38:45,143-Speed 5490.66 samples/sec   Loss 1.0929   LearningRate 0.0017   Epoch: 18   Global Step: 192570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:38:52,691-Speed 5427.36 samples/sec   Loss 1.0826   LearningRate 0.0017   Epoch: 18   Global Step: 192580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:39:00,141-Speed 5498.52 samples/sec   Loss 1.0894   LearningRate 0.0017   Epoch: 18   Global Step: 192590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:39:07,563-Speed 5520.03 samples/sec   Loss 1.0842   LearningRate 0.0017   Epoch: 18   Global Step: 192600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:39:15,039-Speed 5479.84 samples/sec   Loss 1.0621   LearningRate 0.0017   Epoch: 18   Global Step: 192610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:39:22,482-Speed 5503.38 samples/sec   Loss 1.0730   LearningRate 0.0017   Epoch: 18   Global Step: 192620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:39:30,017-Speed 5437.09 samples/sec   Loss 1.0992   LearningRate 0.0017   Epoch: 18   Global Step: 192630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:39:37,558-Speed 5432.38 samples/sec   Loss 1.0693   LearningRate 0.0017   Epoch: 18   Global Step: 192640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:39:45,047-Speed 5470.08 samples/sec   Loss 1.0808   LearningRate 0.0017   Epoch: 18   Global Step: 192650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:39:52,590-Speed 5431.02 samples/sec   Loss 1.0989   LearningRate 0.0017   Epoch: 18   Global Step: 192660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:40:00,247-Speed 5349.88 samples/sec   Loss 1.0792   LearningRate 0.0017   Epoch: 18   Global Step: 192670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:40:07,716-Speed 5485.18 samples/sec   Loss 1.0971   LearningRate 0.0017   Epoch: 18   Global Step: 192680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:40:15,456-Speed 5292.88 samples/sec   Loss 1.0851   LearningRate 0.0017   Epoch: 18   Global Step: 192690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:40:22,987-Speed 5439.73 samples/sec   Loss 1.0953   LearningRate 0.0017   Epoch: 18   Global Step: 192700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 14:40:30,436-Speed 5499.52 samples/sec   Loss 1.0876   LearningRate 0.0017   Epoch: 18   Global Step: 192710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 14:40:37,942-Speed 5457.39 samples/sec   Loss 1.0842   LearningRate 0.0017   Epoch: 18   Global Step: 192720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:40:45,417-Speed 5480.15 samples/sec   Loss 1.0726   LearningRate 0.0017   Epoch: 18   Global Step: 192730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:40:52,903-Speed 5472.66 samples/sec   Loss 1.0729   LearningRate 0.0017   Epoch: 18   Global Step: 192740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:41:00,374-Speed 5483.46 samples/sec   Loss 1.0886   LearningRate 0.0017   Epoch: 18   Global Step: 192750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:41:07,885-Speed 5453.80 samples/sec   Loss 1.0821   LearningRate 0.0017   Epoch: 18   Global Step: 192760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:41:15,347-Speed 5490.03 samples/sec   Loss 1.0742   LearningRate 0.0016   Epoch: 18   Global Step: 192770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:41:22,805-Speed 5493.07 samples/sec   Loss 1.0791   LearningRate 0.0016   Epoch: 18   Global Step: 192780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:41:30,325-Speed 5447.49 samples/sec   Loss 1.0577   LearningRate 0.0016   Epoch: 18   Global Step: 192790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:41:37,781-Speed 5494.29 samples/sec   Loss 1.0686   LearningRate 0.0016   Epoch: 18   Global Step: 192800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:41:45,263-Speed 5475.33 samples/sec   Loss 1.0702   LearningRate 0.0016   Epoch: 18   Global Step: 192810   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:41:52,996-Speed 5297.83 samples/sec   Loss 1.0717   LearningRate 0.0016   Epoch: 18   Global Step: 192820   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:42:00,632-Speed 5364.56 samples/sec   Loss 1.0750   LearningRate 0.0016   Epoch: 18   Global Step: 192830   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:42:08,186-Speed 5422.85 samples/sec   Loss 1.0806   LearningRate 0.0016   Epoch: 18   Global Step: 192840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:42:15,688-Speed 5460.54 samples/sec   Loss 1.0831   LearningRate 0.0016   Epoch: 18   Global Step: 192850   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:42:23,173-Speed 5473.17 samples/sec   Loss 1.0908   LearningRate 0.0016   Epoch: 18   Global Step: 192860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:42:30,882-Speed 5313.94 samples/sec   Loss 1.0684   LearningRate 0.0016   Epoch: 18   Global Step: 192870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:42:38,418-Speed 5436.24 samples/sec   Loss 1.0711   LearningRate 0.0016   Epoch: 18   Global Step: 192880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:42:46,115-Speed 5322.12 samples/sec   Loss 1.0805   LearningRate 0.0016   Epoch: 18   Global Step: 192890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:42:53,568-Speed 5496.79 samples/sec   Loss 1.0928   LearningRate 0.0016   Epoch: 18   Global Step: 192900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:43:01,054-Speed 5472.31 samples/sec   Loss 1.0736   LearningRate 0.0016   Epoch: 18   Global Step: 192910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:43:08,567-Speed 5452.36 samples/sec   Loss 1.0857   LearningRate 0.0016   Epoch: 18   Global Step: 192920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:43:16,154-Speed 5399.86 samples/sec   Loss 1.0589   LearningRate 0.0016   Epoch: 18   Global Step: 192930   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:43:23,660-Speed 5457.69 samples/sec   Loss 1.0711   LearningRate 0.0016   Epoch: 18   Global Step: 192940   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:43:31,047-Speed 5545.34 samples/sec   Loss 1.0741   LearningRate 0.0016   Epoch: 18   Global Step: 192950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:43:38,670-Speed 5373.87 samples/sec   Loss 1.0746   LearningRate 0.0016   Epoch: 18   Global Step: 192960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:43:46,243-Speed 5409.66 samples/sec   Loss 1.0839   LearningRate 0.0016   Epoch: 18   Global Step: 192970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:43:53,675-Speed 5512.32 samples/sec   Loss 1.0584   LearningRate 0.0016   Epoch: 18   Global Step: 192980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:44:01,142-Speed 5485.99 samples/sec   Loss 1.0657   LearningRate 0.0016   Epoch: 18   Global Step: 192990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:44:08,645-Speed 5460.13 samples/sec   Loss 1.0919   LearningRate 0.0016   Epoch: 18   Global Step: 193000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:44:16,227-Speed 5403.08 samples/sec   Loss 1.0821   LearningRate 0.0016   Epoch: 18   Global Step: 193010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:44:23,651-Speed 5517.36 samples/sec   Loss 1.0756   LearningRate 0.0016   Epoch: 18   Global Step: 193020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:44:31,116-Speed 5487.97 samples/sec   Loss 1.0691   LearningRate 0.0016   Epoch: 18   Global Step: 193030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:44:38,568-Speed 5497.08 samples/sec   Loss 1.0592   LearningRate 0.0016   Epoch: 18   Global Step: 193040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:44:46,040-Speed 5483.08 samples/sec   Loss 1.0573   LearningRate 0.0016   Epoch: 18   Global Step: 193050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:44:53,590-Speed 5425.67 samples/sec   Loss 1.0566   LearningRate 0.0016   Epoch: 18   Global Step: 193060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:45:01,127-Speed 5435.60 samples/sec   Loss 1.0651   LearningRate 0.0016   Epoch: 18   Global Step: 193070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:45:08,703-Speed 5406.97 samples/sec   Loss 1.0480   LearningRate 0.0016   Epoch: 18   Global Step: 193080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:45:16,175-Speed 5482.90 samples/sec   Loss 1.0716   LearningRate 0.0016   Epoch: 18   Global Step: 193090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:45:23,711-Speed 5435.97 samples/sec   Loss 1.0645   LearningRate 0.0016   Epoch: 18   Global Step: 193100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:45:31,207-Speed 5465.08 samples/sec   Loss 1.0495   LearningRate 0.0016   Epoch: 18   Global Step: 193110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:45:38,658-Speed 5497.58 samples/sec   Loss 1.0641   LearningRate 0.0016   Epoch: 18   Global Step: 193120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:45:46,190-Speed 5439.20 samples/sec   Loss 1.0726   LearningRate 0.0016   Epoch: 18   Global Step: 193130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 14:45:53,693-Speed 5459.87 samples/sec   Loss 1.0507   LearningRate 0.0016   Epoch: 18   Global Step: 193140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:46:01,254-Speed 5417.94 samples/sec   Loss 1.0653   LearningRate 0.0016   Epoch: 18   Global Step: 193150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:46:08,728-Speed 5481.48 samples/sec   Loss 1.0785   LearningRate 0.0016   Epoch: 18   Global Step: 193160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:46:16,199-Speed 5482.97 samples/sec   Loss 1.0692   LearningRate 0.0016   Epoch: 18   Global Step: 193170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:46:23,770-Speed 5410.93 samples/sec   Loss 1.0610   LearningRate 0.0016   Epoch: 18   Global Step: 193180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:46:31,418-Speed 5356.52 samples/sec   Loss 1.0753   LearningRate 0.0016   Epoch: 18   Global Step: 193190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:46:38,995-Speed 5406.13 samples/sec   Loss 1.0917   LearningRate 0.0016   Epoch: 18   Global Step: 193200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:46:46,511-Speed 5450.80 samples/sec   Loss 1.0463   LearningRate 0.0016   Epoch: 18   Global Step: 193210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:46:54,094-Speed 5402.23 samples/sec   Loss 1.0391   LearningRate 0.0015   Epoch: 18   Global Step: 193220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:47:01,561-Speed 5486.42 samples/sec   Loss 1.0547   LearningRate 0.0015   Epoch: 18   Global Step: 193230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:47:09,036-Speed 5479.99 samples/sec   Loss 1.0599   LearningRate 0.0015   Epoch: 18   Global Step: 193240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:47:16,424-Speed 5545.04 samples/sec   Loss 1.0713   LearningRate 0.0015   Epoch: 18   Global Step: 193250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:47:23,951-Speed 5442.60 samples/sec   Loss 1.0765   LearningRate 0.0015   Epoch: 18   Global Step: 193260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:47:31,493-Speed 5431.35 samples/sec   Loss 1.0783   LearningRate 0.0015   Epoch: 18   Global Step: 193270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:47:38,961-Speed 5485.37 samples/sec   Loss 1.0450   LearningRate 0.0015   Epoch: 18   Global Step: 193280   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:47:46,450-Speed 5470.47 samples/sec   Loss 1.0485   LearningRate 0.0015   Epoch: 18   Global Step: 193290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:47:53,894-Speed 5503.08 samples/sec   Loss 1.0632   LearningRate 0.0015   Epoch: 18   Global Step: 193300   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:48:01,472-Speed 5405.91 samples/sec   Loss 1.0550   LearningRate 0.0015   Epoch: 18   Global Step: 193310   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:48:09,068-Speed 5393.19 samples/sec   Loss 1.0533   LearningRate 0.0015   Epoch: 18   Global Step: 193320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:48:16,622-Speed 5422.80 samples/sec   Loss 1.0682   LearningRate 0.0015   Epoch: 18   Global Step: 193330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:48:24,093-Speed 5483.87 samples/sec   Loss 1.0379   LearningRate 0.0015   Epoch: 18   Global Step: 193340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:48:31,654-Speed 5417.65 samples/sec   Loss 1.0604   LearningRate 0.0015   Epoch: 18   Global Step: 193350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:48:39,166-Speed 5453.45 samples/sec   Loss 1.0650   LearningRate 0.0015   Epoch: 18   Global Step: 193360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:48:46,837-Speed 5340.15 samples/sec   Loss 1.0580   LearningRate 0.0015   Epoch: 18   Global Step: 193370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:48:54,365-Speed 5442.01 samples/sec   Loss 1.0473   LearningRate 0.0015   Epoch: 18   Global Step: 193380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:49:01,896-Speed 5439.08 samples/sec   Loss 1.0611   LearningRate 0.0015   Epoch: 18   Global Step: 193390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:49:09,396-Speed 5462.03 samples/sec   Loss 1.0788   LearningRate 0.0015   Epoch: 18   Global Step: 193400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:49:16,865-Speed 5486.84 samples/sec   Loss 1.0533   LearningRate 0.0015   Epoch: 18   Global Step: 193410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:49:24,406-Speed 5432.29 samples/sec   Loss 1.0659   LearningRate 0.0015   Epoch: 18   Global Step: 193420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:49:31,772-Speed 5561.77 samples/sec   Loss 1.0488   LearningRate 0.0015   Epoch: 18   Global Step: 193430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:49:39,309-Speed 5434.76 samples/sec   Loss 1.0879   LearningRate 0.0015   Epoch: 18   Global Step: 193440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:49:46,790-Speed 5475.98 samples/sec   Loss 1.0596   LearningRate 0.0015   Epoch: 18   Global Step: 193450   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:49:54,210-Speed 5520.89 samples/sec   Loss 1.0538   LearningRate 0.0015   Epoch: 18   Global Step: 193460   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:50:01,757-Speed 5427.75 samples/sec   Loss 1.0685   LearningRate 0.0015   Epoch: 18   Global Step: 193470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:50:09,261-Speed 5458.97 samples/sec   Loss 1.0671   LearningRate 0.0015   Epoch: 18   Global Step: 193480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:50:16,678-Speed 5523.60 samples/sec   Loss 1.0640   LearningRate 0.0015   Epoch: 18   Global Step: 193490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:50:24,157-Speed 5477.28 samples/sec   Loss 1.0644   LearningRate 0.0015   Epoch: 18   Global Step: 193500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:50:31,567-Speed 5528.38 samples/sec   Loss 1.0423   LearningRate 0.0015   Epoch: 18   Global Step: 193510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:50:38,984-Speed 5523.38 samples/sec   Loss 1.0631   LearningRate 0.0015   Epoch: 18   Global Step: 193520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:50:46,415-Speed 5512.07 samples/sec   Loss 1.0511   LearningRate 0.0015   Epoch: 18   Global Step: 193530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:50:53,934-Speed 5448.27 samples/sec   Loss 1.0588   LearningRate 0.0015   Epoch: 18   Global Step: 193540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:51:01,433-Speed 5463.21 samples/sec   Loss 1.0415   LearningRate 0.0015   Epoch: 18   Global Step: 193550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:51:09,037-Speed 5387.20 samples/sec   Loss 1.0638   LearningRate 0.0015   Epoch: 18   Global Step: 193560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:51:16,520-Speed 5474.31 samples/sec   Loss 1.0373   LearningRate 0.0015   Epoch: 18   Global Step: 193570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:51:24,057-Speed 5435.50 samples/sec   Loss 1.0334   LearningRate 0.0015   Epoch: 18   Global Step: 193580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:51:31,675-Speed 5377.54 samples/sec   Loss 1.0745   LearningRate 0.0015   Epoch: 18   Global Step: 193590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:51:39,309-Speed 5366.80 samples/sec   Loss 1.0525   LearningRate 0.0015   Epoch: 18   Global Step: 193600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:51:46,788-Speed 5476.80 samples/sec   Loss 1.0546   LearningRate 0.0015   Epoch: 18   Global Step: 193610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:51:54,342-Speed 5423.55 samples/sec   Loss 1.0749   LearningRate 0.0015   Epoch: 18   Global Step: 193620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:52:01,842-Speed 5462.24 samples/sec   Loss 1.0430   LearningRate 0.0015   Epoch: 18   Global Step: 193630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:52:09,373-Speed 5439.90 samples/sec   Loss 1.0336   LearningRate 0.0015   Epoch: 18   Global Step: 193640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:52:16,856-Speed 5473.76 samples/sec   Loss 1.0841   LearningRate 0.0015   Epoch: 18   Global Step: 193650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:52:24,315-Speed 5491.91 samples/sec   Loss 1.0411   LearningRate 0.0015   Epoch: 18   Global Step: 193660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:52:31,755-Speed 5506.81 samples/sec   Loss 1.0780   LearningRate 0.0015   Epoch: 18   Global Step: 193670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 14:52:39,184-Speed 5513.94 samples/sec   Loss 1.0497   LearningRate 0.0015   Epoch: 18   Global Step: 193680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 14:52:46,577-Speed 5541.33 samples/sec   Loss 1.0633   LearningRate 0.0014   Epoch: 18   Global Step: 193690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 14:52:54,057-Speed 5476.27 samples/sec   Loss 1.0546   LearningRate 0.0014   Epoch: 18   Global Step: 193700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 14:53:01,471-Speed 5526.22 samples/sec   Loss 1.0483   LearningRate 0.0014   Epoch: 18   Global Step: 193710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 14:53:08,883-Speed 5526.23 samples/sec   Loss 1.0193   LearningRate 0.0014   Epoch: 18   Global Step: 193720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:53:16,428-Speed 5429.67 samples/sec   Loss 1.0722   LearningRate 0.0014   Epoch: 18   Global Step: 193730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:53:23,938-Speed 5454.85 samples/sec   Loss 1.0352   LearningRate 0.0014   Epoch: 18   Global Step: 193740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:53:31,562-Speed 5372.99 samples/sec   Loss 1.0504   LearningRate 0.0014   Epoch: 18   Global Step: 193750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:53:39,080-Speed 5449.49 samples/sec   Loss 1.0542   LearningRate 0.0014   Epoch: 18   Global Step: 193760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:53:46,668-Speed 5398.30 samples/sec   Loss 1.0773   LearningRate 0.0014   Epoch: 18   Global Step: 193770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:53:54,233-Speed 5415.12 samples/sec   Loss 1.0692   LearningRate 0.0014   Epoch: 18   Global Step: 193780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:54:01,825-Speed 5396.18 samples/sec   Loss 1.0556   LearningRate 0.0014   Epoch: 18   Global Step: 193790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:54:09,412-Speed 5399.06 samples/sec   Loss 1.0561   LearningRate 0.0014   Epoch: 18   Global Step: 193800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:54:16,869-Speed 5493.55 samples/sec   Loss 1.0626   LearningRate 0.0014   Epoch: 18   Global Step: 193810   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:54:24,334-Speed 5487.87 samples/sec   Loss 1.0507   LearningRate 0.0014   Epoch: 18   Global Step: 193820   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:54:31,837-Speed 5459.87 samples/sec   Loss 1.0511   LearningRate 0.0014   Epoch: 18   Global Step: 193830   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:54:39,297-Speed 5491.71 samples/sec   Loss 1.0422   LearningRate 0.0014   Epoch: 18   Global Step: 193840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:54:46,766-Speed 5484.58 samples/sec   Loss 1.0556   LearningRate 0.0014   Epoch: 18   Global Step: 193850   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:54:54,254-Speed 5470.95 samples/sec   Loss 1.0737   LearningRate 0.0014   Epoch: 18   Global Step: 193860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:55:01,696-Speed 5504.71 samples/sec   Loss 1.0661   LearningRate 0.0014   Epoch: 18   Global Step: 193870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:55:09,142-Speed 5501.77 samples/sec   Loss 1.0512   LearningRate 0.0014   Epoch: 18   Global Step: 193880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:55:16,601-Speed 5491.74 samples/sec   Loss 1.0688   LearningRate 0.0014   Epoch: 18   Global Step: 193890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:55:24,067-Speed 5487.44 samples/sec   Loss 1.0516   LearningRate 0.0014   Epoch: 18   Global Step: 193900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:55:31,507-Speed 5506.14 samples/sec   Loss 1.0617   LearningRate 0.0014   Epoch: 18   Global Step: 193910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:55:38,995-Speed 5470.88 samples/sec   Loss 1.0402   LearningRate 0.0014   Epoch: 18   Global Step: 193920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:55:46,643-Speed 5356.27 samples/sec   Loss 1.0496   LearningRate 0.0014   Epoch: 18   Global Step: 193930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:55:54,092-Speed 5499.72 samples/sec   Loss 1.0300   LearningRate 0.0014   Epoch: 18   Global Step: 193940   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:56:01,575-Speed 5474.25 samples/sec   Loss 1.0365   LearningRate 0.0014   Epoch: 18   Global Step: 193950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:56:09,057-Speed 5475.04 samples/sec   Loss 1.0324   LearningRate 0.0014   Epoch: 18   Global Step: 193960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:56:16,559-Speed 5460.71 samples/sec   Loss 1.0552   LearningRate 0.0014   Epoch: 18   Global Step: 193970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:56:24,066-Speed 5456.87 samples/sec   Loss 1.0204   LearningRate 0.0014   Epoch: 18   Global Step: 193980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:56:31,584-Speed 5449.20 samples/sec   Loss 1.0411   LearningRate 0.0014   Epoch: 18   Global Step: 193990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:56:39,101-Speed 5449.61 samples/sec   Loss 1.0506   LearningRate 0.0014   Epoch: 18   Global Step: 194000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:57:23,135-[lfw][194000]XNorm: 22.458612
Training: 2022-01-09 14:57:23,136-[lfw][194000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 14:57:23,136-[lfw][194000]Accuracy-Highest: 0.99850
Training: 2022-01-09 14:58:15,006-[cfp_fp][194000]XNorm: 22.183354
Training: 2022-01-09 14:58:15,006-[cfp_fp][194000]Accuracy-Flip: 0.99386+-0.00332
Training: 2022-01-09 14:58:15,007-[cfp_fp][194000]Accuracy-Highest: 0.99443
Training: 2022-01-09 14:58:59,428-[agedb_30][194000]XNorm: 23.049855
Training: 2022-01-09 14:58:59,429-[agedb_30][194000]Accuracy-Flip: 0.98583+-0.00518
Training: 2022-01-09 14:58:59,429-[agedb_30][194000]Accuracy-Highest: 0.98617
Training: 2022-01-09 14:59:07,005-Speed 276.94 samples/sec   Loss 1.0463   LearningRate 0.0014   Epoch: 18   Global Step: 194010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:59:14,395-Speed 5543.08 samples/sec   Loss 1.0297   LearningRate 0.0014   Epoch: 18   Global Step: 194020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:59:21,803-Speed 5529.75 samples/sec   Loss 1.0459   LearningRate 0.0014   Epoch: 18   Global Step: 194030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:59:29,274-Speed 5483.83 samples/sec   Loss 1.0609   LearningRate 0.0014   Epoch: 18   Global Step: 194040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 14:59:36,679-Speed 5531.84 samples/sec   Loss 1.0424   LearningRate 0.0014   Epoch: 18   Global Step: 194050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:59:44,236-Speed 5421.01 samples/sec   Loss 1.0310   LearningRate 0.0014   Epoch: 18   Global Step: 194060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:59:51,663-Speed 5515.88 samples/sec   Loss 1.0331   LearningRate 0.0014   Epoch: 18   Global Step: 194070   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 14:59:59,084-Speed 5520.25 samples/sec   Loss 1.0424   LearningRate 0.0014   Epoch: 18   Global Step: 194080   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:00:06,487-Speed 5533.20 samples/sec   Loss 1.0442   LearningRate 0.0014   Epoch: 18   Global Step: 194090   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:00:13,864-Speed 5553.01 samples/sec   Loss 1.0339   LearningRate 0.0014   Epoch: 18   Global Step: 194100   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:00:21,285-Speed 5520.22 samples/sec   Loss 1.0371   LearningRate 0.0014   Epoch: 18   Global Step: 194110   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:00:28,677-Speed 5541.97 samples/sec   Loss 1.0587   LearningRate 0.0014   Epoch: 18   Global Step: 194120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:00:36,061-Speed 5548.11 samples/sec   Loss 1.0343   LearningRate 0.0014   Epoch: 18   Global Step: 194130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:00:43,475-Speed 5524.92 samples/sec   Loss 1.0461   LearningRate 0.0014   Epoch: 18   Global Step: 194140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:00:50,926-Speed 5498.39 samples/sec   Loss 1.0432   LearningRate 0.0014   Epoch: 18   Global Step: 194150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:00:58,380-Speed 5495.50 samples/sec   Loss 1.0525   LearningRate 0.0014   Epoch: 18   Global Step: 194160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:01:05,856-Speed 5479.48 samples/sec   Loss 1.0330   LearningRate 0.0013   Epoch: 18   Global Step: 194170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:01:13,256-Speed 5536.07 samples/sec   Loss 1.0458   LearningRate 0.0013   Epoch: 18   Global Step: 194180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:01:20,696-Speed 5506.33 samples/sec   Loss 1.0302   LearningRate 0.0013   Epoch: 18   Global Step: 194190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:01:28,089-Speed 5540.65 samples/sec   Loss 1.0238   LearningRate 0.0013   Epoch: 18   Global Step: 194200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:01:35,534-Speed 5502.31 samples/sec   Loss 1.0188   LearningRate 0.0013   Epoch: 18   Global Step: 194210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:01:43,013-Speed 5477.28 samples/sec   Loss 1.0303   LearningRate 0.0013   Epoch: 18   Global Step: 194220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:01:50,400-Speed 5546.09 samples/sec   Loss 1.0204   LearningRate 0.0013   Epoch: 18   Global Step: 194230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:01:57,832-Speed 5511.98 samples/sec   Loss 1.0224   LearningRate 0.0013   Epoch: 18   Global Step: 194240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:02:05,241-Speed 5528.61 samples/sec   Loss 1.0335   LearningRate 0.0013   Epoch: 18   Global Step: 194250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:02:12,629-Speed 5545.07 samples/sec   Loss 1.0133   LearningRate 0.0013   Epoch: 18   Global Step: 194260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:02:20,085-Speed 5494.84 samples/sec   Loss 1.0210   LearningRate 0.0013   Epoch: 18   Global Step: 194270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:02:27,545-Speed 5490.87 samples/sec   Loss 1.0435   LearningRate 0.0013   Epoch: 18   Global Step: 194280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:02:34,979-Speed 5510.65 samples/sec   Loss 1.0427   LearningRate 0.0013   Epoch: 18   Global Step: 194290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:02:42,421-Speed 5505.05 samples/sec   Loss 1.0243   LearningRate 0.0013   Epoch: 18   Global Step: 194300   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:02:49,826-Speed 5531.87 samples/sec   Loss 1.0452   LearningRate 0.0013   Epoch: 18   Global Step: 194310   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:02:57,285-Speed 5492.52 samples/sec   Loss 1.0207   LearningRate 0.0013   Epoch: 18   Global Step: 194320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:03:04,691-Speed 5531.45 samples/sec   Loss 1.0352   LearningRate 0.0013   Epoch: 18   Global Step: 194330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:03:12,058-Speed 5560.07 samples/sec   Loss 1.0448   LearningRate 0.0013   Epoch: 18   Global Step: 194340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:03:19,511-Speed 5497.24 samples/sec   Loss 1.0228   LearningRate 0.0013   Epoch: 18   Global Step: 194350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:03:26,961-Speed 5497.99 samples/sec   Loss 1.0579   LearningRate 0.0013   Epoch: 18   Global Step: 194360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:03:34,435-Speed 5481.59 samples/sec   Loss 1.0311   LearningRate 0.0013   Epoch: 18   Global Step: 194370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:03:41,887-Speed 5497.03 samples/sec   Loss 1.0462   LearningRate 0.0013   Epoch: 18   Global Step: 194380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:03:49,303-Speed 5523.80 samples/sec   Loss 1.0286   LearningRate 0.0013   Epoch: 18   Global Step: 194390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:03:56,767-Speed 5488.37 samples/sec   Loss 1.0469   LearningRate 0.0013   Epoch: 18   Global Step: 194400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:04:04,215-Speed 5500.21 samples/sec   Loss 1.0449   LearningRate 0.0013   Epoch: 18   Global Step: 194410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:04:11,655-Speed 5506.05 samples/sec   Loss 1.0227   LearningRate 0.0013   Epoch: 18   Global Step: 194420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:04:19,083-Speed 5515.73 samples/sec   Loss 1.0116   LearningRate 0.0013   Epoch: 18   Global Step: 194430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:04:26,525-Speed 5504.44 samples/sec   Loss 1.0431   LearningRate 0.0013   Epoch: 18   Global Step: 194440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:04:33,984-Speed 5492.37 samples/sec   Loss 1.0546   LearningRate 0.0013   Epoch: 18   Global Step: 194450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:04:41,543-Speed 5419.45 samples/sec   Loss 1.0374   LearningRate 0.0013   Epoch: 18   Global Step: 194460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:04:48,992-Speed 5499.27 samples/sec   Loss 1.0294   LearningRate 0.0013   Epoch: 18   Global Step: 194470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:04:56,422-Speed 5513.78 samples/sec   Loss 1.0494   LearningRate 0.0013   Epoch: 18   Global Step: 194480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:05:03,814-Speed 5541.76 samples/sec   Loss 1.0435   LearningRate 0.0013   Epoch: 18   Global Step: 194490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:05:11,338-Speed 5444.43 samples/sec   Loss 1.0203   LearningRate 0.0013   Epoch: 18   Global Step: 194500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:05:18,775-Speed 5508.22 samples/sec   Loss 1.0109   LearningRate 0.0013   Epoch: 18   Global Step: 194510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:05:26,281-Speed 5458.15 samples/sec   Loss 1.0314   LearningRate 0.0013   Epoch: 18   Global Step: 194520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:05:33,776-Speed 5465.41 samples/sec   Loss 1.0153   LearningRate 0.0013   Epoch: 18   Global Step: 194530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:05:41,351-Speed 5407.98 samples/sec   Loss 1.0273   LearningRate 0.0013   Epoch: 18   Global Step: 194540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:05:48,791-Speed 5506.39 samples/sec   Loss 1.0194   LearningRate 0.0013   Epoch: 18   Global Step: 194550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:05:56,235-Speed 5503.40 samples/sec   Loss 1.0349   LearningRate 0.0013   Epoch: 18   Global Step: 194560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:06:03,743-Speed 5455.67 samples/sec   Loss 1.0307   LearningRate 0.0013   Epoch: 18   Global Step: 194570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:06:11,364-Speed 5375.57 samples/sec   Loss 1.0204   LearningRate 0.0013   Epoch: 18   Global Step: 194580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:06:18,788-Speed 5518.18 samples/sec   Loss 1.0298   LearningRate 0.0013   Epoch: 18   Global Step: 194590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:06:26,339-Speed 5425.25 samples/sec   Loss 1.0304   LearningRate 0.0013   Epoch: 18   Global Step: 194600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:06:33,779-Speed 5506.14 samples/sec   Loss 1.0265   LearningRate 0.0013   Epoch: 18   Global Step: 194610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:06:41,400-Speed 5374.81 samples/sec   Loss 1.0308   LearningRate 0.0013   Epoch: 18   Global Step: 194620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:06:48,958-Speed 5420.63 samples/sec   Loss 1.0375   LearningRate 0.0013   Epoch: 18   Global Step: 194630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:06:56,392-Speed 5510.82 samples/sec   Loss 1.0376   LearningRate 0.0013   Epoch: 18   Global Step: 194640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:07:03,874-Speed 5474.74 samples/sec   Loss 1.0265   LearningRate 0.0013   Epoch: 18   Global Step: 194650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:07:11,387-Speed 5452.40 samples/sec   Loss 1.0195   LearningRate 0.0013   Epoch: 18   Global Step: 194660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:07:18,951-Speed 5416.37 samples/sec   Loss 1.0219   LearningRate 0.0012   Epoch: 18   Global Step: 194670   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:07:26,399-Speed 5499.78 samples/sec   Loss 1.0358   LearningRate 0.0012   Epoch: 18   Global Step: 194680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:07:33,775-Speed 5553.53 samples/sec   Loss 1.0360   LearningRate 0.0012   Epoch: 18   Global Step: 194690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:07:41,185-Speed 5528.85 samples/sec   Loss 1.0402   LearningRate 0.0012   Epoch: 18   Global Step: 194700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:07:48,588-Speed 5533.65 samples/sec   Loss 1.0492   LearningRate 0.0012   Epoch: 18   Global Step: 194710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:07:56,086-Speed 5463.09 samples/sec   Loss 1.0337   LearningRate 0.0012   Epoch: 18   Global Step: 194720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:08:03,668-Speed 5403.57 samples/sec   Loss 1.0135   LearningRate 0.0012   Epoch: 18   Global Step: 194730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:08:11,143-Speed 5480.27 samples/sec   Loss 1.0170   LearningRate 0.0012   Epoch: 18   Global Step: 194740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:08:18,573-Speed 5513.45 samples/sec   Loss 1.0009   LearningRate 0.0012   Epoch: 18   Global Step: 194750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:08:26,062-Speed 5469.94 samples/sec   Loss 1.0232   LearningRate 0.0012   Epoch: 18   Global Step: 194760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:08:33,506-Speed 5503.21 samples/sec   Loss 1.0225   LearningRate 0.0012   Epoch: 18   Global Step: 194770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:08:40,957-Speed 5499.06 samples/sec   Loss 1.0146   LearningRate 0.0012   Epoch: 18   Global Step: 194780   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:08:48,507-Speed 5425.38 samples/sec   Loss 1.0285   LearningRate 0.0012   Epoch: 18   Global Step: 194790   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:08:55,963-Speed 5494.25 samples/sec   Loss 1.0346   LearningRate 0.0012   Epoch: 18   Global Step: 194800   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:09:03,410-Speed 5501.63 samples/sec   Loss 1.0190   LearningRate 0.0012   Epoch: 18   Global Step: 194810   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:09:10,798-Speed 5545.09 samples/sec   Loss 1.0324   LearningRate 0.0012   Epoch: 18   Global Step: 194820   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:09:18,340-Speed 5431.14 samples/sec   Loss 1.0290   LearningRate 0.0012   Epoch: 18   Global Step: 194830   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:09:25,742-Speed 5534.30 samples/sec   Loss 1.0288   LearningRate 0.0012   Epoch: 18   Global Step: 194840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:09:33,228-Speed 5472.46 samples/sec   Loss 1.0268   LearningRate 0.0012   Epoch: 18   Global Step: 194850   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:09:40,662-Speed 5510.72 samples/sec   Loss 0.9927   LearningRate 0.0012   Epoch: 18   Global Step: 194860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:09:48,161-Speed 5463.08 samples/sec   Loss 1.0092   LearningRate 0.0012   Epoch: 18   Global Step: 194870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:09:55,686-Speed 5443.78 samples/sec   Loss 1.0469   LearningRate 0.0012   Epoch: 18   Global Step: 194880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:10:03,135-Speed 5499.27 samples/sec   Loss 1.0184   LearningRate 0.0012   Epoch: 18   Global Step: 194890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:10:10,550-Speed 5524.65 samples/sec   Loss 1.0305   LearningRate 0.0012   Epoch: 18   Global Step: 194900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:10:18,062-Speed 5453.51 samples/sec   Loss 1.0317   LearningRate 0.0012   Epoch: 18   Global Step: 194910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:10:25,596-Speed 5437.21 samples/sec   Loss 1.0211   LearningRate 0.0012   Epoch: 18   Global Step: 194920   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:10:33,047-Speed 5498.20 samples/sec   Loss 1.0356   LearningRate 0.0012   Epoch: 18   Global Step: 194930   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:10:40,443-Speed 5538.32 samples/sec   Loss 1.0178   LearningRate 0.0012   Epoch: 18   Global Step: 194940   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:10:48,060-Speed 5378.60 samples/sec   Loss 1.0397   LearningRate 0.0012   Epoch: 18   Global Step: 194950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:10:55,563-Speed 5459.40 samples/sec   Loss 1.0151   LearningRate 0.0012   Epoch: 18   Global Step: 194960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:11:03,095-Speed 5439.23 samples/sec   Loss 1.0326   LearningRate 0.0012   Epoch: 18   Global Step: 194970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:11:10,528-Speed 5511.57 samples/sec   Loss 1.0210   LearningRate 0.0012   Epoch: 18   Global Step: 194980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:11:17,977-Speed 5499.55 samples/sec   Loss 0.9873   LearningRate 0.0012   Epoch: 18   Global Step: 194990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:11:25,410-Speed 5511.12 samples/sec   Loss 1.0145   LearningRate 0.0012   Epoch: 18   Global Step: 195000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:11:32,859-Speed 5499.55 samples/sec   Loss 1.0140   LearningRate 0.0012   Epoch: 18   Global Step: 195010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:11:40,345-Speed 5472.67 samples/sec   Loss 1.0185   LearningRate 0.0012   Epoch: 18   Global Step: 195020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:11:47,920-Speed 5407.54 samples/sec   Loss 1.0167   LearningRate 0.0012   Epoch: 18   Global Step: 195030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:11:55,349-Speed 5514.60 samples/sec   Loss 1.0132   LearningRate 0.0012   Epoch: 18   Global Step: 195040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:12:02,802-Speed 5496.42 samples/sec   Loss 1.0107   LearningRate 0.0012   Epoch: 18   Global Step: 195050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:12:10,293-Speed 5468.39 samples/sec   Loss 1.0203   LearningRate 0.0012   Epoch: 18   Global Step: 195060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:12:17,808-Speed 5451.37 samples/sec   Loss 1.0026   LearningRate 0.0012   Epoch: 18   Global Step: 195070   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:12:25,413-Speed 5386.13 samples/sec   Loss 1.0124   LearningRate 0.0012   Epoch: 18   Global Step: 195080   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:12:32,871-Speed 5493.57 samples/sec   Loss 1.0198   LearningRate 0.0012   Epoch: 18   Global Step: 195090   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:12:40,327-Speed 5494.31 samples/sec   Loss 1.0075   LearningRate 0.0012   Epoch: 18   Global Step: 195100   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:12:47,784-Speed 5493.07 samples/sec   Loss 1.0433   LearningRate 0.0012   Epoch: 18   Global Step: 195110   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:12:55,247-Speed 5489.72 samples/sec   Loss 1.0187   LearningRate 0.0012   Epoch: 18   Global Step: 195120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:13:02,684-Speed 5507.88 samples/sec   Loss 1.0131   LearningRate 0.0012   Epoch: 18   Global Step: 195130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:13:10,115-Speed 5513.15 samples/sec   Loss 1.0327   LearningRate 0.0012   Epoch: 18   Global Step: 195140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:13:17,585-Speed 5483.52 samples/sec   Loss 1.0322   LearningRate 0.0012   Epoch: 18   Global Step: 195150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:13:24,977-Speed 5542.07 samples/sec   Loss 0.9920   LearningRate 0.0012   Epoch: 18   Global Step: 195160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:13:32,460-Speed 5475.07 samples/sec   Loss 1.0194   LearningRate 0.0012   Epoch: 18   Global Step: 195170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:13:39,855-Speed 5539.15 samples/sec   Loss 1.0211   LearningRate 0.0012   Epoch: 18   Global Step: 195180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:13:47,304-Speed 5499.47 samples/sec   Loss 1.0046   LearningRate 0.0011   Epoch: 18   Global Step: 195190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:13:54,753-Speed 5499.51 samples/sec   Loss 0.9980   LearningRate 0.0011   Epoch: 18   Global Step: 195200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:14:02,257-Speed 5459.76 samples/sec   Loss 1.0141   LearningRate 0.0011   Epoch: 18   Global Step: 195210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:14:09,697-Speed 5505.65 samples/sec   Loss 1.0187   LearningRate 0.0011   Epoch: 18   Global Step: 195220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:14:17,234-Speed 5435.05 samples/sec   Loss 0.9992   LearningRate 0.0011   Epoch: 18   Global Step: 195230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:14:24,647-Speed 5526.17 samples/sec   Loss 1.0368   LearningRate 0.0011   Epoch: 18   Global Step: 195240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:14:32,167-Speed 5448.20 samples/sec   Loss 1.0324   LearningRate 0.0011   Epoch: 18   Global Step: 195250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 15:14:39,590-Speed 5518.72 samples/sec   Loss 1.0077   LearningRate 0.0011   Epoch: 18   Global Step: 195260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 15:14:47,019-Speed 5513.88 samples/sec   Loss 1.0052   LearningRate 0.0011   Epoch: 18   Global Step: 195270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 15:14:54,472-Speed 5496.21 samples/sec   Loss 1.0141   LearningRate 0.0011   Epoch: 18   Global Step: 195280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 15:15:01,943-Speed 5483.76 samples/sec   Loss 1.0156   LearningRate 0.0011   Epoch: 18   Global Step: 195290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 15:15:09,396-Speed 5496.43 samples/sec   Loss 1.0238   LearningRate 0.0011   Epoch: 18   Global Step: 195300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 15:15:16,811-Speed 5524.54 samples/sec   Loss 1.0435   LearningRate 0.0011   Epoch: 18   Global Step: 195310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:15:24,215-Speed 5532.95 samples/sec   Loss 1.0093   LearningRate 0.0011   Epoch: 18   Global Step: 195320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:15:31,768-Speed 5423.75 samples/sec   Loss 1.0153   LearningRate 0.0011   Epoch: 18   Global Step: 195330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:15:39,170-Speed 5534.33 samples/sec   Loss 1.0176   LearningRate 0.0011   Epoch: 18   Global Step: 195340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:15:46,657-Speed 5471.71 samples/sec   Loss 1.0109   LearningRate 0.0011   Epoch: 18   Global Step: 195350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:15:54,099-Speed 5505.17 samples/sec   Loss 1.0164   LearningRate 0.0011   Epoch: 18   Global Step: 195360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:16:01,528-Speed 5514.07 samples/sec   Loss 1.0202   LearningRate 0.0011   Epoch: 18   Global Step: 195370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:16:08,993-Speed 5487.65 samples/sec   Loss 1.0049   LearningRate 0.0011   Epoch: 18   Global Step: 195380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:16:16,407-Speed 5525.10 samples/sec   Loss 1.0044   LearningRate 0.0011   Epoch: 18   Global Step: 195390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:16:23,930-Speed 5445.27 samples/sec   Loss 0.9997   LearningRate 0.0011   Epoch: 18   Global Step: 195400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:16:31,377-Speed 5501.00 samples/sec   Loss 1.0204   LearningRate 0.0011   Epoch: 18   Global Step: 195410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:16:38,810-Speed 5511.39 samples/sec   Loss 1.0228   LearningRate 0.0011   Epoch: 18   Global Step: 195420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:16:46,238-Speed 5514.84 samples/sec   Loss 1.0051   LearningRate 0.0011   Epoch: 18   Global Step: 195430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:16:53,706-Speed 5485.59 samples/sec   Loss 1.0164   LearningRate 0.0011   Epoch: 18   Global Step: 195440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:17:01,182-Speed 5479.79 samples/sec   Loss 1.0044   LearningRate 0.0011   Epoch: 18   Global Step: 195450   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:17:08,749-Speed 5414.16 samples/sec   Loss 1.0079   LearningRate 0.0011   Epoch: 18   Global Step: 195460   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:17:16,219-Speed 5483.22 samples/sec   Loss 1.0323   LearningRate 0.0011   Epoch: 18   Global Step: 195470   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:17:23,684-Speed 5488.18 samples/sec   Loss 1.0132   LearningRate 0.0011   Epoch: 18   Global Step: 195480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:17:31,198-Speed 5451.44 samples/sec   Loss 1.0166   LearningRate 0.0011   Epoch: 18   Global Step: 195490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:17:38,683-Speed 5473.34 samples/sec   Loss 1.0128   LearningRate 0.0011   Epoch: 18   Global Step: 195500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:17:46,117-Speed 5510.83 samples/sec   Loss 1.0001   LearningRate 0.0011   Epoch: 18   Global Step: 195510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:17:53,592-Speed 5479.83 samples/sec   Loss 0.9994   LearningRate 0.0011   Epoch: 18   Global Step: 195520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:18:01,053-Speed 5490.69 samples/sec   Loss 1.0207   LearningRate 0.0011   Epoch: 18   Global Step: 195530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:18:08,491-Speed 5508.26 samples/sec   Loss 1.0076   LearningRate 0.0011   Epoch: 18   Global Step: 195540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:18:15,877-Speed 5545.82 samples/sec   Loss 1.0094   LearningRate 0.0011   Epoch: 18   Global Step: 195550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:18:23,277-Speed 5536.12 samples/sec   Loss 1.0076   LearningRate 0.0011   Epoch: 18   Global Step: 195560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:18:30,691-Speed 5525.25 samples/sec   Loss 0.9901   LearningRate 0.0011   Epoch: 18   Global Step: 195570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:18:38,061-Speed 5558.66 samples/sec   Loss 1.0171   LearningRate 0.0011   Epoch: 18   Global Step: 195580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:18:45,567-Speed 5457.53 samples/sec   Loss 0.9987   LearningRate 0.0011   Epoch: 18   Global Step: 195590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:18:52,943-Speed 5554.05 samples/sec   Loss 1.0171   LearningRate 0.0011   Epoch: 18   Global Step: 195600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:19:00,395-Speed 5497.67 samples/sec   Loss 1.0141   LearningRate 0.0011   Epoch: 18   Global Step: 195610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:19:07,802-Speed 5530.29 samples/sec   Loss 1.0252   LearningRate 0.0011   Epoch: 18   Global Step: 195620   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:19:15,210-Speed 5530.50 samples/sec   Loss 1.0154   LearningRate 0.0011   Epoch: 18   Global Step: 195630   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:19:22,602-Speed 5541.59 samples/sec   Loss 1.0007   LearningRate 0.0011   Epoch: 18   Global Step: 195640   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:19:30,070-Speed 5485.28 samples/sec   Loss 0.9871   LearningRate 0.0011   Epoch: 18   Global Step: 195650   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:19:37,568-Speed 5463.99 samples/sec   Loss 1.0169   LearningRate 0.0011   Epoch: 18   Global Step: 195660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:19:45,144-Speed 5407.00 samples/sec   Loss 1.0111   LearningRate 0.0011   Epoch: 18   Global Step: 195670   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:19:52,594-Speed 5498.75 samples/sec   Loss 0.9928   LearningRate 0.0011   Epoch: 18   Global Step: 195680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:20:00,049-Speed 5494.85 samples/sec   Loss 1.0017   LearningRate 0.0011   Epoch: 18   Global Step: 195690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:20:07,507-Speed 5492.75 samples/sec   Loss 1.0014   LearningRate 0.0011   Epoch: 18   Global Step: 195700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:20:14,982-Speed 5481.30 samples/sec   Loss 1.0001   LearningRate 0.0011   Epoch: 18   Global Step: 195710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:20:22,588-Speed 5385.21 samples/sec   Loss 0.9994   LearningRate 0.0011   Epoch: 18   Global Step: 195720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:20:30,087-Speed 5462.94 samples/sec   Loss 0.9678   LearningRate 0.0010   Epoch: 18   Global Step: 195730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:20:37,647-Speed 5418.85 samples/sec   Loss 1.0104   LearningRate 0.0010   Epoch: 18   Global Step: 195740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:20:45,135-Speed 5471.43 samples/sec   Loss 1.0130   LearningRate 0.0010   Epoch: 18   Global Step: 195750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:20:52,611-Speed 5479.07 samples/sec   Loss 0.9912   LearningRate 0.0010   Epoch: 18   Global Step: 195760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:21:00,032-Speed 5520.10 samples/sec   Loss 0.9873   LearningRate 0.0010   Epoch: 18   Global Step: 195770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:21:07,445-Speed 5526.51 samples/sec   Loss 0.9988   LearningRate 0.0010   Epoch: 18   Global Step: 195780   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:21:14,926-Speed 5476.27 samples/sec   Loss 0.9860   LearningRate 0.0010   Epoch: 18   Global Step: 195790   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:21:22,513-Speed 5399.08 samples/sec   Loss 1.0110   LearningRate 0.0010   Epoch: 18   Global Step: 195800   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:21:30,067-Speed 5423.32 samples/sec   Loss 1.0100   LearningRate 0.0010   Epoch: 18   Global Step: 195810   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:21:37,534-Speed 5485.93 samples/sec   Loss 1.0342   LearningRate 0.0010   Epoch: 18   Global Step: 195820   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:21:45,029-Speed 5465.73 samples/sec   Loss 0.9837   LearningRate 0.0010   Epoch: 18   Global Step: 195830   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:21:52,456-Speed 5516.26 samples/sec   Loss 0.9903   LearningRate 0.0010   Epoch: 18   Global Step: 195840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:21:59,963-Speed 5457.01 samples/sec   Loss 1.0018   LearningRate 0.0010   Epoch: 18   Global Step: 195850   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:22:07,384-Speed 5520.09 samples/sec   Loss 1.0075   LearningRate 0.0010   Epoch: 18   Global Step: 195860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:22:14,873-Speed 5469.99 samples/sec   Loss 0.9968   LearningRate 0.0010   Epoch: 18   Global Step: 195870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:22:22,261-Speed 5545.04 samples/sec   Loss 1.0137   LearningRate 0.0010   Epoch: 18   Global Step: 195880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:22:29,714-Speed 5496.49 samples/sec   Loss 0.9952   LearningRate 0.0010   Epoch: 18   Global Step: 195890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:22:37,138-Speed 5517.77 samples/sec   Loss 0.9868   LearningRate 0.0010   Epoch: 18   Global Step: 195900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:22:44,575-Speed 5508.26 samples/sec   Loss 0.9951   LearningRate 0.0010   Epoch: 18   Global Step: 195910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:22:52,022-Speed 5501.51 samples/sec   Loss 0.9917   LearningRate 0.0010   Epoch: 18   Global Step: 195920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:22:59,468-Speed 5501.77 samples/sec   Loss 0.9897   LearningRate 0.0010   Epoch: 18   Global Step: 195930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:23:08,055-Speed 4770.07 samples/sec   Loss 0.9902   LearningRate 0.0010   Epoch: 18   Global Step: 195940   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:23:15,524-Speed 5484.81 samples/sec   Loss 1.0115   LearningRate 0.0010   Epoch: 18   Global Step: 195950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:23:22,996-Speed 5483.40 samples/sec   Loss 0.9955   LearningRate 0.0010   Epoch: 18   Global Step: 195960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:23:30,670-Speed 5337.76 samples/sec   Loss 0.9860   LearningRate 0.0010   Epoch: 18   Global Step: 195970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:23:38,128-Speed 5492.63 samples/sec   Loss 1.0126   LearningRate 0.0010   Epoch: 18   Global Step: 195980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:23:45,555-Speed 5515.27 samples/sec   Loss 0.9937   LearningRate 0.0010   Epoch: 18   Global Step: 195990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:23:52,955-Speed 5536.60 samples/sec   Loss 1.0107   LearningRate 0.0010   Epoch: 18   Global Step: 196000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:24:37,124-[lfw][196000]XNorm: 22.506727
Training: 2022-01-09 15:24:37,125-[lfw][196000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 15:24:37,126-[lfw][196000]Accuracy-Highest: 0.99850
Training: 2022-01-09 15:25:28,575-[cfp_fp][196000]XNorm: 22.295594
Training: 2022-01-09 15:25:28,576-[cfp_fp][196000]Accuracy-Flip: 0.99386+-0.00367
Training: 2022-01-09 15:25:28,577-[cfp_fp][196000]Accuracy-Highest: 0.99443
Training: 2022-01-09 15:26:12,807-[agedb_30][196000]XNorm: 23.196709
Training: 2022-01-09 15:26:12,808-[agedb_30][196000]Accuracy-Flip: 0.98583+-0.00512
Training: 2022-01-09 15:26:12,809-[agedb_30][196000]Accuracy-Highest: 0.98617
Training: 2022-01-09 15:26:19,930-Speed 278.69 samples/sec   Loss 1.0175   LearningRate 0.0010   Epoch: 18   Global Step: 196010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:26:26,913-Speed 5866.27 samples/sec   Loss 0.9964   LearningRate 0.0010   Epoch: 18   Global Step: 196020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:26:34,220-Speed 5605.98 samples/sec   Loss 1.0135   LearningRate 0.0010   Epoch: 18   Global Step: 196030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 15:26:41,773-Speed 5423.61 samples/sec   Loss 0.9841   LearningRate 0.0010   Epoch: 18   Global Step: 196040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:26:49,381-Speed 5384.79 samples/sec   Loss 1.0097   LearningRate 0.0010   Epoch: 18   Global Step: 196050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:26:56,857-Speed 5479.84 samples/sec   Loss 0.9840   LearningRate 0.0010   Epoch: 18   Global Step: 196060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:27:04,265-Speed 5529.70 samples/sec   Loss 0.9810   LearningRate 0.0010   Epoch: 18   Global Step: 196070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:27:11,751-Speed 5471.91 samples/sec   Loss 1.0062   LearningRate 0.0010   Epoch: 18   Global Step: 196080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:27:19,162-Speed 5528.17 samples/sec   Loss 0.9806   LearningRate 0.0010   Epoch: 18   Global Step: 196090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:27:26,615-Speed 5496.83 samples/sec   Loss 0.9813   LearningRate 0.0010   Epoch: 18   Global Step: 196100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:27:34,132-Speed 5449.78 samples/sec   Loss 1.0189   LearningRate 0.0010   Epoch: 18   Global Step: 196110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:27:41,743-Speed 5382.26 samples/sec   Loss 0.9957   LearningRate 0.0010   Epoch: 18   Global Step: 196120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:27:49,289-Speed 5428.47 samples/sec   Loss 0.9853   LearningRate 0.0010   Epoch: 18   Global Step: 196130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:27:56,732-Speed 5504.21 samples/sec   Loss 1.0055   LearningRate 0.0010   Epoch: 18   Global Step: 196140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 15:28:04,196-Speed 5488.85 samples/sec   Loss 0.9870   LearningRate 0.0010   Epoch: 18   Global Step: 196150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:28:11,723-Speed 5442.22 samples/sec   Loss 0.9943   LearningRate 0.0010   Epoch: 18   Global Step: 196160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:28:19,106-Speed 5548.55 samples/sec   Loss 1.0017   LearningRate 0.0010   Epoch: 18   Global Step: 196170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:28:26,460-Speed 5570.60 samples/sec   Loss 1.0034   LearningRate 0.0010   Epoch: 18   Global Step: 196180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:28:33,870-Speed 5528.91 samples/sec   Loss 0.9970   LearningRate 0.0010   Epoch: 18   Global Step: 196190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:28:41,387-Speed 5448.72 samples/sec   Loss 0.9894   LearningRate 0.0010   Epoch: 18   Global Step: 196200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:28:48,802-Speed 5524.92 samples/sec   Loss 1.0037   LearningRate 0.0010   Epoch: 18   Global Step: 196210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:28:56,233-Speed 5513.25 samples/sec   Loss 0.9923   LearningRate 0.0010   Epoch: 18   Global Step: 196220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:29:03,768-Speed 5436.72 samples/sec   Loss 0.9998   LearningRate 0.0010   Epoch: 18   Global Step: 196230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:29:11,206-Speed 5507.40 samples/sec   Loss 1.0223   LearningRate 0.0010   Epoch: 18   Global Step: 196240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:29:18,831-Speed 5372.98 samples/sec   Loss 0.9813   LearningRate 0.0010   Epoch: 18   Global Step: 196250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 15:29:26,266-Speed 5509.78 samples/sec   Loss 1.0024   LearningRate 0.0010   Epoch: 18   Global Step: 196260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 15:29:33,786-Speed 5447.91 samples/sec   Loss 0.9932   LearningRate 0.0010   Epoch: 18   Global Step: 196270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 15:29:41,375-Speed 5397.76 samples/sec   Loss 0.9904   LearningRate 0.0010   Epoch: 18   Global Step: 196280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 15:29:48,873-Speed 5463.33 samples/sec   Loss 0.9900   LearningRate 0.0010   Epoch: 18   Global Step: 196290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:29:56,494-Speed 5375.72 samples/sec   Loss 0.9719   LearningRate 0.0009   Epoch: 18   Global Step: 196300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:30:03,987-Speed 5466.91 samples/sec   Loss 0.9918   LearningRate 0.0009   Epoch: 18   Global Step: 196310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:30:11,391-Speed 5533.14 samples/sec   Loss 1.0146   LearningRate 0.0009   Epoch: 18   Global Step: 196320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:30:18,843-Speed 5496.90 samples/sec   Loss 0.9962   LearningRate 0.0009   Epoch: 18   Global Step: 196330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:30:26,353-Speed 5455.07 samples/sec   Loss 1.0042   LearningRate 0.0009   Epoch: 18   Global Step: 196340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:30:33,822-Speed 5484.55 samples/sec   Loss 0.9949   LearningRate 0.0009   Epoch: 18   Global Step: 196350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:30:41,236-Speed 5525.11 samples/sec   Loss 0.9834   LearningRate 0.0009   Epoch: 18   Global Step: 196360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:30:48,692-Speed 5494.37 samples/sec   Loss 0.9729   LearningRate 0.0009   Epoch: 18   Global Step: 196370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:30:56,245-Speed 5424.19 samples/sec   Loss 0.9977   LearningRate 0.0009   Epoch: 18   Global Step: 196380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:31:03,782-Speed 5435.39 samples/sec   Loss 0.9907   LearningRate 0.0009   Epoch: 18   Global Step: 196390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 15:31:11,304-Speed 5446.17 samples/sec   Loss 0.9884   LearningRate 0.0009   Epoch: 18   Global Step: 196400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:31:18,928-Speed 5372.59 samples/sec   Loss 0.9760   LearningRate 0.0009   Epoch: 18   Global Step: 196410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:31:26,431-Speed 5460.44 samples/sec   Loss 0.9890   LearningRate 0.0009   Epoch: 18   Global Step: 196420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:31:34,016-Speed 5401.22 samples/sec   Loss 0.9934   LearningRate 0.0009   Epoch: 18   Global Step: 196430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:31:41,542-Speed 5442.98 samples/sec   Loss 0.9743   LearningRate 0.0009   Epoch: 18   Global Step: 196440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:31:49,073-Speed 5439.02 samples/sec   Loss 0.9999   LearningRate 0.0009   Epoch: 18   Global Step: 196450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:31:56,545-Speed 5482.80 samples/sec   Loss 1.0019   LearningRate 0.0009   Epoch: 18   Global Step: 196460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 15:32:03,985-Speed 5511.15 samples/sec   Loss 1.0004   LearningRate 0.0009   Epoch: 18   Global Step: 196470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:32:11,499-Speed 5451.83 samples/sec   Loss 0.9831   LearningRate 0.0009   Epoch: 18   Global Step: 196480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:32:19,025-Speed 5443.04 samples/sec   Loss 0.9863   LearningRate 0.0009   Epoch: 18   Global Step: 196490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:32:26,527-Speed 5460.51 samples/sec   Loss 0.9987   LearningRate 0.0009   Epoch: 18   Global Step: 196500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:32:33,956-Speed 5514.99 samples/sec   Loss 0.9995   LearningRate 0.0009   Epoch: 18   Global Step: 196510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:32:41,404-Speed 5500.11 samples/sec   Loss 0.9687   LearningRate 0.0009   Epoch: 18   Global Step: 196520   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:32:48,843-Speed 5506.31 samples/sec   Loss 0.9836   LearningRate 0.0009   Epoch: 18   Global Step: 196530   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:32:56,260-Speed 5523.24 samples/sec   Loss 0.9865   LearningRate 0.0009   Epoch: 18   Global Step: 196540   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:33:03,767-Speed 5457.90 samples/sec   Loss 1.0062   LearningRate 0.0009   Epoch: 18   Global Step: 196550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:33:11,266-Speed 5462.72 samples/sec   Loss 0.9623   LearningRate 0.0009   Epoch: 18   Global Step: 196560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:33:18,681-Speed 5524.53 samples/sec   Loss 1.0081   LearningRate 0.0009   Epoch: 18   Global Step: 196570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:33:26,049-Speed 5560.07 samples/sec   Loss 0.9853   LearningRate 0.0009   Epoch: 18   Global Step: 196580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:33:33,442-Speed 5541.07 samples/sec   Loss 0.9874   LearningRate 0.0009   Epoch: 18   Global Step: 196590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:33:40,830-Speed 5544.98 samples/sec   Loss 0.9958   LearningRate 0.0009   Epoch: 18   Global Step: 196600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:33:48,254-Speed 5518.24 samples/sec   Loss 0.9752   LearningRate 0.0009   Epoch: 18   Global Step: 196610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:33:55,666-Speed 5526.23 samples/sec   Loss 0.9950   LearningRate 0.0009   Epoch: 18   Global Step: 196620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:34:03,118-Speed 5497.72 samples/sec   Loss 0.9834   LearningRate 0.0009   Epoch: 18   Global Step: 196630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:34:10,668-Speed 5425.83 samples/sec   Loss 0.9561   LearningRate 0.0009   Epoch: 18   Global Step: 196640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:34:18,082-Speed 5525.21 samples/sec   Loss 1.0114   LearningRate 0.0009   Epoch: 18   Global Step: 196650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:34:25,514-Speed 5512.08 samples/sec   Loss 0.9883   LearningRate 0.0009   Epoch: 18   Global Step: 196660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:34:32,955-Speed 5505.62 samples/sec   Loss 1.0085   LearningRate 0.0009   Epoch: 18   Global Step: 196670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:34:40,362-Speed 5530.77 samples/sec   Loss 0.9939   LearningRate 0.0009   Epoch: 18   Global Step: 196680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:34:47,838-Speed 5479.59 samples/sec   Loss 0.9620   LearningRate 0.0009   Epoch: 18   Global Step: 196690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:34:55,318-Speed 5476.55 samples/sec   Loss 0.9985   LearningRate 0.0009   Epoch: 18   Global Step: 196700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:35:02,874-Speed 5421.82 samples/sec   Loss 0.9723   LearningRate 0.0009   Epoch: 18   Global Step: 196710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:35:10,385-Speed 5454.26 samples/sec   Loss 0.9967   LearningRate 0.0009   Epoch: 18   Global Step: 196720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:35:17,832-Speed 5501.28 samples/sec   Loss 0.9785   LearningRate 0.0009   Epoch: 18   Global Step: 196730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:35:25,253-Speed 5519.27 samples/sec   Loss 0.9760   LearningRate 0.0009   Epoch: 18   Global Step: 196740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:35:32,736-Speed 5474.73 samples/sec   Loss 0.9916   LearningRate 0.0009   Epoch: 18   Global Step: 196750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:35:40,246-Speed 5455.47 samples/sec   Loss 1.0037   LearningRate 0.0009   Epoch: 18   Global Step: 196760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:35:47,707-Speed 5490.54 samples/sec   Loss 0.9782   LearningRate 0.0009   Epoch: 18   Global Step: 196770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:35:55,166-Speed 5491.89 samples/sec   Loss 0.9714   LearningRate 0.0009   Epoch: 18   Global Step: 196780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:36:02,716-Speed 5425.95 samples/sec   Loss 0.9806   LearningRate 0.0009   Epoch: 18   Global Step: 196790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:36:10,164-Speed 5500.15 samples/sec   Loss 0.9943   LearningRate 0.0009   Epoch: 18   Global Step: 196800   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:36:17,693-Speed 5440.92 samples/sec   Loss 0.9898   LearningRate 0.0009   Epoch: 18   Global Step: 196810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:36:25,088-Speed 5539.78 samples/sec   Loss 0.9749   LearningRate 0.0009   Epoch: 18   Global Step: 196820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:36:32,538-Speed 5498.89 samples/sec   Loss 0.9780   LearningRate 0.0009   Epoch: 18   Global Step: 196830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:36:40,014-Speed 5479.62 samples/sec   Loss 0.9859   LearningRate 0.0009   Epoch: 18   Global Step: 196840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:36:47,406-Speed 5541.87 samples/sec   Loss 0.9910   LearningRate 0.0009   Epoch: 18   Global Step: 196850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:36:54,920-Speed 5451.61 samples/sec   Loss 0.9848   LearningRate 0.0009   Epoch: 18   Global Step: 196860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:37:02,315-Speed 5539.12 samples/sec   Loss 0.9806   LearningRate 0.0009   Epoch: 18   Global Step: 196870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:37:09,718-Speed 5534.37 samples/sec   Loss 0.9830   LearningRate 0.0009   Epoch: 18   Global Step: 196880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:37:17,184-Speed 5486.54 samples/sec   Loss 0.9764   LearningRate 0.0009   Epoch: 18   Global Step: 196890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:37:24,708-Speed 5445.15 samples/sec   Loss 0.9880   LearningRate 0.0008   Epoch: 18   Global Step: 196900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:37:32,154-Speed 5501.01 samples/sec   Loss 0.9842   LearningRate 0.0008   Epoch: 18   Global Step: 196910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:37:39,621-Speed 5486.53 samples/sec   Loss 1.0009   LearningRate 0.0008   Epoch: 18   Global Step: 196920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:37:47,106-Speed 5473.17 samples/sec   Loss 0.9975   LearningRate 0.0008   Epoch: 18   Global Step: 196930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:37:54,554-Speed 5500.63 samples/sec   Loss 0.9600   LearningRate 0.0008   Epoch: 18   Global Step: 196940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:38:02,009-Speed 5494.47 samples/sec   Loss 0.9813   LearningRate 0.0008   Epoch: 18   Global Step: 196950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:38:09,488-Speed 5477.33 samples/sec   Loss 0.9817   LearningRate 0.0008   Epoch: 18   Global Step: 196960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:38:16,931-Speed 5504.06 samples/sec   Loss 0.9773   LearningRate 0.0008   Epoch: 18   Global Step: 196970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:38:24,359-Speed 5514.98 samples/sec   Loss 0.9963   LearningRate 0.0008   Epoch: 18   Global Step: 196980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:38:31,872-Speed 5452.40 samples/sec   Loss 0.9738   LearningRate 0.0008   Epoch: 18   Global Step: 196990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:38:39,312-Speed 5506.68 samples/sec   Loss 0.9898   LearningRate 0.0008   Epoch: 18   Global Step: 197000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:38:46,759-Speed 5500.73 samples/sec   Loss 0.9537   LearningRate 0.0008   Epoch: 18   Global Step: 197010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:38:54,238-Speed 5477.43 samples/sec   Loss 0.9557   LearningRate 0.0008   Epoch: 18   Global Step: 197020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:39:16,683-Speed 1825.00 samples/sec   Loss 0.9695   LearningRate 0.0008   Epoch: 19   Global Step: 197030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:39:24,182-Speed 5463.13 samples/sec   Loss 0.9923   LearningRate 0.0008   Epoch: 19   Global Step: 197040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:39:31,618-Speed 5508.88 samples/sec   Loss 0.9889   LearningRate 0.0008   Epoch: 19   Global Step: 197050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:39:39,076-Speed 5492.82 samples/sec   Loss 0.9573   LearningRate 0.0008   Epoch: 19   Global Step: 197060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:39:46,568-Speed 5467.78 samples/sec   Loss 0.9978   LearningRate 0.0008   Epoch: 19   Global Step: 197070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:39:53,972-Speed 5533.52 samples/sec   Loss 0.9845   LearningRate 0.0008   Epoch: 19   Global Step: 197080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:40:01,472-Speed 5461.81 samples/sec   Loss 0.9754   LearningRate 0.0008   Epoch: 19   Global Step: 197090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:40:08,927-Speed 5495.03 samples/sec   Loss 0.9807   LearningRate 0.0008   Epoch: 19   Global Step: 197100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:40:16,398-Speed 5483.29 samples/sec   Loss 1.0016   LearningRate 0.0008   Epoch: 19   Global Step: 197110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:40:23,882-Speed 5474.42 samples/sec   Loss 0.9751   LearningRate 0.0008   Epoch: 19   Global Step: 197120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:40:31,360-Speed 5477.83 samples/sec   Loss 0.9707   LearningRate 0.0008   Epoch: 19   Global Step: 197130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:40:38,904-Speed 5430.35 samples/sec   Loss 0.9713   LearningRate 0.0008   Epoch: 19   Global Step: 197140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:40:46,322-Speed 5522.72 samples/sec   Loss 0.9580   LearningRate 0.0008   Epoch: 19   Global Step: 197150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:40:53,964-Speed 5359.99 samples/sec   Loss 0.9936   LearningRate 0.0008   Epoch: 19   Global Step: 197160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:41:01,420-Speed 5494.68 samples/sec   Loss 0.9643   LearningRate 0.0008   Epoch: 19   Global Step: 197170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:41:08,983-Speed 5416.25 samples/sec   Loss 0.9996   LearningRate 0.0008   Epoch: 19   Global Step: 197180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:41:16,396-Speed 5526.33 samples/sec   Loss 0.9657   LearningRate 0.0008   Epoch: 19   Global Step: 197190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:41:23,848-Speed 5497.43 samples/sec   Loss 0.9761   LearningRate 0.0008   Epoch: 19   Global Step: 197200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:41:31,251-Speed 5533.44 samples/sec   Loss 0.9394   LearningRate 0.0008   Epoch: 19   Global Step: 197210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:41:38,668-Speed 5523.55 samples/sec   Loss 1.0043   LearningRate 0.0008   Epoch: 19   Global Step: 197220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:41:46,042-Speed 5555.16 samples/sec   Loss 0.9745   LearningRate 0.0008   Epoch: 19   Global Step: 197230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:41:53,677-Speed 5365.91 samples/sec   Loss 0.9705   LearningRate 0.0008   Epoch: 19   Global Step: 197240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:42:01,310-Speed 5366.91 samples/sec   Loss 0.9705   LearningRate 0.0008   Epoch: 19   Global Step: 197250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:42:08,983-Speed 5339.28 samples/sec   Loss 0.9638   LearningRate 0.0008   Epoch: 19   Global Step: 197260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:42:16,676-Speed 5324.83 samples/sec   Loss 0.9747   LearningRate 0.0008   Epoch: 19   Global Step: 197270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:42:24,222-Speed 5428.63 samples/sec   Loss 0.9654   LearningRate 0.0008   Epoch: 19   Global Step: 197280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:42:31,632-Speed 5528.74 samples/sec   Loss 0.9667   LearningRate 0.0008   Epoch: 19   Global Step: 197290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:42:39,107-Speed 5480.14 samples/sec   Loss 0.9580   LearningRate 0.0008   Epoch: 19   Global Step: 197300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:42:46,567-Speed 5491.47 samples/sec   Loss 0.9764   LearningRate 0.0008   Epoch: 19   Global Step: 197310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:42:54,029-Speed 5489.95 samples/sec   Loss 0.9554   LearningRate 0.0008   Epoch: 19   Global Step: 197320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:43:01,643-Speed 5380.39 samples/sec   Loss 0.9673   LearningRate 0.0008   Epoch: 19   Global Step: 197330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:43:09,065-Speed 5519.20 samples/sec   Loss 0.9633   LearningRate 0.0008   Epoch: 19   Global Step: 197340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:43:16,545-Speed 5477.38 samples/sec   Loss 0.9919   LearningRate 0.0008   Epoch: 19   Global Step: 197350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:43:23,958-Speed 5526.15 samples/sec   Loss 0.9810   LearningRate 0.0008   Epoch: 19   Global Step: 197360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:43:31,407-Speed 5498.96 samples/sec   Loss 0.9352   LearningRate 0.0008   Epoch: 19   Global Step: 197370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:43:38,792-Speed 5547.54 samples/sec   Loss 0.9895   LearningRate 0.0008   Epoch: 19   Global Step: 197380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:43:46,275-Speed 5474.63 samples/sec   Loss 0.9652   LearningRate 0.0008   Epoch: 19   Global Step: 197390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:43:53,708-Speed 5510.77 samples/sec   Loss 0.9716   LearningRate 0.0008   Epoch: 19   Global Step: 197400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:44:01,197-Speed 5470.99 samples/sec   Loss 0.9671   LearningRate 0.0008   Epoch: 19   Global Step: 197410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:44:08,664-Speed 5486.47 samples/sec   Loss 0.9487   LearningRate 0.0008   Epoch: 19   Global Step: 197420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:44:16,132-Speed 5484.99 samples/sec   Loss 0.9569   LearningRate 0.0008   Epoch: 19   Global Step: 197430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:44:23,606-Speed 5480.79 samples/sec   Loss 0.9698   LearningRate 0.0008   Epoch: 19   Global Step: 197440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:44:31,067-Speed 5491.40 samples/sec   Loss 0.9555   LearningRate 0.0008   Epoch: 19   Global Step: 197450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:44:38,486-Speed 5520.91 samples/sec   Loss 0.9653   LearningRate 0.0008   Epoch: 19   Global Step: 197460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:44:46,054-Speed 5413.42 samples/sec   Loss 0.9886   LearningRate 0.0008   Epoch: 19   Global Step: 197470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:44:53,492-Speed 5507.41 samples/sec   Loss 0.9800   LearningRate 0.0008   Epoch: 19   Global Step: 197480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:45:00,912-Speed 5520.83 samples/sec   Loss 0.9683   LearningRate 0.0008   Epoch: 19   Global Step: 197490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:45:08,295-Speed 5548.98 samples/sec   Loss 0.9709   LearningRate 0.0008   Epoch: 19   Global Step: 197500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:45:15,780-Speed 5473.49 samples/sec   Loss 0.9684   LearningRate 0.0008   Epoch: 19   Global Step: 197510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:45:23,148-Speed 5559.16 samples/sec   Loss 0.9778   LearningRate 0.0008   Epoch: 19   Global Step: 197520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:45:30,716-Speed 5413.52 samples/sec   Loss 0.9791   LearningRate 0.0007   Epoch: 19   Global Step: 197530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:45:38,219-Speed 5459.95 samples/sec   Loss 0.9647   LearningRate 0.0007   Epoch: 19   Global Step: 197540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:45:45,693-Speed 5480.55 samples/sec   Loss 0.9676   LearningRate 0.0007   Epoch: 19   Global Step: 197550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:45:53,118-Speed 5517.46 samples/sec   Loss 0.9739   LearningRate 0.0007   Epoch: 19   Global Step: 197560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:46:00,528-Speed 5528.82 samples/sec   Loss 0.9742   LearningRate 0.0007   Epoch: 19   Global Step: 197570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:46:08,074-Speed 5428.54 samples/sec   Loss 0.9674   LearningRate 0.0007   Epoch: 19   Global Step: 197580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:46:15,560-Speed 5472.23 samples/sec   Loss 0.9472   LearningRate 0.0007   Epoch: 19   Global Step: 197590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:46:23,042-Speed 5475.14 samples/sec   Loss 0.9482   LearningRate 0.0007   Epoch: 19   Global Step: 197600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:46:30,549-Speed 5456.64 samples/sec   Loss 0.9743   LearningRate 0.0007   Epoch: 19   Global Step: 197610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:46:38,035-Speed 5472.48 samples/sec   Loss 0.9309   LearningRate 0.0007   Epoch: 19   Global Step: 197620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:46:45,557-Speed 5445.91 samples/sec   Loss 0.9539   LearningRate 0.0007   Epoch: 19   Global Step: 197630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:46:53,035-Speed 5478.70 samples/sec   Loss 0.9541   LearningRate 0.0007   Epoch: 19   Global Step: 197640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:47:00,510-Speed 5479.91 samples/sec   Loss 0.9577   LearningRate 0.0007   Epoch: 19   Global Step: 197650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:47:08,040-Speed 5440.39 samples/sec   Loss 0.9623   LearningRate 0.0007   Epoch: 19   Global Step: 197660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:47:15,578-Speed 5434.28 samples/sec   Loss 0.9598   LearningRate 0.0007   Epoch: 19   Global Step: 197670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:47:23,070-Speed 5468.15 samples/sec   Loss 0.9726   LearningRate 0.0007   Epoch: 19   Global Step: 197680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:47:30,542-Speed 5482.28 samples/sec   Loss 0.9794   LearningRate 0.0007   Epoch: 19   Global Step: 197690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:47:37,980-Speed 5507.47 samples/sec   Loss 0.9656   LearningRate 0.0007   Epoch: 19   Global Step: 197700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:47:45,500-Speed 5448.15 samples/sec   Loss 0.9677   LearningRate 0.0007   Epoch: 19   Global Step: 197710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:47:53,004-Speed 5459.01 samples/sec   Loss 0.9684   LearningRate 0.0007   Epoch: 19   Global Step: 197720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:48:00,655-Speed 5353.69 samples/sec   Loss 0.9722   LearningRate 0.0007   Epoch: 19   Global Step: 197730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:48:08,246-Speed 5396.35 samples/sec   Loss 0.9567   LearningRate 0.0007   Epoch: 19   Global Step: 197740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:48:15,755-Speed 5456.57 samples/sec   Loss 0.9619   LearningRate 0.0007   Epoch: 19   Global Step: 197750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:48:23,210-Speed 5494.52 samples/sec   Loss 0.9811   LearningRate 0.0007   Epoch: 19   Global Step: 197760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:48:30,685-Speed 5479.97 samples/sec   Loss 0.9996   LearningRate 0.0007   Epoch: 19   Global Step: 197770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:48:38,118-Speed 5511.09 samples/sec   Loss 0.9668   LearningRate 0.0007   Epoch: 19   Global Step: 197780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:48:45,507-Speed 5544.90 samples/sec   Loss 0.9416   LearningRate 0.0007   Epoch: 19   Global Step: 197790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:48:52,922-Speed 5524.59 samples/sec   Loss 0.9631   LearningRate 0.0007   Epoch: 19   Global Step: 197800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:49:00,431-Speed 5455.21 samples/sec   Loss 0.9752   LearningRate 0.0007   Epoch: 19   Global Step: 197810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:49:07,906-Speed 5480.39 samples/sec   Loss 0.9392   LearningRate 0.0007   Epoch: 19   Global Step: 197820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:49:15,518-Speed 5381.89 samples/sec   Loss 0.9555   LearningRate 0.0007   Epoch: 19   Global Step: 197830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:49:23,092-Speed 5408.88 samples/sec   Loss 0.9661   LearningRate 0.0007   Epoch: 19   Global Step: 197840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:49:30,520-Speed 5514.66 samples/sec   Loss 0.9665   LearningRate 0.0007   Epoch: 19   Global Step: 197850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:49:38,009-Speed 5469.54 samples/sec   Loss 0.9709   LearningRate 0.0007   Epoch: 19   Global Step: 197860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:49:45,427-Speed 5522.94 samples/sec   Loss 0.9561   LearningRate 0.0007   Epoch: 19   Global Step: 197870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:49:52,969-Speed 5432.11 samples/sec   Loss 0.9724   LearningRate 0.0007   Epoch: 19   Global Step: 197880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:50:00,435-Speed 5486.46 samples/sec   Loss 0.9732   LearningRate 0.0007   Epoch: 19   Global Step: 197890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:50:07,944-Speed 5455.28 samples/sec   Loss 0.9645   LearningRate 0.0007   Epoch: 19   Global Step: 197900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:50:15,435-Speed 5468.90 samples/sec   Loss 0.9658   LearningRate 0.0007   Epoch: 19   Global Step: 197910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:50:22,874-Speed 5507.33 samples/sec   Loss 0.9542   LearningRate 0.0007   Epoch: 19   Global Step: 197920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:50:30,318-Speed 5502.74 samples/sec   Loss 0.9597   LearningRate 0.0007   Epoch: 19   Global Step: 197930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:50:37,788-Speed 5483.80 samples/sec   Loss 0.9711   LearningRate 0.0007   Epoch: 19   Global Step: 197940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:50:45,250-Speed 5489.96 samples/sec   Loss 0.9682   LearningRate 0.0007   Epoch: 19   Global Step: 197950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:50:52,671-Speed 5520.10 samples/sec   Loss 0.9617   LearningRate 0.0007   Epoch: 19   Global Step: 197960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:51:00,148-Speed 5479.17 samples/sec   Loss 0.9614   LearningRate 0.0007   Epoch: 19   Global Step: 197970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:51:07,683-Speed 5436.10 samples/sec   Loss 0.9641   LearningRate 0.0007   Epoch: 19   Global Step: 197980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:51:15,174-Speed 5469.22 samples/sec   Loss 0.9746   LearningRate 0.0007   Epoch: 19   Global Step: 197990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:51:22,593-Speed 5521.80 samples/sec   Loss 0.9513   LearningRate 0.0007   Epoch: 19   Global Step: 198000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:52:20,582-[lfw][198000]XNorm: 22.175054
Training: 2022-01-09 15:52:20,583-[lfw][198000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 15:52:20,583-[lfw][198000]Accuracy-Highest: 0.99850
Training: 2022-01-09 15:53:28,293-[cfp_fp][198000]XNorm: 22.006353
Training: 2022-01-09 15:53:28,293-[cfp_fp][198000]Accuracy-Flip: 0.99386+-0.00367
Training: 2022-01-09 15:53:28,294-[cfp_fp][198000]Accuracy-Highest: 0.99443
Training: 2022-01-09 15:54:26,509-[agedb_30][198000]XNorm: 22.937108
Training: 2022-01-09 15:54:26,510-[agedb_30][198000]Accuracy-Flip: 0.98650+-0.00555
Training: 2022-01-09 15:54:26,511-[agedb_30][198000]Accuracy-Highest: 0.98650
Training: 2022-01-09 15:54:34,193-Speed 213.78 samples/sec   Loss 0.9591   LearningRate 0.0007   Epoch: 19   Global Step: 198010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:54:41,800-Speed 5384.62 samples/sec   Loss 0.9621   LearningRate 0.0007   Epoch: 19   Global Step: 198020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:54:49,188-Speed 5545.01 samples/sec   Loss 0.9599   LearningRate 0.0007   Epoch: 19   Global Step: 198030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:54:56,593-Speed 5532.02 samples/sec   Loss 0.9497   LearningRate 0.0007   Epoch: 19   Global Step: 198040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:55:04,144-Speed 5425.68 samples/sec   Loss 0.9593   LearningRate 0.0007   Epoch: 19   Global Step: 198050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:55:11,696-Speed 5424.32 samples/sec   Loss 0.9570   LearningRate 0.0007   Epoch: 19   Global Step: 198060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:55:19,268-Speed 5410.56 samples/sec   Loss 0.9467   LearningRate 0.0007   Epoch: 19   Global Step: 198070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:55:26,757-Speed 5469.32 samples/sec   Loss 0.9465   LearningRate 0.0007   Epoch: 19   Global Step: 198080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:55:34,255-Speed 5464.33 samples/sec   Loss 0.9447   LearningRate 0.0007   Epoch: 19   Global Step: 198090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:55:41,872-Speed 5378.18 samples/sec   Loss 0.9395   LearningRate 0.0007   Epoch: 19   Global Step: 198100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:55:49,357-Speed 5473.02 samples/sec   Loss 0.9725   LearningRate 0.0007   Epoch: 19   Global Step: 198110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:55:56,786-Speed 5514.02 samples/sec   Loss 0.9695   LearningRate 0.0007   Epoch: 19   Global Step: 198120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:56:04,304-Speed 5448.88 samples/sec   Loss 0.9644   LearningRate 0.0007   Epoch: 19   Global Step: 198130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:56:11,864-Speed 5418.77 samples/sec   Loss 0.9496   LearningRate 0.0007   Epoch: 19   Global Step: 198140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:56:19,330-Speed 5487.21 samples/sec   Loss 0.9550   LearningRate 0.0007   Epoch: 19   Global Step: 198150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 15:56:26,826-Speed 5464.47 samples/sec   Loss 0.9714   LearningRate 0.0007   Epoch: 19   Global Step: 198160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:56:34,439-Speed 5380.96 samples/sec   Loss 0.9451   LearningRate 0.0007   Epoch: 19   Global Step: 198170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:56:41,959-Speed 5448.25 samples/sec   Loss 0.9522   LearningRate 0.0007   Epoch: 19   Global Step: 198180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:56:49,438-Speed 5477.62 samples/sec   Loss 0.9663   LearningRate 0.0007   Epoch: 19   Global Step: 198190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:56:56,900-Speed 5488.93 samples/sec   Loss 0.9458   LearningRate 0.0007   Epoch: 19   Global Step: 198200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:57:04,432-Speed 5439.75 samples/sec   Loss 0.9670   LearningRate 0.0006   Epoch: 19   Global Step: 198210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:57:11,969-Speed 5434.80 samples/sec   Loss 0.9571   LearningRate 0.0006   Epoch: 19   Global Step: 198220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:57:19,533-Speed 5416.42 samples/sec   Loss 0.9462   LearningRate 0.0006   Epoch: 19   Global Step: 198230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:57:27,094-Speed 5417.55 samples/sec   Loss 0.9332   LearningRate 0.0006   Epoch: 19   Global Step: 198240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:57:34,689-Speed 5394.11 samples/sec   Loss 0.9673   LearningRate 0.0006   Epoch: 19   Global Step: 198250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:57:42,143-Speed 5495.80 samples/sec   Loss 0.9581   LearningRate 0.0006   Epoch: 19   Global Step: 198260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:57:49,613-Speed 5483.54 samples/sec   Loss 0.9491   LearningRate 0.0006   Epoch: 19   Global Step: 198270   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:57:57,140-Speed 5442.61 samples/sec   Loss 0.9524   LearningRate 0.0006   Epoch: 19   Global Step: 198280   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:58:04,573-Speed 5511.07 samples/sec   Loss 0.9625   LearningRate 0.0006   Epoch: 19   Global Step: 198290   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:58:12,006-Speed 5511.21 samples/sec   Loss 0.9505   LearningRate 0.0006   Epoch: 19   Global Step: 198300   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:58:19,534-Speed 5442.11 samples/sec   Loss 0.9303   LearningRate 0.0006   Epoch: 19   Global Step: 198310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:58:26,981-Speed 5500.81 samples/sec   Loss 0.9641   LearningRate 0.0006   Epoch: 19   Global Step: 198320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:58:34,489-Speed 5456.27 samples/sec   Loss 0.9526   LearningRate 0.0006   Epoch: 19   Global Step: 198330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:58:41,975-Speed 5472.78 samples/sec   Loss 0.9580   LearningRate 0.0006   Epoch: 19   Global Step: 198340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:58:49,478-Speed 5459.47 samples/sec   Loss 0.9472   LearningRate 0.0006   Epoch: 19   Global Step: 198350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 15:58:56,968-Speed 5469.31 samples/sec   Loss 0.9533   LearningRate 0.0006   Epoch: 19   Global Step: 198360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:59:04,553-Speed 5401.32 samples/sec   Loss 0.9707   LearningRate 0.0006   Epoch: 19   Global Step: 198370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:59:12,181-Speed 5370.07 samples/sec   Loss 0.9370   LearningRate 0.0006   Epoch: 19   Global Step: 198380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:59:19,692-Speed 5454.19 samples/sec   Loss 0.9468   LearningRate 0.0006   Epoch: 19   Global Step: 198390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:59:27,129-Speed 5508.68 samples/sec   Loss 0.9596   LearningRate 0.0006   Epoch: 19   Global Step: 198400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:59:34,564-Speed 5509.42 samples/sec   Loss 0.9663   LearningRate 0.0006   Epoch: 19   Global Step: 198410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:59:42,003-Speed 5507.14 samples/sec   Loss 0.9431   LearningRate 0.0006   Epoch: 19   Global Step: 198420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:59:49,474-Speed 5483.56 samples/sec   Loss 0.9543   LearningRate 0.0006   Epoch: 19   Global Step: 198430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 15:59:56,861-Speed 5545.97 samples/sec   Loss 0.9705   LearningRate 0.0006   Epoch: 19   Global Step: 198440   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:00:04,414-Speed 5423.21 samples/sec   Loss 0.9503   LearningRate 0.0006   Epoch: 19   Global Step: 198450   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:00:11,923-Speed 5455.67 samples/sec   Loss 0.9298   LearningRate 0.0006   Epoch: 19   Global Step: 198460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:00:19,400-Speed 5478.72 samples/sec   Loss 0.9294   LearningRate 0.0006   Epoch: 19   Global Step: 198470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:00:26,828-Speed 5515.22 samples/sec   Loss 0.9295   LearningRate 0.0006   Epoch: 19   Global Step: 198480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:00:34,295-Speed 5485.95 samples/sec   Loss 0.9561   LearningRate 0.0006   Epoch: 19   Global Step: 198490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:00:41,815-Speed 5447.77 samples/sec   Loss 0.9472   LearningRate 0.0006   Epoch: 19   Global Step: 198500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:00:49,339-Speed 5445.01 samples/sec   Loss 0.9518   LearningRate 0.0006   Epoch: 19   Global Step: 198510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:00:56,929-Speed 5397.20 samples/sec   Loss 0.9624   LearningRate 0.0006   Epoch: 19   Global Step: 198520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:01:04,413-Speed 5473.30 samples/sec   Loss 0.9273   LearningRate 0.0006   Epoch: 19   Global Step: 198530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:01:11,840-Speed 5516.04 samples/sec   Loss 0.9545   LearningRate 0.0006   Epoch: 19   Global Step: 198540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:01:19,359-Speed 5448.54 samples/sec   Loss 0.9483   LearningRate 0.0006   Epoch: 19   Global Step: 198550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:01:26,828-Speed 5484.64 samples/sec   Loss 0.9341   LearningRate 0.0006   Epoch: 19   Global Step: 198560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:01:34,262-Speed 5510.71 samples/sec   Loss 0.9279   LearningRate 0.0006   Epoch: 19   Global Step: 198570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:01:41,787-Speed 5443.48 samples/sec   Loss 0.9505   LearningRate 0.0006   Epoch: 19   Global Step: 198580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:01:49,286-Speed 5463.19 samples/sec   Loss 0.9478   LearningRate 0.0006   Epoch: 19   Global Step: 198590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:01:56,728-Speed 5504.38 samples/sec   Loss 0.9417   LearningRate 0.0006   Epoch: 19   Global Step: 198600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:02:04,210-Speed 5475.60 samples/sec   Loss 0.9522   LearningRate 0.0006   Epoch: 19   Global Step: 198610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:02:11,634-Speed 5517.44 samples/sec   Loss 0.9378   LearningRate 0.0006   Epoch: 19   Global Step: 198620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:02:19,088-Speed 5496.24 samples/sec   Loss 0.9630   LearningRate 0.0006   Epoch: 19   Global Step: 198630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:02:26,578-Speed 5469.70 samples/sec   Loss 0.9569   LearningRate 0.0006   Epoch: 19   Global Step: 198640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:02:34,040-Speed 5489.65 samples/sec   Loss 0.9400   LearningRate 0.0006   Epoch: 19   Global Step: 198650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:02:41,451-Speed 5527.58 samples/sec   Loss 0.9418   LearningRate 0.0006   Epoch: 19   Global Step: 198660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:02:48,868-Speed 5523.36 samples/sec   Loss 0.9488   LearningRate 0.0006   Epoch: 19   Global Step: 198670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:02:56,358-Speed 5468.88 samples/sec   Loss 0.9624   LearningRate 0.0006   Epoch: 19   Global Step: 198680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:03:03,766-Speed 5529.87 samples/sec   Loss 0.9569   LearningRate 0.0006   Epoch: 19   Global Step: 198690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:03:11,176-Speed 5529.09 samples/sec   Loss 0.9229   LearningRate 0.0006   Epoch: 19   Global Step: 198700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:03:18,721-Speed 5429.33 samples/sec   Loss 0.9531   LearningRate 0.0006   Epoch: 19   Global Step: 198710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:03:26,149-Speed 5515.16 samples/sec   Loss 0.9219   LearningRate 0.0006   Epoch: 19   Global Step: 198720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:03:33,808-Speed 5348.92 samples/sec   Loss 0.9306   LearningRate 0.0006   Epoch: 19   Global Step: 198730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:03:41,301-Speed 5466.61 samples/sec   Loss 0.9457   LearningRate 0.0006   Epoch: 19   Global Step: 198740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:03:48,780-Speed 5477.90 samples/sec   Loss 0.9380   LearningRate 0.0006   Epoch: 19   Global Step: 198750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:03:56,215-Speed 5509.57 samples/sec   Loss 0.9552   LearningRate 0.0006   Epoch: 19   Global Step: 198760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:04:03,733-Speed 5449.09 samples/sec   Loss 0.9481   LearningRate 0.0006   Epoch: 19   Global Step: 198770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:04:11,266-Speed 5438.15 samples/sec   Loss 0.9592   LearningRate 0.0006   Epoch: 19   Global Step: 198780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:04:18,727-Speed 5491.01 samples/sec   Loss 0.9401   LearningRate 0.0006   Epoch: 19   Global Step: 198790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:04:26,268-Speed 5431.82 samples/sec   Loss 0.9558   LearningRate 0.0006   Epoch: 19   Global Step: 198800   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:04:33,650-Speed 5549.57 samples/sec   Loss 0.9365   LearningRate 0.0006   Epoch: 19   Global Step: 198810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:04:41,048-Speed 5537.28 samples/sec   Loss 0.9359   LearningRate 0.0006   Epoch: 19   Global Step: 198820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:04:48,641-Speed 5395.63 samples/sec   Loss 0.9441   LearningRate 0.0006   Epoch: 19   Global Step: 198830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:04:56,221-Speed 5404.17 samples/sec   Loss 0.9602   LearningRate 0.0006   Epoch: 19   Global Step: 198840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:05:03,657-Speed 5509.38 samples/sec   Loss 0.9382   LearningRate 0.0006   Epoch: 19   Global Step: 198850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:05:11,236-Speed 5405.27 samples/sec   Loss 0.9555   LearningRate 0.0006   Epoch: 19   Global Step: 198860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:05:18,677-Speed 5505.31 samples/sec   Loss 0.9491   LearningRate 0.0006   Epoch: 19   Global Step: 198870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:05:26,175-Speed 5463.53 samples/sec   Loss 0.9375   LearningRate 0.0006   Epoch: 19   Global Step: 198880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:05:33,620-Speed 5502.29 samples/sec   Loss 0.9287   LearningRate 0.0006   Epoch: 19   Global Step: 198890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:05:41,165-Speed 5429.59 samples/sec   Loss 0.9461   LearningRate 0.0006   Epoch: 19   Global Step: 198900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:05:48,662-Speed 5464.41 samples/sec   Loss 0.9519   LearningRate 0.0006   Epoch: 19   Global Step: 198910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:05:56,138-Speed 5480.63 samples/sec   Loss 0.9043   LearningRate 0.0006   Epoch: 19   Global Step: 198920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:06:03,562-Speed 5518.39 samples/sec   Loss 0.9274   LearningRate 0.0006   Epoch: 19   Global Step: 198930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:06:11,092-Speed 5440.52 samples/sec   Loss 0.9159   LearningRate 0.0006   Epoch: 19   Global Step: 198940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:06:18,538-Speed 5501.19 samples/sec   Loss 0.9531   LearningRate 0.0005   Epoch: 19   Global Step: 198950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:06:25,954-Speed 5524.22 samples/sec   Loss 0.9267   LearningRate 0.0005   Epoch: 19   Global Step: 198960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:06:33,378-Speed 5518.05 samples/sec   Loss 0.9430   LearningRate 0.0005   Epoch: 19   Global Step: 198970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:06:40,762-Speed 5548.07 samples/sec   Loss 0.9251   LearningRate 0.0005   Epoch: 19   Global Step: 198980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:06:48,257-Speed 5465.79 samples/sec   Loss 0.9365   LearningRate 0.0005   Epoch: 19   Global Step: 198990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:06:55,669-Speed 5527.16 samples/sec   Loss 0.9466   LearningRate 0.0005   Epoch: 19   Global Step: 199000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:07:03,229-Speed 5418.32 samples/sec   Loss 0.9242   LearningRate 0.0005   Epoch: 19   Global Step: 199010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:07:10,676-Speed 5501.08 samples/sec   Loss 0.9269   LearningRate 0.0005   Epoch: 19   Global Step: 199020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:07:18,089-Speed 5526.40 samples/sec   Loss 0.9395   LearningRate 0.0005   Epoch: 19   Global Step: 199030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:07:25,539-Speed 5498.42 samples/sec   Loss 0.9345   LearningRate 0.0005   Epoch: 19   Global Step: 199040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:07:33,048-Speed 5455.29 samples/sec   Loss 0.9438   LearningRate 0.0005   Epoch: 19   Global Step: 199050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:07:40,529-Speed 5475.96 samples/sec   Loss 0.9515   LearningRate 0.0005   Epoch: 19   Global Step: 199060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:07:47,984-Speed 5494.97 samples/sec   Loss 0.9214   LearningRate 0.0005   Epoch: 19   Global Step: 199070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:07:55,537-Speed 5423.90 samples/sec   Loss 0.9300   LearningRate 0.0005   Epoch: 19   Global Step: 199080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:08:03,001-Speed 5488.75 samples/sec   Loss 0.9371   LearningRate 0.0005   Epoch: 19   Global Step: 199090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:08:10,450-Speed 5499.09 samples/sec   Loss 0.9768   LearningRate 0.0005   Epoch: 19   Global Step: 199100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:08:17,899-Speed 5500.15 samples/sec   Loss 0.9460   LearningRate 0.0005   Epoch: 19   Global Step: 199110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:08:25,302-Speed 5533.74 samples/sec   Loss 0.9155   LearningRate 0.0005   Epoch: 19   Global Step: 199120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:08:32,802-Speed 5461.48 samples/sec   Loss 0.9490   LearningRate 0.0005   Epoch: 19   Global Step: 199130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:08:40,301-Speed 5463.23 samples/sec   Loss 0.9375   LearningRate 0.0005   Epoch: 19   Global Step: 199140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:08:47,747-Speed 5501.52 samples/sec   Loss 0.9401   LearningRate 0.0005   Epoch: 19   Global Step: 199150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:08:55,117-Speed 5558.26 samples/sec   Loss 0.9275   LearningRate 0.0005   Epoch: 19   Global Step: 199160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:09:02,600-Speed 5474.86 samples/sec   Loss 0.9442   LearningRate 0.0005   Epoch: 19   Global Step: 199170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:09:10,022-Speed 5519.19 samples/sec   Loss 0.9574   LearningRate 0.0005   Epoch: 19   Global Step: 199180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:09:17,617-Speed 5394.15 samples/sec   Loss 0.9323   LearningRate 0.0005   Epoch: 19   Global Step: 199190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:09:25,041-Speed 5518.12 samples/sec   Loss 0.9298   LearningRate 0.0005   Epoch: 19   Global Step: 199200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:09:32,520-Speed 5477.71 samples/sec   Loss 0.9447   LearningRate 0.0005   Epoch: 19   Global Step: 199210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:09:40,010-Speed 5469.29 samples/sec   Loss 0.9474   LearningRate 0.0005   Epoch: 19   Global Step: 199220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:09:47,489-Speed 5477.20 samples/sec   Loss 0.9421   LearningRate 0.0005   Epoch: 19   Global Step: 199230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:09:54,921-Speed 5511.93 samples/sec   Loss 0.9357   LearningRate 0.0005   Epoch: 19   Global Step: 199240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:10:02,383-Speed 5490.28 samples/sec   Loss 0.9344   LearningRate 0.0005   Epoch: 19   Global Step: 199250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:10:09,804-Speed 5520.51 samples/sec   Loss 0.9428   LearningRate 0.0005   Epoch: 19   Global Step: 199260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:10:17,278-Speed 5481.17 samples/sec   Loss 0.9364   LearningRate 0.0005   Epoch: 19   Global Step: 199270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:10:24,706-Speed 5514.69 samples/sec   Loss 0.9370   LearningRate 0.0005   Epoch: 19   Global Step: 199280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:10:32,135-Speed 5514.80 samples/sec   Loss 0.9535   LearningRate 0.0005   Epoch: 19   Global Step: 199290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:10:39,614-Speed 5477.04 samples/sec   Loss 0.9591   LearningRate 0.0005   Epoch: 19   Global Step: 199300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:10:47,132-Speed 5449.24 samples/sec   Loss 0.9560   LearningRate 0.0005   Epoch: 19   Global Step: 199310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:10:54,591-Speed 5492.11 samples/sec   Loss 0.9077   LearningRate 0.0005   Epoch: 19   Global Step: 199320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:11:02,095-Speed 5459.12 samples/sec   Loss 0.9405   LearningRate 0.0005   Epoch: 19   Global Step: 199330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:11:09,725-Speed 5369.20 samples/sec   Loss 0.9420   LearningRate 0.0005   Epoch: 19   Global Step: 199340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:11:17,192-Speed 5485.96 samples/sec   Loss 0.9288   LearningRate 0.0005   Epoch: 19   Global Step: 199350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:11:24,652-Speed 5491.65 samples/sec   Loss 0.9311   LearningRate 0.0005   Epoch: 19   Global Step: 199360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:11:32,015-Speed 5563.63 samples/sec   Loss 0.9398   LearningRate 0.0005   Epoch: 19   Global Step: 199370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:11:39,491-Speed 5480.02 samples/sec   Loss 0.9268   LearningRate 0.0005   Epoch: 19   Global Step: 199380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:11:47,110-Speed 5376.25 samples/sec   Loss 0.9491   LearningRate 0.0005   Epoch: 19   Global Step: 199390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:11:54,574-Speed 5488.62 samples/sec   Loss 0.9297   LearningRate 0.0005   Epoch: 19   Global Step: 199400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:12:01,972-Speed 5537.74 samples/sec   Loss 0.9374   LearningRate 0.0005   Epoch: 19   Global Step: 199410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:12:09,409-Speed 5508.31 samples/sec   Loss 0.9239   LearningRate 0.0005   Epoch: 19   Global Step: 199420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:12:16,825-Speed 5523.93 samples/sec   Loss 0.9366   LearningRate 0.0005   Epoch: 19   Global Step: 199430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:12:24,425-Speed 5390.06 samples/sec   Loss 0.9204   LearningRate 0.0005   Epoch: 19   Global Step: 199440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:12:31,820-Speed 5539.47 samples/sec   Loss 0.9524   LearningRate 0.0005   Epoch: 19   Global Step: 199450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:12:39,243-Speed 5519.54 samples/sec   Loss 0.9404   LearningRate 0.0005   Epoch: 19   Global Step: 199460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:12:46,716-Speed 5481.80 samples/sec   Loss 0.9428   LearningRate 0.0005   Epoch: 19   Global Step: 199470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:12:54,252-Speed 5435.94 samples/sec   Loss 0.9379   LearningRate 0.0005   Epoch: 19   Global Step: 199480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:13:01,640-Speed 5544.81 samples/sec   Loss 0.9373   LearningRate 0.0005   Epoch: 19   Global Step: 199490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:13:09,086-Speed 5501.65 samples/sec   Loss 0.9126   LearningRate 0.0005   Epoch: 19   Global Step: 199500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:13:16,537-Speed 5497.74 samples/sec   Loss 0.9044   LearningRate 0.0005   Epoch: 19   Global Step: 199510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:13:23,920-Speed 5548.72 samples/sec   Loss 0.9460   LearningRate 0.0005   Epoch: 19   Global Step: 199520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:13:31,396-Speed 5480.08 samples/sec   Loss 0.9313   LearningRate 0.0005   Epoch: 19   Global Step: 199530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:13:38,805-Speed 5529.17 samples/sec   Loss 0.9352   LearningRate 0.0005   Epoch: 19   Global Step: 199540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:13:46,274-Speed 5484.29 samples/sec   Loss 0.9330   LearningRate 0.0005   Epoch: 19   Global Step: 199550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:13:53,742-Speed 5485.31 samples/sec   Loss 0.9328   LearningRate 0.0005   Epoch: 19   Global Step: 199560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:14:01,184-Speed 5504.97 samples/sec   Loss 0.9180   LearningRate 0.0005   Epoch: 19   Global Step: 199570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:14:08,543-Speed 5566.75 samples/sec   Loss 0.9138   LearningRate 0.0005   Epoch: 19   Global Step: 199580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:14:16,061-Speed 5448.70 samples/sec   Loss 0.9245   LearningRate 0.0005   Epoch: 19   Global Step: 199590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:14:23,827-Speed 5275.22 samples/sec   Loss 0.9385   LearningRate 0.0005   Epoch: 19   Global Step: 199600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:14:31,255-Speed 5515.56 samples/sec   Loss 0.9413   LearningRate 0.0005   Epoch: 19   Global Step: 199610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:14:38,679-Speed 5517.57 samples/sec   Loss 0.9301   LearningRate 0.0005   Epoch: 19   Global Step: 199620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:14:46,152-Speed 5481.43 samples/sec   Loss 0.9436   LearningRate 0.0005   Epoch: 19   Global Step: 199630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:14:53,623-Speed 5483.67 samples/sec   Loss 0.9281   LearningRate 0.0005   Epoch: 19   Global Step: 199640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:15:01,076-Speed 5496.35 samples/sec   Loss 0.9406   LearningRate 0.0005   Epoch: 19   Global Step: 199650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:15:08,479-Speed 5533.70 samples/sec   Loss 0.9325   LearningRate 0.0005   Epoch: 19   Global Step: 199660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:15:15,931-Speed 5497.39 samples/sec   Loss 0.9365   LearningRate 0.0005   Epoch: 19   Global Step: 199670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:15:23,387-Speed 5494.54 samples/sec   Loss 0.9218   LearningRate 0.0005   Epoch: 19   Global Step: 199680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:15:30,857-Speed 5483.49 samples/sec   Loss 0.9396   LearningRate 0.0005   Epoch: 19   Global Step: 199690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:15:38,320-Speed 5489.68 samples/sec   Loss 0.9309   LearningRate 0.0005   Epoch: 19   Global Step: 199700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:15:45,827-Speed 5456.57 samples/sec   Loss 0.9250   LearningRate 0.0005   Epoch: 19   Global Step: 199710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:15:53,366-Speed 5434.01 samples/sec   Loss 0.9122   LearningRate 0.0005   Epoch: 19   Global Step: 199720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:16:00,870-Speed 5459.69 samples/sec   Loss 0.9468   LearningRate 0.0005   Epoch: 19   Global Step: 199730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:16:08,437-Speed 5413.18 samples/sec   Loss 0.9433   LearningRate 0.0005   Epoch: 19   Global Step: 199740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:16:15,931-Speed 5467.08 samples/sec   Loss 0.9044   LearningRate 0.0004   Epoch: 19   Global Step: 199750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:16:23,407-Speed 5479.35 samples/sec   Loss 0.9236   LearningRate 0.0004   Epoch: 19   Global Step: 199760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:16:30,818-Speed 5527.39 samples/sec   Loss 0.9155   LearningRate 0.0004   Epoch: 19   Global Step: 199770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:16:38,242-Speed 5518.12 samples/sec   Loss 0.9232   LearningRate 0.0004   Epoch: 19   Global Step: 199780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:16:45,699-Speed 5493.90 samples/sec   Loss 0.9180   LearningRate 0.0004   Epoch: 19   Global Step: 199790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:16:53,143-Speed 5502.65 samples/sec   Loss 0.9203   LearningRate 0.0004   Epoch: 19   Global Step: 199800   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:17:00,631-Speed 5471.07 samples/sec   Loss 0.9229   LearningRate 0.0004   Epoch: 19   Global Step: 199810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:17:08,071-Speed 5506.74 samples/sec   Loss 0.9097   LearningRate 0.0004   Epoch: 19   Global Step: 199820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:17:15,493-Speed 5519.02 samples/sec   Loss 0.9360   LearningRate 0.0004   Epoch: 19   Global Step: 199830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:17:23,004-Speed 5454.40 samples/sec   Loss 0.9278   LearningRate 0.0004   Epoch: 19   Global Step: 199840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:17:30,472-Speed 5485.53 samples/sec   Loss 0.9279   LearningRate 0.0004   Epoch: 19   Global Step: 199850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:17:38,052-Speed 5404.32 samples/sec   Loss 0.9265   LearningRate 0.0004   Epoch: 19   Global Step: 199860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:17:45,482-Speed 5513.74 samples/sec   Loss 0.9211   LearningRate 0.0004   Epoch: 19   Global Step: 199870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:17:53,080-Speed 5391.21 samples/sec   Loss 0.9478   LearningRate 0.0004   Epoch: 19   Global Step: 199880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:18:00,620-Speed 5433.43 samples/sec   Loss 0.9450   LearningRate 0.0004   Epoch: 19   Global Step: 199890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:18:08,017-Speed 5538.42 samples/sec   Loss 0.9497   LearningRate 0.0004   Epoch: 19   Global Step: 199900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:18:15,501-Speed 5473.53 samples/sec   Loss 0.9248   LearningRate 0.0004   Epoch: 19   Global Step: 199910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:18:22,943-Speed 5504.26 samples/sec   Loss 0.9319   LearningRate 0.0004   Epoch: 19   Global Step: 199920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:18:30,419-Speed 5480.18 samples/sec   Loss 0.9325   LearningRate 0.0004   Epoch: 19   Global Step: 199930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:18:37,867-Speed 5500.59 samples/sec   Loss 0.9268   LearningRate 0.0004   Epoch: 19   Global Step: 199940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:18:45,338-Speed 5482.41 samples/sec   Loss 0.9379   LearningRate 0.0004   Epoch: 19   Global Step: 199950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:18:52,825-Speed 5471.87 samples/sec   Loss 0.9373   LearningRate 0.0004   Epoch: 19   Global Step: 199960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:19:00,276-Speed 5497.86 samples/sec   Loss 0.9092   LearningRate 0.0004   Epoch: 19   Global Step: 199970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:19:07,701-Speed 5517.38 samples/sec   Loss 0.9265   LearningRate 0.0004   Epoch: 19   Global Step: 199980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:19:15,148-Speed 5500.87 samples/sec   Loss 0.9145   LearningRate 0.0004   Epoch: 19   Global Step: 199990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:19:22,613-Speed 5487.77 samples/sec   Loss 0.9165   LearningRate 0.0004   Epoch: 19   Global Step: 200000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:20:07,022-[lfw][200000]XNorm: 22.299258
Training: 2022-01-09 16:20:07,023-[lfw][200000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 16:20:07,023-[lfw][200000]Accuracy-Highest: 0.99850
Training: 2022-01-09 16:20:58,656-[cfp_fp][200000]XNorm: 22.083487
Training: 2022-01-09 16:20:58,657-[cfp_fp][200000]Accuracy-Flip: 0.99371+-0.00368
Training: 2022-01-09 16:20:58,657-[cfp_fp][200000]Accuracy-Highest: 0.99443
Training: 2022-01-09 16:21:43,027-[agedb_30][200000]XNorm: 23.061013
Training: 2022-01-09 16:21:43,028-[agedb_30][200000]Accuracy-Flip: 0.98667+-0.00532
Training: 2022-01-09 16:21:43,028-[agedb_30][200000]Accuracy-Highest: 0.98667
Training: 2022-01-09 16:21:50,606-Speed 276.77 samples/sec   Loss 0.9496   LearningRate 0.0004   Epoch: 19   Global Step: 200010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:21:58,044-Speed 5507.20 samples/sec   Loss 0.9432   LearningRate 0.0004   Epoch: 19   Global Step: 200020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:22:05,594-Speed 5425.94 samples/sec   Loss 0.9422   LearningRate 0.0004   Epoch: 19   Global Step: 200030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:22:13,214-Speed 5376.19 samples/sec   Loss 0.9192   LearningRate 0.0004   Epoch: 19   Global Step: 200040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:22:20,671-Speed 5493.93 samples/sec   Loss 0.9508   LearningRate 0.0004   Epoch: 19   Global Step: 200050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:22:28,103-Speed 5511.44 samples/sec   Loss 0.9342   LearningRate 0.0004   Epoch: 19   Global Step: 200060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:22:35,602-Speed 5463.11 samples/sec   Loss 0.9253   LearningRate 0.0004   Epoch: 19   Global Step: 200070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:22:43,064-Speed 5489.77 samples/sec   Loss 0.9119   LearningRate 0.0004   Epoch: 19   Global Step: 200080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:22:50,493-Speed 5514.83 samples/sec   Loss 0.9327   LearningRate 0.0004   Epoch: 19   Global Step: 200090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:22:57,933-Speed 5505.65 samples/sec   Loss 0.9334   LearningRate 0.0004   Epoch: 19   Global Step: 200100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:23:05,417-Speed 5473.82 samples/sec   Loss 0.9407   LearningRate 0.0004   Epoch: 19   Global Step: 200110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:23:12,877-Speed 5491.31 samples/sec   Loss 0.9252   LearningRate 0.0004   Epoch: 19   Global Step: 200120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:23:20,317-Speed 5506.20 samples/sec   Loss 0.9240   LearningRate 0.0004   Epoch: 19   Global Step: 200130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:23:27,734-Speed 5523.48 samples/sec   Loss 0.9288   LearningRate 0.0004   Epoch: 19   Global Step: 200140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:23:35,181-Speed 5500.63 samples/sec   Loss 0.9217   LearningRate 0.0004   Epoch: 19   Global Step: 200150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:23:42,587-Speed 5531.49 samples/sec   Loss 0.9379   LearningRate 0.0004   Epoch: 19   Global Step: 200160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:23:50,084-Speed 5464.59 samples/sec   Loss 0.9276   LearningRate 0.0004   Epoch: 19   Global Step: 200170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:23:57,525-Speed 5504.97 samples/sec   Loss 0.9294   LearningRate 0.0004   Epoch: 19   Global Step: 200180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:24:04,979-Speed 5496.06 samples/sec   Loss 0.9141   LearningRate 0.0004   Epoch: 19   Global Step: 200190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:24:12,386-Speed 5530.47 samples/sec   Loss 0.9211   LearningRate 0.0004   Epoch: 19   Global Step: 200200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:24:19,825-Speed 5506.75 samples/sec   Loss 0.9143   LearningRate 0.0004   Epoch: 19   Global Step: 200210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:24:27,263-Speed 5507.63 samples/sec   Loss 0.9133   LearningRate 0.0004   Epoch: 19   Global Step: 200220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:24:34,675-Speed 5527.07 samples/sec   Loss 0.9434   LearningRate 0.0004   Epoch: 19   Global Step: 200230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:24:42,087-Speed 5527.14 samples/sec   Loss 0.9051   LearningRate 0.0004   Epoch: 19   Global Step: 200240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:24:49,523-Speed 5509.23 samples/sec   Loss 0.9193   LearningRate 0.0004   Epoch: 19   Global Step: 200250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:24:56,946-Speed 5518.15 samples/sec   Loss 0.9349   LearningRate 0.0004   Epoch: 19   Global Step: 200260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 16:25:04,396-Speed 5498.32 samples/sec   Loss 0.9243   LearningRate 0.0004   Epoch: 19   Global Step: 200270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:25:11,845-Speed 5500.43 samples/sec   Loss 0.9162   LearningRate 0.0004   Epoch: 19   Global Step: 200280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:25:19,268-Speed 5518.53 samples/sec   Loss 0.9437   LearningRate 0.0004   Epoch: 19   Global Step: 200290   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:25:26,709-Speed 5505.22 samples/sec   Loss 0.9110   LearningRate 0.0004   Epoch: 19   Global Step: 200300   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:25:34,135-Speed 5516.02 samples/sec   Loss 0.9172   LearningRate 0.0004   Epoch: 19   Global Step: 200310   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:25:41,636-Speed 5461.55 samples/sec   Loss 0.9282   LearningRate 0.0004   Epoch: 19   Global Step: 200320   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:25:49,208-Speed 5410.53 samples/sec   Loss 0.8999   LearningRate 0.0004   Epoch: 19   Global Step: 200330   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:25:56,922-Speed 5310.70 samples/sec   Loss 0.9370   LearningRate 0.0004   Epoch: 19   Global Step: 200340   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:26:04,533-Speed 5382.26 samples/sec   Loss 0.9271   LearningRate 0.0004   Epoch: 19   Global Step: 200350   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:26:12,004-Speed 5483.80 samples/sec   Loss 0.9074   LearningRate 0.0004   Epoch: 19   Global Step: 200360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:26:19,377-Speed 5555.87 samples/sec   Loss 0.9257   LearningRate 0.0004   Epoch: 19   Global Step: 200370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:26:26,811-Speed 5510.31 samples/sec   Loss 0.9395   LearningRate 0.0004   Epoch: 19   Global Step: 200380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:26:34,311-Speed 5461.74 samples/sec   Loss 0.9182   LearningRate 0.0004   Epoch: 19   Global Step: 200390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:26:41,744-Speed 5512.33 samples/sec   Loss 0.9161   LearningRate 0.0004   Epoch: 19   Global Step: 200400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:26:49,211-Speed 5486.34 samples/sec   Loss 0.9243   LearningRate 0.0004   Epoch: 19   Global Step: 200410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:26:56,682-Speed 5483.44 samples/sec   Loss 0.9287   LearningRate 0.0004   Epoch: 19   Global Step: 200420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:27:04,069-Speed 5545.61 samples/sec   Loss 0.9244   LearningRate 0.0004   Epoch: 19   Global Step: 200430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:27:11,421-Speed 5571.54 samples/sec   Loss 0.9243   LearningRate 0.0004   Epoch: 19   Global Step: 200440   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:27:18,756-Speed 5585.29 samples/sec   Loss 0.9055   LearningRate 0.0004   Epoch: 19   Global Step: 200450   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:27:26,134-Speed 5558.28 samples/sec   Loss 0.9101   LearningRate 0.0004   Epoch: 19   Global Step: 200460   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:27:34,150-Speed 5539.81 samples/sec   Loss 0.9176   LearningRate 0.0004   Epoch: 19   Global Step: 200470   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:27:41,618-Speed 5484.90 samples/sec   Loss 0.9035   LearningRate 0.0004   Epoch: 19   Global Step: 200480   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:27:48,997-Speed 5552.55 samples/sec   Loss 0.9286   LearningRate 0.0004   Epoch: 19   Global Step: 200490   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:27:56,426-Speed 5514.20 samples/sec   Loss 0.9095   LearningRate 0.0004   Epoch: 19   Global Step: 200500   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:28:03,978-Speed 5424.54 samples/sec   Loss 0.9075   LearningRate 0.0004   Epoch: 19   Global Step: 200510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:28:11,426-Speed 5499.78 samples/sec   Loss 0.8897   LearningRate 0.0004   Epoch: 19   Global Step: 200520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:28:18,769-Speed 5579.24 samples/sec   Loss 0.9174   LearningRate 0.0004   Epoch: 19   Global Step: 200530   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:28:26,170-Speed 5535.02 samples/sec   Loss 0.9129   LearningRate 0.0004   Epoch: 19   Global Step: 200540   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:28:33,620-Speed 5499.16 samples/sec   Loss 0.8802   LearningRate 0.0004   Epoch: 19   Global Step: 200550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:28:41,059-Speed 5506.31 samples/sec   Loss 0.9201   LearningRate 0.0004   Epoch: 19   Global Step: 200560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:28:48,482-Speed 5518.69 samples/sec   Loss 0.9185   LearningRate 0.0004   Epoch: 19   Global Step: 200570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:28:55,912-Speed 5513.78 samples/sec   Loss 0.9333   LearningRate 0.0004   Epoch: 19   Global Step: 200580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:29:03,401-Speed 5470.11 samples/sec   Loss 0.9259   LearningRate 0.0004   Epoch: 19   Global Step: 200590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:29:10,879-Speed 5478.10 samples/sec   Loss 0.9220   LearningRate 0.0004   Epoch: 19   Global Step: 200600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:29:18,384-Speed 5458.30 samples/sec   Loss 0.8901   LearningRate 0.0004   Epoch: 19   Global Step: 200610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:29:25,776-Speed 5542.49 samples/sec   Loss 0.9234   LearningRate 0.0004   Epoch: 19   Global Step: 200620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:29:33,318-Speed 5431.67 samples/sec   Loss 0.9392   LearningRate 0.0004   Epoch: 19   Global Step: 200630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:29:40,768-Speed 5498.82 samples/sec   Loss 0.9357   LearningRate 0.0004   Epoch: 19   Global Step: 200640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:29:48,171-Speed 5533.49 samples/sec   Loss 0.9358   LearningRate 0.0004   Epoch: 19   Global Step: 200650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:29:55,591-Speed 5521.05 samples/sec   Loss 0.9315   LearningRate 0.0003   Epoch: 19   Global Step: 200660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:30:03,028-Speed 5508.02 samples/sec   Loss 0.9136   LearningRate 0.0003   Epoch: 19   Global Step: 200670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:30:10,452-Speed 5517.70 samples/sec   Loss 0.9223   LearningRate 0.0003   Epoch: 19   Global Step: 200680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:30:17,949-Speed 5464.27 samples/sec   Loss 0.9200   LearningRate 0.0003   Epoch: 19   Global Step: 200690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:30:25,371-Speed 5520.19 samples/sec   Loss 0.9299   LearningRate 0.0003   Epoch: 19   Global Step: 200700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:30:32,810-Speed 5506.34 samples/sec   Loss 0.8991   LearningRate 0.0003   Epoch: 19   Global Step: 200710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:30:40,254-Speed 5503.38 samples/sec   Loss 0.9170   LearningRate 0.0003   Epoch: 19   Global Step: 200720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:30:47,665-Speed 5527.71 samples/sec   Loss 0.9260   LearningRate 0.0003   Epoch: 19   Global Step: 200730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:30:55,210-Speed 5429.38 samples/sec   Loss 0.9055   LearningRate 0.0003   Epoch: 19   Global Step: 200740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:31:02,597-Speed 5546.00 samples/sec   Loss 0.8996   LearningRate 0.0003   Epoch: 19   Global Step: 200750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:31:10,138-Speed 5432.15 samples/sec   Loss 0.9200   LearningRate 0.0003   Epoch: 19   Global Step: 200760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:31:17,639-Speed 5461.65 samples/sec   Loss 0.8991   LearningRate 0.0003   Epoch: 19   Global Step: 200770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:31:25,135-Speed 5465.34 samples/sec   Loss 0.9140   LearningRate 0.0003   Epoch: 19   Global Step: 200780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:31:32,660-Speed 5443.88 samples/sec   Loss 0.9340   LearningRate 0.0003   Epoch: 19   Global Step: 200790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:31:40,119-Speed 5491.95 samples/sec   Loss 0.9262   LearningRate 0.0003   Epoch: 19   Global Step: 200800   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:31:47,561-Speed 5504.22 samples/sec   Loss 0.9006   LearningRate 0.0003   Epoch: 19   Global Step: 200810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 16:31:54,964-Speed 5534.29 samples/sec   Loss 0.9104   LearningRate 0.0003   Epoch: 19   Global Step: 200820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:32:02,498-Speed 5436.90 samples/sec   Loss 0.9030   LearningRate 0.0003   Epoch: 19   Global Step: 200830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 16:32:09,898-Speed 5536.36 samples/sec   Loss 0.9057   LearningRate 0.0003   Epoch: 19   Global Step: 200840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:32:17,440-Speed 5431.55 samples/sec   Loss 0.9248   LearningRate 0.0003   Epoch: 19   Global Step: 200850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:32:24,848-Speed 5529.90 samples/sec   Loss 0.9236   LearningRate 0.0003   Epoch: 19   Global Step: 200860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:32:32,269-Speed 5520.44 samples/sec   Loss 0.9158   LearningRate 0.0003   Epoch: 19   Global Step: 200870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:32:39,740-Speed 5485.06 samples/sec   Loss 0.9073   LearningRate 0.0003   Epoch: 19   Global Step: 200880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:32:47,182-Speed 5503.73 samples/sec   Loss 0.9217   LearningRate 0.0003   Epoch: 19   Global Step: 200890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:32:54,589-Speed 5530.91 samples/sec   Loss 0.9066   LearningRate 0.0003   Epoch: 19   Global Step: 200900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:33:02,011-Speed 5520.22 samples/sec   Loss 0.9234   LearningRate 0.0003   Epoch: 19   Global Step: 200910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:33:09,409-Speed 5537.53 samples/sec   Loss 0.9254   LearningRate 0.0003   Epoch: 19   Global Step: 200920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 16:33:16,841-Speed 5511.33 samples/sec   Loss 0.9094   LearningRate 0.0003   Epoch: 19   Global Step: 200930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:33:24,255-Speed 5525.36 samples/sec   Loss 0.9303   LearningRate 0.0003   Epoch: 19   Global Step: 200940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:33:31,608-Speed 5571.47 samples/sec   Loss 0.9155   LearningRate 0.0003   Epoch: 19   Global Step: 200950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:33:39,061-Speed 5496.82 samples/sec   Loss 0.9139   LearningRate 0.0003   Epoch: 19   Global Step: 200960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:33:46,599-Speed 5434.44 samples/sec   Loss 0.9262   LearningRate 0.0003   Epoch: 19   Global Step: 200970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:33:54,111-Speed 5452.98 samples/sec   Loss 0.9081   LearningRate 0.0003   Epoch: 19   Global Step: 200980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:34:01,702-Speed 5396.75 samples/sec   Loss 0.9147   LearningRate 0.0003   Epoch: 19   Global Step: 200990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:34:09,237-Speed 5437.18 samples/sec   Loss 0.9238   LearningRate 0.0003   Epoch: 19   Global Step: 201000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:34:16,668-Speed 5512.45 samples/sec   Loss 0.9097   LearningRate 0.0003   Epoch: 19   Global Step: 201010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:34:24,113-Speed 5502.18 samples/sec   Loss 0.9043   LearningRate 0.0003   Epoch: 19   Global Step: 201020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:34:31,805-Speed 5326.20 samples/sec   Loss 0.9287   LearningRate 0.0003   Epoch: 19   Global Step: 201030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:34:39,213-Speed 5530.10 samples/sec   Loss 0.9047   LearningRate 0.0003   Epoch: 19   Global Step: 201040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:34:46,699-Speed 5472.29 samples/sec   Loss 0.9148   LearningRate 0.0003   Epoch: 19   Global Step: 201050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:34:54,138-Speed 5506.59 samples/sec   Loss 0.9301   LearningRate 0.0003   Epoch: 19   Global Step: 201060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:35:01,574-Speed 5508.97 samples/sec   Loss 0.9134   LearningRate 0.0003   Epoch: 19   Global Step: 201070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:35:08,987-Speed 5526.34 samples/sec   Loss 0.9088   LearningRate 0.0003   Epoch: 19   Global Step: 201080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:35:16,407-Speed 5520.87 samples/sec   Loss 0.8951   LearningRate 0.0003   Epoch: 19   Global Step: 201090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:35:23,854-Speed 5501.33 samples/sec   Loss 0.8981   LearningRate 0.0003   Epoch: 19   Global Step: 201100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:35:31,269-Speed 5524.68 samples/sec   Loss 0.9124   LearningRate 0.0003   Epoch: 19   Global Step: 201110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:35:38,890-Speed 5375.09 samples/sec   Loss 0.8982   LearningRate 0.0003   Epoch: 19   Global Step: 201120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:35:46,352-Speed 5490.34 samples/sec   Loss 0.8800   LearningRate 0.0003   Epoch: 19   Global Step: 201130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:35:53,826-Speed 5480.84 samples/sec   Loss 0.9298   LearningRate 0.0003   Epoch: 19   Global Step: 201140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:36:01,245-Speed 5522.05 samples/sec   Loss 0.8974   LearningRate 0.0003   Epoch: 19   Global Step: 201150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:36:08,679-Speed 5510.38 samples/sec   Loss 0.8980   LearningRate 0.0003   Epoch: 19   Global Step: 201160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:36:16,168-Speed 5469.70 samples/sec   Loss 0.9221   LearningRate 0.0003   Epoch: 19   Global Step: 201170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:36:23,595-Speed 5516.00 samples/sec   Loss 0.9273   LearningRate 0.0003   Epoch: 19   Global Step: 201180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:36:31,011-Speed 5524.17 samples/sec   Loss 0.9106   LearningRate 0.0003   Epoch: 19   Global Step: 201190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:36:38,453-Speed 5504.29 samples/sec   Loss 0.8898   LearningRate 0.0003   Epoch: 19   Global Step: 201200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:36:45,897-Speed 5502.98 samples/sec   Loss 0.8934   LearningRate 0.0003   Epoch: 19   Global Step: 201210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:36:53,325-Speed 5515.11 samples/sec   Loss 0.9267   LearningRate 0.0003   Epoch: 19   Global Step: 201220   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:37:00,715-Speed 5543.19 samples/sec   Loss 0.9210   LearningRate 0.0003   Epoch: 19   Global Step: 201230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:37:08,140-Speed 5517.28 samples/sec   Loss 0.9161   LearningRate 0.0003   Epoch: 19   Global Step: 201240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:37:15,567-Speed 5515.68 samples/sec   Loss 0.9105   LearningRate 0.0003   Epoch: 19   Global Step: 201250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:37:22,957-Speed 5544.01 samples/sec   Loss 0.8953   LearningRate 0.0003   Epoch: 19   Global Step: 201260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:37:30,356-Speed 5536.32 samples/sec   Loss 0.9135   LearningRate 0.0003   Epoch: 19   Global Step: 201270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:37:37,751-Speed 5539.37 samples/sec   Loss 0.9057   LearningRate 0.0003   Epoch: 19   Global Step: 201280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:37:45,218-Speed 5486.75 samples/sec   Loss 0.9126   LearningRate 0.0003   Epoch: 19   Global Step: 201290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:37:52,706-Speed 5470.80 samples/sec   Loss 0.9053   LearningRate 0.0003   Epoch: 19   Global Step: 201300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:38:00,115-Speed 5528.78 samples/sec   Loss 0.8854   LearningRate 0.0003   Epoch: 19   Global Step: 201310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:38:07,669-Speed 5423.21 samples/sec   Loss 0.8976   LearningRate 0.0003   Epoch: 19   Global Step: 201320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:38:15,059-Speed 5543.81 samples/sec   Loss 0.9017   LearningRate 0.0003   Epoch: 19   Global Step: 201330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 16:38:22,475-Speed 5523.99 samples/sec   Loss 0.9135   LearningRate 0.0003   Epoch: 19   Global Step: 201340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:38:29,953-Speed 5477.72 samples/sec   Loss 0.9099   LearningRate 0.0003   Epoch: 19   Global Step: 201350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:38:37,337-Speed 5547.93 samples/sec   Loss 0.9211   LearningRate 0.0003   Epoch: 19   Global Step: 201360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:38:44,746-Speed 5529.36 samples/sec   Loss 0.9261   LearningRate 0.0003   Epoch: 19   Global Step: 201370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:38:52,295-Speed 5426.53 samples/sec   Loss 0.9121   LearningRate 0.0003   Epoch: 19   Global Step: 201380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:38:59,856-Speed 5417.64 samples/sec   Loss 0.9007   LearningRate 0.0003   Epoch: 19   Global Step: 201390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:39:07,476-Speed 5376.70 samples/sec   Loss 0.9187   LearningRate 0.0003   Epoch: 19   Global Step: 201400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:39:14,859-Speed 5548.89 samples/sec   Loss 0.9021   LearningRate 0.0003   Epoch: 19   Global Step: 201410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:39:22,311-Speed 5497.38 samples/sec   Loss 0.9063   LearningRate 0.0003   Epoch: 19   Global Step: 201420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:39:29,695-Speed 5547.01 samples/sec   Loss 0.8990   LearningRate 0.0003   Epoch: 19   Global Step: 201430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:39:37,149-Speed 5496.18 samples/sec   Loss 0.9209   LearningRate 0.0003   Epoch: 19   Global Step: 201440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 16:39:44,667-Speed 5448.70 samples/sec   Loss 0.9054   LearningRate 0.0003   Epoch: 19   Global Step: 201450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 16:39:52,269-Speed 5389.18 samples/sec   Loss 0.8958   LearningRate 0.0003   Epoch: 19   Global Step: 201460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:39:59,685-Speed 5524.12 samples/sec   Loss 0.8994   LearningRate 0.0003   Epoch: 19   Global Step: 201470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:40:07,127-Speed 5504.93 samples/sec   Loss 0.9141   LearningRate 0.0003   Epoch: 19   Global Step: 201480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:40:14,522-Speed 5539.39 samples/sec   Loss 0.9104   LearningRate 0.0003   Epoch: 19   Global Step: 201490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:40:21,930-Speed 5529.66 samples/sec   Loss 0.9220   LearningRate 0.0003   Epoch: 19   Global Step: 201500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:40:29,442-Speed 5453.69 samples/sec   Loss 0.8927   LearningRate 0.0003   Epoch: 19   Global Step: 201510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:40:37,044-Speed 5388.58 samples/sec   Loss 0.9163   LearningRate 0.0003   Epoch: 19   Global Step: 201520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:40:44,529-Speed 5473.36 samples/sec   Loss 0.9127   LearningRate 0.0003   Epoch: 19   Global Step: 201530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:40:52,005-Speed 5479.56 samples/sec   Loss 0.8859   LearningRate 0.0003   Epoch: 19   Global Step: 201540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:40:59,461-Speed 5494.03 samples/sec   Loss 0.9109   LearningRate 0.0003   Epoch: 19   Global Step: 201550   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:41:06,875-Speed 5525.08 samples/sec   Loss 0.9207   LearningRate 0.0003   Epoch: 19   Global Step: 201560   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:41:14,463-Speed 5399.35 samples/sec   Loss 0.9054   LearningRate 0.0003   Epoch: 19   Global Step: 201570   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:41:21,903-Speed 5506.18 samples/sec   Loss 0.9212   LearningRate 0.0003   Epoch: 19   Global Step: 201580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:41:29,327-Speed 5517.82 samples/sec   Loss 0.9098   LearningRate 0.0003   Epoch: 19   Global Step: 201590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:41:36,747-Speed 5520.43 samples/sec   Loss 0.9396   LearningRate 0.0003   Epoch: 19   Global Step: 201600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:41:44,156-Speed 5530.08 samples/sec   Loss 0.9127   LearningRate 0.0003   Epoch: 19   Global Step: 201610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:41:51,548-Speed 5542.13 samples/sec   Loss 0.8931   LearningRate 0.0003   Epoch: 19   Global Step: 201620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:41:59,015-Speed 5485.67 samples/sec   Loss 0.9013   LearningRate 0.0003   Epoch: 19   Global Step: 201630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:42:06,449-Speed 5510.24 samples/sec   Loss 0.9051   LearningRate 0.0003   Epoch: 19   Global Step: 201640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:42:13,861-Speed 5527.76 samples/sec   Loss 0.8747   LearningRate 0.0003   Epoch: 19   Global Step: 201650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:42:21,213-Speed 5572.18 samples/sec   Loss 0.9182   LearningRate 0.0003   Epoch: 19   Global Step: 201660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:42:28,635-Speed 5519.23 samples/sec   Loss 0.8719   LearningRate 0.0003   Epoch: 19   Global Step: 201670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:42:36,141-Speed 5457.14 samples/sec   Loss 0.9129   LearningRate 0.0003   Epoch: 19   Global Step: 201680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:42:43,554-Speed 5526.82 samples/sec   Loss 0.8923   LearningRate 0.0003   Epoch: 19   Global Step: 201690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:42:51,081-Speed 5441.98 samples/sec   Loss 0.8864   LearningRate 0.0002   Epoch: 19   Global Step: 201700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:42:58,499-Speed 5522.88 samples/sec   Loss 0.9199   LearningRate 0.0002   Epoch: 19   Global Step: 201710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:43:05,967-Speed 5485.07 samples/sec   Loss 0.9034   LearningRate 0.0002   Epoch: 19   Global Step: 201720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:43:13,369-Speed 5534.21 samples/sec   Loss 0.8845   LearningRate 0.0002   Epoch: 19   Global Step: 201730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:43:20,778-Speed 5529.14 samples/sec   Loss 0.8990   LearningRate 0.0002   Epoch: 19   Global Step: 201740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:43:28,212-Speed 5511.03 samples/sec   Loss 0.8993   LearningRate 0.0002   Epoch: 19   Global Step: 201750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:43:35,640-Speed 5515.11 samples/sec   Loss 0.8875   LearningRate 0.0002   Epoch: 19   Global Step: 201760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:43:43,094-Speed 5495.01 samples/sec   Loss 0.9146   LearningRate 0.0002   Epoch: 19   Global Step: 201770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:43:50,530-Speed 5509.92 samples/sec   Loss 0.8958   LearningRate 0.0002   Epoch: 19   Global Step: 201780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:43:57,965-Speed 5509.87 samples/sec   Loss 0.9046   LearningRate 0.0002   Epoch: 19   Global Step: 201790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:44:05,401-Speed 5508.82 samples/sec   Loss 0.8971   LearningRate 0.0002   Epoch: 19   Global Step: 201800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:44:12,813-Speed 5526.67 samples/sec   Loss 0.8863   LearningRate 0.0002   Epoch: 19   Global Step: 201810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:44:20,228-Speed 5525.09 samples/sec   Loss 0.9103   LearningRate 0.0002   Epoch: 19   Global Step: 201820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:44:27,782-Speed 5422.96 samples/sec   Loss 0.8901   LearningRate 0.0002   Epoch: 19   Global Step: 201830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:44:35,228-Speed 5501.39 samples/sec   Loss 0.9152   LearningRate 0.0002   Epoch: 19   Global Step: 201840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:44:42,756-Speed 5441.87 samples/sec   Loss 0.9163   LearningRate 0.0002   Epoch: 19   Global Step: 201850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 16:44:50,167-Speed 5527.79 samples/sec   Loss 0.9153   LearningRate 0.0002   Epoch: 19   Global Step: 201860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 16:44:57,704-Speed 5434.99 samples/sec   Loss 0.9120   LearningRate 0.0002   Epoch: 19   Global Step: 201870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 16:45:05,183-Speed 5477.53 samples/sec   Loss 0.9054   LearningRate 0.0002   Epoch: 19   Global Step: 201880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:45:12,667-Speed 5473.95 samples/sec   Loss 0.9076   LearningRate 0.0002   Epoch: 19   Global Step: 201890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:45:20,212-Speed 5429.66 samples/sec   Loss 0.9031   LearningRate 0.0002   Epoch: 19   Global Step: 201900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:45:27,632-Speed 5520.82 samples/sec   Loss 0.8888   LearningRate 0.0002   Epoch: 19   Global Step: 201910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:45:35,094-Speed 5490.05 samples/sec   Loss 0.9078   LearningRate 0.0002   Epoch: 19   Global Step: 201920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:45:42,635-Speed 5431.72 samples/sec   Loss 0.8986   LearningRate 0.0002   Epoch: 19   Global Step: 201930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:45:50,082-Speed 5501.06 samples/sec   Loss 0.9211   LearningRate 0.0002   Epoch: 19   Global Step: 201940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:45:57,536-Speed 5496.12 samples/sec   Loss 0.8990   LearningRate 0.0002   Epoch: 19   Global Step: 201950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:46:05,021-Speed 5472.70 samples/sec   Loss 0.8986   LearningRate 0.0002   Epoch: 19   Global Step: 201960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:46:12,433-Speed 5526.94 samples/sec   Loss 0.9017   LearningRate 0.0002   Epoch: 19   Global Step: 201970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:46:19,855-Speed 5519.96 samples/sec   Loss 0.9062   LearningRate 0.0002   Epoch: 19   Global Step: 201980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:46:27,331-Speed 5479.09 samples/sec   Loss 0.9079   LearningRate 0.0002   Epoch: 19   Global Step: 201990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:46:34,873-Speed 5431.67 samples/sec   Loss 0.9042   LearningRate 0.0002   Epoch: 19   Global Step: 202000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:47:18,840-[lfw][202000]XNorm: 22.206904
Training: 2022-01-09 16:47:18,841-[lfw][202000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 16:47:18,841-[lfw][202000]Accuracy-Highest: 0.99850
Training: 2022-01-09 16:48:10,304-[cfp_fp][202000]XNorm: 22.083832
Training: 2022-01-09 16:48:10,305-[cfp_fp][202000]Accuracy-Flip: 0.99414+-0.00316
Training: 2022-01-09 16:48:10,305-[cfp_fp][202000]Accuracy-Highest: 0.99443
Training: 2022-01-09 16:48:54,221-[agedb_30][202000]XNorm: 23.016906
Training: 2022-01-09 16:48:54,222-[agedb_30][202000]Accuracy-Flip: 0.98650+-0.00545
Training: 2022-01-09 16:48:54,222-[agedb_30][202000]Accuracy-Highest: 0.98667
Training: 2022-01-09 16:49:01,733-Speed 278.91 samples/sec   Loss 0.9049   LearningRate 0.0002   Epoch: 19   Global Step: 202010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:49:09,159-Speed 5516.67 samples/sec   Loss 0.9177   LearningRate 0.0002   Epoch: 19   Global Step: 202020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:49:16,505-Speed 5576.36 samples/sec   Loss 0.8849   LearningRate 0.0002   Epoch: 19   Global Step: 202030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:49:23,930-Speed 5517.79 samples/sec   Loss 0.9119   LearningRate 0.0002   Epoch: 19   Global Step: 202040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:49:31,364-Speed 5510.51 samples/sec   Loss 0.8963   LearningRate 0.0002   Epoch: 19   Global Step: 202050   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:49:38,811-Speed 5500.64 samples/sec   Loss 0.9061   LearningRate 0.0002   Epoch: 19   Global Step: 202060   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:49:46,230-Speed 5521.17 samples/sec   Loss 0.8965   LearningRate 0.0002   Epoch: 19   Global Step: 202070   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:49:53,648-Speed 5522.96 samples/sec   Loss 0.9039   LearningRate 0.0002   Epoch: 19   Global Step: 202080   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:50:01,101-Speed 5496.71 samples/sec   Loss 0.9098   LearningRate 0.0002   Epoch: 19   Global Step: 202090   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:50:08,599-Speed 5463.10 samples/sec   Loss 0.9102   LearningRate 0.0002   Epoch: 19   Global Step: 202100   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:50:15,993-Speed 5540.32 samples/sec   Loss 0.9030   LearningRate 0.0002   Epoch: 19   Global Step: 202110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:50:23,410-Speed 5523.42 samples/sec   Loss 0.9090   LearningRate 0.0002   Epoch: 19   Global Step: 202120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:50:30,843-Speed 5511.60 samples/sec   Loss 0.8934   LearningRate 0.0002   Epoch: 19   Global Step: 202130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:50:38,236-Speed 5540.83 samples/sec   Loss 0.8964   LearningRate 0.0002   Epoch: 19   Global Step: 202140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:50:45,635-Speed 5536.84 samples/sec   Loss 0.9146   LearningRate 0.0002   Epoch: 19   Global Step: 202150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:50:53,076-Speed 5504.95 samples/sec   Loss 0.9142   LearningRate 0.0002   Epoch: 19   Global Step: 202160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:51:00,473-Speed 5538.61 samples/sec   Loss 0.8926   LearningRate 0.0002   Epoch: 19   Global Step: 202170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:51:07,970-Speed 5464.10 samples/sec   Loss 0.9141   LearningRate 0.0002   Epoch: 19   Global Step: 202180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:51:15,526-Speed 5422.04 samples/sec   Loss 0.9199   LearningRate 0.0002   Epoch: 19   Global Step: 202190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:51:22,925-Speed 5536.30 samples/sec   Loss 0.8716   LearningRate 0.0002   Epoch: 19   Global Step: 202200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:51:30,367-Speed 5504.59 samples/sec   Loss 0.9106   LearningRate 0.0002   Epoch: 19   Global Step: 202210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:51:37,802-Speed 5510.32 samples/sec   Loss 0.9126   LearningRate 0.0002   Epoch: 19   Global Step: 202220   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:51:45,176-Speed 5555.01 samples/sec   Loss 0.8959   LearningRate 0.0002   Epoch: 19   Global Step: 202230   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:51:52,589-Speed 5526.21 samples/sec   Loss 0.9003   LearningRate 0.0002   Epoch: 19   Global Step: 202240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:52:00,093-Speed 5459.44 samples/sec   Loss 0.9119   LearningRate 0.0002   Epoch: 19   Global Step: 202250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:52:07,678-Speed 5401.08 samples/sec   Loss 0.8955   LearningRate 0.0002   Epoch: 19   Global Step: 202260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:52:15,259-Speed 5403.29 samples/sec   Loss 0.8878   LearningRate 0.0002   Epoch: 19   Global Step: 202270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:52:22,834-Speed 5407.75 samples/sec   Loss 0.8871   LearningRate 0.0002   Epoch: 19   Global Step: 202280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:52:30,476-Speed 5360.83 samples/sec   Loss 0.8940   LearningRate 0.0002   Epoch: 19   Global Step: 202290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:52:38,017-Speed 5432.50 samples/sec   Loss 0.8934   LearningRate 0.0002   Epoch: 19   Global Step: 202300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:52:45,552-Speed 5436.79 samples/sec   Loss 0.8864   LearningRate 0.0002   Epoch: 19   Global Step: 202310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:52:53,027-Speed 5480.15 samples/sec   Loss 0.9013   LearningRate 0.0002   Epoch: 19   Global Step: 202320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:53:00,540-Speed 5453.30 samples/sec   Loss 0.8878   LearningRate 0.0002   Epoch: 19   Global Step: 202330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:53:07,951-Speed 5527.30 samples/sec   Loss 0.9065   LearningRate 0.0002   Epoch: 19   Global Step: 202340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:53:15,440-Speed 5470.36 samples/sec   Loss 0.9071   LearningRate 0.0002   Epoch: 19   Global Step: 202350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:53:23,005-Speed 5414.99 samples/sec   Loss 0.9082   LearningRate 0.0002   Epoch: 19   Global Step: 202360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:53:30,453-Speed 5499.97 samples/sec   Loss 0.8912   LearningRate 0.0002   Epoch: 19   Global Step: 202370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:53:37,891-Speed 5507.60 samples/sec   Loss 0.9023   LearningRate 0.0002   Epoch: 19   Global Step: 202380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:53:45,450-Speed 5419.45 samples/sec   Loss 0.8863   LearningRate 0.0002   Epoch: 19   Global Step: 202390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:53:52,904-Speed 5496.13 samples/sec   Loss 0.8953   LearningRate 0.0002   Epoch: 19   Global Step: 202400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:54:00,387-Speed 5474.15 samples/sec   Loss 0.8933   LearningRate 0.0002   Epoch: 19   Global Step: 202410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:54:07,940-Speed 5424.27 samples/sec   Loss 0.8926   LearningRate 0.0002   Epoch: 19   Global Step: 202420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:54:15,434-Speed 5465.88 samples/sec   Loss 0.8914   LearningRate 0.0002   Epoch: 19   Global Step: 202430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:54:22,908-Speed 5481.38 samples/sec   Loss 0.8977   LearningRate 0.0002   Epoch: 19   Global Step: 202440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 16:54:30,498-Speed 5396.99 samples/sec   Loss 0.8745   LearningRate 0.0002   Epoch: 19   Global Step: 202450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 16:54:37,942-Speed 5503.57 samples/sec   Loss 0.9100   LearningRate 0.0002   Epoch: 19   Global Step: 202460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 16:54:45,321-Speed 5551.34 samples/sec   Loss 0.8889   LearningRate 0.0002   Epoch: 19   Global Step: 202470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:54:52,754-Speed 5511.06 samples/sec   Loss 0.8929   LearningRate 0.0002   Epoch: 19   Global Step: 202480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:55:00,321-Speed 5413.97 samples/sec   Loss 0.9052   LearningRate 0.0002   Epoch: 19   Global Step: 202490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:55:07,775-Speed 5496.32 samples/sec   Loss 0.9008   LearningRate 0.0002   Epoch: 19   Global Step: 202500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:55:15,231-Speed 5493.98 samples/sec   Loss 0.9021   LearningRate 0.0002   Epoch: 19   Global Step: 202510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:55:22,717-Speed 5472.41 samples/sec   Loss 0.8921   LearningRate 0.0002   Epoch: 19   Global Step: 202520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:55:30,112-Speed 5540.00 samples/sec   Loss 0.8821   LearningRate 0.0002   Epoch: 19   Global Step: 202530   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:55:37,615-Speed 5460.11 samples/sec   Loss 0.8993   LearningRate 0.0002   Epoch: 19   Global Step: 202540   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:55:45,230-Speed 5379.48 samples/sec   Loss 0.9106   LearningRate 0.0002   Epoch: 19   Global Step: 202550   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:55:52,701-Speed 5482.75 samples/sec   Loss 0.8749   LearningRate 0.0002   Epoch: 19   Global Step: 202560   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:56:00,152-Speed 5497.97 samples/sec   Loss 0.9219   LearningRate 0.0002   Epoch: 19   Global Step: 202570   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:56:07,526-Speed 5555.69 samples/sec   Loss 0.8875   LearningRate 0.0002   Epoch: 19   Global Step: 202580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:56:14,967-Speed 5505.86 samples/sec   Loss 0.8842   LearningRate 0.0002   Epoch: 19   Global Step: 202590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:56:22,396-Speed 5514.06 samples/sec   Loss 0.9100   LearningRate 0.0002   Epoch: 19   Global Step: 202600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:56:29,848-Speed 5497.00 samples/sec   Loss 0.8823   LearningRate 0.0002   Epoch: 19   Global Step: 202610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:56:37,278-Speed 5513.92 samples/sec   Loss 0.8958   LearningRate 0.0002   Epoch: 19   Global Step: 202620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 16:56:44,686-Speed 5529.70 samples/sec   Loss 0.8882   LearningRate 0.0002   Epoch: 19   Global Step: 202630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:56:52,185-Speed 5463.05 samples/sec   Loss 0.9005   LearningRate 0.0002   Epoch: 19   Global Step: 202640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:56:59,641-Speed 5494.36 samples/sec   Loss 0.9154   LearningRate 0.0002   Epoch: 19   Global Step: 202650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:57:07,091-Speed 5498.38 samples/sec   Loss 0.9056   LearningRate 0.0002   Epoch: 19   Global Step: 202660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:57:14,524-Speed 5511.82 samples/sec   Loss 0.8791   LearningRate 0.0002   Epoch: 19   Global Step: 202670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:57:22,004-Speed 5476.28 samples/sec   Loss 0.8767   LearningRate 0.0002   Epoch: 19   Global Step: 202680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:57:29,455-Speed 5498.17 samples/sec   Loss 0.8890   LearningRate 0.0002   Epoch: 19   Global Step: 202690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:57:36,904-Speed 5498.95 samples/sec   Loss 0.9097   LearningRate 0.0002   Epoch: 19   Global Step: 202700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:57:44,335-Speed 5512.80 samples/sec   Loss 0.8983   LearningRate 0.0002   Epoch: 19   Global Step: 202710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:57:51,779-Speed 5503.46 samples/sec   Loss 0.8848   LearningRate 0.0002   Epoch: 19   Global Step: 202720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:57:59,211-Speed 5511.77 samples/sec   Loss 0.8799   LearningRate 0.0002   Epoch: 19   Global Step: 202730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 16:58:06,583-Speed 5557.15 samples/sec   Loss 0.9005   LearningRate 0.0002   Epoch: 19   Global Step: 202740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:58:14,059-Speed 5480.16 samples/sec   Loss 0.8926   LearningRate 0.0002   Epoch: 19   Global Step: 202750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:58:21,671-Speed 5381.37 samples/sec   Loss 0.8974   LearningRate 0.0002   Epoch: 19   Global Step: 202760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:58:29,201-Speed 5440.38 samples/sec   Loss 0.8852   LearningRate 0.0002   Epoch: 19   Global Step: 202770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:58:36,662-Speed 5490.65 samples/sec   Loss 0.9158   LearningRate 0.0002   Epoch: 19   Global Step: 202780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:58:44,050-Speed 5545.14 samples/sec   Loss 0.8871   LearningRate 0.0002   Epoch: 19   Global Step: 202790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:58:51,471-Speed 5520.09 samples/sec   Loss 0.8974   LearningRate 0.0002   Epoch: 19   Global Step: 202800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:58:58,869-Speed 5536.81 samples/sec   Loss 0.8922   LearningRate 0.0002   Epoch: 19   Global Step: 202810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:59:06,283-Speed 5525.52 samples/sec   Loss 0.8884   LearningRate 0.0002   Epoch: 19   Global Step: 202820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:59:13,666-Speed 5549.15 samples/sec   Loss 0.8842   LearningRate 0.0002   Epoch: 19   Global Step: 202830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:59:21,096-Speed 5513.83 samples/sec   Loss 0.9004   LearningRate 0.0002   Epoch: 19   Global Step: 202840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 16:59:28,505-Speed 5528.74 samples/sec   Loss 0.8940   LearningRate 0.0002   Epoch: 19   Global Step: 202850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 16:59:35,915-Speed 5528.66 samples/sec   Loss 0.8778   LearningRate 0.0002   Epoch: 19   Global Step: 202860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 16:59:43,323-Speed 5530.03 samples/sec   Loss 0.8954   LearningRate 0.0002   Epoch: 19   Global Step: 202870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:59:50,728-Speed 5531.80 samples/sec   Loss 0.8971   LearningRate 0.0002   Epoch: 19   Global Step: 202880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 16:59:58,145-Speed 5523.46 samples/sec   Loss 0.8997   LearningRate 0.0002   Epoch: 19   Global Step: 202890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:00:05,568-Speed 5518.35 samples/sec   Loss 0.8947   LearningRate 0.0002   Epoch: 19   Global Step: 202900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:00:13,042-Speed 5481.67 samples/sec   Loss 0.9007   LearningRate 0.0002   Epoch: 19   Global Step: 202910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:00:20,448-Speed 5531.26 samples/sec   Loss 0.9022   LearningRate 0.0002   Epoch: 19   Global Step: 202920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:00:27,951-Speed 5459.28 samples/sec   Loss 0.8779   LearningRate 0.0002   Epoch: 19   Global Step: 202930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:00:35,432-Speed 5476.43 samples/sec   Loss 0.9000   LearningRate 0.0002   Epoch: 19   Global Step: 202940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:00:42,818-Speed 5545.71 samples/sec   Loss 0.8900   LearningRate 0.0002   Epoch: 19   Global Step: 202950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:00:50,223-Speed 5532.20 samples/sec   Loss 0.8772   LearningRate 0.0002   Epoch: 19   Global Step: 202960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:00:57,696-Speed 5482.45 samples/sec   Loss 0.8923   LearningRate 0.0002   Epoch: 19   Global Step: 202970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 17:01:05,144-Speed 5499.64 samples/sec   Loss 0.8794   LearningRate 0.0001   Epoch: 19   Global Step: 202980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:01:12,539-Speed 5540.09 samples/sec   Loss 0.9116   LearningRate 0.0001   Epoch: 19   Global Step: 202990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:01:19,973-Speed 5510.85 samples/sec   Loss 0.9037   LearningRate 0.0001   Epoch: 19   Global Step: 203000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:01:27,482-Speed 5455.02 samples/sec   Loss 0.9107   LearningRate 0.0001   Epoch: 19   Global Step: 203010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:01:34,886-Speed 5532.57 samples/sec   Loss 0.8879   LearningRate 0.0001   Epoch: 19   Global Step: 203020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:01:42,371-Speed 5473.52 samples/sec   Loss 0.8924   LearningRate 0.0001   Epoch: 19   Global Step: 203030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:01:49,836-Speed 5487.97 samples/sec   Loss 0.9057   LearningRate 0.0001   Epoch: 19   Global Step: 203040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:01:57,337-Speed 5461.34 samples/sec   Loss 0.8850   LearningRate 0.0001   Epoch: 19   Global Step: 203050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:02:04,888-Speed 5424.62 samples/sec   Loss 0.8992   LearningRate 0.0001   Epoch: 19   Global Step: 203060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:02:12,361-Speed 5482.16 samples/sec   Loss 0.8959   LearningRate 0.0001   Epoch: 19   Global Step: 203070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:02:19,785-Speed 5518.04 samples/sec   Loss 0.8807   LearningRate 0.0001   Epoch: 19   Global Step: 203080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 17:02:27,349-Speed 5416.08 samples/sec   Loss 0.9052   LearningRate 0.0001   Epoch: 19   Global Step: 203090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:02:34,788-Speed 5506.53 samples/sec   Loss 0.8939   LearningRate 0.0001   Epoch: 19   Global Step: 203100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:02:42,261-Speed 5481.84 samples/sec   Loss 0.8883   LearningRate 0.0001   Epoch: 19   Global Step: 203110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:02:49,772-Speed 5454.11 samples/sec   Loss 0.9016   LearningRate 0.0001   Epoch: 19   Global Step: 203120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:02:57,288-Speed 5450.44 samples/sec   Loss 0.8749   LearningRate 0.0001   Epoch: 19   Global Step: 203130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:03:04,742-Speed 5495.58 samples/sec   Loss 0.8985   LearningRate 0.0001   Epoch: 19   Global Step: 203140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:03:12,207-Speed 5488.29 samples/sec   Loss 0.8773   LearningRate 0.0001   Epoch: 19   Global Step: 203150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:03:19,706-Speed 5462.99 samples/sec   Loss 0.8911   LearningRate 0.0001   Epoch: 19   Global Step: 203160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:03:27,099-Speed 5540.91 samples/sec   Loss 0.9058   LearningRate 0.0001   Epoch: 19   Global Step: 203170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:03:34,639-Speed 5433.15 samples/sec   Loss 0.8898   LearningRate 0.0001   Epoch: 19   Global Step: 203180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:03:42,118-Speed 5477.31 samples/sec   Loss 0.8804   LearningRate 0.0001   Epoch: 19   Global Step: 203190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:03:49,525-Speed 5530.86 samples/sec   Loss 0.8778   LearningRate 0.0001   Epoch: 19   Global Step: 203200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:03:57,061-Speed 5436.18 samples/sec   Loss 0.8922   LearningRate 0.0001   Epoch: 19   Global Step: 203210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:04:04,538-Speed 5478.48 samples/sec   Loss 0.8911   LearningRate 0.0001   Epoch: 19   Global Step: 203220   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:04:11,982-Speed 5503.18 samples/sec   Loss 0.8962   LearningRate 0.0001   Epoch: 19   Global Step: 203230   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:04:19,468-Speed 5472.44 samples/sec   Loss 0.8961   LearningRate 0.0001   Epoch: 19   Global Step: 203240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:04:26,917-Speed 5499.57 samples/sec   Loss 0.8909   LearningRate 0.0001   Epoch: 19   Global Step: 203250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:04:34,390-Speed 5481.46 samples/sec   Loss 0.8945   LearningRate 0.0001   Epoch: 19   Global Step: 203260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:04:41,824-Speed 5510.32 samples/sec   Loss 0.9078   LearningRate 0.0001   Epoch: 19   Global Step: 203270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:04:49,313-Speed 5470.38 samples/sec   Loss 0.9100   LearningRate 0.0001   Epoch: 19   Global Step: 203280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:04:56,766-Speed 5497.12 samples/sec   Loss 0.8909   LearningRate 0.0001   Epoch: 19   Global Step: 203290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:05:04,198-Speed 5511.75 samples/sec   Loss 0.8863   LearningRate 0.0001   Epoch: 19   Global Step: 203300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:05:11,760-Speed 5416.87 samples/sec   Loss 0.8948   LearningRate 0.0001   Epoch: 19   Global Step: 203310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:05:19,264-Speed 5459.36 samples/sec   Loss 0.8942   LearningRate 0.0001   Epoch: 19   Global Step: 203320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:05:26,701-Speed 5508.44 samples/sec   Loss 0.8836   LearningRate 0.0001   Epoch: 19   Global Step: 203330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:05:34,140-Speed 5506.79 samples/sec   Loss 0.8892   LearningRate 0.0001   Epoch: 19   Global Step: 203340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 17:05:41,546-Speed 5531.75 samples/sec   Loss 0.9046   LearningRate 0.0001   Epoch: 19   Global Step: 203350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 17:05:48,937-Speed 5542.18 samples/sec   Loss 0.8933   LearningRate 0.0001   Epoch: 19   Global Step: 203360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 17:05:56,383-Speed 5501.97 samples/sec   Loss 0.8817   LearningRate 0.0001   Epoch: 19   Global Step: 203370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:06:03,834-Speed 5497.99 samples/sec   Loss 0.8851   LearningRate 0.0001   Epoch: 19   Global Step: 203380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:06:11,251-Speed 5522.77 samples/sec   Loss 0.9270   LearningRate 0.0001   Epoch: 19   Global Step: 203390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:06:18,672-Speed 5520.22 samples/sec   Loss 0.8762   LearningRate 0.0001   Epoch: 19   Global Step: 203400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:06:26,175-Speed 5460.34 samples/sec   Loss 0.8844   LearningRate 0.0001   Epoch: 19   Global Step: 203410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:06:33,699-Speed 5444.74 samples/sec   Loss 0.8965   LearningRate 0.0001   Epoch: 19   Global Step: 203420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:06:41,113-Speed 5525.12 samples/sec   Loss 0.8711   LearningRate 0.0001   Epoch: 19   Global Step: 203430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:06:48,647-Speed 5437.84 samples/sec   Loss 0.9013   LearningRate 0.0001   Epoch: 19   Global Step: 203440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:06:56,036-Speed 5543.56 samples/sec   Loss 0.8942   LearningRate 0.0001   Epoch: 19   Global Step: 203450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:07:03,418-Speed 5550.16 samples/sec   Loss 0.8803   LearningRate 0.0001   Epoch: 19   Global Step: 203460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:07:10,845-Speed 5515.49 samples/sec   Loss 0.8791   LearningRate 0.0001   Epoch: 19   Global Step: 203470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 17:07:18,269-Speed 5517.34 samples/sec   Loss 0.8866   LearningRate 0.0001   Epoch: 19   Global Step: 203480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:07:25,724-Speed 5495.27 samples/sec   Loss 0.8895   LearningRate 0.0001   Epoch: 19   Global Step: 203490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:07:33,158-Speed 5510.63 samples/sec   Loss 0.8914   LearningRate 0.0001   Epoch: 19   Global Step: 203500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:07:40,515-Speed 5568.67 samples/sec   Loss 0.9066   LearningRate 0.0001   Epoch: 19   Global Step: 203510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:07:47,953-Speed 5507.36 samples/sec   Loss 0.8851   LearningRate 0.0001   Epoch: 19   Global Step: 203520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:07:55,485-Speed 5438.55 samples/sec   Loss 0.8701   LearningRate 0.0001   Epoch: 19   Global Step: 203530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:08:03,111-Speed 5372.06 samples/sec   Loss 0.9103   LearningRate 0.0001   Epoch: 19   Global Step: 203540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:08:10,616-Speed 5458.86 samples/sec   Loss 0.8814   LearningRate 0.0001   Epoch: 19   Global Step: 203550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:08:17,989-Speed 5555.45 samples/sec   Loss 0.8642   LearningRate 0.0001   Epoch: 19   Global Step: 203560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:08:25,349-Speed 5566.52 samples/sec   Loss 0.8812   LearningRate 0.0001   Epoch: 19   Global Step: 203570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:08:32,790-Speed 5505.55 samples/sec   Loss 0.8753   LearningRate 0.0001   Epoch: 19   Global Step: 203580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 17:08:40,210-Speed 5520.68 samples/sec   Loss 0.8789   LearningRate 0.0001   Epoch: 19   Global Step: 203590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:08:47,693-Speed 5474.02 samples/sec   Loss 0.8849   LearningRate 0.0001   Epoch: 19   Global Step: 203600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:08:55,079-Speed 5547.03 samples/sec   Loss 0.8757   LearningRate 0.0001   Epoch: 19   Global Step: 203610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:09:02,502-Speed 5518.22 samples/sec   Loss 0.8763   LearningRate 0.0001   Epoch: 19   Global Step: 203620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:09:09,962-Speed 5491.37 samples/sec   Loss 0.8928   LearningRate 0.0001   Epoch: 19   Global Step: 203630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:09:17,389-Speed 5516.00 samples/sec   Loss 0.9077   LearningRate 0.0001   Epoch: 19   Global Step: 203640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:09:24,828-Speed 5506.93 samples/sec   Loss 0.8903   LearningRate 0.0001   Epoch: 19   Global Step: 203650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:09:32,189-Speed 5564.99 samples/sec   Loss 0.8745   LearningRate 0.0001   Epoch: 19   Global Step: 203660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:09:39,555-Speed 5561.66 samples/sec   Loss 0.8954   LearningRate 0.0001   Epoch: 19   Global Step: 203670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:09:46,941-Speed 5546.46 samples/sec   Loss 0.8806   LearningRate 0.0001   Epoch: 19   Global Step: 203680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:09:54,366-Speed 5516.94 samples/sec   Loss 0.8965   LearningRate 0.0001   Epoch: 19   Global Step: 203690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:10:01,753-Speed 5545.72 samples/sec   Loss 0.8806   LearningRate 0.0001   Epoch: 19   Global Step: 203700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:10:09,278-Speed 5443.83 samples/sec   Loss 0.8887   LearningRate 0.0001   Epoch: 19   Global Step: 203710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:10:16,733-Speed 5495.50 samples/sec   Loss 0.8941   LearningRate 0.0001   Epoch: 19   Global Step: 203720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:10:24,126-Speed 5541.34 samples/sec   Loss 0.8974   LearningRate 0.0001   Epoch: 19   Global Step: 203730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:10:31,568-Speed 5504.59 samples/sec   Loss 0.8954   LearningRate 0.0001   Epoch: 19   Global Step: 203740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:10:38,967-Speed 5536.84 samples/sec   Loss 0.8879   LearningRate 0.0001   Epoch: 19   Global Step: 203750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:10:46,379-Speed 5527.22 samples/sec   Loss 0.8890   LearningRate 0.0001   Epoch: 19   Global Step: 203760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:10:53,784-Speed 5532.26 samples/sec   Loss 0.8938   LearningRate 0.0001   Epoch: 19   Global Step: 203770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:11:01,211-Speed 5515.16 samples/sec   Loss 0.8905   LearningRate 0.0001   Epoch: 19   Global Step: 203780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:11:08,615-Speed 5532.86 samples/sec   Loss 0.8970   LearningRate 0.0001   Epoch: 19   Global Step: 203790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 17:11:15,991-Speed 5554.91 samples/sec   Loss 0.8819   LearningRate 0.0001   Epoch: 19   Global Step: 203800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 17:11:23,454-Speed 5488.83 samples/sec   Loss 0.8842   LearningRate 0.0001   Epoch: 19   Global Step: 203810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 17:11:31,162-Speed 5314.59 samples/sec   Loss 0.8841   LearningRate 0.0001   Epoch: 19   Global Step: 203820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:11:38,620-Speed 5492.41 samples/sec   Loss 0.8829   LearningRate 0.0001   Epoch: 19   Global Step: 203830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:11:46,030-Speed 5529.32 samples/sec   Loss 0.8627   LearningRate 0.0001   Epoch: 19   Global Step: 203840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:11:53,484-Speed 5495.19 samples/sec   Loss 0.9094   LearningRate 0.0001   Epoch: 19   Global Step: 203850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:12:00,892-Speed 5530.57 samples/sec   Loss 0.8778   LearningRate 0.0001   Epoch: 19   Global Step: 203860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:12:08,283-Speed 5542.19 samples/sec   Loss 0.8858   LearningRate 0.0001   Epoch: 19   Global Step: 203870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:12:15,709-Speed 5516.82 samples/sec   Loss 0.8874   LearningRate 0.0001   Epoch: 19   Global Step: 203880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:12:23,178-Speed 5484.57 samples/sec   Loss 0.8923   LearningRate 0.0001   Epoch: 19   Global Step: 203890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:12:30,734-Speed 5421.41 samples/sec   Loss 0.8811   LearningRate 0.0001   Epoch: 19   Global Step: 203900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:12:38,251-Speed 5450.29 samples/sec   Loss 0.8777   LearningRate 0.0001   Epoch: 19   Global Step: 203910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:12:45,674-Speed 5518.62 samples/sec   Loss 0.8867   LearningRate 0.0001   Epoch: 19   Global Step: 203920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:12:53,083-Speed 5528.76 samples/sec   Loss 0.8760   LearningRate 0.0001   Epoch: 19   Global Step: 203930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:13:00,583-Speed 5462.72 samples/sec   Loss 0.8784   LearningRate 0.0001   Epoch: 19   Global Step: 203940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:13:08,030-Speed 5500.54 samples/sec   Loss 0.8942   LearningRate 0.0001   Epoch: 19   Global Step: 203950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:13:15,472-Speed 5504.22 samples/sec   Loss 0.8676   LearningRate 0.0001   Epoch: 19   Global Step: 203960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:13:22,901-Speed 5514.81 samples/sec   Loss 0.8745   LearningRate 0.0001   Epoch: 19   Global Step: 203970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:13:30,473-Speed 5410.26 samples/sec   Loss 0.8820   LearningRate 0.0001   Epoch: 19   Global Step: 203980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:13:37,922-Speed 5499.30 samples/sec   Loss 0.9134   LearningRate 0.0001   Epoch: 19   Global Step: 203990   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-01-09 17:13:45,333-Speed 5527.43 samples/sec   Loss 0.8809   LearningRate 0.0001   Epoch: 19   Global Step: 204000   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-01-09 17:14:29,924-[lfw][204000]XNorm: 22.147161
Training: 2022-01-09 17:14:29,925-[lfw][204000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 17:14:29,926-[lfw][204000]Accuracy-Highest: 0.99850
Training: 2022-01-09 17:15:21,191-[cfp_fp][204000]XNorm: 22.013811
Training: 2022-01-09 17:15:21,192-[cfp_fp][204000]Accuracy-Flip: 0.99400+-0.00343
Training: 2022-01-09 17:15:21,193-[cfp_fp][204000]Accuracy-Highest: 0.99443
Training: 2022-01-09 17:16:05,085-[agedb_30][204000]XNorm: 22.905120
Training: 2022-01-09 17:16:05,085-[agedb_30][204000]Accuracy-Flip: 0.98683+-0.00529
Training: 2022-01-09 17:16:05,086-[agedb_30][204000]Accuracy-Highest: 0.98683
Training: 2022-01-09 17:16:12,689-Speed 277.97 samples/sec   Loss 0.8988   LearningRate 0.0001   Epoch: 19   Global Step: 204010   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-01-09 17:16:20,190-Speed 5461.43 samples/sec   Loss 0.8793   LearningRate 0.0001   Epoch: 19   Global Step: 204020   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-01-09 17:16:27,631-Speed 5504.90 samples/sec   Loss 0.8764   LearningRate 0.0001   Epoch: 19   Global Step: 204030   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-01-09 17:16:35,131-Speed 5462.16 samples/sec   Loss 0.8731   LearningRate 0.0001   Epoch: 19   Global Step: 204040   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-01-09 17:16:42,621-Speed 5469.62 samples/sec   Loss 0.8792   LearningRate 0.0001   Epoch: 19   Global Step: 204050   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-01-09 17:16:50,097-Speed 5479.02 samples/sec   Loss 0.8592   LearningRate 0.0001   Epoch: 19   Global Step: 204060   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-01-09 17:16:57,511-Speed 5526.02 samples/sec   Loss 0.8968   LearningRate 0.0001   Epoch: 19   Global Step: 204070   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-01-09 17:17:05,034-Speed 5445.38 samples/sec   Loss 0.8976   LearningRate 0.0001   Epoch: 19   Global Step: 204080   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-01-09 17:17:12,574-Speed 5433.26 samples/sec   Loss 0.8697   LearningRate 0.0001   Epoch: 19   Global Step: 204090   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:17:19,980-Speed 5530.92 samples/sec   Loss 0.8838   LearningRate 0.0001   Epoch: 19   Global Step: 204100   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:17:27,419-Speed 5507.39 samples/sec   Loss 0.8800   LearningRate 0.0001   Epoch: 19   Global Step: 204110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:17:34,813-Speed 5539.96 samples/sec   Loss 0.8955   LearningRate 0.0001   Epoch: 19   Global Step: 204120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:17:42,345-Speed 5439.20 samples/sec   Loss 0.8895   LearningRate 0.0001   Epoch: 19   Global Step: 204130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:17:49,936-Speed 5396.21 samples/sec   Loss 0.9062   LearningRate 0.0001   Epoch: 19   Global Step: 204140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:17:57,398-Speed 5490.52 samples/sec   Loss 0.8984   LearningRate 0.0001   Epoch: 19   Global Step: 204150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:18:05,004-Speed 5385.54 samples/sec   Loss 0.8869   LearningRate 0.0001   Epoch: 19   Global Step: 204160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:18:12,431-Speed 5515.58 samples/sec   Loss 0.9113   LearningRate 0.0001   Epoch: 19   Global Step: 204170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:18:19,868-Speed 5508.43 samples/sec   Loss 0.8836   LearningRate 0.0001   Epoch: 19   Global Step: 204180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:18:27,282-Speed 5525.29 samples/sec   Loss 0.8772   LearningRate 0.0001   Epoch: 19   Global Step: 204190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:18:34,763-Speed 5476.13 samples/sec   Loss 0.8826   LearningRate 0.0001   Epoch: 19   Global Step: 204200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:18:42,262-Speed 5462.84 samples/sec   Loss 0.8780   LearningRate 0.0001   Epoch: 19   Global Step: 204210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:18:49,690-Speed 5515.07 samples/sec   Loss 0.8842   LearningRate 0.0001   Epoch: 19   Global Step: 204220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:18:57,334-Speed 5359.11 samples/sec   Loss 0.8800   LearningRate 0.0001   Epoch: 19   Global Step: 204230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:19:05,003-Speed 5342.19 samples/sec   Loss 0.8859   LearningRate 0.0001   Epoch: 19   Global Step: 204240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:19:12,630-Speed 5370.93 samples/sec   Loss 0.8778   LearningRate 0.0001   Epoch: 19   Global Step: 204250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:19:20,166-Speed 5435.70 samples/sec   Loss 0.9080   LearningRate 0.0001   Epoch: 19   Global Step: 204260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:19:27,689-Speed 5445.25 samples/sec   Loss 0.8810   LearningRate 0.0001   Epoch: 19   Global Step: 204270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:19:35,098-Speed 5529.77 samples/sec   Loss 0.8844   LearningRate 0.0001   Epoch: 19   Global Step: 204280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:19:42,645-Speed 5428.00 samples/sec   Loss 0.8825   LearningRate 0.0001   Epoch: 19   Global Step: 204290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:19:50,187-Speed 5431.17 samples/sec   Loss 0.8736   LearningRate 0.0001   Epoch: 19   Global Step: 204300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:19:57,766-Speed 5405.24 samples/sec   Loss 0.8839   LearningRate 0.0001   Epoch: 19   Global Step: 204310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:20:05,244-Speed 5478.82 samples/sec   Loss 0.8948   LearningRate 0.0001   Epoch: 19   Global Step: 204320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:20:12,768-Speed 5444.15 samples/sec   Loss 0.8945   LearningRate 0.0001   Epoch: 19   Global Step: 204330   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:20:20,334-Speed 5414.13 samples/sec   Loss 0.8594   LearningRate 0.0001   Epoch: 19   Global Step: 204340   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:20:27,791-Speed 5494.04 samples/sec   Loss 0.8775   LearningRate 0.0001   Epoch: 19   Global Step: 204350   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:20:35,280-Speed 5470.33 samples/sec   Loss 0.8870   LearningRate 0.0001   Epoch: 19   Global Step: 204360   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:20:42,746-Speed 5487.16 samples/sec   Loss 0.8786   LearningRate 0.0001   Epoch: 19   Global Step: 204370   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:20:50,306-Speed 5418.15 samples/sec   Loss 0.8779   LearningRate 0.0001   Epoch: 19   Global Step: 204380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:20:57,946-Speed 5362.47 samples/sec   Loss 0.8733   LearningRate 0.0001   Epoch: 19   Global Step: 204390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:21:05,485-Speed 5433.35 samples/sec   Loss 0.8783   LearningRate 0.0001   Epoch: 19   Global Step: 204400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:21:13,037-Speed 5424.74 samples/sec   Loss 0.8712   LearningRate 0.0001   Epoch: 19   Global Step: 204410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:21:20,424-Speed 5545.55 samples/sec   Loss 0.8867   LearningRate 0.0001   Epoch: 19   Global Step: 204420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:21:27,845-Speed 5520.35 samples/sec   Loss 0.8768   LearningRate 0.0001   Epoch: 19   Global Step: 204430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:21:35,322-Speed 5479.11 samples/sec   Loss 0.8988   LearningRate 0.0001   Epoch: 19   Global Step: 204440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:21:42,742-Speed 5521.02 samples/sec   Loss 0.8720   LearningRate 0.0001   Epoch: 19   Global Step: 204450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:21:50,142-Speed 5536.05 samples/sec   Loss 0.8650   LearningRate 0.0001   Epoch: 19   Global Step: 204460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:21:57,550-Speed 5529.58 samples/sec   Loss 0.8710   LearningRate 0.0001   Epoch: 19   Global Step: 204470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:22:04,997-Speed 5500.64 samples/sec   Loss 0.8772   LearningRate 0.0001   Epoch: 19   Global Step: 204480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:22:12,370-Speed 5556.69 samples/sec   Loss 0.8852   LearningRate 0.0001   Epoch: 19   Global Step: 204490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:22:19,793-Speed 5518.43 samples/sec   Loss 0.8725   LearningRate 0.0001   Epoch: 19   Global Step: 204500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:22:27,174-Speed 5550.41 samples/sec   Loss 0.8834   LearningRate 0.0001   Epoch: 19   Global Step: 204510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:22:34,566-Speed 5542.05 samples/sec   Loss 0.8712   LearningRate 0.0001   Epoch: 19   Global Step: 204520   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:22:41,953-Speed 5545.61 samples/sec   Loss 0.8730   LearningRate 0.0001   Epoch: 19   Global Step: 204530   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:22:49,382-Speed 5514.56 samples/sec   Loss 0.8645   LearningRate 0.0001   Epoch: 19   Global Step: 204540   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:22:56,756-Speed 5554.69 samples/sec   Loss 0.8924   LearningRate 0.0001   Epoch: 19   Global Step: 204550   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:23:04,211-Speed 5495.53 samples/sec   Loss 0.8920   LearningRate 0.0001   Epoch: 19   Global Step: 204560   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:23:11,660-Speed 5499.73 samples/sec   Loss 0.8782   LearningRate 0.0001   Epoch: 19   Global Step: 204570   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:23:19,049-Speed 5544.42 samples/sec   Loss 0.8830   LearningRate 0.0001   Epoch: 19   Global Step: 204580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:23:26,428-Speed 5551.09 samples/sec   Loss 0.8993   LearningRate 0.0001   Epoch: 19   Global Step: 204590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:23:33,843-Speed 5525.11 samples/sec   Loss 0.8736   LearningRate 0.0001   Epoch: 19   Global Step: 204600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:23:41,226-Speed 5548.61 samples/sec   Loss 0.8848   LearningRate 0.0001   Epoch: 19   Global Step: 204610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:23:48,596-Speed 5558.70 samples/sec   Loss 0.8912   LearningRate 0.0001   Epoch: 19   Global Step: 204620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:23:55,999-Speed 5533.52 samples/sec   Loss 0.8749   LearningRate 0.0001   Epoch: 19   Global Step: 204630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:24:03,415-Speed 5523.58 samples/sec   Loss 0.8859   LearningRate 0.0001   Epoch: 19   Global Step: 204640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:24:10,861-Speed 5502.49 samples/sec   Loss 0.8761   LearningRate 0.0001   Epoch: 19   Global Step: 204650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:24:18,428-Speed 5413.68 samples/sec   Loss 0.8675   LearningRate 0.0001   Epoch: 19   Global Step: 204660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:24:25,912-Speed 5473.36 samples/sec   Loss 0.8581   LearningRate 0.0001   Epoch: 19   Global Step: 204670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:24:33,398-Speed 5472.27 samples/sec   Loss 0.8770   LearningRate 0.0001   Epoch: 19   Global Step: 204680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:24:40,967-Speed 5412.56 samples/sec   Loss 0.8915   LearningRate 0.0001   Epoch: 19   Global Step: 204690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:24:48,380-Speed 5526.11 samples/sec   Loss 0.8658   LearningRate 0.0001   Epoch: 19   Global Step: 204700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:24:55,782-Speed 5534.39 samples/sec   Loss 0.8697   LearningRate 0.0001   Epoch: 19   Global Step: 204710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:25:03,195-Speed 5525.55 samples/sec   Loss 0.8842   LearningRate 0.0001   Epoch: 19   Global Step: 204720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:25:10,705-Speed 5455.64 samples/sec   Loss 0.8995   LearningRate 0.0001   Epoch: 19   Global Step: 204730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:25:18,155-Speed 5498.62 samples/sec   Loss 0.8620   LearningRate 0.0001   Epoch: 19   Global Step: 204740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:25:25,580-Speed 5517.34 samples/sec   Loss 0.8601   LearningRate 0.0001   Epoch: 19   Global Step: 204750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:25:33,000-Speed 5520.65 samples/sec   Loss 0.8710   LearningRate 0.0001   Epoch: 19   Global Step: 204760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:25:40,419-Speed 5522.11 samples/sec   Loss 0.8693   LearningRate 0.0001   Epoch: 19   Global Step: 204770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:25:48,004-Speed 5401.12 samples/sec   Loss 0.8733   LearningRate 0.0001   Epoch: 19   Global Step: 204780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:25:55,438-Speed 5510.68 samples/sec   Loss 0.8732   LearningRate 0.0001   Epoch: 19   Global Step: 204790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:26:02,968-Speed 5439.80 samples/sec   Loss 0.8783   LearningRate 0.0001   Epoch: 19   Global Step: 204800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:26:10,579-Speed 5382.92 samples/sec   Loss 0.8740   LearningRate 0.0001   Epoch: 19   Global Step: 204810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:26:18,050-Speed 5483.25 samples/sec   Loss 0.8823   LearningRate 0.0001   Epoch: 19   Global Step: 204820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:26:25,432-Speed 5549.22 samples/sec   Loss 0.8743   LearningRate 0.0001   Epoch: 19   Global Step: 204830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:26:32,848-Speed 5524.31 samples/sec   Loss 0.8624   LearningRate 0.0000   Epoch: 19   Global Step: 204840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:26:40,252-Speed 5533.07 samples/sec   Loss 0.8630   LearningRate 0.0000   Epoch: 19   Global Step: 204850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:26:47,668-Speed 5524.07 samples/sec   Loss 0.8747   LearningRate 0.0000   Epoch: 19   Global Step: 204860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:26:55,103-Speed 5509.69 samples/sec   Loss 0.8823   LearningRate 0.0000   Epoch: 19   Global Step: 204870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:27:02,503-Speed 5535.85 samples/sec   Loss 0.8765   LearningRate 0.0000   Epoch: 19   Global Step: 204880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:27:09,949-Speed 5501.81 samples/sec   Loss 0.8726   LearningRate 0.0000   Epoch: 19   Global Step: 204890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:27:17,350-Speed 5535.13 samples/sec   Loss 0.8779   LearningRate 0.0000   Epoch: 19   Global Step: 204900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 17:27:24,821-Speed 5483.62 samples/sec   Loss 0.8730   LearningRate 0.0000   Epoch: 19   Global Step: 204910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:27:32,264-Speed 5504.05 samples/sec   Loss 0.8711   LearningRate 0.0000   Epoch: 19   Global Step: 204920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:27:39,685-Speed 5520.14 samples/sec   Loss 0.8924   LearningRate 0.0000   Epoch: 19   Global Step: 204930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:27:47,098-Speed 5526.54 samples/sec   Loss 0.8768   LearningRate 0.0000   Epoch: 19   Global Step: 204940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:27:54,473-Speed 5554.46 samples/sec   Loss 0.8773   LearningRate 0.0000   Epoch: 19   Global Step: 204950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:28:01,845-Speed 5556.77 samples/sec   Loss 0.8897   LearningRate 0.0000   Epoch: 19   Global Step: 204960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:28:09,328-Speed 5474.30 samples/sec   Loss 0.8648   LearningRate 0.0000   Epoch: 19   Global Step: 204970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:28:16,775-Speed 5501.19 samples/sec   Loss 0.8846   LearningRate 0.0000   Epoch: 19   Global Step: 204980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:28:24,147-Speed 5557.20 samples/sec   Loss 0.8826   LearningRate 0.0000   Epoch: 19   Global Step: 204990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:28:31,567-Speed 5520.89 samples/sec   Loss 0.8522   LearningRate 0.0000   Epoch: 19   Global Step: 205000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:28:38,931-Speed 5562.68 samples/sec   Loss 0.8786   LearningRate 0.0000   Epoch: 19   Global Step: 205010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 17:28:46,361-Speed 5513.70 samples/sec   Loss 0.8688   LearningRate 0.0000   Epoch: 19   Global Step: 205020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 17:28:53,823-Speed 5490.44 samples/sec   Loss 0.8737   LearningRate 0.0000   Epoch: 19   Global Step: 205030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 17:29:01,383-Speed 5418.03 samples/sec   Loss 0.8956   LearningRate 0.0000   Epoch: 19   Global Step: 205040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 17:29:08,796-Speed 5526.05 samples/sec   Loss 0.8988   LearningRate 0.0000   Epoch: 19   Global Step: 205050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 17:29:16,195-Speed 5537.05 samples/sec   Loss 0.8716   LearningRate 0.0000   Epoch: 19   Global Step: 205060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:29:23,732-Speed 5435.41 samples/sec   Loss 0.8817   LearningRate 0.0000   Epoch: 19   Global Step: 205070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:29:31,138-Speed 5531.57 samples/sec   Loss 0.8785   LearningRate 0.0000   Epoch: 19   Global Step: 205080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:29:38,549-Speed 5526.95 samples/sec   Loss 0.8729   LearningRate 0.0000   Epoch: 19   Global Step: 205090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:29:45,989-Speed 5506.68 samples/sec   Loss 0.8887   LearningRate 0.0000   Epoch: 19   Global Step: 205100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:29:53,407-Speed 5523.08 samples/sec   Loss 0.8892   LearningRate 0.0000   Epoch: 19   Global Step: 205110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:30:00,841-Speed 5510.19 samples/sec   Loss 0.8931   LearningRate 0.0000   Epoch: 19   Global Step: 205120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:30:08,308-Speed 5486.29 samples/sec   Loss 0.8691   LearningRate 0.0000   Epoch: 19   Global Step: 205130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:30:15,711-Speed 5533.42 samples/sec   Loss 0.8733   LearningRate 0.0000   Epoch: 19   Global Step: 205140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:30:23,131-Speed 5521.14 samples/sec   Loss 0.8981   LearningRate 0.0000   Epoch: 19   Global Step: 205150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 17:30:30,630-Speed 5463.23 samples/sec   Loss 0.8789   LearningRate 0.0000   Epoch: 19   Global Step: 205160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 17:30:38,179-Speed 5426.62 samples/sec   Loss 0.8819   LearningRate 0.0000   Epoch: 19   Global Step: 205170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 17:30:45,793-Speed 5380.53 samples/sec   Loss 0.8946   LearningRate 0.0000   Epoch: 19   Global Step: 205180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 17:30:53,178-Speed 5547.06 samples/sec   Loss 0.8788   LearningRate 0.0000   Epoch: 19   Global Step: 205190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 17:31:00,682-Speed 5459.04 samples/sec   Loss 0.8711   LearningRate 0.0000   Epoch: 19   Global Step: 205200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:31:08,120-Speed 5507.54 samples/sec   Loss 0.8586   LearningRate 0.0000   Epoch: 19   Global Step: 205210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:31:15,533-Speed 5525.82 samples/sec   Loss 0.8618   LearningRate 0.0000   Epoch: 19   Global Step: 205220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:31:22,963-Speed 5514.39 samples/sec   Loss 0.8601   LearningRate 0.0000   Epoch: 19   Global Step: 205230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:31:30,518-Speed 5422.27 samples/sec   Loss 0.8401   LearningRate 0.0000   Epoch: 19   Global Step: 205240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:31:37,966-Speed 5499.75 samples/sec   Loss 0.8931   LearningRate 0.0000   Epoch: 19   Global Step: 205250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:31:45,436-Speed 5484.19 samples/sec   Loss 0.8657   LearningRate 0.0000   Epoch: 19   Global Step: 205260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:31:52,889-Speed 5496.69 samples/sec   Loss 0.8786   LearningRate 0.0000   Epoch: 19   Global Step: 205270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:32:00,334-Speed 5502.07 samples/sec   Loss 0.8892   LearningRate 0.0000   Epoch: 19   Global Step: 205280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:32:07,895-Speed 5417.69 samples/sec   Loss 0.8731   LearningRate 0.0000   Epoch: 19   Global Step: 205290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:32:15,340-Speed 5502.97 samples/sec   Loss 0.8787   LearningRate 0.0000   Epoch: 19   Global Step: 205300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:32:22,761-Speed 5520.16 samples/sec   Loss 0.8745   LearningRate 0.0000   Epoch: 19   Global Step: 205310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:32:30,175-Speed 5525.71 samples/sec   Loss 0.8938   LearningRate 0.0000   Epoch: 19   Global Step: 205320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:32:37,604-Speed 5514.03 samples/sec   Loss 0.8730   LearningRate 0.0000   Epoch: 19   Global Step: 205330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:32:45,003-Speed 5536.48 samples/sec   Loss 0.9006   LearningRate 0.0000   Epoch: 19   Global Step: 205340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:32:52,532-Speed 5441.63 samples/sec   Loss 0.8715   LearningRate 0.0000   Epoch: 19   Global Step: 205350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:32:59,978-Speed 5502.00 samples/sec   Loss 0.8730   LearningRate 0.0000   Epoch: 19   Global Step: 205360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:33:07,409-Speed 5512.37 samples/sec   Loss 0.8865   LearningRate 0.0000   Epoch: 19   Global Step: 205370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:33:14,858-Speed 5499.75 samples/sec   Loss 0.8733   LearningRate 0.0000   Epoch: 19   Global Step: 205380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:33:22,268-Speed 5528.69 samples/sec   Loss 0.8545   LearningRate 0.0000   Epoch: 19   Global Step: 205390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:33:29,752-Speed 5473.38 samples/sec   Loss 0.8610   LearningRate 0.0000   Epoch: 19   Global Step: 205400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:33:37,195-Speed 5504.02 samples/sec   Loss 0.8782   LearningRate 0.0000   Epoch: 19   Global Step: 205410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:33:44,573-Speed 5552.03 samples/sec   Loss 0.8825   LearningRate 0.0000   Epoch: 19   Global Step: 205420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:33:52,044-Speed 5483.45 samples/sec   Loss 0.8914   LearningRate 0.0000   Epoch: 19   Global Step: 205430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:33:59,539-Speed 5466.10 samples/sec   Loss 0.8824   LearningRate 0.0000   Epoch: 19   Global Step: 205440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:34:07,145-Speed 5386.18 samples/sec   Loss 0.8584   LearningRate 0.0000   Epoch: 19   Global Step: 205450   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:34:14,586-Speed 5505.13 samples/sec   Loss 0.8783   LearningRate 0.0000   Epoch: 19   Global Step: 205460   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:34:22,390-Speed 5249.73 samples/sec   Loss 0.8613   LearningRate 0.0000   Epoch: 19   Global Step: 205470   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:34:29,938-Speed 5427.06 samples/sec   Loss 0.8855   LearningRate 0.0000   Epoch: 19   Global Step: 205480   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:34:37,375-Speed 5508.57 samples/sec   Loss 0.8875   LearningRate 0.0000   Epoch: 19   Global Step: 205490   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:34:44,809-Speed 5510.07 samples/sec   Loss 0.8877   LearningRate 0.0000   Epoch: 19   Global Step: 205500   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:34:52,331-Speed 5446.32 samples/sec   Loss 0.8615   LearningRate 0.0000   Epoch: 19   Global Step: 205510   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:34:59,770-Speed 5507.61 samples/sec   Loss 0.8945   LearningRate 0.0000   Epoch: 19   Global Step: 205520   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:35:07,228-Speed 5492.46 samples/sec   Loss 0.8778   LearningRate 0.0000   Epoch: 19   Global Step: 205530   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:35:14,667-Speed 5507.13 samples/sec   Loss 0.8688   LearningRate 0.0000   Epoch: 19   Global Step: 205540   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:35:22,073-Speed 5531.49 samples/sec   Loss 0.8718   LearningRate 0.0000   Epoch: 19   Global Step: 205550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:35:29,584-Speed 5454.22 samples/sec   Loss 0.8913   LearningRate 0.0000   Epoch: 19   Global Step: 205560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:35:37,060-Speed 5479.43 samples/sec   Loss 0.8784   LearningRate 0.0000   Epoch: 19   Global Step: 205570   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:35:44,559-Speed 5462.85 samples/sec   Loss 0.8658   LearningRate 0.0000   Epoch: 19   Global Step: 205580   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:35:52,169-Speed 5382.83 samples/sec   Loss 0.8751   LearningRate 0.0000   Epoch: 19   Global Step: 205590   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:35:59,634-Speed 5488.01 samples/sec   Loss 0.8708   LearningRate 0.0000   Epoch: 19   Global Step: 205600   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:36:07,241-Speed 5385.00 samples/sec   Loss 0.8809   LearningRate 0.0000   Epoch: 19   Global Step: 205610   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:36:14,792-Speed 5424.82 samples/sec   Loss 0.8649   LearningRate 0.0000   Epoch: 19   Global Step: 205620   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:36:22,277-Speed 5472.72 samples/sec   Loss 0.8877   LearningRate 0.0000   Epoch: 19   Global Step: 205630   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:36:29,798-Speed 5447.36 samples/sec   Loss 0.8633   LearningRate 0.0000   Epoch: 19   Global Step: 205640   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:36:37,219-Speed 5520.06 samples/sec   Loss 0.8652   LearningRate 0.0000   Epoch: 19   Global Step: 205650   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:36:44,727-Speed 5456.41 samples/sec   Loss 0.8800   LearningRate 0.0000   Epoch: 19   Global Step: 205660   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:36:52,234-Speed 5456.56 samples/sec   Loss 0.8672   LearningRate 0.0000   Epoch: 19   Global Step: 205670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:36:59,669-Speed 5510.51 samples/sec   Loss 0.8763   LearningRate 0.0000   Epoch: 19   Global Step: 205680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:37:07,104-Speed 5509.33 samples/sec   Loss 0.8851   LearningRate 0.0000   Epoch: 19   Global Step: 205690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:37:14,626-Speed 5446.05 samples/sec   Loss 0.8713   LearningRate 0.0000   Epoch: 19   Global Step: 205700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:37:22,106-Speed 5476.97 samples/sec   Loss 0.8624   LearningRate 0.0000   Epoch: 19   Global Step: 205710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:37:29,578-Speed 5482.44 samples/sec   Loss 0.8755   LearningRate 0.0000   Epoch: 19   Global Step: 205720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:37:37,037-Speed 5492.29 samples/sec   Loss 0.8636   LearningRate 0.0000   Epoch: 19   Global Step: 205730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:37:44,534-Speed 5463.77 samples/sec   Loss 0.8713   LearningRate 0.0000   Epoch: 19   Global Step: 205740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:37:51,931-Speed 5538.74 samples/sec   Loss 0.8740   LearningRate 0.0000   Epoch: 19   Global Step: 205750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:37:59,452-Speed 5446.82 samples/sec   Loss 0.8859   LearningRate 0.0000   Epoch: 19   Global Step: 205760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:38:06,939-Speed 5471.94 samples/sec   Loss 0.8932   LearningRate 0.0000   Epoch: 19   Global Step: 205770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:38:14,504-Speed 5414.36 samples/sec   Loss 0.8899   LearningRate 0.0000   Epoch: 19   Global Step: 205780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:38:21,944-Speed 5506.77 samples/sec   Loss 0.8808   LearningRate 0.0000   Epoch: 19   Global Step: 205790   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:38:29,416-Speed 5482.65 samples/sec   Loss 0.8740   LearningRate 0.0000   Epoch: 19   Global Step: 205800   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:38:36,832-Speed 5524.18 samples/sec   Loss 0.8758   LearningRate 0.0000   Epoch: 19   Global Step: 205810   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:38:44,232-Speed 5535.29 samples/sec   Loss 0.8955   LearningRate 0.0000   Epoch: 19   Global Step: 205820   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:38:51,743-Speed 5454.61 samples/sec   Loss 0.8714   LearningRate 0.0000   Epoch: 19   Global Step: 205830   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:38:59,131-Speed 5544.62 samples/sec   Loss 0.8713   LearningRate 0.0000   Epoch: 19   Global Step: 205840   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:39:06,636-Speed 5457.96 samples/sec   Loss 0.8893   LearningRate 0.0000   Epoch: 19   Global Step: 205850   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:39:14,095-Speed 5492.18 samples/sec   Loss 0.8805   LearningRate 0.0000   Epoch: 19   Global Step: 205860   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:39:21,557-Speed 5489.96 samples/sec   Loss 0.8588   LearningRate 0.0000   Epoch: 19   Global Step: 205870   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:39:29,072-Speed 5451.00 samples/sec   Loss 0.8759   LearningRate 0.0000   Epoch: 19   Global Step: 205880   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:39:36,495-Speed 5518.79 samples/sec   Loss 0.8634   LearningRate 0.0000   Epoch: 19   Global Step: 205890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:39:43,929-Speed 5511.08 samples/sec   Loss 0.8911   LearningRate 0.0000   Epoch: 19   Global Step: 205900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:39:51,381-Speed 5496.85 samples/sec   Loss 0.8699   LearningRate 0.0000   Epoch: 19   Global Step: 205910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:39:58,832-Speed 5497.86 samples/sec   Loss 0.8842   LearningRate 0.0000   Epoch: 19   Global Step: 205920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:40:06,246-Speed 5526.11 samples/sec   Loss 0.8965   LearningRate 0.0000   Epoch: 19   Global Step: 205930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:40:13,782-Speed 5435.51 samples/sec   Loss 0.8775   LearningRate 0.0000   Epoch: 19   Global Step: 205940   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:40:21,343-Speed 5418.15 samples/sec   Loss 0.8730   LearningRate 0.0000   Epoch: 19   Global Step: 205950   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:40:28,788-Speed 5502.56 samples/sec   Loss 0.8657   LearningRate 0.0000   Epoch: 19   Global Step: 205960   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:40:36,238-Speed 5499.01 samples/sec   Loss 0.8779   LearningRate 0.0000   Epoch: 19   Global Step: 205970   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:40:43,660-Speed 5519.44 samples/sec   Loss 0.8805   LearningRate 0.0000   Epoch: 19   Global Step: 205980   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:40:51,190-Speed 5440.30 samples/sec   Loss 0.8866   LearningRate 0.0000   Epoch: 19   Global Step: 205990   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:40:58,673-Speed 5474.79 samples/sec   Loss 0.8730   LearningRate 0.0000   Epoch: 19   Global Step: 206000   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:41:42,849-[lfw][206000]XNorm: 22.171104
Training: 2022-01-09 17:41:42,850-[lfw][206000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 17:41:42,851-[lfw][206000]Accuracy-Highest: 0.99850
Training: 2022-01-09 17:42:34,401-[cfp_fp][206000]XNorm: 22.023721
Training: 2022-01-09 17:42:34,402-[cfp_fp][206000]Accuracy-Flip: 0.99400+-0.00343
Training: 2022-01-09 17:42:34,402-[cfp_fp][206000]Accuracy-Highest: 0.99443
Training: 2022-01-09 17:43:19,063-[agedb_30][206000]XNorm: 22.929383
Training: 2022-01-09 17:43:19,064-[agedb_30][206000]Accuracy-Flip: 0.98650+-0.00529
Training: 2022-01-09 17:43:19,064-[agedb_30][206000]Accuracy-Highest: 0.98683
Training: 2022-01-09 17:43:26,662-Speed 276.78 samples/sec   Loss 0.9022   LearningRate 0.0000   Epoch: 19   Global Step: 206010   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:43:34,161-Speed 5463.02 samples/sec   Loss 0.8746   LearningRate 0.0000   Epoch: 19   Global Step: 206020   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:43:41,629-Speed 5485.63 samples/sec   Loss 0.8807   LearningRate 0.0000   Epoch: 19   Global Step: 206030   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:43:49,165-Speed 5436.19 samples/sec   Loss 0.8638   LearningRate 0.0000   Epoch: 19   Global Step: 206040   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:43:56,605-Speed 5505.47 samples/sec   Loss 0.8566   LearningRate 0.0000   Epoch: 19   Global Step: 206050   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:44:04,138-Speed 5437.97 samples/sec   Loss 0.8731   LearningRate 0.0000   Epoch: 19   Global Step: 206060   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:44:11,607-Speed 5485.03 samples/sec   Loss 0.8533   LearningRate 0.0000   Epoch: 19   Global Step: 206070   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:44:19,045-Speed 5507.48 samples/sec   Loss 0.8671   LearningRate 0.0000   Epoch: 19   Global Step: 206080   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:44:26,479-Speed 5510.44 samples/sec   Loss 0.8579   LearningRate 0.0000   Epoch: 19   Global Step: 206090   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:44:34,041-Speed 5417.64 samples/sec   Loss 0.8639   LearningRate 0.0000   Epoch: 19   Global Step: 206100   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:44:41,472-Speed 5512.66 samples/sec   Loss 0.8660   LearningRate 0.0000   Epoch: 19   Global Step: 206110   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:44:48,890-Speed 5522.66 samples/sec   Loss 0.8790   LearningRate 0.0000   Epoch: 19   Global Step: 206120   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:44:56,338-Speed 5499.77 samples/sec   Loss 0.8771   LearningRate 0.0000   Epoch: 19   Global Step: 206130   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:45:03,773-Speed 5510.35 samples/sec   Loss 0.8663   LearningRate 0.0000   Epoch: 19   Global Step: 206140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:45:11,189-Speed 5524.23 samples/sec   Loss 0.8621   LearningRate 0.0000   Epoch: 19   Global Step: 206150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:45:18,664-Speed 5480.08 samples/sec   Loss 0.8806   LearningRate 0.0000   Epoch: 19   Global Step: 206160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:45:26,082-Speed 5522.47 samples/sec   Loss 0.8641   LearningRate 0.0000   Epoch: 19   Global Step: 206170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:45:33,491-Speed 5529.47 samples/sec   Loss 0.8713   LearningRate 0.0000   Epoch: 19   Global Step: 206180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:45:40,919-Speed 5514.48 samples/sec   Loss 0.8977   LearningRate 0.0000   Epoch: 19   Global Step: 206190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:45:48,371-Speed 5497.70 samples/sec   Loss 0.8714   LearningRate 0.0000   Epoch: 19   Global Step: 206200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:45:55,831-Speed 5491.08 samples/sec   Loss 0.8623   LearningRate 0.0000   Epoch: 19   Global Step: 206210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:46:03,235-Speed 5533.51 samples/sec   Loss 0.8760   LearningRate 0.0000   Epoch: 19   Global Step: 206220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:46:10,668-Speed 5510.85 samples/sec   Loss 0.8739   LearningRate 0.0000   Epoch: 19   Global Step: 206230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:46:18,173-Speed 5458.56 samples/sec   Loss 0.8710   LearningRate 0.0000   Epoch: 19   Global Step: 206240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:46:25,705-Speed 5439.44 samples/sec   Loss 0.8603   LearningRate 0.0000   Epoch: 19   Global Step: 206250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:46:33,082-Speed 5552.56 samples/sec   Loss 0.8677   LearningRate 0.0000   Epoch: 19   Global Step: 206260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:46:40,520-Speed 5507.63 samples/sec   Loss 0.8848   LearningRate 0.0000   Epoch: 19   Global Step: 206270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:46:48,014-Speed 5466.69 samples/sec   Loss 0.8811   LearningRate 0.0000   Epoch: 19   Global Step: 206280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:46:55,518-Speed 5459.68 samples/sec   Loss 0.8952   LearningRate 0.0000   Epoch: 19   Global Step: 206290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:47:03,051-Speed 5438.15 samples/sec   Loss 0.8632   LearningRate 0.0000   Epoch: 19   Global Step: 206300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:47:10,494-Speed 5503.82 samples/sec   Loss 0.8690   LearningRate 0.0000   Epoch: 19   Global Step: 206310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:47:17,921-Speed 5516.10 samples/sec   Loss 0.8671   LearningRate 0.0000   Epoch: 19   Global Step: 206320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:47:25,334-Speed 5526.28 samples/sec   Loss 0.8761   LearningRate 0.0000   Epoch: 19   Global Step: 206330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:47:32,806-Speed 5482.37 samples/sec   Loss 0.8615   LearningRate 0.0000   Epoch: 19   Global Step: 206340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:47:40,237-Speed 5512.71 samples/sec   Loss 0.8708   LearningRate 0.0000   Epoch: 19   Global Step: 206350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:47:47,655-Speed 5522.59 samples/sec   Loss 0.8831   LearningRate 0.0000   Epoch: 19   Global Step: 206360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:47:55,119-Speed 5488.23 samples/sec   Loss 0.8792   LearningRate 0.0000   Epoch: 19   Global Step: 206370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:48:02,592-Speed 5482.03 samples/sec   Loss 0.8962   LearningRate 0.0000   Epoch: 19   Global Step: 206380   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:48:10,056-Speed 5487.97 samples/sec   Loss 0.8785   LearningRate 0.0000   Epoch: 19   Global Step: 206390   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:48:17,498-Speed 5505.07 samples/sec   Loss 0.8466   LearningRate 0.0000   Epoch: 19   Global Step: 206400   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:48:24,898-Speed 5535.84 samples/sec   Loss 0.8867   LearningRate 0.0000   Epoch: 19   Global Step: 206410   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:48:32,347-Speed 5499.29 samples/sec   Loss 0.8894   LearningRate 0.0000   Epoch: 19   Global Step: 206420   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:48:39,742-Speed 5539.52 samples/sec   Loss 0.8959   LearningRate 0.0000   Epoch: 19   Global Step: 206430   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:48:47,188-Speed 5502.02 samples/sec   Loss 0.8568   LearningRate 0.0000   Epoch: 19   Global Step: 206440   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:48:54,706-Speed 5449.04 samples/sec   Loss 0.8727   LearningRate 0.0000   Epoch: 19   Global Step: 206450   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:49:02,156-Speed 5498.50 samples/sec   Loss 0.8730   LearningRate 0.0000   Epoch: 19   Global Step: 206460   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:49:09,733-Speed 5406.82 samples/sec   Loss 0.8620   LearningRate 0.0000   Epoch: 19   Global Step: 206470   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:49:17,275-Speed 5431.81 samples/sec   Loss 0.8590   LearningRate 0.0000   Epoch: 19   Global Step: 206480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:49:24,697-Speed 5519.81 samples/sec   Loss 0.8634   LearningRate 0.0000   Epoch: 19   Global Step: 206490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:49:32,129-Speed 5512.09 samples/sec   Loss 0.8723   LearningRate 0.0000   Epoch: 19   Global Step: 206500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:49:39,594-Speed 5487.22 samples/sec   Loss 0.8579   LearningRate 0.0000   Epoch: 19   Global Step: 206510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:49:47,094-Speed 5461.94 samples/sec   Loss 0.8852   LearningRate 0.0000   Epoch: 19   Global Step: 206520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:49:54,501-Speed 5531.20 samples/sec   Loss 0.8758   LearningRate 0.0000   Epoch: 19   Global Step: 206530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:50:01,962-Speed 5490.61 samples/sec   Loss 0.8829   LearningRate 0.0000   Epoch: 19   Global Step: 206540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:50:09,390-Speed 5514.91 samples/sec   Loss 0.8691   LearningRate 0.0000   Epoch: 19   Global Step: 206550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:50:16,878-Speed 5470.82 samples/sec   Loss 0.8830   LearningRate 0.0000   Epoch: 19   Global Step: 206560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:50:24,333-Speed 5494.84 samples/sec   Loss 0.9007   LearningRate 0.0000   Epoch: 19   Global Step: 206570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:50:31,734-Speed 5535.60 samples/sec   Loss 0.8756   LearningRate 0.0000   Epoch: 19   Global Step: 206580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:50:39,125-Speed 5541.95 samples/sec   Loss 0.8670   LearningRate 0.0000   Epoch: 19   Global Step: 206590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:50:46,527-Speed 5534.72 samples/sec   Loss 0.8687   LearningRate 0.0000   Epoch: 19   Global Step: 206600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:50:53,953-Speed 5517.06 samples/sec   Loss 0.8605   LearningRate 0.0000   Epoch: 19   Global Step: 206610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:51:01,383-Speed 5513.43 samples/sec   Loss 0.8609   LearningRate 0.0000   Epoch: 19   Global Step: 206620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:51:08,857-Speed 5480.46 samples/sec   Loss 0.8574   LearningRate 0.0000   Epoch: 19   Global Step: 206630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:51:16,374-Speed 5449.75 samples/sec   Loss 0.8663   LearningRate 0.0000   Epoch: 19   Global Step: 206640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:51:23,860-Speed 5472.76 samples/sec   Loss 0.8748   LearningRate 0.0000   Epoch: 19   Global Step: 206650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:51:31,258-Speed 5537.26 samples/sec   Loss 0.8514   LearningRate 0.0000   Epoch: 19   Global Step: 206660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:51:38,761-Speed 5459.43 samples/sec   Loss 0.8700   LearningRate 0.0000   Epoch: 19   Global Step: 206670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:51:46,190-Speed 5515.17 samples/sec   Loss 0.8748   LearningRate 0.0000   Epoch: 19   Global Step: 206680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:51:53,638-Speed 5500.49 samples/sec   Loss 0.8713   LearningRate 0.0000   Epoch: 19   Global Step: 206690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:52:01,079-Speed 5504.87 samples/sec   Loss 0.8913   LearningRate 0.0000   Epoch: 19   Global Step: 206700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:52:08,584-Speed 5458.81 samples/sec   Loss 0.8471   LearningRate 0.0000   Epoch: 19   Global Step: 206710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:52:16,076-Speed 5467.72 samples/sec   Loss 0.8801   LearningRate 0.0000   Epoch: 19   Global Step: 206720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:52:23,481-Speed 5532.71 samples/sec   Loss 0.8497   LearningRate 0.0000   Epoch: 19   Global Step: 206730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:52:30,962-Speed 5475.68 samples/sec   Loss 0.8596   LearningRate 0.0000   Epoch: 19   Global Step: 206740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:52:38,391-Speed 5514.04 samples/sec   Loss 0.8937   LearningRate 0.0000   Epoch: 19   Global Step: 206750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:52:45,802-Speed 5528.18 samples/sec   Loss 0.8811   LearningRate 0.0000   Epoch: 19   Global Step: 206760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:52:53,210-Speed 5529.84 samples/sec   Loss 0.8892   LearningRate 0.0000   Epoch: 19   Global Step: 206770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:53:00,635-Speed 5516.91 samples/sec   Loss 0.8896   LearningRate 0.0000   Epoch: 19   Global Step: 206780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:53:08,056-Speed 5520.73 samples/sec   Loss 0.8675   LearningRate 0.0000   Epoch: 19   Global Step: 206790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:53:15,510-Speed 5495.44 samples/sec   Loss 0.8689   LearningRate 0.0000   Epoch: 19   Global Step: 206800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:53:22,943-Speed 5511.58 samples/sec   Loss 0.8688   LearningRate 0.0000   Epoch: 19   Global Step: 206810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:53:30,525-Speed 5402.93 samples/sec   Loss 0.8968   LearningRate 0.0000   Epoch: 19   Global Step: 206820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:53:37,905-Speed 5551.20 samples/sec   Loss 0.8567   LearningRate 0.0000   Epoch: 19   Global Step: 206830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:53:45,305-Speed 5535.44 samples/sec   Loss 0.8700   LearningRate 0.0000   Epoch: 19   Global Step: 206840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:53:52,708-Speed 5533.59 samples/sec   Loss 0.8684   LearningRate 0.0000   Epoch: 19   Global Step: 206850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:54:00,232-Speed 5445.35 samples/sec   Loss 0.8575   LearningRate 0.0000   Epoch: 19   Global Step: 206860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:54:07,693-Speed 5490.04 samples/sec   Loss 0.8813   LearningRate 0.0000   Epoch: 19   Global Step: 206870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:54:15,100-Speed 5531.15 samples/sec   Loss 0.8631   LearningRate 0.0000   Epoch: 19   Global Step: 206880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:54:22,508-Speed 5529.21 samples/sec   Loss 0.8635   LearningRate 0.0000   Epoch: 19   Global Step: 206890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:54:29,953-Speed 5503.24 samples/sec   Loss 0.8742   LearningRate 0.0000   Epoch: 19   Global Step: 206900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:54:37,362-Speed 5528.65 samples/sec   Loss 0.8798   LearningRate 0.0000   Epoch: 19   Global Step: 206910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:54:44,936-Speed 5409.10 samples/sec   Loss 0.8810   LearningRate 0.0000   Epoch: 19   Global Step: 206920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:54:52,359-Speed 5518.95 samples/sec   Loss 0.8727   LearningRate 0.0000   Epoch: 19   Global Step: 206930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:54:59,831-Speed 5482.24 samples/sec   Loss 0.8706   LearningRate 0.0000   Epoch: 19   Global Step: 206940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:55:07,220-Speed 5544.00 samples/sec   Loss 0.8785   LearningRate 0.0000   Epoch: 19   Global Step: 206950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:55:14,622-Speed 5534.97 samples/sec   Loss 0.8737   LearningRate 0.0000   Epoch: 19   Global Step: 206960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:55:22,336-Speed 5310.06 samples/sec   Loss 0.8756   LearningRate 0.0000   Epoch: 19   Global Step: 206970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:55:30,136-Speed 5252.32 samples/sec   Loss 0.8611   LearningRate 0.0000   Epoch: 19   Global Step: 206980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:55:37,743-Speed 5385.11 samples/sec   Loss 0.8601   LearningRate 0.0000   Epoch: 19   Global Step: 206990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:55:45,156-Speed 5526.41 samples/sec   Loss 0.8796   LearningRate 0.0000   Epoch: 19   Global Step: 207000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 17:55:52,643-Speed 5471.80 samples/sec   Loss 0.8752   LearningRate 0.0000   Epoch: 19   Global Step: 207010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:56:00,063-Speed 5520.29 samples/sec   Loss 0.8794   LearningRate 0.0000   Epoch: 19   Global Step: 207020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:56:07,517-Speed 5495.97 samples/sec   Loss 0.8744   LearningRate 0.0000   Epoch: 19   Global Step: 207030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:56:15,078-Speed 5418.29 samples/sec   Loss 0.8972   LearningRate 0.0000   Epoch: 19   Global Step: 207040   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:56:22,740-Speed 5346.71 samples/sec   Loss 0.8884   LearningRate 0.0000   Epoch: 19   Global Step: 207050   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:56:30,446-Speed 5316.02 samples/sec   Loss 0.8682   LearningRate 0.0000   Epoch: 19   Global Step: 207060   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:56:38,039-Speed 5395.09 samples/sec   Loss 0.8746   LearningRate 0.0000   Epoch: 19   Global Step: 207070   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:56:45,645-Speed 5385.79 samples/sec   Loss 0.8822   LearningRate 0.0000   Epoch: 19   Global Step: 207080   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:56:53,243-Speed 5391.52 samples/sec   Loss 0.8694   LearningRate 0.0000   Epoch: 19   Global Step: 207090   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:57:00,866-Speed 5373.94 samples/sec   Loss 0.8862   LearningRate 0.0000   Epoch: 19   Global Step: 207100   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:57:08,406-Speed 5433.19 samples/sec   Loss 0.8929   LearningRate 0.0000   Epoch: 19   Global Step: 207110   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:57:15,984-Speed 5405.92 samples/sec   Loss 0.8719   LearningRate 0.0000   Epoch: 19   Global Step: 207120   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:57:23,543-Speed 5418.60 samples/sec   Loss 0.8692   LearningRate 0.0000   Epoch: 19   Global Step: 207130   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:57:30,966-Speed 5519.46 samples/sec   Loss 0.8681   LearningRate 0.0000   Epoch: 19   Global Step: 207140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:57:38,499-Speed 5438.15 samples/sec   Loss 0.8705   LearningRate 0.0000   Epoch: 19   Global Step: 207150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:57:45,947-Speed 5499.96 samples/sec   Loss 0.8641   LearningRate 0.0000   Epoch: 19   Global Step: 207160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:57:53,381-Speed 5510.62 samples/sec   Loss 0.8891   LearningRate 0.0000   Epoch: 19   Global Step: 207170   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:58:00,801-Speed 5520.45 samples/sec   Loss 0.8711   LearningRate 0.0000   Epoch: 19   Global Step: 207180   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:58:08,226-Speed 5517.71 samples/sec   Loss 0.8790   LearningRate 0.0000   Epoch: 19   Global Step: 207190   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:58:15,650-Speed 5517.70 samples/sec   Loss 0.8652   LearningRate 0.0000   Epoch: 19   Global Step: 207200   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:58:23,056-Speed 5531.15 samples/sec   Loss 0.8701   LearningRate 0.0000   Epoch: 19   Global Step: 207210   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:58:30,606-Speed 5425.81 samples/sec   Loss 0.8779   LearningRate 0.0000   Epoch: 19   Global Step: 207220   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:58:38,087-Speed 5475.79 samples/sec   Loss 0.8842   LearningRate 0.0000   Epoch: 19   Global Step: 207230   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:58:45,483-Speed 5539.10 samples/sec   Loss 0.8616   LearningRate 0.0000   Epoch: 19   Global Step: 207240   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:58:52,921-Speed 5507.58 samples/sec   Loss 0.8714   LearningRate 0.0000   Epoch: 19   Global Step: 207250   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:59:00,367-Speed 5501.64 samples/sec   Loss 0.8804   LearningRate 0.0000   Epoch: 19   Global Step: 207260   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 17:59:07,827-Speed 5491.28 samples/sec   Loss 0.8747   LearningRate 0.0000   Epoch: 19   Global Step: 207270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:59:15,268-Speed 5505.76 samples/sec   Loss 0.8707   LearningRate 0.0000   Epoch: 19   Global Step: 207280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:59:22,772-Speed 5459.30 samples/sec   Loss 0.8572   LearningRate 0.0000   Epoch: 19   Global Step: 207290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:59:30,228-Speed 5494.36 samples/sec   Loss 0.8690   LearningRate 0.0000   Epoch: 19   Global Step: 207300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:59:37,791-Speed 5416.53 samples/sec   Loss 0.8708   LearningRate 0.0000   Epoch: 19   Global Step: 207310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:59:45,245-Speed 5495.67 samples/sec   Loss 0.8849   LearningRate 0.0000   Epoch: 19   Global Step: 207320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 17:59:52,718-Speed 5481.61 samples/sec   Loss 0.8631   LearningRate 0.0000   Epoch: 19   Global Step: 207330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 18:00:00,263-Speed 5429.54 samples/sec   Loss 0.8702   LearningRate 0.0000   Epoch: 19   Global Step: 207340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 18:00:07,896-Speed 5367.24 samples/sec   Loss 0.8911   LearningRate 0.0000   Epoch: 19   Global Step: 207350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 18:00:15,321-Speed 5517.24 samples/sec   Loss 0.8496   LearningRate 0.0000   Epoch: 19   Global Step: 207360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 18:00:22,685-Speed 5562.67 samples/sec   Loss 0.8369   LearningRate 0.0000   Epoch: 19   Global Step: 207370   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 18:00:30,122-Speed 5508.61 samples/sec   Loss 0.8911   LearningRate 0.0000   Epoch: 19   Global Step: 207380   Fp16 Grad Scale: 16384   Required: -0 hours
Training: 2022-01-09 18:00:37,532-Speed 5527.72 samples/sec   Loss 0.8652   LearningRate 0.0000   Epoch: 19   Global Step: 207390   Fp16 Grad Scale: 16384   Required: -0 hours