Training: 2022-04-10 23:13:34,727-rank_id: 0
Training: 2022-04-10 23:13:47,955-: margin_list              [1.0, 0.5, 0.0]
Training: 2022-04-10 23:13:47,956-: network                  mbf
Training: 2022-04-10 23:13:47,956-: resume                   False
Training: 2022-04-10 23:13:47,956-: output                   work_dirs/ms1mv3_mbf
Training: 2022-04-10 23:13:47,956-: embedding_size           512
Training: 2022-04-10 23:13:47,956-: sample_rate              1.0
Training: 2022-04-10 23:13:47,956-: interclass_filtering_threshold 0
Training: 2022-04-10 23:13:47,956-: fp16                     True
Training: 2022-04-10 23:13:47,956-: batch_size               128
Training: 2022-04-10 23:13:47,956-: optimizer                sgd
Training: 2022-04-10 23:13:47,956-: lr                       0.1
Training: 2022-04-10 23:13:47,956-: momentum                 0.9
Training: 2022-04-10 23:13:47,956-: weight_decay             0.0001
Training: 2022-04-10 23:13:47,956-: verbose                  2000
Training: 2022-04-10 23:13:47,956-: frequent                 10
Training: 2022-04-10 23:13:47,956-: dali                     False
Training: 2022-04-10 23:13:47,957-: rec                      /train_tmp/ms1m-retinaface-t1
Training: 2022-04-10 23:13:47,957-: num_classes              93431
Training: 2022-04-10 23:13:47,957-: num_image                5179510
Training: 2022-04-10 23:13:47,957-: num_epoch                40
Training: 2022-04-10 23:13:47,957-: warmup_epoch             0
Training: 2022-04-10 23:13:47,957-: val_targets              ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-04-10 23:13:47,957-: total_batch_size         1024
Training: 2022-04-10 23:13:47,957-: warmup_step              0
Training: 2022-04-10 23:13:47,957-: total_step               202320
Training: 2022-04-10 23:14:59,967-Reducer buckets have been rebuilt in this iteration.
Training: 2022-04-10 23:15:01,747-Speed 10488.17 samples/sec   Loss 46.1264   LearningRate 0.1000   Epoch: 0   Global Step: 20   Fp16 Grad Scale: 4096   Required: 40 hours
Training: 2022-04-10 23:15:02,779-Speed 9929.62 samples/sec   Loss 46.3272   LearningRate 0.1000   Epoch: 0   Global Step: 30   Fp16 Grad Scale: 4096   Required: 29 hours
Training: 2022-04-10 23:15:03,773-Speed 10309.45 samples/sec   Loss 46.7615   LearningRate 0.1000   Epoch: 0   Global Step: 40   Fp16 Grad Scale: 4096   Required: 23 hours
Training: 2022-04-10 23:15:04,754-Speed 10448.87 samples/sec   Loss 46.6888   LearningRate 0.1000   Epoch: 0   Global Step: 50   Fp16 Grad Scale: 4096   Required: 20 hours
Training: 2022-04-10 23:15:05,720-Speed 10607.62 samples/sec   Loss 47.2198   LearningRate 0.0999   Epoch: 0   Global Step: 60   Fp16 Grad Scale: 4096   Required: 18 hours
Training: 2022-04-10 23:15:06,719-Speed 10263.25 samples/sec   Loss 47.3228   LearningRate 0.0999   Epoch: 0   Global Step: 70   Fp16 Grad Scale: 4096   Required: 16 hours
Training: 2022-04-10 23:15:07,699-Speed 10464.45 samples/sec   Loss 47.3204   LearningRate 0.0999   Epoch: 0   Global Step: 80   Fp16 Grad Scale: 4096   Required: 15 hours
Training: 2022-04-10 23:15:08,645-Speed 10835.84 samples/sec   Loss 47.4220   LearningRate 0.0999   Epoch: 0   Global Step: 90   Fp16 Grad Scale: 4096   Required: 14 hours
Training: 2022-04-10 23:15:09,623-Speed 10474.74 samples/sec   Loss 47.2313   LearningRate 0.0999   Epoch: 0   Global Step: 100   Fp16 Grad Scale: 4096   Required: 13 hours
Training: 2022-04-10 23:15:10,638-Speed 10099.52 samples/sec   Loss 47.2685   LearningRate 0.0999   Epoch: 0   Global Step: 110   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-10 23:15:11,633-Speed 10304.79 samples/sec   Loss 47.1208   LearningRate 0.0999   Epoch: 0   Global Step: 120   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-10 23:15:12,669-Speed 9893.36 samples/sec   Loss 47.2172   LearningRate 0.0999   Epoch: 0   Global Step: 130   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-10 23:15:13,617-Speed 10807.71 samples/sec   Loss 47.1149   LearningRate 0.0999   Epoch: 0   Global Step: 140   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-10 23:15:14,602-Speed 10409.27 samples/sec   Loss 46.9971   LearningRate 0.0999   Epoch: 0   Global Step: 150   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-10 23:15:15,563-Speed 10662.65 samples/sec   Loss 46.8688   LearningRate 0.0998   Epoch: 0   Global Step: 160   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-10 23:15:16,566-Speed 10223.33 samples/sec   Loss 46.8709   LearningRate 0.0998   Epoch: 0   Global Step: 170   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-10 23:15:17,520-Speed 10742.72 samples/sec   Loss 46.8352   LearningRate 0.0998   Epoch: 0   Global Step: 180   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-10 23:15:18,492-Speed 10548.95 samples/sec   Loss 46.8928   LearningRate 0.0998   Epoch: 0   Global Step: 190   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-10 23:15:19,453-Speed 10669.19 samples/sec   Loss 46.7725   LearningRate 0.0998   Epoch: 0   Global Step: 200   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-10 23:15:20,436-Speed 10420.65 samples/sec   Loss 46.6891   LearningRate 0.0998   Epoch: 0   Global Step: 210   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-10 23:15:21,476-Speed 9859.43 samples/sec   Loss 46.5328   LearningRate 0.0998   Epoch: 0   Global Step: 220   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-10 23:15:22,452-Speed 10504.03 samples/sec   Loss 46.5115   LearningRate 0.0998   Epoch: 0   Global Step: 230   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-10 23:15:23,432-Speed 10453.11 samples/sec   Loss 46.5059   LearningRate 0.0998   Epoch: 0   Global Step: 240   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-10 23:15:24,393-Speed 10666.70 samples/sec   Loss 46.2008   LearningRate 0.0998   Epoch: 0   Global Step: 250   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-10 23:15:25,338-Speed 10848.41 samples/sec   Loss 46.2478   LearningRate 0.0997   Epoch: 0   Global Step: 260   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-10 23:15:26,303-Speed 10636.37 samples/sec   Loss 46.2209   LearningRate 0.0997   Epoch: 0   Global Step: 270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-10 23:15:27,239-Speed 10948.23 samples/sec   Loss 46.2827   LearningRate 0.0997   Epoch: 0   Global Step: 280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-10 23:15:28,187-Speed 10811.01 samples/sec   Loss 46.1135   LearningRate 0.0997   Epoch: 0   Global Step: 290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-10 23:15:29,136-Speed 10800.03 samples/sec   Loss 46.0992   LearningRate 0.0997   Epoch: 0   Global Step: 300   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-10 23:15:30,119-Speed 10433.94 samples/sec   Loss 45.8585   LearningRate 0.0997   Epoch: 0   Global Step: 310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-10 23:15:31,079-Speed 10678.26 samples/sec   Loss 45.7198   LearningRate 0.0997   Epoch: 0   Global Step: 320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-10 23:15:32,006-Speed 11049.26 samples/sec   Loss 45.7071   LearningRate 0.0997   Epoch: 0   Global Step: 330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-10 23:15:32,995-Speed 10363.55 samples/sec   Loss 45.6671   LearningRate 0.0997   Epoch: 0   Global Step: 340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-10 23:15:34,017-Speed 10035.58 samples/sec   Loss 45.4988   LearningRate 0.0997   Epoch: 0   Global Step: 350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-10 23:15:34,950-Speed 10985.79 samples/sec   Loss 45.5386   LearningRate 0.0996   Epoch: 0   Global Step: 360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-10 23:15:35,907-Speed 10721.17 samples/sec   Loss 45.3846   LearningRate 0.0996   Epoch: 0   Global Step: 370   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:15:36,855-Speed 10800.68 samples/sec   Loss 45.3462   LearningRate 0.0996   Epoch: 0   Global Step: 380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:15:37,806-Speed 10786.46 samples/sec   Loss 45.2252   LearningRate 0.0996   Epoch: 0   Global Step: 390   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:15:38,751-Speed 10849.11 samples/sec   Loss 45.2969   LearningRate 0.0996   Epoch: 0   Global Step: 400   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:15:39,750-Speed 10260.56 samples/sec   Loss 45.1653   LearningRate 0.0996   Epoch: 0   Global Step: 410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:15:40,734-Speed 10407.97 samples/sec   Loss 45.0193   LearningRate 0.0996   Epoch: 0   Global Step: 420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:15:41,756-Speed 10034.78 samples/sec   Loss 45.0363   LearningRate 0.0996   Epoch: 0   Global Step: 430   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:15:42,715-Speed 10692.83 samples/sec   Loss 45.0652   LearningRate 0.0996   Epoch: 0   Global Step: 440   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:15:43,690-Speed 10510.88 samples/sec   Loss 44.9884   LearningRate 0.0996   Epoch: 0   Global Step: 450   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:15:44,668-Speed 10481.84 samples/sec   Loss 44.9003   LearningRate 0.0995   Epoch: 0   Global Step: 460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:15:45,625-Speed 10706.77 samples/sec   Loss 44.8525   LearningRate 0.0995   Epoch: 0   Global Step: 470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:15:46,598-Speed 10544.85 samples/sec   Loss 44.7704   LearningRate 0.0995   Epoch: 0   Global Step: 480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:15:47,550-Speed 10759.41 samples/sec   Loss 44.6254   LearningRate 0.0995   Epoch: 0   Global Step: 490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:15:48,511-Speed 10658.38 samples/sec   Loss 44.5961   LearningRate 0.0995   Epoch: 0   Global Step: 500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:15:49,511-Speed 10259.83 samples/sec   Loss 44.5222   LearningRate 0.0995   Epoch: 0   Global Step: 510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:15:50,486-Speed 10519.06 samples/sec   Loss 44.4704   LearningRate 0.0995   Epoch: 0   Global Step: 520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:15:51,463-Speed 10505.37 samples/sec   Loss 44.3328   LearningRate 0.0995   Epoch: 0   Global Step: 530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:15:52,415-Speed 10757.11 samples/sec   Loss 44.3342   LearningRate 0.0995   Epoch: 0   Global Step: 540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:15:53,363-Speed 10812.16 samples/sec   Loss 44.4249   LearningRate 0.0995   Epoch: 0   Global Step: 550   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:15:54,326-Speed 10651.42 samples/sec   Loss 44.2405   LearningRate 0.0994   Epoch: 0   Global Step: 560   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:15:55,287-Speed 10666.47 samples/sec   Loss 44.1747   LearningRate 0.0994   Epoch: 0   Global Step: 570   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:15:56,308-Speed 10034.41 samples/sec   Loss 44.0491   LearningRate 0.0994   Epoch: 0   Global Step: 580   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:15:57,504-Speed 8569.46 samples/sec   Loss 43.9550   LearningRate 0.0994   Epoch: 0   Global Step: 590   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:15:58,586-Speed 9476.87 samples/sec   Loss 43.8987   LearningRate 0.0994   Epoch: 0   Global Step: 600   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:15:59,556-Speed 10565.84 samples/sec   Loss 43.8575   LearningRate 0.0994   Epoch: 0   Global Step: 610   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:16:00,497-Speed 10882.98 samples/sec   Loss 43.7689   LearningRate 0.0994   Epoch: 0   Global Step: 620   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:16:01,463-Speed 10612.66 samples/sec   Loss 43.6588   LearningRate 0.0994   Epoch: 0   Global Step: 630   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:16:02,384-Speed 11135.28 samples/sec   Loss 43.6638   LearningRate 0.0994   Epoch: 0   Global Step: 640   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:16:03,410-Speed 9988.83 samples/sec   Loss 43.5875   LearningRate 0.0994   Epoch: 0   Global Step: 650   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:16:04,435-Speed 10000.32 samples/sec   Loss 43.5359   LearningRate 0.0993   Epoch: 0   Global Step: 660   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:16:05,380-Speed 10836.32 samples/sec   Loss 43.3980   LearningRate 0.0993   Epoch: 0   Global Step: 670   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:16:06,363-Speed 10439.40 samples/sec   Loss 43.3444   LearningRate 0.0993   Epoch: 0   Global Step: 680   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:16:07,316-Speed 10754.01 samples/sec   Loss 43.2464   LearningRate 0.0993   Epoch: 0   Global Step: 690   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:16:08,297-Speed 10452.30 samples/sec   Loss 43.2405   LearningRate 0.0993   Epoch: 0   Global Step: 700   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:16:09,283-Speed 10395.90 samples/sec   Loss 43.1416   LearningRate 0.0993   Epoch: 0   Global Step: 710   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:16:10,325-Speed 9845.66 samples/sec   Loss 43.1085   LearningRate 0.0993   Epoch: 0   Global Step: 720   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-10 23:16:11,270-Speed 10841.58 samples/sec   Loss 42.9506   LearningRate 0.0993   Epoch: 0   Global Step: 730   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-10 23:16:12,211-Speed 10895.16 samples/sec   Loss 42.9289   LearningRate 0.0993   Epoch: 0   Global Step: 740   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-10 23:16:13,182-Speed 10554.43 samples/sec   Loss 42.9364   LearningRate 0.0993   Epoch: 0   Global Step: 750   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-10 23:16:14,164-Speed 10439.25 samples/sec   Loss 42.8856   LearningRate 0.0993   Epoch: 0   Global Step: 760   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-10 23:16:15,137-Speed 10534.33 samples/sec   Loss 42.7881   LearningRate 0.0992   Epoch: 0   Global Step: 770   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-10 23:16:16,150-Speed 10120.68 samples/sec   Loss 42.6772   LearningRate 0.0992   Epoch: 0   Global Step: 780   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-10 23:16:17,103-Speed 10758.19 samples/sec   Loss 42.6628   LearningRate 0.0992   Epoch: 0   Global Step: 790   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-10 23:16:18,047-Speed 10850.14 samples/sec   Loss 42.5366   LearningRate 0.0992   Epoch: 0   Global Step: 800   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-10 23:16:19,028-Speed 10446.60 samples/sec   Loss 42.4811   LearningRate 0.0992   Epoch: 0   Global Step: 810   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-10 23:16:20,035-Speed 10185.40 samples/sec   Loss 42.4423   LearningRate 0.0992   Epoch: 0   Global Step: 820   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-10 23:16:21,001-Speed 10614.65 samples/sec   Loss 42.3709   LearningRate 0.0992   Epoch: 0   Global Step: 830   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-10 23:16:21,994-Speed 10314.06 samples/sec   Loss 42.2214   LearningRate 0.0992   Epoch: 0   Global Step: 840   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-10 23:16:22,958-Speed 10636.38 samples/sec   Loss 42.1903   LearningRate 0.0992   Epoch: 0   Global Step: 850   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-10 23:16:23,967-Speed 10161.19 samples/sec   Loss 42.1356   LearningRate 0.0992   Epoch: 0   Global Step: 860   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-10 23:16:24,946-Speed 10471.22 samples/sec   Loss 42.0135   LearningRate 0.0991   Epoch: 0   Global Step: 870   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-10 23:16:25,920-Speed 10520.53 samples/sec   Loss 42.0003   LearningRate 0.0991   Epoch: 0   Global Step: 880   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-10 23:16:26,874-Speed 10748.87 samples/sec   Loss 41.8516   LearningRate 0.0991   Epoch: 0   Global Step: 890   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-10 23:16:27,885-Speed 10138.57 samples/sec   Loss 41.8064   LearningRate 0.0991   Epoch: 0   Global Step: 900   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-10 23:16:28,865-Speed 10462.55 samples/sec   Loss 41.7326   LearningRate 0.0991   Epoch: 0   Global Step: 910   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-10 23:16:29,829-Speed 10626.18 samples/sec   Loss 41.6579   LearningRate 0.0991   Epoch: 0   Global Step: 920   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-10 23:16:30,767-Speed 10933.41 samples/sec   Loss 41.5986   LearningRate 0.0991   Epoch: 0   Global Step: 930   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-10 23:16:31,763-Speed 10290.26 samples/sec   Loss 41.5365   LearningRate 0.0991   Epoch: 0   Global Step: 940   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-10 23:16:32,722-Speed 10681.00 samples/sec   Loss 41.4625   LearningRate 0.0991   Epoch: 0   Global Step: 950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:16:33,660-Speed 10929.54 samples/sec   Loss 41.4007   LearningRate 0.0991   Epoch: 0   Global Step: 960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:16:34,615-Speed 10756.48 samples/sec   Loss 41.2646   LearningRate 0.0990   Epoch: 0   Global Step: 970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:16:35,593-Speed 10477.52 samples/sec   Loss 41.3153   LearningRate 0.0990   Epoch: 0   Global Step: 980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:16:36,578-Speed 10402.30 samples/sec   Loss 41.1768   LearningRate 0.0990   Epoch: 0   Global Step: 990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:16:37,563-Speed 10406.10 samples/sec   Loss 41.0795   LearningRate 0.0990   Epoch: 0   Global Step: 1000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:16:38,552-Speed 10361.47 samples/sec   Loss 41.0342   LearningRate 0.0990   Epoch: 0   Global Step: 1010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:16:39,542-Speed 10369.67 samples/sec   Loss 40.9754   LearningRate 0.0990   Epoch: 0   Global Step: 1020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:16:40,514-Speed 10535.82 samples/sec   Loss 40.9016   LearningRate 0.0990   Epoch: 0   Global Step: 1030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:16:41,494-Speed 10466.85 samples/sec   Loss 40.8478   LearningRate 0.0990   Epoch: 0   Global Step: 1040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:16:42,461-Speed 10594.66 samples/sec   Loss 40.8183   LearningRate 0.0990   Epoch: 0   Global Step: 1050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:16:43,436-Speed 10519.01 samples/sec   Loss 40.7581   LearningRate 0.0990   Epoch: 0   Global Step: 1060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:16:44,407-Speed 10551.45 samples/sec   Loss 40.5397   LearningRate 0.0989   Epoch: 0   Global Step: 1070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:16:45,348-Speed 10889.92 samples/sec   Loss 40.4850   LearningRate 0.0989   Epoch: 0   Global Step: 1080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:16:46,294-Speed 10834.82 samples/sec   Loss 40.4037   LearningRate 0.0989   Epoch: 0   Global Step: 1090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:16:47,287-Speed 10320.65 samples/sec   Loss 40.3846   LearningRate 0.0989   Epoch: 0   Global Step: 1100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:16:48,273-Speed 10397.57 samples/sec   Loss 40.3420   LearningRate 0.0989   Epoch: 0   Global Step: 1110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:16:49,257-Speed 10435.42 samples/sec   Loss 40.2671   LearningRate 0.0989   Epoch: 0   Global Step: 1120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:16:50,243-Speed 10389.82 samples/sec   Loss 40.1110   LearningRate 0.0989   Epoch: 0   Global Step: 1130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:16:51,268-Speed 10009.02 samples/sec   Loss 40.1156   LearningRate 0.0989   Epoch: 0   Global Step: 1140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:16:52,251-Speed 10421.54 samples/sec   Loss 39.9954   LearningRate 0.0989   Epoch: 0   Global Step: 1150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:16:53,225-Speed 10526.59 samples/sec   Loss 39.9812   LearningRate 0.0989   Epoch: 0   Global Step: 1160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:16:54,182-Speed 10701.50 samples/sec   Loss 39.8716   LearningRate 0.0988   Epoch: 0   Global Step: 1170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:16:55,150-Speed 10589.13 samples/sec   Loss 39.7660   LearningRate 0.0988   Epoch: 0   Global Step: 1180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:16:56,141-Speed 10344.04 samples/sec   Loss 39.7295   LearningRate 0.0988   Epoch: 0   Global Step: 1190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:16:57,113-Speed 10547.97 samples/sec   Loss 39.6910   LearningRate 0.0988   Epoch: 0   Global Step: 1200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:16:58,099-Speed 10396.47 samples/sec   Loss 39.5803   LearningRate 0.0988   Epoch: 0   Global Step: 1210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:16:59,074-Speed 10502.80 samples/sec   Loss 39.5704   LearningRate 0.0988   Epoch: 0   Global Step: 1220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:00,065-Speed 10345.50 samples/sec   Loss 39.4682   LearningRate 0.0988   Epoch: 0   Global Step: 1230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:01,028-Speed 10649.66 samples/sec   Loss 39.3373   LearningRate 0.0988   Epoch: 0   Global Step: 1240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:01,998-Speed 10559.69 samples/sec   Loss 39.3508   LearningRate 0.0988   Epoch: 0   Global Step: 1250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:03,027-Speed 9963.81 samples/sec   Loss 39.1333   LearningRate 0.0988   Epoch: 0   Global Step: 1260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:03,992-Speed 10628.00 samples/sec   Loss 39.1415   LearningRate 0.0987   Epoch: 0   Global Step: 1270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:17:04,961-Speed 10578.20 samples/sec   Loss 39.1811   LearningRate 0.0987   Epoch: 0   Global Step: 1280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:17:05,914-Speed 10760.78 samples/sec   Loss 39.0271   LearningRate 0.0987   Epoch: 0   Global Step: 1290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:17:06,879-Speed 10620.73 samples/sec   Loss 38.9204   LearningRate 0.0987   Epoch: 0   Global Step: 1300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:17:07,911-Speed 9926.84 samples/sec   Loss 39.0071   LearningRate 0.0987   Epoch: 0   Global Step: 1310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:17:08,853-Speed 10893.20 samples/sec   Loss 38.7848   LearningRate 0.0987   Epoch: 0   Global Step: 1320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:17:09,797-Speed 10855.51 samples/sec   Loss 38.6883   LearningRate 0.0987   Epoch: 0   Global Step: 1330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:17:10,791-Speed 10314.01 samples/sec   Loss 38.6453   LearningRate 0.0987   Epoch: 0   Global Step: 1340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:17:11,812-Speed 10045.48 samples/sec   Loss 38.6020   LearningRate 0.0987   Epoch: 0   Global Step: 1350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:17:12,763-Speed 10785.02 samples/sec   Loss 38.4439   LearningRate 0.0987   Epoch: 0   Global Step: 1360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:17:13,755-Speed 10330.63 samples/sec   Loss 38.4591   LearningRate 0.0987   Epoch: 0   Global Step: 1370   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-10 23:17:14,716-Speed 10666.66 samples/sec   Loss 38.3632   LearningRate 0.0986   Epoch: 0   Global Step: 1380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:17:15,651-Speed 10965.13 samples/sec   Loss 38.1804   LearningRate 0.0986   Epoch: 0   Global Step: 1390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:17:16,590-Speed 10903.65 samples/sec   Loss 38.2434   LearningRate 0.0986   Epoch: 0   Global Step: 1400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:17:17,582-Speed 10335.92 samples/sec   Loss 38.1960   LearningRate 0.0986   Epoch: 0   Global Step: 1410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:17:18,525-Speed 10876.31 samples/sec   Loss 38.1666   LearningRate 0.0986   Epoch: 0   Global Step: 1420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:17:19,497-Speed 10552.97 samples/sec   Loss 38.0879   LearningRate 0.0986   Epoch: 0   Global Step: 1430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:17:20,480-Speed 10423.63 samples/sec   Loss 38.0616   LearningRate 0.0986   Epoch: 0   Global Step: 1440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:17:21,456-Speed 10505.93 samples/sec   Loss 37.8313   LearningRate 0.0986   Epoch: 0   Global Step: 1450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:22,409-Speed 10760.04 samples/sec   Loss 37.8568   LearningRate 0.0986   Epoch: 0   Global Step: 1460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:23,318-Speed 11273.33 samples/sec   Loss 37.5178   LearningRate 0.0986   Epoch: 0   Global Step: 1470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:24,280-Speed 10654.35 samples/sec   Loss 37.5585   LearningRate 0.0985   Epoch: 0   Global Step: 1480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:25,232-Speed 10769.47 samples/sec   Loss 37.4585   LearningRate 0.0985   Epoch: 0   Global Step: 1490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:26,217-Speed 10403.30 samples/sec   Loss 37.4651   LearningRate 0.0985   Epoch: 0   Global Step: 1500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:27,205-Speed 10379.82 samples/sec   Loss 37.2871   LearningRate 0.0985   Epoch: 0   Global Step: 1510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:28,175-Speed 10562.27 samples/sec   Loss 37.2608   LearningRate 0.0985   Epoch: 0   Global Step: 1520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:29,130-Speed 10734.68 samples/sec   Loss 37.3434   LearningRate 0.0985   Epoch: 0   Global Step: 1530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:30,135-Speed 10212.66 samples/sec   Loss 37.2650   LearningRate 0.0985   Epoch: 0   Global Step: 1540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:31,065-Speed 11015.40 samples/sec   Loss 37.1931   LearningRate 0.0985   Epoch: 0   Global Step: 1550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:17:32,008-Speed 10888.28 samples/sec   Loss 37.0316   LearningRate 0.0985   Epoch: 0   Global Step: 1560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:17:32,935-Speed 11053.44 samples/sec   Loss 36.9877   LearningRate 0.0985   Epoch: 0   Global Step: 1570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:17:33,889-Speed 10743.44 samples/sec   Loss 36.7745   LearningRate 0.0984   Epoch: 0   Global Step: 1580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:17:34,851-Speed 10661.75 samples/sec   Loss 36.6830   LearningRate 0.0984   Epoch: 0   Global Step: 1590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:17:35,833-Speed 10432.61 samples/sec   Loss 36.6857   LearningRate 0.0984   Epoch: 0   Global Step: 1600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:17:36,821-Speed 10374.96 samples/sec   Loss 36.6354   LearningRate 0.0984   Epoch: 0   Global Step: 1610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:17:37,777-Speed 10721.79 samples/sec   Loss 36.7441   LearningRate 0.0984   Epoch: 0   Global Step: 1620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:17:38,739-Speed 10656.62 samples/sec   Loss 36.5864   LearningRate 0.0984   Epoch: 0   Global Step: 1630   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:17:39,721-Speed 10447.87 samples/sec   Loss 36.3974   LearningRate 0.0984   Epoch: 0   Global Step: 1640   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:17:40,685-Speed 10622.89 samples/sec   Loss 36.3303   LearningRate 0.0984   Epoch: 0   Global Step: 1650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:41,660-Speed 10518.67 samples/sec   Loss 36.1223   LearningRate 0.0984   Epoch: 0   Global Step: 1660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:42,598-Speed 10928.55 samples/sec   Loss 36.1607   LearningRate 0.0984   Epoch: 0   Global Step: 1670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:43,601-Speed 10217.41 samples/sec   Loss 36.0850   LearningRate 0.0983   Epoch: 0   Global Step: 1680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:44,593-Speed 10345.88 samples/sec   Loss 35.9617   LearningRate 0.0983   Epoch: 0   Global Step: 1690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:45,568-Speed 10510.24 samples/sec   Loss 36.0872   LearningRate 0.0983   Epoch: 0   Global Step: 1700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:46,548-Speed 10456.08 samples/sec   Loss 35.9023   LearningRate 0.0983   Epoch: 0   Global Step: 1710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:17:47,535-Speed 10381.58 samples/sec   Loss 35.8570   LearningRate 0.0983   Epoch: 0   Global Step: 1720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:17:48,495-Speed 10681.70 samples/sec   Loss 35.6730   LearningRate 0.0983   Epoch: 0   Global Step: 1730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:17:49,463-Speed 10585.24 samples/sec   Loss 35.5795   LearningRate 0.0983   Epoch: 0   Global Step: 1740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:17:50,456-Speed 10322.94 samples/sec   Loss 35.5008   LearningRate 0.0983   Epoch: 0   Global Step: 1750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:17:51,434-Speed 10483.65 samples/sec   Loss 35.5454   LearningRate 0.0983   Epoch: 0   Global Step: 1760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:17:52,424-Speed 10344.84 samples/sec   Loss 35.2855   LearningRate 0.0983   Epoch: 0   Global Step: 1770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:17:53,392-Speed 10587.80 samples/sec   Loss 35.2582   LearningRate 0.0982   Epoch: 0   Global Step: 1780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:17:54,356-Speed 10640.48 samples/sec   Loss 35.2333   LearningRate 0.0982   Epoch: 0   Global Step: 1790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:17:55,341-Speed 10409.71 samples/sec   Loss 35.2361   LearningRate 0.0982   Epoch: 0   Global Step: 1800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-10 23:17:56,314-Speed 10531.01 samples/sec   Loss 35.0334   LearningRate 0.0982   Epoch: 0   Global Step: 1810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:57,281-Speed 10592.89 samples/sec   Loss 35.0207   LearningRate 0.0982   Epoch: 0   Global Step: 1820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:58,270-Speed 10365.39 samples/sec   Loss 34.9716   LearningRate 0.0982   Epoch: 0   Global Step: 1830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:17:59,251-Speed 10450.08 samples/sec   Loss 34.7877   LearningRate 0.0982   Epoch: 0   Global Step: 1840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:18:00,220-Speed 10572.79 samples/sec   Loss 34.7186   LearningRate 0.0982   Epoch: 0   Global Step: 1850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:18:01,200-Speed 10460.25 samples/sec   Loss 34.6096   LearningRate 0.0982   Epoch: 0   Global Step: 1860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:18:02,154-Speed 10735.93 samples/sec   Loss 34.5926   LearningRate 0.0982   Epoch: 0   Global Step: 1870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:18:03,122-Speed 10591.35 samples/sec   Loss 34.6271   LearningRate 0.0982   Epoch: 0   Global Step: 1880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:18:04,051-Speed 11043.02 samples/sec   Loss 34.4743   LearningRate 0.0981   Epoch: 0   Global Step: 1890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:18:04,999-Speed 10806.78 samples/sec   Loss 34.4316   LearningRate 0.0981   Epoch: 0   Global Step: 1900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:18:05,978-Speed 10476.26 samples/sec   Loss 34.3025   LearningRate 0.0981   Epoch: 0   Global Step: 1910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:18:06,944-Speed 10602.56 samples/sec   Loss 34.0534   LearningRate 0.0981   Epoch: 0   Global Step: 1920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:18:07,915-Speed 10552.55 samples/sec   Loss 34.1829   LearningRate 0.0981   Epoch: 0   Global Step: 1930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:18:08,888-Speed 10534.85 samples/sec   Loss 34.1372   LearningRate 0.0981   Epoch: 0   Global Step: 1940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:18:09,864-Speed 10506.85 samples/sec   Loss 33.9138   LearningRate 0.0981   Epoch: 0   Global Step: 1950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:18:10,838-Speed 10516.82 samples/sec   Loss 33.8866   LearningRate 0.0981   Epoch: 0   Global Step: 1960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:18:11,831-Speed 10328.44 samples/sec   Loss 33.7189   LearningRate 0.0981   Epoch: 0   Global Step: 1970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:18:12,815-Speed 10410.00 samples/sec   Loss 33.6812   LearningRate 0.0981   Epoch: 0   Global Step: 1980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:18:13,814-Speed 10267.37 samples/sec   Loss 33.5993   LearningRate 0.0980   Epoch: 0   Global Step: 1990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:18:14,798-Speed 10417.93 samples/sec   Loss 33.3914   LearningRate 0.0980   Epoch: 0   Global Step: 2000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-10 23:18:37,531-[lfw][2000]XNorm: 17.565309
Training: 2022-04-10 23:18:37,532-[lfw][2000]Accuracy-Flip: 0.95650+-0.01221
Training: 2022-04-10 23:18:37,532-[lfw][2000]Accuracy-Highest: 0.95650
Training: 2022-04-10 23:19:03,054-[cfp_fp][2000]XNorm: 15.936612
Training: 2022-04-10 23:19:03,055-[cfp_fp][2000]Accuracy-Flip: 0.76257+-0.02101
Training: 2022-04-10 23:19:03,055-[cfp_fp][2000]Accuracy-Highest: 0.76257
Training: 2022-04-10 23:19:24,905-[agedb_30][2000]XNorm: 16.875576
Training: 2022-04-10 23:19:24,906-[agedb_30][2000]Accuracy-Flip: 0.79400+-0.02213
Training: 2022-04-10 23:19:24,906-[agedb_30][2000]Accuracy-Highest: 0.79400
Training: 2022-04-10 23:19:25,885-Speed 144.05 samples/sec   Loss 33.3834   LearningRate 0.0980   Epoch: 0   Global Step: 2010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-10 23:19:26,866-Speed 10444.06 samples/sec   Loss 33.3408   LearningRate 0.0980   Epoch: 0   Global Step: 2020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-10 23:19:27,846-Speed 10474.44 samples/sec   Loss 33.1879   LearningRate 0.0980   Epoch: 0   Global Step: 2030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-10 23:19:28,789-Speed 10864.53 samples/sec   Loss 33.2760   LearningRate 0.0980   Epoch: 0   Global Step: 2040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:19:29,732-Speed 10873.57 samples/sec   Loss 33.1392   LearningRate 0.0980   Epoch: 0   Global Step: 2050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-10 23:19:30,666-Speed 10971.07 samples/sec   Loss 33.0193   LearningRate 0.0980   Epoch: 0   Global Step: 2060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-10 23:19:31,632-Speed 10604.92 samples/sec   Loss 33.1065   LearningRate 0.0980   Epoch: 0   Global Step: 2070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-10 23:19:32,613-Speed 10455.47 samples/sec   Loss 32.8978   LearningRate 0.0980   Epoch: 0   Global Step: 2080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-10 23:19:33,559-Speed 10908.76 samples/sec   Loss 32.7463   LearningRate 0.0979   Epoch: 0   Global Step: 2090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-10 23:19:34,526-Speed 10599.05 samples/sec   Loss 32.7467   LearningRate 0.0979   Epoch: 0   Global Step: 2100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-10 23:19:35,471-Speed 10842.07 samples/sec   Loss 32.5321   LearningRate 0.0979   Epoch: 0   Global Step: 2110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-10 23:19:36,444-Speed 10538.16 samples/sec   Loss 32.7364   LearningRate 0.0979   Epoch: 0   Global Step: 2120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-10 23:19:37,424-Speed 10454.01 samples/sec   Loss 32.4740   LearningRate 0.0979   Epoch: 0   Global Step: 2130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-10 23:19:38,403-Speed 10475.52 samples/sec   Loss 32.4728   LearningRate 0.0979   Epoch: 0   Global Step: 2140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-10 23:19:39,359-Speed 10721.96 samples/sec   Loss 32.3214   LearningRate 0.0979   Epoch: 0   Global Step: 2150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-10 23:19:40,343-Speed 10411.99 samples/sec   Loss 32.1928   LearningRate 0.0979   Epoch: 0   Global Step: 2160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-10 23:19:41,345-Speed 10241.77 samples/sec   Loss 31.9726   LearningRate 0.0979   Epoch: 0   Global Step: 2170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-10 23:19:42,301-Speed 10717.43 samples/sec   Loss 32.1697   LearningRate 0.0979   Epoch: 0   Global Step: 2180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-10 23:19:43,235-Speed 10978.08 samples/sec   Loss 31.8638   LearningRate 0.0978   Epoch: 0   Global Step: 2190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-10 23:19:44,175-Speed 10901.54 samples/sec   Loss 31.9664   LearningRate 0.0978   Epoch: 0   Global Step: 2200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-10 23:19:45,137-Speed 10654.96 samples/sec   Loss 31.9122   LearningRate 0.0978   Epoch: 0   Global Step: 2210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-10 23:19:46,142-Speed 10200.75 samples/sec   Loss 31.7005   LearningRate 0.0978   Epoch: 0   Global Step: 2220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:19:47,112-Speed 10566.69 samples/sec   Loss 31.5635   LearningRate 0.0978   Epoch: 0   Global Step: 2230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:19:48,070-Speed 10698.85 samples/sec   Loss 31.4894   LearningRate 0.0978   Epoch: 0   Global Step: 2240   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:19:49,028-Speed 10696.43 samples/sec   Loss 31.5965   LearningRate 0.0978   Epoch: 0   Global Step: 2250   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:19:49,973-Speed 10851.63 samples/sec   Loss 31.4793   LearningRate 0.0978   Epoch: 0   Global Step: 2260   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:19:50,941-Speed 10576.28 samples/sec   Loss 31.4661   LearningRate 0.0978   Epoch: 0   Global Step: 2270   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:19:51,979-Speed 9876.77 samples/sec   Loss 31.2601   LearningRate 0.0978   Epoch: 0   Global Step: 2280   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:19:52,978-Speed 10261.51 samples/sec   Loss 31.2672   LearningRate 0.0977   Epoch: 0   Global Step: 2290   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:19:53,944-Speed 10617.34 samples/sec   Loss 31.1772   LearningRate 0.0977   Epoch: 0   Global Step: 2300   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:19:54,899-Speed 10727.49 samples/sec   Loss 31.0174   LearningRate 0.0977   Epoch: 0   Global Step: 2310   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:19:55,857-Speed 10706.50 samples/sec   Loss 30.9306   LearningRate 0.0977   Epoch: 0   Global Step: 2320   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:19:56,821-Speed 10624.35 samples/sec   Loss 30.9291   LearningRate 0.0977   Epoch: 0   Global Step: 2330   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:19:57,779-Speed 10701.96 samples/sec   Loss 30.8972   LearningRate 0.0977   Epoch: 0   Global Step: 2340   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:19:58,739-Speed 10674.87 samples/sec   Loss 30.6853   LearningRate 0.0977   Epoch: 0   Global Step: 2350   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:19:59,695-Speed 10733.08 samples/sec   Loss 30.6374   LearningRate 0.0977   Epoch: 0   Global Step: 2360   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:20:00,629-Speed 10961.53 samples/sec   Loss 30.5468   LearningRate 0.0977   Epoch: 0   Global Step: 2370   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:20:01,666-Speed 9887.39 samples/sec   Loss 30.5170   LearningRate 0.0977   Epoch: 0   Global Step: 2380   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:20:02,618-Speed 10789.25 samples/sec   Loss 30.2851   LearningRate 0.0977   Epoch: 0   Global Step: 2390   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:20:03,598-Speed 10457.19 samples/sec   Loss 30.2073   LearningRate 0.0976   Epoch: 0   Global Step: 2400   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:20:04,562-Speed 10636.43 samples/sec   Loss 30.4106   LearningRate 0.0976   Epoch: 0   Global Step: 2410   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:20:05,518-Speed 10715.65 samples/sec   Loss 30.2539   LearningRate 0.0976   Epoch: 0   Global Step: 2420   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:20:06,470-Speed 10765.03 samples/sec   Loss 30.1373   LearningRate 0.0976   Epoch: 0   Global Step: 2430   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:20:07,504-Speed 9908.37 samples/sec   Loss 30.0500   LearningRate 0.0976   Epoch: 0   Global Step: 2440   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:20:08,448-Speed 10861.26 samples/sec   Loss 29.8653   LearningRate 0.0976   Epoch: 0   Global Step: 2450   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:20:09,418-Speed 10570.16 samples/sec   Loss 29.6931   LearningRate 0.0976   Epoch: 0   Global Step: 2460   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:20:10,447-Speed 9956.55 samples/sec   Loss 29.7085   LearningRate 0.0976   Epoch: 0   Global Step: 2470   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:20:11,411-Speed 10645.54 samples/sec   Loss 29.6928   LearningRate 0.0976   Epoch: 0   Global Step: 2480   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:20:12,394-Speed 10427.66 samples/sec   Loss 29.5389   LearningRate 0.0976   Epoch: 0   Global Step: 2490   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:20:13,362-Speed 10589.79 samples/sec   Loss 29.6222   LearningRate 0.0975   Epoch: 0   Global Step: 2500   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:20:14,320-Speed 10702.31 samples/sec   Loss 29.3424   LearningRate 0.0975   Epoch: 0   Global Step: 2510   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:20:15,273-Speed 10748.63 samples/sec   Loss 29.3673   LearningRate 0.0975   Epoch: 0   Global Step: 2520   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:20:16,239-Speed 10618.25 samples/sec   Loss 29.2888   LearningRate 0.0975   Epoch: 0   Global Step: 2530   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:20:17,218-Speed 10457.34 samples/sec   Loss 29.2546   LearningRate 0.0975   Epoch: 0   Global Step: 2540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:20:18,186-Speed 10593.15 samples/sec   Loss 29.2359   LearningRate 0.0975   Epoch: 0   Global Step: 2550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:20:19,113-Speed 11059.48 samples/sec   Loss 29.0587   LearningRate 0.0975   Epoch: 0   Global Step: 2560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:20:20,080-Speed 10601.11 samples/sec   Loss 28.9319   LearningRate 0.0975   Epoch: 0   Global Step: 2570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:20:21,049-Speed 10573.51 samples/sec   Loss 29.0080   LearningRate 0.0975   Epoch: 0   Global Step: 2580   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:20:22,036-Speed 10381.55 samples/sec   Loss 28.9004   LearningRate 0.0975   Epoch: 0   Global Step: 2590   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:20:22,985-Speed 10804.19 samples/sec   Loss 28.7417   LearningRate 0.0974   Epoch: 0   Global Step: 2600   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:20:23,961-Speed 10501.29 samples/sec   Loss 28.6494   LearningRate 0.0974   Epoch: 0   Global Step: 2610   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:20:24,929-Speed 10585.17 samples/sec   Loss 28.5531   LearningRate 0.0974   Epoch: 0   Global Step: 2620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:20:25,895-Speed 10608.56 samples/sec   Loss 28.5718   LearningRate 0.0974   Epoch: 0   Global Step: 2630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:20:26,878-Speed 10427.59 samples/sec   Loss 28.5392   LearningRate 0.0974   Epoch: 0   Global Step: 2640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:20:27,816-Speed 10927.94 samples/sec   Loss 28.3881   LearningRate 0.0974   Epoch: 0   Global Step: 2650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:20:28,754-Speed 10931.57 samples/sec   Loss 28.3528   LearningRate 0.0974   Epoch: 0   Global Step: 2660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:20:29,715-Speed 10664.79 samples/sec   Loss 28.2777   LearningRate 0.0974   Epoch: 0   Global Step: 2670   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:20:30,713-Speed 10272.76 samples/sec   Loss 28.3315   LearningRate 0.0974   Epoch: 0   Global Step: 2680   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:20:31,668-Speed 10734.19 samples/sec   Loss 28.1448   LearningRate 0.0974   Epoch: 0   Global Step: 2690   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:20:32,617-Speed 10799.09 samples/sec   Loss 28.0238   LearningRate 0.0973   Epoch: 0   Global Step: 2700   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:20:33,589-Speed 10547.64 samples/sec   Loss 27.9648   LearningRate 0.0973   Epoch: 0   Global Step: 2710   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:20:34,531-Speed 10882.62 samples/sec   Loss 27.6623   LearningRate 0.0973   Epoch: 0   Global Step: 2720   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:20:35,461-Speed 11010.42 samples/sec   Loss 27.6101   LearningRate 0.0973   Epoch: 0   Global Step: 2730   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:20:36,450-Speed 10375.05 samples/sec   Loss 27.6321   LearningRate 0.0973   Epoch: 0   Global Step: 2740   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:20:37,426-Speed 10495.24 samples/sec   Loss 27.7574   LearningRate 0.0973   Epoch: 0   Global Step: 2750   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:20:38,363-Speed 10945.63 samples/sec   Loss 27.5480   LearningRate 0.0973   Epoch: 0   Global Step: 2760   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-10 23:20:39,319-Speed 10718.74 samples/sec   Loss 27.3944   LearningRate 0.0973   Epoch: 0   Global Step: 2770   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:20:40,339-Speed 10044.55 samples/sec   Loss 27.5243   LearningRate 0.0973   Epoch: 0   Global Step: 2780   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:20:41,318-Speed 10469.51 samples/sec   Loss 27.6116   LearningRate 0.0973   Epoch: 0   Global Step: 2790   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:20:42,277-Speed 10687.15 samples/sec   Loss 27.2402   LearningRate 0.0973   Epoch: 0   Global Step: 2800   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:20:43,229-Speed 10766.69 samples/sec   Loss 27.3460   LearningRate 0.0972   Epoch: 0   Global Step: 2810   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:20:44,160-Speed 11002.26 samples/sec   Loss 27.2328   LearningRate 0.0972   Epoch: 0   Global Step: 2820   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:20:45,144-Speed 10418.61 samples/sec   Loss 27.1466   LearningRate 0.0972   Epoch: 0   Global Step: 2830   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:20:46,159-Speed 10104.79 samples/sec   Loss 26.8686   LearningRate 0.0972   Epoch: 0   Global Step: 2840   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:20:47,071-Speed 11247.82 samples/sec   Loss 27.0263   LearningRate 0.0972   Epoch: 0   Global Step: 2850   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:20:47,999-Speed 11041.24 samples/sec   Loss 27.0031   LearningRate 0.0972   Epoch: 0   Global Step: 2860   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:20:48,965-Speed 10605.16 samples/sec   Loss 26.8147   LearningRate 0.0972   Epoch: 0   Global Step: 2870   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:20:49,899-Speed 10979.83 samples/sec   Loss 26.7707   LearningRate 0.0972   Epoch: 0   Global Step: 2880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:20:50,882-Speed 10428.66 samples/sec   Loss 26.7967   LearningRate 0.0972   Epoch: 0   Global Step: 2890   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:20:51,829-Speed 10820.75 samples/sec   Loss 26.8987   LearningRate 0.0972   Epoch: 0   Global Step: 2900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:20:52,796-Speed 10596.95 samples/sec   Loss 26.8072   LearningRate 0.0971   Epoch: 0   Global Step: 2910   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:20:53,732-Speed 10949.28 samples/sec   Loss 26.5361   LearningRate 0.0971   Epoch: 0   Global Step: 2920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:20:54,688-Speed 10721.48 samples/sec   Loss 26.4577   LearningRate 0.0971   Epoch: 0   Global Step: 2930   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:20:55,675-Speed 10387.81 samples/sec   Loss 26.4034   LearningRate 0.0971   Epoch: 0   Global Step: 2940   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:20:56,648-Speed 10529.20 samples/sec   Loss 26.4091   LearningRate 0.0971   Epoch: 0   Global Step: 2950   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:20:57,611-Speed 10647.37 samples/sec   Loss 26.1864   LearningRate 0.0971   Epoch: 0   Global Step: 2960   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:20:58,595-Speed 10417.70 samples/sec   Loss 26.2852   LearningRate 0.0971   Epoch: 0   Global Step: 2970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:20:59,528-Speed 10985.78 samples/sec   Loss 26.0661   LearningRate 0.0971   Epoch: 0   Global Step: 2980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:00,473-Speed 10848.09 samples/sec   Loss 25.8522   LearningRate 0.0971   Epoch: 0   Global Step: 2990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:01,469-Speed 10285.81 samples/sec   Loss 26.1644   LearningRate 0.0971   Epoch: 0   Global Step: 3000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:02,403-Speed 10975.53 samples/sec   Loss 25.8694   LearningRate 0.0970   Epoch: 0   Global Step: 3010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:03,445-Speed 9834.03 samples/sec   Loss 25.8197   LearningRate 0.0970   Epoch: 0   Global Step: 3020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:04,448-Speed 10214.77 samples/sec   Loss 25.6448   LearningRate 0.0970   Epoch: 0   Global Step: 3030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:05,432-Speed 10425.12 samples/sec   Loss 25.7225   LearningRate 0.0970   Epoch: 0   Global Step: 3040   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:06,434-Speed 10221.43 samples/sec   Loss 25.6283   LearningRate 0.0970   Epoch: 0   Global Step: 3050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:07,371-Speed 10945.21 samples/sec   Loss 25.8491   LearningRate 0.0970   Epoch: 0   Global Step: 3060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:08,335-Speed 10638.45 samples/sec   Loss 25.6028   LearningRate 0.0970   Epoch: 0   Global Step: 3070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:21:09,343-Speed 10163.99 samples/sec   Loss 25.5089   LearningRate 0.0970   Epoch: 0   Global Step: 3080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:21:10,353-Speed 10155.72 samples/sec   Loss 25.4041   LearningRate 0.0970   Epoch: 0   Global Step: 3090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:21:11,311-Speed 10698.39 samples/sec   Loss 25.3262   LearningRate 0.0970   Epoch: 0   Global Step: 3100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:21:12,273-Speed 10650.17 samples/sec   Loss 25.2903   LearningRate 0.0969   Epoch: 0   Global Step: 3110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:21:13,245-Speed 10541.51 samples/sec   Loss 25.1940   LearningRate 0.0969   Epoch: 0   Global Step: 3120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:21:14,206-Speed 10671.46 samples/sec   Loss 25.2644   LearningRate 0.0969   Epoch: 0   Global Step: 3130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:15,120-Speed 11304.75 samples/sec   Loss 25.0559   LearningRate 0.0969   Epoch: 0   Global Step: 3140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:16,070-Speed 10782.55 samples/sec   Loss 25.1211   LearningRate 0.0969   Epoch: 0   Global Step: 3150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:17,004-Speed 10977.41 samples/sec   Loss 25.0518   LearningRate 0.0969   Epoch: 0   Global Step: 3160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:17,979-Speed 10515.63 samples/sec   Loss 24.8046   LearningRate 0.0969   Epoch: 0   Global Step: 3170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:18,947-Speed 10589.50 samples/sec   Loss 24.8900   LearningRate 0.0969   Epoch: 0   Global Step: 3180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:19,870-Speed 11114.40 samples/sec   Loss 24.8516   LearningRate 0.0969   Epoch: 0   Global Step: 3190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:20,829-Speed 10680.53 samples/sec   Loss 24.8081   LearningRate 0.0969   Epoch: 0   Global Step: 3200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:21,811-Speed 10436.78 samples/sec   Loss 24.4241   LearningRate 0.0969   Epoch: 0   Global Step: 3210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:22,795-Speed 10421.78 samples/sec   Loss 24.6064   LearningRate 0.0968   Epoch: 0   Global Step: 3220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:23,728-Speed 10990.51 samples/sec   Loss 24.3956   LearningRate 0.0968   Epoch: 0   Global Step: 3230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:21:24,694-Speed 10610.29 samples/sec   Loss 24.2660   LearningRate 0.0968   Epoch: 0   Global Step: 3240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:21:25,661-Speed 10596.35 samples/sec   Loss 24.2180   LearningRate 0.0968   Epoch: 0   Global Step: 3250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:21:26,685-Speed 10007.69 samples/sec   Loss 24.2201   LearningRate 0.0968   Epoch: 0   Global Step: 3260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:21:27,653-Speed 10589.45 samples/sec   Loss 24.5137   LearningRate 0.0968   Epoch: 0   Global Step: 3270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:28,616-Speed 10647.34 samples/sec   Loss 24.1810   LearningRate 0.0968   Epoch: 0   Global Step: 3280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:29,551-Speed 10958.41 samples/sec   Loss 24.2316   LearningRate 0.0968   Epoch: 0   Global Step: 3290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:30,528-Speed 10491.10 samples/sec   Loss 24.2594   LearningRate 0.0968   Epoch: 0   Global Step: 3300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:31,477-Speed 10792.80 samples/sec   Loss 24.1367   LearningRate 0.0968   Epoch: 0   Global Step: 3310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:32,436-Speed 10689.21 samples/sec   Loss 23.8949   LearningRate 0.0967   Epoch: 0   Global Step: 3320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:33,364-Speed 11042.17 samples/sec   Loss 23.8408   LearningRate 0.0967   Epoch: 0   Global Step: 3330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:34,282-Speed 11191.55 samples/sec   Loss 23.7103   LearningRate 0.0967   Epoch: 0   Global Step: 3340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:35,265-Speed 10427.08 samples/sec   Loss 23.8530   LearningRate 0.0967   Epoch: 0   Global Step: 3350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:36,243-Speed 10477.84 samples/sec   Loss 23.7146   LearningRate 0.0967   Epoch: 0   Global Step: 3360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:21:37,187-Speed 10856.17 samples/sec   Loss 23.7332   LearningRate 0.0967   Epoch: 0   Global Step: 3370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:21:38,145-Speed 10708.85 samples/sec   Loss 23.8578   LearningRate 0.0967   Epoch: 0   Global Step: 3380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:21:39,077-Speed 10990.05 samples/sec   Loss 23.6149   LearningRate 0.0967   Epoch: 0   Global Step: 3390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:21:40,021-Speed 10849.18 samples/sec   Loss 23.7060   LearningRate 0.0967   Epoch: 0   Global Step: 3400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:21:40,993-Speed 10554.51 samples/sec   Loss 23.7814   LearningRate 0.0967   Epoch: 0   Global Step: 3410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:21:41,908-Speed 11197.11 samples/sec   Loss 23.4462   LearningRate 0.0966   Epoch: 0   Global Step: 3420   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:21:42,865-Speed 10716.52 samples/sec   Loss 23.2733   LearningRate 0.0966   Epoch: 0   Global Step: 3430   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:21:43,812-Speed 10812.73 samples/sec   Loss 23.4554   LearningRate 0.0966   Epoch: 0   Global Step: 3440   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:21:44,778-Speed 10615.59 samples/sec   Loss 23.3255   LearningRate 0.0966   Epoch: 0   Global Step: 3450   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:21:45,710-Speed 10993.22 samples/sec   Loss 23.2798   LearningRate 0.0966   Epoch: 0   Global Step: 3460   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:21:46,691-Speed 10455.95 samples/sec   Loss 23.2694   LearningRate 0.0966   Epoch: 0   Global Step: 3470   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:21:47,668-Speed 10487.26 samples/sec   Loss 23.1917   LearningRate 0.0966   Epoch: 0   Global Step: 3480   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:21:48,643-Speed 10509.10 samples/sec   Loss 23.4809   LearningRate 0.0966   Epoch: 0   Global Step: 3490   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:21:49,605-Speed 10655.48 samples/sec   Loss 22.8809   LearningRate 0.0966   Epoch: 0   Global Step: 3500   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:21:50,532-Speed 11047.54 samples/sec   Loss 23.1065   LearningRate 0.0966   Epoch: 0   Global Step: 3510   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-10 23:21:51,586-Speed 9732.80 samples/sec   Loss 22.9543   LearningRate 0.0966   Epoch: 0   Global Step: 3520   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:21:52,544-Speed 10697.38 samples/sec   Loss 22.8545   LearningRate 0.0965   Epoch: 0   Global Step: 3530   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:21:53,481-Speed 10941.17 samples/sec   Loss 22.7816   LearningRate 0.0965   Epoch: 0   Global Step: 3540   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:21:54,494-Speed 10115.59 samples/sec   Loss 22.7550   LearningRate 0.0965   Epoch: 0   Global Step: 3550   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:21:55,443-Speed 10794.36 samples/sec   Loss 22.7345   LearningRate 0.0965   Epoch: 0   Global Step: 3560   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:21:56,418-Speed 10519.91 samples/sec   Loss 22.6049   LearningRate 0.0965   Epoch: 0   Global Step: 3570   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:21:57,339-Speed 11129.16 samples/sec   Loss 22.8534   LearningRate 0.0965   Epoch: 0   Global Step: 3580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:21:58,286-Speed 10821.29 samples/sec   Loss 22.6524   LearningRate 0.0965   Epoch: 0   Global Step: 3590   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:21:59,274-Speed 10374.75 samples/sec   Loss 22.6374   LearningRate 0.0965   Epoch: 0   Global Step: 3600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:22:00,240-Speed 10606.00 samples/sec   Loss 22.8060   LearningRate 0.0965   Epoch: 0   Global Step: 3610   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:22:01,222-Speed 10439.86 samples/sec   Loss 22.4811   LearningRate 0.0965   Epoch: 0   Global Step: 3620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:22:02,134-Speed 11244.71 samples/sec   Loss 22.4818   LearningRate 0.0964   Epoch: 0   Global Step: 3630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:22:03,088-Speed 10742.19 samples/sec   Loss 22.5542   LearningRate 0.0964   Epoch: 0   Global Step: 3640   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:22:04,049-Speed 10675.67 samples/sec   Loss 22.5312   LearningRate 0.0964   Epoch: 0   Global Step: 3650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:22:05,022-Speed 10524.60 samples/sec   Loss 22.3466   LearningRate 0.0964   Epoch: 0   Global Step: 3660   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:22:06,012-Speed 10354.95 samples/sec   Loss 22.1634   LearningRate 0.0964   Epoch: 0   Global Step: 3670   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:22:06,981-Speed 10579.76 samples/sec   Loss 22.3630   LearningRate 0.0964   Epoch: 0   Global Step: 3680   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:22:07,909-Speed 11038.31 samples/sec   Loss 22.1369   LearningRate 0.0964   Epoch: 0   Global Step: 3690   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:22:08,840-Speed 11013.33 samples/sec   Loss 22.1740   LearningRate 0.0964   Epoch: 0   Global Step: 3700   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:22:09,796-Speed 10727.60 samples/sec   Loss 22.2032   LearningRate 0.0964   Epoch: 0   Global Step: 3710   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:22:10,755-Speed 10686.04 samples/sec   Loss 22.0512   LearningRate 0.0964   Epoch: 0   Global Step: 3720   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:22:11,755-Speed 10249.56 samples/sec   Loss 22.1708   LearningRate 0.0963   Epoch: 0   Global Step: 3730   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-10 23:22:12,685-Speed 11022.70 samples/sec   Loss 22.0710   LearningRate 0.0963   Epoch: 0   Global Step: 3740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:22:13,614-Speed 11043.66 samples/sec   Loss 22.0051   LearningRate 0.0963   Epoch: 0   Global Step: 3750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:22:14,609-Speed 10300.76 samples/sec   Loss 21.9738   LearningRate 0.0963   Epoch: 0   Global Step: 3760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:22:15,556-Speed 10828.52 samples/sec   Loss 21.8043   LearningRate 0.0963   Epoch: 0   Global Step: 3770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:22:16,481-Speed 11069.52 samples/sec   Loss 21.8833   LearningRate 0.0963   Epoch: 0   Global Step: 3780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:22:17,447-Speed 10615.23 samples/sec   Loss 21.7321   LearningRate 0.0963   Epoch: 0   Global Step: 3790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:22:18,439-Speed 10331.36 samples/sec   Loss 21.4550   LearningRate 0.0963   Epoch: 0   Global Step: 3800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:22:19,413-Speed 10525.56 samples/sec   Loss 21.8895   LearningRate 0.0963   Epoch: 0   Global Step: 3810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:22:20,376-Speed 10637.91 samples/sec   Loss 21.7674   LearningRate 0.0963   Epoch: 0   Global Step: 3820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:22:21,314-Speed 10938.25 samples/sec   Loss 21.6399   LearningRate 0.0962   Epoch: 0   Global Step: 3830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:22:22,343-Speed 9972.12 samples/sec   Loss 21.6653   LearningRate 0.0962   Epoch: 0   Global Step: 3840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:22:23,297-Speed 10739.34 samples/sec   Loss 21.6366   LearningRate 0.0962   Epoch: 0   Global Step: 3850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:22:24,245-Speed 10814.48 samples/sec   Loss 21.7619   LearningRate 0.0962   Epoch: 0   Global Step: 3860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:22:25,192-Speed 10827.06 samples/sec   Loss 21.4822   LearningRate 0.0962   Epoch: 0   Global Step: 3870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:22:26,164-Speed 10543.74 samples/sec   Loss 21.5563   LearningRate 0.0962   Epoch: 0   Global Step: 3880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:22:27,157-Speed 10317.17 samples/sec   Loss 21.2329   LearningRate 0.0962   Epoch: 0   Global Step: 3890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:22:28,107-Speed 10793.87 samples/sec   Loss 21.5142   LearningRate 0.0962   Epoch: 0   Global Step: 3900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:22:29,095-Speed 10372.82 samples/sec   Loss 21.4532   LearningRate 0.0962   Epoch: 0   Global Step: 3910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:22:30,058-Speed 10646.64 samples/sec   Loss 21.4478   LearningRate 0.0962   Epoch: 0   Global Step: 3920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:22:31,005-Speed 10818.34 samples/sec   Loss 21.3576   LearningRate 0.0962   Epoch: 0   Global Step: 3930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:22:31,969-Speed 10632.95 samples/sec   Loss 21.3190   LearningRate 0.0961   Epoch: 0   Global Step: 3940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:22:32,924-Speed 10738.00 samples/sec   Loss 21.3171   LearningRate 0.0961   Epoch: 0   Global Step: 3950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:22:33,893-Speed 10579.27 samples/sec   Loss 21.0920   LearningRate 0.0961   Epoch: 0   Global Step: 3960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:22:34,830-Speed 10939.15 samples/sec   Loss 21.3701   LearningRate 0.0961   Epoch: 0   Global Step: 3970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:22:35,776-Speed 10828.85 samples/sec   Loss 21.1111   LearningRate 0.0961   Epoch: 0   Global Step: 3980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:22:36,729-Speed 10752.28 samples/sec   Loss 21.2270   LearningRate 0.0961   Epoch: 0   Global Step: 3990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:22:37,702-Speed 10539.42 samples/sec   Loss 21.0229   LearningRate 0.0961   Epoch: 0   Global Step: 4000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-10 23:23:00,260-[lfw][4000]XNorm: 19.542373
Training: 2022-04-10 23:23:00,261-[lfw][4000]Accuracy-Flip: 0.98100+-0.00727
Training: 2022-04-10 23:23:00,262-[lfw][4000]Accuracy-Highest: 0.98100
Training: 2022-04-10 23:23:26,121-[cfp_fp][4000]XNorm: 16.627521
Training: 2022-04-10 23:23:26,122-[cfp_fp][4000]Accuracy-Flip: 0.84100+-0.01543
Training: 2022-04-10 23:23:26,123-[cfp_fp][4000]Accuracy-Highest: 0.84100
Training: 2022-04-10 23:23:48,641-[agedb_30][4000]XNorm: 18.726647
Training: 2022-04-10 23:23:48,642-[agedb_30][4000]Accuracy-Flip: 0.86617+-0.02536
Training: 2022-04-10 23:23:48,642-[agedb_30][4000]Accuracy-Highest: 0.86617
Training: 2022-04-10 23:23:49,598-Speed 142.43 samples/sec   Loss 21.0415   LearningRate 0.0961   Epoch: 0   Global Step: 4010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:23:50,561-Speed 10641.46 samples/sec   Loss 20.9669   LearningRate 0.0961   Epoch: 0   Global Step: 4020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:23:51,573-Speed 10129.23 samples/sec   Loss 21.0512   LearningRate 0.0961   Epoch: 0   Global Step: 4030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:23:52,515-Speed 10885.64 samples/sec   Loss 20.8060   LearningRate 0.0960   Epoch: 0   Global Step: 4040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:23:53,500-Speed 10399.36 samples/sec   Loss 20.7965   LearningRate 0.0960   Epoch: 0   Global Step: 4050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:23:54,461-Speed 10661.55 samples/sec   Loss 20.9564   LearningRate 0.0960   Epoch: 0   Global Step: 4060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:23:55,400-Speed 10920.70 samples/sec   Loss 20.7808   LearningRate 0.0960   Epoch: 0   Global Step: 4070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:23:56,351-Speed 10772.56 samples/sec   Loss 20.7235   LearningRate 0.0960   Epoch: 0   Global Step: 4080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:23:57,374-Speed 10021.54 samples/sec   Loss 20.9402   LearningRate 0.0960   Epoch: 0   Global Step: 4090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:23:58,336-Speed 10661.63 samples/sec   Loss 20.6327   LearningRate 0.0960   Epoch: 0   Global Step: 4100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:23:59,281-Speed 10844.24 samples/sec   Loss 20.6646   LearningRate 0.0960   Epoch: 0   Global Step: 4110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:00,268-Speed 10380.04 samples/sec   Loss 20.6139   LearningRate 0.0960   Epoch: 0   Global Step: 4120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:01,232-Speed 10644.73 samples/sec   Loss 20.7633   LearningRate 0.0960   Epoch: 0   Global Step: 4130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:02,182-Speed 10803.05 samples/sec   Loss 20.6241   LearningRate 0.0959   Epoch: 0   Global Step: 4140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:03,153-Speed 10558.44 samples/sec   Loss 20.6225   LearningRate 0.0959   Epoch: 0   Global Step: 4150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:04,143-Speed 10346.27 samples/sec   Loss 20.4714   LearningRate 0.0959   Epoch: 0   Global Step: 4160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:05,099-Speed 10722.38 samples/sec   Loss 20.6242   LearningRate 0.0959   Epoch: 0   Global Step: 4170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:24:06,046-Speed 10828.69 samples/sec   Loss 20.4648   LearningRate 0.0959   Epoch: 0   Global Step: 4180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:24:06,994-Speed 10810.78 samples/sec   Loss 20.4666   LearningRate 0.0959   Epoch: 0   Global Step: 4190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:24:07,981-Speed 10387.05 samples/sec   Loss 20.4380   LearningRate 0.0959   Epoch: 0   Global Step: 4200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:24:08,964-Speed 10431.60 samples/sec   Loss 20.4817   LearningRate 0.0959   Epoch: 0   Global Step: 4210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:24:09,947-Speed 10419.30 samples/sec   Loss 20.3937   LearningRate 0.0959   Epoch: 0   Global Step: 4220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:24:10,906-Speed 10690.18 samples/sec   Loss 20.3096   LearningRate 0.0959   Epoch: 0   Global Step: 4230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:24:11,921-Speed 10096.11 samples/sec   Loss 20.1746   LearningRate 0.0959   Epoch: 0   Global Step: 4240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:24:12,872-Speed 10776.44 samples/sec   Loss 20.2805   LearningRate 0.0958   Epoch: 0   Global Step: 4250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:24:13,821-Speed 10801.44 samples/sec   Loss 20.3558   LearningRate 0.0958   Epoch: 0   Global Step: 4260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:24:14,800-Speed 10476.97 samples/sec   Loss 20.2892   LearningRate 0.0958   Epoch: 0   Global Step: 4270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:15,764-Speed 10633.58 samples/sec   Loss 20.0983   LearningRate 0.0958   Epoch: 0   Global Step: 4280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:16,697-Speed 10982.34 samples/sec   Loss 20.2249   LearningRate 0.0958   Epoch: 0   Global Step: 4290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:17,652-Speed 10726.54 samples/sec   Loss 20.1532   LearningRate 0.0958   Epoch: 0   Global Step: 4300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:18,619-Speed 10614.39 samples/sec   Loss 20.1052   LearningRate 0.0958   Epoch: 0   Global Step: 4310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:19,593-Speed 10519.61 samples/sec   Loss 20.1933   LearningRate 0.0958   Epoch: 0   Global Step: 4320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:20,522-Speed 11029.12 samples/sec   Loss 20.0885   LearningRate 0.0958   Epoch: 0   Global Step: 4330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:21,494-Speed 10546.80 samples/sec   Loss 19.9039   LearningRate 0.0958   Epoch: 0   Global Step: 4340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:22,469-Speed 10516.71 samples/sec   Loss 19.8507   LearningRate 0.0957   Epoch: 0   Global Step: 4350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:23,424-Speed 10732.83 samples/sec   Loss 19.7787   LearningRate 0.0957   Epoch: 0   Global Step: 4360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:24,377-Speed 10757.57 samples/sec   Loss 19.9834   LearningRate 0.0957   Epoch: 0   Global Step: 4370   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:24:25,317-Speed 10893.57 samples/sec   Loss 20.0103   LearningRate 0.0957   Epoch: 0   Global Step: 4380   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:24:26,265-Speed 10809.65 samples/sec   Loss 20.0827   LearningRate 0.0957   Epoch: 0   Global Step: 4390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:27,229-Speed 10639.30 samples/sec   Loss 19.6548   LearningRate 0.0957   Epoch: 0   Global Step: 4400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:28,201-Speed 10543.22 samples/sec   Loss 20.0824   LearningRate 0.0957   Epoch: 0   Global Step: 4410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:29,157-Speed 10717.40 samples/sec   Loss 19.8652   LearningRate 0.0957   Epoch: 0   Global Step: 4420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:30,134-Speed 10495.79 samples/sec   Loss 19.7470   LearningRate 0.0957   Epoch: 0   Global Step: 4430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:31,098-Speed 10635.02 samples/sec   Loss 19.7804   LearningRate 0.0957   Epoch: 0   Global Step: 4440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:32,093-Speed 10298.33 samples/sec   Loss 19.8232   LearningRate 0.0956   Epoch: 0   Global Step: 4450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:33,047-Speed 10745.24 samples/sec   Loss 19.7028   LearningRate 0.0956   Epoch: 0   Global Step: 4460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:33,999-Speed 10758.57 samples/sec   Loss 19.8992   LearningRate 0.0956   Epoch: 0   Global Step: 4470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:34,953-Speed 10748.01 samples/sec   Loss 19.5217   LearningRate 0.0956   Epoch: 0   Global Step: 4480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:35,913-Speed 10673.69 samples/sec   Loss 19.6243   LearningRate 0.0956   Epoch: 0   Global Step: 4490   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:24:36,859-Speed 10832.65 samples/sec   Loss 19.5781   LearningRate 0.0956   Epoch: 0   Global Step: 4500   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:24:37,832-Speed 10538.27 samples/sec   Loss 19.6134   LearningRate 0.0956   Epoch: 0   Global Step: 4510   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:24:38,788-Speed 10726.76 samples/sec   Loss 19.6336   LearningRate 0.0956   Epoch: 0   Global Step: 4520   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:24:39,764-Speed 10491.62 samples/sec   Loss 19.7100   LearningRate 0.0956   Epoch: 0   Global Step: 4530   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:24:40,716-Speed 10771.44 samples/sec   Loss 19.6332   LearningRate 0.0956   Epoch: 0   Global Step: 4540   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:24:41,672-Speed 10723.89 samples/sec   Loss 19.5374   LearningRate 0.0956   Epoch: 0   Global Step: 4550   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:24:42,619-Speed 10830.48 samples/sec   Loss 19.5169   LearningRate 0.0955   Epoch: 0   Global Step: 4560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:43,552-Speed 10991.30 samples/sec   Loss 19.5653   LearningRate 0.0955   Epoch: 0   Global Step: 4570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:44,525-Speed 10532.59 samples/sec   Loss 19.2849   LearningRate 0.0955   Epoch: 0   Global Step: 4580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:45,475-Speed 10787.84 samples/sec   Loss 19.3676   LearningRate 0.0955   Epoch: 0   Global Step: 4590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:46,448-Speed 10527.68 samples/sec   Loss 19.1786   LearningRate 0.0955   Epoch: 0   Global Step: 4600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:47,420-Speed 10552.94 samples/sec   Loss 19.2603   LearningRate 0.0955   Epoch: 0   Global Step: 4610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:48,398-Speed 10474.25 samples/sec   Loss 19.3186   LearningRate 0.0955   Epoch: 0   Global Step: 4620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:49,390-Speed 10336.50 samples/sec   Loss 19.2944   LearningRate 0.0955   Epoch: 0   Global Step: 4630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:50,369-Speed 10472.16 samples/sec   Loss 19.1785   LearningRate 0.0955   Epoch: 0   Global Step: 4640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:51,301-Speed 10997.02 samples/sec   Loss 19.2354   LearningRate 0.0955   Epoch: 0   Global Step: 4650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:52,270-Speed 10569.28 samples/sec   Loss 19.1715   LearningRate 0.0954   Epoch: 0   Global Step: 4660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:24:53,205-Speed 10964.85 samples/sec   Loss 19.2636   LearningRate 0.0954   Epoch: 0   Global Step: 4670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:24:54,167-Speed 10654.41 samples/sec   Loss 19.1479   LearningRate 0.0954   Epoch: 0   Global Step: 4680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:24:55,136-Speed 10582.19 samples/sec   Loss 19.2796   LearningRate 0.0954   Epoch: 0   Global Step: 4690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:24:56,096-Speed 10680.07 samples/sec   Loss 19.1827   LearningRate 0.0954   Epoch: 0   Global Step: 4700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:24:57,083-Speed 10388.22 samples/sec   Loss 19.2141   LearningRate 0.0954   Epoch: 0   Global Step: 4710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:24:58,058-Speed 10518.13 samples/sec   Loss 19.3013   LearningRate 0.0954   Epoch: 0   Global Step: 4720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:24:58,993-Speed 10958.95 samples/sec   Loss 18.9982   LearningRate 0.0954   Epoch: 0   Global Step: 4730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:24:59,920-Speed 11057.73 samples/sec   Loss 19.1988   LearningRate 0.0954   Epoch: 0   Global Step: 4740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:25:00,823-Speed 11358.50 samples/sec   Loss 18.8805   LearningRate 0.0954   Epoch: 0   Global Step: 4750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:25:01,772-Speed 10789.45 samples/sec   Loss 19.1965   LearningRate 0.0953   Epoch: 0   Global Step: 4760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:25:02,758-Speed 10403.15 samples/sec   Loss 19.0719   LearningRate 0.0953   Epoch: 0   Global Step: 4770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:03,702-Speed 10878.42 samples/sec   Loss 18.9849   LearningRate 0.0953   Epoch: 0   Global Step: 4780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:04,647-Speed 10848.96 samples/sec   Loss 19.1165   LearningRate 0.0953   Epoch: 0   Global Step: 4790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:05,597-Speed 10788.46 samples/sec   Loss 19.0230   LearningRate 0.0953   Epoch: 0   Global Step: 4800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:06,543-Speed 10826.82 samples/sec   Loss 18.8757   LearningRate 0.0953   Epoch: 0   Global Step: 4810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:07,507-Speed 10639.12 samples/sec   Loss 18.8084   LearningRate 0.0953   Epoch: 0   Global Step: 4820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:08,441-Speed 10962.98 samples/sec   Loss 18.8827   LearningRate 0.0953   Epoch: 0   Global Step: 4830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:25:09,444-Speed 10221.98 samples/sec   Loss 18.8677   LearningRate 0.0953   Epoch: 0   Global Step: 4840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:25:10,432-Speed 10377.60 samples/sec   Loss 18.7290   LearningRate 0.0953   Epoch: 0   Global Step: 4850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:25:11,368-Speed 10955.00 samples/sec   Loss 18.6515   LearningRate 0.0953   Epoch: 0   Global Step: 4860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:25:12,370-Speed 10229.89 samples/sec   Loss 18.4812   LearningRate 0.0952   Epoch: 0   Global Step: 4870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:25:13,326-Speed 10728.81 samples/sec   Loss 18.5945   LearningRate 0.0952   Epoch: 0   Global Step: 4880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:25:14,318-Speed 10336.40 samples/sec   Loss 18.8555   LearningRate 0.0952   Epoch: 0   Global Step: 4890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:25:15,225-Speed 11300.57 samples/sec   Loss 18.7568   LearningRate 0.0952   Epoch: 0   Global Step: 4900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:25:16,160-Speed 10952.34 samples/sec   Loss 18.8199   LearningRate 0.0952   Epoch: 0   Global Step: 4910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:25:17,167-Speed 10191.26 samples/sec   Loss 18.3896   LearningRate 0.0952   Epoch: 0   Global Step: 4920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:25:18,103-Speed 10943.01 samples/sec   Loss 18.6964   LearningRate 0.0952   Epoch: 0   Global Step: 4930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:19,068-Speed 10623.26 samples/sec   Loss 18.4526   LearningRate 0.0952   Epoch: 0   Global Step: 4940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:20,048-Speed 10457.64 samples/sec   Loss 18.4988   LearningRate 0.0952   Epoch: 0   Global Step: 4950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:21,027-Speed 10468.66 samples/sec   Loss 18.6651   LearningRate 0.0952   Epoch: 0   Global Step: 4960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:22,007-Speed 10458.72 samples/sec   Loss 18.6845   LearningRate 0.0951   Epoch: 0   Global Step: 4970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:22,993-Speed 10389.96 samples/sec   Loss 18.4944   LearningRate 0.0951   Epoch: 0   Global Step: 4980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:23,929-Speed 10948.41 samples/sec   Loss 18.6493   LearningRate 0.0951   Epoch: 0   Global Step: 4990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:24,904-Speed 10518.99 samples/sec   Loss 18.5402   LearningRate 0.0951   Epoch: 0   Global Step: 5000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:25,835-Speed 11006.58 samples/sec   Loss 18.6749   LearningRate 0.0951   Epoch: 0   Global Step: 5010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:26,810-Speed 10515.27 samples/sec   Loss 18.4367   LearningRate 0.0951   Epoch: 0   Global Step: 5020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:27,739-Speed 11022.24 samples/sec   Loss 18.6064   LearningRate 0.0951   Epoch: 0   Global Step: 5030   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:25:28,706-Speed 10603.58 samples/sec   Loss 18.4758   LearningRate 0.0951   Epoch: 0   Global Step: 5040   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:25:29,783-Speed 9519.14 samples/sec   Loss 18.3839   LearningRate 0.0951   Epoch: 0   Global Step: 5050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:41,146-Speed 901.29 samples/sec   Loss 18.0639   LearningRate 0.0951   Epoch: 1   Global Step: 5060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:42,086-Speed 10907.92 samples/sec   Loss 17.3678   LearningRate 0.0951   Epoch: 1   Global Step: 5070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:43,358-Speed 8053.51 samples/sec   Loss 17.2777   LearningRate 0.0950   Epoch: 1   Global Step: 5080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:44,384-Speed 9985.14 samples/sec   Loss 17.1803   LearningRate 0.0950   Epoch: 1   Global Step: 5090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:45,536-Speed 8894.26 samples/sec   Loss 17.4635   LearningRate 0.0950   Epoch: 1   Global Step: 5100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:46,621-Speed 9456.37 samples/sec   Loss 17.1053   LearningRate 0.0950   Epoch: 1   Global Step: 5110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:47,569-Speed 10809.15 samples/sec   Loss 17.2023   LearningRate 0.0950   Epoch: 1   Global Step: 5120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:48,536-Speed 10608.31 samples/sec   Loss 17.3112   LearningRate 0.0950   Epoch: 1   Global Step: 5130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:49,509-Speed 10533.57 samples/sec   Loss 17.4736   LearningRate 0.0950   Epoch: 1   Global Step: 5140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:25:50,475-Speed 10613.04 samples/sec   Loss 17.0809   LearningRate 0.0950   Epoch: 1   Global Step: 5150   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:25:51,458-Speed 10420.64 samples/sec   Loss 17.5287   LearningRate 0.0950   Epoch: 1   Global Step: 5160   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:25:52,410-Speed 10769.15 samples/sec   Loss 17.3183   LearningRate 0.0950   Epoch: 1   Global Step: 5170   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:25:53,412-Speed 10223.68 samples/sec   Loss 17.3674   LearningRate 0.0949   Epoch: 1   Global Step: 5180   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:25:54,376-Speed 10632.85 samples/sec   Loss 17.1280   LearningRate 0.0949   Epoch: 1   Global Step: 5190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:25:55,361-Speed 10417.70 samples/sec   Loss 17.4285   LearningRate 0.0949   Epoch: 1   Global Step: 5200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:25:56,289-Speed 11039.26 samples/sec   Loss 17.4123   LearningRate 0.0949   Epoch: 1   Global Step: 5210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:25:57,301-Speed 10127.96 samples/sec   Loss 17.3166   LearningRate 0.0949   Epoch: 1   Global Step: 5220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:25:58,428-Speed 9089.55 samples/sec   Loss 17.2976   LearningRate 0.0949   Epoch: 1   Global Step: 5230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:25:59,531-Speed 9297.29 samples/sec   Loss 17.1530   LearningRate 0.0949   Epoch: 1   Global Step: 5240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:26:00,637-Speed 9264.75 samples/sec   Loss 17.2553   LearningRate 0.0949   Epoch: 1   Global Step: 5250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:26:01,597-Speed 10681.06 samples/sec   Loss 17.5369   LearningRate 0.0949   Epoch: 1   Global Step: 5260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:26:02,574-Speed 10489.82 samples/sec   Loss 17.4516   LearningRate 0.0949   Epoch: 1   Global Step: 5270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:26:03,564-Speed 10364.10 samples/sec   Loss 17.2954   LearningRate 0.0948   Epoch: 1   Global Step: 5280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:26:04,519-Speed 10733.51 samples/sec   Loss 17.3374   LearningRate 0.0948   Epoch: 1   Global Step: 5290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:05,525-Speed 10183.26 samples/sec   Loss 17.3051   LearningRate 0.0948   Epoch: 1   Global Step: 5300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:06,482-Speed 10713.61 samples/sec   Loss 17.1800   LearningRate 0.0948   Epoch: 1   Global Step: 5310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:07,486-Speed 10205.20 samples/sec   Loss 17.3617   LearningRate 0.0948   Epoch: 1   Global Step: 5320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:08,474-Speed 10381.46 samples/sec   Loss 17.3851   LearningRate 0.0948   Epoch: 1   Global Step: 5330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:09,424-Speed 10790.63 samples/sec   Loss 17.2661   LearningRate 0.0948   Epoch: 1   Global Step: 5340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:10,379-Speed 10734.57 samples/sec   Loss 17.2470   LearningRate 0.0948   Epoch: 1   Global Step: 5350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:11,336-Speed 10710.20 samples/sec   Loss 17.3330   LearningRate 0.0948   Epoch: 1   Global Step: 5360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:12,252-Speed 11192.86 samples/sec   Loss 17.2048   LearningRate 0.0948   Epoch: 1   Global Step: 5370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:13,237-Speed 10409.50 samples/sec   Loss 17.2632   LearningRate 0.0948   Epoch: 1   Global Step: 5380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:14,223-Speed 10386.96 samples/sec   Loss 17.3834   LearningRate 0.0947   Epoch: 1   Global Step: 5390   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:26:15,173-Speed 10788.82 samples/sec   Loss 17.0901   LearningRate 0.0947   Epoch: 1   Global Step: 5400   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:26:16,109-Speed 10955.53 samples/sec   Loss 17.2571   LearningRate 0.0947   Epoch: 1   Global Step: 5410   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:26:17,089-Speed 10456.15 samples/sec   Loss 17.3986   LearningRate 0.0947   Epoch: 1   Global Step: 5420   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:26:18,062-Speed 10530.53 samples/sec   Loss 17.4535   LearningRate 0.0947   Epoch: 1   Global Step: 5430   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:26:19,006-Speed 10860.02 samples/sec   Loss 17.1310   LearningRate 0.0947   Epoch: 1   Global Step: 5440   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:26:19,937-Speed 11017.87 samples/sec   Loss 17.1863   LearningRate 0.0947   Epoch: 1   Global Step: 5450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:20,890-Speed 10747.36 samples/sec   Loss 17.4337   LearningRate 0.0947   Epoch: 1   Global Step: 5460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:21,844-Speed 10745.25 samples/sec   Loss 17.0669   LearningRate 0.0947   Epoch: 1   Global Step: 5470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:22,841-Speed 10287.09 samples/sec   Loss 17.2276   LearningRate 0.0947   Epoch: 1   Global Step: 5480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:23,783-Speed 10884.93 samples/sec   Loss 17.2096   LearningRate 0.0946   Epoch: 1   Global Step: 5490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:24,739-Speed 10721.02 samples/sec   Loss 17.2349   LearningRate 0.0946   Epoch: 1   Global Step: 5500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:25,682-Speed 10865.06 samples/sec   Loss 17.2483   LearningRate 0.0946   Epoch: 1   Global Step: 5510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:26,661-Speed 10470.93 samples/sec   Loss 17.2845   LearningRate 0.0946   Epoch: 1   Global Step: 5520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:27,635-Speed 10523.26 samples/sec   Loss 16.8217   LearningRate 0.0946   Epoch: 1   Global Step: 5530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:28,633-Speed 10270.28 samples/sec   Loss 17.0210   LearningRate 0.0946   Epoch: 1   Global Step: 5540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:29,645-Speed 10133.04 samples/sec   Loss 17.0368   LearningRate 0.0946   Epoch: 1   Global Step: 5550   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:26:30,606-Speed 10661.77 samples/sec   Loss 17.2184   LearningRate 0.0946   Epoch: 1   Global Step: 5560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:31,566-Speed 10677.25 samples/sec   Loss 17.0249   LearningRate 0.0946   Epoch: 1   Global Step: 5570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:32,529-Speed 10653.38 samples/sec   Loss 17.2158   LearningRate 0.0946   Epoch: 1   Global Step: 5580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:33,510-Speed 10444.03 samples/sec   Loss 17.2524   LearningRate 0.0946   Epoch: 1   Global Step: 5590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:34,504-Speed 10313.76 samples/sec   Loss 16.9093   LearningRate 0.0945   Epoch: 1   Global Step: 5600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:35,478-Speed 10512.35 samples/sec   Loss 17.2530   LearningRate 0.0945   Epoch: 1   Global Step: 5610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:36,417-Speed 10916.88 samples/sec   Loss 16.9869   LearningRate 0.0945   Epoch: 1   Global Step: 5620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:37,398-Speed 10453.99 samples/sec   Loss 16.9259   LearningRate 0.0945   Epoch: 1   Global Step: 5630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:38,351-Speed 10749.13 samples/sec   Loss 17.1232   LearningRate 0.0945   Epoch: 1   Global Step: 5640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:39,284-Speed 10985.08 samples/sec   Loss 17.0234   LearningRate 0.0945   Epoch: 1   Global Step: 5650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:40,247-Speed 10644.82 samples/sec   Loss 17.0446   LearningRate 0.0945   Epoch: 1   Global Step: 5660   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:26:41,252-Speed 10204.50 samples/sec   Loss 17.1580   LearningRate 0.0945   Epoch: 1   Global Step: 5670   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:26:42,230-Speed 10479.37 samples/sec   Loss 16.9441   LearningRate 0.0945   Epoch: 1   Global Step: 5680   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:26:43,195-Speed 10610.04 samples/sec   Loss 16.9875   LearningRate 0.0945   Epoch: 1   Global Step: 5690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:44,176-Speed 10456.76 samples/sec   Loss 17.0969   LearningRate 0.0944   Epoch: 1   Global Step: 5700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:45,133-Speed 10704.33 samples/sec   Loss 16.9720   LearningRate 0.0944   Epoch: 1   Global Step: 5710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:46,109-Speed 10497.21 samples/sec   Loss 17.1962   LearningRate 0.0944   Epoch: 1   Global Step: 5720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:47,071-Speed 10661.54 samples/sec   Loss 17.0748   LearningRate 0.0944   Epoch: 1   Global Step: 5730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:48,048-Speed 10490.33 samples/sec   Loss 17.2307   LearningRate 0.0944   Epoch: 1   Global Step: 5740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:49,056-Speed 10164.57 samples/sec   Loss 16.9245   LearningRate 0.0944   Epoch: 1   Global Step: 5750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:50,035-Speed 10468.77 samples/sec   Loss 16.8470   LearningRate 0.0944   Epoch: 1   Global Step: 5760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:50,979-Speed 10867.64 samples/sec   Loss 17.0150   LearningRate 0.0944   Epoch: 1   Global Step: 5770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:51,944-Speed 10612.48 samples/sec   Loss 16.7068   LearningRate 0.0944   Epoch: 1   Global Step: 5780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:52,897-Speed 10754.59 samples/sec   Loss 16.9237   LearningRate 0.0944   Epoch: 1   Global Step: 5790   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:26:53,869-Speed 10547.62 samples/sec   Loss 16.9673   LearningRate 0.0943   Epoch: 1   Global Step: 5800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:54,825-Speed 10715.79 samples/sec   Loss 16.6765   LearningRate 0.0943   Epoch: 1   Global Step: 5810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:55,794-Speed 10574.77 samples/sec   Loss 16.7501   LearningRate 0.0943   Epoch: 1   Global Step: 5820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:56,761-Speed 10600.84 samples/sec   Loss 16.7576   LearningRate 0.0943   Epoch: 1   Global Step: 5830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:57,741-Speed 10463.87 samples/sec   Loss 16.8359   LearningRate 0.0943   Epoch: 1   Global Step: 5840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:58,711-Speed 10563.46 samples/sec   Loss 16.8441   LearningRate 0.0943   Epoch: 1   Global Step: 5850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:26:59,713-Speed 10239.27 samples/sec   Loss 16.9616   LearningRate 0.0943   Epoch: 1   Global Step: 5860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:27:00,665-Speed 10756.39 samples/sec   Loss 16.9129   LearningRate 0.0943   Epoch: 1   Global Step: 5870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:27:01,651-Speed 10401.62 samples/sec   Loss 16.8603   LearningRate 0.0943   Epoch: 1   Global Step: 5880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:27:02,612-Speed 10676.80 samples/sec   Loss 16.9059   LearningRate 0.0943   Epoch: 1   Global Step: 5890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:27:03,564-Speed 10765.91 samples/sec   Loss 16.9768   LearningRate 0.0943   Epoch: 1   Global Step: 5900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:27:04,544-Speed 10456.83 samples/sec   Loss 16.8157   LearningRate 0.0942   Epoch: 1   Global Step: 5910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:27:05,505-Speed 10658.05 samples/sec   Loss 16.9108   LearningRate 0.0942   Epoch: 1   Global Step: 5920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:27:06,511-Speed 10190.07 samples/sec   Loss 16.9137   LearningRate 0.0942   Epoch: 1   Global Step: 5930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:27:07,461-Speed 10794.33 samples/sec   Loss 16.9152   LearningRate 0.0942   Epoch: 1   Global Step: 5940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:27:08,419-Speed 10701.02 samples/sec   Loss 16.7021   LearningRate 0.0942   Epoch: 1   Global Step: 5950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:27:09,361-Speed 10877.96 samples/sec   Loss 16.8781   LearningRate 0.0942   Epoch: 1   Global Step: 5960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:27:10,389-Speed 9970.10 samples/sec   Loss 16.7323   LearningRate 0.0942   Epoch: 1   Global Step: 5970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:27:11,344-Speed 10735.21 samples/sec   Loss 16.8459   LearningRate 0.0942   Epoch: 1   Global Step: 5980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:27:12,287-Speed 10888.70 samples/sec   Loss 16.5414   LearningRate 0.0942   Epoch: 1   Global Step: 5990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:27:13,266-Speed 10465.57 samples/sec   Loss 16.7198   LearningRate 0.0942   Epoch: 1   Global Step: 6000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:27:36,011-[lfw][6000]XNorm: 17.172822
Training: 2022-04-10 23:27:36,012-[lfw][6000]Accuracy-Flip: 0.98700+-0.00614
Training: 2022-04-10 23:27:36,012-[lfw][6000]Accuracy-Highest: 0.98700
Training: 2022-04-10 23:28:01,556-[cfp_fp][6000]XNorm: 14.477218
Training: 2022-04-10 23:28:01,556-[cfp_fp][6000]Accuracy-Flip: 0.88343+-0.01855
Training: 2022-04-10 23:28:01,557-[cfp_fp][6000]Accuracy-Highest: 0.88343
Training: 2022-04-10 23:28:23,652-[agedb_30][6000]XNorm: 16.517357
Training: 2022-04-10 23:28:23,653-[agedb_30][6000]Accuracy-Flip: 0.90283+-0.02082
Training: 2022-04-10 23:28:23,654-[agedb_30][6000]Accuracy-Highest: 0.90283
Training: 2022-04-10 23:28:24,605-Speed 143.54 samples/sec   Loss 16.8388   LearningRate 0.0941   Epoch: 1   Global Step: 6010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:25,559-Speed 10743.47 samples/sec   Loss 16.9753   LearningRate 0.0941   Epoch: 1   Global Step: 6020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:26,528-Speed 10573.74 samples/sec   Loss 16.8952   LearningRate 0.0941   Epoch: 1   Global Step: 6030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:27,532-Speed 10206.61 samples/sec   Loss 16.6381   LearningRate 0.0941   Epoch: 1   Global Step: 6040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:28,495-Speed 10644.74 samples/sec   Loss 16.8902   LearningRate 0.0941   Epoch: 1   Global Step: 6050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:29,459-Speed 10637.05 samples/sec   Loss 16.5703   LearningRate 0.0941   Epoch: 1   Global Step: 6060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:30,405-Speed 10844.96 samples/sec   Loss 16.8678   LearningRate 0.0941   Epoch: 1   Global Step: 6070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:31,351-Speed 10838.68 samples/sec   Loss 16.6410   LearningRate 0.0941   Epoch: 1   Global Step: 6080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:32,321-Speed 10564.83 samples/sec   Loss 16.7747   LearningRate 0.0941   Epoch: 1   Global Step: 6090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:33,303-Speed 10435.86 samples/sec   Loss 16.6829   LearningRate 0.0941   Epoch: 1   Global Step: 6100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:34,261-Speed 10709.13 samples/sec   Loss 16.5976   LearningRate 0.0941   Epoch: 1   Global Step: 6110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:35,230-Speed 10572.22 samples/sec   Loss 16.6782   LearningRate 0.0940   Epoch: 1   Global Step: 6120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:36,174-Speed 10859.36 samples/sec   Loss 16.7378   LearningRate 0.0940   Epoch: 1   Global Step: 6130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:28:37,155-Speed 10440.02 samples/sec   Loss 16.7406   LearningRate 0.0940   Epoch: 1   Global Step: 6140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:28:38,113-Speed 10703.85 samples/sec   Loss 16.5415   LearningRate 0.0940   Epoch: 1   Global Step: 6150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:28:39,089-Speed 10504.49 samples/sec   Loss 16.7332   LearningRate 0.0940   Epoch: 1   Global Step: 6160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:28:40,034-Speed 10840.72 samples/sec   Loss 16.7055   LearningRate 0.0940   Epoch: 1   Global Step: 6170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:28:40,995-Speed 10681.81 samples/sec   Loss 16.7623   LearningRate 0.0940   Epoch: 1   Global Step: 6180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:28:41,948-Speed 10755.65 samples/sec   Loss 16.5505   LearningRate 0.0940   Epoch: 1   Global Step: 6190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:28:42,943-Speed 10300.18 samples/sec   Loss 16.6822   LearningRate 0.0940   Epoch: 1   Global Step: 6200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:28:43,927-Speed 10410.60 samples/sec   Loss 16.6904   LearningRate 0.0940   Epoch: 1   Global Step: 6210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:28:44,887-Speed 10672.27 samples/sec   Loss 16.5473   LearningRate 0.0939   Epoch: 1   Global Step: 6220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:28:45,851-Speed 10637.56 samples/sec   Loss 16.4477   LearningRate 0.0939   Epoch: 1   Global Step: 6230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:46,799-Speed 10818.07 samples/sec   Loss 16.5789   LearningRate 0.0939   Epoch: 1   Global Step: 6240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:47,818-Speed 10065.09 samples/sec   Loss 16.6224   LearningRate 0.0939   Epoch: 1   Global Step: 6250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:48,779-Speed 10662.62 samples/sec   Loss 16.6595   LearningRate 0.0939   Epoch: 1   Global Step: 6260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:49,740-Speed 10666.03 samples/sec   Loss 16.5995   LearningRate 0.0939   Epoch: 1   Global Step: 6270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:50,685-Speed 10846.62 samples/sec   Loss 16.4565   LearningRate 0.0939   Epoch: 1   Global Step: 6280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:51,633-Speed 10810.88 samples/sec   Loss 16.4512   LearningRate 0.0939   Epoch: 1   Global Step: 6290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:52,570-Speed 10941.83 samples/sec   Loss 16.7447   LearningRate 0.0939   Epoch: 1   Global Step: 6300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:53,601-Speed 9943.08 samples/sec   Loss 16.3425   LearningRate 0.0939   Epoch: 1   Global Step: 6310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:54,680-Speed 9500.38 samples/sec   Loss 16.6750   LearningRate 0.0939   Epoch: 1   Global Step: 6320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:55,667-Speed 10384.62 samples/sec   Loss 16.4619   LearningRate 0.0938   Epoch: 1   Global Step: 6330   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:28:56,611-Speed 10851.62 samples/sec   Loss 16.4968   LearningRate 0.0938   Epoch: 1   Global Step: 6340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:28:57,598-Speed 10388.10 samples/sec   Loss 16.3081   LearningRate 0.0938   Epoch: 1   Global Step: 6350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:28:58,581-Speed 10430.74 samples/sec   Loss 16.4554   LearningRate 0.0938   Epoch: 1   Global Step: 6360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:28:59,556-Speed 10513.10 samples/sec   Loss 16.4718   LearningRate 0.0938   Epoch: 1   Global Step: 6370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:00,514-Speed 10700.77 samples/sec   Loss 16.5112   LearningRate 0.0938   Epoch: 1   Global Step: 6380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:01,467-Speed 10755.42 samples/sec   Loss 16.5560   LearningRate 0.0938   Epoch: 1   Global Step: 6390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:02,423-Speed 10730.37 samples/sec   Loss 16.4757   LearningRate 0.0938   Epoch: 1   Global Step: 6400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:03,397-Speed 10560.01 samples/sec   Loss 16.5310   LearningRate 0.0938   Epoch: 1   Global Step: 6410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:04,381-Speed 10418.49 samples/sec   Loss 16.5564   LearningRate 0.0938   Epoch: 1   Global Step: 6420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:05,356-Speed 10502.73 samples/sec   Loss 16.3250   LearningRate 0.0937   Epoch: 1   Global Step: 6430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:06,354-Speed 10280.96 samples/sec   Loss 16.6166   LearningRate 0.0937   Epoch: 1   Global Step: 6440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:07,305-Speed 10785.01 samples/sec   Loss 16.3808   LearningRate 0.0937   Epoch: 1   Global Step: 6450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:29:08,266-Speed 10664.43 samples/sec   Loss 16.2121   LearningRate 0.0937   Epoch: 1   Global Step: 6460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:29:09,275-Speed 10152.25 samples/sec   Loss 16.4510   LearningRate 0.0937   Epoch: 1   Global Step: 6470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:29:10,240-Speed 10631.82 samples/sec   Loss 16.4026   LearningRate 0.0937   Epoch: 1   Global Step: 6480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:29:11,232-Speed 10330.88 samples/sec   Loss 16.3035   LearningRate 0.0937   Epoch: 1   Global Step: 6490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:29:12,155-Speed 11108.69 samples/sec   Loss 16.3107   LearningRate 0.0937   Epoch: 1   Global Step: 6500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:29:13,111-Speed 10714.60 samples/sec   Loss 16.2690   LearningRate 0.0937   Epoch: 1   Global Step: 6510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:29:14,105-Speed 10324.35 samples/sec   Loss 16.2596   LearningRate 0.0937   Epoch: 1   Global Step: 6520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:15,059-Speed 10745.47 samples/sec   Loss 16.1946   LearningRate 0.0936   Epoch: 1   Global Step: 6530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:16,052-Speed 10330.64 samples/sec   Loss 16.3075   LearningRate 0.0936   Epoch: 1   Global Step: 6540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:17,031-Speed 10465.88 samples/sec   Loss 16.1610   LearningRate 0.0936   Epoch: 1   Global Step: 6550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:18,001-Speed 10568.49 samples/sec   Loss 16.1273   LearningRate 0.0936   Epoch: 1   Global Step: 6560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:19,004-Speed 10216.25 samples/sec   Loss 16.2830   LearningRate 0.0936   Epoch: 1   Global Step: 6570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:19,945-Speed 10893.72 samples/sec   Loss 16.0524   LearningRate 0.0936   Epoch: 1   Global Step: 6580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:20,888-Speed 10874.98 samples/sec   Loss 16.3649   LearningRate 0.0936   Epoch: 1   Global Step: 6590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:21,866-Speed 10468.51 samples/sec   Loss 16.2966   LearningRate 0.0936   Epoch: 1   Global Step: 6600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:22,821-Speed 10743.74 samples/sec   Loss 16.5212   LearningRate 0.0936   Epoch: 1   Global Step: 6610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:23,787-Speed 10609.83 samples/sec   Loss 16.2931   LearningRate 0.0936   Epoch: 1   Global Step: 6620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:29:24,741-Speed 10744.24 samples/sec   Loss 16.1389   LearningRate 0.0936   Epoch: 1   Global Step: 6630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:29:25,692-Speed 10779.68 samples/sec   Loss 16.3038   LearningRate 0.0935   Epoch: 1   Global Step: 6640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:29:26,677-Speed 10405.94 samples/sec   Loss 16.3127   LearningRate 0.0935   Epoch: 1   Global Step: 6650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:29:27,656-Speed 10461.46 samples/sec   Loss 16.0731   LearningRate 0.0935   Epoch: 1   Global Step: 6660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:28,611-Speed 10739.01 samples/sec   Loss 16.2232   LearningRate 0.0935   Epoch: 1   Global Step: 6670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:29,590-Speed 10466.86 samples/sec   Loss 16.2056   LearningRate 0.0935   Epoch: 1   Global Step: 6680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:30,535-Speed 10850.38 samples/sec   Loss 16.2569   LearningRate 0.0935   Epoch: 1   Global Step: 6690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:31,497-Speed 10649.23 samples/sec   Loss 16.1514   LearningRate 0.0935   Epoch: 1   Global Step: 6700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:32,473-Speed 10508.34 samples/sec   Loss 16.0320   LearningRate 0.0935   Epoch: 1   Global Step: 6710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:33,417-Speed 10861.65 samples/sec   Loss 16.0329   LearningRate 0.0935   Epoch: 1   Global Step: 6720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:34,409-Speed 10326.21 samples/sec   Loss 16.2527   LearningRate 0.0935   Epoch: 1   Global Step: 6730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:35,370-Speed 10664.06 samples/sec   Loss 15.9836   LearningRate 0.0934   Epoch: 1   Global Step: 6740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:36,349-Speed 10479.78 samples/sec   Loss 16.0479   LearningRate 0.0934   Epoch: 1   Global Step: 6750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:37,302-Speed 10747.39 samples/sec   Loss 16.1936   LearningRate 0.0934   Epoch: 1   Global Step: 6760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:29:38,265-Speed 10649.07 samples/sec   Loss 16.1990   LearningRate 0.0934   Epoch: 1   Global Step: 6770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:29:39,200-Speed 10958.76 samples/sec   Loss 16.1584   LearningRate 0.0934   Epoch: 1   Global Step: 6780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:29:40,175-Speed 10513.23 samples/sec   Loss 16.0511   LearningRate 0.0934   Epoch: 1   Global Step: 6790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:29:41,192-Speed 10083.93 samples/sec   Loss 16.0847   LearningRate 0.0934   Epoch: 1   Global Step: 6800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:29:42,169-Speed 10483.55 samples/sec   Loss 15.9704   LearningRate 0.0934   Epoch: 1   Global Step: 6810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:29:43,104-Speed 10968.99 samples/sec   Loss 16.1108   LearningRate 0.0934   Epoch: 1   Global Step: 6820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:44,065-Speed 10656.51 samples/sec   Loss 16.2047   LearningRate 0.0934   Epoch: 1   Global Step: 6830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:45,028-Speed 10650.30 samples/sec   Loss 16.0744   LearningRate 0.0934   Epoch: 1   Global Step: 6840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:46,000-Speed 10548.24 samples/sec   Loss 16.1528   LearningRate 0.0933   Epoch: 1   Global Step: 6850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:46,967-Speed 10603.27 samples/sec   Loss 15.9580   LearningRate 0.0933   Epoch: 1   Global Step: 6860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:47,917-Speed 10791.70 samples/sec   Loss 15.8687   LearningRate 0.0933   Epoch: 1   Global Step: 6870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:48,861-Speed 10859.36 samples/sec   Loss 15.7680   LearningRate 0.0933   Epoch: 1   Global Step: 6880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:29:49,827-Speed 10610.67 samples/sec   Loss 16.0538   LearningRate 0.0933   Epoch: 1   Global Step: 6890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:29:50,807-Speed 10453.47 samples/sec   Loss 15.9377   LearningRate 0.0933   Epoch: 1   Global Step: 6900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:29:51,759-Speed 10769.65 samples/sec   Loss 16.2202   LearningRate 0.0933   Epoch: 1   Global Step: 6910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:29:52,701-Speed 10892.92 samples/sec   Loss 16.0745   LearningRate 0.0933   Epoch: 1   Global Step: 6920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:29:53,682-Speed 10447.60 samples/sec   Loss 15.9760   LearningRate 0.0933   Epoch: 1   Global Step: 6930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:29:54,639-Speed 10703.37 samples/sec   Loss 15.7550   LearningRate 0.0933   Epoch: 1   Global Step: 6940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:29:55,594-Speed 10737.09 samples/sec   Loss 15.9304   LearningRate 0.0932   Epoch: 1   Global Step: 6950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:29:56,538-Speed 10847.73 samples/sec   Loss 15.9611   LearningRate 0.0932   Epoch: 1   Global Step: 6960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:29:57,589-Speed 9756.44 samples/sec   Loss 16.0462   LearningRate 0.0932   Epoch: 1   Global Step: 6970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:29:58,577-Speed 10372.97 samples/sec   Loss 15.8595   LearningRate 0.0932   Epoch: 1   Global Step: 6980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:29:59,532-Speed 10738.80 samples/sec   Loss 15.7513   LearningRate 0.0932   Epoch: 1   Global Step: 6990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:00,493-Speed 10659.13 samples/sec   Loss 16.0838   LearningRate 0.0932   Epoch: 1   Global Step: 7000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:01,475-Speed 10433.58 samples/sec   Loss 15.9007   LearningRate 0.0932   Epoch: 1   Global Step: 7010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:02,484-Speed 10164.41 samples/sec   Loss 15.9832   LearningRate 0.0932   Epoch: 1   Global Step: 7020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:03,461-Speed 10489.10 samples/sec   Loss 15.7450   LearningRate 0.0932   Epoch: 1   Global Step: 7030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:04,437-Speed 10505.39 samples/sec   Loss 15.8796   LearningRate 0.0932   Epoch: 1   Global Step: 7040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:05,382-Speed 10836.02 samples/sec   Loss 15.6633   LearningRate 0.0932   Epoch: 1   Global Step: 7050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:06,356-Speed 10531.22 samples/sec   Loss 15.8664   LearningRate 0.0931   Epoch: 1   Global Step: 7060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:07,326-Speed 10563.18 samples/sec   Loss 15.8311   LearningRate 0.0931   Epoch: 1   Global Step: 7070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:08,282-Speed 10726.38 samples/sec   Loss 15.7340   LearningRate 0.0931   Epoch: 1   Global Step: 7080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:09,271-Speed 10368.91 samples/sec   Loss 15.8105   LearningRate 0.0931   Epoch: 1   Global Step: 7090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:10,207-Speed 10945.81 samples/sec   Loss 16.0435   LearningRate 0.0931   Epoch: 1   Global Step: 7100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:11,157-Speed 10789.93 samples/sec   Loss 15.8661   LearningRate 0.0931   Epoch: 1   Global Step: 7110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:12,102-Speed 10843.34 samples/sec   Loss 15.9769   LearningRate 0.0931   Epoch: 1   Global Step: 7120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:13,044-Speed 10884.32 samples/sec   Loss 15.9153   LearningRate 0.0931   Epoch: 1   Global Step: 7130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:13,994-Speed 10786.06 samples/sec   Loss 15.8106   LearningRate 0.0931   Epoch: 1   Global Step: 7140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:14,980-Speed 10401.27 samples/sec   Loss 16.0046   LearningRate 0.0931   Epoch: 1   Global Step: 7150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:15,926-Speed 10832.87 samples/sec   Loss 15.8229   LearningRate 0.0930   Epoch: 1   Global Step: 7160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:16,907-Speed 10447.64 samples/sec   Loss 15.8433   LearningRate 0.0930   Epoch: 1   Global Step: 7170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:17,887-Speed 10460.17 samples/sec   Loss 15.6755   LearningRate 0.0930   Epoch: 1   Global Step: 7180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:18,840-Speed 10756.90 samples/sec   Loss 15.7239   LearningRate 0.0930   Epoch: 1   Global Step: 7190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:19,842-Speed 10222.35 samples/sec   Loss 15.8167   LearningRate 0.0930   Epoch: 1   Global Step: 7200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:20,769-Speed 11059.56 samples/sec   Loss 15.7981   LearningRate 0.0930   Epoch: 1   Global Step: 7210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:21,723-Speed 10746.04 samples/sec   Loss 15.8606   LearningRate 0.0930   Epoch: 1   Global Step: 7220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:22,723-Speed 10254.09 samples/sec   Loss 15.6595   LearningRate 0.0930   Epoch: 1   Global Step: 7230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:23,674-Speed 10774.57 samples/sec   Loss 15.7416   LearningRate 0.0930   Epoch: 1   Global Step: 7240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:24,625-Speed 10782.84 samples/sec   Loss 15.7264   LearningRate 0.0930   Epoch: 1   Global Step: 7250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:25,575-Speed 10783.79 samples/sec   Loss 15.8014   LearningRate 0.0930   Epoch: 1   Global Step: 7260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:26,536-Speed 10663.67 samples/sec   Loss 15.7144   LearningRate 0.0929   Epoch: 1   Global Step: 7270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:27,524-Speed 10374.00 samples/sec   Loss 15.6876   LearningRate 0.0929   Epoch: 1   Global Step: 7280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:28,503-Speed 10466.18 samples/sec   Loss 15.7067   LearningRate 0.0929   Epoch: 1   Global Step: 7290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:29,515-Speed 10130.20 samples/sec   Loss 15.7955   LearningRate 0.0929   Epoch: 1   Global Step: 7300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:30,488-Speed 10538.68 samples/sec   Loss 15.5663   LearningRate 0.0929   Epoch: 1   Global Step: 7310   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:30:31,427-Speed 10911.58 samples/sec   Loss 15.6623   LearningRate 0.0929   Epoch: 1   Global Step: 7320   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:30:32,374-Speed 10821.94 samples/sec   Loss 15.8217   LearningRate 0.0929   Epoch: 1   Global Step: 7330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:33,380-Speed 10192.69 samples/sec   Loss 15.7056   LearningRate 0.0929   Epoch: 1   Global Step: 7340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:34,326-Speed 10838.15 samples/sec   Loss 15.7041   LearningRate 0.0929   Epoch: 1   Global Step: 7350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:35,276-Speed 10789.33 samples/sec   Loss 15.4645   LearningRate 0.0929   Epoch: 1   Global Step: 7360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:36,240-Speed 10628.07 samples/sec   Loss 15.6885   LearningRate 0.0928   Epoch: 1   Global Step: 7370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:37,173-Speed 10983.57 samples/sec   Loss 15.6912   LearningRate 0.0928   Epoch: 1   Global Step: 7380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:38,153-Speed 10457.14 samples/sec   Loss 15.6511   LearningRate 0.0928   Epoch: 1   Global Step: 7390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:39,197-Speed 9828.90 samples/sec   Loss 15.6462   LearningRate 0.0928   Epoch: 1   Global Step: 7400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:40,182-Speed 10402.81 samples/sec   Loss 15.4929   LearningRate 0.0928   Epoch: 1   Global Step: 7410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:41,152-Speed 10568.41 samples/sec   Loss 15.6265   LearningRate 0.0928   Epoch: 1   Global Step: 7420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:42,102-Speed 10787.80 samples/sec   Loss 15.5618   LearningRate 0.0928   Epoch: 1   Global Step: 7430   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:30:43,085-Speed 10429.64 samples/sec   Loss 15.5128   LearningRate 0.0928   Epoch: 1   Global Step: 7440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:44,071-Speed 10401.66 samples/sec   Loss 15.4134   LearningRate 0.0928   Epoch: 1   Global Step: 7450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:44,990-Speed 11151.90 samples/sec   Loss 15.5669   LearningRate 0.0928   Epoch: 1   Global Step: 7460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:45,973-Speed 10430.35 samples/sec   Loss 15.6678   LearningRate 0.0928   Epoch: 1   Global Step: 7470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:46,930-Speed 10709.11 samples/sec   Loss 15.2959   LearningRate 0.0927   Epoch: 1   Global Step: 7480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:47,870-Speed 10894.73 samples/sec   Loss 15.7361   LearningRate 0.0927   Epoch: 1   Global Step: 7490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:48,926-Speed 9709.09 samples/sec   Loss 15.6004   LearningRate 0.0927   Epoch: 1   Global Step: 7500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:49,901-Speed 10516.56 samples/sec   Loss 15.5528   LearningRate 0.0927   Epoch: 1   Global Step: 7510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:50,876-Speed 10512.90 samples/sec   Loss 15.3690   LearningRate 0.0927   Epoch: 1   Global Step: 7520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:51,856-Speed 10451.28 samples/sec   Loss 15.4680   LearningRate 0.0927   Epoch: 1   Global Step: 7530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:52,821-Speed 10623.34 samples/sec   Loss 15.3977   LearningRate 0.0927   Epoch: 1   Global Step: 7540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:30:53,761-Speed 10911.65 samples/sec   Loss 15.5039   LearningRate 0.0927   Epoch: 1   Global Step: 7550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:54,749-Speed 10374.24 samples/sec   Loss 15.6707   LearningRate 0.0927   Epoch: 1   Global Step: 7560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:55,697-Speed 10814.89 samples/sec   Loss 15.5336   LearningRate 0.0927   Epoch: 1   Global Step: 7570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:56,669-Speed 10546.62 samples/sec   Loss 15.5854   LearningRate 0.0926   Epoch: 1   Global Step: 7580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:57,644-Speed 10514.23 samples/sec   Loss 15.1840   LearningRate 0.0926   Epoch: 1   Global Step: 7590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:58,617-Speed 10527.85 samples/sec   Loss 15.5696   LearningRate 0.0926   Epoch: 1   Global Step: 7600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:30:59,588-Speed 10552.89 samples/sec   Loss 15.5332   LearningRate 0.0926   Epoch: 1   Global Step: 7610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:31:00,578-Speed 10361.82 samples/sec   Loss 15.3153   LearningRate 0.0926   Epoch: 1   Global Step: 7620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:31:01,546-Speed 10590.33 samples/sec   Loss 15.3034   LearningRate 0.0926   Epoch: 1   Global Step: 7630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:31:02,496-Speed 10786.22 samples/sec   Loss 15.3205   LearningRate 0.0926   Epoch: 1   Global Step: 7640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:31:03,497-Speed 10244.69 samples/sec   Loss 15.4071   LearningRate 0.0926   Epoch: 1   Global Step: 7650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:04,455-Speed 10695.05 samples/sec   Loss 15.2479   LearningRate 0.0926   Epoch: 1   Global Step: 7660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:05,367-Speed 11245.72 samples/sec   Loss 15.4399   LearningRate 0.0926   Epoch: 1   Global Step: 7670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:06,362-Speed 10297.25 samples/sec   Loss 15.3918   LearningRate 0.0926   Epoch: 1   Global Step: 7680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:07,326-Speed 10627.60 samples/sec   Loss 15.2675   LearningRate 0.0925   Epoch: 1   Global Step: 7690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:08,324-Speed 10275.48 samples/sec   Loss 15.3967   LearningRate 0.0925   Epoch: 1   Global Step: 7700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:09,281-Speed 10715.36 samples/sec   Loss 15.3457   LearningRate 0.0925   Epoch: 1   Global Step: 7710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:10,243-Speed 10654.33 samples/sec   Loss 15.3650   LearningRate 0.0925   Epoch: 1   Global Step: 7720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:11,180-Speed 10933.91 samples/sec   Loss 15.3377   LearningRate 0.0925   Epoch: 1   Global Step: 7730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:12,113-Speed 10987.71 samples/sec   Loss 15.3574   LearningRate 0.0925   Epoch: 1   Global Step: 7740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:13,062-Speed 10801.35 samples/sec   Loss 15.1974   LearningRate 0.0925   Epoch: 1   Global Step: 7750   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:31:14,024-Speed 10651.66 samples/sec   Loss 15.3168   LearningRate 0.0925   Epoch: 1   Global Step: 7760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:14,977-Speed 10757.54 samples/sec   Loss 15.2129   LearningRate 0.0925   Epoch: 1   Global Step: 7770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:15,888-Speed 11257.18 samples/sec   Loss 15.4398   LearningRate 0.0925   Epoch: 1   Global Step: 7780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:16,859-Speed 10560.24 samples/sec   Loss 15.4490   LearningRate 0.0924   Epoch: 1   Global Step: 7790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:17,808-Speed 10797.98 samples/sec   Loss 15.2418   LearningRate 0.0924   Epoch: 1   Global Step: 7800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:18,758-Speed 10790.50 samples/sec   Loss 15.3722   LearningRate 0.0924   Epoch: 1   Global Step: 7810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:19,673-Speed 11210.64 samples/sec   Loss 15.1391   LearningRate 0.0924   Epoch: 1   Global Step: 7820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:20,637-Speed 10638.43 samples/sec   Loss 15.2819   LearningRate 0.0924   Epoch: 1   Global Step: 7830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:21,593-Speed 10719.41 samples/sec   Loss 15.3998   LearningRate 0.0924   Epoch: 1   Global Step: 7840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:22,559-Speed 10609.06 samples/sec   Loss 15.2446   LearningRate 0.0924   Epoch: 1   Global Step: 7850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:23,519-Speed 10679.34 samples/sec   Loss 15.3143   LearningRate 0.0924   Epoch: 1   Global Step: 7860   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:31:24,502-Speed 10426.08 samples/sec   Loss 15.1881   LearningRate 0.0924   Epoch: 1   Global Step: 7870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:25,479-Speed 10487.51 samples/sec   Loss 15.1074   LearningRate 0.0924   Epoch: 1   Global Step: 7880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:26,435-Speed 10721.97 samples/sec   Loss 15.0902   LearningRate 0.0924   Epoch: 1   Global Step: 7890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:27,392-Speed 10712.63 samples/sec   Loss 15.2559   LearningRate 0.0923   Epoch: 1   Global Step: 7900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:28,354-Speed 10657.27 samples/sec   Loss 15.3415   LearningRate 0.0923   Epoch: 1   Global Step: 7910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:29,369-Speed 10101.73 samples/sec   Loss 15.3929   LearningRate 0.0923   Epoch: 1   Global Step: 7920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:30,330-Speed 10664.51 samples/sec   Loss 15.3137   LearningRate 0.0923   Epoch: 1   Global Step: 7930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:31,289-Speed 10687.81 samples/sec   Loss 15.0722   LearningRate 0.0923   Epoch: 1   Global Step: 7940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:32,216-Speed 11052.95 samples/sec   Loss 15.0144   LearningRate 0.0923   Epoch: 1   Global Step: 7950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:33,160-Speed 10859.85 samples/sec   Loss 15.2435   LearningRate 0.0923   Epoch: 1   Global Step: 7960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:34,181-Speed 10047.33 samples/sec   Loss 15.1681   LearningRate 0.0923   Epoch: 1   Global Step: 7970   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:31:35,170-Speed 10356.12 samples/sec   Loss 15.2296   LearningRate 0.0923   Epoch: 1   Global Step: 7980   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:31:36,134-Speed 10631.77 samples/sec   Loss 15.0135   LearningRate 0.0923   Epoch: 1   Global Step: 7990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:37,120-Speed 10390.25 samples/sec   Loss 15.2524   LearningRate 0.0922   Epoch: 1   Global Step: 8000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:31:59,517-[lfw][8000]XNorm: 16.266155
Training: 2022-04-10 23:31:59,517-[lfw][8000]Accuracy-Flip: 0.98933+-0.00374
Training: 2022-04-10 23:31:59,518-[lfw][8000]Accuracy-Highest: 0.98933
Training: 2022-04-10 23:32:25,161-[cfp_fp][8000]XNorm: 13.833325
Training: 2022-04-10 23:32:25,162-[cfp_fp][8000]Accuracy-Flip: 0.90686+-0.01870
Training: 2022-04-10 23:32:25,162-[cfp_fp][8000]Accuracy-Highest: 0.90686
Training: 2022-04-10 23:32:47,824-[agedb_30][8000]XNorm: 15.790380
Training: 2022-04-10 23:32:47,825-[agedb_30][8000]Accuracy-Flip: 0.91283+-0.01954
Training: 2022-04-10 23:32:47,826-[agedb_30][8000]Accuracy-Highest: 0.91283
Training: 2022-04-10 23:32:48,766-Speed 142.93 samples/sec   Loss 15.2741   LearningRate 0.0922   Epoch: 1   Global Step: 8010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:32:49,740-Speed 10528.02 samples/sec   Loss 15.1934   LearningRate 0.0922   Epoch: 1   Global Step: 8020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:32:50,687-Speed 10826.81 samples/sec   Loss 15.1844   LearningRate 0.0922   Epoch: 1   Global Step: 8030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:32:51,654-Speed 10601.05 samples/sec   Loss 15.2782   LearningRate 0.0922   Epoch: 1   Global Step: 8040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:32:52,615-Speed 10660.50 samples/sec   Loss 15.1322   LearningRate 0.0922   Epoch: 1   Global Step: 8050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:32:53,561-Speed 10834.22 samples/sec   Loss 15.1024   LearningRate 0.0922   Epoch: 1   Global Step: 8060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:32:54,510-Speed 10805.19 samples/sec   Loss 15.1390   LearningRate 0.0922   Epoch: 1   Global Step: 8070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:32:55,500-Speed 10351.47 samples/sec   Loss 14.9849   LearningRate 0.0922   Epoch: 1   Global Step: 8080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:32:56,458-Speed 10694.70 samples/sec   Loss 15.2656   LearningRate 0.0922   Epoch: 1   Global Step: 8090   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:32:57,454-Speed 10289.14 samples/sec   Loss 15.2562   LearningRate 0.0922   Epoch: 1   Global Step: 8100   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:32:58,404-Speed 10801.48 samples/sec   Loss 14.9887   LearningRate 0.0921   Epoch: 1   Global Step: 8110   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:32:59,368-Speed 10628.86 samples/sec   Loss 14.9707   LearningRate 0.0921   Epoch: 1   Global Step: 8120   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:33:00,316-Speed 10816.89 samples/sec   Loss 15.1017   LearningRate 0.0921   Epoch: 1   Global Step: 8130   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:33:01,261-Speed 10841.28 samples/sec   Loss 15.0671   LearningRate 0.0921   Epoch: 1   Global Step: 8140   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:33:02,213-Speed 10767.67 samples/sec   Loss 15.1345   LearningRate 0.0921   Epoch: 1   Global Step: 8150   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:33:03,198-Speed 10412.08 samples/sec   Loss 14.9260   LearningRate 0.0921   Epoch: 1   Global Step: 8160   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:33:04,162-Speed 10626.30 samples/sec   Loss 14.9585   LearningRate 0.0921   Epoch: 1   Global Step: 8170   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:33:05,105-Speed 10868.95 samples/sec   Loss 15.2158   LearningRate 0.0921   Epoch: 1   Global Step: 8180   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:33:06,091-Speed 10403.43 samples/sec   Loss 14.9332   LearningRate 0.0921   Epoch: 1   Global Step: 8190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:07,020-Speed 11030.61 samples/sec   Loss 15.1470   LearningRate 0.0921   Epoch: 1   Global Step: 8200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:07,987-Speed 10606.22 samples/sec   Loss 15.2906   LearningRate 0.0920   Epoch: 1   Global Step: 8210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:08,952-Speed 10617.37 samples/sec   Loss 15.1390   LearningRate 0.0920   Epoch: 1   Global Step: 8220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:09,937-Speed 10407.32 samples/sec   Loss 15.0825   LearningRate 0.0920   Epoch: 1   Global Step: 8230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:10,946-Speed 10160.64 samples/sec   Loss 15.2344   LearningRate 0.0920   Epoch: 1   Global Step: 8240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:11,888-Speed 10881.21 samples/sec   Loss 15.2469   LearningRate 0.0920   Epoch: 1   Global Step: 8250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:12,847-Speed 10690.10 samples/sec   Loss 15.0583   LearningRate 0.0920   Epoch: 1   Global Step: 8260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:13,839-Speed 10328.84 samples/sec   Loss 15.0150   LearningRate 0.0920   Epoch: 1   Global Step: 8270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:14,832-Speed 10321.92 samples/sec   Loss 14.9919   LearningRate 0.0920   Epoch: 1   Global Step: 8280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:15,784-Speed 10773.17 samples/sec   Loss 14.9615   LearningRate 0.0920   Epoch: 1   Global Step: 8290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:16,722-Speed 10921.76 samples/sec   Loss 15.0682   LearningRate 0.0920   Epoch: 1   Global Step: 8300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:17,696-Speed 10525.49 samples/sec   Loss 15.1520   LearningRate 0.0920   Epoch: 1   Global Step: 8310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:18,692-Speed 10508.24 samples/sec   Loss 15.0274   LearningRate 0.0919   Epoch: 1   Global Step: 8320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:19,682-Speed 10353.37 samples/sec   Loss 14.9512   LearningRate 0.0919   Epoch: 1   Global Step: 8330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:20,636-Speed 10747.47 samples/sec   Loss 15.2060   LearningRate 0.0919   Epoch: 1   Global Step: 8340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:21,596-Speed 10674.80 samples/sec   Loss 14.7699   LearningRate 0.0919   Epoch: 1   Global Step: 8350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:22,570-Speed 10524.36 samples/sec   Loss 14.9299   LearningRate 0.0919   Epoch: 1   Global Step: 8360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:23,550-Speed 10454.63 samples/sec   Loss 15.1317   LearningRate 0.0919   Epoch: 1   Global Step: 8370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:24,502-Speed 10767.13 samples/sec   Loss 14.9604   LearningRate 0.0919   Epoch: 1   Global Step: 8380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:25,467-Speed 10625.91 samples/sec   Loss 14.9883   LearningRate 0.0919   Epoch: 1   Global Step: 8390   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:33:26,400-Speed 10989.78 samples/sec   Loss 14.7972   LearningRate 0.0919   Epoch: 1   Global Step: 8400   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:33:27,365-Speed 10614.13 samples/sec   Loss 15.0955   LearningRate 0.0919   Epoch: 1   Global Step: 8410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:28,366-Speed 10238.71 samples/sec   Loss 14.9952   LearningRate 0.0918   Epoch: 1   Global Step: 8420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:29,317-Speed 10784.28 samples/sec   Loss 14.9759   LearningRate 0.0918   Epoch: 1   Global Step: 8430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:30,277-Speed 10671.33 samples/sec   Loss 14.9504   LearningRate 0.0918   Epoch: 1   Global Step: 8440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:31,310-Speed 9917.96 samples/sec   Loss 14.9687   LearningRate 0.0918   Epoch: 1   Global Step: 8450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:32,256-Speed 10847.96 samples/sec   Loss 15.0079   LearningRate 0.0918   Epoch: 1   Global Step: 8460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:33,201-Speed 10840.23 samples/sec   Loss 14.9865   LearningRate 0.0918   Epoch: 1   Global Step: 8470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:34,136-Speed 10961.01 samples/sec   Loss 14.9684   LearningRate 0.0918   Epoch: 1   Global Step: 8480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:35,063-Speed 11054.48 samples/sec   Loss 14.8774   LearningRate 0.0918   Epoch: 1   Global Step: 8490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:36,022-Speed 10687.56 samples/sec   Loss 14.8961   LearningRate 0.0918   Epoch: 1   Global Step: 8500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:37,019-Speed 10272.89 samples/sec   Loss 14.9093   LearningRate 0.0918   Epoch: 1   Global Step: 8510   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:33:38,070-Speed 9753.07 samples/sec   Loss 14.9223   LearningRate 0.0918   Epoch: 1   Global Step: 8520   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:33:39,029-Speed 10691.32 samples/sec   Loss 14.8312   LearningRate 0.0917   Epoch: 1   Global Step: 8530   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:33:40,003-Speed 10519.00 samples/sec   Loss 14.7942   LearningRate 0.0917   Epoch: 1   Global Step: 8540   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:33:40,970-Speed 10603.46 samples/sec   Loss 14.7176   LearningRate 0.0917   Epoch: 1   Global Step: 8550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:41,950-Speed 10458.16 samples/sec   Loss 15.0391   LearningRate 0.0917   Epoch: 1   Global Step: 8560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:42,942-Speed 10335.88 samples/sec   Loss 14.8593   LearningRate 0.0917   Epoch: 1   Global Step: 8570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:43,899-Speed 10709.21 samples/sec   Loss 14.8283   LearningRate 0.0917   Epoch: 1   Global Step: 8580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:44,859-Speed 10684.35 samples/sec   Loss 15.0120   LearningRate 0.0917   Epoch: 1   Global Step: 8590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:45,787-Speed 11037.96 samples/sec   Loss 14.7726   LearningRate 0.0917   Epoch: 1   Global Step: 8600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:46,741-Speed 10742.51 samples/sec   Loss 14.9584   LearningRate 0.0917   Epoch: 1   Global Step: 8610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:47,724-Speed 10430.51 samples/sec   Loss 14.7245   LearningRate 0.0917   Epoch: 1   Global Step: 8620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:48,691-Speed 10600.61 samples/sec   Loss 14.7289   LearningRate 0.0917   Epoch: 1   Global Step: 8630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:49,673-Speed 10441.06 samples/sec   Loss 15.0300   LearningRate 0.0916   Epoch: 1   Global Step: 8640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:50,596-Speed 11097.54 samples/sec   Loss 14.9133   LearningRate 0.0916   Epoch: 1   Global Step: 8650   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:33:51,593-Speed 10285.33 samples/sec   Loss 14.7223   LearningRate 0.0916   Epoch: 1   Global Step: 8660   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:33:52,551-Speed 10700.10 samples/sec   Loss 14.8335   LearningRate 0.0916   Epoch: 1   Global Step: 8670   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:33:53,510-Speed 10690.90 samples/sec   Loss 14.7056   LearningRate 0.0916   Epoch: 1   Global Step: 8680   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:33:54,482-Speed 10543.50 samples/sec   Loss 14.7482   LearningRate 0.0916   Epoch: 1   Global Step: 8690   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:33:55,405-Speed 11103.37 samples/sec   Loss 14.8029   LearningRate 0.0916   Epoch: 1   Global Step: 8700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:56,394-Speed 10362.92 samples/sec   Loss 14.9347   LearningRate 0.0916   Epoch: 1   Global Step: 8710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:33:57,360-Speed 10605.95 samples/sec   Loss 14.8529   LearningRate 0.0916   Epoch: 1   Global Step: 8720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:33:58,317-Speed 10706.42 samples/sec   Loss 14.8463   LearningRate 0.0916   Epoch: 1   Global Step: 8730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:33:59,303-Speed 10401.69 samples/sec   Loss 14.9620   LearningRate 0.0915   Epoch: 1   Global Step: 8740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:00,267-Speed 10633.43 samples/sec   Loss 14.8273   LearningRate 0.0915   Epoch: 1   Global Step: 8750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:01,261-Speed 10308.66 samples/sec   Loss 14.6735   LearningRate 0.0915   Epoch: 1   Global Step: 8760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:02,228-Speed 10601.97 samples/sec   Loss 14.6517   LearningRate 0.0915   Epoch: 1   Global Step: 8770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:03,182-Speed 10751.24 samples/sec   Loss 14.5978   LearningRate 0.0915   Epoch: 1   Global Step: 8780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:04,169-Speed 10377.99 samples/sec   Loss 14.5278   LearningRate 0.0915   Epoch: 1   Global Step: 8790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:05,145-Speed 10507.00 samples/sec   Loss 14.9603   LearningRate 0.0915   Epoch: 1   Global Step: 8800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:06,109-Speed 10622.98 samples/sec   Loss 14.6939   LearningRate 0.0915   Epoch: 1   Global Step: 8810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:07,120-Speed 10146.41 samples/sec   Loss 14.5866   LearningRate 0.0915   Epoch: 1   Global Step: 8820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:08,073-Speed 10751.11 samples/sec   Loss 14.4634   LearningRate 0.0915   Epoch: 1   Global Step: 8830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:09,030-Speed 10709.22 samples/sec   Loss 14.7415   LearningRate 0.0915   Epoch: 1   Global Step: 8840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:09,998-Speed 10589.75 samples/sec   Loss 14.7788   LearningRate 0.0914   Epoch: 1   Global Step: 8850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:10,987-Speed 10362.10 samples/sec   Loss 14.5207   LearningRate 0.0914   Epoch: 1   Global Step: 8860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:11,943-Speed 10718.59 samples/sec   Loss 14.8550   LearningRate 0.0914   Epoch: 1   Global Step: 8870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:12,887-Speed 10856.71 samples/sec   Loss 14.5452   LearningRate 0.0914   Epoch: 1   Global Step: 8880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:13,836-Speed 10801.61 samples/sec   Loss 14.6458   LearningRate 0.0914   Epoch: 1   Global Step: 8890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:14,791-Speed 10731.15 samples/sec   Loss 14.6164   LearningRate 0.0914   Epoch: 1   Global Step: 8900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:15,764-Speed 10537.89 samples/sec   Loss 14.7339   LearningRate 0.0914   Epoch: 1   Global Step: 8910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:16,769-Speed 10197.02 samples/sec   Loss 14.7151   LearningRate 0.0914   Epoch: 1   Global Step: 8920   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:34:17,767-Speed 10270.21 samples/sec   Loss 14.5746   LearningRate 0.0914   Epoch: 1   Global Step: 8930   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:34:18,709-Speed 10888.63 samples/sec   Loss 14.6715   LearningRate 0.0914   Epoch: 1   Global Step: 8940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:19,685-Speed 10494.89 samples/sec   Loss 14.6759   LearningRate 0.0913   Epoch: 1   Global Step: 8950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:20,613-Speed 11048.50 samples/sec   Loss 14.5553   LearningRate 0.0913   Epoch: 1   Global Step: 8960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:21,623-Speed 10146.77 samples/sec   Loss 14.7243   LearningRate 0.0913   Epoch: 1   Global Step: 8970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:22,609-Speed 10401.32 samples/sec   Loss 14.5959   LearningRate 0.0913   Epoch: 1   Global Step: 8980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:23,556-Speed 10829.80 samples/sec   Loss 14.5590   LearningRate 0.0913   Epoch: 1   Global Step: 8990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:24,522-Speed 10610.49 samples/sec   Loss 14.6621   LearningRate 0.0913   Epoch: 1   Global Step: 9000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:25,496-Speed 10516.75 samples/sec   Loss 14.7600   LearningRate 0.0913   Epoch: 1   Global Step: 9010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:26,467-Speed 10564.02 samples/sec   Loss 14.5248   LearningRate 0.0913   Epoch: 1   Global Step: 9020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:27,454-Speed 10385.37 samples/sec   Loss 14.6226   LearningRate 0.0913   Epoch: 1   Global Step: 9030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:28,396-Speed 10874.92 samples/sec   Loss 14.8411   LearningRate 0.0913   Epoch: 1   Global Step: 9040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:29,375-Speed 10465.83 samples/sec   Loss 14.5036   LearningRate 0.0913   Epoch: 1   Global Step: 9050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:30,369-Speed 10311.99 samples/sec   Loss 14.5397   LearningRate 0.0912   Epoch: 1   Global Step: 9060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:31,331-Speed 10656.15 samples/sec   Loss 14.6508   LearningRate 0.0912   Epoch: 1   Global Step: 9070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:32,318-Speed 10386.43 samples/sec   Loss 14.5544   LearningRate 0.0912   Epoch: 1   Global Step: 9080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:33,276-Speed 10702.46 samples/sec   Loss 14.7069   LearningRate 0.0912   Epoch: 1   Global Step: 9090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:34,235-Speed 10683.37 samples/sec   Loss 14.4459   LearningRate 0.0912   Epoch: 1   Global Step: 9100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:35,184-Speed 10800.51 samples/sec   Loss 14.5210   LearningRate 0.0912   Epoch: 1   Global Step: 9110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:36,151-Speed 10594.44 samples/sec   Loss 14.4485   LearningRate 0.0912   Epoch: 1   Global Step: 9120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:37,098-Speed 10822.57 samples/sec   Loss 14.7372   LearningRate 0.0912   Epoch: 1   Global Step: 9130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:38,065-Speed 10606.17 samples/sec   Loss 14.6585   LearningRate 0.0912   Epoch: 1   Global Step: 9140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:38,991-Speed 11068.92 samples/sec   Loss 14.5351   LearningRate 0.0912   Epoch: 1   Global Step: 9150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:39,986-Speed 10301.89 samples/sec   Loss 14.6320   LearningRate 0.0912   Epoch: 1   Global Step: 9160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:40,935-Speed 10798.24 samples/sec   Loss 14.4693   LearningRate 0.0911   Epoch: 1   Global Step: 9170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:41,878-Speed 10868.94 samples/sec   Loss 14.4275   LearningRate 0.0911   Epoch: 1   Global Step: 9180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:42,844-Speed 10610.77 samples/sec   Loss 14.4787   LearningRate 0.0911   Epoch: 1   Global Step: 9190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:43,812-Speed 10585.51 samples/sec   Loss 14.4703   LearningRate 0.0911   Epoch: 1   Global Step: 9200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:44,782-Speed 10568.52 samples/sec   Loss 14.5751   LearningRate 0.0911   Epoch: 1   Global Step: 9210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:45,737-Speed 10736.58 samples/sec   Loss 14.5608   LearningRate 0.0911   Epoch: 1   Global Step: 9220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:46,697-Speed 10671.16 samples/sec   Loss 14.4542   LearningRate 0.0911   Epoch: 1   Global Step: 9230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:47,690-Speed 10329.12 samples/sec   Loss 14.4455   LearningRate 0.0911   Epoch: 1   Global Step: 9240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:48,647-Speed 10708.12 samples/sec   Loss 14.5786   LearningRate 0.0911   Epoch: 1   Global Step: 9250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:49,615-Speed 10590.94 samples/sec   Loss 14.4702   LearningRate 0.0911   Epoch: 1   Global Step: 9260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:50,586-Speed 10553.73 samples/sec   Loss 14.3497   LearningRate 0.0910   Epoch: 1   Global Step: 9270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:51,559-Speed 10533.62 samples/sec   Loss 14.2904   LearningRate 0.0910   Epoch: 1   Global Step: 9280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:52,516-Speed 10712.47 samples/sec   Loss 14.5184   LearningRate 0.0910   Epoch: 1   Global Step: 9290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:34:53,480-Speed 10635.24 samples/sec   Loss 14.3867   LearningRate 0.0910   Epoch: 1   Global Step: 9300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:54,465-Speed 10398.20 samples/sec   Loss 14.5645   LearningRate 0.0910   Epoch: 1   Global Step: 9310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:55,431-Speed 10613.05 samples/sec   Loss 14.5235   LearningRate 0.0910   Epoch: 1   Global Step: 9320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:56,468-Speed 9886.78 samples/sec   Loss 14.3238   LearningRate 0.0910   Epoch: 1   Global Step: 9330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:57,410-Speed 10892.97 samples/sec   Loss 14.4400   LearningRate 0.0910   Epoch: 1   Global Step: 9340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:58,338-Speed 11045.70 samples/sec   Loss 14.3908   LearningRate 0.0910   Epoch: 1   Global Step: 9350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:34:59,247-Speed 11282.94 samples/sec   Loss 14.6105   LearningRate 0.0910   Epoch: 1   Global Step: 9360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:35:00,193-Speed 10835.73 samples/sec   Loss 14.3018   LearningRate 0.0910   Epoch: 1   Global Step: 9370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:35:01,176-Speed 10421.45 samples/sec   Loss 14.4835   LearningRate 0.0909   Epoch: 1   Global Step: 9380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:35:02,114-Speed 10936.39 samples/sec   Loss 14.5638   LearningRate 0.0909   Epoch: 1   Global Step: 9390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:35:03,084-Speed 10555.41 samples/sec   Loss 14.3280   LearningRate 0.0909   Epoch: 1   Global Step: 9400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:04,048-Speed 10638.03 samples/sec   Loss 14.3932   LearningRate 0.0909   Epoch: 1   Global Step: 9410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:05,026-Speed 10479.16 samples/sec   Loss 14.3061   LearningRate 0.0909   Epoch: 1   Global Step: 9420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:05,983-Speed 10705.83 samples/sec   Loss 14.4678   LearningRate 0.0909   Epoch: 1   Global Step: 9430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:06,948-Speed 10623.25 samples/sec   Loss 14.6685   LearningRate 0.0909   Epoch: 1   Global Step: 9440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:07,919-Speed 10543.43 samples/sec   Loss 14.2448   LearningRate 0.0909   Epoch: 1   Global Step: 9450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:08,861-Speed 10890.85 samples/sec   Loss 14.5719   LearningRate 0.0909   Epoch: 1   Global Step: 9460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:09,806-Speed 10845.29 samples/sec   Loss 14.4257   LearningRate 0.0909   Epoch: 1   Global Step: 9470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:10,785-Speed 10468.39 samples/sec   Loss 14.3991   LearningRate 0.0908   Epoch: 1   Global Step: 9480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:11,788-Speed 10222.37 samples/sec   Loss 14.3935   LearningRate 0.0908   Epoch: 1   Global Step: 9490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:12,789-Speed 10240.96 samples/sec   Loss 14.3936   LearningRate 0.0908   Epoch: 1   Global Step: 9500   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:35:13,757-Speed 10595.81 samples/sec   Loss 14.2856   LearningRate 0.0908   Epoch: 1   Global Step: 9510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:14,727-Speed 10566.15 samples/sec   Loss 14.3895   LearningRate 0.0908   Epoch: 1   Global Step: 9520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:15,704-Speed 10487.85 samples/sec   Loss 14.4199   LearningRate 0.0908   Epoch: 1   Global Step: 9530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:16,659-Speed 10724.26 samples/sec   Loss 14.2483   LearningRate 0.0908   Epoch: 1   Global Step: 9540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:17,655-Speed 10298.67 samples/sec   Loss 14.1601   LearningRate 0.0908   Epoch: 1   Global Step: 9550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:18,609-Speed 10745.45 samples/sec   Loss 14.4703   LearningRate 0.0908   Epoch: 1   Global Step: 9560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:19,582-Speed 10536.87 samples/sec   Loss 14.1306   LearningRate 0.0908   Epoch: 1   Global Step: 9570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:20,564-Speed 10430.75 samples/sec   Loss 14.3956   LearningRate 0.0908   Epoch: 1   Global Step: 9580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:21,569-Speed 10199.70 samples/sec   Loss 14.5748   LearningRate 0.0907   Epoch: 1   Global Step: 9590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:22,545-Speed 10509.00 samples/sec   Loss 14.4161   LearningRate 0.0907   Epoch: 1   Global Step: 9600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:23,507-Speed 10650.15 samples/sec   Loss 14.2849   LearningRate 0.0907   Epoch: 1   Global Step: 9610   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:35:24,470-Speed 10640.85 samples/sec   Loss 14.1678   LearningRate 0.0907   Epoch: 1   Global Step: 9620   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:35:25,414-Speed 10854.90 samples/sec   Loss 14.0936   LearningRate 0.0907   Epoch: 1   Global Step: 9630   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:35:26,376-Speed 10657.46 samples/sec   Loss 14.3305   LearningRate 0.0907   Epoch: 1   Global Step: 9640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:27,375-Speed 10262.82 samples/sec   Loss 14.3008   LearningRate 0.0907   Epoch: 1   Global Step: 9650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:28,318-Speed 10892.76 samples/sec   Loss 14.3025   LearningRate 0.0907   Epoch: 1   Global Step: 9660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:29,299-Speed 10445.94 samples/sec   Loss 14.1791   LearningRate 0.0907   Epoch: 1   Global Step: 9670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:30,282-Speed 10438.53 samples/sec   Loss 14.1900   LearningRate 0.0907   Epoch: 1   Global Step: 9680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:31,218-Speed 10943.53 samples/sec   Loss 14.3288   LearningRate 0.0907   Epoch: 1   Global Step: 9690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:32,191-Speed 10536.12 samples/sec   Loss 14.3881   LearningRate 0.0906   Epoch: 1   Global Step: 9700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:33,149-Speed 10699.91 samples/sec   Loss 14.2023   LearningRate 0.0906   Epoch: 1   Global Step: 9710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:34,132-Speed 10429.30 samples/sec   Loss 14.2775   LearningRate 0.0906   Epoch: 1   Global Step: 9720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:35,127-Speed 10295.89 samples/sec   Loss 14.1419   LearningRate 0.0906   Epoch: 1   Global Step: 9730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:36,080-Speed 10780.79 samples/sec   Loss 14.1218   LearningRate 0.0906   Epoch: 1   Global Step: 9740   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:35:37,031-Speed 10782.92 samples/sec   Loss 14.2440   LearningRate 0.0906   Epoch: 1   Global Step: 9750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:37,990-Speed 10683.01 samples/sec   Loss 14.1727   LearningRate 0.0906   Epoch: 1   Global Step: 9760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:38,944-Speed 10751.69 samples/sec   Loss 14.0879   LearningRate 0.0906   Epoch: 1   Global Step: 9770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:39,914-Speed 10557.97 samples/sec   Loss 14.3311   LearningRate 0.0906   Epoch: 1   Global Step: 9780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:35:40,854-Speed 10904.88 samples/sec   Loss 14.2902   LearningRate 0.0906   Epoch: 1   Global Step: 9790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:35:41,805-Speed 10783.79 samples/sec   Loss 14.4167   LearningRate 0.0905   Epoch: 1   Global Step: 9800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:35:42,750-Speed 10851.79 samples/sec   Loss 14.2176   LearningRate 0.0905   Epoch: 1   Global Step: 9810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:35:43,717-Speed 10592.40 samples/sec   Loss 14.1835   LearningRate 0.0905   Epoch: 1   Global Step: 9820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:35:44,696-Speed 10467.85 samples/sec   Loss 14.1064   LearningRate 0.0905   Epoch: 1   Global Step: 9830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:35:45,676-Speed 10454.33 samples/sec   Loss 14.2036   LearningRate 0.0905   Epoch: 1   Global Step: 9840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:35:46,645-Speed 10578.13 samples/sec   Loss 14.3191   LearningRate 0.0905   Epoch: 1   Global Step: 9850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:35:47,656-Speed 10142.47 samples/sec   Loss 14.4165   LearningRate 0.0905   Epoch: 1   Global Step: 9860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:35:48,601-Speed 10847.05 samples/sec   Loss 14.0990   LearningRate 0.0905   Epoch: 1   Global Step: 9870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:35:49,557-Speed 10720.19 samples/sec   Loss 14.2110   LearningRate 0.0905   Epoch: 1   Global Step: 9880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:50,514-Speed 10711.96 samples/sec   Loss 14.2855   LearningRate 0.0905   Epoch: 1   Global Step: 9890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:51,488-Speed 10527.85 samples/sec   Loss 14.1695   LearningRate 0.0905   Epoch: 1   Global Step: 9900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:52,472-Speed 10418.27 samples/sec   Loss 14.1827   LearningRate 0.0904   Epoch: 1   Global Step: 9910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:53,509-Speed 9883.84 samples/sec   Loss 14.0926   LearningRate 0.0904   Epoch: 1   Global Step: 9920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:54,490-Speed 10445.33 samples/sec   Loss 14.0301   LearningRate 0.0904   Epoch: 1   Global Step: 9930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:55,453-Speed 10648.98 samples/sec   Loss 14.3917   LearningRate 0.0904   Epoch: 1   Global Step: 9940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:56,419-Speed 10607.81 samples/sec   Loss 14.0965   LearningRate 0.0904   Epoch: 1   Global Step: 9950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:57,392-Speed 10540.48 samples/sec   Loss 14.0887   LearningRate 0.0904   Epoch: 1   Global Step: 9960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:58,361-Speed 10570.17 samples/sec   Loss 13.9323   LearningRate 0.0904   Epoch: 1   Global Step: 9970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:35:59,313-Speed 10773.29 samples/sec   Loss 14.0698   LearningRate 0.0904   Epoch: 1   Global Step: 9980   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:36:00,295-Speed 10436.71 samples/sec   Loss 14.2415   LearningRate 0.0904   Epoch: 1   Global Step: 9990   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:36:01,250-Speed 10738.67 samples/sec   Loss 14.0466   LearningRate 0.0904   Epoch: 1   Global Step: 10000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:36:23,642-[lfw][10000]XNorm: 16.042208
Training: 2022-04-10 23:36:23,643-[lfw][10000]Accuracy-Flip: 0.99100+-0.00549
Training: 2022-04-10 23:36:23,643-[lfw][10000]Accuracy-Highest: 0.99100
Training: 2022-04-10 23:36:49,175-[cfp_fp][10000]XNorm: 13.525256
Training: 2022-04-10 23:36:49,176-[cfp_fp][10000]Accuracy-Flip: 0.92014+-0.01684
Training: 2022-04-10 23:36:49,176-[cfp_fp][10000]Accuracy-Highest: 0.92014
Training: 2022-04-10 23:37:11,315-[agedb_30][10000]XNorm: 15.606682
Training: 2022-04-10 23:37:11,315-[agedb_30][10000]Accuracy-Flip: 0.92083+-0.01974
Training: 2022-04-10 23:37:11,316-[agedb_30][10000]Accuracy-Highest: 0.92083
Training: 2022-04-10 23:37:12,246-Speed 144.24 samples/sec   Loss 14.0638   LearningRate 0.0903   Epoch: 1   Global Step: 10010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:13,216-Speed 10562.41 samples/sec   Loss 14.1164   LearningRate 0.0903   Epoch: 1   Global Step: 10020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:14,200-Speed 10423.24 samples/sec   Loss 14.0781   LearningRate 0.0903   Epoch: 1   Global Step: 10030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:15,162-Speed 10649.46 samples/sec   Loss 14.0870   LearningRate 0.0903   Epoch: 1   Global Step: 10040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:16,134-Speed 10543.14 samples/sec   Loss 14.1636   LearningRate 0.0903   Epoch: 1   Global Step: 10050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:17,079-Speed 10850.24 samples/sec   Loss 13.9407   LearningRate 0.0903   Epoch: 1   Global Step: 10060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:18,055-Speed 10498.73 samples/sec   Loss 14.0827   LearningRate 0.0903   Epoch: 1   Global Step: 10070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:19,106-Speed 9752.30 samples/sec   Loss 14.1129   LearningRate 0.0903   Epoch: 1   Global Step: 10080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:20,120-Speed 10119.22 samples/sec   Loss 14.0586   LearningRate 0.0903   Epoch: 1   Global Step: 10090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:21,099-Speed 10470.91 samples/sec   Loss 14.1899   LearningRate 0.0903   Epoch: 1   Global Step: 10100   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:37:22,248-Speed 8915.18 samples/sec   Loss 13.9982   LearningRate 0.0903   Epoch: 1   Global Step: 10110   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:37:32,798-Speed 970.72 samples/sec   Loss 13.4997   LearningRate 0.0902   Epoch: 2   Global Step: 10120   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:37:34,180-Speed 7418.25 samples/sec   Loss 13.0494   LearningRate 0.0902   Epoch: 2   Global Step: 10130   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:37:35,216-Speed 9902.70 samples/sec   Loss 13.3355   LearningRate 0.0902   Epoch: 2   Global Step: 10140   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:37:36,314-Speed 9337.55 samples/sec   Loss 13.3362   LearningRate 0.0902   Epoch: 2   Global Step: 10150   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:37:37,327-Speed 10126.97 samples/sec   Loss 12.9588   LearningRate 0.0902   Epoch: 2   Global Step: 10160   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:37:38,277-Speed 10783.95 samples/sec   Loss 13.0278   LearningRate 0.0902   Epoch: 2   Global Step: 10170   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:37:39,284-Speed 10175.04 samples/sec   Loss 13.1786   LearningRate 0.0902   Epoch: 2   Global Step: 10180   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:37:40,296-Speed 10133.24 samples/sec   Loss 13.1422   LearningRate 0.0902   Epoch: 2   Global Step: 10190   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:37:41,189-Speed 11479.17 samples/sec   Loss 13.1383   LearningRate 0.0902   Epoch: 2   Global Step: 10200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:42,241-Speed 9742.48 samples/sec   Loss 13.1433   LearningRate 0.0902   Epoch: 2   Global Step: 10210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:43,262-Speed 10037.74 samples/sec   Loss 13.2103   LearningRate 0.0902   Epoch: 2   Global Step: 10220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:44,290-Speed 9988.44 samples/sec   Loss 13.1688   LearningRate 0.0901   Epoch: 2   Global Step: 10230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:45,560-Speed 8068.10 samples/sec   Loss 13.1984   LearningRate 0.0901   Epoch: 2   Global Step: 10240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:46,709-Speed 8924.34 samples/sec   Loss 12.9766   LearningRate 0.0901   Epoch: 2   Global Step: 10250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:47,772-Speed 9640.01 samples/sec   Loss 13.1153   LearningRate 0.0901   Epoch: 2   Global Step: 10260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:48,797-Speed 10011.22 samples/sec   Loss 13.2587   LearningRate 0.0901   Epoch: 2   Global Step: 10270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:49,731-Speed 10980.48 samples/sec   Loss 13.2376   LearningRate 0.0901   Epoch: 2   Global Step: 10280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:50,761-Speed 9946.68 samples/sec   Loss 13.3021   LearningRate 0.0901   Epoch: 2   Global Step: 10290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:51,738-Speed 10503.45 samples/sec   Loss 13.3113   LearningRate 0.0901   Epoch: 2   Global Step: 10300   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:37:52,700-Speed 10653.63 samples/sec   Loss 13.1802   LearningRate 0.0901   Epoch: 2   Global Step: 10310   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:37:53,682-Speed 10443.33 samples/sec   Loss 13.1891   LearningRate 0.0901   Epoch: 2   Global Step: 10320   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:37:54,650-Speed 10589.51 samples/sec   Loss 13.2765   LearningRate 0.0900   Epoch: 2   Global Step: 10330   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:37:55,626-Speed 10501.02 samples/sec   Loss 13.2699   LearningRate 0.0900   Epoch: 2   Global Step: 10340   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:37:56,658-Speed 9934.18 samples/sec   Loss 13.3100   LearningRate 0.0900   Epoch: 2   Global Step: 10350   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:37:57,707-Speed 9772.19 samples/sec   Loss 13.4001   LearningRate 0.0900   Epoch: 2   Global Step: 10360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:58,722-Speed 10094.34 samples/sec   Loss 13.2729   LearningRate 0.0900   Epoch: 2   Global Step: 10370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:37:59,730-Speed 10163.15 samples/sec   Loss 13.3622   LearningRate 0.0900   Epoch: 2   Global Step: 10380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:00,702-Speed 10547.18 samples/sec   Loss 13.1585   LearningRate 0.0900   Epoch: 2   Global Step: 10390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:01,689-Speed 10388.85 samples/sec   Loss 13.2737   LearningRate 0.0900   Epoch: 2   Global Step: 10400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:02,661-Speed 10542.09 samples/sec   Loss 13.4141   LearningRate 0.0900   Epoch: 2   Global Step: 10410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:03,660-Speed 10264.21 samples/sec   Loss 13.2586   LearningRate 0.0900   Epoch: 2   Global Step: 10420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:04,730-Speed 9571.31 samples/sec   Loss 13.5449   LearningRate 0.0900   Epoch: 2   Global Step: 10430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:05,696-Speed 10613.03 samples/sec   Loss 13.3030   LearningRate 0.0899   Epoch: 2   Global Step: 10440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:06,756-Speed 9668.25 samples/sec   Loss 13.2545   LearningRate 0.0899   Epoch: 2   Global Step: 10450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:07,731-Speed 10509.25 samples/sec   Loss 13.1909   LearningRate 0.0899   Epoch: 2   Global Step: 10460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:08,760-Speed 9964.60 samples/sec   Loss 13.1656   LearningRate 0.0899   Epoch: 2   Global Step: 10470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:09,764-Speed 10228.77 samples/sec   Loss 13.2997   LearningRate 0.0899   Epoch: 2   Global Step: 10480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:10,821-Speed 9693.25 samples/sec   Loss 13.1425   LearningRate 0.0899   Epoch: 2   Global Step: 10490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:38:11,812-Speed 10345.06 samples/sec   Loss 13.3663   LearningRate 0.0899   Epoch: 2   Global Step: 10500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:38:12,841-Speed 9968.94 samples/sec   Loss 13.2345   LearningRate 0.0899   Epoch: 2   Global Step: 10510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:38:13,830-Speed 10354.12 samples/sec   Loss 13.2486   LearningRate 0.0899   Epoch: 2   Global Step: 10520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:38:14,809-Speed 10474.46 samples/sec   Loss 13.3305   LearningRate 0.0899   Epoch: 2   Global Step: 10530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:38:15,902-Speed 9377.34 samples/sec   Loss 13.4594   LearningRate 0.0899   Epoch: 2   Global Step: 10540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:38:17,012-Speed 9231.04 samples/sec   Loss 13.4051   LearningRate 0.0898   Epoch: 2   Global Step: 10550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:38:18,097-Speed 9447.40 samples/sec   Loss 13.4302   LearningRate 0.0898   Epoch: 2   Global Step: 10560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:38:19,243-Speed 8935.56 samples/sec   Loss 13.3616   LearningRate 0.0898   Epoch: 2   Global Step: 10570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:38:20,192-Speed 10810.34 samples/sec   Loss 13.4351   LearningRate 0.0898   Epoch: 2   Global Step: 10580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:38:21,243-Speed 9748.03 samples/sec   Loss 13.3570   LearningRate 0.0898   Epoch: 2   Global Step: 10590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:22,372-Speed 9083.33 samples/sec   Loss 13.2943   LearningRate 0.0898   Epoch: 2   Global Step: 10600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:23,367-Speed 10298.66 samples/sec   Loss 13.4347   LearningRate 0.0898   Epoch: 2   Global Step: 10610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:24,329-Speed 10655.01 samples/sec   Loss 13.4187   LearningRate 0.0898   Epoch: 2   Global Step: 10620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:25,295-Speed 10616.29 samples/sec   Loss 13.2934   LearningRate 0.0898   Epoch: 2   Global Step: 10630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:26,263-Speed 10585.46 samples/sec   Loss 13.3413   LearningRate 0.0898   Epoch: 2   Global Step: 10640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:27,274-Speed 10139.30 samples/sec   Loss 13.5531   LearningRate 0.0897   Epoch: 2   Global Step: 10650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:28,237-Speed 10645.09 samples/sec   Loss 13.3793   LearningRate 0.0897   Epoch: 2   Global Step: 10660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:29,199-Speed 10657.07 samples/sec   Loss 13.4474   LearningRate 0.0897   Epoch: 2   Global Step: 10670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:30,177-Speed 10479.32 samples/sec   Loss 13.5578   LearningRate 0.0897   Epoch: 2   Global Step: 10680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:31,118-Speed 10895.54 samples/sec   Loss 13.5534   LearningRate 0.0897   Epoch: 2   Global Step: 10690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:32,158-Speed 9857.62 samples/sec   Loss 13.6503   LearningRate 0.0897   Epoch: 2   Global Step: 10700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:33,126-Speed 10588.81 samples/sec   Loss 13.4316   LearningRate 0.0897   Epoch: 2   Global Step: 10710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:34,145-Speed 10055.36 samples/sec   Loss 13.4761   LearningRate 0.0897   Epoch: 2   Global Step: 10720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:35,208-Speed 9639.47 samples/sec   Loss 13.5167   LearningRate 0.0897   Epoch: 2   Global Step: 10730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:36,270-Speed 9653.31 samples/sec   Loss 13.4148   LearningRate 0.0897   Epoch: 2   Global Step: 10740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:37,347-Speed 9513.50 samples/sec   Loss 13.5275   LearningRate 0.0897   Epoch: 2   Global Step: 10750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:38,316-Speed 10580.04 samples/sec   Loss 13.5516   LearningRate 0.0896   Epoch: 2   Global Step: 10760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:39,282-Speed 10607.32 samples/sec   Loss 13.5856   LearningRate 0.0896   Epoch: 2   Global Step: 10770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:40,268-Speed 10398.44 samples/sec   Loss 13.5518   LearningRate 0.0896   Epoch: 2   Global Step: 10780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:41,339-Speed 9567.83 samples/sec   Loss 13.5990   LearningRate 0.0896   Epoch: 2   Global Step: 10790   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:38:42,332-Speed 10326.80 samples/sec   Loss 13.3495   LearningRate 0.0896   Epoch: 2   Global Step: 10800   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:38:43,308-Speed 10508.66 samples/sec   Loss 13.3644   LearningRate 0.0896   Epoch: 2   Global Step: 10810   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:38:44,303-Speed 10321.84 samples/sec   Loss 13.6531   LearningRate 0.0896   Epoch: 2   Global Step: 10820   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:38:45,388-Speed 9444.76 samples/sec   Loss 13.5200   LearningRate 0.0896   Epoch: 2   Global Step: 10830   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:38:46,371-Speed 10425.03 samples/sec   Loss 13.3970   LearningRate 0.0896   Epoch: 2   Global Step: 10840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:47,357-Speed 10397.20 samples/sec   Loss 13.3430   LearningRate 0.0896   Epoch: 2   Global Step: 10850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:48,355-Speed 10273.44 samples/sec   Loss 13.6504   LearningRate 0.0896   Epoch: 2   Global Step: 10860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:49,295-Speed 10907.76 samples/sec   Loss 13.3659   LearningRate 0.0895   Epoch: 2   Global Step: 10870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:50,282-Speed 10378.80 samples/sec   Loss 13.3222   LearningRate 0.0895   Epoch: 2   Global Step: 10880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:51,246-Speed 10641.22 samples/sec   Loss 13.3341   LearningRate 0.0895   Epoch: 2   Global Step: 10890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:52,225-Speed 10474.18 samples/sec   Loss 13.4640   LearningRate 0.0895   Epoch: 2   Global Step: 10900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:53,202-Speed 10491.14 samples/sec   Loss 13.5412   LearningRate 0.0895   Epoch: 2   Global Step: 10910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:54,206-Speed 10208.35 samples/sec   Loss 13.3677   LearningRate 0.0895   Epoch: 2   Global Step: 10920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:55,197-Speed 10338.11 samples/sec   Loss 13.3152   LearningRate 0.0895   Epoch: 2   Global Step: 10930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:38:56,263-Speed 9605.36 samples/sec   Loss 13.5865   LearningRate 0.0895   Epoch: 2   Global Step: 10940   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:38:57,232-Speed 10580.49 samples/sec   Loss 13.3263   LearningRate 0.0895   Epoch: 2   Global Step: 10950   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:38:58,268-Speed 9895.77 samples/sec   Loss 13.5026   LearningRate 0.0895   Epoch: 2   Global Step: 10960   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:38:59,270-Speed 10227.41 samples/sec   Loss 13.3908   LearningRate 0.0894   Epoch: 2   Global Step: 10970   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:39:00,303-Speed 9920.25 samples/sec   Loss 13.5130   LearningRate 0.0894   Epoch: 2   Global Step: 10980   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:39:01,397-Speed 9367.58 samples/sec   Loss 13.5478   LearningRate 0.0894   Epoch: 2   Global Step: 10990   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:39:02,401-Speed 10211.96 samples/sec   Loss 13.5076   LearningRate 0.0894   Epoch: 2   Global Step: 11000   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:39:03,359-Speed 10702.31 samples/sec   Loss 13.4515   LearningRate 0.0894   Epoch: 2   Global Step: 11010   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:39:04,316-Speed 10704.79 samples/sec   Loss 13.4610   LearningRate 0.0894   Epoch: 2   Global Step: 11020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:05,363-Speed 9786.89 samples/sec   Loss 13.5297   LearningRate 0.0894   Epoch: 2   Global Step: 11030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:06,359-Speed 10296.96 samples/sec   Loss 13.4346   LearningRate 0.0894   Epoch: 2   Global Step: 11040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:07,373-Speed 10108.06 samples/sec   Loss 13.5554   LearningRate 0.0894   Epoch: 2   Global Step: 11050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:08,350-Speed 10492.12 samples/sec   Loss 13.5904   LearningRate 0.0894   Epoch: 2   Global Step: 11060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:09,342-Speed 10326.11 samples/sec   Loss 13.4087   LearningRate 0.0894   Epoch: 2   Global Step: 11070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:10,322-Speed 10459.65 samples/sec   Loss 13.6351   LearningRate 0.0893   Epoch: 2   Global Step: 11080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:11,249-Speed 11055.84 samples/sec   Loss 13.3148   LearningRate 0.0893   Epoch: 2   Global Step: 11090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:12,264-Speed 10104.73 samples/sec   Loss 13.3746   LearningRate 0.0893   Epoch: 2   Global Step: 11100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:13,251-Speed 10377.30 samples/sec   Loss 13.3756   LearningRate 0.0893   Epoch: 2   Global Step: 11110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:14,304-Speed 9735.98 samples/sec   Loss 13.4975   LearningRate 0.0893   Epoch: 2   Global Step: 11120   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:39:15,255-Speed 10781.74 samples/sec   Loss 13.4361   LearningRate 0.0893   Epoch: 2   Global Step: 11130   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:39:16,241-Speed 10388.97 samples/sec   Loss 13.4045   LearningRate 0.0893   Epoch: 2   Global Step: 11140   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:39:17,321-Speed 9525.40 samples/sec   Loss 13.4293   LearningRate 0.0893   Epoch: 2   Global Step: 11150   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:39:18,310-Speed 10358.21 samples/sec   Loss 13.5047   LearningRate 0.0893   Epoch: 2   Global Step: 11160   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:39:19,288-Speed 10484.00 samples/sec   Loss 13.5193   LearningRate 0.0893   Epoch: 2   Global Step: 11170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:20,308-Speed 10047.64 samples/sec   Loss 13.5083   LearningRate 0.0893   Epoch: 2   Global Step: 11180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:21,294-Speed 10396.54 samples/sec   Loss 13.3476   LearningRate 0.0892   Epoch: 2   Global Step: 11190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:22,258-Speed 10631.34 samples/sec   Loss 13.4434   LearningRate 0.0892   Epoch: 2   Global Step: 11200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:23,246-Speed 10377.46 samples/sec   Loss 13.4656   LearningRate 0.0892   Epoch: 2   Global Step: 11210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:24,228-Speed 10436.26 samples/sec   Loss 13.3853   LearningRate 0.0892   Epoch: 2   Global Step: 11220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:25,495-Speed 8086.19 samples/sec   Loss 13.3823   LearningRate 0.0892   Epoch: 2   Global Step: 11230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:26,523-Speed 9971.14 samples/sec   Loss 13.3969   LearningRate 0.0892   Epoch: 2   Global Step: 11240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:27,593-Speed 9584.29 samples/sec   Loss 13.5644   LearningRate 0.0892   Epoch: 2   Global Step: 11250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:28,597-Speed 10203.09 samples/sec   Loss 13.4889   LearningRate 0.0892   Epoch: 2   Global Step: 11260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:29,559-Speed 10654.61 samples/sec   Loss 13.6238   LearningRate 0.0892   Epoch: 2   Global Step: 11270   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:39:30,637-Speed 9512.89 samples/sec   Loss 13.3538   LearningRate 0.0892   Epoch: 2   Global Step: 11280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:31,601-Speed 10625.70 samples/sec   Loss 13.5979   LearningRate 0.0892   Epoch: 2   Global Step: 11290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:32,614-Speed 10117.20 samples/sec   Loss 13.4533   LearningRate 0.0891   Epoch: 2   Global Step: 11300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:33,564-Speed 10798.34 samples/sec   Loss 13.3780   LearningRate 0.0891   Epoch: 2   Global Step: 11310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:34,552-Speed 10370.45 samples/sec   Loss 13.4357   LearningRate 0.0891   Epoch: 2   Global Step: 11320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:35,510-Speed 10707.16 samples/sec   Loss 13.4640   LearningRate 0.0891   Epoch: 2   Global Step: 11330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:36,530-Speed 10042.36 samples/sec   Loss 13.5826   LearningRate 0.0891   Epoch: 2   Global Step: 11340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:37,728-Speed 8557.42 samples/sec   Loss 13.4457   LearningRate 0.0891   Epoch: 2   Global Step: 11350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:38,706-Speed 10489.99 samples/sec   Loss 13.3677   LearningRate 0.0891   Epoch: 2   Global Step: 11360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:39,747-Speed 9840.86 samples/sec   Loss 13.3139   LearningRate 0.0891   Epoch: 2   Global Step: 11370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:40,713-Speed 10605.50 samples/sec   Loss 13.3159   LearningRate 0.0891   Epoch: 2   Global Step: 11380   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:39:41,671-Speed 10709.37 samples/sec   Loss 13.5439   LearningRate 0.0891   Epoch: 2   Global Step: 11390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:39:42,709-Speed 9866.25 samples/sec   Loss 13.4880   LearningRate 0.0890   Epoch: 2   Global Step: 11400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:39:43,709-Speed 10257.98 samples/sec   Loss 13.4636   LearningRate 0.0890   Epoch: 2   Global Step: 11410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:39:44,714-Speed 10193.80 samples/sec   Loss 13.5386   LearningRate 0.0890   Epoch: 2   Global Step: 11420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:39:45,692-Speed 10482.85 samples/sec   Loss 13.4048   LearningRate 0.0890   Epoch: 2   Global Step: 11430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:39:46,849-Speed 8858.97 samples/sec   Loss 13.3149   LearningRate 0.0890   Epoch: 2   Global Step: 11440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:39:47,846-Speed 10278.59 samples/sec   Loss 13.5890   LearningRate 0.0890   Epoch: 2   Global Step: 11450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:39:48,817-Speed 10562.49 samples/sec   Loss 13.5874   LearningRate 0.0890   Epoch: 2   Global Step: 11460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:39:49,806-Speed 10359.07 samples/sec   Loss 13.3430   LearningRate 0.0890   Epoch: 2   Global Step: 11470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:39:50,849-Speed 9828.71 samples/sec   Loss 13.6050   LearningRate 0.0890   Epoch: 2   Global Step: 11480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:39:51,867-Speed 10069.72 samples/sec   Loss 13.5878   LearningRate 0.0890   Epoch: 2   Global Step: 11490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:52,838-Speed 10551.18 samples/sec   Loss 13.3941   LearningRate 0.0890   Epoch: 2   Global Step: 11500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:53,898-Speed 9670.23 samples/sec   Loss 13.5238   LearningRate 0.0889   Epoch: 2   Global Step: 11510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:54,927-Speed 9955.69 samples/sec   Loss 13.4440   LearningRate 0.0889   Epoch: 2   Global Step: 11520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:55,881-Speed 10746.32 samples/sec   Loss 13.3058   LearningRate 0.0889   Epoch: 2   Global Step: 11530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:56,875-Speed 10309.22 samples/sec   Loss 13.3769   LearningRate 0.0889   Epoch: 2   Global Step: 11540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:57,830-Speed 10738.57 samples/sec   Loss 13.4143   LearningRate 0.0889   Epoch: 2   Global Step: 11550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:58,814-Speed 10409.11 samples/sec   Loss 13.2897   LearningRate 0.0889   Epoch: 2   Global Step: 11560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:39:59,853-Speed 9872.18 samples/sec   Loss 13.3854   LearningRate 0.0889   Epoch: 2   Global Step: 11570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:00,823-Speed 10564.44 samples/sec   Loss 13.3870   LearningRate 0.0889   Epoch: 2   Global Step: 11580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:01,811-Speed 10382.37 samples/sec   Loss 13.2757   LearningRate 0.0889   Epoch: 2   Global Step: 11590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:02,767-Speed 10717.06 samples/sec   Loss 13.4965   LearningRate 0.0889   Epoch: 2   Global Step: 11600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:03,762-Speed 10301.95 samples/sec   Loss 13.4716   LearningRate 0.0889   Epoch: 2   Global Step: 11610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:04,758-Speed 10296.26 samples/sec   Loss 13.2978   LearningRate 0.0888   Epoch: 2   Global Step: 11620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:05,835-Speed 9513.86 samples/sec   Loss 13.4220   LearningRate 0.0888   Epoch: 2   Global Step: 11630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:06,813-Speed 10475.51 samples/sec   Loss 13.3356   LearningRate 0.0888   Epoch: 2   Global Step: 11640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:07,803-Speed 10354.99 samples/sec   Loss 13.2790   LearningRate 0.0888   Epoch: 2   Global Step: 11650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:08,872-Speed 9593.39 samples/sec   Loss 13.4197   LearningRate 0.0888   Epoch: 2   Global Step: 11660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:09,895-Speed 10010.20 samples/sec   Loss 13.3434   LearningRate 0.0888   Epoch: 2   Global Step: 11670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:10,886-Speed 10349.18 samples/sec   Loss 13.1635   LearningRate 0.0888   Epoch: 2   Global Step: 11680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:11,920-Speed 9915.05 samples/sec   Loss 13.3887   LearningRate 0.0888   Epoch: 2   Global Step: 11690   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:40:12,901-Speed 10453.03 samples/sec   Loss 13.4932   LearningRate 0.0888   Epoch: 2   Global Step: 11700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:13,868-Speed 10601.66 samples/sec   Loss 13.4497   LearningRate 0.0888   Epoch: 2   Global Step: 11710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:40:14,883-Speed 10095.16 samples/sec   Loss 13.4958   LearningRate 0.0887   Epoch: 2   Global Step: 11720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:40:15,877-Speed 10309.76 samples/sec   Loss 13.3503   LearningRate 0.0887   Epoch: 2   Global Step: 11730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:40:16,887-Speed 10153.04 samples/sec   Loss 13.1722   LearningRate 0.0887   Epoch: 2   Global Step: 11740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:40:17,947-Speed 9667.67 samples/sec   Loss 13.4606   LearningRate 0.0887   Epoch: 2   Global Step: 11750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:40:19,004-Speed 9703.49 samples/sec   Loss 13.4771   LearningRate 0.0887   Epoch: 2   Global Step: 11760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:40:20,097-Speed 9370.37 samples/sec   Loss 13.1989   LearningRate 0.0887   Epoch: 2   Global Step: 11770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:40:21,106-Speed 10152.90 samples/sec   Loss 13.3452   LearningRate 0.0887   Epoch: 2   Global Step: 11780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:40:22,059-Speed 10765.20 samples/sec   Loss 13.4350   LearningRate 0.0887   Epoch: 2   Global Step: 11790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:40:22,996-Speed 10940.80 samples/sec   Loss 13.2993   LearningRate 0.0887   Epoch: 2   Global Step: 11800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:40:23,947-Speed 10772.47 samples/sec   Loss 13.4527   LearningRate 0.0887   Epoch: 2   Global Step: 11810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:24,951-Speed 10211.16 samples/sec   Loss 13.4877   LearningRate 0.0887   Epoch: 2   Global Step: 11820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:25,998-Speed 9787.37 samples/sec   Loss 13.4558   LearningRate 0.0886   Epoch: 2   Global Step: 11830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:26,990-Speed 10352.69 samples/sec   Loss 13.2886   LearningRate 0.0886   Epoch: 2   Global Step: 11840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:28,025-Speed 9912.52 samples/sec   Loss 13.3673   LearningRate 0.0886   Epoch: 2   Global Step: 11850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:29,047-Speed 10030.88 samples/sec   Loss 13.2183   LearningRate 0.0886   Epoch: 2   Global Step: 11860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:30,038-Speed 10350.26 samples/sec   Loss 13.4001   LearningRate 0.0886   Epoch: 2   Global Step: 11870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:31,050-Speed 10119.65 samples/sec   Loss 13.1921   LearningRate 0.0886   Epoch: 2   Global Step: 11880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:32,060-Speed 10152.44 samples/sec   Loss 13.3564   LearningRate 0.0886   Epoch: 2   Global Step: 11890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:33,001-Speed 10892.30 samples/sec   Loss 13.5784   LearningRate 0.0886   Epoch: 2   Global Step: 11900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:33,992-Speed 10334.36 samples/sec   Loss 13.2766   LearningRate 0.0886   Epoch: 2   Global Step: 11910   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:40:34,988-Speed 10296.47 samples/sec   Loss 13.4230   LearningRate 0.0886   Epoch: 2   Global Step: 11920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:35,933-Speed 10845.04 samples/sec   Loss 13.3504   LearningRate 0.0886   Epoch: 2   Global Step: 11930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:36,893-Speed 10673.69 samples/sec   Loss 13.2419   LearningRate 0.0885   Epoch: 2   Global Step: 11940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:37,859-Speed 10608.14 samples/sec   Loss 13.2609   LearningRate 0.0885   Epoch: 2   Global Step: 11950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:38,815-Speed 10721.41 samples/sec   Loss 13.1858   LearningRate 0.0885   Epoch: 2   Global Step: 11960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:39,929-Speed 9196.74 samples/sec   Loss 13.4568   LearningRate 0.0885   Epoch: 2   Global Step: 11970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:40,971-Speed 9834.44 samples/sec   Loss 13.2473   LearningRate 0.0885   Epoch: 2   Global Step: 11980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:41,987-Speed 10100.05 samples/sec   Loss 13.3744   LearningRate 0.0885   Epoch: 2   Global Step: 11990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:40:42,966-Speed 10466.69 samples/sec   Loss 13.3780   LearningRate 0.0885   Epoch: 2   Global Step: 12000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:41:05,139-[lfw][12000]XNorm: 15.835851
Training: 2022-04-10 23:41:05,140-[lfw][12000]Accuracy-Flip: 0.98933+-0.00564
Training: 2022-04-10 23:41:05,140-[lfw][12000]Accuracy-Highest: 0.99100
Training: 2022-04-10 23:41:30,648-[cfp_fp][12000]XNorm: 13.378788
Training: 2022-04-10 23:41:30,649-[cfp_fp][12000]Accuracy-Flip: 0.91600+-0.01666
Training: 2022-04-10 23:41:30,650-[cfp_fp][12000]Accuracy-Highest: 0.92014
Training: 2022-04-10 23:41:52,740-[agedb_30][12000]XNorm: 15.481708
Training: 2022-04-10 23:41:52,741-[agedb_30][12000]Accuracy-Flip: 0.92917+-0.01812
Training: 2022-04-10 23:41:52,741-[agedb_30][12000]Accuracy-Highest: 0.92917
Training: 2022-04-10 23:41:53,707-Speed 144.76 samples/sec   Loss 13.1986   LearningRate 0.0885   Epoch: 2   Global Step: 12010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:41:54,660-Speed 10754.97 samples/sec   Loss 13.3072   LearningRate 0.0885   Epoch: 2   Global Step: 12020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:41:55,626-Speed 10608.65 samples/sec   Loss 13.3199   LearningRate 0.0885   Epoch: 2   Global Step: 12030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:41:56,588-Speed 10656.03 samples/sec   Loss 13.3721   LearningRate 0.0885   Epoch: 2   Global Step: 12040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:41:57,562-Speed 10528.86 samples/sec   Loss 13.3654   LearningRate 0.0884   Epoch: 2   Global Step: 12050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:41:58,544-Speed 10438.40 samples/sec   Loss 13.3170   LearningRate 0.0884   Epoch: 2   Global Step: 12060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:41:59,503-Speed 10687.86 samples/sec   Loss 13.4355   LearningRate 0.0884   Epoch: 2   Global Step: 12070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:00,465-Speed 10660.93 samples/sec   Loss 13.3728   LearningRate 0.0884   Epoch: 2   Global Step: 12080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:01,414-Speed 10793.19 samples/sec   Loss 13.3750   LearningRate 0.0884   Epoch: 2   Global Step: 12090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:02,387-Speed 10533.23 samples/sec   Loss 13.3324   LearningRate 0.0884   Epoch: 2   Global Step: 12100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:03,312-Speed 11083.91 samples/sec   Loss 13.3683   LearningRate 0.0884   Epoch: 2   Global Step: 12110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:04,279-Speed 10603.01 samples/sec   Loss 13.1992   LearningRate 0.0884   Epoch: 2   Global Step: 12120   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:42:05,239-Speed 10678.39 samples/sec   Loss 13.2957   LearningRate 0.0884   Epoch: 2   Global Step: 12130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:06,224-Speed 10406.05 samples/sec   Loss 13.2837   LearningRate 0.0884   Epoch: 2   Global Step: 12140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:07,195-Speed 10549.37 samples/sec   Loss 13.3682   LearningRate 0.0883   Epoch: 2   Global Step: 12150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:08,147-Speed 10775.48 samples/sec   Loss 13.2837   LearningRate 0.0883   Epoch: 2   Global Step: 12160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:09,101-Speed 10740.64 samples/sec   Loss 13.3809   LearningRate 0.0883   Epoch: 2   Global Step: 12170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:10,055-Speed 10742.63 samples/sec   Loss 13.3386   LearningRate 0.0883   Epoch: 2   Global Step: 12180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:11,060-Speed 10191.12 samples/sec   Loss 13.3380   LearningRate 0.0883   Epoch: 2   Global Step: 12190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:12,042-Speed 10448.84 samples/sec   Loss 13.3288   LearningRate 0.0883   Epoch: 2   Global Step: 12200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:13,008-Speed 10607.71 samples/sec   Loss 13.3639   LearningRate 0.0883   Epoch: 2   Global Step: 12210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:13,976-Speed 10588.42 samples/sec   Loss 13.2592   LearningRate 0.0883   Epoch: 2   Global Step: 12220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:14,935-Speed 10686.38 samples/sec   Loss 13.3786   LearningRate 0.0883   Epoch: 2   Global Step: 12230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:15,914-Speed 10469.56 samples/sec   Loss 13.4525   LearningRate 0.0883   Epoch: 2   Global Step: 12240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:16,963-Speed 9767.32 samples/sec   Loss 13.4479   LearningRate 0.0883   Epoch: 2   Global Step: 12250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:17,932-Speed 10575.68 samples/sec   Loss 13.3289   LearningRate 0.0882   Epoch: 2   Global Step: 12260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:18,972-Speed 9858.40 samples/sec   Loss 13.3755   LearningRate 0.0882   Epoch: 2   Global Step: 12270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:19,961-Speed 10359.79 samples/sec   Loss 13.1680   LearningRate 0.0882   Epoch: 2   Global Step: 12280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:42:20,998-Speed 9888.87 samples/sec   Loss 13.2121   LearningRate 0.0882   Epoch: 2   Global Step: 12290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:42:21,951-Speed 10760.59 samples/sec   Loss 13.2103   LearningRate 0.0882   Epoch: 2   Global Step: 12300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:42:23,007-Speed 9702.01 samples/sec   Loss 13.3282   LearningRate 0.0882   Epoch: 2   Global Step: 12310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:42:23,973-Speed 10619.40 samples/sec   Loss 13.4232   LearningRate 0.0882   Epoch: 2   Global Step: 12320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:42:24,967-Speed 10306.28 samples/sec   Loss 13.2925   LearningRate 0.0882   Epoch: 2   Global Step: 12330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:42:25,942-Speed 10507.16 samples/sec   Loss 13.1702   LearningRate 0.0882   Epoch: 2   Global Step: 12340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:42:26,910-Speed 10596.50 samples/sec   Loss 13.0503   LearningRate 0.0882   Epoch: 2   Global Step: 12350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:42:27,881-Speed 10554.34 samples/sec   Loss 13.1441   LearningRate 0.0882   Epoch: 2   Global Step: 12360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:42:28,859-Speed 10485.24 samples/sec   Loss 13.1848   LearningRate 0.0881   Epoch: 2   Global Step: 12370   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:42:29,846-Speed 10379.57 samples/sec   Loss 13.3159   LearningRate 0.0881   Epoch: 2   Global Step: 12380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:42:30,927-Speed 9483.86 samples/sec   Loss 13.1613   LearningRate 0.0881   Epoch: 2   Global Step: 12390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:42:31,972-Speed 9804.44 samples/sec   Loss 13.3162   LearningRate 0.0881   Epoch: 2   Global Step: 12400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:42:32,967-Speed 10306.27 samples/sec   Loss 13.2815   LearningRate 0.0881   Epoch: 2   Global Step: 12410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:42:34,037-Speed 9577.91 samples/sec   Loss 13.3594   LearningRate 0.0881   Epoch: 2   Global Step: 12420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:42:35,083-Speed 9798.64 samples/sec   Loss 13.2461   LearningRate 0.0881   Epoch: 2   Global Step: 12430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:42:36,104-Speed 10032.72 samples/sec   Loss 13.2831   LearningRate 0.0881   Epoch: 2   Global Step: 12440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:42:37,084-Speed 10469.78 samples/sec   Loss 13.2898   LearningRate 0.0881   Epoch: 2   Global Step: 12450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:42:38,173-Speed 9405.17 samples/sec   Loss 13.1103   LearningRate 0.0881   Epoch: 2   Global Step: 12460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:42:39,125-Speed 10767.17 samples/sec   Loss 13.0966   LearningRate 0.0881   Epoch: 2   Global Step: 12470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:42:40,211-Speed 9434.13 samples/sec   Loss 13.3565   LearningRate 0.0880   Epoch: 2   Global Step: 12480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:41,220-Speed 10164.22 samples/sec   Loss 13.1365   LearningRate 0.0880   Epoch: 2   Global Step: 12490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:42,234-Speed 10103.74 samples/sec   Loss 13.1575   LearningRate 0.0880   Epoch: 2   Global Step: 12500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:43,214-Speed 10460.73 samples/sec   Loss 13.1517   LearningRate 0.0880   Epoch: 2   Global Step: 12510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:44,201-Speed 10380.33 samples/sec   Loss 13.0922   LearningRate 0.0880   Epoch: 2   Global Step: 12520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:45,225-Speed 10004.54 samples/sec   Loss 13.4318   LearningRate 0.0880   Epoch: 2   Global Step: 12530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:46,218-Speed 10320.85 samples/sec   Loss 13.2678   LearningRate 0.0880   Epoch: 2   Global Step: 12540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:47,195-Speed 10490.41 samples/sec   Loss 13.1774   LearningRate 0.0880   Epoch: 2   Global Step: 12550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:48,146-Speed 10775.96 samples/sec   Loss 13.4430   LearningRate 0.0880   Epoch: 2   Global Step: 12560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:49,227-Speed 9486.99 samples/sec   Loss 13.1911   LearningRate 0.0880   Epoch: 2   Global Step: 12570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:50,172-Speed 10843.33 samples/sec   Loss 13.1015   LearningRate 0.0880   Epoch: 2   Global Step: 12580   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:42:51,176-Speed 10203.41 samples/sec   Loss 13.1020   LearningRate 0.0879   Epoch: 2   Global Step: 12590   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:42:52,127-Speed 10787.59 samples/sec   Loss 13.2772   LearningRate 0.0879   Epoch: 2   Global Step: 12600   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:42:53,182-Speed 9707.46 samples/sec   Loss 13.1598   LearningRate 0.0879   Epoch: 2   Global Step: 12610   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:42:54,194-Speed 10128.19 samples/sec   Loss 13.0171   LearningRate 0.0879   Epoch: 2   Global Step: 12620   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:42:55,196-Speed 10229.57 samples/sec   Loss 13.2263   LearningRate 0.0879   Epoch: 2   Global Step: 12630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:56,223-Speed 9982.70 samples/sec   Loss 13.1663   LearningRate 0.0879   Epoch: 2   Global Step: 12640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:57,183-Speed 10679.04 samples/sec   Loss 13.2625   LearningRate 0.0879   Epoch: 2   Global Step: 12650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:58,206-Speed 10017.53 samples/sec   Loss 13.1910   LearningRate 0.0879   Epoch: 2   Global Step: 12660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:42:59,260-Speed 9718.79 samples/sec   Loss 13.0991   LearningRate 0.0879   Epoch: 2   Global Step: 12670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:00,331-Speed 9573.49 samples/sec   Loss 13.2352   LearningRate 0.0879   Epoch: 2   Global Step: 12680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:01,300-Speed 10582.93 samples/sec   Loss 13.1550   LearningRate 0.0878   Epoch: 2   Global Step: 12690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:02,316-Speed 10077.49 samples/sec   Loss 13.1780   LearningRate 0.0878   Epoch: 2   Global Step: 12700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:03,402-Speed 9437.59 samples/sec   Loss 13.2067   LearningRate 0.0878   Epoch: 2   Global Step: 12710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:04,458-Speed 9715.53 samples/sec   Loss 13.0024   LearningRate 0.0878   Epoch: 2   Global Step: 12720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:05,483-Speed 10003.48 samples/sec   Loss 12.9907   LearningRate 0.0878   Epoch: 2   Global Step: 12730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:06,478-Speed 10296.42 samples/sec   Loss 13.0738   LearningRate 0.0878   Epoch: 2   Global Step: 12740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:07,474-Speed 10296.43 samples/sec   Loss 13.0989   LearningRate 0.0878   Epoch: 2   Global Step: 12750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:08,456-Speed 10443.47 samples/sec   Loss 13.1945   LearningRate 0.0878   Epoch: 2   Global Step: 12760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:09,421-Speed 10625.00 samples/sec   Loss 13.2550   LearningRate 0.0878   Epoch: 2   Global Step: 12770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:10,389-Speed 10594.34 samples/sec   Loss 13.0939   LearningRate 0.0878   Epoch: 2   Global Step: 12780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:11,352-Speed 10632.48 samples/sec   Loss 13.0520   LearningRate 0.0878   Epoch: 2   Global Step: 12790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:12,349-Speed 10287.29 samples/sec   Loss 13.1696   LearningRate 0.0877   Epoch: 2   Global Step: 12800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:13,328-Speed 10478.20 samples/sec   Loss 13.1827   LearningRate 0.0877   Epoch: 2   Global Step: 12810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:43:14,367-Speed 9862.96 samples/sec   Loss 13.0511   LearningRate 0.0877   Epoch: 2   Global Step: 12820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:43:15,336-Speed 10572.27 samples/sec   Loss 13.1451   LearningRate 0.0877   Epoch: 2   Global Step: 12830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:43:16,396-Speed 9670.39 samples/sec   Loss 13.2108   LearningRate 0.0877   Epoch: 2   Global Step: 12840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:43:17,348-Speed 10775.90 samples/sec   Loss 13.2847   LearningRate 0.0877   Epoch: 2   Global Step: 12850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:43:18,393-Speed 9808.00 samples/sec   Loss 13.2269   LearningRate 0.0877   Epoch: 2   Global Step: 12860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:43:19,408-Speed 10093.00 samples/sec   Loss 13.0036   LearningRate 0.0877   Epoch: 2   Global Step: 12870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:43:20,403-Speed 10306.33 samples/sec   Loss 13.0964   LearningRate 0.0877   Epoch: 2   Global Step: 12880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:43:21,369-Speed 10602.89 samples/sec   Loss 13.1001   LearningRate 0.0877   Epoch: 2   Global Step: 12890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:43:22,336-Speed 10604.48 samples/sec   Loss 12.9757   LearningRate 0.0877   Epoch: 2   Global Step: 12900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:43:23,293-Speed 10707.31 samples/sec   Loss 13.1535   LearningRate 0.0876   Epoch: 2   Global Step: 12910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:24,330-Speed 9885.23 samples/sec   Loss 13.1655   LearningRate 0.0876   Epoch: 2   Global Step: 12920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:25,316-Speed 10402.62 samples/sec   Loss 13.0023   LearningRate 0.0876   Epoch: 2   Global Step: 12930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:26,352-Speed 9887.97 samples/sec   Loss 13.0550   LearningRate 0.0876   Epoch: 2   Global Step: 12940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:43:27,360-Speed 10172.14 samples/sec   Loss 13.0641   LearningRate 0.0876   Epoch: 2   Global Step: 12950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:43:28,369-Speed 10152.82 samples/sec   Loss 13.0293   LearningRate 0.0876   Epoch: 2   Global Step: 12960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:43:29,321-Speed 10767.54 samples/sec   Loss 13.2102   LearningRate 0.0876   Epoch: 2   Global Step: 12970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:43:30,281-Speed 10676.78 samples/sec   Loss 12.9857   LearningRate 0.0876   Epoch: 2   Global Step: 12980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:43:31,283-Speed 10235.32 samples/sec   Loss 13.2484   LearningRate 0.0876   Epoch: 2   Global Step: 12990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:43:32,301-Speed 10319.90 samples/sec   Loss 12.9784   LearningRate 0.0876   Epoch: 2   Global Step: 13000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:43:33,337-Speed 9897.33 samples/sec   Loss 13.0669   LearningRate 0.0876   Epoch: 2   Global Step: 13010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:43:34,346-Speed 10158.81 samples/sec   Loss 13.2349   LearningRate 0.0875   Epoch: 2   Global Step: 13020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:43:35,374-Speed 9968.12 samples/sec   Loss 13.2243   LearningRate 0.0875   Epoch: 2   Global Step: 13030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:43:36,333-Speed 10680.94 samples/sec   Loss 13.0280   LearningRate 0.0875   Epoch: 2   Global Step: 13040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:37,352-Speed 10060.68 samples/sec   Loss 13.0356   LearningRate 0.0875   Epoch: 2   Global Step: 13050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:38,317-Speed 10619.61 samples/sec   Loss 13.2306   LearningRate 0.0875   Epoch: 2   Global Step: 13060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:39,418-Speed 9306.67 samples/sec   Loss 13.0895   LearningRate 0.0875   Epoch: 2   Global Step: 13070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:40,362-Speed 10858.87 samples/sec   Loss 13.2299   LearningRate 0.0875   Epoch: 2   Global Step: 13080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:41,429-Speed 9607.24 samples/sec   Loss 13.1026   LearningRate 0.0875   Epoch: 2   Global Step: 13090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:42,469-Speed 9857.02 samples/sec   Loss 13.0598   LearningRate 0.0875   Epoch: 2   Global Step: 13100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:43,420-Speed 10794.83 samples/sec   Loss 13.1165   LearningRate 0.0875   Epoch: 2   Global Step: 13110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:44,438-Speed 10072.69 samples/sec   Loss 13.1065   LearningRate 0.0875   Epoch: 2   Global Step: 13120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:45,408-Speed 10566.06 samples/sec   Loss 13.0043   LearningRate 0.0874   Epoch: 2   Global Step: 13130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:46,392-Speed 10414.96 samples/sec   Loss 12.8316   LearningRate 0.0874   Epoch: 2   Global Step: 13140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:47,386-Speed 10315.08 samples/sec   Loss 13.0493   LearningRate 0.0874   Epoch: 2   Global Step: 13150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:48,441-Speed 9713.35 samples/sec   Loss 12.9990   LearningRate 0.0874   Epoch: 2   Global Step: 13160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:49,452-Speed 10135.89 samples/sec   Loss 13.0791   LearningRate 0.0874   Epoch: 2   Global Step: 13170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:50,439-Speed 10383.87 samples/sec   Loss 13.0876   LearningRate 0.0874   Epoch: 2   Global Step: 13180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:51,402-Speed 10649.12 samples/sec   Loss 13.0143   LearningRate 0.0874   Epoch: 2   Global Step: 13190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:52,402-Speed 10253.29 samples/sec   Loss 13.0709   LearningRate 0.0874   Epoch: 2   Global Step: 13200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:53,366-Speed 10634.24 samples/sec   Loss 13.2746   LearningRate 0.0874   Epoch: 2   Global Step: 13210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:54,324-Speed 10696.09 samples/sec   Loss 12.9230   LearningRate 0.0874   Epoch: 2   Global Step: 13220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:55,432-Speed 9241.95 samples/sec   Loss 13.0082   LearningRate 0.0873   Epoch: 2   Global Step: 13230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:43:56,453-Speed 10038.06 samples/sec   Loss 13.1061   LearningRate 0.0873   Epoch: 2   Global Step: 13240   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:43:57,485-Speed 9939.64 samples/sec   Loss 13.0474   LearningRate 0.0873   Epoch: 2   Global Step: 13250   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:43:58,542-Speed 9693.75 samples/sec   Loss 13.0188   LearningRate 0.0873   Epoch: 2   Global Step: 13260   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:43:59,568-Speed 9993.57 samples/sec   Loss 13.0737   LearningRate 0.0873   Epoch: 2   Global Step: 13270   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:44:00,537-Speed 10572.23 samples/sec   Loss 13.1804   LearningRate 0.0873   Epoch: 2   Global Step: 13280   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:44:01,616-Speed 9506.64 samples/sec   Loss 12.9194   LearningRate 0.0873   Epoch: 2   Global Step: 13290   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:44:02,685-Speed 9583.50 samples/sec   Loss 13.1004   LearningRate 0.0873   Epoch: 2   Global Step: 13300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:03,666-Speed 10455.79 samples/sec   Loss 13.1051   LearningRate 0.0873   Epoch: 2   Global Step: 13310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:04,667-Speed 10237.36 samples/sec   Loss 12.9641   LearningRate 0.0873   Epoch: 2   Global Step: 13320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:05,697-Speed 9951.63 samples/sec   Loss 13.1267   LearningRate 0.0873   Epoch: 2   Global Step: 13330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:06,757-Speed 9662.78 samples/sec   Loss 12.9153   LearningRate 0.0872   Epoch: 2   Global Step: 13340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:07,746-Speed 10364.44 samples/sec   Loss 13.1720   LearningRate 0.0872   Epoch: 2   Global Step: 13350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:08,803-Speed 9693.96 samples/sec   Loss 12.9984   LearningRate 0.0872   Epoch: 2   Global Step: 13360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:09,916-Speed 9211.41 samples/sec   Loss 12.8793   LearningRate 0.0872   Epoch: 2   Global Step: 13370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:10,871-Speed 10727.65 samples/sec   Loss 13.1298   LearningRate 0.0872   Epoch: 2   Global Step: 13380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:11,902-Speed 9943.58 samples/sec   Loss 12.8673   LearningRate 0.0872   Epoch: 2   Global Step: 13390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:12,863-Speed 10669.46 samples/sec   Loss 13.0404   LearningRate 0.0872   Epoch: 2   Global Step: 13400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:13,934-Speed 9566.34 samples/sec   Loss 12.9004   LearningRate 0.0872   Epoch: 2   Global Step: 13410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:14,925-Speed 10346.37 samples/sec   Loss 13.0265   LearningRate 0.0872   Epoch: 2   Global Step: 13420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:15,906-Speed 10450.37 samples/sec   Loss 13.0704   LearningRate 0.0872   Epoch: 2   Global Step: 13430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:16,912-Speed 10186.73 samples/sec   Loss 13.1101   LearningRate 0.0872   Epoch: 2   Global Step: 13440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:17,896-Speed 10413.31 samples/sec   Loss 13.0047   LearningRate 0.0871   Epoch: 2   Global Step: 13450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:18,911-Speed 10096.67 samples/sec   Loss 12.7874   LearningRate 0.0871   Epoch: 2   Global Step: 13460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:19,852-Speed 10895.51 samples/sec   Loss 13.0573   LearningRate 0.0871   Epoch: 2   Global Step: 13470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:20,828-Speed 10503.94 samples/sec   Loss 12.9233   LearningRate 0.0871   Epoch: 2   Global Step: 13480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:21,826-Speed 10264.56 samples/sec   Loss 13.0584   LearningRate 0.0871   Epoch: 2   Global Step: 13490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:22,826-Speed 10253.63 samples/sec   Loss 12.9826   LearningRate 0.0871   Epoch: 2   Global Step: 13500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:23,798-Speed 10553.75 samples/sec   Loss 13.0717   LearningRate 0.0871   Epoch: 2   Global Step: 13510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:24,763-Speed 10622.95 samples/sec   Loss 13.0665   LearningRate 0.0871   Epoch: 2   Global Step: 13520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:25,778-Speed 10101.92 samples/sec   Loss 13.0760   LearningRate 0.0871   Epoch: 2   Global Step: 13530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:26,792-Speed 10103.33 samples/sec   Loss 12.8879   LearningRate 0.0871   Epoch: 2   Global Step: 13540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:27,842-Speed 9766.06 samples/sec   Loss 12.9289   LearningRate 0.0871   Epoch: 2   Global Step: 13550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:28,804-Speed 10648.97 samples/sec   Loss 12.9792   LearningRate 0.0870   Epoch: 2   Global Step: 13560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:29,793-Speed 10365.74 samples/sec   Loss 12.8381   LearningRate 0.0870   Epoch: 2   Global Step: 13570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:30,729-Speed 10942.32 samples/sec   Loss 12.9199   LearningRate 0.0870   Epoch: 2   Global Step: 13580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:31,735-Speed 10192.76 samples/sec   Loss 12.8177   LearningRate 0.0870   Epoch: 2   Global Step: 13590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:32,824-Speed 9410.00 samples/sec   Loss 12.7725   LearningRate 0.0870   Epoch: 2   Global Step: 13600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:33,808-Speed 10413.51 samples/sec   Loss 12.7353   LearningRate 0.0870   Epoch: 2   Global Step: 13610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:34,817-Speed 10165.40 samples/sec   Loss 12.7588   LearningRate 0.0870   Epoch: 2   Global Step: 13620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:35,842-Speed 9995.80 samples/sec   Loss 12.8447   LearningRate 0.0870   Epoch: 2   Global Step: 13630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:36,872-Speed 9956.98 samples/sec   Loss 13.0720   LearningRate 0.0870   Epoch: 2   Global Step: 13640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:37,968-Speed 9347.92 samples/sec   Loss 13.0084   LearningRate 0.0870   Epoch: 2   Global Step: 13650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:38,943-Speed 10516.88 samples/sec   Loss 13.0173   LearningRate 0.0870   Epoch: 2   Global Step: 13660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:40,034-Speed 9394.74 samples/sec   Loss 13.0060   LearningRate 0.0869   Epoch: 2   Global Step: 13670   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:44:41,012-Speed 10478.32 samples/sec   Loss 13.1144   LearningRate 0.0869   Epoch: 2   Global Step: 13680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:42,030-Speed 10066.90 samples/sec   Loss 13.0454   LearningRate 0.0869   Epoch: 2   Global Step: 13690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:43,012-Speed 10439.48 samples/sec   Loss 12.9578   LearningRate 0.0869   Epoch: 2   Global Step: 13700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:44,013-Speed 10258.10 samples/sec   Loss 12.9982   LearningRate 0.0869   Epoch: 2   Global Step: 13710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:44,983-Speed 10566.54 samples/sec   Loss 13.1302   LearningRate 0.0869   Epoch: 2   Global Step: 13720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:45,929-Speed 10833.40 samples/sec   Loss 12.8128   LearningRate 0.0869   Epoch: 2   Global Step: 13730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:46,980-Speed 9749.32 samples/sec   Loss 13.0363   LearningRate 0.0869   Epoch: 2   Global Step: 13740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:47,968-Speed 10380.88 samples/sec   Loss 12.8927   LearningRate 0.0869   Epoch: 2   Global Step: 13750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:48,940-Speed 10547.51 samples/sec   Loss 12.9674   LearningRate 0.0869   Epoch: 2   Global Step: 13760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:49,951-Speed 10136.53 samples/sec   Loss 12.6830   LearningRate 0.0869   Epoch: 2   Global Step: 13770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:44:50,918-Speed 10603.15 samples/sec   Loss 12.9311   LearningRate 0.0868   Epoch: 2   Global Step: 13780   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:44:51,952-Speed 9904.19 samples/sec   Loss 12.7239   LearningRate 0.0868   Epoch: 2   Global Step: 13790   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:44:52,987-Speed 9907.19 samples/sec   Loss 12.9387   LearningRate 0.0868   Epoch: 2   Global Step: 13800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:54,035-Speed 9784.15 samples/sec   Loss 12.8064   LearningRate 0.0868   Epoch: 2   Global Step: 13810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:55,024-Speed 10363.47 samples/sec   Loss 12.8926   LearningRate 0.0868   Epoch: 2   Global Step: 13820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:56,085-Speed 9656.08 samples/sec   Loss 12.9251   LearningRate 0.0868   Epoch: 2   Global Step: 13830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:57,085-Speed 10242.85 samples/sec   Loss 12.8302   LearningRate 0.0868   Epoch: 2   Global Step: 13840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:58,084-Speed 10270.48 samples/sec   Loss 12.7443   LearningRate 0.0868   Epoch: 2   Global Step: 13850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:44:59,170-Speed 9435.09 samples/sec   Loss 12.8330   LearningRate 0.0868   Epoch: 2   Global Step: 13860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:45:00,147-Speed 10486.54 samples/sec   Loss 12.9600   LearningRate 0.0868   Epoch: 2   Global Step: 13870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:45:01,159-Speed 10125.89 samples/sec   Loss 12.7601   LearningRate 0.0867   Epoch: 2   Global Step: 13880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:45:02,266-Speed 9261.43 samples/sec   Loss 12.8826   LearningRate 0.0867   Epoch: 2   Global Step: 13890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:45:03,295-Speed 9957.96 samples/sec   Loss 12.7960   LearningRate 0.0867   Epoch: 2   Global Step: 13900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:45:04,271-Speed 10504.60 samples/sec   Loss 12.8495   LearningRate 0.0867   Epoch: 2   Global Step: 13910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:45:05,299-Speed 9965.96 samples/sec   Loss 12.8692   LearningRate 0.0867   Epoch: 2   Global Step: 13920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:45:06,388-Speed 9410.11 samples/sec   Loss 12.7499   LearningRate 0.0867   Epoch: 2   Global Step: 13930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:45:07,424-Speed 9903.04 samples/sec   Loss 12.9168   LearningRate 0.0867   Epoch: 2   Global Step: 13940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:45:08,419-Speed 10308.98 samples/sec   Loss 12.8333   LearningRate 0.0867   Epoch: 2   Global Step: 13950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:45:09,419-Speed 10246.88 samples/sec   Loss 12.8933   LearningRate 0.0867   Epoch: 2   Global Step: 13960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:45:10,391-Speed 10538.93 samples/sec   Loss 12.9203   LearningRate 0.0867   Epoch: 2   Global Step: 13970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:45:11,352-Speed 10666.81 samples/sec   Loss 12.9096   LearningRate 0.0867   Epoch: 2   Global Step: 13980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:45:12,290-Speed 10926.65 samples/sec   Loss 12.7974   LearningRate 0.0866   Epoch: 2   Global Step: 13990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:45:13,280-Speed 10358.52 samples/sec   Loss 12.7051   LearningRate 0.0866   Epoch: 2   Global Step: 14000   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:45:35,745-[lfw][14000]XNorm: 15.606247
Training: 2022-04-10 23:45:35,745-[lfw][14000]Accuracy-Flip: 0.99033+-0.00440
Training: 2022-04-10 23:45:35,746-[lfw][14000]Accuracy-Highest: 0.99100
Training: 2022-04-10 23:46:00,956-[cfp_fp][14000]XNorm: 13.164626
Training: 2022-04-10 23:46:00,957-[cfp_fp][14000]Accuracy-Flip: 0.92271+-0.01436
Training: 2022-04-10 23:46:00,958-[cfp_fp][14000]Accuracy-Highest: 0.92271
Training: 2022-04-10 23:46:22,987-[agedb_30][14000]XNorm: 15.180883
Training: 2022-04-10 23:46:22,987-[agedb_30][14000]Accuracy-Flip: 0.93700+-0.01818
Training: 2022-04-10 23:46:22,988-[agedb_30][14000]Accuracy-Highest: 0.93700
Training: 2022-04-10 23:46:23,947-Speed 144.91 samples/sec   Loss 12.8980   LearningRate 0.0866   Epoch: 2   Global Step: 14010   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:46:24,902-Speed 10734.60 samples/sec   Loss 12.9604   LearningRate 0.0866   Epoch: 2   Global Step: 14020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:25,896-Speed 10312.14 samples/sec   Loss 12.7434   LearningRate 0.0866   Epoch: 2   Global Step: 14030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:26,870-Speed 10520.93 samples/sec   Loss 12.6619   LearningRate 0.0866   Epoch: 2   Global Step: 14040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:27,894-Speed 10012.57 samples/sec   Loss 12.8289   LearningRate 0.0866   Epoch: 2   Global Step: 14050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:28,868-Speed 10526.42 samples/sec   Loss 12.8505   LearningRate 0.0866   Epoch: 2   Global Step: 14060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:29,834-Speed 10606.39 samples/sec   Loss 12.7982   LearningRate 0.0866   Epoch: 2   Global Step: 14070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:30,796-Speed 10660.34 samples/sec   Loss 12.7028   LearningRate 0.0866   Epoch: 2   Global Step: 14080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:31,772-Speed 10520.46 samples/sec   Loss 12.9116   LearningRate 0.0866   Epoch: 2   Global Step: 14090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:32,742-Speed 10562.61 samples/sec   Loss 12.8179   LearningRate 0.0865   Epoch: 2   Global Step: 14100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:33,716-Speed 10525.21 samples/sec   Loss 12.7188   LearningRate 0.0865   Epoch: 2   Global Step: 14110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:34,695-Speed 10475.20 samples/sec   Loss 12.9226   LearningRate 0.0865   Epoch: 2   Global Step: 14120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:35,665-Speed 10572.60 samples/sec   Loss 12.8518   LearningRate 0.0865   Epoch: 2   Global Step: 14130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:36,649-Speed 10406.51 samples/sec   Loss 12.8725   LearningRate 0.0865   Epoch: 2   Global Step: 14140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:37,630-Speed 10451.71 samples/sec   Loss 12.7520   LearningRate 0.0865   Epoch: 2   Global Step: 14150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:38,595-Speed 10624.52 samples/sec   Loss 13.0086   LearningRate 0.0865   Epoch: 2   Global Step: 14160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:46:39,551-Speed 10715.06 samples/sec   Loss 12.9205   LearningRate 0.0865   Epoch: 2   Global Step: 14170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:46:40,570-Speed 10059.75 samples/sec   Loss 12.7552   LearningRate 0.0865   Epoch: 2   Global Step: 14180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:46:41,548-Speed 10493.33 samples/sec   Loss 12.9175   LearningRate 0.0865   Epoch: 2   Global Step: 14190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:46:42,528-Speed 10484.15 samples/sec   Loss 12.8136   LearningRate 0.0865   Epoch: 2   Global Step: 14200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:46:43,459-Speed 11000.25 samples/sec   Loss 12.9212   LearningRate 0.0864   Epoch: 2   Global Step: 14210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:46:44,466-Speed 10181.31 samples/sec   Loss 12.6808   LearningRate 0.0864   Epoch: 2   Global Step: 14220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:46:45,576-Speed 9236.31 samples/sec   Loss 12.8853   LearningRate 0.0864   Epoch: 2   Global Step: 14230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:46:46,545-Speed 10571.12 samples/sec   Loss 12.8458   LearningRate 0.0864   Epoch: 2   Global Step: 14240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:46:47,544-Speed 10267.99 samples/sec   Loss 12.7739   LearningRate 0.0864   Epoch: 2   Global Step: 14250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:46:48,643-Speed 9324.53 samples/sec   Loss 12.6995   LearningRate 0.0864   Epoch: 2   Global Step: 14260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:49,631-Speed 10381.52 samples/sec   Loss 12.7368   LearningRate 0.0864   Epoch: 2   Global Step: 14270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:50,653-Speed 10029.71 samples/sec   Loss 12.6743   LearningRate 0.0864   Epoch: 2   Global Step: 14280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:51,692-Speed 9859.72 samples/sec   Loss 12.6752   LearningRate 0.0864   Epoch: 2   Global Step: 14290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:52,676-Speed 10419.92 samples/sec   Loss 12.9604   LearningRate 0.0864   Epoch: 2   Global Step: 14300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:53,636-Speed 10682.87 samples/sec   Loss 12.8304   LearningRate 0.0864   Epoch: 2   Global Step: 14310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:54,596-Speed 10673.63 samples/sec   Loss 13.0269   LearningRate 0.0863   Epoch: 2   Global Step: 14320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:55,565-Speed 10581.81 samples/sec   Loss 12.9011   LearningRate 0.0863   Epoch: 2   Global Step: 14330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:56,685-Speed 9145.51 samples/sec   Loss 13.0337   LearningRate 0.0863   Epoch: 2   Global Step: 14340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:57,673-Speed 10374.81 samples/sec   Loss 12.8397   LearningRate 0.0863   Epoch: 2   Global Step: 14350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:46:58,636-Speed 10647.66 samples/sec   Loss 12.7359   LearningRate 0.0863   Epoch: 2   Global Step: 14360   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:46:59,571-Speed 10961.85 samples/sec   Loss 13.0837   LearningRate 0.0863   Epoch: 2   Global Step: 14370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:00,523-Speed 10766.48 samples/sec   Loss 12.7143   LearningRate 0.0863   Epoch: 2   Global Step: 14380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:01,512-Speed 10365.99 samples/sec   Loss 12.7764   LearningRate 0.0863   Epoch: 2   Global Step: 14390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:02,492-Speed 10460.14 samples/sec   Loss 12.8581   LearningRate 0.0863   Epoch: 2   Global Step: 14400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:03,445-Speed 10750.32 samples/sec   Loss 12.6647   LearningRate 0.0863   Epoch: 2   Global Step: 14410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:47:04,424-Speed 10471.06 samples/sec   Loss 12.7486   LearningRate 0.0863   Epoch: 2   Global Step: 14420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:47:05,385-Speed 10661.22 samples/sec   Loss 13.0260   LearningRate 0.0862   Epoch: 2   Global Step: 14430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:47:06,343-Speed 10695.51 samples/sec   Loss 12.8321   LearningRate 0.0862   Epoch: 2   Global Step: 14440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:47:07,335-Speed 10344.59 samples/sec   Loss 12.7748   LearningRate 0.0862   Epoch: 2   Global Step: 14450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:47:08,289-Speed 10743.41 samples/sec   Loss 12.7679   LearningRate 0.0862   Epoch: 2   Global Step: 14460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:47:09,256-Speed 10597.63 samples/sec   Loss 12.8203   LearningRate 0.0862   Epoch: 2   Global Step: 14470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:47:10,264-Speed 10166.50 samples/sec   Loss 12.7492   LearningRate 0.0862   Epoch: 2   Global Step: 14480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:47:11,202-Speed 10929.90 samples/sec   Loss 13.0362   LearningRate 0.0862   Epoch: 2   Global Step: 14490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:47:12,168-Speed 10608.82 samples/sec   Loss 12.7482   LearningRate 0.0862   Epoch: 2   Global Step: 14500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:47:13,168-Speed 10251.72 samples/sec   Loss 12.9153   LearningRate 0.0862   Epoch: 2   Global Step: 14510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:14,161-Speed 10313.91 samples/sec   Loss 12.5425   LearningRate 0.0862   Epoch: 2   Global Step: 14520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:15,140-Speed 10478.20 samples/sec   Loss 12.7332   LearningRate 0.0862   Epoch: 2   Global Step: 14530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:47:16,101-Speed 10664.55 samples/sec   Loss 12.8078   LearningRate 0.0861   Epoch: 2   Global Step: 14540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:47:17,072-Speed 10560.07 samples/sec   Loss 12.7820   LearningRate 0.0861   Epoch: 2   Global Step: 14550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:47:18,044-Speed 10540.36 samples/sec   Loss 12.8182   LearningRate 0.0861   Epoch: 2   Global Step: 14560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:47:19,015-Speed 10562.97 samples/sec   Loss 12.8388   LearningRate 0.0861   Epoch: 2   Global Step: 14570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:47:20,034-Speed 10055.32 samples/sec   Loss 12.7009   LearningRate 0.0861   Epoch: 2   Global Step: 14580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:47:21,054-Speed 10050.63 samples/sec   Loss 12.6471   LearningRate 0.0861   Epoch: 2   Global Step: 14590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:47:21,989-Speed 10957.93 samples/sec   Loss 12.6299   LearningRate 0.0861   Epoch: 2   Global Step: 14600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:47:22,962-Speed 10532.13 samples/sec   Loss 12.6087   LearningRate 0.0861   Epoch: 2   Global Step: 14610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:47:23,960-Speed 10269.83 samples/sec   Loss 12.7669   LearningRate 0.0861   Epoch: 2   Global Step: 14620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:47:24,943-Speed 10434.51 samples/sec   Loss 12.9085   LearningRate 0.0861   Epoch: 2   Global Step: 14630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:25,915-Speed 10544.58 samples/sec   Loss 12.8208   LearningRate 0.0861   Epoch: 2   Global Step: 14640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:26,868-Speed 10754.55 samples/sec   Loss 12.7386   LearningRate 0.0860   Epoch: 2   Global Step: 14650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:27,818-Speed 10794.45 samples/sec   Loss 12.7699   LearningRate 0.0860   Epoch: 2   Global Step: 14660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:28,773-Speed 10732.19 samples/sec   Loss 12.7724   LearningRate 0.0860   Epoch: 2   Global Step: 14670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:29,766-Speed 10323.38 samples/sec   Loss 12.6352   LearningRate 0.0860   Epoch: 2   Global Step: 14680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:30,751-Speed 10407.35 samples/sec   Loss 12.7255   LearningRate 0.0860   Epoch: 2   Global Step: 14690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:31,761-Speed 10146.45 samples/sec   Loss 12.6892   LearningRate 0.0860   Epoch: 2   Global Step: 14700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:32,765-Speed 10208.83 samples/sec   Loss 12.6572   LearningRate 0.0860   Epoch: 2   Global Step: 14710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:33,726-Speed 10665.88 samples/sec   Loss 12.7062   LearningRate 0.0860   Epoch: 2   Global Step: 14720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:34,730-Speed 10206.86 samples/sec   Loss 12.8086   LearningRate 0.0860   Epoch: 2   Global Step: 14730   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:47:35,674-Speed 10863.44 samples/sec   Loss 12.7279   LearningRate 0.0860   Epoch: 2   Global Step: 14740   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:47:36,612-Speed 10922.08 samples/sec   Loss 12.6794   LearningRate 0.0860   Epoch: 2   Global Step: 14750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:37,578-Speed 10611.33 samples/sec   Loss 12.7819   LearningRate 0.0859   Epoch: 2   Global Step: 14760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:38,539-Speed 10663.16 samples/sec   Loss 12.6627   LearningRate 0.0859   Epoch: 2   Global Step: 14770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:39,514-Speed 10515.00 samples/sec   Loss 12.6657   LearningRate 0.0859   Epoch: 2   Global Step: 14780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:40,464-Speed 10812.97 samples/sec   Loss 12.6042   LearningRate 0.0859   Epoch: 2   Global Step: 14790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:41,410-Speed 10834.46 samples/sec   Loss 12.6253   LearningRate 0.0859   Epoch: 2   Global Step: 14800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:42,411-Speed 10231.75 samples/sec   Loss 12.8979   LearningRate 0.0859   Epoch: 2   Global Step: 14810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:43,404-Speed 10333.83 samples/sec   Loss 12.5333   LearningRate 0.0859   Epoch: 2   Global Step: 14820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:44,391-Speed 10381.50 samples/sec   Loss 12.6231   LearningRate 0.0859   Epoch: 2   Global Step: 14830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:45,349-Speed 10695.06 samples/sec   Loss 12.6005   LearningRate 0.0859   Epoch: 2   Global Step: 14840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:46,304-Speed 10731.88 samples/sec   Loss 12.5596   LearningRate 0.0859   Epoch: 2   Global Step: 14850   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:47:47,246-Speed 10888.32 samples/sec   Loss 12.6710   LearningRate 0.0858   Epoch: 2   Global Step: 14860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:48,242-Speed 10490.24 samples/sec   Loss 12.5144   LearningRate 0.0858   Epoch: 2   Global Step: 14870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:49,217-Speed 10519.20 samples/sec   Loss 12.8239   LearningRate 0.0858   Epoch: 2   Global Step: 14880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:50,190-Speed 10532.15 samples/sec   Loss 12.6822   LearningRate 0.0858   Epoch: 2   Global Step: 14890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:51,151-Speed 10668.57 samples/sec   Loss 12.6841   LearningRate 0.0858   Epoch: 2   Global Step: 14900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:52,103-Speed 10772.31 samples/sec   Loss 12.7135   LearningRate 0.0858   Epoch: 2   Global Step: 14910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:53,087-Speed 10413.14 samples/sec   Loss 12.5121   LearningRate 0.0858   Epoch: 2   Global Step: 14920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:54,033-Speed 10837.21 samples/sec   Loss 12.6645   LearningRate 0.0858   Epoch: 2   Global Step: 14930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:55,029-Speed 10283.99 samples/sec   Loss 12.7163   LearningRate 0.0858   Epoch: 2   Global Step: 14940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:55,992-Speed 10654.65 samples/sec   Loss 12.8425   LearningRate 0.0858   Epoch: 2   Global Step: 14950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:56,941-Speed 10793.73 samples/sec   Loss 12.5515   LearningRate 0.0858   Epoch: 2   Global Step: 14960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:57,934-Speed 10318.51 samples/sec   Loss 12.6410   LearningRate 0.0857   Epoch: 2   Global Step: 14970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:58,906-Speed 10548.29 samples/sec   Loss 12.6211   LearningRate 0.0857   Epoch: 2   Global Step: 14980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:47:59,915-Speed 10162.08 samples/sec   Loss 12.6652   LearningRate 0.0857   Epoch: 2   Global Step: 14990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:00,874-Speed 10692.61 samples/sec   Loss 12.9320   LearningRate 0.0857   Epoch: 2   Global Step: 15000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:01,859-Speed 10408.23 samples/sec   Loss 12.5618   LearningRate 0.0857   Epoch: 2   Global Step: 15010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:02,825-Speed 10605.20 samples/sec   Loss 12.7559   LearningRate 0.0857   Epoch: 2   Global Step: 15020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:03,810-Speed 10409.04 samples/sec   Loss 12.6221   LearningRate 0.0857   Epoch: 2   Global Step: 15030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:04,783-Speed 10536.58 samples/sec   Loss 12.4926   LearningRate 0.0857   Epoch: 2   Global Step: 15040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:05,715-Speed 10998.81 samples/sec   Loss 12.5376   LearningRate 0.0857   Epoch: 2   Global Step: 15050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:06,717-Speed 10223.07 samples/sec   Loss 12.7490   LearningRate 0.0857   Epoch: 2   Global Step: 15060   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:48:07,647-Speed 11027.00 samples/sec   Loss 12.4702   LearningRate 0.0857   Epoch: 2   Global Step: 15070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:08,604-Speed 10707.45 samples/sec   Loss 12.8718   LearningRate 0.0856   Epoch: 2   Global Step: 15080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:09,559-Speed 10739.74 samples/sec   Loss 12.8328   LearningRate 0.0856   Epoch: 2   Global Step: 15090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:10,509-Speed 10791.65 samples/sec   Loss 12.6447   LearningRate 0.0856   Epoch: 2   Global Step: 15100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:11,453-Speed 10853.07 samples/sec   Loss 12.5787   LearningRate 0.0856   Epoch: 2   Global Step: 15110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:12,417-Speed 10638.36 samples/sec   Loss 12.6802   LearningRate 0.0856   Epoch: 2   Global Step: 15120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:13,392-Speed 10511.01 samples/sec   Loss 12.4896   LearningRate 0.0856   Epoch: 2   Global Step: 15130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:14,332-Speed 10896.77 samples/sec   Loss 12.6088   LearningRate 0.0856   Epoch: 2   Global Step: 15140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:15,266-Speed 10979.66 samples/sec   Loss 12.6775   LearningRate 0.0856   Epoch: 2   Global Step: 15150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:16,255-Speed 10356.00 samples/sec   Loss 12.7974   LearningRate 0.0856   Epoch: 2   Global Step: 15160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:17,352-Speed 9344.53 samples/sec   Loss 12.5385   LearningRate 0.0856   Epoch: 2   Global Step: 15170   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:48:28,620-Speed 908.94 samples/sec   Loss 12.1565   LearningRate 0.0856   Epoch: 3   Global Step: 15180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:29,811-Speed 8606.20 samples/sec   Loss 11.5987   LearningRate 0.0855   Epoch: 3   Global Step: 15190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:30,776-Speed 10613.68 samples/sec   Loss 11.6279   LearningRate 0.0855   Epoch: 3   Global Step: 15200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:31,771-Speed 10306.21 samples/sec   Loss 11.5211   LearningRate 0.0855   Epoch: 3   Global Step: 15210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:32,733-Speed 10661.82 samples/sec   Loss 11.6548   LearningRate 0.0855   Epoch: 3   Global Step: 15220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:33,736-Speed 10215.68 samples/sec   Loss 11.5914   LearningRate 0.0855   Epoch: 3   Global Step: 15230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:34,787-Speed 9745.17 samples/sec   Loss 11.6856   LearningRate 0.0855   Epoch: 3   Global Step: 15240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:35,766-Speed 10476.06 samples/sec   Loss 11.5837   LearningRate 0.0855   Epoch: 3   Global Step: 15250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:36,769-Speed 10217.03 samples/sec   Loss 11.9511   LearningRate 0.0855   Epoch: 3   Global Step: 15260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:37,750-Speed 10454.36 samples/sec   Loss 11.7699   LearningRate 0.0855   Epoch: 3   Global Step: 15270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:38,759-Speed 10158.11 samples/sec   Loss 11.8582   LearningRate 0.0855   Epoch: 3   Global Step: 15280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:39,753-Speed 10310.41 samples/sec   Loss 11.7115   LearningRate 0.0855   Epoch: 3   Global Step: 15290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:41,031-Speed 8020.11 samples/sec   Loss 11.8766   LearningRate 0.0854   Epoch: 3   Global Step: 15300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:41,980-Speed 10801.54 samples/sec   Loss 11.7921   LearningRate 0.0854   Epoch: 3   Global Step: 15310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:42,984-Speed 10206.93 samples/sec   Loss 11.8889   LearningRate 0.0854   Epoch: 3   Global Step: 15320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:44,047-Speed 9640.38 samples/sec   Loss 11.8495   LearningRate 0.0854   Epoch: 3   Global Step: 15330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:45,034-Speed 10387.34 samples/sec   Loss 11.8597   LearningRate 0.0854   Epoch: 3   Global Step: 15340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:46,011-Speed 10491.58 samples/sec   Loss 11.8386   LearningRate 0.0854   Epoch: 3   Global Step: 15350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:47,071-Speed 9667.58 samples/sec   Loss 11.7791   LearningRate 0.0854   Epoch: 3   Global Step: 15360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:48,091-Speed 10051.37 samples/sec   Loss 11.8200   LearningRate 0.0854   Epoch: 3   Global Step: 15370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:49,095-Speed 10215.80 samples/sec   Loss 11.7833   LearningRate 0.0854   Epoch: 3   Global Step: 15380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:50,132-Speed 9881.29 samples/sec   Loss 11.9104   LearningRate 0.0854   Epoch: 3   Global Step: 15390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:51,150-Speed 10067.72 samples/sec   Loss 11.9756   LearningRate 0.0854   Epoch: 3   Global Step: 15400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:52,155-Speed 10198.17 samples/sec   Loss 11.6883   LearningRate 0.0853   Epoch: 3   Global Step: 15410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:53,148-Speed 10323.21 samples/sec   Loss 11.8866   LearningRate 0.0853   Epoch: 3   Global Step: 15420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:54,158-Speed 10146.42 samples/sec   Loss 11.7326   LearningRate 0.0853   Epoch: 3   Global Step: 15430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:55,155-Speed 10278.05 samples/sec   Loss 11.7675   LearningRate 0.0853   Epoch: 3   Global Step: 15440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:56,219-Speed 9630.82 samples/sec   Loss 11.8572   LearningRate 0.0853   Epoch: 3   Global Step: 15450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:57,259-Speed 9859.92 samples/sec   Loss 11.9280   LearningRate 0.0853   Epoch: 3   Global Step: 15460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:58,252-Speed 10321.38 samples/sec   Loss 11.9259   LearningRate 0.0853   Epoch: 3   Global Step: 15470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:48:59,288-Speed 9889.95 samples/sec   Loss 12.0971   LearningRate 0.0853   Epoch: 3   Global Step: 15480   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:49:00,267-Speed 10477.38 samples/sec   Loss 11.9693   LearningRate 0.0853   Epoch: 3   Global Step: 15490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:01,254-Speed 10392.13 samples/sec   Loss 12.0750   LearningRate 0.0853   Epoch: 3   Global Step: 15500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:02,347-Speed 9369.41 samples/sec   Loss 11.9908   LearningRate 0.0853   Epoch: 3   Global Step: 15510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:03,328-Speed 10452.96 samples/sec   Loss 12.0064   LearningRate 0.0852   Epoch: 3   Global Step: 15520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:04,323-Speed 10302.84 samples/sec   Loss 11.9418   LearningRate 0.0852   Epoch: 3   Global Step: 15530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:05,324-Speed 10230.62 samples/sec   Loss 11.9620   LearningRate 0.0852   Epoch: 3   Global Step: 15540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:06,373-Speed 9777.28 samples/sec   Loss 12.0441   LearningRate 0.0852   Epoch: 3   Global Step: 15550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:07,446-Speed 9547.81 samples/sec   Loss 11.9795   LearningRate 0.0852   Epoch: 3   Global Step: 15560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:08,468-Speed 10027.39 samples/sec   Loss 11.8766   LearningRate 0.0852   Epoch: 3   Global Step: 15570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:49:09,522-Speed 9730.20 samples/sec   Loss 11.7981   LearningRate 0.0852   Epoch: 3   Global Step: 15580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:49:10,534-Speed 10123.24 samples/sec   Loss 11.9976   LearningRate 0.0852   Epoch: 3   Global Step: 15590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:49:11,515-Speed 10456.43 samples/sec   Loss 12.0644   LearningRate 0.0852   Epoch: 3   Global Step: 15600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:49:12,490-Speed 10510.40 samples/sec   Loss 12.0482   LearningRate 0.0852   Epoch: 3   Global Step: 15610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:49:13,546-Speed 9704.95 samples/sec   Loss 12.0298   LearningRate 0.0852   Epoch: 3   Global Step: 15620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:49:14,527-Speed 10448.89 samples/sec   Loss 11.8341   LearningRate 0.0851   Epoch: 3   Global Step: 15630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:49:15,559-Speed 9926.64 samples/sec   Loss 12.0738   LearningRate 0.0851   Epoch: 3   Global Step: 15640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:49:16,549-Speed 10361.73 samples/sec   Loss 12.0431   LearningRate 0.0851   Epoch: 3   Global Step: 15650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:49:17,622-Speed 9550.90 samples/sec   Loss 12.0240   LearningRate 0.0851   Epoch: 3   Global Step: 15660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:49:18,582-Speed 10680.65 samples/sec   Loss 12.1823   LearningRate 0.0851   Epoch: 3   Global Step: 15670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:19,638-Speed 9700.09 samples/sec   Loss 12.0089   LearningRate 0.0851   Epoch: 3   Global Step: 15680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:20,699-Speed 9660.86 samples/sec   Loss 11.9593   LearningRate 0.0851   Epoch: 3   Global Step: 15690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:21,697-Speed 10274.08 samples/sec   Loss 11.7298   LearningRate 0.0851   Epoch: 3   Global Step: 15700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:22,764-Speed 9603.82 samples/sec   Loss 12.0055   LearningRate 0.0851   Epoch: 3   Global Step: 15710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:23,737-Speed 10533.84 samples/sec   Loss 12.1707   LearningRate 0.0851   Epoch: 3   Global Step: 15720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:24,743-Speed 10185.96 samples/sec   Loss 12.1162   LearningRate 0.0851   Epoch: 3   Global Step: 15730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:25,786-Speed 9822.09 samples/sec   Loss 11.9639   LearningRate 0.0850   Epoch: 3   Global Step: 15740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:26,754-Speed 10596.38 samples/sec   Loss 11.9619   LearningRate 0.0850   Epoch: 3   Global Step: 15750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:27,777-Speed 10015.94 samples/sec   Loss 12.1395   LearningRate 0.0850   Epoch: 3   Global Step: 15760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:28,783-Speed 10187.27 samples/sec   Loss 11.8404   LearningRate 0.0850   Epoch: 3   Global Step: 15770   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:49:29,794-Speed 10143.48 samples/sec   Loss 12.0881   LearningRate 0.0850   Epoch: 3   Global Step: 15780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:49:30,791-Speed 10278.74 samples/sec   Loss 12.1177   LearningRate 0.0850   Epoch: 3   Global Step: 15790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:49:31,777-Speed 10398.41 samples/sec   Loss 12.2130   LearningRate 0.0850   Epoch: 3   Global Step: 15800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:49:32,803-Speed 9986.44 samples/sec   Loss 12.2441   LearningRate 0.0850   Epoch: 3   Global Step: 15810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:49:33,852-Speed 9769.76 samples/sec   Loss 12.0629   LearningRate 0.0850   Epoch: 3   Global Step: 15820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:49:34,881-Speed 9966.43 samples/sec   Loss 12.0640   LearningRate 0.0850   Epoch: 3   Global Step: 15830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:49:35,881-Speed 10240.34 samples/sec   Loss 11.9704   LearningRate 0.0850   Epoch: 3   Global Step: 15840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:49:36,930-Speed 9779.36 samples/sec   Loss 11.9406   LearningRate 0.0849   Epoch: 3   Global Step: 15850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:49:37,885-Speed 10725.61 samples/sec   Loss 12.1659   LearningRate 0.0849   Epoch: 3   Global Step: 15860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:49:38,895-Speed 10149.70 samples/sec   Loss 11.9984   LearningRate 0.0849   Epoch: 3   Global Step: 15870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:49:39,879-Speed 10414.97 samples/sec   Loss 12.1900   LearningRate 0.0849   Epoch: 3   Global Step: 15880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:40,879-Speed 10261.42 samples/sec   Loss 12.0590   LearningRate 0.0849   Epoch: 3   Global Step: 15890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:41,836-Speed 10719.44 samples/sec   Loss 12.0231   LearningRate 0.0849   Epoch: 3   Global Step: 15900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:42,936-Speed 9315.86 samples/sec   Loss 12.2779   LearningRate 0.0849   Epoch: 3   Global Step: 15910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:44,018-Speed 9474.64 samples/sec   Loss 12.0830   LearningRate 0.0849   Epoch: 3   Global Step: 15920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:45,017-Speed 10259.20 samples/sec   Loss 12.1089   LearningRate 0.0849   Epoch: 3   Global Step: 15930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:45,997-Speed 10455.87 samples/sec   Loss 12.1611   LearningRate 0.0849   Epoch: 3   Global Step: 15940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:46,958-Speed 10662.67 samples/sec   Loss 12.1840   LearningRate 0.0849   Epoch: 3   Global Step: 15950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:48,015-Speed 9700.17 samples/sec   Loss 12.0452   LearningRate 0.0848   Epoch: 3   Global Step: 15960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:49,034-Speed 10057.14 samples/sec   Loss 12.2398   LearningRate 0.0848   Epoch: 3   Global Step: 15970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:50,026-Speed 10340.30 samples/sec   Loss 12.2568   LearningRate 0.0848   Epoch: 3   Global Step: 15980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:51,134-Speed 9245.55 samples/sec   Loss 12.2793   LearningRate 0.0848   Epoch: 3   Global Step: 15990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:49:52,149-Speed 10096.37 samples/sec   Loss 12.0756   LearningRate 0.0848   Epoch: 3   Global Step: 16000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:50:14,440-[lfw][16000]XNorm: 15.239646
Training: 2022-04-10 23:50:14,441-[lfw][16000]Accuracy-Flip: 0.99200+-0.00470
Training: 2022-04-10 23:50:14,441-[lfw][16000]Accuracy-Highest: 0.99200
Training: 2022-04-10 23:50:39,862-[cfp_fp][16000]XNorm: 12.798633
Training: 2022-04-10 23:50:39,862-[cfp_fp][16000]Accuracy-Flip: 0.92386+-0.01597
Training: 2022-04-10 23:50:39,864-[cfp_fp][16000]Accuracy-Highest: 0.92386
Training: 2022-04-10 23:51:01,894-[agedb_30][16000]XNorm: 14.826289
Training: 2022-04-10 23:51:01,894-[agedb_30][16000]Accuracy-Flip: 0.94367+-0.01781
Training: 2022-04-10 23:51:01,895-[agedb_30][16000]Accuracy-Highest: 0.94367
Training: 2022-04-10 23:51:02,841-Speed 144.86 samples/sec   Loss 12.1230   LearningRate 0.0848   Epoch: 3   Global Step: 16010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:51:03,778-Speed 10930.53 samples/sec   Loss 12.1101   LearningRate 0.0848   Epoch: 3   Global Step: 16020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:51:04,741-Speed 10655.68 samples/sec   Loss 12.2997   LearningRate 0.0848   Epoch: 3   Global Step: 16030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:05,705-Speed 10623.72 samples/sec   Loss 12.0192   LearningRate 0.0848   Epoch: 3   Global Step: 16040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:06,668-Speed 10675.03 samples/sec   Loss 12.1390   LearningRate 0.0848   Epoch: 3   Global Step: 16050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:07,624-Speed 10716.65 samples/sec   Loss 12.2318   LearningRate 0.0848   Epoch: 3   Global Step: 16060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:08,635-Speed 10143.85 samples/sec   Loss 12.1635   LearningRate 0.0847   Epoch: 3   Global Step: 16070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:09,598-Speed 10642.89 samples/sec   Loss 12.0638   LearningRate 0.0847   Epoch: 3   Global Step: 16080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:10,554-Speed 10732.06 samples/sec   Loss 12.0035   LearningRate 0.0847   Epoch: 3   Global Step: 16090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:11,533-Speed 10466.51 samples/sec   Loss 12.2047   LearningRate 0.0847   Epoch: 3   Global Step: 16100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:12,491-Speed 10693.07 samples/sec   Loss 12.1361   LearningRate 0.0847   Epoch: 3   Global Step: 16110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:13,475-Speed 10421.62 samples/sec   Loss 12.2687   LearningRate 0.0847   Epoch: 3   Global Step: 16120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:14,424-Speed 10803.03 samples/sec   Loss 12.3057   LearningRate 0.0847   Epoch: 3   Global Step: 16130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:51:15,364-Speed 10907.91 samples/sec   Loss 12.0348   LearningRate 0.0847   Epoch: 3   Global Step: 16140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:51:16,312-Speed 10805.51 samples/sec   Loss 12.1779   LearningRate 0.0847   Epoch: 3   Global Step: 16150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:51:17,266-Speed 10749.59 samples/sec   Loss 12.1581   LearningRate 0.0847   Epoch: 3   Global Step: 16160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:51:18,226-Speed 10673.67 samples/sec   Loss 11.9898   LearningRate 0.0847   Epoch: 3   Global Step: 16170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:19,263-Speed 9880.58 samples/sec   Loss 12.0323   LearningRate 0.0846   Epoch: 3   Global Step: 16180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:20,237-Speed 10531.73 samples/sec   Loss 12.1447   LearningRate 0.0846   Epoch: 3   Global Step: 16190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:21,219-Speed 10433.19 samples/sec   Loss 12.1069   LearningRate 0.0846   Epoch: 3   Global Step: 16200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:22,200-Speed 10450.63 samples/sec   Loss 12.1291   LearningRate 0.0846   Epoch: 3   Global Step: 16210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:23,176-Speed 10510.27 samples/sec   Loss 12.2318   LearningRate 0.0846   Epoch: 3   Global Step: 16220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:24,165-Speed 10358.93 samples/sec   Loss 12.1074   LearningRate 0.0846   Epoch: 3   Global Step: 16230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:25,114-Speed 10801.77 samples/sec   Loss 12.1232   LearningRate 0.0846   Epoch: 3   Global Step: 16240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:26,096-Speed 10433.88 samples/sec   Loss 12.2252   LearningRate 0.0846   Epoch: 3   Global Step: 16250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:27,043-Speed 10827.74 samples/sec   Loss 12.1895   LearningRate 0.0846   Epoch: 3   Global Step: 16260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:28,062-Speed 10055.77 samples/sec   Loss 12.2531   LearningRate 0.0846   Epoch: 3   Global Step: 16270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:51:29,004-Speed 10879.02 samples/sec   Loss 12.1945   LearningRate 0.0846   Epoch: 3   Global Step: 16280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:51:29,937-Speed 10991.58 samples/sec   Loss 12.1846   LearningRate 0.0845   Epoch: 3   Global Step: 16290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:51:30,940-Speed 10215.29 samples/sec   Loss 12.1464   LearningRate 0.0845   Epoch: 3   Global Step: 16300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:51:31,879-Speed 10923.76 samples/sec   Loss 12.1337   LearningRate 0.0845   Epoch: 3   Global Step: 16310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:51:32,871-Speed 10329.81 samples/sec   Loss 12.0468   LearningRate 0.0845   Epoch: 3   Global Step: 16320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:33,871-Speed 10249.93 samples/sec   Loss 12.2460   LearningRate 0.0845   Epoch: 3   Global Step: 16330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:34,841-Speed 10569.77 samples/sec   Loss 12.0568   LearningRate 0.0845   Epoch: 3   Global Step: 16340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:35,817-Speed 10504.34 samples/sec   Loss 12.2916   LearningRate 0.0845   Epoch: 3   Global Step: 16350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:36,773-Speed 10715.70 samples/sec   Loss 12.2999   LearningRate 0.0845   Epoch: 3   Global Step: 16360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:37,711-Speed 10933.99 samples/sec   Loss 12.3185   LearningRate 0.0845   Epoch: 3   Global Step: 16370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:38,652-Speed 10889.87 samples/sec   Loss 12.0244   LearningRate 0.0845   Epoch: 3   Global Step: 16380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:39,606-Speed 10744.20 samples/sec   Loss 12.1280   LearningRate 0.0845   Epoch: 3   Global Step: 16390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:40,576-Speed 10563.40 samples/sec   Loss 12.0002   LearningRate 0.0844   Epoch: 3   Global Step: 16400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:41,579-Speed 10226.16 samples/sec   Loss 12.0811   LearningRate 0.0844   Epoch: 3   Global Step: 16410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:42,563-Speed 10418.09 samples/sec   Loss 12.1922   LearningRate 0.0844   Epoch: 3   Global Step: 16420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:51:43,503-Speed 10901.03 samples/sec   Loss 12.2803   LearningRate 0.0844   Epoch: 3   Global Step: 16430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:51:44,462-Speed 10691.74 samples/sec   Loss 12.1415   LearningRate 0.0844   Epoch: 3   Global Step: 16440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:51:45,403-Speed 10881.60 samples/sec   Loss 12.0362   LearningRate 0.0844   Epoch: 3   Global Step: 16450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:51:46,357-Speed 10746.39 samples/sec   Loss 12.0960   LearningRate 0.0844   Epoch: 3   Global Step: 16460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:51:47,292-Speed 10966.12 samples/sec   Loss 12.2006   LearningRate 0.0844   Epoch: 3   Global Step: 16470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:51:48,257-Speed 10615.21 samples/sec   Loss 12.2719   LearningRate 0.0844   Epoch: 3   Global Step: 16480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:51:49,229-Speed 10549.42 samples/sec   Loss 12.0891   LearningRate 0.0844   Epoch: 3   Global Step: 16490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:51:50,186-Speed 10715.73 samples/sec   Loss 12.1665   LearningRate 0.0844   Epoch: 3   Global Step: 16500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:51:51,120-Speed 10973.80 samples/sec   Loss 12.2839   LearningRate 0.0843   Epoch: 3   Global Step: 16510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:51:52,111-Speed 10334.14 samples/sec   Loss 12.2205   LearningRate 0.0843   Epoch: 3   Global Step: 16520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:51:53,031-Speed 11139.17 samples/sec   Loss 12.2998   LearningRate 0.0843   Epoch: 3   Global Step: 16530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:51:54,042-Speed 10145.02 samples/sec   Loss 12.1343   LearningRate 0.0843   Epoch: 3   Global Step: 16540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:51:55,020-Speed 10478.98 samples/sec   Loss 12.3366   LearningRate 0.0843   Epoch: 3   Global Step: 16550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:51:56,028-Speed 10162.30 samples/sec   Loss 12.4045   LearningRate 0.0843   Epoch: 3   Global Step: 16560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:56,984-Speed 10725.64 samples/sec   Loss 12.2038   LearningRate 0.0843   Epoch: 3   Global Step: 16570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:57,932-Speed 10815.55 samples/sec   Loss 12.0385   LearningRate 0.0843   Epoch: 3   Global Step: 16580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:58,901-Speed 10573.74 samples/sec   Loss 12.0775   LearningRate 0.0843   Epoch: 3   Global Step: 16590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:51:59,849-Speed 10807.45 samples/sec   Loss 12.1755   LearningRate 0.0843   Epoch: 3   Global Step: 16600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:00,849-Speed 10255.22 samples/sec   Loss 12.3437   LearningRate 0.0843   Epoch: 3   Global Step: 16610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:01,802-Speed 10755.83 samples/sec   Loss 12.1497   LearningRate 0.0842   Epoch: 3   Global Step: 16620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:02,808-Speed 10191.86 samples/sec   Loss 12.0669   LearningRate 0.0842   Epoch: 3   Global Step: 16630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:03,794-Speed 10396.24 samples/sec   Loss 12.2564   LearningRate 0.0842   Epoch: 3   Global Step: 16640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:04,759-Speed 10613.76 samples/sec   Loss 12.0684   LearningRate 0.0842   Epoch: 3   Global Step: 16650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:05,743-Speed 10414.21 samples/sec   Loss 12.2149   LearningRate 0.0842   Epoch: 3   Global Step: 16660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:52:06,735-Speed 10332.31 samples/sec   Loss 12.1200   LearningRate 0.0842   Epoch: 3   Global Step: 16670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:52:07,753-Speed 10075.72 samples/sec   Loss 12.2852   LearningRate 0.0842   Epoch: 3   Global Step: 16680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:52:08,742-Speed 10366.89 samples/sec   Loss 12.3965   LearningRate 0.0842   Epoch: 3   Global Step: 16690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:52:09,715-Speed 10533.31 samples/sec   Loss 12.1181   LearningRate 0.0842   Epoch: 3   Global Step: 16700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:52:10,665-Speed 10791.95 samples/sec   Loss 12.2718   LearningRate 0.0842   Epoch: 3   Global Step: 16710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:52:11,606-Speed 10888.75 samples/sec   Loss 12.1171   LearningRate 0.0842   Epoch: 3   Global Step: 16720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:52:12,612-Speed 10185.47 samples/sec   Loss 12.1776   LearningRate 0.0841   Epoch: 3   Global Step: 16730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:52:13,564-Speed 10767.77 samples/sec   Loss 12.3336   LearningRate 0.0841   Epoch: 3   Global Step: 16740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:52:14,508-Speed 10857.99 samples/sec   Loss 12.2249   LearningRate 0.0841   Epoch: 3   Global Step: 16750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:52:15,445-Speed 10939.12 samples/sec   Loss 12.0881   LearningRate 0.0841   Epoch: 3   Global Step: 16760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:52:16,372-Speed 11054.83 samples/sec   Loss 12.2464   LearningRate 0.0841   Epoch: 3   Global Step: 16770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:17,355-Speed 10427.58 samples/sec   Loss 12.1651   LearningRate 0.0841   Epoch: 3   Global Step: 16780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:18,346-Speed 10346.24 samples/sec   Loss 12.1933   LearningRate 0.0841   Epoch: 3   Global Step: 16790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:19,337-Speed 10338.32 samples/sec   Loss 12.2058   LearningRate 0.0841   Epoch: 3   Global Step: 16800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:20,323-Speed 10393.78 samples/sec   Loss 12.0936   LearningRate 0.0841   Epoch: 3   Global Step: 16810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:21,296-Speed 10531.25 samples/sec   Loss 12.1404   LearningRate 0.0841   Epoch: 3   Global Step: 16820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:22,242-Speed 10839.34 samples/sec   Loss 12.1762   LearningRate 0.0841   Epoch: 3   Global Step: 16830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:23,229-Speed 10381.30 samples/sec   Loss 12.3341   LearningRate 0.0840   Epoch: 3   Global Step: 16840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:24,204-Speed 10524.13 samples/sec   Loss 12.3281   LearningRate 0.0840   Epoch: 3   Global Step: 16850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:25,166-Speed 10655.86 samples/sec   Loss 12.2613   LearningRate 0.0840   Epoch: 3   Global Step: 16860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:26,132-Speed 10601.53 samples/sec   Loss 12.0207   LearningRate 0.0840   Epoch: 3   Global Step: 16870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:52:27,087-Speed 10738.29 samples/sec   Loss 12.2054   LearningRate 0.0840   Epoch: 3   Global Step: 16880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:52:28,074-Speed 10385.13 samples/sec   Loss 12.0625   LearningRate 0.0840   Epoch: 3   Global Step: 16890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:29,026-Speed 10778.45 samples/sec   Loss 12.0622   LearningRate 0.0840   Epoch: 3   Global Step: 16900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:29,986-Speed 10678.66 samples/sec   Loss 12.0062   LearningRate 0.0840   Epoch: 3   Global Step: 16910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:30,979-Speed 10321.79 samples/sec   Loss 12.2956   LearningRate 0.0840   Epoch: 3   Global Step: 16920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:31,921-Speed 10877.41 samples/sec   Loss 12.1847   LearningRate 0.0840   Epoch: 3   Global Step: 16930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:32,922-Speed 10228.51 samples/sec   Loss 12.2812   LearningRate 0.0840   Epoch: 3   Global Step: 16940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:33,891-Speed 10579.01 samples/sec   Loss 12.1171   LearningRate 0.0839   Epoch: 3   Global Step: 16950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:34,871-Speed 10465.84 samples/sec   Loss 12.3659   LearningRate 0.0839   Epoch: 3   Global Step: 16960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:35,838-Speed 10597.48 samples/sec   Loss 12.1877   LearningRate 0.0839   Epoch: 3   Global Step: 16970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:36,806-Speed 10590.58 samples/sec   Loss 12.1814   LearningRate 0.0839   Epoch: 3   Global Step: 16980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:37,806-Speed 10246.47 samples/sec   Loss 12.1162   LearningRate 0.0839   Epoch: 3   Global Step: 16990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:52:38,808-Speed 10233.97 samples/sec   Loss 12.0182   LearningRate 0.0839   Epoch: 3   Global Step: 17000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:52:39,800-Speed 10335.76 samples/sec   Loss 12.0987   LearningRate 0.0839   Epoch: 3   Global Step: 17010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:52:40,767-Speed 10593.99 samples/sec   Loss 12.1848   LearningRate 0.0839   Epoch: 3   Global Step: 17020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:41,784-Speed 10081.49 samples/sec   Loss 12.2938   LearningRate 0.0839   Epoch: 3   Global Step: 17030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:42,765-Speed 10445.21 samples/sec   Loss 12.1505   LearningRate 0.0839   Epoch: 3   Global Step: 17040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:43,713-Speed 10833.13 samples/sec   Loss 12.1595   LearningRate 0.0839   Epoch: 3   Global Step: 17050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:44,678-Speed 10619.28 samples/sec   Loss 12.2126   LearningRate 0.0838   Epoch: 3   Global Step: 17060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:45,631-Speed 10748.63 samples/sec   Loss 12.2274   LearningRate 0.0838   Epoch: 3   Global Step: 17070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:46,594-Speed 10644.48 samples/sec   Loss 12.4017   LearningRate 0.0838   Epoch: 3   Global Step: 17080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:47,567-Speed 10526.99 samples/sec   Loss 12.1424   LearningRate 0.0838   Epoch: 3   Global Step: 17090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:48,545-Speed 10486.94 samples/sec   Loss 12.2284   LearningRate 0.0838   Epoch: 3   Global Step: 17100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:49,518-Speed 10531.89 samples/sec   Loss 12.0516   LearningRate 0.0838   Epoch: 3   Global Step: 17110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:50,497-Speed 10506.79 samples/sec   Loss 12.2984   LearningRate 0.0838   Epoch: 3   Global Step: 17120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:52:51,459-Speed 10644.86 samples/sec   Loss 12.2904   LearningRate 0.0838   Epoch: 3   Global Step: 17130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:52:52,455-Speed 10296.20 samples/sec   Loss 12.1625   LearningRate 0.0838   Epoch: 3   Global Step: 17140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:52:53,402-Speed 10824.92 samples/sec   Loss 12.1011   LearningRate 0.0838   Epoch: 3   Global Step: 17150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:52:54,365-Speed 10640.41 samples/sec   Loss 12.2072   LearningRate 0.0838   Epoch: 3   Global Step: 17160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:52:55,312-Speed 10816.26 samples/sec   Loss 12.1318   LearningRate 0.0837   Epoch: 3   Global Step: 17170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:52:56,277-Speed 10621.64 samples/sec   Loss 12.1399   LearningRate 0.0837   Epoch: 3   Global Step: 17180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:52:57,284-Speed 10178.72 samples/sec   Loss 12.0708   LearningRate 0.0837   Epoch: 3   Global Step: 17190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:52:58,237-Speed 10747.09 samples/sec   Loss 12.0393   LearningRate 0.0837   Epoch: 3   Global Step: 17200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:52:59,221-Speed 10425.82 samples/sec   Loss 12.1985   LearningRate 0.0837   Epoch: 3   Global Step: 17210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:53:00,199-Speed 10484.92 samples/sec   Loss 12.1492   LearningRate 0.0837   Epoch: 3   Global Step: 17220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:53:01,187-Speed 10374.71 samples/sec   Loss 12.0154   LearningRate 0.0837   Epoch: 3   Global Step: 17230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:53:02,155-Speed 10588.27 samples/sec   Loss 12.0482   LearningRate 0.0837   Epoch: 3   Global Step: 17240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:53:03,156-Speed 10234.83 samples/sec   Loss 12.2011   LearningRate 0.0837   Epoch: 3   Global Step: 17250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:53:04,120-Speed 10638.51 samples/sec   Loss 12.0352   LearningRate 0.0837   Epoch: 3   Global Step: 17260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-10 23:53:05,072-Speed 10760.57 samples/sec   Loss 12.0830   LearningRate 0.0837   Epoch: 3   Global Step: 17270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:06,040-Speed 10585.88 samples/sec   Loss 12.1177   LearningRate 0.0836   Epoch: 3   Global Step: 17280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:07,019-Speed 10472.68 samples/sec   Loss 12.1876   LearningRate 0.0836   Epoch: 3   Global Step: 17290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:07,992-Speed 10533.31 samples/sec   Loss 11.9933   LearningRate 0.0836   Epoch: 3   Global Step: 17300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:08,962-Speed 10558.46 samples/sec   Loss 12.1056   LearningRate 0.0836   Epoch: 3   Global Step: 17310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:09,923-Speed 10668.03 samples/sec   Loss 12.1306   LearningRate 0.0836   Epoch: 3   Global Step: 17320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:10,868-Speed 10854.22 samples/sec   Loss 12.1122   LearningRate 0.0836   Epoch: 3   Global Step: 17330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:11,975-Speed 9259.79 samples/sec   Loss 12.0093   LearningRate 0.0836   Epoch: 3   Global Step: 17340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:12,944-Speed 10580.64 samples/sec   Loss 12.1494   LearningRate 0.0836   Epoch: 3   Global Step: 17350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:13,941-Speed 10273.00 samples/sec   Loss 12.1144   LearningRate 0.0836   Epoch: 3   Global Step: 17360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:14,907-Speed 10612.46 samples/sec   Loss 12.0863   LearningRate 0.0836   Epoch: 3   Global Step: 17370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:15,855-Speed 10803.75 samples/sec   Loss 12.2533   LearningRate 0.0836   Epoch: 3   Global Step: 17380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:16,844-Speed 10363.08 samples/sec   Loss 12.1556   LearningRate 0.0835   Epoch: 3   Global Step: 17390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:17,822-Speed 10484.80 samples/sec   Loss 12.1911   LearningRate 0.0835   Epoch: 3   Global Step: 17400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:18,800-Speed 10483.51 samples/sec   Loss 12.1987   LearningRate 0.0835   Epoch: 3   Global Step: 17410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:19,772-Speed 10546.28 samples/sec   Loss 12.1524   LearningRate 0.0835   Epoch: 3   Global Step: 17420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:20,739-Speed 10603.52 samples/sec   Loss 12.1980   LearningRate 0.0835   Epoch: 3   Global Step: 17430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:21,682-Speed 10862.60 samples/sec   Loss 12.1283   LearningRate 0.0835   Epoch: 3   Global Step: 17440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:22,670-Speed 10373.99 samples/sec   Loss 12.4601   LearningRate 0.0835   Epoch: 3   Global Step: 17450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:23,642-Speed 10548.39 samples/sec   Loss 12.1667   LearningRate 0.0835   Epoch: 3   Global Step: 17460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:24,624-Speed 10434.24 samples/sec   Loss 12.1109   LearningRate 0.0835   Epoch: 3   Global Step: 17470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:25,585-Speed 10668.00 samples/sec   Loss 12.2926   LearningRate 0.0835   Epoch: 3   Global Step: 17480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:26,517-Speed 11007.33 samples/sec   Loss 12.2353   LearningRate 0.0835   Epoch: 3   Global Step: 17490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:27,500-Speed 10414.72 samples/sec   Loss 12.1226   LearningRate 0.0834   Epoch: 3   Global Step: 17500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:28,459-Speed 10696.21 samples/sec   Loss 12.2649   LearningRate 0.0834   Epoch: 3   Global Step: 17510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:29,488-Speed 9957.82 samples/sec   Loss 11.7793   LearningRate 0.0834   Epoch: 3   Global Step: 17520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:30,431-Speed 10877.69 samples/sec   Loss 12.2643   LearningRate 0.0834   Epoch: 3   Global Step: 17530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:31,395-Speed 10632.65 samples/sec   Loss 12.0654   LearningRate 0.0834   Epoch: 3   Global Step: 17540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:32,367-Speed 10545.89 samples/sec   Loss 12.2324   LearningRate 0.0834   Epoch: 3   Global Step: 17550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:33,334-Speed 10597.95 samples/sec   Loss 12.1546   LearningRate 0.0834   Epoch: 3   Global Step: 17560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:34,303-Speed 10574.07 samples/sec   Loss 12.0254   LearningRate 0.0834   Epoch: 3   Global Step: 17570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:35,277-Speed 10526.56 samples/sec   Loss 12.0614   LearningRate 0.0834   Epoch: 3   Global Step: 17580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:36,270-Speed 10320.57 samples/sec   Loss 12.2221   LearningRate 0.0834   Epoch: 3   Global Step: 17590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:37,245-Speed 10510.01 samples/sec   Loss 11.9720   LearningRate 0.0834   Epoch: 3   Global Step: 17600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:38,276-Speed 9939.37 samples/sec   Loss 12.0754   LearningRate 0.0833   Epoch: 3   Global Step: 17610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:39,214-Speed 10926.39 samples/sec   Loss 11.9411   LearningRate 0.0833   Epoch: 3   Global Step: 17620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:40,220-Speed 10191.65 samples/sec   Loss 11.8608   LearningRate 0.0833   Epoch: 3   Global Step: 17630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:41,162-Speed 10883.24 samples/sec   Loss 12.0611   LearningRate 0.0833   Epoch: 3   Global Step: 17640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:42,122-Speed 10675.12 samples/sec   Loss 12.1896   LearningRate 0.0833   Epoch: 3   Global Step: 17650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:43,089-Speed 10598.01 samples/sec   Loss 12.1790   LearningRate 0.0833   Epoch: 3   Global Step: 17660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:53:44,041-Speed 10777.27 samples/sec   Loss 12.1049   LearningRate 0.0833   Epoch: 3   Global Step: 17670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:44,983-Speed 10875.80 samples/sec   Loss 12.0969   LearningRate 0.0833   Epoch: 3   Global Step: 17680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:45,927-Speed 10856.51 samples/sec   Loss 12.0283   LearningRate 0.0833   Epoch: 3   Global Step: 17690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:46,914-Speed 10382.29 samples/sec   Loss 12.0054   LearningRate 0.0833   Epoch: 3   Global Step: 17700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:47,869-Speed 10740.62 samples/sec   Loss 12.0738   LearningRate 0.0833   Epoch: 3   Global Step: 17710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:48,831-Speed 10649.30 samples/sec   Loss 12.0586   LearningRate 0.0833   Epoch: 3   Global Step: 17720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:49,782-Speed 10784.35 samples/sec   Loss 12.0132   LearningRate 0.0832   Epoch: 3   Global Step: 17730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:50,724-Speed 10878.10 samples/sec   Loss 12.0562   LearningRate 0.0832   Epoch: 3   Global Step: 17740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:51,744-Speed 10052.52 samples/sec   Loss 12.0151   LearningRate 0.0832   Epoch: 3   Global Step: 17750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:52,727-Speed 10427.39 samples/sec   Loss 11.9708   LearningRate 0.0832   Epoch: 3   Global Step: 17760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:53,704-Speed 10489.66 samples/sec   Loss 11.9872   LearningRate 0.0832   Epoch: 3   Global Step: 17770   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:53:54,648-Speed 10853.98 samples/sec   Loss 11.9212   LearningRate 0.0832   Epoch: 3   Global Step: 17780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:55,606-Speed 10701.68 samples/sec   Loss 11.9709   LearningRate 0.0832   Epoch: 3   Global Step: 17790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:56,641-Speed 9902.83 samples/sec   Loss 12.0891   LearningRate 0.0832   Epoch: 3   Global Step: 17800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:57,597-Speed 10728.68 samples/sec   Loss 12.0950   LearningRate 0.0832   Epoch: 3   Global Step: 17810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:58,562-Speed 10616.04 samples/sec   Loss 12.2281   LearningRate 0.0832   Epoch: 3   Global Step: 17820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:53:59,556-Speed 10312.84 samples/sec   Loss 12.2161   LearningRate 0.0832   Epoch: 3   Global Step: 17830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:54:00,525-Speed 10595.10 samples/sec   Loss 12.0922   LearningRate 0.0831   Epoch: 3   Global Step: 17840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:54:01,487-Speed 10652.57 samples/sec   Loss 11.9615   LearningRate 0.0831   Epoch: 3   Global Step: 17850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:54:02,439-Speed 10766.15 samples/sec   Loss 11.9973   LearningRate 0.0831   Epoch: 3   Global Step: 17860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:54:03,431-Speed 10334.67 samples/sec   Loss 12.1351   LearningRate 0.0831   Epoch: 3   Global Step: 17870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:54:04,407-Speed 10505.33 samples/sec   Loss 12.2343   LearningRate 0.0831   Epoch: 3   Global Step: 17880   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:54:05,381-Speed 10525.57 samples/sec   Loss 12.1648   LearningRate 0.0831   Epoch: 3   Global Step: 17890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:54:06,329-Speed 10806.73 samples/sec   Loss 12.0522   LearningRate 0.0831   Epoch: 3   Global Step: 17900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:54:07,313-Speed 10411.61 samples/sec   Loss 12.0828   LearningRate 0.0831   Epoch: 3   Global Step: 17910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:54:08,343-Speed 9956.58 samples/sec   Loss 11.9215   LearningRate 0.0831   Epoch: 3   Global Step: 17920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:54:09,321-Speed 10481.95 samples/sec   Loss 12.1725   LearningRate 0.0831   Epoch: 3   Global Step: 17930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:54:10,272-Speed 10782.58 samples/sec   Loss 12.1000   LearningRate 0.0831   Epoch: 3   Global Step: 17940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:54:11,214-Speed 10876.81 samples/sec   Loss 12.0222   LearningRate 0.0830   Epoch: 3   Global Step: 17950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:54:12,212-Speed 10269.54 samples/sec   Loss 12.1407   LearningRate 0.0830   Epoch: 3   Global Step: 17960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:54:13,175-Speed 10640.49 samples/sec   Loss 12.1437   LearningRate 0.0830   Epoch: 3   Global Step: 17970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:54:14,168-Speed 10325.74 samples/sec   Loss 12.1034   LearningRate 0.0830   Epoch: 3   Global Step: 17980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:54:15,174-Speed 10195.91 samples/sec   Loss 12.2070   LearningRate 0.0830   Epoch: 3   Global Step: 17990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:54:16,134-Speed 10679.72 samples/sec   Loss 12.3126   LearningRate 0.0830   Epoch: 3   Global Step: 18000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:54:38,568-[lfw][18000]XNorm: 15.177282
Training: 2022-04-10 23:54:38,568-[lfw][18000]Accuracy-Flip: 0.99333+-0.00459
Training: 2022-04-10 23:54:38,569-[lfw][18000]Accuracy-Highest: 0.99333
Training: 2022-04-10 23:55:04,169-[cfp_fp][18000]XNorm: 12.780324
Training: 2022-04-10 23:55:04,170-[cfp_fp][18000]Accuracy-Flip: 0.93314+-0.01542
Training: 2022-04-10 23:55:04,171-[cfp_fp][18000]Accuracy-Highest: 0.93314
Training: 2022-04-10 23:55:26,349-[agedb_30][18000]XNorm: 14.786306
Training: 2022-04-10 23:55:26,350-[agedb_30][18000]Accuracy-Flip: 0.94217+-0.01455
Training: 2022-04-10 23:55:26,351-[agedb_30][18000]Accuracy-Highest: 0.94367
Training: 2022-04-10 23:55:27,300-Speed 143.89 samples/sec   Loss 12.0345   LearningRate 0.0830   Epoch: 3   Global Step: 18010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:28,264-Speed 10629.62 samples/sec   Loss 12.0472   LearningRate 0.0830   Epoch: 3   Global Step: 18020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:29,247-Speed 10436.78 samples/sec   Loss 12.1490   LearningRate 0.0830   Epoch: 3   Global Step: 18030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:30,240-Speed 10323.89 samples/sec   Loss 11.9577   LearningRate 0.0830   Epoch: 3   Global Step: 18040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:31,225-Speed 10406.36 samples/sec   Loss 12.2477   LearningRate 0.0830   Epoch: 3   Global Step: 18050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:32,221-Speed 10289.29 samples/sec   Loss 12.0602   LearningRate 0.0829   Epoch: 3   Global Step: 18060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:33,179-Speed 10691.70 samples/sec   Loss 11.9919   LearningRate 0.0829   Epoch: 3   Global Step: 18070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:34,180-Speed 10257.31 samples/sec   Loss 12.1425   LearningRate 0.0829   Epoch: 3   Global Step: 18080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:55:35,121-Speed 10886.94 samples/sec   Loss 12.2861   LearningRate 0.0829   Epoch: 3   Global Step: 18090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:55:36,088-Speed 10605.97 samples/sec   Loss 11.9209   LearningRate 0.0829   Epoch: 3   Global Step: 18100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:55:37,058-Speed 10564.88 samples/sec   Loss 11.9041   LearningRate 0.0829   Epoch: 3   Global Step: 18110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:55:38,081-Speed 10012.27 samples/sec   Loss 12.1403   LearningRate 0.0829   Epoch: 3   Global Step: 18120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:55:39,009-Speed 11069.08 samples/sec   Loss 12.0237   LearningRate 0.0829   Epoch: 3   Global Step: 18130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:55:39,997-Speed 10370.67 samples/sec   Loss 12.1405   LearningRate 0.0829   Epoch: 3   Global Step: 18140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:55:40,962-Speed 10621.22 samples/sec   Loss 12.0466   LearningRate 0.0829   Epoch: 3   Global Step: 18150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:55:41,904-Speed 10882.18 samples/sec   Loss 12.1448   LearningRate 0.0829   Epoch: 3   Global Step: 18160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:42,869-Speed 10618.11 samples/sec   Loss 12.0127   LearningRate 0.0828   Epoch: 3   Global Step: 18170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:43,848-Speed 10474.88 samples/sec   Loss 12.0403   LearningRate 0.0828   Epoch: 3   Global Step: 18180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:44,809-Speed 10672.37 samples/sec   Loss 12.0620   LearningRate 0.0828   Epoch: 3   Global Step: 18190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:45,778-Speed 10579.96 samples/sec   Loss 12.0948   LearningRate 0.0828   Epoch: 3   Global Step: 18200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:46,754-Speed 10503.89 samples/sec   Loss 12.1457   LearningRate 0.0828   Epoch: 3   Global Step: 18210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:47,725-Speed 10548.24 samples/sec   Loss 12.1372   LearningRate 0.0828   Epoch: 3   Global Step: 18220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:48,724-Speed 10257.03 samples/sec   Loss 12.2715   LearningRate 0.0828   Epoch: 3   Global Step: 18230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:49,703-Speed 10469.25 samples/sec   Loss 12.1792   LearningRate 0.0828   Epoch: 3   Global Step: 18240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:50,679-Speed 10509.53 samples/sec   Loss 11.8717   LearningRate 0.0828   Epoch: 3   Global Step: 18250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:51,663-Speed 10412.81 samples/sec   Loss 12.0069   LearningRate 0.0828   Epoch: 3   Global Step: 18260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:55:52,648-Speed 10405.32 samples/sec   Loss 11.7982   LearningRate 0.0828   Epoch: 3   Global Step: 18270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:55:53,632-Speed 10417.60 samples/sec   Loss 11.9626   LearningRate 0.0827   Epoch: 3   Global Step: 18280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:55:54,603-Speed 10554.00 samples/sec   Loss 11.9033   LearningRate 0.0827   Epoch: 3   Global Step: 18290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:55,563-Speed 10682.82 samples/sec   Loss 11.9899   LearningRate 0.0827   Epoch: 3   Global Step: 18300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:56,480-Speed 11175.99 samples/sec   Loss 11.9437   LearningRate 0.0827   Epoch: 3   Global Step: 18310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:57,456-Speed 10495.34 samples/sec   Loss 11.9194   LearningRate 0.0827   Epoch: 3   Global Step: 18320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:58,458-Speed 10235.06 samples/sec   Loss 12.1399   LearningRate 0.0827   Epoch: 3   Global Step: 18330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:55:59,404-Speed 10860.53 samples/sec   Loss 11.9943   LearningRate 0.0827   Epoch: 3   Global Step: 18340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:00,379-Speed 10507.12 samples/sec   Loss 11.8763   LearningRate 0.0827   Epoch: 3   Global Step: 18350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:01,369-Speed 10354.49 samples/sec   Loss 11.9312   LearningRate 0.0827   Epoch: 3   Global Step: 18360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:02,368-Speed 10265.73 samples/sec   Loss 11.9002   LearningRate 0.0827   Epoch: 3   Global Step: 18370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:03,342-Speed 10519.78 samples/sec   Loss 11.9533   LearningRate 0.0827   Epoch: 3   Global Step: 18380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:04,321-Speed 10464.56 samples/sec   Loss 12.0855   LearningRate 0.0826   Epoch: 3   Global Step: 18390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:05,294-Speed 10537.41 samples/sec   Loss 12.0228   LearningRate 0.0826   Epoch: 3   Global Step: 18400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:06,273-Speed 10493.91 samples/sec   Loss 11.9196   LearningRate 0.0826   Epoch: 3   Global Step: 18410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:07,265-Speed 10343.58 samples/sec   Loss 11.9022   LearningRate 0.0826   Epoch: 3   Global Step: 18420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:08,205-Speed 10897.36 samples/sec   Loss 12.1200   LearningRate 0.0826   Epoch: 3   Global Step: 18430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:09,200-Speed 10304.27 samples/sec   Loss 11.8675   LearningRate 0.0826   Epoch: 3   Global Step: 18440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:10,161-Speed 10661.08 samples/sec   Loss 11.8240   LearningRate 0.0826   Epoch: 3   Global Step: 18450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:11,161-Speed 10256.29 samples/sec   Loss 11.8559   LearningRate 0.0826   Epoch: 3   Global Step: 18460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:12,126-Speed 10614.76 samples/sec   Loss 11.8790   LearningRate 0.0826   Epoch: 3   Global Step: 18470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:13,089-Speed 10641.13 samples/sec   Loss 11.9067   LearningRate 0.0826   Epoch: 3   Global Step: 18480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:14,046-Speed 10717.06 samples/sec   Loss 11.9145   LearningRate 0.0826   Epoch: 3   Global Step: 18490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:15,012-Speed 10606.14 samples/sec   Loss 11.7829   LearningRate 0.0825   Epoch: 3   Global Step: 18500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:15,985-Speed 10530.82 samples/sec   Loss 11.9419   LearningRate 0.0825   Epoch: 3   Global Step: 18510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:16,980-Speed 10308.31 samples/sec   Loss 12.1249   LearningRate 0.0825   Epoch: 3   Global Step: 18520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:17,948-Speed 10592.39 samples/sec   Loss 12.1370   LearningRate 0.0825   Epoch: 3   Global Step: 18530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:18,933-Speed 10399.01 samples/sec   Loss 11.8181   LearningRate 0.0825   Epoch: 3   Global Step: 18540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:19,920-Speed 10387.15 samples/sec   Loss 11.8096   LearningRate 0.0825   Epoch: 3   Global Step: 18550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:20,944-Speed 10006.89 samples/sec   Loss 11.9440   LearningRate 0.0825   Epoch: 3   Global Step: 18560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:21,922-Speed 10490.32 samples/sec   Loss 12.0554   LearningRate 0.0825   Epoch: 3   Global Step: 18570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:22,912-Speed 10350.59 samples/sec   Loss 11.8607   LearningRate 0.0825   Epoch: 3   Global Step: 18580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:23,887-Speed 10515.77 samples/sec   Loss 12.0414   LearningRate 0.0825   Epoch: 3   Global Step: 18590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:24,851-Speed 10631.69 samples/sec   Loss 12.1264   LearningRate 0.0825   Epoch: 3   Global Step: 18600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:25,809-Speed 10697.63 samples/sec   Loss 12.2826   LearningRate 0.0824   Epoch: 3   Global Step: 18610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:26,760-Speed 10778.94 samples/sec   Loss 12.0440   LearningRate 0.0824   Epoch: 3   Global Step: 18620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:27,745-Speed 10401.55 samples/sec   Loss 12.1004   LearningRate 0.0824   Epoch: 3   Global Step: 18630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:28,790-Speed 9816.27 samples/sec   Loss 11.9625   LearningRate 0.0824   Epoch: 3   Global Step: 18640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:29,757-Speed 10591.91 samples/sec   Loss 11.8756   LearningRate 0.0824   Epoch: 3   Global Step: 18650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:30,719-Speed 10663.50 samples/sec   Loss 12.0420   LearningRate 0.0824   Epoch: 3   Global Step: 18660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:31,689-Speed 10560.07 samples/sec   Loss 11.8307   LearningRate 0.0824   Epoch: 3   Global Step: 18670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:32,678-Speed 10359.58 samples/sec   Loss 12.0735   LearningRate 0.0824   Epoch: 3   Global Step: 18680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:33,637-Speed 10696.28 samples/sec   Loss 11.9607   LearningRate 0.0824   Epoch: 3   Global Step: 18690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:34,587-Speed 10790.13 samples/sec   Loss 11.9854   LearningRate 0.0824   Epoch: 3   Global Step: 18700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:35,536-Speed 10791.20 samples/sec   Loss 12.0136   LearningRate 0.0824   Epoch: 3   Global Step: 18710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:36,526-Speed 10352.12 samples/sec   Loss 12.0325   LearningRate 0.0824   Epoch: 3   Global Step: 18720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:37,564-Speed 9880.72 samples/sec   Loss 11.9220   LearningRate 0.0823   Epoch: 3   Global Step: 18730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:38,549-Speed 10404.24 samples/sec   Loss 11.9491   LearningRate 0.0823   Epoch: 3   Global Step: 18740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:39,561-Speed 10137.21 samples/sec   Loss 12.1039   LearningRate 0.0823   Epoch: 3   Global Step: 18750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:40,509-Speed 10813.16 samples/sec   Loss 11.8371   LearningRate 0.0823   Epoch: 3   Global Step: 18760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:41,525-Speed 10102.38 samples/sec   Loss 11.8616   LearningRate 0.0823   Epoch: 3   Global Step: 18770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:42,509-Speed 10409.07 samples/sec   Loss 11.9009   LearningRate 0.0823   Epoch: 3   Global Step: 18780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:43,480-Speed 10559.54 samples/sec   Loss 11.9471   LearningRate 0.0823   Epoch: 3   Global Step: 18790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:44,453-Speed 10536.05 samples/sec   Loss 11.9503   LearningRate 0.0823   Epoch: 3   Global Step: 18800   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:56:45,403-Speed 10784.46 samples/sec   Loss 11.8337   LearningRate 0.0823   Epoch: 3   Global Step: 18810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:46,425-Speed 10032.50 samples/sec   Loss 12.1237   LearningRate 0.0823   Epoch: 3   Global Step: 18820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:47,373-Speed 10812.52 samples/sec   Loss 11.7755   LearningRate 0.0823   Epoch: 3   Global Step: 18830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:48,328-Speed 10734.77 samples/sec   Loss 11.7944   LearningRate 0.0822   Epoch: 3   Global Step: 18840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:49,285-Speed 10713.46 samples/sec   Loss 11.8903   LearningRate 0.0822   Epoch: 3   Global Step: 18850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:50,284-Speed 10261.25 samples/sec   Loss 11.8824   LearningRate 0.0822   Epoch: 3   Global Step: 18860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:51,254-Speed 10557.03 samples/sec   Loss 12.0999   LearningRate 0.0822   Epoch: 3   Global Step: 18870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:52,303-Speed 9777.95 samples/sec   Loss 11.9715   LearningRate 0.0822   Epoch: 3   Global Step: 18880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:53,250-Speed 10823.30 samples/sec   Loss 11.8953   LearningRate 0.0822   Epoch: 3   Global Step: 18890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:54,210-Speed 10679.56 samples/sec   Loss 11.9994   LearningRate 0.0822   Epoch: 3   Global Step: 18900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:55,147-Speed 10936.55 samples/sec   Loss 11.9306   LearningRate 0.0822   Epoch: 3   Global Step: 18910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:56,132-Speed 10403.95 samples/sec   Loss 12.0990   LearningRate 0.0822   Epoch: 3   Global Step: 18920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:57,098-Speed 10612.22 samples/sec   Loss 11.9662   LearningRate 0.0822   Epoch: 3   Global Step: 18930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:58,065-Speed 10603.47 samples/sec   Loss 11.9004   LearningRate 0.0822   Epoch: 3   Global Step: 18940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:56:59,020-Speed 10733.85 samples/sec   Loss 11.7859   LearningRate 0.0821   Epoch: 3   Global Step: 18950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:56:59,972-Speed 10763.08 samples/sec   Loss 11.9933   LearningRate 0.0821   Epoch: 3   Global Step: 18960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:57:00,980-Speed 10169.57 samples/sec   Loss 12.0584   LearningRate 0.0821   Epoch: 3   Global Step: 18970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:57:01,954-Speed 10528.26 samples/sec   Loss 11.8336   LearningRate 0.0821   Epoch: 3   Global Step: 18980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:57:02,918-Speed 10630.79 samples/sec   Loss 11.6351   LearningRate 0.0821   Epoch: 3   Global Step: 18990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:57:03,881-Speed 10648.69 samples/sec   Loss 11.7865   LearningRate 0.0821   Epoch: 3   Global Step: 19000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:57:04,823-Speed 10882.61 samples/sec   Loss 11.6902   LearningRate 0.0821   Epoch: 3   Global Step: 19010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:57:05,791-Speed 10588.89 samples/sec   Loss 11.9756   LearningRate 0.0821   Epoch: 3   Global Step: 19020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:57:06,809-Speed 10078.37 samples/sec   Loss 12.0235   LearningRate 0.0821   Epoch: 3   Global Step: 19030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:57:07,790-Speed 10453.25 samples/sec   Loss 11.9487   LearningRate 0.0821   Epoch: 3   Global Step: 19040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:57:08,754-Speed 10627.23 samples/sec   Loss 11.8933   LearningRate 0.0821   Epoch: 3   Global Step: 19050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:09,750-Speed 10289.58 samples/sec   Loss 11.9003   LearningRate 0.0820   Epoch: 3   Global Step: 19060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:10,761-Speed 10139.86 samples/sec   Loss 11.9204   LearningRate 0.0820   Epoch: 3   Global Step: 19070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:11,742-Speed 10455.20 samples/sec   Loss 11.8646   LearningRate 0.0820   Epoch: 3   Global Step: 19080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:12,695-Speed 10748.31 samples/sec   Loss 11.9097   LearningRate 0.0820   Epoch: 3   Global Step: 19090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:13,667-Speed 10546.57 samples/sec   Loss 11.8438   LearningRate 0.0820   Epoch: 3   Global Step: 19100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:14,669-Speed 10238.96 samples/sec   Loss 11.7937   LearningRate 0.0820   Epoch: 3   Global Step: 19110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:15,629-Speed 10679.09 samples/sec   Loss 11.9157   LearningRate 0.0820   Epoch: 3   Global Step: 19120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:16,612-Speed 10422.71 samples/sec   Loss 11.7514   LearningRate 0.0820   Epoch: 3   Global Step: 19130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:17,600-Speed 10370.24 samples/sec   Loss 11.9250   LearningRate 0.0820   Epoch: 3   Global Step: 19140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:18,621-Speed 10040.27 samples/sec   Loss 11.7194   LearningRate 0.0820   Epoch: 3   Global Step: 19150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:19,608-Speed 10396.03 samples/sec   Loss 11.7806   LearningRate 0.0820   Epoch: 3   Global Step: 19160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:20,586-Speed 10481.37 samples/sec   Loss 11.7057   LearningRate 0.0819   Epoch: 3   Global Step: 19170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:21,567-Speed 10449.41 samples/sec   Loss 11.8655   LearningRate 0.0819   Epoch: 3   Global Step: 19180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:22,574-Speed 10172.94 samples/sec   Loss 12.0664   LearningRate 0.0819   Epoch: 3   Global Step: 19190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:23,530-Speed 10731.00 samples/sec   Loss 12.0168   LearningRate 0.0819   Epoch: 3   Global Step: 19200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:24,479-Speed 10794.88 samples/sec   Loss 11.7944   LearningRate 0.0819   Epoch: 3   Global Step: 19210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:25,431-Speed 10760.97 samples/sec   Loss 11.8907   LearningRate 0.0819   Epoch: 3   Global Step: 19220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:26,402-Speed 10555.64 samples/sec   Loss 11.9219   LearningRate 0.0819   Epoch: 3   Global Step: 19230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:27,404-Speed 10225.25 samples/sec   Loss 11.9695   LearningRate 0.0819   Epoch: 3   Global Step: 19240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:28,402-Speed 10276.02 samples/sec   Loss 11.9191   LearningRate 0.0819   Epoch: 3   Global Step: 19250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:29,341-Speed 10923.61 samples/sec   Loss 11.8613   LearningRate 0.0819   Epoch: 3   Global Step: 19260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:30,321-Speed 10452.75 samples/sec   Loss 11.8757   LearningRate 0.0819   Epoch: 3   Global Step: 19270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:31,312-Speed 10340.30 samples/sec   Loss 11.6965   LearningRate 0.0818   Epoch: 3   Global Step: 19280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:32,305-Speed 10329.63 samples/sec   Loss 12.2097   LearningRate 0.0818   Epoch: 3   Global Step: 19290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:57:33,241-Speed 10963.86 samples/sec   Loss 11.7973   LearningRate 0.0818   Epoch: 3   Global Step: 19300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:57:34,207-Speed 10610.13 samples/sec   Loss 11.8977   LearningRate 0.0818   Epoch: 3   Global Step: 19310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:57:35,206-Speed 10252.20 samples/sec   Loss 12.0100   LearningRate 0.0818   Epoch: 3   Global Step: 19320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:57:36,253-Speed 9791.27 samples/sec   Loss 12.1083   LearningRate 0.0818   Epoch: 3   Global Step: 19330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:57:37,219-Speed 10607.72 samples/sec   Loss 11.9684   LearningRate 0.0818   Epoch: 3   Global Step: 19340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:57:38,133-Speed 11215.70 samples/sec   Loss 11.9362   LearningRate 0.0818   Epoch: 3   Global Step: 19350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:57:39,058-Speed 11085.56 samples/sec   Loss 12.0788   LearningRate 0.0818   Epoch: 3   Global Step: 19360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:57:40,002-Speed 10848.25 samples/sec   Loss 11.8448   LearningRate 0.0818   Epoch: 3   Global Step: 19370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:57:40,998-Speed 10299.62 samples/sec   Loss 11.6467   LearningRate 0.0818   Epoch: 3   Global Step: 19380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:57:41,991-Speed 10324.27 samples/sec   Loss 11.7099   LearningRate 0.0818   Epoch: 3   Global Step: 19390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:42,955-Speed 10632.63 samples/sec   Loss 11.7822   LearningRate 0.0817   Epoch: 3   Global Step: 19400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:43,907-Speed 10767.30 samples/sec   Loss 11.9499   LearningRate 0.0817   Epoch: 3   Global Step: 19410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:44,829-Speed 11115.73 samples/sec   Loss 11.8456   LearningRate 0.0817   Epoch: 3   Global Step: 19420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:45,768-Speed 10903.30 samples/sec   Loss 11.8715   LearningRate 0.0817   Epoch: 3   Global Step: 19430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:46,791-Speed 10016.90 samples/sec   Loss 11.7401   LearningRate 0.0817   Epoch: 3   Global Step: 19440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:47,760-Speed 10586.55 samples/sec   Loss 11.6718   LearningRate 0.0817   Epoch: 3   Global Step: 19450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:48,712-Speed 10766.32 samples/sec   Loss 11.8947   LearningRate 0.0817   Epoch: 3   Global Step: 19460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:49,670-Speed 10694.61 samples/sec   Loss 11.9863   LearningRate 0.0817   Epoch: 3   Global Step: 19470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:50,637-Speed 10623.07 samples/sec   Loss 11.8900   LearningRate 0.0817   Epoch: 3   Global Step: 19480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:51,603-Speed 10610.18 samples/sec   Loss 11.8676   LearningRate 0.0817   Epoch: 3   Global Step: 19490   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:57:52,565-Speed 10649.32 samples/sec   Loss 11.8970   LearningRate 0.0817   Epoch: 3   Global Step: 19500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:53,577-Speed 10135.59 samples/sec   Loss 11.8074   LearningRate 0.0816   Epoch: 3   Global Step: 19510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:54,537-Speed 10679.62 samples/sec   Loss 11.7483   LearningRate 0.0816   Epoch: 3   Global Step: 19520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:55,487-Speed 10789.14 samples/sec   Loss 11.6502   LearningRate 0.0816   Epoch: 3   Global Step: 19530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:56,455-Speed 10601.70 samples/sec   Loss 11.8600   LearningRate 0.0816   Epoch: 3   Global Step: 19540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:57,435-Speed 10450.04 samples/sec   Loss 11.7350   LearningRate 0.0816   Epoch: 3   Global Step: 19550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:58,397-Speed 10656.71 samples/sec   Loss 11.8538   LearningRate 0.0816   Epoch: 3   Global Step: 19560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:57:59,385-Speed 10374.78 samples/sec   Loss 11.7966   LearningRate 0.0816   Epoch: 3   Global Step: 19570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:00,355-Speed 10576.28 samples/sec   Loss 11.8035   LearningRate 0.0816   Epoch: 3   Global Step: 19580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:01,317-Speed 10649.73 samples/sec   Loss 11.7348   LearningRate 0.0816   Epoch: 3   Global Step: 19590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:02,305-Speed 10373.13 samples/sec   Loss 11.8844   LearningRate 0.0816   Epoch: 3   Global Step: 19600   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:58:03,274-Speed 10579.08 samples/sec   Loss 11.9682   LearningRate 0.0816   Epoch: 3   Global Step: 19610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:04,238-Speed 10625.27 samples/sec   Loss 11.7705   LearningRate 0.0815   Epoch: 3   Global Step: 19620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:05,191-Speed 10761.31 samples/sec   Loss 11.8980   LearningRate 0.0815   Epoch: 3   Global Step: 19630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:06,180-Speed 10364.30 samples/sec   Loss 11.6612   LearningRate 0.0815   Epoch: 3   Global Step: 19640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:07,153-Speed 10533.01 samples/sec   Loss 11.8616   LearningRate 0.0815   Epoch: 3   Global Step: 19650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:58:08,114-Speed 10664.58 samples/sec   Loss 11.6587   LearningRate 0.0815   Epoch: 3   Global Step: 19660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:58:09,080-Speed 10613.23 samples/sec   Loss 11.8955   LearningRate 0.0815   Epoch: 3   Global Step: 19670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:58:10,067-Speed 10382.58 samples/sec   Loss 11.9961   LearningRate 0.0815   Epoch: 3   Global Step: 19680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:58:11,074-Speed 10175.04 samples/sec   Loss 11.7035   LearningRate 0.0815   Epoch: 3   Global Step: 19690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:58:12,029-Speed 10731.24 samples/sec   Loss 11.7480   LearningRate 0.0815   Epoch: 3   Global Step: 19700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:58:12,987-Speed 10710.75 samples/sec   Loss 11.7832   LearningRate 0.0815   Epoch: 3   Global Step: 19710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:58:13,962-Speed 10511.03 samples/sec   Loss 11.7642   LearningRate 0.0815   Epoch: 3   Global Step: 19720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:58:14,899-Speed 10937.97 samples/sec   Loss 11.8696   LearningRate 0.0814   Epoch: 3   Global Step: 19730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:58:15,850-Speed 10776.80 samples/sec   Loss 11.8093   LearningRate 0.0814   Epoch: 3   Global Step: 19740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-10 23:58:16,815-Speed 10612.33 samples/sec   Loss 11.6586   LearningRate 0.0814   Epoch: 3   Global Step: 19750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:17,805-Speed 10353.02 samples/sec   Loss 11.7539   LearningRate 0.0814   Epoch: 3   Global Step: 19760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:18,792-Speed 10393.74 samples/sec   Loss 11.8463   LearningRate 0.0814   Epoch: 3   Global Step: 19770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:19,754-Speed 10656.81 samples/sec   Loss 11.8943   LearningRate 0.0814   Epoch: 3   Global Step: 19780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:20,749-Speed 10301.67 samples/sec   Loss 11.8070   LearningRate 0.0814   Epoch: 3   Global Step: 19790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:21,735-Speed 10398.08 samples/sec   Loss 11.6902   LearningRate 0.0814   Epoch: 3   Global Step: 19800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:22,687-Speed 10777.08 samples/sec   Loss 11.6704   LearningRate 0.0814   Epoch: 3   Global Step: 19810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:23,641-Speed 10745.75 samples/sec   Loss 12.0437   LearningRate 0.0814   Epoch: 3   Global Step: 19820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:24,591-Speed 10782.18 samples/sec   Loss 11.7925   LearningRate 0.0814   Epoch: 3   Global Step: 19830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:25,528-Speed 10940.96 samples/sec   Loss 11.6955   LearningRate 0.0813   Epoch: 3   Global Step: 19840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:26,536-Speed 10171.11 samples/sec   Loss 11.5963   LearningRate 0.0813   Epoch: 3   Global Step: 19850   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:58:27,506-Speed 10561.28 samples/sec   Loss 11.6649   LearningRate 0.0813   Epoch: 3   Global Step: 19860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:28,487-Speed 10453.34 samples/sec   Loss 11.7712   LearningRate 0.0813   Epoch: 3   Global Step: 19870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:29,470-Speed 10430.84 samples/sec   Loss 11.7343   LearningRate 0.0813   Epoch: 3   Global Step: 19880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:30,428-Speed 10699.30 samples/sec   Loss 11.8597   LearningRate 0.0813   Epoch: 3   Global Step: 19890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:31,400-Speed 10546.41 samples/sec   Loss 11.7341   LearningRate 0.0813   Epoch: 3   Global Step: 19900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:32,328-Speed 11047.73 samples/sec   Loss 11.7634   LearningRate 0.0813   Epoch: 3   Global Step: 19910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:33,296-Speed 10587.43 samples/sec   Loss 11.8660   LearningRate 0.0813   Epoch: 3   Global Step: 19920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:34,263-Speed 10600.44 samples/sec   Loss 11.8716   LearningRate 0.0813   Epoch: 3   Global Step: 19930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:35,220-Speed 10712.04 samples/sec   Loss 11.6539   LearningRate 0.0813   Epoch: 3   Global Step: 19940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:36,239-Speed 10053.71 samples/sec   Loss 11.7159   LearningRate 0.0813   Epoch: 3   Global Step: 19950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:37,194-Speed 10740.42 samples/sec   Loss 11.7924   LearningRate 0.0812   Epoch: 3   Global Step: 19960   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:58:38,151-Speed 10708.92 samples/sec   Loss 11.5397   LearningRate 0.0812   Epoch: 3   Global Step: 19970   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:58:39,139-Speed 10371.31 samples/sec   Loss 11.8477   LearningRate 0.0812   Epoch: 3   Global Step: 19980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:40,123-Speed 10420.66 samples/sec   Loss 11.8418   LearningRate 0.0812   Epoch: 3   Global Step: 19990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:58:41,115-Speed 10335.57 samples/sec   Loss 11.7876   LearningRate 0.0812   Epoch: 3   Global Step: 20000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:59:03,927-[lfw][20000]XNorm: 15.108700
Training: 2022-04-10 23:59:03,928-[lfw][20000]Accuracy-Flip: 0.99300+-0.00393
Training: 2022-04-10 23:59:03,929-[lfw][20000]Accuracy-Highest: 0.99333
Training: 2022-04-10 23:59:29,205-[cfp_fp][20000]XNorm: 12.679768
Training: 2022-04-10 23:59:29,206-[cfp_fp][20000]Accuracy-Flip: 0.93500+-0.01810
Training: 2022-04-10 23:59:29,208-[cfp_fp][20000]Accuracy-Highest: 0.93500
Training: 2022-04-10 23:59:50,999-[agedb_30][20000]XNorm: 14.669716
Training: 2022-04-10 23:59:51,000-[agedb_30][20000]Accuracy-Flip: 0.94350+-0.01363
Training: 2022-04-10 23:59:51,001-[agedb_30][20000]Accuracy-Highest: 0.94367
Training: 2022-04-10 23:59:51,944-Speed 144.57 samples/sec   Loss 11.6618   LearningRate 0.0812   Epoch: 3   Global Step: 20010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:59:52,885-Speed 10886.53 samples/sec   Loss 11.6601   LearningRate 0.0812   Epoch: 3   Global Step: 20020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:59:53,847-Speed 10659.36 samples/sec   Loss 11.9508   LearningRate 0.0812   Epoch: 3   Global Step: 20030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:59:54,808-Speed 10669.64 samples/sec   Loss 11.7385   LearningRate 0.0812   Epoch: 3   Global Step: 20040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:59:55,883-Speed 9535.12 samples/sec   Loss 11.6139   LearningRate 0.0812   Epoch: 3   Global Step: 20050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:59:56,833-Speed 10788.60 samples/sec   Loss 11.8293   LearningRate 0.0812   Epoch: 3   Global Step: 20060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:59:57,788-Speed 10737.42 samples/sec   Loss 11.7495   LearningRate 0.0811   Epoch: 3   Global Step: 20070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-10 23:59:58,725-Speed 10942.04 samples/sec   Loss 11.7934   LearningRate 0.0811   Epoch: 3   Global Step: 20080   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-10 23:59:59,740-Speed 10102.18 samples/sec   Loss 11.8542   LearningRate 0.0811   Epoch: 3   Global Step: 20090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:00,678-Speed 10928.08 samples/sec   Loss 11.6888   LearningRate 0.0811   Epoch: 3   Global Step: 20100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:00:01,630-Speed 10767.81 samples/sec   Loss 11.7057   LearningRate 0.0811   Epoch: 3   Global Step: 20110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:00:02,588-Speed 10699.97 samples/sec   Loss 11.6547   LearningRate 0.0811   Epoch: 3   Global Step: 20120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:00:03,577-Speed 10365.87 samples/sec   Loss 11.8480   LearningRate 0.0811   Epoch: 3   Global Step: 20130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:00:04,533-Speed 10728.00 samples/sec   Loss 11.6483   LearningRate 0.0811   Epoch: 3   Global Step: 20140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:00:05,521-Speed 10377.46 samples/sec   Loss 11.7672   LearningRate 0.0811   Epoch: 3   Global Step: 20150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:00:06,528-Speed 10173.42 samples/sec   Loss 11.6719   LearningRate 0.0811   Epoch: 3   Global Step: 20160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:00:07,503-Speed 10513.37 samples/sec   Loss 11.8133   LearningRate 0.0811   Epoch: 3   Global Step: 20170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:00:08,481-Speed 10489.48 samples/sec   Loss 12.0042   LearningRate 0.0810   Epoch: 3   Global Step: 20180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:00:09,426-Speed 10838.21 samples/sec   Loss 11.6385   LearningRate 0.0810   Epoch: 3   Global Step: 20190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:00:10,433-Speed 10191.91 samples/sec   Loss 11.7369   LearningRate 0.0810   Epoch: 3   Global Step: 20200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:11,410-Speed 10486.16 samples/sec   Loss 11.7206   LearningRate 0.0810   Epoch: 3   Global Step: 20210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:12,385-Speed 10512.10 samples/sec   Loss 11.7689   LearningRate 0.0810   Epoch: 3   Global Step: 20220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:13,438-Speed 9734.09 samples/sec   Loss 11.6068   LearningRate 0.0810   Epoch: 3   Global Step: 20230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:24,500-Speed 925.85 samples/sec   Loss 10.9804   LearningRate 0.0810   Epoch: 4   Global Step: 20240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:25,494-Speed 10309.16 samples/sec   Loss 10.7859   LearningRate 0.0810   Epoch: 4   Global Step: 20250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:26,567-Speed 9554.19 samples/sec   Loss 10.7104   LearningRate 0.0810   Epoch: 4   Global Step: 20260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:27,560-Speed 10330.69 samples/sec   Loss 11.0744   LearningRate 0.0810   Epoch: 4   Global Step: 20270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:28,696-Speed 9019.73 samples/sec   Loss 10.7365   LearningRate 0.0810   Epoch: 4   Global Step: 20280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:29,674-Speed 10474.50 samples/sec   Loss 10.7276   LearningRate 0.0809   Epoch: 4   Global Step: 20290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:30,687-Speed 10132.94 samples/sec   Loss 10.7585   LearningRate 0.0809   Epoch: 4   Global Step: 20300   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:00:31,650-Speed 10650.35 samples/sec   Loss 10.7234   LearningRate 0.0809   Epoch: 4   Global Step: 20310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:32,693-Speed 9825.97 samples/sec   Loss 10.8845   LearningRate 0.0809   Epoch: 4   Global Step: 20320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:33,651-Speed 10695.82 samples/sec   Loss 10.9058   LearningRate 0.0809   Epoch: 4   Global Step: 20330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:34,633-Speed 10445.15 samples/sec   Loss 10.7596   LearningRate 0.0809   Epoch: 4   Global Step: 20340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:35,808-Speed 8720.41 samples/sec   Loss 10.9522   LearningRate 0.0809   Epoch: 4   Global Step: 20350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:36,769-Speed 10661.31 samples/sec   Loss 10.9734   LearningRate 0.0809   Epoch: 4   Global Step: 20360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:37,786-Speed 10081.06 samples/sec   Loss 10.9733   LearningRate 0.0809   Epoch: 4   Global Step: 20370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:38,777-Speed 10340.01 samples/sec   Loss 10.9366   LearningRate 0.0809   Epoch: 4   Global Step: 20380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:39,723-Speed 10846.39 samples/sec   Loss 11.0199   LearningRate 0.0809   Epoch: 4   Global Step: 20390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:40,735-Speed 10129.74 samples/sec   Loss 10.8632   LearningRate 0.0809   Epoch: 4   Global Step: 20400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:41,710-Speed 10510.72 samples/sec   Loss 10.9180   LearningRate 0.0808   Epoch: 4   Global Step: 20410   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:00:42,703-Speed 10320.90 samples/sec   Loss 11.0116   LearningRate 0.0808   Epoch: 4   Global Step: 20420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:43,634-Speed 10996.43 samples/sec   Loss 11.0798   LearningRate 0.0808   Epoch: 4   Global Step: 20430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:00:44,610-Speed 10506.93 samples/sec   Loss 10.9618   LearningRate 0.0808   Epoch: 4   Global Step: 20440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:00:45,573-Speed 10633.12 samples/sec   Loss 11.0503   LearningRate 0.0808   Epoch: 4   Global Step: 20450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:00:46,565-Speed 10346.94 samples/sec   Loss 10.8983   LearningRate 0.0808   Epoch: 4   Global Step: 20460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:00:47,492-Speed 11063.12 samples/sec   Loss 10.9278   LearningRate 0.0808   Epoch: 4   Global Step: 20470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:00:48,477-Speed 10399.09 samples/sec   Loss 11.0552   LearningRate 0.0808   Epoch: 4   Global Step: 20480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:00:49,449-Speed 10540.89 samples/sec   Loss 11.0130   LearningRate 0.0808   Epoch: 4   Global Step: 20490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:00:50,445-Speed 10291.07 samples/sec   Loss 10.8054   LearningRate 0.0808   Epoch: 4   Global Step: 20500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:00:51,385-Speed 10911.66 samples/sec   Loss 11.0108   LearningRate 0.0808   Epoch: 4   Global Step: 20510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:00:52,438-Speed 9772.87 samples/sec   Loss 11.0026   LearningRate 0.0807   Epoch: 4   Global Step: 20520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:00:53,520-Speed 9471.93 samples/sec   Loss 11.2164   LearningRate 0.0807   Epoch: 4   Global Step: 20530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:54,620-Speed 9319.76 samples/sec   Loss 11.1159   LearningRate 0.0807   Epoch: 4   Global Step: 20540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:55,567-Speed 10823.21 samples/sec   Loss 11.0531   LearningRate 0.0807   Epoch: 4   Global Step: 20550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:56,525-Speed 10696.67 samples/sec   Loss 11.0650   LearningRate 0.0807   Epoch: 4   Global Step: 20560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:57,498-Speed 10537.92 samples/sec   Loss 11.1185   LearningRate 0.0807   Epoch: 4   Global Step: 20570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:58,483-Speed 10399.64 samples/sec   Loss 11.2991   LearningRate 0.0807   Epoch: 4   Global Step: 20580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:00:59,442-Speed 10685.27 samples/sec   Loss 11.1618   LearningRate 0.0807   Epoch: 4   Global Step: 20590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:00,396-Speed 10747.31 samples/sec   Loss 11.0915   LearningRate 0.0807   Epoch: 4   Global Step: 20600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:01,356-Speed 10676.72 samples/sec   Loss 11.2060   LearningRate 0.0807   Epoch: 4   Global Step: 20610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:02,341-Speed 10402.68 samples/sec   Loss 11.1917   LearningRate 0.0807   Epoch: 4   Global Step: 20620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:03,313-Speed 10548.10 samples/sec   Loss 10.9416   LearningRate 0.0806   Epoch: 4   Global Step: 20630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:04,286-Speed 10541.47 samples/sec   Loss 11.0447   LearningRate 0.0806   Epoch: 4   Global Step: 20640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:05,269-Speed 10424.77 samples/sec   Loss 11.1014   LearningRate 0.0806   Epoch: 4   Global Step: 20650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:06,243-Speed 10519.16 samples/sec   Loss 10.9061   LearningRate 0.0806   Epoch: 4   Global Step: 20660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:07,184-Speed 10885.50 samples/sec   Loss 11.2033   LearningRate 0.0806   Epoch: 4   Global Step: 20670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:08,119-Speed 10967.98 samples/sec   Loss 11.2575   LearningRate 0.0806   Epoch: 4   Global Step: 20680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:09,086-Speed 10605.98 samples/sec   Loss 11.2808   LearningRate 0.0806   Epoch: 4   Global Step: 20690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:10,020-Speed 10974.67 samples/sec   Loss 11.2528   LearningRate 0.0806   Epoch: 4   Global Step: 20700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:10,971-Speed 10783.33 samples/sec   Loss 11.1282   LearningRate 0.0806   Epoch: 4   Global Step: 20710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:11,952-Speed 10440.86 samples/sec   Loss 11.2980   LearningRate 0.0806   Epoch: 4   Global Step: 20720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:12,922-Speed 10570.59 samples/sec   Loss 11.0522   LearningRate 0.0806   Epoch: 4   Global Step: 20730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:13,876-Speed 10742.19 samples/sec   Loss 11.3066   LearningRate 0.0805   Epoch: 4   Global Step: 20740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:14,806-Speed 11029.29 samples/sec   Loss 11.1805   LearningRate 0.0805   Epoch: 4   Global Step: 20750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:15,764-Speed 10691.75 samples/sec   Loss 11.1943   LearningRate 0.0805   Epoch: 4   Global Step: 20760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:16,745-Speed 10452.05 samples/sec   Loss 11.1478   LearningRate 0.0805   Epoch: 4   Global Step: 20770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:17,737-Speed 10327.90 samples/sec   Loss 11.2584   LearningRate 0.0805   Epoch: 4   Global Step: 20780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:18,713-Speed 10509.37 samples/sec   Loss 11.1085   LearningRate 0.0805   Epoch: 4   Global Step: 20790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:19,675-Speed 10651.74 samples/sec   Loss 11.2126   LearningRate 0.0805   Epoch: 4   Global Step: 20800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:20,634-Speed 10686.68 samples/sec   Loss 11.2774   LearningRate 0.0805   Epoch: 4   Global Step: 20810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:21,639-Speed 10200.65 samples/sec   Loss 11.0272   LearningRate 0.0805   Epoch: 4   Global Step: 20820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:22,595-Speed 10715.37 samples/sec   Loss 11.1346   LearningRate 0.0805   Epoch: 4   Global Step: 20830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:23,591-Speed 10298.97 samples/sec   Loss 11.1195   LearningRate 0.0805   Epoch: 4   Global Step: 20840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:24,591-Speed 10252.43 samples/sec   Loss 11.1105   LearningRate 0.0805   Epoch: 4   Global Step: 20850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:25,567-Speed 10496.62 samples/sec   Loss 11.2182   LearningRate 0.0804   Epoch: 4   Global Step: 20860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:26,546-Speed 10471.13 samples/sec   Loss 11.3697   LearningRate 0.0804   Epoch: 4   Global Step: 20870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:27,525-Speed 10472.41 samples/sec   Loss 11.2601   LearningRate 0.0804   Epoch: 4   Global Step: 20880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:28,518-Speed 10327.75 samples/sec   Loss 11.2534   LearningRate 0.0804   Epoch: 4   Global Step: 20890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:29,477-Speed 10686.97 samples/sec   Loss 11.3291   LearningRate 0.0804   Epoch: 4   Global Step: 20900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:30,428-Speed 10778.42 samples/sec   Loss 11.1820   LearningRate 0.0804   Epoch: 4   Global Step: 20910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:31,411-Speed 10428.97 samples/sec   Loss 11.1071   LearningRate 0.0804   Epoch: 4   Global Step: 20920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:32,402-Speed 10336.80 samples/sec   Loss 11.2226   LearningRate 0.0804   Epoch: 4   Global Step: 20930   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:01:33,403-Speed 10243.22 samples/sec   Loss 11.3742   LearningRate 0.0804   Epoch: 4   Global Step: 20940   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:01:34,386-Speed 10434.77 samples/sec   Loss 11.1570   LearningRate 0.0804   Epoch: 4   Global Step: 20950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:35,342-Speed 10712.83 samples/sec   Loss 11.1618   LearningRate 0.0804   Epoch: 4   Global Step: 20960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:36,333-Speed 10343.43 samples/sec   Loss 11.4251   LearningRate 0.0803   Epoch: 4   Global Step: 20970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:37,300-Speed 10594.62 samples/sec   Loss 11.1632   LearningRate 0.0803   Epoch: 4   Global Step: 20980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:38,335-Speed 9920.54 samples/sec   Loss 11.3140   LearningRate 0.0803   Epoch: 4   Global Step: 20990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:39,316-Speed 10450.07 samples/sec   Loss 11.0930   LearningRate 0.0803   Epoch: 4   Global Step: 21000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:40,290-Speed 10521.03 samples/sec   Loss 11.3603   LearningRate 0.0803   Epoch: 4   Global Step: 21010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:41,269-Speed 10464.79 samples/sec   Loss 11.2384   LearningRate 0.0803   Epoch: 4   Global Step: 21020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:42,259-Speed 10354.29 samples/sec   Loss 11.3305   LearningRate 0.0803   Epoch: 4   Global Step: 21030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:43,243-Speed 10426.83 samples/sec   Loss 11.1941   LearningRate 0.0803   Epoch: 4   Global Step: 21040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:44,200-Speed 10702.80 samples/sec   Loss 11.4680   LearningRate 0.0803   Epoch: 4   Global Step: 21050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:45,137-Speed 10942.75 samples/sec   Loss 11.3103   LearningRate 0.0803   Epoch: 4   Global Step: 21060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:46,092-Speed 10722.98 samples/sec   Loss 11.2223   LearningRate 0.0803   Epoch: 4   Global Step: 21070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:47,038-Speed 10830.98 samples/sec   Loss 11.4440   LearningRate 0.0802   Epoch: 4   Global Step: 21080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:48,004-Speed 10769.79 samples/sec   Loss 11.2592   LearningRate 0.0802   Epoch: 4   Global Step: 21090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:48,971-Speed 10602.37 samples/sec   Loss 11.5008   LearningRate 0.0802   Epoch: 4   Global Step: 21100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:49,930-Speed 10680.07 samples/sec   Loss 11.3335   LearningRate 0.0802   Epoch: 4   Global Step: 21110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:50,922-Speed 10332.05 samples/sec   Loss 11.3339   LearningRate 0.0802   Epoch: 4   Global Step: 21120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:51,898-Speed 10498.04 samples/sec   Loss 11.3377   LearningRate 0.0802   Epoch: 4   Global Step: 21130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:52,864-Speed 10617.93 samples/sec   Loss 11.2748   LearningRate 0.0802   Epoch: 4   Global Step: 21140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:53,828-Speed 10627.41 samples/sec   Loss 11.5210   LearningRate 0.0802   Epoch: 4   Global Step: 21150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:54,810-Speed 10443.74 samples/sec   Loss 11.3808   LearningRate 0.0802   Epoch: 4   Global Step: 21160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:55,760-Speed 10786.60 samples/sec   Loss 11.4950   LearningRate 0.0802   Epoch: 4   Global Step: 21170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:01:56,725-Speed 10624.88 samples/sec   Loss 11.3916   LearningRate 0.0802   Epoch: 4   Global Step: 21180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:57,686-Speed 10662.74 samples/sec   Loss 11.5096   LearningRate 0.0801   Epoch: 4   Global Step: 21190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:58,658-Speed 10541.51 samples/sec   Loss 11.3263   LearningRate 0.0801   Epoch: 4   Global Step: 21200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:01:59,605-Speed 10831.53 samples/sec   Loss 11.3389   LearningRate 0.0801   Epoch: 4   Global Step: 21210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:00,599-Speed 10305.76 samples/sec   Loss 11.3778   LearningRate 0.0801   Epoch: 4   Global Step: 21220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:01,609-Speed 10154.43 samples/sec   Loss 11.2683   LearningRate 0.0801   Epoch: 4   Global Step: 21230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:02,588-Speed 10469.06 samples/sec   Loss 11.5230   LearningRate 0.0801   Epoch: 4   Global Step: 21240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:03,541-Speed 10777.44 samples/sec   Loss 11.3614   LearningRate 0.0801   Epoch: 4   Global Step: 21250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:04,503-Speed 10654.81 samples/sec   Loss 11.3609   LearningRate 0.0801   Epoch: 4   Global Step: 21260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:05,478-Speed 10511.76 samples/sec   Loss 11.3052   LearningRate 0.0801   Epoch: 4   Global Step: 21270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:06,477-Speed 10257.85 samples/sec   Loss 11.2943   LearningRate 0.0801   Epoch: 4   Global Step: 21280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:07,423-Speed 10835.97 samples/sec   Loss 11.2070   LearningRate 0.0801   Epoch: 4   Global Step: 21290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:08,376-Speed 10759.02 samples/sec   Loss 11.3520   LearningRate 0.0801   Epoch: 4   Global Step: 21300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:09,294-Speed 11157.97 samples/sec   Loss 11.2972   LearningRate 0.0800   Epoch: 4   Global Step: 21310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:10,265-Speed 10556.24 samples/sec   Loss 11.4395   LearningRate 0.0800   Epoch: 4   Global Step: 21320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:11,228-Speed 10643.81 samples/sec   Loss 11.4490   LearningRate 0.0800   Epoch: 4   Global Step: 21330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:12,419-Speed 8605.97 samples/sec   Loss 11.3936   LearningRate 0.0800   Epoch: 4   Global Step: 21340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:13,390-Speed 10562.30 samples/sec   Loss 11.4520   LearningRate 0.0800   Epoch: 4   Global Step: 21350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:14,402-Speed 10133.33 samples/sec   Loss 11.3304   LearningRate 0.0800   Epoch: 4   Global Step: 21360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:15,344-Speed 10883.28 samples/sec   Loss 11.2523   LearningRate 0.0800   Epoch: 4   Global Step: 21370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:16,361-Speed 10074.47 samples/sec   Loss 11.4185   LearningRate 0.0800   Epoch: 4   Global Step: 21380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:17,305-Speed 10860.19 samples/sec   Loss 11.4077   LearningRate 0.0800   Epoch: 4   Global Step: 21390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:18,273-Speed 10594.94 samples/sec   Loss 11.3117   LearningRate 0.0800   Epoch: 4   Global Step: 21400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:19,234-Speed 10658.55 samples/sec   Loss 11.3661   LearningRate 0.0800   Epoch: 4   Global Step: 21410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:20,264-Speed 9957.19 samples/sec   Loss 11.4614   LearningRate 0.0799   Epoch: 4   Global Step: 21420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:21,202-Speed 10935.15 samples/sec   Loss 11.2292   LearningRate 0.0799   Epoch: 4   Global Step: 21430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:22,141-Speed 10912.17 samples/sec   Loss 11.2869   LearningRate 0.0799   Epoch: 4   Global Step: 21440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:23,143-Speed 10226.85 samples/sec   Loss 11.4702   LearningRate 0.0799   Epoch: 4   Global Step: 21450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:24,120-Speed 10497.75 samples/sec   Loss 11.3555   LearningRate 0.0799   Epoch: 4   Global Step: 21460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:25,076-Speed 10711.90 samples/sec   Loss 11.5290   LearningRate 0.0799   Epoch: 4   Global Step: 21470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:26,062-Speed 10397.06 samples/sec   Loss 11.3135   LearningRate 0.0799   Epoch: 4   Global Step: 21480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:27,036-Speed 10523.65 samples/sec   Loss 11.3316   LearningRate 0.0799   Epoch: 4   Global Step: 21490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:28,001-Speed 10624.59 samples/sec   Loss 11.5449   LearningRate 0.0799   Epoch: 4   Global Step: 21500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:28,969-Speed 10588.46 samples/sec   Loss 11.3419   LearningRate 0.0799   Epoch: 4   Global Step: 21510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:29,926-Speed 10713.24 samples/sec   Loss 11.3435   LearningRate 0.0799   Epoch: 4   Global Step: 21520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:30,919-Speed 10321.61 samples/sec   Loss 11.3313   LearningRate 0.0798   Epoch: 4   Global Step: 21530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:31,887-Speed 10589.96 samples/sec   Loss 11.1001   LearningRate 0.0798   Epoch: 4   Global Step: 21540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:32,869-Speed 10437.37 samples/sec   Loss 11.4958   LearningRate 0.0798   Epoch: 4   Global Step: 21550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:33,840-Speed 10557.25 samples/sec   Loss 11.5087   LearningRate 0.0798   Epoch: 4   Global Step: 21560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:34,804-Speed 10639.94 samples/sec   Loss 11.5918   LearningRate 0.0798   Epoch: 4   Global Step: 21570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:35,734-Speed 11013.87 samples/sec   Loss 11.4437   LearningRate 0.0798   Epoch: 4   Global Step: 21580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:36,717-Speed 10423.90 samples/sec   Loss 11.3141   LearningRate 0.0798   Epoch: 4   Global Step: 21590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:37,687-Speed 10575.65 samples/sec   Loss 11.4473   LearningRate 0.0798   Epoch: 4   Global Step: 21600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:38,660-Speed 10527.97 samples/sec   Loss 11.2776   LearningRate 0.0798   Epoch: 4   Global Step: 21610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:39,647-Speed 10390.41 samples/sec   Loss 11.3365   LearningRate 0.0798   Epoch: 4   Global Step: 21620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:40,626-Speed 10469.74 samples/sec   Loss 11.3833   LearningRate 0.0798   Epoch: 4   Global Step: 21630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:41,599-Speed 10527.88 samples/sec   Loss 11.3158   LearningRate 0.0798   Epoch: 4   Global Step: 21640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:42,572-Speed 10536.15 samples/sec   Loss 11.2998   LearningRate 0.0797   Epoch: 4   Global Step: 21650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:43,525-Speed 10762.08 samples/sec   Loss 11.2835   LearningRate 0.0797   Epoch: 4   Global Step: 21660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:44,490-Speed 10622.91 samples/sec   Loss 11.4447   LearningRate 0.0797   Epoch: 4   Global Step: 21670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:45,415-Speed 11079.25 samples/sec   Loss 11.3735   LearningRate 0.0797   Epoch: 4   Global Step: 21680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:46,367-Speed 10757.30 samples/sec   Loss 11.4455   LearningRate 0.0797   Epoch: 4   Global Step: 21690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:47,316-Speed 10808.42 samples/sec   Loss 11.3074   LearningRate 0.0797   Epoch: 4   Global Step: 21700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:48,285-Speed 10571.88 samples/sec   Loss 11.4078   LearningRate 0.0797   Epoch: 4   Global Step: 21710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:49,248-Speed 10647.26 samples/sec   Loss 11.3322   LearningRate 0.0797   Epoch: 4   Global Step: 21720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:50,260-Speed 10129.82 samples/sec   Loss 11.5121   LearningRate 0.0797   Epoch: 4   Global Step: 21730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:51,228-Speed 10588.65 samples/sec   Loss 11.4208   LearningRate 0.0797   Epoch: 4   Global Step: 21740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:52,226-Speed 10270.06 samples/sec   Loss 11.2681   LearningRate 0.0797   Epoch: 4   Global Step: 21750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:53,200-Speed 10525.47 samples/sec   Loss 11.4752   LearningRate 0.0796   Epoch: 4   Global Step: 21760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:54,169-Speed 10574.91 samples/sec   Loss 11.5163   LearningRate 0.0796   Epoch: 4   Global Step: 21770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:02:55,197-Speed 9971.51 samples/sec   Loss 11.3207   LearningRate 0.0796   Epoch: 4   Global Step: 21780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:56,136-Speed 10910.52 samples/sec   Loss 11.3281   LearningRate 0.0796   Epoch: 4   Global Step: 21790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:57,111-Speed 10519.36 samples/sec   Loss 11.3847   LearningRate 0.0796   Epoch: 4   Global Step: 21800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:58,076-Speed 10619.93 samples/sec   Loss 11.4047   LearningRate 0.0796   Epoch: 4   Global Step: 21810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:02:59,101-Speed 9992.90 samples/sec   Loss 11.2021   LearningRate 0.0796   Epoch: 4   Global Step: 21820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:03:00,078-Speed 10500.20 samples/sec   Loss 11.4882   LearningRate 0.0796   Epoch: 4   Global Step: 21830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:03:01,032-Speed 10736.81 samples/sec   Loss 11.3427   LearningRate 0.0796   Epoch: 4   Global Step: 21840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:03:01,998-Speed 10614.73 samples/sec   Loss 11.3118   LearningRate 0.0796   Epoch: 4   Global Step: 21850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:03:02,928-Speed 11016.64 samples/sec   Loss 11.4330   LearningRate 0.0796   Epoch: 4   Global Step: 21860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:03:03,904-Speed 10500.45 samples/sec   Loss 11.4441   LearningRate 0.0795   Epoch: 4   Global Step: 21870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:03:04,878-Speed 10531.87 samples/sec   Loss 11.4410   LearningRate 0.0795   Epoch: 4   Global Step: 21880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:03:05,851-Speed 10530.48 samples/sec   Loss 11.3453   LearningRate 0.0795   Epoch: 4   Global Step: 21890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:03:06,796-Speed 10845.43 samples/sec   Loss 11.2943   LearningRate 0.0795   Epoch: 4   Global Step: 21900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:03:07,770-Speed 10523.80 samples/sec   Loss 11.4343   LearningRate 0.0795   Epoch: 4   Global Step: 21910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:03:08,742-Speed 10539.85 samples/sec   Loss 11.3975   LearningRate 0.0795   Epoch: 4   Global Step: 21920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:03:09,753-Speed 10144.42 samples/sec   Loss 11.3436   LearningRate 0.0795   Epoch: 4   Global Step: 21930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:03:10,705-Speed 10761.38 samples/sec   Loss 11.3841   LearningRate 0.0795   Epoch: 4   Global Step: 21940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:03:11,685-Speed 10461.55 samples/sec   Loss 11.2599   LearningRate 0.0795   Epoch: 4   Global Step: 21950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:03:12,670-Speed 10408.62 samples/sec   Loss 11.3679   LearningRate 0.0795   Epoch: 4   Global Step: 21960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:03:13,670-Speed 10249.82 samples/sec   Loss 11.2433   LearningRate 0.0795   Epoch: 4   Global Step: 21970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:03:14,619-Speed 10801.09 samples/sec   Loss 11.3048   LearningRate 0.0795   Epoch: 4   Global Step: 21980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:03:15,561-Speed 10885.29 samples/sec   Loss 11.4213   LearningRate 0.0794   Epoch: 4   Global Step: 21990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:03:16,517-Speed 10722.38 samples/sec   Loss 11.3367   LearningRate 0.0794   Epoch: 4   Global Step: 22000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:03:39,089-[lfw][22000]XNorm: 14.933366
Training: 2022-04-11 00:03:39,090-[lfw][22000]Accuracy-Flip: 0.99250+-0.00423
Training: 2022-04-11 00:03:39,091-[lfw][22000]Accuracy-Highest: 0.99333
Training: 2022-04-11 00:04:04,980-[cfp_fp][22000]XNorm: 12.489851
Training: 2022-04-11 00:04:04,981-[cfp_fp][22000]Accuracy-Flip: 0.93871+-0.01167
Training: 2022-04-11 00:04:04,982-[cfp_fp][22000]Accuracy-Highest: 0.93871
Training: 2022-04-11 00:04:27,233-[agedb_30][22000]XNorm: 14.587775
Training: 2022-04-11 00:04:27,234-[agedb_30][22000]Accuracy-Flip: 0.94550+-0.01440
Training: 2022-04-11 00:04:27,234-[agedb_30][22000]Accuracy-Highest: 0.94550
Training: 2022-04-11 00:04:28,201-Speed 142.85 samples/sec   Loss 11.4881   LearningRate 0.0794   Epoch: 4   Global Step: 22010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:04:29,182-Speed 10449.33 samples/sec   Loss 11.3717   LearningRate 0.0794   Epoch: 4   Global Step: 22020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:04:30,202-Speed 10042.54 samples/sec   Loss 11.3734   LearningRate 0.0794   Epoch: 4   Global Step: 22030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:04:31,171-Speed 10578.96 samples/sec   Loss 11.4888   LearningRate 0.0794   Epoch: 4   Global Step: 22040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:32,146-Speed 10520.08 samples/sec   Loss 11.3968   LearningRate 0.0794   Epoch: 4   Global Step: 22050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:33,112-Speed 10604.43 samples/sec   Loss 11.3688   LearningRate 0.0794   Epoch: 4   Global Step: 22060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:34,089-Speed 10487.65 samples/sec   Loss 11.3448   LearningRate 0.0794   Epoch: 4   Global Step: 22070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:35,066-Speed 10496.97 samples/sec   Loss 11.3838   LearningRate 0.0794   Epoch: 4   Global Step: 22080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:36,052-Speed 10397.96 samples/sec   Loss 11.5585   LearningRate 0.0794   Epoch: 4   Global Step: 22090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:37,006-Speed 10738.85 samples/sec   Loss 11.3724   LearningRate 0.0793   Epoch: 4   Global Step: 22100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:37,973-Speed 10603.21 samples/sec   Loss 11.3734   LearningRate 0.0793   Epoch: 4   Global Step: 22110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:38,950-Speed 10483.30 samples/sec   Loss 11.3953   LearningRate 0.0793   Epoch: 4   Global Step: 22120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:39,944-Speed 10310.91 samples/sec   Loss 11.4221   LearningRate 0.0793   Epoch: 4   Global Step: 22130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:40,868-Speed 11106.44 samples/sec   Loss 11.4474   LearningRate 0.0793   Epoch: 4   Global Step: 22140   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:04:41,837-Speed 10578.38 samples/sec   Loss 11.2395   LearningRate 0.0793   Epoch: 4   Global Step: 22150   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:04:42,805-Speed 10590.64 samples/sec   Loss 11.3848   LearningRate 0.0793   Epoch: 4   Global Step: 22160   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:04:43,754-Speed 10806.00 samples/sec   Loss 11.3573   LearningRate 0.0793   Epoch: 4   Global Step: 22170   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:04:44,720-Speed 10614.59 samples/sec   Loss 11.4226   LearningRate 0.0793   Epoch: 4   Global Step: 22180   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:04:45,705-Speed 10403.00 samples/sec   Loss 11.3066   LearningRate 0.0793   Epoch: 4   Global Step: 22190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:46,667-Speed 10676.43 samples/sec   Loss 11.5306   LearningRate 0.0793   Epoch: 4   Global Step: 22200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:47,638-Speed 10552.73 samples/sec   Loss 11.4927   LearningRate 0.0792   Epoch: 4   Global Step: 22210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:48,596-Speed 10699.45 samples/sec   Loss 11.2175   LearningRate 0.0792   Epoch: 4   Global Step: 22220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:49,561-Speed 10618.44 samples/sec   Loss 11.5385   LearningRate 0.0792   Epoch: 4   Global Step: 22230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:50,530-Speed 10582.71 samples/sec   Loss 11.3419   LearningRate 0.0792   Epoch: 4   Global Step: 22240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:51,498-Speed 10577.11 samples/sec   Loss 11.2866   LearningRate 0.0792   Epoch: 4   Global Step: 22250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:52,489-Speed 10349.75 samples/sec   Loss 11.2450   LearningRate 0.0792   Epoch: 4   Global Step: 22260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:53,441-Speed 10772.84 samples/sec   Loss 11.4250   LearningRate 0.0792   Epoch: 4   Global Step: 22270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:54,413-Speed 10548.89 samples/sec   Loss 11.3064   LearningRate 0.0792   Epoch: 4   Global Step: 22280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:55,385-Speed 10540.79 samples/sec   Loss 11.3856   LearningRate 0.0792   Epoch: 4   Global Step: 22290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:56,362-Speed 10494.58 samples/sec   Loss 11.4782   LearningRate 0.0792   Epoch: 4   Global Step: 22300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:57,335-Speed 10529.41 samples/sec   Loss 11.3362   LearningRate 0.0792   Epoch: 4   Global Step: 22310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:58,296-Speed 10668.79 samples/sec   Loss 11.3594   LearningRate 0.0792   Epoch: 4   Global Step: 22320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:04:59,239-Speed 10873.61 samples/sec   Loss 11.4035   LearningRate 0.0791   Epoch: 4   Global Step: 22330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:00,204-Speed 10620.32 samples/sec   Loss 11.3973   LearningRate 0.0791   Epoch: 4   Global Step: 22340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:01,188-Speed 10407.69 samples/sec   Loss 11.2924   LearningRate 0.0791   Epoch: 4   Global Step: 22350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:02,170-Speed 10442.00 samples/sec   Loss 11.4035   LearningRate 0.0791   Epoch: 4   Global Step: 22360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:03,152-Speed 10437.14 samples/sec   Loss 11.3221   LearningRate 0.0791   Epoch: 4   Global Step: 22370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:04,117-Speed 10622.42 samples/sec   Loss 11.3424   LearningRate 0.0791   Epoch: 4   Global Step: 22380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:05,083-Speed 10612.97 samples/sec   Loss 11.5704   LearningRate 0.0791   Epoch: 4   Global Step: 22390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:06,032-Speed 10797.62 samples/sec   Loss 11.2844   LearningRate 0.0791   Epoch: 4   Global Step: 22400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:07,028-Speed 10298.12 samples/sec   Loss 11.4540   LearningRate 0.0791   Epoch: 4   Global Step: 22410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:07,997-Speed 10578.28 samples/sec   Loss 11.2967   LearningRate 0.0791   Epoch: 4   Global Step: 22420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:08,929-Speed 10991.84 samples/sec   Loss 11.3367   LearningRate 0.0791   Epoch: 4   Global Step: 22430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:09,978-Speed 9771.64 samples/sec   Loss 11.5415   LearningRate 0.0790   Epoch: 4   Global Step: 22440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:10,944-Speed 10616.34 samples/sec   Loss 11.2102   LearningRate 0.0790   Epoch: 4   Global Step: 22450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:11,913-Speed 10578.20 samples/sec   Loss 11.5093   LearningRate 0.0790   Epoch: 4   Global Step: 22460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:12,881-Speed 10585.05 samples/sec   Loss 11.4859   LearningRate 0.0790   Epoch: 4   Global Step: 22470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:13,883-Speed 10229.49 samples/sec   Loss 11.4037   LearningRate 0.0790   Epoch: 4   Global Step: 22480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:14,873-Speed 10356.74 samples/sec   Loss 11.4393   LearningRate 0.0790   Epoch: 4   Global Step: 22490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:15,808-Speed 11050.51 samples/sec   Loss 11.3743   LearningRate 0.0790   Epoch: 4   Global Step: 22500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:16,810-Speed 10234.72 samples/sec   Loss 11.4145   LearningRate 0.0790   Epoch: 4   Global Step: 22510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:17,782-Speed 10544.53 samples/sec   Loss 11.4193   LearningRate 0.0790   Epoch: 4   Global Step: 22520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:18,743-Speed 10660.70 samples/sec   Loss 11.3480   LearningRate 0.0790   Epoch: 4   Global Step: 22530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:19,671-Speed 11044.73 samples/sec   Loss 11.3500   LearningRate 0.0790   Epoch: 4   Global Step: 22540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:20,676-Speed 10194.70 samples/sec   Loss 11.3152   LearningRate 0.0790   Epoch: 4   Global Step: 22550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:21,629-Speed 10762.76 samples/sec   Loss 11.3893   LearningRate 0.0789   Epoch: 4   Global Step: 22560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:22,597-Speed 10580.05 samples/sec   Loss 11.3489   LearningRate 0.0789   Epoch: 4   Global Step: 22570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:23,554-Speed 10714.34 samples/sec   Loss 11.3933   LearningRate 0.0789   Epoch: 4   Global Step: 22580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:24,543-Speed 10361.66 samples/sec   Loss 11.2943   LearningRate 0.0789   Epoch: 4   Global Step: 22590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:25,509-Speed 10608.67 samples/sec   Loss 11.3234   LearningRate 0.0789   Epoch: 4   Global Step: 22600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:26,460-Speed 10781.13 samples/sec   Loss 11.3969   LearningRate 0.0789   Epoch: 4   Global Step: 22610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:27,411-Speed 10781.91 samples/sec   Loss 11.1847   LearningRate 0.0789   Epoch: 4   Global Step: 22620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:28,362-Speed 10778.86 samples/sec   Loss 11.5062   LearningRate 0.0789   Epoch: 4   Global Step: 22630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:29,332-Speed 10573.56 samples/sec   Loss 11.4914   LearningRate 0.0789   Epoch: 4   Global Step: 22640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:30,319-Speed 10383.19 samples/sec   Loss 11.4002   LearningRate 0.0789   Epoch: 4   Global Step: 22650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:31,351-Speed 9926.69 samples/sec   Loss 11.4204   LearningRate 0.0789   Epoch: 4   Global Step: 22660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:32,322-Speed 10563.43 samples/sec   Loss 11.4800   LearningRate 0.0788   Epoch: 4   Global Step: 22670   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:05:33,323-Speed 10239.71 samples/sec   Loss 11.4620   LearningRate 0.0788   Epoch: 4   Global Step: 22680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:34,297-Speed 10537.02 samples/sec   Loss 11.2518   LearningRate 0.0788   Epoch: 4   Global Step: 22690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:35,261-Speed 10626.62 samples/sec   Loss 11.2848   LearningRate 0.0788   Epoch: 4   Global Step: 22700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:36,248-Speed 10388.31 samples/sec   Loss 11.3928   LearningRate 0.0788   Epoch: 4   Global Step: 22710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:37,198-Speed 10789.88 samples/sec   Loss 11.3582   LearningRate 0.0788   Epoch: 4   Global Step: 22720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:38,161-Speed 10644.36 samples/sec   Loss 11.4526   LearningRate 0.0788   Epoch: 4   Global Step: 22730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:39,133-Speed 10542.43 samples/sec   Loss 11.4755   LearningRate 0.0788   Epoch: 4   Global Step: 22740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:40,134-Speed 10240.01 samples/sec   Loss 11.3489   LearningRate 0.0788   Epoch: 4   Global Step: 22750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:41,113-Speed 10470.66 samples/sec   Loss 11.3332   LearningRate 0.0788   Epoch: 4   Global Step: 22760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:42,049-Speed 10950.72 samples/sec   Loss 11.3378   LearningRate 0.0788   Epoch: 4   Global Step: 22770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:43,028-Speed 10464.33 samples/sec   Loss 11.4291   LearningRate 0.0787   Epoch: 4   Global Step: 22780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:43,989-Speed 10664.75 samples/sec   Loss 11.4477   LearningRate 0.0787   Epoch: 4   Global Step: 22790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:44,957-Speed 10597.63 samples/sec   Loss 11.3658   LearningRate 0.0787   Epoch: 4   Global Step: 22800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:45,889-Speed 10995.86 samples/sec   Loss 11.3606   LearningRate 0.0787   Epoch: 4   Global Step: 22810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:46,863-Speed 10524.29 samples/sec   Loss 11.2986   LearningRate 0.0787   Epoch: 4   Global Step: 22820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:47,837-Speed 10544.70 samples/sec   Loss 11.2442   LearningRate 0.0787   Epoch: 4   Global Step: 22830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:48,810-Speed 10540.94 samples/sec   Loss 11.2252   LearningRate 0.0787   Epoch: 4   Global Step: 22840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:49,787-Speed 10487.68 samples/sec   Loss 11.3174   LearningRate 0.0787   Epoch: 4   Global Step: 22850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:50,728-Speed 10892.32 samples/sec   Loss 11.3597   LearningRate 0.0787   Epoch: 4   Global Step: 22860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:51,712-Speed 10419.22 samples/sec   Loss 11.3784   LearningRate 0.0787   Epoch: 4   Global Step: 22870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:52,696-Speed 10413.73 samples/sec   Loss 11.5326   LearningRate 0.0787   Epoch: 4   Global Step: 22880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:53,636-Speed 10902.23 samples/sec   Loss 11.3161   LearningRate 0.0787   Epoch: 4   Global Step: 22890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:54,620-Speed 10425.02 samples/sec   Loss 11.3457   LearningRate 0.0786   Epoch: 4   Global Step: 22900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:55,596-Speed 10495.48 samples/sec   Loss 11.1924   LearningRate 0.0786   Epoch: 4   Global Step: 22910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:05:56,622-Speed 9993.52 samples/sec   Loss 11.2834   LearningRate 0.0786   Epoch: 4   Global Step: 22920   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:05:57,576-Speed 10752.14 samples/sec   Loss 11.2460   LearningRate 0.0786   Epoch: 4   Global Step: 22930   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:05:58,517-Speed 10890.92 samples/sec   Loss 11.3347   LearningRate 0.0786   Epoch: 4   Global Step: 22940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:05:59,512-Speed 10298.82 samples/sec   Loss 11.4049   LearningRate 0.0786   Epoch: 4   Global Step: 22950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:00,451-Speed 10905.60 samples/sec   Loss 11.4741   LearningRate 0.0786   Epoch: 4   Global Step: 22960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:01,436-Speed 10409.31 samples/sec   Loss 11.3342   LearningRate 0.0786   Epoch: 4   Global Step: 22970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:02,425-Speed 10367.01 samples/sec   Loss 11.3791   LearningRate 0.0786   Epoch: 4   Global Step: 22980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:03,362-Speed 10934.78 samples/sec   Loss 11.1755   LearningRate 0.0786   Epoch: 4   Global Step: 22990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:04,306-Speed 10858.02 samples/sec   Loss 11.3780   LearningRate 0.0786   Epoch: 4   Global Step: 23000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:05,257-Speed 10777.93 samples/sec   Loss 11.2940   LearningRate 0.0785   Epoch: 4   Global Step: 23010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:06,229-Speed 10547.22 samples/sec   Loss 11.2967   LearningRate 0.0785   Epoch: 4   Global Step: 23020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:07,230-Speed 10241.79 samples/sec   Loss 11.3110   LearningRate 0.0785   Epoch: 4   Global Step: 23030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:08,218-Speed 10378.03 samples/sec   Loss 11.3234   LearningRate 0.0785   Epoch: 4   Global Step: 23040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:09,180-Speed 10662.69 samples/sec   Loss 11.4505   LearningRate 0.0785   Epoch: 4   Global Step: 23050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:10,142-Speed 10649.34 samples/sec   Loss 11.3297   LearningRate 0.0785   Epoch: 4   Global Step: 23060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:11,065-Speed 11099.99 samples/sec   Loss 11.2641   LearningRate 0.0785   Epoch: 4   Global Step: 23070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:12,043-Speed 10478.63 samples/sec   Loss 11.3565   LearningRate 0.0785   Epoch: 4   Global Step: 23080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:13,017-Speed 10536.64 samples/sec   Loss 11.1504   LearningRate 0.0785   Epoch: 4   Global Step: 23090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:13,939-Speed 11116.27 samples/sec   Loss 11.2694   LearningRate 0.0785   Epoch: 4   Global Step: 23100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:14,909-Speed 10564.18 samples/sec   Loss 11.4477   LearningRate 0.0785   Epoch: 4   Global Step: 23110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:15,868-Speed 10694.76 samples/sec   Loss 11.1955   LearningRate 0.0785   Epoch: 4   Global Step: 23120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:16,881-Speed 10118.58 samples/sec   Loss 11.4594   LearningRate 0.0784   Epoch: 4   Global Step: 23130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:17,895-Speed 10107.83 samples/sec   Loss 11.3242   LearningRate 0.0784   Epoch: 4   Global Step: 23140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:18,863-Speed 10587.06 samples/sec   Loss 11.1958   LearningRate 0.0784   Epoch: 4   Global Step: 23150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:19,799-Speed 10945.92 samples/sec   Loss 11.2931   LearningRate 0.0784   Epoch: 4   Global Step: 23160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:20,761-Speed 10661.75 samples/sec   Loss 11.4230   LearningRate 0.0784   Epoch: 4   Global Step: 23170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:21,760-Speed 10263.30 samples/sec   Loss 11.3955   LearningRate 0.0784   Epoch: 4   Global Step: 23180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:22,712-Speed 10787.21 samples/sec   Loss 11.2205   LearningRate 0.0784   Epoch: 4   Global Step: 23190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:23,702-Speed 10355.73 samples/sec   Loss 11.4155   LearningRate 0.0784   Epoch: 4   Global Step: 23200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:24,656-Speed 10742.28 samples/sec   Loss 11.3367   LearningRate 0.0784   Epoch: 4   Global Step: 23210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:25,633-Speed 10490.34 samples/sec   Loss 11.2657   LearningRate 0.0784   Epoch: 4   Global Step: 23220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:26,577-Speed 10862.69 samples/sec   Loss 11.2233   LearningRate 0.0784   Epoch: 4   Global Step: 23230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:27,542-Speed 10618.90 samples/sec   Loss 11.2233   LearningRate 0.0783   Epoch: 4   Global Step: 23240   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:06:28,515-Speed 10538.45 samples/sec   Loss 11.2010   LearningRate 0.0783   Epoch: 4   Global Step: 23250   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:06:29,476-Speed 10659.02 samples/sec   Loss 11.1778   LearningRate 0.0783   Epoch: 4   Global Step: 23260   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:06:30,480-Speed 10209.90 samples/sec   Loss 11.4335   LearningRate 0.0783   Epoch: 4   Global Step: 23270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:31,430-Speed 10791.64 samples/sec   Loss 11.2318   LearningRate 0.0783   Epoch: 4   Global Step: 23280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:32,411-Speed 10444.48 samples/sec   Loss 11.3700   LearningRate 0.0783   Epoch: 4   Global Step: 23290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:33,375-Speed 10638.07 samples/sec   Loss 11.4543   LearningRate 0.0783   Epoch: 4   Global Step: 23300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:34,427-Speed 9737.80 samples/sec   Loss 11.2775   LearningRate 0.0783   Epoch: 4   Global Step: 23310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:35,408-Speed 10445.29 samples/sec   Loss 11.3406   LearningRate 0.0783   Epoch: 4   Global Step: 23320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:36,357-Speed 10803.30 samples/sec   Loss 11.1747   LearningRate 0.0783   Epoch: 4   Global Step: 23330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:37,341-Speed 10411.46 samples/sec   Loss 11.3196   LearningRate 0.0783   Epoch: 4   Global Step: 23340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:38,353-Speed 10131.59 samples/sec   Loss 11.3118   LearningRate 0.0782   Epoch: 4   Global Step: 23350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:39,320-Speed 10622.02 samples/sec   Loss 11.4730   LearningRate 0.0782   Epoch: 4   Global Step: 23360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:40,300-Speed 10454.03 samples/sec   Loss 11.4474   LearningRate 0.0782   Epoch: 4   Global Step: 23370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:41,294-Speed 10315.37 samples/sec   Loss 11.2810   LearningRate 0.0782   Epoch: 4   Global Step: 23380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:42,251-Speed 10708.65 samples/sec   Loss 11.3548   LearningRate 0.0782   Epoch: 4   Global Step: 23390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:43,208-Speed 10709.84 samples/sec   Loss 11.2534   LearningRate 0.0782   Epoch: 4   Global Step: 23400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:44,121-Speed 11232.50 samples/sec   Loss 11.2583   LearningRate 0.0782   Epoch: 4   Global Step: 23410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:45,102-Speed 10451.49 samples/sec   Loss 11.3977   LearningRate 0.0782   Epoch: 4   Global Step: 23420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:46,100-Speed 10273.05 samples/sec   Loss 11.1899   LearningRate 0.0782   Epoch: 4   Global Step: 23430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:47,033-Speed 10986.84 samples/sec   Loss 11.4055   LearningRate 0.0782   Epoch: 4   Global Step: 23440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:47,986-Speed 10750.57 samples/sec   Loss 11.3725   LearningRate 0.0782   Epoch: 4   Global Step: 23450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:48,923-Speed 10944.47 samples/sec   Loss 11.3598   LearningRate 0.0782   Epoch: 4   Global Step: 23460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:49,921-Speed 10272.59 samples/sec   Loss 11.4232   LearningRate 0.0781   Epoch: 4   Global Step: 23470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:50,864-Speed 10867.81 samples/sec   Loss 11.2670   LearningRate 0.0781   Epoch: 4   Global Step: 23480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:51,873-Speed 10159.82 samples/sec   Loss 11.5028   LearningRate 0.0781   Epoch: 4   Global Step: 23490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:52,817-Speed 10865.62 samples/sec   Loss 11.2716   LearningRate 0.0781   Epoch: 4   Global Step: 23500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:53,762-Speed 10859.66 samples/sec   Loss 11.2546   LearningRate 0.0781   Epoch: 4   Global Step: 23510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:06:54,753-Speed 10347.33 samples/sec   Loss 11.2301   LearningRate 0.0781   Epoch: 4   Global Step: 23520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:55,684-Speed 11010.38 samples/sec   Loss 11.3282   LearningRate 0.0781   Epoch: 4   Global Step: 23530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:56,731-Speed 9786.48 samples/sec   Loss 11.4093   LearningRate 0.0781   Epoch: 4   Global Step: 23540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:57,696-Speed 10626.81 samples/sec   Loss 11.4812   LearningRate 0.0781   Epoch: 4   Global Step: 23550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:58,687-Speed 10340.69 samples/sec   Loss 11.3291   LearningRate 0.0781   Epoch: 4   Global Step: 23560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:06:59,697-Speed 10144.02 samples/sec   Loss 10.9805   LearningRate 0.0781   Epoch: 4   Global Step: 23570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:00,694-Speed 10289.19 samples/sec   Loss 11.1559   LearningRate 0.0780   Epoch: 4   Global Step: 23580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:01,700-Speed 10193.60 samples/sec   Loss 11.2393   LearningRate 0.0780   Epoch: 4   Global Step: 23590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:02,666-Speed 10607.87 samples/sec   Loss 11.2680   LearningRate 0.0780   Epoch: 4   Global Step: 23600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:03,722-Speed 9705.43 samples/sec   Loss 11.4823   LearningRate 0.0780   Epoch: 4   Global Step: 23610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:04,674-Speed 10764.81 samples/sec   Loss 11.3617   LearningRate 0.0780   Epoch: 4   Global Step: 23620   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:07:05,630-Speed 10719.84 samples/sec   Loss 11.4826   LearningRate 0.0780   Epoch: 4   Global Step: 23630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:06,631-Speed 10241.64 samples/sec   Loss 11.3234   LearningRate 0.0780   Epoch: 4   Global Step: 23640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:07,604-Speed 10528.73 samples/sec   Loss 11.1756   LearningRate 0.0780   Epoch: 4   Global Step: 23650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:08,579-Speed 10515.62 samples/sec   Loss 11.2457   LearningRate 0.0780   Epoch: 4   Global Step: 23660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:09,527-Speed 10820.36 samples/sec   Loss 11.1999   LearningRate 0.0780   Epoch: 4   Global Step: 23670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:07:10,508-Speed 10582.06 samples/sec   Loss 11.2521   LearningRate 0.0780   Epoch: 4   Global Step: 23680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:07:11,462-Speed 10739.06 samples/sec   Loss 11.3338   LearningRate 0.0780   Epoch: 4   Global Step: 23690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:07:12,450-Speed 10389.70 samples/sec   Loss 11.3672   LearningRate 0.0779   Epoch: 4   Global Step: 23700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:07:13,420-Speed 10556.04 samples/sec   Loss 11.4033   LearningRate 0.0779   Epoch: 4   Global Step: 23710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:07:14,411-Speed 10342.15 samples/sec   Loss 11.1318   LearningRate 0.0779   Epoch: 4   Global Step: 23720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:07:15,364-Speed 10763.05 samples/sec   Loss 11.3466   LearningRate 0.0779   Epoch: 4   Global Step: 23730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:07:16,338-Speed 10516.07 samples/sec   Loss 11.1426   LearningRate 0.0779   Epoch: 4   Global Step: 23740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:07:17,327-Speed 10370.23 samples/sec   Loss 11.1484   LearningRate 0.0779   Epoch: 4   Global Step: 23750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:07:18,289-Speed 10654.73 samples/sec   Loss 11.0343   LearningRate 0.0779   Epoch: 4   Global Step: 23760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:07:19,251-Speed 10682.31 samples/sec   Loss 11.2948   LearningRate 0.0779   Epoch: 4   Global Step: 23770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:20,269-Speed 10080.66 samples/sec   Loss 11.3099   LearningRate 0.0779   Epoch: 4   Global Step: 23780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:21,241-Speed 10550.47 samples/sec   Loss 11.1950   LearningRate 0.0779   Epoch: 4   Global Step: 23790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:22,207-Speed 10613.10 samples/sec   Loss 11.0554   LearningRate 0.0779   Epoch: 4   Global Step: 23800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:23,183-Speed 10497.72 samples/sec   Loss 11.2827   LearningRate 0.0778   Epoch: 4   Global Step: 23810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:24,162-Speed 10473.95 samples/sec   Loss 11.3545   LearningRate 0.0778   Epoch: 4   Global Step: 23820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:25,100-Speed 10926.79 samples/sec   Loss 11.4583   LearningRate 0.0778   Epoch: 4   Global Step: 23830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:26,103-Speed 10220.57 samples/sec   Loss 11.3858   LearningRate 0.0778   Epoch: 4   Global Step: 23840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:27,034-Speed 11016.35 samples/sec   Loss 11.1659   LearningRate 0.0778   Epoch: 4   Global Step: 23850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:07:28,002-Speed 10588.49 samples/sec   Loss 11.4135   LearningRate 0.0778   Epoch: 4   Global Step: 23860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:07:29,004-Speed 10230.47 samples/sec   Loss 11.1445   LearningRate 0.0778   Epoch: 4   Global Step: 23870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:07:29,991-Speed 10383.06 samples/sec   Loss 11.1869   LearningRate 0.0778   Epoch: 4   Global Step: 23880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:07:30,972-Speed 10449.76 samples/sec   Loss 11.3218   LearningRate 0.0778   Epoch: 4   Global Step: 23890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:07:31,905-Speed 10985.84 samples/sec   Loss 11.2422   LearningRate 0.0778   Epoch: 4   Global Step: 23900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:07:32,852-Speed 10821.88 samples/sec   Loss 11.3470   LearningRate 0.0778   Epoch: 4   Global Step: 23910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:07:33,901-Speed 9769.97 samples/sec   Loss 11.2774   LearningRate 0.0778   Epoch: 4   Global Step: 23920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:07:34,850-Speed 10801.28 samples/sec   Loss 11.4353   LearningRate 0.0777   Epoch: 4   Global Step: 23930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:07:35,820-Speed 10570.47 samples/sec   Loss 11.2197   LearningRate 0.0777   Epoch: 4   Global Step: 23940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:07:36,792-Speed 10544.67 samples/sec   Loss 11.1538   LearningRate 0.0777   Epoch: 4   Global Step: 23950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:37,790-Speed 10270.26 samples/sec   Loss 11.2317   LearningRate 0.0777   Epoch: 4   Global Step: 23960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:38,764-Speed 10520.91 samples/sec   Loss 11.2939   LearningRate 0.0777   Epoch: 4   Global Step: 23970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:39,741-Speed 10491.99 samples/sec   Loss 11.2472   LearningRate 0.0777   Epoch: 4   Global Step: 23980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:40,698-Speed 10715.40 samples/sec   Loss 11.1964   LearningRate 0.0777   Epoch: 4   Global Step: 23990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:07:41,731-Speed 9926.81 samples/sec   Loss 11.1775   LearningRate 0.0777   Epoch: 4   Global Step: 24000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:08:05,227-[lfw][24000]XNorm: 14.726170
Training: 2022-04-11 00:08:05,227-[lfw][24000]Accuracy-Flip: 0.99400+-0.00351
Training: 2022-04-11 00:08:05,228-[lfw][24000]Accuracy-Highest: 0.99400
Training: 2022-04-11 00:08:31,030-[cfp_fp][24000]XNorm: 12.455493
Training: 2022-04-11 00:08:31,030-[cfp_fp][24000]Accuracy-Flip: 0.94214+-0.01356
Training: 2022-04-11 00:08:31,031-[cfp_fp][24000]Accuracy-Highest: 0.94214
Training: 2022-04-11 00:08:53,076-[agedb_30][24000]XNorm: 14.520351
Training: 2022-04-11 00:08:53,077-[agedb_30][24000]Accuracy-Flip: 0.94683+-0.01018
Training: 2022-04-11 00:08:53,077-[agedb_30][24000]Accuracy-Highest: 0.94683
Training: 2022-04-11 00:08:54,041-Speed 141.61 samples/sec   Loss 11.2575   LearningRate 0.0777   Epoch: 4   Global Step: 24010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:08:55,047-Speed 10186.42 samples/sec   Loss 11.3053   LearningRate 0.0777   Epoch: 4   Global Step: 24020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:08:56,019-Speed 10551.32 samples/sec   Loss 11.1402   LearningRate 0.0777   Epoch: 4   Global Step: 24030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:08:56,981-Speed 10655.24 samples/sec   Loss 11.1941   LearningRate 0.0776   Epoch: 4   Global Step: 24040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:08:57,939-Speed 10695.12 samples/sec   Loss 11.3976   LearningRate 0.0776   Epoch: 4   Global Step: 24050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:08:58,917-Speed 10476.53 samples/sec   Loss 11.0577   LearningRate 0.0776   Epoch: 4   Global Step: 24060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:08:59,905-Speed 10379.22 samples/sec   Loss 11.1785   LearningRate 0.0776   Epoch: 4   Global Step: 24070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:00,892-Speed 10382.75 samples/sec   Loss 11.3576   LearningRate 0.0776   Epoch: 4   Global Step: 24080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:01,843-Speed 10782.80 samples/sec   Loss 11.1952   LearningRate 0.0776   Epoch: 4   Global Step: 24090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:02,794-Speed 10772.73 samples/sec   Loss 11.2591   LearningRate 0.0776   Epoch: 4   Global Step: 24100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:03,779-Speed 10409.54 samples/sec   Loss 11.2626   LearningRate 0.0776   Epoch: 4   Global Step: 24110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:04,727-Speed 10814.24 samples/sec   Loss 11.3046   LearningRate 0.0776   Epoch: 4   Global Step: 24120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:05,717-Speed 10354.02 samples/sec   Loss 11.1153   LearningRate 0.0776   Epoch: 4   Global Step: 24130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:06,658-Speed 10886.93 samples/sec   Loss 11.0698   LearningRate 0.0776   Epoch: 4   Global Step: 24140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:07,664-Speed 10185.43 samples/sec   Loss 11.1365   LearningRate 0.0776   Epoch: 4   Global Step: 24150   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:09:08,619-Speed 10736.25 samples/sec   Loss 11.2185   LearningRate 0.0775   Epoch: 4   Global Step: 24160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:09,590-Speed 10556.22 samples/sec   Loss 11.3744   LearningRate 0.0775   Epoch: 4   Global Step: 24170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:10,578-Speed 10372.80 samples/sec   Loss 11.1105   LearningRate 0.0775   Epoch: 4   Global Step: 24180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:11,552-Speed 10527.50 samples/sec   Loss 11.2202   LearningRate 0.0775   Epoch: 4   Global Step: 24190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:09:12,525-Speed 10537.53 samples/sec   Loss 11.0789   LearningRate 0.0775   Epoch: 4   Global Step: 24200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:09:13,465-Speed 10897.50 samples/sec   Loss 11.3177   LearningRate 0.0775   Epoch: 4   Global Step: 24210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:09:14,431-Speed 10612.87 samples/sec   Loss 11.3737   LearningRate 0.0775   Epoch: 4   Global Step: 24220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:09:15,362-Speed 11009.11 samples/sec   Loss 11.1626   LearningRate 0.0775   Epoch: 4   Global Step: 24230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:09:16,275-Speed 11228.57 samples/sec   Loss 11.1486   LearningRate 0.0775   Epoch: 4   Global Step: 24240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:09:17,278-Speed 10212.96 samples/sec   Loss 11.2839   LearningRate 0.0775   Epoch: 4   Global Step: 24250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:09:18,261-Speed 10430.91 samples/sec   Loss 11.3382   LearningRate 0.0775   Epoch: 4   Global Step: 24260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:09:19,219-Speed 10700.50 samples/sec   Loss 11.2428   LearningRate 0.0774   Epoch: 4   Global Step: 24270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:09:20,165-Speed 10827.65 samples/sec   Loss 11.1856   LearningRate 0.0774   Epoch: 4   Global Step: 24280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:09:21,119-Speed 10742.97 samples/sec   Loss 11.3076   LearningRate 0.0774   Epoch: 4   Global Step: 24290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:22,065-Speed 10840.68 samples/sec   Loss 11.2498   LearningRate 0.0774   Epoch: 4   Global Step: 24300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:23,024-Speed 10685.15 samples/sec   Loss 11.3953   LearningRate 0.0774   Epoch: 4   Global Step: 24310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:23,994-Speed 10567.28 samples/sec   Loss 11.1090   LearningRate 0.0774   Epoch: 4   Global Step: 24320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:24,985-Speed 10334.90 samples/sec   Loss 11.5553   LearningRate 0.0774   Epoch: 4   Global Step: 24330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:25,955-Speed 10572.47 samples/sec   Loss 11.3242   LearningRate 0.0774   Epoch: 4   Global Step: 24340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:26,959-Speed 10204.02 samples/sec   Loss 11.3334   LearningRate 0.0774   Epoch: 4   Global Step: 24350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:27,924-Speed 10631.82 samples/sec   Loss 11.3998   LearningRate 0.0774   Epoch: 4   Global Step: 24360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:28,898-Speed 10518.77 samples/sec   Loss 11.1585   LearningRate 0.0774   Epoch: 4   Global Step: 24370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:29,868-Speed 10565.92 samples/sec   Loss 11.3059   LearningRate 0.0774   Epoch: 4   Global Step: 24380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:30,854-Speed 10398.80 samples/sec   Loss 11.2332   LearningRate 0.0773   Epoch: 4   Global Step: 24390   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:09:31,813-Speed 10686.64 samples/sec   Loss 11.1544   LearningRate 0.0773   Epoch: 4   Global Step: 24400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:32,733-Speed 11141.53 samples/sec   Loss 11.2574   LearningRate 0.0773   Epoch: 4   Global Step: 24410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:33,706-Speed 10534.12 samples/sec   Loss 11.1228   LearningRate 0.0773   Epoch: 4   Global Step: 24420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:34,681-Speed 10520.76 samples/sec   Loss 11.3056   LearningRate 0.0773   Epoch: 4   Global Step: 24430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:35,605-Speed 11093.88 samples/sec   Loss 11.5075   LearningRate 0.0773   Epoch: 4   Global Step: 24440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:36,577-Speed 10538.03 samples/sec   Loss 11.3425   LearningRate 0.0773   Epoch: 4   Global Step: 24450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:37,555-Speed 10479.19 samples/sec   Loss 11.1332   LearningRate 0.0773   Epoch: 4   Global Step: 24460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:38,558-Speed 10221.71 samples/sec   Loss 11.2848   LearningRate 0.0773   Epoch: 4   Global Step: 24470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:39,564-Speed 10185.44 samples/sec   Loss 11.2487   LearningRate 0.0773   Epoch: 4   Global Step: 24480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:40,507-Speed 10866.87 samples/sec   Loss 11.0710   LearningRate 0.0773   Epoch: 4   Global Step: 24490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:41,494-Speed 10387.18 samples/sec   Loss 11.2805   LearningRate 0.0772   Epoch: 4   Global Step: 24500   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:09:42,477-Speed 10424.23 samples/sec   Loss 11.1725   LearningRate 0.0772   Epoch: 4   Global Step: 24510   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:09:43,466-Speed 10370.35 samples/sec   Loss 11.1132   LearningRate 0.0772   Epoch: 4   Global Step: 24520   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:09:44,397-Speed 11004.87 samples/sec   Loss 11.1516   LearningRate 0.0772   Epoch: 4   Global Step: 24530   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:09:45,351-Speed 10737.33 samples/sec   Loss 11.2141   LearningRate 0.0772   Epoch: 4   Global Step: 24540   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:09:46,282-Speed 11012.08 samples/sec   Loss 11.2413   LearningRate 0.0772   Epoch: 4   Global Step: 24550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:47,291-Speed 10168.63 samples/sec   Loss 11.1152   LearningRate 0.0772   Epoch: 4   Global Step: 24560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:48,246-Speed 10728.85 samples/sec   Loss 11.1957   LearningRate 0.0772   Epoch: 4   Global Step: 24570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:49,223-Speed 10493.14 samples/sec   Loss 11.2015   LearningRate 0.0772   Epoch: 4   Global Step: 24580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:50,188-Speed 10612.63 samples/sec   Loss 11.3052   LearningRate 0.0772   Epoch: 4   Global Step: 24590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:51,157-Speed 10579.59 samples/sec   Loss 11.0978   LearningRate 0.0772   Epoch: 4   Global Step: 24600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:52,145-Speed 10372.49 samples/sec   Loss 11.2931   LearningRate 0.0772   Epoch: 4   Global Step: 24610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:53,100-Speed 10741.26 samples/sec   Loss 11.1490   LearningRate 0.0771   Epoch: 4   Global Step: 24620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:54,047-Speed 10822.16 samples/sec   Loss 11.1380   LearningRate 0.0771   Epoch: 4   Global Step: 24630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:55,008-Speed 10655.59 samples/sec   Loss 11.3812   LearningRate 0.0771   Epoch: 4   Global Step: 24640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:09:55,960-Speed 10768.02 samples/sec   Loss 11.1142   LearningRate 0.0771   Epoch: 4   Global Step: 24650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:09:56,954-Speed 10311.70 samples/sec   Loss 11.2358   LearningRate 0.0771   Epoch: 4   Global Step: 24660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:09:57,935-Speed 10450.65 samples/sec   Loss 11.2087   LearningRate 0.0771   Epoch: 4   Global Step: 24670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:09:58,906-Speed 10563.91 samples/sec   Loss 11.1419   LearningRate 0.0771   Epoch: 4   Global Step: 24680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:09:59,894-Speed 10373.99 samples/sec   Loss 11.4278   LearningRate 0.0771   Epoch: 4   Global Step: 24690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:10:00,870-Speed 10507.06 samples/sec   Loss 11.1694   LearningRate 0.0771   Epoch: 4   Global Step: 24700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:10:01,840-Speed 10562.16 samples/sec   Loss 11.1591   LearningRate 0.0771   Epoch: 4   Global Step: 24710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:10:02,818-Speed 10482.29 samples/sec   Loss 11.0244   LearningRate 0.0771   Epoch: 4   Global Step: 24720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:10:03,802-Speed 10415.52 samples/sec   Loss 11.2769   LearningRate 0.0770   Epoch: 4   Global Step: 24730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:10:04,764-Speed 10655.64 samples/sec   Loss 11.1500   LearningRate 0.0770   Epoch: 4   Global Step: 24740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:10:05,713-Speed 10802.70 samples/sec   Loss 11.1343   LearningRate 0.0770   Epoch: 4   Global Step: 24750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:06,696-Speed 10434.73 samples/sec   Loss 11.2028   LearningRate 0.0770   Epoch: 4   Global Step: 24760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:07,663-Speed 10605.38 samples/sec   Loss 11.1629   LearningRate 0.0770   Epoch: 4   Global Step: 24770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:08,657-Speed 10303.03 samples/sec   Loss 11.2808   LearningRate 0.0770   Epoch: 4   Global Step: 24780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:09,648-Speed 10346.66 samples/sec   Loss 11.3296   LearningRate 0.0770   Epoch: 4   Global Step: 24790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:10,604-Speed 10715.90 samples/sec   Loss 11.1208   LearningRate 0.0770   Epoch: 4   Global Step: 24800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:11,561-Speed 10713.95 samples/sec   Loss 11.0489   LearningRate 0.0770   Epoch: 4   Global Step: 24810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:12,516-Speed 10733.55 samples/sec   Loss 11.1411   LearningRate 0.0770   Epoch: 4   Global Step: 24820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:13,475-Speed 10689.33 samples/sec   Loss 11.2243   LearningRate 0.0770   Epoch: 4   Global Step: 24830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:14,477-Speed 10226.39 samples/sec   Loss 11.2792   LearningRate 0.0770   Epoch: 4   Global Step: 24840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:15,432-Speed 10735.07 samples/sec   Loss 11.3026   LearningRate 0.0769   Epoch: 4   Global Step: 24850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:16,382-Speed 10790.73 samples/sec   Loss 11.1185   LearningRate 0.0769   Epoch: 4   Global Step: 24860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:17,357-Speed 10510.51 samples/sec   Loss 11.2961   LearningRate 0.0769   Epoch: 4   Global Step: 24870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:18,334-Speed 10498.55 samples/sec   Loss 11.1960   LearningRate 0.0769   Epoch: 4   Global Step: 24880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:19,365-Speed 9944.80 samples/sec   Loss 11.2724   LearningRate 0.0769   Epoch: 4   Global Step: 24890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:20,315-Speed 10788.78 samples/sec   Loss 11.2866   LearningRate 0.0769   Epoch: 4   Global Step: 24900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:21,298-Speed 10430.95 samples/sec   Loss 11.1407   LearningRate 0.0769   Epoch: 4   Global Step: 24910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:22,247-Speed 10793.67 samples/sec   Loss 11.1656   LearningRate 0.0769   Epoch: 4   Global Step: 24920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:23,213-Speed 10608.03 samples/sec   Loss 11.1854   LearningRate 0.0769   Epoch: 4   Global Step: 24930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:24,195-Speed 10447.05 samples/sec   Loss 11.0162   LearningRate 0.0769   Epoch: 4   Global Step: 24940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:25,179-Speed 10414.85 samples/sec   Loss 10.9691   LearningRate 0.0769   Epoch: 4   Global Step: 24950   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:10:26,126-Speed 10815.50 samples/sec   Loss 11.2975   LearningRate 0.0768   Epoch: 4   Global Step: 24960   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:10:27,097-Speed 10552.42 samples/sec   Loss 11.1433   LearningRate 0.0768   Epoch: 4   Global Step: 24970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:28,046-Speed 10804.47 samples/sec   Loss 11.1101   LearningRate 0.0768   Epoch: 4   Global Step: 24980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:29,014-Speed 10588.50 samples/sec   Loss 11.1923   LearningRate 0.0768   Epoch: 4   Global Step: 24990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:10:29,994-Speed 10460.99 samples/sec   Loss 11.1883   LearningRate 0.0768   Epoch: 4   Global Step: 25000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:10:30,983-Speed 10359.17 samples/sec   Loss 11.3255   LearningRate 0.0768   Epoch: 4   Global Step: 25010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:10:31,998-Speed 10100.16 samples/sec   Loss 11.1407   LearningRate 0.0768   Epoch: 4   Global Step: 25020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:10:32,966-Speed 10587.54 samples/sec   Loss 10.9720   LearningRate 0.0768   Epoch: 4   Global Step: 25030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:10:33,909-Speed 10872.02 samples/sec   Loss 11.0874   LearningRate 0.0768   Epoch: 4   Global Step: 25040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:10:34,895-Speed 10452.09 samples/sec   Loss 11.2043   LearningRate 0.0768   Epoch: 4   Global Step: 25050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:10:35,861-Speed 10605.72 samples/sec   Loss 11.1979   LearningRate 0.0768   Epoch: 4   Global Step: 25060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:10:36,866-Speed 10195.54 samples/sec   Loss 11.2127   LearningRate 0.0768   Epoch: 4   Global Step: 25070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:10:37,832-Speed 10616.89 samples/sec   Loss 11.2975   LearningRate 0.0767   Epoch: 4   Global Step: 25080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:10:38,788-Speed 10719.02 samples/sec   Loss 11.0550   LearningRate 0.0767   Epoch: 4   Global Step: 25090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:39,743-Speed 10730.98 samples/sec   Loss 11.2427   LearningRate 0.0767   Epoch: 4   Global Step: 25100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:40,703-Speed 10682.56 samples/sec   Loss 11.1988   LearningRate 0.0767   Epoch: 4   Global Step: 25110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:41,655-Speed 10768.60 samples/sec   Loss 11.1745   LearningRate 0.0767   Epoch: 4   Global Step: 25120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:42,571-Speed 11191.68 samples/sec   Loss 11.1874   LearningRate 0.0767   Epoch: 4   Global Step: 25130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:43,550-Speed 10466.30 samples/sec   Loss 11.2132   LearningRate 0.0767   Epoch: 4   Global Step: 25140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:44,512-Speed 10661.92 samples/sec   Loss 10.9625   LearningRate 0.0767   Epoch: 4   Global Step: 25150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:45,470-Speed 10689.46 samples/sec   Loss 11.2599   LearningRate 0.0767   Epoch: 4   Global Step: 25160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:46,439-Speed 10577.36 samples/sec   Loss 11.3316   LearningRate 0.0767   Epoch: 4   Global Step: 25170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:47,421-Speed 10446.66 samples/sec   Loss 11.2817   LearningRate 0.0767   Epoch: 4   Global Step: 25180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:48,417-Speed 10290.97 samples/sec   Loss 11.0699   LearningRate 0.0766   Epoch: 4   Global Step: 25190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:49,391-Speed 10526.58 samples/sec   Loss 11.1501   LearningRate 0.0766   Epoch: 4   Global Step: 25200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:50,383-Speed 10331.22 samples/sec   Loss 11.0123   LearningRate 0.0766   Epoch: 4   Global Step: 25210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:51,348-Speed 10615.23 samples/sec   Loss 11.1510   LearningRate 0.0766   Epoch: 4   Global Step: 25220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:52,309-Speed 10663.98 samples/sec   Loss 11.2145   LearningRate 0.0766   Epoch: 4   Global Step: 25230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:53,275-Speed 10627.13 samples/sec   Loss 11.1307   LearningRate 0.0766   Epoch: 4   Global Step: 25240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:54,239-Speed 10631.89 samples/sec   Loss 11.1130   LearningRate 0.0766   Epoch: 4   Global Step: 25250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:55,199-Speed 10669.65 samples/sec   Loss 11.2417   LearningRate 0.0766   Epoch: 4   Global Step: 25260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:56,127-Speed 11054.20 samples/sec   Loss 11.0586   LearningRate 0.0766   Epoch: 4   Global Step: 25270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:57,173-Speed 9791.72 samples/sec   Loss 11.3649   LearningRate 0.0766   Epoch: 4   Global Step: 25280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:10:58,186-Speed 10124.42 samples/sec   Loss 11.3153   LearningRate 0.0766   Epoch: 4   Global Step: 25290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:09,135-Speed 935.34 samples/sec   Loss 10.2221   LearningRate 0.0766   Epoch: 5   Global Step: 25300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:10,264-Speed 9081.22 samples/sec   Loss 10.2529   LearningRate 0.0765   Epoch: 5   Global Step: 25310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:11,270-Speed 10185.51 samples/sec   Loss 10.3386   LearningRate 0.0765   Epoch: 5   Global Step: 25320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:12,382-Speed 9219.35 samples/sec   Loss 10.3381   LearningRate 0.0765   Epoch: 5   Global Step: 25330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:13,382-Speed 10248.61 samples/sec   Loss 10.1877   LearningRate 0.0765   Epoch: 5   Global Step: 25340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:14,393-Speed 10136.93 samples/sec   Loss 10.2648   LearningRate 0.0765   Epoch: 5   Global Step: 25350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:15,375-Speed 10431.43 samples/sec   Loss 10.3050   LearningRate 0.0765   Epoch: 5   Global Step: 25360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:16,489-Speed 9209.00 samples/sec   Loss 10.2414   LearningRate 0.0765   Epoch: 5   Global Step: 25370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:17,449-Speed 10674.71 samples/sec   Loss 10.4949   LearningRate 0.0765   Epoch: 5   Global Step: 25380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:18,424-Speed 10516.69 samples/sec   Loss 10.2504   LearningRate 0.0765   Epoch: 5   Global Step: 25390   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:11:19,803-Speed 7428.28 samples/sec   Loss 10.3432   LearningRate 0.0765   Epoch: 5   Global Step: 25400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:20,784-Speed 10448.24 samples/sec   Loss 10.3680   LearningRate 0.0765   Epoch: 5   Global Step: 25410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:21,808-Speed 10008.16 samples/sec   Loss 10.2642   LearningRate 0.0765   Epoch: 5   Global Step: 25420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:22,847-Speed 9868.72 samples/sec   Loss 10.1524   LearningRate 0.0764   Epoch: 5   Global Step: 25430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:23,764-Speed 11167.82 samples/sec   Loss 10.2750   LearningRate 0.0764   Epoch: 5   Global Step: 25440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:24,740-Speed 10504.39 samples/sec   Loss 10.2985   LearningRate 0.0764   Epoch: 5   Global Step: 25450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:25,685-Speed 10845.49 samples/sec   Loss 10.4994   LearningRate 0.0764   Epoch: 5   Global Step: 25460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:26,686-Speed 10244.96 samples/sec   Loss 10.3718   LearningRate 0.0764   Epoch: 5   Global Step: 25470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:27,651-Speed 10618.34 samples/sec   Loss 10.3345   LearningRate 0.0764   Epoch: 5   Global Step: 25480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:28,654-Speed 10221.89 samples/sec   Loss 10.4589   LearningRate 0.0764   Epoch: 5   Global Step: 25490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:29,650-Speed 10293.76 samples/sec   Loss 10.2834   LearningRate 0.0764   Epoch: 5   Global Step: 25500   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:11:30,636-Speed 10397.68 samples/sec   Loss 10.2100   LearningRate 0.0764   Epoch: 5   Global Step: 25510   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:11:31,581-Speed 10847.04 samples/sec   Loss 10.3885   LearningRate 0.0764   Epoch: 5   Global Step: 25520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:32,567-Speed 10402.23 samples/sec   Loss 10.3097   LearningRate 0.0764   Epoch: 5   Global Step: 25530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:33,580-Speed 10121.71 samples/sec   Loss 10.3894   LearningRate 0.0763   Epoch: 5   Global Step: 25540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:34,621-Speed 9839.30 samples/sec   Loss 10.3125   LearningRate 0.0763   Epoch: 5   Global Step: 25550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:35,561-Speed 10908.64 samples/sec   Loss 10.1939   LearningRate 0.0763   Epoch: 5   Global Step: 25560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:36,551-Speed 10358.56 samples/sec   Loss 10.2118   LearningRate 0.0763   Epoch: 5   Global Step: 25570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:37,512-Speed 10662.79 samples/sec   Loss 10.4406   LearningRate 0.0763   Epoch: 5   Global Step: 25580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:38,472-Speed 10676.59 samples/sec   Loss 10.5125   LearningRate 0.0763   Epoch: 5   Global Step: 25590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:39,531-Speed 9681.05 samples/sec   Loss 10.5178   LearningRate 0.0763   Epoch: 5   Global Step: 25600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:40,870-Speed 7649.41 samples/sec   Loss 10.5807   LearningRate 0.0763   Epoch: 5   Global Step: 25610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:41,825-Speed 10739.90 samples/sec   Loss 10.5391   LearningRate 0.0763   Epoch: 5   Global Step: 25620   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:11:42,799-Speed 10516.07 samples/sec   Loss 10.4122   LearningRate 0.0763   Epoch: 5   Global Step: 25630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:43,755-Speed 10725.89 samples/sec   Loss 10.4375   LearningRate 0.0763   Epoch: 5   Global Step: 25640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:44,731-Speed 10494.32 samples/sec   Loss 10.6426   LearningRate 0.0763   Epoch: 5   Global Step: 25650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:45,703-Speed 10551.69 samples/sec   Loss 10.5794   LearningRate 0.0762   Epoch: 5   Global Step: 25660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:46,702-Speed 10262.24 samples/sec   Loss 10.6219   LearningRate 0.0762   Epoch: 5   Global Step: 25670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:47,678-Speed 10492.93 samples/sec   Loss 10.5861   LearningRate 0.0762   Epoch: 5   Global Step: 25680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:48,649-Speed 10559.69 samples/sec   Loss 10.4850   LearningRate 0.0762   Epoch: 5   Global Step: 25690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:49,660-Speed 10137.86 samples/sec   Loss 10.6283   LearningRate 0.0762   Epoch: 5   Global Step: 25700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:50,749-Speed 9426.69 samples/sec   Loss 10.7545   LearningRate 0.0762   Epoch: 5   Global Step: 25710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:51,712-Speed 10637.10 samples/sec   Loss 10.2748   LearningRate 0.0762   Epoch: 5   Global Step: 25720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:52,713-Speed 10234.24 samples/sec   Loss 10.4809   LearningRate 0.0762   Epoch: 5   Global Step: 25730   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:11:53,675-Speed 10665.20 samples/sec   Loss 10.6021   LearningRate 0.0762   Epoch: 5   Global Step: 25740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:54,681-Speed 10189.01 samples/sec   Loss 10.4836   LearningRate 0.0762   Epoch: 5   Global Step: 25750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:55,677-Speed 10285.89 samples/sec   Loss 10.6759   LearningRate 0.0762   Epoch: 5   Global Step: 25760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:56,645-Speed 10592.70 samples/sec   Loss 10.3787   LearningRate 0.0761   Epoch: 5   Global Step: 25770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:57,646-Speed 10237.25 samples/sec   Loss 10.5848   LearningRate 0.0761   Epoch: 5   Global Step: 25780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:58,588-Speed 10885.63 samples/sec   Loss 10.6112   LearningRate 0.0761   Epoch: 5   Global Step: 25790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:11:59,546-Speed 10694.90 samples/sec   Loss 10.6060   LearningRate 0.0761   Epoch: 5   Global Step: 25800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:12:00,512-Speed 10615.63 samples/sec   Loss 10.6470   LearningRate 0.0761   Epoch: 5   Global Step: 25810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:12:01,585-Speed 9543.90 samples/sec   Loss 10.6331   LearningRate 0.0761   Epoch: 5   Global Step: 25820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:12:02,604-Speed 10058.55 samples/sec   Loss 10.5151   LearningRate 0.0761   Epoch: 5   Global Step: 25830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:12:03,659-Speed 9718.64 samples/sec   Loss 10.5954   LearningRate 0.0761   Epoch: 5   Global Step: 25840   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:12:04,640-Speed 10441.44 samples/sec   Loss 10.4738   LearningRate 0.0761   Epoch: 5   Global Step: 25850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:12:05,594-Speed 10749.14 samples/sec   Loss 10.6725   LearningRate 0.0761   Epoch: 5   Global Step: 25860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:12:06,624-Speed 9953.13 samples/sec   Loss 10.4381   LearningRate 0.0761   Epoch: 5   Global Step: 25870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:12:07,777-Speed 8889.82 samples/sec   Loss 10.7727   LearningRate 0.0761   Epoch: 5   Global Step: 25880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:12:08,765-Speed 10371.38 samples/sec   Loss 10.8486   LearningRate 0.0760   Epoch: 5   Global Step: 25890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:12:09,743-Speed 10479.78 samples/sec   Loss 10.5648   LearningRate 0.0760   Epoch: 5   Global Step: 25900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:12:10,717-Speed 10534.78 samples/sec   Loss 10.8461   LearningRate 0.0760   Epoch: 5   Global Step: 25910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:12:11,678-Speed 10670.91 samples/sec   Loss 10.7849   LearningRate 0.0760   Epoch: 5   Global Step: 25920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:12:12,769-Speed 9387.36 samples/sec   Loss 10.5394   LearningRate 0.0760   Epoch: 5   Global Step: 25930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:12:13,783-Speed 10113.13 samples/sec   Loss 10.5753   LearningRate 0.0760   Epoch: 5   Global Step: 25940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:12:14,873-Speed 9399.44 samples/sec   Loss 10.6940   LearningRate 0.0760   Epoch: 5   Global Step: 25950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:12:15,852-Speed 10482.58 samples/sec   Loss 10.7035   LearningRate 0.0760   Epoch: 5   Global Step: 25960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:12:16,863-Speed 10145.18 samples/sec   Loss 10.6734   LearningRate 0.0760   Epoch: 5   Global Step: 25970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:12:17,945-Speed 9474.54 samples/sec   Loss 10.6395   LearningRate 0.0760   Epoch: 5   Global Step: 25980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:12:18,884-Speed 10916.12 samples/sec   Loss 10.7249   LearningRate 0.0760   Epoch: 5   Global Step: 25990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:12:20,040-Speed 8860.06 samples/sec   Loss 10.7131   LearningRate 0.0759   Epoch: 5   Global Step: 26000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:12:42,715-[lfw][26000]XNorm: 14.586502
Training: 2022-04-11 00:12:42,716-[lfw][26000]Accuracy-Flip: 0.99267+-0.00448
Training: 2022-04-11 00:12:42,716-[lfw][26000]Accuracy-Highest: 0.99400
Training: 2022-04-11 00:13:08,343-[cfp_fp][26000]XNorm: 12.288753
Training: 2022-04-11 00:13:08,344-[cfp_fp][26000]Accuracy-Flip: 0.94229+-0.01507
Training: 2022-04-11 00:13:08,345-[cfp_fp][26000]Accuracy-Highest: 0.94229
Training: 2022-04-11 00:13:30,676-[agedb_30][26000]XNorm: 14.204538
Training: 2022-04-11 00:13:30,676-[agedb_30][26000]Accuracy-Flip: 0.94883+-0.01312
Training: 2022-04-11 00:13:30,677-[agedb_30][26000]Accuracy-Highest: 0.94883
Training: 2022-04-11 00:13:31,645-Speed 143.01 samples/sec   Loss 10.6130   LearningRate 0.0759   Epoch: 5   Global Step: 26010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:32,627-Speed 10434.83 samples/sec   Loss 10.5751   LearningRate 0.0759   Epoch: 5   Global Step: 26020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:33,586-Speed 10688.94 samples/sec   Loss 10.8880   LearningRate 0.0759   Epoch: 5   Global Step: 26030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:34,538-Speed 10776.93 samples/sec   Loss 10.7341   LearningRate 0.0759   Epoch: 5   Global Step: 26040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:35,510-Speed 10539.17 samples/sec   Loss 10.7264   LearningRate 0.0759   Epoch: 5   Global Step: 26050   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:13:36,453-Speed 10872.12 samples/sec   Loss 10.7509   LearningRate 0.0759   Epoch: 5   Global Step: 26060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:37,429-Speed 10499.62 samples/sec   Loss 10.6778   LearningRate 0.0759   Epoch: 5   Global Step: 26070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:38,415-Speed 10396.23 samples/sec   Loss 10.6588   LearningRate 0.0759   Epoch: 5   Global Step: 26080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:39,417-Speed 10225.43 samples/sec   Loss 10.8478   LearningRate 0.0759   Epoch: 5   Global Step: 26090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:40,401-Speed 10417.41 samples/sec   Loss 10.6846   LearningRate 0.0759   Epoch: 5   Global Step: 26100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:41,348-Speed 10832.78 samples/sec   Loss 10.7484   LearningRate 0.0759   Epoch: 5   Global Step: 26110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:42,286-Speed 10918.09 samples/sec   Loss 10.8199   LearningRate 0.0758   Epoch: 5   Global Step: 26120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:43,242-Speed 10724.08 samples/sec   Loss 10.7168   LearningRate 0.0758   Epoch: 5   Global Step: 26130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:44,236-Speed 10321.22 samples/sec   Loss 10.6491   LearningRate 0.0758   Epoch: 5   Global Step: 26140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:45,192-Speed 10713.93 samples/sec   Loss 10.5541   LearningRate 0.0758   Epoch: 5   Global Step: 26150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:46,167-Speed 10513.30 samples/sec   Loss 10.6611   LearningRate 0.0758   Epoch: 5   Global Step: 26160   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:13:47,146-Speed 10467.91 samples/sec   Loss 10.6469   LearningRate 0.0758   Epoch: 5   Global Step: 26170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:48,112-Speed 10617.36 samples/sec   Loss 10.8588   LearningRate 0.0758   Epoch: 5   Global Step: 26180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:49,090-Speed 10482.47 samples/sec   Loss 10.7106   LearningRate 0.0758   Epoch: 5   Global Step: 26190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:50,052-Speed 10654.37 samples/sec   Loss 10.7919   LearningRate 0.0758   Epoch: 5   Global Step: 26200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:51,039-Speed 10383.79 samples/sec   Loss 10.6803   LearningRate 0.0758   Epoch: 5   Global Step: 26210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:52,038-Speed 10261.40 samples/sec   Loss 10.7590   LearningRate 0.0758   Epoch: 5   Global Step: 26220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:53,023-Speed 10403.81 samples/sec   Loss 10.8546   LearningRate 0.0758   Epoch: 5   Global Step: 26230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:53,979-Speed 10731.17 samples/sec   Loss 10.7421   LearningRate 0.0757   Epoch: 5   Global Step: 26240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:54,938-Speed 10686.83 samples/sec   Loss 10.6759   LearningRate 0.0757   Epoch: 5   Global Step: 26250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:55,956-Speed 10062.30 samples/sec   Loss 10.7871   LearningRate 0.0757   Epoch: 5   Global Step: 26260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:56,925-Speed 10586.43 samples/sec   Loss 10.6888   LearningRate 0.0757   Epoch: 5   Global Step: 26270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:57,882-Speed 10703.82 samples/sec   Loss 10.7504   LearningRate 0.0757   Epoch: 5   Global Step: 26280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:58,842-Speed 10678.58 samples/sec   Loss 10.6936   LearningRate 0.0757   Epoch: 5   Global Step: 26290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:13:59,852-Speed 10147.65 samples/sec   Loss 10.8562   LearningRate 0.0757   Epoch: 5   Global Step: 26300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:14:00,794-Speed 10879.17 samples/sec   Loss 10.7286   LearningRate 0.0757   Epoch: 5   Global Step: 26310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:14:01,742-Speed 10812.94 samples/sec   Loss 10.7104   LearningRate 0.0757   Epoch: 5   Global Step: 26320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:14:02,699-Speed 10712.48 samples/sec   Loss 10.7538   LearningRate 0.0757   Epoch: 5   Global Step: 26330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:14:03,628-Speed 11029.98 samples/sec   Loss 10.7822   LearningRate 0.0757   Epoch: 5   Global Step: 26340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:14:04,576-Speed 10811.23 samples/sec   Loss 10.8025   LearningRate 0.0756   Epoch: 5   Global Step: 26350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:14:05,624-Speed 9787.33 samples/sec   Loss 10.9842   LearningRate 0.0756   Epoch: 5   Global Step: 26360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:06,626-Speed 10219.84 samples/sec   Loss 10.6804   LearningRate 0.0756   Epoch: 5   Global Step: 26370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:07,639-Speed 10127.58 samples/sec   Loss 10.8637   LearningRate 0.0756   Epoch: 5   Global Step: 26380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:08,610-Speed 10556.52 samples/sec   Loss 10.6713   LearningRate 0.0756   Epoch: 5   Global Step: 26390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:09,621-Speed 10131.94 samples/sec   Loss 10.6917   LearningRate 0.0756   Epoch: 5   Global Step: 26400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:10,708-Speed 9444.51 samples/sec   Loss 10.7777   LearningRate 0.0756   Epoch: 5   Global Step: 26410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:11,730-Speed 10021.09 samples/sec   Loss 10.6857   LearningRate 0.0756   Epoch: 5   Global Step: 26420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:12,785-Speed 9718.84 samples/sec   Loss 10.7406   LearningRate 0.0756   Epoch: 5   Global Step: 26430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:13,793-Speed 10170.67 samples/sec   Loss 10.8162   LearningRate 0.0756   Epoch: 5   Global Step: 26440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:14,765-Speed 10543.02 samples/sec   Loss 10.7676   LearningRate 0.0756   Epoch: 5   Global Step: 26450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:15,693-Speed 11066.90 samples/sec   Loss 10.7432   LearningRate 0.0756   Epoch: 5   Global Step: 26460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:14:16,622-Speed 11037.96 samples/sec   Loss 10.7229   LearningRate 0.0755   Epoch: 5   Global Step: 26470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:17,596-Speed 10525.38 samples/sec   Loss 10.7943   LearningRate 0.0755   Epoch: 5   Global Step: 26480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:18,580-Speed 10410.87 samples/sec   Loss 10.8366   LearningRate 0.0755   Epoch: 5   Global Step: 26490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:19,581-Speed 10245.75 samples/sec   Loss 10.8891   LearningRate 0.0755   Epoch: 5   Global Step: 26500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:20,516-Speed 10969.84 samples/sec   Loss 10.7404   LearningRate 0.0755   Epoch: 5   Global Step: 26510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:21,466-Speed 10779.42 samples/sec   Loss 10.8304   LearningRate 0.0755   Epoch: 5   Global Step: 26520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:22,457-Speed 10339.43 samples/sec   Loss 10.9271   LearningRate 0.0755   Epoch: 5   Global Step: 26530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:23,442-Speed 10405.15 samples/sec   Loss 10.7641   LearningRate 0.0755   Epoch: 5   Global Step: 26540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:24,391-Speed 10806.24 samples/sec   Loss 10.8740   LearningRate 0.0755   Epoch: 5   Global Step: 26550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:25,327-Speed 10955.29 samples/sec   Loss 10.6354   LearningRate 0.0755   Epoch: 5   Global Step: 26560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:26,315-Speed 10363.65 samples/sec   Loss 10.9195   LearningRate 0.0755   Epoch: 5   Global Step: 26570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:14:27,299-Speed 10433.03 samples/sec   Loss 10.8027   LearningRate 0.0755   Epoch: 5   Global Step: 26580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:14:28,271-Speed 10542.53 samples/sec   Loss 10.7817   LearningRate 0.0754   Epoch: 5   Global Step: 26590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:14:29,276-Speed 10198.81 samples/sec   Loss 10.8914   LearningRate 0.0754   Epoch: 5   Global Step: 26600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:14:30,226-Speed 10789.27 samples/sec   Loss 10.9364   LearningRate 0.0754   Epoch: 5   Global Step: 26610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:14:31,175-Speed 10807.87 samples/sec   Loss 10.8597   LearningRate 0.0754   Epoch: 5   Global Step: 26620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:14:32,156-Speed 10440.75 samples/sec   Loss 10.8205   LearningRate 0.0754   Epoch: 5   Global Step: 26630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:14:33,135-Speed 10476.56 samples/sec   Loss 10.9471   LearningRate 0.0754   Epoch: 5   Global Step: 26640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:14:34,135-Speed 10246.61 samples/sec   Loss 10.8897   LearningRate 0.0754   Epoch: 5   Global Step: 26650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:14:35,096-Speed 10664.40 samples/sec   Loss 10.7847   LearningRate 0.0754   Epoch: 5   Global Step: 26660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:14:36,065-Speed 10577.23 samples/sec   Loss 10.9221   LearningRate 0.0754   Epoch: 5   Global Step: 26670   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:14:37,050-Speed 10414.54 samples/sec   Loss 10.9461   LearningRate 0.0754   Epoch: 5   Global Step: 26680   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:14:38,029-Speed 10459.83 samples/sec   Loss 10.8248   LearningRate 0.0754   Epoch: 5   Global Step: 26690   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:14:38,961-Speed 10999.26 samples/sec   Loss 10.7863   LearningRate 0.0753   Epoch: 5   Global Step: 26700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:39,938-Speed 10497.36 samples/sec   Loss 10.6536   LearningRate 0.0753   Epoch: 5   Global Step: 26710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:40,888-Speed 10785.77 samples/sec   Loss 10.6509   LearningRate 0.0753   Epoch: 5   Global Step: 26720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:41,848-Speed 10678.99 samples/sec   Loss 10.6921   LearningRate 0.0753   Epoch: 5   Global Step: 26730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:42,860-Speed 10127.71 samples/sec   Loss 10.8312   LearningRate 0.0753   Epoch: 5   Global Step: 26740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:43,825-Speed 10625.27 samples/sec   Loss 10.8853   LearningRate 0.0753   Epoch: 5   Global Step: 26750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:44,763-Speed 10930.03 samples/sec   Loss 10.7437   LearningRate 0.0753   Epoch: 5   Global Step: 26760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:45,716-Speed 10754.97 samples/sec   Loss 10.9945   LearningRate 0.0753   Epoch: 5   Global Step: 26770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:46,712-Speed 10291.92 samples/sec   Loss 10.7269   LearningRate 0.0753   Epoch: 5   Global Step: 26780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:47,688-Speed 10495.09 samples/sec   Loss 11.0458   LearningRate 0.0753   Epoch: 5   Global Step: 26790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:48,656-Speed 10596.39 samples/sec   Loss 10.7071   LearningRate 0.0753   Epoch: 5   Global Step: 26800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:14:49,592-Speed 10952.23 samples/sec   Loss 10.9333   LearningRate 0.0753   Epoch: 5   Global Step: 26810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:14:50,575-Speed 10433.90 samples/sec   Loss 10.8755   LearningRate 0.0752   Epoch: 5   Global Step: 26820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:51,543-Speed 10580.81 samples/sec   Loss 10.8662   LearningRate 0.0752   Epoch: 5   Global Step: 26830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:52,525-Speed 10446.75 samples/sec   Loss 10.6918   LearningRate 0.0752   Epoch: 5   Global Step: 26840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:53,504-Speed 10461.32 samples/sec   Loss 10.8954   LearningRate 0.0752   Epoch: 5   Global Step: 26850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:54,442-Speed 10929.40 samples/sec   Loss 10.7807   LearningRate 0.0752   Epoch: 5   Global Step: 26860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:55,414-Speed 10544.68 samples/sec   Loss 10.9976   LearningRate 0.0752   Epoch: 5   Global Step: 26870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:56,486-Speed 9558.82 samples/sec   Loss 10.9811   LearningRate 0.0752   Epoch: 5   Global Step: 26880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:57,469-Speed 10429.07 samples/sec   Loss 10.7801   LearningRate 0.0752   Epoch: 5   Global Step: 26890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:58,429-Speed 10676.65 samples/sec   Loss 10.8380   LearningRate 0.0752   Epoch: 5   Global Step: 26900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:14:59,469-Speed 9850.79 samples/sec   Loss 10.9644   LearningRate 0.0752   Epoch: 5   Global Step: 26910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:15:00,405-Speed 10954.63 samples/sec   Loss 10.8458   LearningRate 0.0752   Epoch: 5   Global Step: 26920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:01,418-Speed 10132.42 samples/sec   Loss 10.7363   LearningRate 0.0752   Epoch: 5   Global Step: 26930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:02,391-Speed 10536.76 samples/sec   Loss 10.8636   LearningRate 0.0751   Epoch: 5   Global Step: 26940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:03,345-Speed 10748.52 samples/sec   Loss 10.8712   LearningRate 0.0751   Epoch: 5   Global Step: 26950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:04,347-Speed 10227.61 samples/sec   Loss 10.8046   LearningRate 0.0751   Epoch: 5   Global Step: 26960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:05,337-Speed 10361.34 samples/sec   Loss 10.7330   LearningRate 0.0751   Epoch: 5   Global Step: 26970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:06,314-Speed 10489.68 samples/sec   Loss 10.6812   LearningRate 0.0751   Epoch: 5   Global Step: 26980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:07,315-Speed 10242.07 samples/sec   Loss 10.7340   LearningRate 0.0751   Epoch: 5   Global Step: 26990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:08,261-Speed 10840.26 samples/sec   Loss 10.9600   LearningRate 0.0751   Epoch: 5   Global Step: 27000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:09,231-Speed 10564.79 samples/sec   Loss 10.8065   LearningRate 0.0751   Epoch: 5   Global Step: 27010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:10,174-Speed 10872.20 samples/sec   Loss 10.7516   LearningRate 0.0751   Epoch: 5   Global Step: 27020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:11,154-Speed 10459.35 samples/sec   Loss 10.7202   LearningRate 0.0751   Epoch: 5   Global Step: 27030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:12,140-Speed 10396.23 samples/sec   Loss 10.7842   LearningRate 0.0751   Epoch: 5   Global Step: 27040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:13,096-Speed 10721.85 samples/sec   Loss 10.8952   LearningRate 0.0750   Epoch: 5   Global Step: 27050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:14,113-Speed 10075.29 samples/sec   Loss 10.9687   LearningRate 0.0750   Epoch: 5   Global Step: 27060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:15,056-Speed 10864.20 samples/sec   Loss 10.8267   LearningRate 0.0750   Epoch: 5   Global Step: 27070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:16,014-Speed 10700.26 samples/sec   Loss 10.9265   LearningRate 0.0750   Epoch: 5   Global Step: 27080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:16,966-Speed 10763.71 samples/sec   Loss 10.7762   LearningRate 0.0750   Epoch: 5   Global Step: 27090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:17,943-Speed 10495.31 samples/sec   Loss 10.7704   LearningRate 0.0750   Epoch: 5   Global Step: 27100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:18,925-Speed 10440.60 samples/sec   Loss 10.7530   LearningRate 0.0750   Epoch: 5   Global Step: 27110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:19,903-Speed 10477.39 samples/sec   Loss 11.0314   LearningRate 0.0750   Epoch: 5   Global Step: 27120   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:15:20,847-Speed 10863.67 samples/sec   Loss 10.9511   LearningRate 0.0750   Epoch: 5   Global Step: 27130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:21,821-Speed 10520.02 samples/sec   Loss 10.9236   LearningRate 0.0750   Epoch: 5   Global Step: 27140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:22,780-Speed 10686.85 samples/sec   Loss 10.7968   LearningRate 0.0750   Epoch: 5   Global Step: 27150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:23,713-Speed 10984.82 samples/sec   Loss 10.9945   LearningRate 0.0750   Epoch: 5   Global Step: 27160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:24,657-Speed 10863.97 samples/sec   Loss 10.8796   LearningRate 0.0749   Epoch: 5   Global Step: 27170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:25,606-Speed 10794.29 samples/sec   Loss 10.9999   LearningRate 0.0749   Epoch: 5   Global Step: 27180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:26,583-Speed 10492.24 samples/sec   Loss 10.7992   LearningRate 0.0749   Epoch: 5   Global Step: 27190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:27,567-Speed 10413.59 samples/sec   Loss 10.8137   LearningRate 0.0749   Epoch: 5   Global Step: 27200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:15:28,521-Speed 10744.65 samples/sec   Loss 10.6960   LearningRate 0.0749   Epoch: 5   Global Step: 27210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:15:29,502-Speed 10454.42 samples/sec   Loss 10.8968   LearningRate 0.0749   Epoch: 5   Global Step: 27220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:15:30,439-Speed 10933.86 samples/sec   Loss 10.8306   LearningRate 0.0749   Epoch: 5   Global Step: 27230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:15:31,411-Speed 10544.14 samples/sec   Loss 10.8251   LearningRate 0.0749   Epoch: 5   Global Step: 27240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:15:32,386-Speed 10512.80 samples/sec   Loss 10.8355   LearningRate 0.0749   Epoch: 5   Global Step: 27250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:15:33,356-Speed 10564.72 samples/sec   Loss 11.0183   LearningRate 0.0749   Epoch: 5   Global Step: 27260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:15:34,305-Speed 10801.40 samples/sec   Loss 10.7291   LearningRate 0.0749   Epoch: 5   Global Step: 27270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:15:35,296-Speed 10350.02 samples/sec   Loss 10.7419   LearningRate 0.0749   Epoch: 5   Global Step: 27280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:15:36,272-Speed 10502.00 samples/sec   Loss 10.8216   LearningRate 0.0748   Epoch: 5   Global Step: 27290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:15:37,238-Speed 10609.00 samples/sec   Loss 11.0883   LearningRate 0.0748   Epoch: 5   Global Step: 27300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:15:38,202-Speed 10636.54 samples/sec   Loss 10.9393   LearningRate 0.0748   Epoch: 5   Global Step: 27310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:15:39,184-Speed 10432.66 samples/sec   Loss 10.9179   LearningRate 0.0748   Epoch: 5   Global Step: 27320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:15:40,148-Speed 10635.00 samples/sec   Loss 10.7179   LearningRate 0.0748   Epoch: 5   Global Step: 27330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:15:41,144-Speed 10289.58 samples/sec   Loss 10.6672   LearningRate 0.0748   Epoch: 5   Global Step: 27340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:15:42,109-Speed 10629.21 samples/sec   Loss 10.7919   LearningRate 0.0748   Epoch: 5   Global Step: 27350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:15:43,073-Speed 10630.88 samples/sec   Loss 10.8984   LearningRate 0.0748   Epoch: 5   Global Step: 27360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:15:44,041-Speed 10579.56 samples/sec   Loss 10.9223   LearningRate 0.0748   Epoch: 5   Global Step: 27370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:15:45,011-Speed 10570.46 samples/sec   Loss 10.7521   LearningRate 0.0748   Epoch: 5   Global Step: 27380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:15:45,987-Speed 10498.66 samples/sec   Loss 10.6124   LearningRate 0.0748   Epoch: 5   Global Step: 27390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:15:46,954-Speed 10604.37 samples/sec   Loss 10.9686   LearningRate 0.0747   Epoch: 5   Global Step: 27400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:15:47,919-Speed 10616.53 samples/sec   Loss 10.8145   LearningRate 0.0747   Epoch: 5   Global Step: 27410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:15:48,854-Speed 10973.30 samples/sec   Loss 10.7964   LearningRate 0.0747   Epoch: 5   Global Step: 27420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:15:49,827-Speed 10536.78 samples/sec   Loss 10.8700   LearningRate 0.0747   Epoch: 5   Global Step: 27430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:15:50,805-Speed 10477.94 samples/sec   Loss 10.7459   LearningRate 0.0747   Epoch: 5   Global Step: 27440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:15:51,780-Speed 10513.27 samples/sec   Loss 10.6943   LearningRate 0.0747   Epoch: 5   Global Step: 27450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:15:52,750-Speed 10561.16 samples/sec   Loss 10.7716   LearningRate 0.0747   Epoch: 5   Global Step: 27460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:15:53,697-Speed 10821.67 samples/sec   Loss 10.8816   LearningRate 0.0747   Epoch: 5   Global Step: 27470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:15:54,705-Speed 10171.66 samples/sec   Loss 10.8232   LearningRate 0.0747   Epoch: 5   Global Step: 27480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:15:55,671-Speed 10608.50 samples/sec   Loss 10.8858   LearningRate 0.0747   Epoch: 5   Global Step: 27490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:15:56,629-Speed 10696.71 samples/sec   Loss 10.6781   LearningRate 0.0747   Epoch: 5   Global Step: 27500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:15:57,609-Speed 10466.11 samples/sec   Loss 10.8464   LearningRate 0.0747   Epoch: 5   Global Step: 27510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:15:58,577-Speed 10591.47 samples/sec   Loss 10.8751   LearningRate 0.0746   Epoch: 5   Global Step: 27520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:15:59,610-Speed 9919.06 samples/sec   Loss 10.8697   LearningRate 0.0746   Epoch: 5   Global Step: 27530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:00,566-Speed 10721.15 samples/sec   Loss 10.7538   LearningRate 0.0746   Epoch: 5   Global Step: 27540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:01,531-Speed 10624.62 samples/sec   Loss 10.6295   LearningRate 0.0746   Epoch: 5   Global Step: 27550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:02,497-Speed 10616.57 samples/sec   Loss 10.8513   LearningRate 0.0746   Epoch: 5   Global Step: 27560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:03,460-Speed 10649.88 samples/sec   Loss 10.8205   LearningRate 0.0746   Epoch: 5   Global Step: 27570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:04,417-Speed 10710.82 samples/sec   Loss 10.6246   LearningRate 0.0746   Epoch: 5   Global Step: 27580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:05,367-Speed 10785.38 samples/sec   Loss 10.8166   LearningRate 0.0746   Epoch: 5   Global Step: 27590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:06,353-Speed 10400.98 samples/sec   Loss 10.8905   LearningRate 0.0746   Epoch: 5   Global Step: 27600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:07,311-Speed 10703.30 samples/sec   Loss 10.9367   LearningRate 0.0746   Epoch: 5   Global Step: 27610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:08,268-Speed 10711.91 samples/sec   Loss 10.8532   LearningRate 0.0746   Epoch: 5   Global Step: 27620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:09,223-Speed 10729.75 samples/sec   Loss 10.7467   LearningRate 0.0746   Epoch: 5   Global Step: 27630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:10,221-Speed 10267.67 samples/sec   Loss 10.8923   LearningRate 0.0745   Epoch: 5   Global Step: 27640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:11,217-Speed 10303.34 samples/sec   Loss 10.5836   LearningRate 0.0745   Epoch: 5   Global Step: 27650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:12,182-Speed 10624.45 samples/sec   Loss 10.7928   LearningRate 0.0745   Epoch: 5   Global Step: 27660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:13,159-Speed 10498.73 samples/sec   Loss 10.6169   LearningRate 0.0745   Epoch: 5   Global Step: 27670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:14,100-Speed 10893.25 samples/sec   Loss 10.9161   LearningRate 0.0745   Epoch: 5   Global Step: 27680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:15,045-Speed 10849.90 samples/sec   Loss 10.8172   LearningRate 0.0745   Epoch: 5   Global Step: 27690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:15,992-Speed 10817.42 samples/sec   Loss 11.0160   LearningRate 0.0745   Epoch: 5   Global Step: 27700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:16:16,962-Speed 10568.58 samples/sec   Loss 10.8715   LearningRate 0.0745   Epoch: 5   Global Step: 27710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:16:17,966-Speed 10207.93 samples/sec   Loss 10.7648   LearningRate 0.0745   Epoch: 5   Global Step: 27720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:16:18,949-Speed 10433.35 samples/sec   Loss 10.9116   LearningRate 0.0745   Epoch: 5   Global Step: 27730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:16:19,930-Speed 10439.42 samples/sec   Loss 10.9305   LearningRate 0.0745   Epoch: 5   Global Step: 27740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:16:20,906-Speed 10504.75 samples/sec   Loss 10.8457   LearningRate 0.0744   Epoch: 5   Global Step: 27750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:16:21,845-Speed 10918.80 samples/sec   Loss 10.8672   LearningRate 0.0744   Epoch: 5   Global Step: 27760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:16:22,848-Speed 10225.13 samples/sec   Loss 10.9072   LearningRate 0.0744   Epoch: 5   Global Step: 27770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:16:23,828-Speed 10452.02 samples/sec   Loss 10.8437   LearningRate 0.0744   Epoch: 5   Global Step: 27780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:16:24,821-Speed 10329.49 samples/sec   Loss 10.6769   LearningRate 0.0744   Epoch: 5   Global Step: 27790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:16:25,829-Speed 10160.18 samples/sec   Loss 10.7255   LearningRate 0.0744   Epoch: 5   Global Step: 27800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:26,807-Speed 10487.53 samples/sec   Loss 10.8606   LearningRate 0.0744   Epoch: 5   Global Step: 27810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:27,738-Speed 11013.66 samples/sec   Loss 10.8285   LearningRate 0.0744   Epoch: 5   Global Step: 27820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:28,717-Speed 10464.82 samples/sec   Loss 10.9847   LearningRate 0.0744   Epoch: 5   Global Step: 27830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:29,677-Speed 10679.90 samples/sec   Loss 10.7881   LearningRate 0.0744   Epoch: 5   Global Step: 27840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:30,686-Speed 10152.11 samples/sec   Loss 10.9761   LearningRate 0.0744   Epoch: 5   Global Step: 27850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:31,639-Speed 10759.18 samples/sec   Loss 10.8195   LearningRate 0.0744   Epoch: 5   Global Step: 27860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:16:32,692-Speed 9731.46 samples/sec   Loss 10.8613   LearningRate 0.0743   Epoch: 5   Global Step: 27870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:16:33,697-Speed 10200.85 samples/sec   Loss 10.6941   LearningRate 0.0743   Epoch: 5   Global Step: 27880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:16:34,645-Speed 10815.60 samples/sec   Loss 10.8031   LearningRate 0.0743   Epoch: 5   Global Step: 27890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:16:35,676-Speed 9932.07 samples/sec   Loss 10.9078   LearningRate 0.0743   Epoch: 5   Global Step: 27900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:16:36,679-Speed 10223.40 samples/sec   Loss 10.7508   LearningRate 0.0743   Epoch: 5   Global Step: 27910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:16:37,623-Speed 10858.81 samples/sec   Loss 10.8315   LearningRate 0.0743   Epoch: 5   Global Step: 27920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:16:38,568-Speed 10847.33 samples/sec   Loss 10.9203   LearningRate 0.0743   Epoch: 5   Global Step: 27930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:16:39,576-Speed 10165.69 samples/sec   Loss 10.8012   LearningRate 0.0743   Epoch: 5   Global Step: 27940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:16:40,674-Speed 9340.82 samples/sec   Loss 10.7345   LearningRate 0.0743   Epoch: 5   Global Step: 27950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:16:41,641-Speed 10599.70 samples/sec   Loss 11.1474   LearningRate 0.0743   Epoch: 5   Global Step: 27960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:42,629-Speed 10375.91 samples/sec   Loss 11.0048   LearningRate 0.0743   Epoch: 5   Global Step: 27970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:43,608-Speed 10470.33 samples/sec   Loss 10.8948   LearningRate 0.0743   Epoch: 5   Global Step: 27980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:44,607-Speed 10266.35 samples/sec   Loss 11.0125   LearningRate 0.0742   Epoch: 5   Global Step: 27990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:16:45,562-Speed 10736.14 samples/sec   Loss 10.9895   LearningRate 0.0742   Epoch: 5   Global Step: 28000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:17:08,443-[lfw][28000]XNorm: 14.497399
Training: 2022-04-11 00:17:08,443-[lfw][28000]Accuracy-Flip: 0.99367+-0.00371
Training: 2022-04-11 00:17:08,444-[lfw][28000]Accuracy-Highest: 0.99400
Training: 2022-04-11 00:17:34,090-[cfp_fp][28000]XNorm: 12.246552
Training: 2022-04-11 00:17:34,091-[cfp_fp][28000]Accuracy-Flip: 0.94000+-0.01580
Training: 2022-04-11 00:17:34,092-[cfp_fp][28000]Accuracy-Highest: 0.94229
Training: 2022-04-11 00:17:56,290-[agedb_30][28000]XNorm: 14.169908
Training: 2022-04-11 00:17:56,291-[agedb_30][28000]Accuracy-Flip: 0.94833+-0.01271
Training: 2022-04-11 00:17:56,292-[agedb_30][28000]Accuracy-Highest: 0.94883
Training: 2022-04-11 00:17:57,240-Speed 142.86 samples/sec   Loss 10.7163   LearningRate 0.0742   Epoch: 5   Global Step: 28010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:17:58,198-Speed 10692.67 samples/sec   Loss 10.8914   LearningRate 0.0742   Epoch: 5   Global Step: 28020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:17:59,168-Speed 10571.24 samples/sec   Loss 10.8613   LearningRate 0.0742   Epoch: 5   Global Step: 28030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:00,173-Speed 10198.65 samples/sec   Loss 10.9347   LearningRate 0.0742   Epoch: 5   Global Step: 28040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:01,123-Speed 10791.29 samples/sec   Loss 10.7415   LearningRate 0.0742   Epoch: 5   Global Step: 28050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:02,069-Speed 10825.69 samples/sec   Loss 10.7083   LearningRate 0.0742   Epoch: 5   Global Step: 28060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:03,025-Speed 10723.11 samples/sec   Loss 10.7527   LearningRate 0.0742   Epoch: 5   Global Step: 28070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:04,018-Speed 10319.69 samples/sec   Loss 10.6791   LearningRate 0.0742   Epoch: 5   Global Step: 28080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:04,997-Speed 10476.32 samples/sec   Loss 10.9761   LearningRate 0.0742   Epoch: 5   Global Step: 28090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:05,963-Speed 10606.60 samples/sec   Loss 10.8030   LearningRate 0.0742   Epoch: 5   Global Step: 28100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:06,896-Speed 10995.37 samples/sec   Loss 10.7258   LearningRate 0.0741   Epoch: 5   Global Step: 28110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:07,869-Speed 10525.55 samples/sec   Loss 10.8790   LearningRate 0.0741   Epoch: 5   Global Step: 28120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:08,875-Speed 10203.66 samples/sec   Loss 10.7726   LearningRate 0.0741   Epoch: 5   Global Step: 28130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:09,854-Speed 10471.88 samples/sec   Loss 10.8450   LearningRate 0.0741   Epoch: 5   Global Step: 28140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:10,812-Speed 10694.40 samples/sec   Loss 10.8036   LearningRate 0.0741   Epoch: 5   Global Step: 28150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:11,858-Speed 9802.17 samples/sec   Loss 10.6979   LearningRate 0.0741   Epoch: 5   Global Step: 28160   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:18:12,809-Speed 10787.73 samples/sec   Loss 10.7202   LearningRate 0.0741   Epoch: 5   Global Step: 28170   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 00:18:13,761-Speed 10765.07 samples/sec   Loss 10.8694   LearningRate 0.0741   Epoch: 5   Global Step: 28180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:14,707-Speed 10827.35 samples/sec   Loss 10.8350   LearningRate 0.0741   Epoch: 5   Global Step: 28190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:15,659-Speed 10773.20 samples/sec   Loss 10.8783   LearningRate 0.0741   Epoch: 5   Global Step: 28200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:16,609-Speed 10781.95 samples/sec   Loss 10.8362   LearningRate 0.0741   Epoch: 5   Global Step: 28210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:17,886-Speed 8031.35 samples/sec   Loss 10.7623   LearningRate 0.0740   Epoch: 5   Global Step: 28220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:19,205-Speed 7768.18 samples/sec   Loss 10.8294   LearningRate 0.0740   Epoch: 5   Global Step: 28230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:21,182-Speed 5184.77 samples/sec   Loss 10.8242   LearningRate 0.0740   Epoch: 5   Global Step: 28240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:22,396-Speed 8442.66 samples/sec   Loss 10.8162   LearningRate 0.0740   Epoch: 5   Global Step: 28250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:23,352-Speed 10713.33 samples/sec   Loss 10.9911   LearningRate 0.0740   Epoch: 5   Global Step: 28260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:24,345-Speed 10326.44 samples/sec   Loss 10.7907   LearningRate 0.0740   Epoch: 5   Global Step: 28270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:25,289-Speed 10850.44 samples/sec   Loss 10.7222   LearningRate 0.0740   Epoch: 5   Global Step: 28280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:26,236-Speed 10834.09 samples/sec   Loss 10.8254   LearningRate 0.0740   Epoch: 5   Global Step: 28290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:27,236-Speed 10242.11 samples/sec   Loss 10.6974   LearningRate 0.0740   Epoch: 5   Global Step: 28300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:28,205-Speed 10576.72 samples/sec   Loss 10.8180   LearningRate 0.0740   Epoch: 5   Global Step: 28310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:29,185-Speed 10466.55 samples/sec   Loss 10.7231   LearningRate 0.0740   Epoch: 5   Global Step: 28320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:30,171-Speed 10391.55 samples/sec   Loss 10.8690   LearningRate 0.0740   Epoch: 5   Global Step: 28330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:31,125-Speed 10743.49 samples/sec   Loss 10.9371   LearningRate 0.0739   Epoch: 5   Global Step: 28340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:32,098-Speed 10529.74 samples/sec   Loss 10.8607   LearningRate 0.0739   Epoch: 5   Global Step: 28350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:33,036-Speed 10928.62 samples/sec   Loss 10.6417   LearningRate 0.0739   Epoch: 5   Global Step: 28360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:34,047-Speed 10139.78 samples/sec   Loss 10.8748   LearningRate 0.0739   Epoch: 5   Global Step: 28370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:35,023-Speed 10506.90 samples/sec   Loss 10.7473   LearningRate 0.0739   Epoch: 5   Global Step: 28380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:35,992-Speed 10576.59 samples/sec   Loss 11.0389   LearningRate 0.0739   Epoch: 5   Global Step: 28390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:36,993-Speed 10240.39 samples/sec   Loss 10.8059   LearningRate 0.0739   Epoch: 5   Global Step: 28400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:38,011-Speed 10074.64 samples/sec   Loss 10.8632   LearningRate 0.0739   Epoch: 5   Global Step: 28410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:38,974-Speed 10636.64 samples/sec   Loss 10.7042   LearningRate 0.0739   Epoch: 5   Global Step: 28420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:39,945-Speed 10621.61 samples/sec   Loss 10.7706   LearningRate 0.0739   Epoch: 5   Global Step: 28430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:40,929-Speed 10416.94 samples/sec   Loss 10.7036   LearningRate 0.0739   Epoch: 5   Global Step: 28440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:41,909-Speed 10455.56 samples/sec   Loss 10.7401   LearningRate 0.0739   Epoch: 5   Global Step: 28450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:42,863-Speed 10744.27 samples/sec   Loss 10.9612   LearningRate 0.0738   Epoch: 5   Global Step: 28460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:43,792-Speed 11035.76 samples/sec   Loss 11.0001   LearningRate 0.0738   Epoch: 5   Global Step: 28470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 00:18:44,741-Speed 10805.79 samples/sec   Loss 10.6809   LearningRate 0.0738   Epoch: 5   Global Step: 28480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:18:45,702-Speed 10660.59 samples/sec   Loss 11.0740   LearningRate 0.0738   Epoch: 5   Global Step: 28490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:18:46,667-Speed 10622.54 samples/sec   Loss 10.7543   LearningRate 0.0738   Epoch: 5   Global Step: 28500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:18:47,643-Speed 10501.35 samples/sec   Loss 10.7322   LearningRate 0.0738   Epoch: 5   Global Step: 28510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:18:48,616-Speed 10549.53 samples/sec   Loss 10.8996   LearningRate 0.0738   Epoch: 5   Global Step: 28520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:18:49,595-Speed 10460.12 samples/sec   Loss 10.7299   LearningRate 0.0738   Epoch: 5   Global Step: 28530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:18:50,551-Speed 10725.15 samples/sec   Loss 10.8017   LearningRate 0.0738   Epoch: 5   Global Step: 28540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:18:51,536-Speed 10407.28 samples/sec   Loss 10.7381   LearningRate 0.0738   Epoch: 5   Global Step: 28550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 00:18:52,492-Speed 10720.50 samples/sec   Loss 10.7681   LearningRate 0.0738   Epoch: 5   Global Step: 28560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:18:53,454-Speed 10657.30 samples/sec   Loss 10.9088   LearningRate 0.0738   Epoch: 5   Global Step: 28570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:18:54,401-Speed 10814.37 samples/sec   Loss 10.7426   LearningRate 0.0737   Epoch: 5   Global Step: 28580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:18:55,381-Speed 10463.73 samples/sec   Loss 10.7934   LearningRate 0.0737   Epoch: 5   Global Step: 28590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:18:56,381-Speed 10253.74 samples/sec   Loss 10.8136   LearningRate 0.0737   Epoch: 5   Global Step: 28600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:18:57,344-Speed 10645.14 samples/sec   Loss 10.7522   LearningRate 0.0737   Epoch: 5   Global Step: 28610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:18:58,337-Speed 10318.18 samples/sec   Loss 10.8416   LearningRate 0.0737   Epoch: 5   Global Step: 28620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:18:59,313-Speed 10504.95 samples/sec   Loss 10.7793   LearningRate 0.0737   Epoch: 5   Global Step: 28630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:00,316-Speed 10219.53 samples/sec   Loss 10.6653   LearningRate 0.0737   Epoch: 5   Global Step: 28640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:01,315-Speed 10257.86 samples/sec   Loss 10.5539   LearningRate 0.0737   Epoch: 5   Global Step: 28650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:02,286-Speed 10548.71 samples/sec   Loss 10.7918   LearningRate 0.0737   Epoch: 5   Global Step: 28660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:03,320-Speed 9911.57 samples/sec   Loss 10.9847   LearningRate 0.0737   Epoch: 5   Global Step: 28670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:04,298-Speed 10490.50 samples/sec   Loss 10.9707   LearningRate 0.0737   Epoch: 5   Global Step: 28680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:05,260-Speed 10650.73 samples/sec   Loss 10.8095   LearningRate 0.0736   Epoch: 5   Global Step: 28690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:06,197-Speed 10943.20 samples/sec   Loss 10.6916   LearningRate 0.0736   Epoch: 5   Global Step: 28700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:07,171-Speed 10528.57 samples/sec   Loss 10.7423   LearningRate 0.0736   Epoch: 5   Global Step: 28710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:08,164-Speed 10318.72 samples/sec   Loss 10.7103   LearningRate 0.0736   Epoch: 5   Global Step: 28720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:09,110-Speed 10841.00 samples/sec   Loss 10.8606   LearningRate 0.0736   Epoch: 5   Global Step: 28730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:10,040-Speed 11021.20 samples/sec   Loss 10.6593   LearningRate 0.0736   Epoch: 5   Global Step: 28740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:11,054-Speed 10100.59 samples/sec   Loss 10.7699   LearningRate 0.0736   Epoch: 5   Global Step: 28750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:12,112-Speed 9697.81 samples/sec   Loss 10.7942   LearningRate 0.0736   Epoch: 5   Global Step: 28760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:13,077-Speed 10614.96 samples/sec   Loss 10.6674   LearningRate 0.0736   Epoch: 5   Global Step: 28770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:14,055-Speed 10480.84 samples/sec   Loss 10.8661   LearningRate 0.0736   Epoch: 5   Global Step: 28780   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:19:15,039-Speed 10421.56 samples/sec   Loss 10.6408   LearningRate 0.0736   Epoch: 5   Global Step: 28790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:16,062-Speed 10017.45 samples/sec   Loss 10.7242   LearningRate 0.0736   Epoch: 5   Global Step: 28800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:17,033-Speed 10550.84 samples/sec   Loss 10.7626   LearningRate 0.0735   Epoch: 5   Global Step: 28810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:18,001-Speed 10596.99 samples/sec   Loss 10.7849   LearningRate 0.0735   Epoch: 5   Global Step: 28820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:18,943-Speed 10873.01 samples/sec   Loss 10.6826   LearningRate 0.0735   Epoch: 5   Global Step: 28830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:19,909-Speed 10612.79 samples/sec   Loss 10.8415   LearningRate 0.0735   Epoch: 5   Global Step: 28840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:20,870-Speed 10661.26 samples/sec   Loss 10.7450   LearningRate 0.0735   Epoch: 5   Global Step: 28850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:21,865-Speed 10307.13 samples/sec   Loss 10.9308   LearningRate 0.0735   Epoch: 5   Global Step: 28860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:22,853-Speed 10378.23 samples/sec   Loss 10.7260   LearningRate 0.0735   Epoch: 5   Global Step: 28870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:23,837-Speed 10411.37 samples/sec   Loss 10.8484   LearningRate 0.0735   Epoch: 5   Global Step: 28880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:24,783-Speed 10840.09 samples/sec   Loss 10.7312   LearningRate 0.0735   Epoch: 5   Global Step: 28890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:25,738-Speed 10735.29 samples/sec   Loss 10.7824   LearningRate 0.0735   Epoch: 5   Global Step: 28900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:26,705-Speed 10595.20 samples/sec   Loss 10.8773   LearningRate 0.0735   Epoch: 5   Global Step: 28910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:27,712-Speed 10180.54 samples/sec   Loss 10.7862   LearningRate 0.0735   Epoch: 5   Global Step: 28920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:28,678-Speed 10610.22 samples/sec   Loss 10.8769   LearningRate 0.0734   Epoch: 5   Global Step: 28930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:29,657-Speed 10469.55 samples/sec   Loss 10.6506   LearningRate 0.0734   Epoch: 5   Global Step: 28940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:30,623-Speed 10610.15 samples/sec   Loss 10.9377   LearningRate 0.0734   Epoch: 5   Global Step: 28950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:31,648-Speed 10003.80 samples/sec   Loss 10.7878   LearningRate 0.0734   Epoch: 5   Global Step: 28960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:32,610-Speed 10658.90 samples/sec   Loss 10.9316   LearningRate 0.0734   Epoch: 5   Global Step: 28970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:33,572-Speed 10657.23 samples/sec   Loss 10.6838   LearningRate 0.0734   Epoch: 5   Global Step: 28980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:34,531-Speed 10690.14 samples/sec   Loss 10.6755   LearningRate 0.0734   Epoch: 5   Global Step: 28990   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:19:35,757-Speed 8352.96 samples/sec   Loss 10.6473   LearningRate 0.0734   Epoch: 5   Global Step: 29000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:36,741-Speed 10420.39 samples/sec   Loss 10.7145   LearningRate 0.0734   Epoch: 5   Global Step: 29010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:37,718-Speed 10486.16 samples/sec   Loss 10.8282   LearningRate 0.0734   Epoch: 5   Global Step: 29020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:38,678-Speed 10679.88 samples/sec   Loss 10.6255   LearningRate 0.0734   Epoch: 5   Global Step: 29030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:19:39,667-Speed 10373.43 samples/sec   Loss 10.7288   LearningRate 0.0734   Epoch: 5   Global Step: 29040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:19:40,665-Speed 10268.71 samples/sec   Loss 10.7613   LearningRate 0.0733   Epoch: 5   Global Step: 29050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:19:41,717-Speed 9739.07 samples/sec   Loss 10.6802   LearningRate 0.0733   Epoch: 5   Global Step: 29060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:19:42,716-Speed 10263.34 samples/sec   Loss 10.7752   LearningRate 0.0733   Epoch: 5   Global Step: 29070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:19:43,714-Speed 10269.32 samples/sec   Loss 10.7056   LearningRate 0.0733   Epoch: 5   Global Step: 29080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:19:44,676-Speed 10657.75 samples/sec   Loss 10.7526   LearningRate 0.0733   Epoch: 5   Global Step: 29090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:19:45,650-Speed 10520.64 samples/sec   Loss 10.8020   LearningRate 0.0733   Epoch: 5   Global Step: 29100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:19:46,657-Speed 10184.47 samples/sec   Loss 10.6838   LearningRate 0.0733   Epoch: 5   Global Step: 29110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:19:47,634-Speed 10486.79 samples/sec   Loss 10.6768   LearningRate 0.0733   Epoch: 5   Global Step: 29120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:19:48,614-Speed 10461.03 samples/sec   Loss 10.7882   LearningRate 0.0733   Epoch: 5   Global Step: 29130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:49,580-Speed 10606.04 samples/sec   Loss 10.6544   LearningRate 0.0733   Epoch: 5   Global Step: 29140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:50,599-Speed 10067.68 samples/sec   Loss 10.6884   LearningRate 0.0733   Epoch: 5   Global Step: 29150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:51,554-Speed 10735.86 samples/sec   Loss 10.6736   LearningRate 0.0733   Epoch: 5   Global Step: 29160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:52,528-Speed 10529.65 samples/sec   Loss 10.7973   LearningRate 0.0732   Epoch: 5   Global Step: 29170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:53,517-Speed 10366.56 samples/sec   Loss 10.9554   LearningRate 0.0732   Epoch: 5   Global Step: 29180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:54,516-Speed 10259.28 samples/sec   Loss 10.7089   LearningRate 0.0732   Epoch: 5   Global Step: 29190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:55,501-Speed 10407.98 samples/sec   Loss 10.5253   LearningRate 0.0732   Epoch: 5   Global Step: 29200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:56,474-Speed 10533.24 samples/sec   Loss 10.7265   LearningRate 0.0732   Epoch: 5   Global Step: 29210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:57,437-Speed 10646.67 samples/sec   Loss 10.8740   LearningRate 0.0732   Epoch: 5   Global Step: 29220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:58,411-Speed 10531.56 samples/sec   Loss 10.6336   LearningRate 0.0732   Epoch: 5   Global Step: 29230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:19:59,377-Speed 10609.07 samples/sec   Loss 10.8547   LearningRate 0.0732   Epoch: 5   Global Step: 29240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:00,353-Speed 10509.52 samples/sec   Loss 10.5779   LearningRate 0.0732   Epoch: 5   Global Step: 29250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:01,324-Speed 10547.13 samples/sec   Loss 10.6693   LearningRate 0.0732   Epoch: 5   Global Step: 29260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:02,304-Speed 10460.58 samples/sec   Loss 10.7509   LearningRate 0.0732   Epoch: 5   Global Step: 29270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:03,283-Speed 10496.10 samples/sec   Loss 10.7520   LearningRate 0.0732   Epoch: 5   Global Step: 29280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:04,232-Speed 10805.68 samples/sec   Loss 10.6652   LearningRate 0.0731   Epoch: 5   Global Step: 29290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:05,173-Speed 10888.96 samples/sec   Loss 10.6712   LearningRate 0.0731   Epoch: 5   Global Step: 29300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:06,166-Speed 10320.92 samples/sec   Loss 10.8665   LearningRate 0.0731   Epoch: 5   Global Step: 29310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:07,147-Speed 10452.22 samples/sec   Loss 10.6712   LearningRate 0.0731   Epoch: 5   Global Step: 29320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:08,129-Speed 10438.50 samples/sec   Loss 10.6263   LearningRate 0.0731   Epoch: 5   Global Step: 29330   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:20:09,108-Speed 10467.93 samples/sec   Loss 10.6260   LearningRate 0.0731   Epoch: 5   Global Step: 29340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:10,072-Speed 10638.60 samples/sec   Loss 10.7139   LearningRate 0.0731   Epoch: 5   Global Step: 29350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:20:11,044-Speed 10537.37 samples/sec   Loss 10.6321   LearningRate 0.0731   Epoch: 5   Global Step: 29360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:20:12,099-Speed 9726.63 samples/sec   Loss 10.5624   LearningRate 0.0731   Epoch: 5   Global Step: 29370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:20:13,078-Speed 10466.61 samples/sec   Loss 10.9340   LearningRate 0.0731   Epoch: 5   Global Step: 29380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:20:14,022-Speed 10861.53 samples/sec   Loss 10.7687   LearningRate 0.0731   Epoch: 5   Global Step: 29390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:20:15,002-Speed 10454.89 samples/sec   Loss 10.7308   LearningRate 0.0730   Epoch: 5   Global Step: 29400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:20:15,970-Speed 10594.12 samples/sec   Loss 10.8817   LearningRate 0.0730   Epoch: 5   Global Step: 29410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:20:16,972-Speed 10225.92 samples/sec   Loss 10.9012   LearningRate 0.0730   Epoch: 5   Global Step: 29420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:20:17,934-Speed 10662.29 samples/sec   Loss 10.8135   LearningRate 0.0730   Epoch: 5   Global Step: 29430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:20:18,917-Speed 10422.24 samples/sec   Loss 10.6608   LearningRate 0.0730   Epoch: 5   Global Step: 29440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:20:19,936-Speed 10055.00 samples/sec   Loss 10.6859   LearningRate 0.0730   Epoch: 5   Global Step: 29450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:20,910-Speed 10525.33 samples/sec   Loss 10.7288   LearningRate 0.0730   Epoch: 5   Global Step: 29460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:21,906-Speed 10291.67 samples/sec   Loss 10.5020   LearningRate 0.0730   Epoch: 5   Global Step: 29470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:22,866-Speed 10684.98 samples/sec   Loss 10.8152   LearningRate 0.0730   Epoch: 5   Global Step: 29480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:23,859-Speed 10310.72 samples/sec   Loss 10.7610   LearningRate 0.0730   Epoch: 5   Global Step: 29490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:24,816-Speed 10712.48 samples/sec   Loss 10.9091   LearningRate 0.0730   Epoch: 5   Global Step: 29500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:25,777-Speed 10667.46 samples/sec   Loss 10.7483   LearningRate 0.0730   Epoch: 5   Global Step: 29510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:26,761-Speed 10421.37 samples/sec   Loss 10.7507   LearningRate 0.0729   Epoch: 5   Global Step: 29520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:27,754-Speed 10325.07 samples/sec   Loss 10.7309   LearningRate 0.0729   Epoch: 5   Global Step: 29530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:28,722-Speed 10584.06 samples/sec   Loss 10.8674   LearningRate 0.0729   Epoch: 5   Global Step: 29540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:29,696-Speed 10525.15 samples/sec   Loss 10.8841   LearningRate 0.0729   Epoch: 5   Global Step: 29550   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:20:30,667-Speed 10555.37 samples/sec   Loss 10.7377   LearningRate 0.0729   Epoch: 5   Global Step: 29560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:31,632-Speed 10635.11 samples/sec   Loss 10.5772   LearningRate 0.0729   Epoch: 5   Global Step: 29570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:32,602-Speed 10567.69 samples/sec   Loss 10.5152   LearningRate 0.0729   Epoch: 5   Global Step: 29580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:33,598-Speed 10291.26 samples/sec   Loss 10.6339   LearningRate 0.0729   Epoch: 5   Global Step: 29590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:34,592-Speed 10305.48 samples/sec   Loss 10.6737   LearningRate 0.0729   Epoch: 5   Global Step: 29600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:35,544-Speed 10773.49 samples/sec   Loss 10.8374   LearningRate 0.0729   Epoch: 5   Global Step: 29610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:36,526-Speed 10436.04 samples/sec   Loss 10.7394   LearningRate 0.0729   Epoch: 5   Global Step: 29620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:20:37,475-Speed 10802.35 samples/sec   Loss 10.5762   LearningRate 0.0729   Epoch: 5   Global Step: 29630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:20:38,438-Speed 10643.70 samples/sec   Loss 10.7357   LearningRate 0.0728   Epoch: 5   Global Step: 29640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:20:39,381-Speed 10878.88 samples/sec   Loss 10.7841   LearningRate 0.0728   Epoch: 5   Global Step: 29650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:20:40,364-Speed 10427.91 samples/sec   Loss 10.8092   LearningRate 0.0728   Epoch: 5   Global Step: 29660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:20:41,348-Speed 10417.02 samples/sec   Loss 10.7310   LearningRate 0.0728   Epoch: 5   Global Step: 29670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:20:42,352-Speed 10210.98 samples/sec   Loss 10.8771   LearningRate 0.0728   Epoch: 5   Global Step: 29680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:20:43,317-Speed 10627.76 samples/sec   Loss 10.6360   LearningRate 0.0728   Epoch: 5   Global Step: 29690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:20:44,271-Speed 10745.89 samples/sec   Loss 10.5936   LearningRate 0.0728   Epoch: 5   Global Step: 29700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:20:45,242-Speed 10545.11 samples/sec   Loss 10.6448   LearningRate 0.0728   Epoch: 5   Global Step: 29710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:20:46,252-Speed 10149.59 samples/sec   Loss 10.8252   LearningRate 0.0728   Epoch: 5   Global Step: 29720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:47,257-Speed 10198.04 samples/sec   Loss 10.7009   LearningRate 0.0728   Epoch: 5   Global Step: 29730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:48,277-Speed 10048.24 samples/sec   Loss 10.7179   LearningRate 0.0728   Epoch: 5   Global Step: 29740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:49,326-Speed 9769.50 samples/sec   Loss 10.7994   LearningRate 0.0728   Epoch: 5   Global Step: 29750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:50,339-Speed 10124.01 samples/sec   Loss 10.5617   LearningRate 0.0727   Epoch: 5   Global Step: 29760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:51,305-Speed 10609.50 samples/sec   Loss 10.7560   LearningRate 0.0727   Epoch: 5   Global Step: 29770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:52,372-Speed 9599.85 samples/sec   Loss 10.8529   LearningRate 0.0727   Epoch: 5   Global Step: 29780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:53,305-Speed 10990.22 samples/sec   Loss 10.6482   LearningRate 0.0727   Epoch: 5   Global Step: 29790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:54,291-Speed 10396.04 samples/sec   Loss 10.8697   LearningRate 0.0727   Epoch: 5   Global Step: 29800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:55,240-Speed 10793.31 samples/sec   Loss 10.7886   LearningRate 0.0727   Epoch: 5   Global Step: 29810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:56,217-Speed 10494.47 samples/sec   Loss 10.5898   LearningRate 0.0727   Epoch: 5   Global Step: 29820   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:20:57,202-Speed 10404.58 samples/sec   Loss 10.6022   LearningRate 0.0727   Epoch: 5   Global Step: 29830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:58,145-Speed 10865.95 samples/sec   Loss 10.5560   LearningRate 0.0727   Epoch: 5   Global Step: 29840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:20:59,106-Speed 10666.31 samples/sec   Loss 10.6861   LearningRate 0.0727   Epoch: 5   Global Step: 29850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:21:00,051-Speed 10847.63 samples/sec   Loss 10.6925   LearningRate 0.0727   Epoch: 5   Global Step: 29860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:21:01,026-Speed 10508.36 samples/sec   Loss 10.6955   LearningRate 0.0727   Epoch: 5   Global Step: 29870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:21:01,993-Speed 10608.58 samples/sec   Loss 10.7120   LearningRate 0.0726   Epoch: 5   Global Step: 29880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:21:02,945-Speed 10757.54 samples/sec   Loss 10.4903   LearningRate 0.0726   Epoch: 5   Global Step: 29890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:21:03,894-Speed 10802.85 samples/sec   Loss 10.6057   LearningRate 0.0726   Epoch: 5   Global Step: 29900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:21:04,864-Speed 10574.33 samples/sec   Loss 10.6465   LearningRate 0.0726   Epoch: 5   Global Step: 29910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:21:05,869-Speed 10193.79 samples/sec   Loss 10.6504   LearningRate 0.0726   Epoch: 5   Global Step: 29920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:21:06,846-Speed 10495.20 samples/sec   Loss 10.7622   LearningRate 0.0726   Epoch: 5   Global Step: 29930   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:21:07,799-Speed 10748.04 samples/sec   Loss 10.9082   LearningRate 0.0726   Epoch: 5   Global Step: 29940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:21:08,765-Speed 10609.49 samples/sec   Loss 10.6032   LearningRate 0.0726   Epoch: 5   Global Step: 29950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:21:09,729-Speed 10640.35 samples/sec   Loss 10.6434   LearningRate 0.0726   Epoch: 5   Global Step: 29960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:21:10,676-Speed 10831.36 samples/sec   Loss 10.7054   LearningRate 0.0726   Epoch: 5   Global Step: 29970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:21:11,602-Speed 11063.30 samples/sec   Loss 10.6497   LearningRate 0.0726   Epoch: 5   Global Step: 29980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:21:12,572-Speed 10558.70 samples/sec   Loss 10.5526   LearningRate 0.0726   Epoch: 5   Global Step: 29990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:21:13,564-Speed 10340.83 samples/sec   Loss 10.5994   LearningRate 0.0725   Epoch: 5   Global Step: 30000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:21:35,792-[lfw][30000]XNorm: 14.148164
Training: 2022-04-11 00:21:35,792-[lfw][30000]Accuracy-Flip: 0.99517+-0.00337
Training: 2022-04-11 00:21:35,793-[lfw][30000]Accuracy-Highest: 0.99517
Training: 2022-04-11 00:22:01,415-[cfp_fp][30000]XNorm: 11.849483
Training: 2022-04-11 00:22:01,416-[cfp_fp][30000]Accuracy-Flip: 0.94271+-0.01527
Training: 2022-04-11 00:22:01,416-[cfp_fp][30000]Accuracy-Highest: 0.94271
Training: 2022-04-11 00:22:23,730-[agedb_30][30000]XNorm: 13.728994
Training: 2022-04-11 00:22:23,731-[agedb_30][30000]Accuracy-Flip: 0.95400+-0.01300
Training: 2022-04-11 00:22:23,732-[agedb_30][30000]Accuracy-Highest: 0.95400
Training: 2022-04-11 00:22:24,691-Speed 143.97 samples/sec   Loss 10.7666   LearningRate 0.0725   Epoch: 5   Global Step: 30010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:22:25,709-Speed 10069.39 samples/sec   Loss 10.6307   LearningRate 0.0725   Epoch: 5   Global Step: 30020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:22:26,688-Speed 10469.39 samples/sec   Loss 10.7270   LearningRate 0.0725   Epoch: 5   Global Step: 30030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:22:27,649-Speed 10664.98 samples/sec   Loss 10.7722   LearningRate 0.0725   Epoch: 5   Global Step: 30040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:22:28,578-Speed 11036.47 samples/sec   Loss 10.8148   LearningRate 0.0725   Epoch: 5   Global Step: 30050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:22:29,513-Speed 10966.74 samples/sec   Loss 10.8463   LearningRate 0.0725   Epoch: 5   Global Step: 30060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:22:30,475-Speed 10649.78 samples/sec   Loss 10.6939   LearningRate 0.0725   Epoch: 5   Global Step: 30070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:22:31,423-Speed 10821.94 samples/sec   Loss 10.7953   LearningRate 0.0725   Epoch: 5   Global Step: 30080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:22:32,353-Speed 11029.82 samples/sec   Loss 10.7098   LearningRate 0.0725   Epoch: 5   Global Step: 30090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:22:33,317-Speed 10628.31 samples/sec   Loss 10.8554   LearningRate 0.0725   Epoch: 5   Global Step: 30100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:22:34,282-Speed 10621.47 samples/sec   Loss 10.7216   LearningRate 0.0725   Epoch: 5   Global Step: 30110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:22:35,288-Speed 10188.75 samples/sec   Loss 10.9281   LearningRate 0.0724   Epoch: 5   Global Step: 30120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:22:36,249-Speed 10673.86 samples/sec   Loss 10.6068   LearningRate 0.0724   Epoch: 5   Global Step: 30130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:22:37,185-Speed 10944.76 samples/sec   Loss 10.6842   LearningRate 0.0724   Epoch: 5   Global Step: 30140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:22:38,179-Speed 10311.80 samples/sec   Loss 10.5951   LearningRate 0.0724   Epoch: 5   Global Step: 30150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:22:39,164-Speed 10409.80 samples/sec   Loss 10.6489   LearningRate 0.0724   Epoch: 5   Global Step: 30160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:22:40,130-Speed 10608.97 samples/sec   Loss 10.6835   LearningRate 0.0724   Epoch: 5   Global Step: 30170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:22:41,117-Speed 10391.28 samples/sec   Loss 10.5507   LearningRate 0.0724   Epoch: 5   Global Step: 30180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:22:42,102-Speed 10399.85 samples/sec   Loss 10.7218   LearningRate 0.0724   Epoch: 5   Global Step: 30190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:22:43,101-Speed 10267.06 samples/sec   Loss 10.7545   LearningRate 0.0724   Epoch: 5   Global Step: 30200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:22:44,084-Speed 10430.13 samples/sec   Loss 10.5881   LearningRate 0.0724   Epoch: 5   Global Step: 30210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:22:45,043-Speed 10686.30 samples/sec   Loss 10.6693   LearningRate 0.0724   Epoch: 5   Global Step: 30220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:22:46,000-Speed 10706.13 samples/sec   Loss 10.6657   LearningRate 0.0723   Epoch: 5   Global Step: 30230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:22:46,969-Speed 10579.57 samples/sec   Loss 10.6022   LearningRate 0.0723   Epoch: 5   Global Step: 30240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:22:47,956-Speed 10385.40 samples/sec   Loss 10.6765   LearningRate 0.0723   Epoch: 5   Global Step: 30250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:22:48,897-Speed 10885.19 samples/sec   Loss 10.7111   LearningRate 0.0723   Epoch: 5   Global Step: 30260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:22:49,881-Speed 10427.75 samples/sec   Loss 10.6672   LearningRate 0.0723   Epoch: 5   Global Step: 30270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:22:50,844-Speed 10641.77 samples/sec   Loss 10.7173   LearningRate 0.0723   Epoch: 5   Global Step: 30280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:22:51,803-Speed 10696.68 samples/sec   Loss 10.6712   LearningRate 0.0723   Epoch: 5   Global Step: 30290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:22:52,781-Speed 10479.15 samples/sec   Loss 10.6689   LearningRate 0.0723   Epoch: 5   Global Step: 30300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:22:53,784-Speed 10222.57 samples/sec   Loss 10.7349   LearningRate 0.0723   Epoch: 5   Global Step: 30310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:22:54,759-Speed 10510.76 samples/sec   Loss 10.5939   LearningRate 0.0723   Epoch: 5   Global Step: 30320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:22:55,711-Speed 10767.19 samples/sec   Loss 10.8948   LearningRate 0.0723   Epoch: 5   Global Step: 30330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:22:56,797-Speed 9442.14 samples/sec   Loss 10.8165   LearningRate 0.0723   Epoch: 5   Global Step: 30340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:23:06,480-Speed 1057.64 samples/sec   Loss 10.4917   LearningRate 0.0722   Epoch: 6   Global Step: 30350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:23:07,563-Speed 9469.29 samples/sec   Loss 9.7445   LearningRate 0.0722   Epoch: 6   Global Step: 30360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:23:08,526-Speed 10642.72 samples/sec   Loss 9.6707   LearningRate 0.0722   Epoch: 6   Global Step: 30370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:09,513-Speed 10393.25 samples/sec   Loss 9.7005   LearningRate 0.0722   Epoch: 6   Global Step: 30380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:10,551-Speed 9867.40 samples/sec   Loss 9.6041   LearningRate 0.0722   Epoch: 6   Global Step: 30390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:11,710-Speed 8841.15 samples/sec   Loss 9.7063   LearningRate 0.0722   Epoch: 6   Global Step: 30400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:13,053-Speed 7630.59 samples/sec   Loss 9.6918   LearningRate 0.0722   Epoch: 6   Global Step: 30410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:14,023-Speed 10562.73 samples/sec   Loss 9.7202   LearningRate 0.0722   Epoch: 6   Global Step: 30420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:15,071-Speed 9779.89 samples/sec   Loss 9.6480   LearningRate 0.0722   Epoch: 6   Global Step: 30430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:16,042-Speed 10563.93 samples/sec   Loss 9.8484   LearningRate 0.0722   Epoch: 6   Global Step: 30440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:17,019-Speed 10486.64 samples/sec   Loss 9.8092   LearningRate 0.0722   Epoch: 6   Global Step: 30450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:18,007-Speed 10377.26 samples/sec   Loss 9.8986   LearningRate 0.0722   Epoch: 6   Global Step: 30460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:19,030-Speed 10015.58 samples/sec   Loss 9.8152   LearningRate 0.0721   Epoch: 6   Global Step: 30470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:23:20,172-Speed 8970.97 samples/sec   Loss 9.8426   LearningRate 0.0721   Epoch: 6   Global Step: 30480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:23:21,185-Speed 10121.32 samples/sec   Loss 9.7734   LearningRate 0.0721   Epoch: 6   Global Step: 30490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:23:22,120-Speed 10966.69 samples/sec   Loss 9.9110   LearningRate 0.0721   Epoch: 6   Global Step: 30500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:23:23,131-Speed 10138.78 samples/sec   Loss 9.9852   LearningRate 0.0721   Epoch: 6   Global Step: 30510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:24,088-Speed 10713.56 samples/sec   Loss 9.8062   LearningRate 0.0721   Epoch: 6   Global Step: 30520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:25,070-Speed 10432.90 samples/sec   Loss 9.9201   LearningRate 0.0721   Epoch: 6   Global Step: 30530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:26,020-Speed 10786.81 samples/sec   Loss 9.9992   LearningRate 0.0721   Epoch: 6   Global Step: 30540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:26,970-Speed 10787.81 samples/sec   Loss 9.8113   LearningRate 0.0721   Epoch: 6   Global Step: 30550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:27,923-Speed 10754.33 samples/sec   Loss 10.0471   LearningRate 0.0721   Epoch: 6   Global Step: 30560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:29,048-Speed 9115.27 samples/sec   Loss 9.9443   LearningRate 0.0721   Epoch: 6   Global Step: 30570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:30,104-Speed 9706.91 samples/sec   Loss 9.9426   LearningRate 0.0721   Epoch: 6   Global Step: 30580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:31,053-Speed 10796.33 samples/sec   Loss 9.9854   LearningRate 0.0720   Epoch: 6   Global Step: 30590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:32,147-Speed 9363.12 samples/sec   Loss 9.9649   LearningRate 0.0720   Epoch: 6   Global Step: 30600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:33,099-Speed 10779.23 samples/sec   Loss 9.8577   LearningRate 0.0720   Epoch: 6   Global Step: 30610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:23:34,048-Speed 10794.79 samples/sec   Loss 9.8157   LearningRate 0.0720   Epoch: 6   Global Step: 30620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:23:35,023-Speed 10513.96 samples/sec   Loss 10.0434   LearningRate 0.0720   Epoch: 6   Global Step: 30630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:23:35,952-Speed 11034.03 samples/sec   Loss 10.0188   LearningRate 0.0720   Epoch: 6   Global Step: 30640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:23:37,009-Speed 9705.89 samples/sec   Loss 9.9224   LearningRate 0.0720   Epoch: 6   Global Step: 30650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:37,998-Speed 10365.90 samples/sec   Loss 10.0756   LearningRate 0.0720   Epoch: 6   Global Step: 30660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:38,972-Speed 10528.75 samples/sec   Loss 9.9219   LearningRate 0.0720   Epoch: 6   Global Step: 30670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:39,961-Speed 10354.81 samples/sec   Loss 9.8728   LearningRate 0.0720   Epoch: 6   Global Step: 30680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:40,960-Speed 10273.59 samples/sec   Loss 10.0281   LearningRate 0.0720   Epoch: 6   Global Step: 30690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:41,906-Speed 10837.41 samples/sec   Loss 10.1133   LearningRate 0.0720   Epoch: 6   Global Step: 30700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:42,891-Speed 10406.07 samples/sec   Loss 9.8547   LearningRate 0.0719   Epoch: 6   Global Step: 30710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:43,870-Speed 10469.25 samples/sec   Loss 10.0906   LearningRate 0.0719   Epoch: 6   Global Step: 30720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:44,888-Speed 10073.59 samples/sec   Loss 9.9187   LearningRate 0.0719   Epoch: 6   Global Step: 30730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:45,846-Speed 10699.90 samples/sec   Loss 10.0150   LearningRate 0.0719   Epoch: 6   Global Step: 30740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:46,816-Speed 10565.01 samples/sec   Loss 9.9326   LearningRate 0.0719   Epoch: 6   Global Step: 30750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:23:48,197-Speed 7417.59 samples/sec   Loss 10.0021   LearningRate 0.0719   Epoch: 6   Global Step: 30760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:23:49,201-Speed 10212.73 samples/sec   Loss 10.1885   LearningRate 0.0719   Epoch: 6   Global Step: 30770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:50,207-Speed 10182.00 samples/sec   Loss 10.1694   LearningRate 0.0719   Epoch: 6   Global Step: 30780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:51,155-Speed 10815.41 samples/sec   Loss 10.0398   LearningRate 0.0719   Epoch: 6   Global Step: 30790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:52,192-Speed 9882.84 samples/sec   Loss 10.1568   LearningRate 0.0719   Epoch: 6   Global Step: 30800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:53,160-Speed 10591.14 samples/sec   Loss 10.2990   LearningRate 0.0719   Epoch: 6   Global Step: 30810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:54,149-Speed 10362.51 samples/sec   Loss 10.1026   LearningRate 0.0719   Epoch: 6   Global Step: 30820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:55,126-Speed 10487.59 samples/sec   Loss 10.1576   LearningRate 0.0718   Epoch: 6   Global Step: 30830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:56,099-Speed 10534.49 samples/sec   Loss 10.2046   LearningRate 0.0718   Epoch: 6   Global Step: 30840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:57,145-Speed 9800.66 samples/sec   Loss 10.1745   LearningRate 0.0718   Epoch: 6   Global Step: 30850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:58,134-Speed 10362.48 samples/sec   Loss 10.1540   LearningRate 0.0718   Epoch: 6   Global Step: 30860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:23:59,118-Speed 10418.37 samples/sec   Loss 10.0395   LearningRate 0.0718   Epoch: 6   Global Step: 30870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:00,282-Speed 8802.34 samples/sec   Loss 10.2683   LearningRate 0.0718   Epoch: 6   Global Step: 30880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:01,318-Speed 9893.99 samples/sec   Loss 10.3447   LearningRate 0.0718   Epoch: 6   Global Step: 30890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:02,294-Speed 10517.52 samples/sec   Loss 10.0737   LearningRate 0.0718   Epoch: 6   Global Step: 30900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:03,269-Speed 10514.41 samples/sec   Loss 10.0993   LearningRate 0.0718   Epoch: 6   Global Step: 30910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:04,265-Speed 10286.78 samples/sec   Loss 10.2203   LearningRate 0.0718   Epoch: 6   Global Step: 30920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:05,299-Speed 9905.60 samples/sec   Loss 9.9696   LearningRate 0.0718   Epoch: 6   Global Step: 30930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:06,312-Speed 10122.49 samples/sec   Loss 10.2552   LearningRate 0.0718   Epoch: 6   Global Step: 30940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:07,312-Speed 10259.90 samples/sec   Loss 10.2266   LearningRate 0.0717   Epoch: 6   Global Step: 30950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:08,416-Speed 9275.94 samples/sec   Loss 10.2256   LearningRate 0.0717   Epoch: 6   Global Step: 30960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:09,387-Speed 10561.24 samples/sec   Loss 10.1897   LearningRate 0.0717   Epoch: 6   Global Step: 30970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:10,389-Speed 10226.83 samples/sec   Loss 10.2498   LearningRate 0.0717   Epoch: 6   Global Step: 30980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:11,377-Speed 10373.05 samples/sec   Loss 10.1702   LearningRate 0.0717   Epoch: 6   Global Step: 30990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:12,340-Speed 10651.39 samples/sec   Loss 10.1597   LearningRate 0.0717   Epoch: 6   Global Step: 31000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:13,288-Speed 10804.86 samples/sec   Loss 10.1727   LearningRate 0.0717   Epoch: 6   Global Step: 31010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:14,308-Speed 10053.77 samples/sec   Loss 10.2825   LearningRate 0.0717   Epoch: 6   Global Step: 31020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:15,252-Speed 10863.74 samples/sec   Loss 10.2246   LearningRate 0.0717   Epoch: 6   Global Step: 31030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:16,230-Speed 10484.57 samples/sec   Loss 10.1922   LearningRate 0.0717   Epoch: 6   Global Step: 31040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:17,344-Speed 9214.22 samples/sec   Loss 10.1870   LearningRate 0.0717   Epoch: 6   Global Step: 31050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:18,339-Speed 10297.28 samples/sec   Loss 10.2331   LearningRate 0.0717   Epoch: 6   Global Step: 31060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:19,305-Speed 10606.52 samples/sec   Loss 10.1995   LearningRate 0.0716   Epoch: 6   Global Step: 31070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:20,298-Speed 10327.14 samples/sec   Loss 10.1915   LearningRate 0.0716   Epoch: 6   Global Step: 31080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:21,317-Speed 10055.67 samples/sec   Loss 10.2003   LearningRate 0.0716   Epoch: 6   Global Step: 31090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:22,274-Speed 10710.31 samples/sec   Loss 10.1531   LearningRate 0.0716   Epoch: 6   Global Step: 31100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:23,230-Speed 10728.64 samples/sec   Loss 10.2758   LearningRate 0.0716   Epoch: 6   Global Step: 31110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:24,244-Speed 10103.87 samples/sec   Loss 10.1167   LearningRate 0.0716   Epoch: 6   Global Step: 31120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:25,204-Speed 10678.08 samples/sec   Loss 10.3545   LearningRate 0.0716   Epoch: 6   Global Step: 31130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:26,164-Speed 10683.00 samples/sec   Loss 10.2049   LearningRate 0.0716   Epoch: 6   Global Step: 31140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:27,100-Speed 10946.15 samples/sec   Loss 10.1708   LearningRate 0.0716   Epoch: 6   Global Step: 31150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:28,113-Speed 10124.98 samples/sec   Loss 10.3403   LearningRate 0.0716   Epoch: 6   Global Step: 31160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:29,106-Speed 10316.56 samples/sec   Loss 10.4102   LearningRate 0.0716   Epoch: 6   Global Step: 31170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:30,093-Speed 10391.62 samples/sec   Loss 10.3516   LearningRate 0.0716   Epoch: 6   Global Step: 31180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:31,049-Speed 10717.64 samples/sec   Loss 10.3126   LearningRate 0.0715   Epoch: 6   Global Step: 31190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:32,071-Speed 10030.95 samples/sec   Loss 10.2736   LearningRate 0.0715   Epoch: 6   Global Step: 31200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:33,076-Speed 10201.61 samples/sec   Loss 10.2305   LearningRate 0.0715   Epoch: 6   Global Step: 31210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:34,063-Speed 10380.68 samples/sec   Loss 10.3582   LearningRate 0.0715   Epoch: 6   Global Step: 31220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:35,085-Speed 10028.63 samples/sec   Loss 10.2557   LearningRate 0.0715   Epoch: 6   Global Step: 31230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:36,069-Speed 10414.96 samples/sec   Loss 10.2234   LearningRate 0.0715   Epoch: 6   Global Step: 31240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:37,034-Speed 10620.28 samples/sec   Loss 10.3445   LearningRate 0.0715   Epoch: 6   Global Step: 31250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:37,998-Speed 10645.08 samples/sec   Loss 10.3767   LearningRate 0.0715   Epoch: 6   Global Step: 31260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:38,938-Speed 10895.04 samples/sec   Loss 10.3252   LearningRate 0.0715   Epoch: 6   Global Step: 31270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:39,949-Speed 10139.99 samples/sec   Loss 10.1144   LearningRate 0.0715   Epoch: 6   Global Step: 31280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:40,951-Speed 10224.96 samples/sec   Loss 10.2347   LearningRate 0.0715   Epoch: 6   Global Step: 31290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:41,939-Speed 10379.14 samples/sec   Loss 10.4016   LearningRate 0.0715   Epoch: 6   Global Step: 31300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:42,902-Speed 10647.65 samples/sec   Loss 10.2616   LearningRate 0.0714   Epoch: 6   Global Step: 31310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:24:43,844-Speed 10879.23 samples/sec   Loss 10.1893   LearningRate 0.0714   Epoch: 6   Global Step: 31320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:44,846-Speed 10228.90 samples/sec   Loss 10.4996   LearningRate 0.0714   Epoch: 6   Global Step: 31330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:45,844-Speed 10266.38 samples/sec   Loss 10.2951   LearningRate 0.0714   Epoch: 6   Global Step: 31340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:46,822-Speed 10487.65 samples/sec   Loss 10.3642   LearningRate 0.0714   Epoch: 6   Global Step: 31350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:47,776-Speed 10738.63 samples/sec   Loss 10.3227   LearningRate 0.0714   Epoch: 6   Global Step: 31360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:48,737-Speed 10673.06 samples/sec   Loss 10.2341   LearningRate 0.0714   Epoch: 6   Global Step: 31370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:49,721-Speed 10414.20 samples/sec   Loss 10.3062   LearningRate 0.0714   Epoch: 6   Global Step: 31380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:50,687-Speed 10605.32 samples/sec   Loss 10.3499   LearningRate 0.0714   Epoch: 6   Global Step: 31390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:51,641-Speed 10748.32 samples/sec   Loss 10.2050   LearningRate 0.0714   Epoch: 6   Global Step: 31400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:52,620-Speed 10464.86 samples/sec   Loss 10.1044   LearningRate 0.0714   Epoch: 6   Global Step: 31410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:53,645-Speed 9996.64 samples/sec   Loss 10.3635   LearningRate 0.0714   Epoch: 6   Global Step: 31420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:54,592-Speed 10834.97 samples/sec   Loss 10.2417   LearningRate 0.0713   Epoch: 6   Global Step: 31430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:55,539-Speed 10824.13 samples/sec   Loss 10.1652   LearningRate 0.0713   Epoch: 6   Global Step: 31440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:56,493-Speed 10739.29 samples/sec   Loss 10.1723   LearningRate 0.0713   Epoch: 6   Global Step: 31450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:57,409-Speed 11187.67 samples/sec   Loss 10.3067   LearningRate 0.0713   Epoch: 6   Global Step: 31460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:58,358-Speed 10797.11 samples/sec   Loss 10.3015   LearningRate 0.0713   Epoch: 6   Global Step: 31470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:24:59,300-Speed 10890.30 samples/sec   Loss 10.2040   LearningRate 0.0713   Epoch: 6   Global Step: 31480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:00,268-Speed 10582.79 samples/sec   Loss 10.4241   LearningRate 0.0713   Epoch: 6   Global Step: 31490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:01,203-Speed 10967.54 samples/sec   Loss 10.1008   LearningRate 0.0713   Epoch: 6   Global Step: 31500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:02,150-Speed 10828.95 samples/sec   Loss 10.6102   LearningRate 0.0713   Epoch: 6   Global Step: 31510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:03,120-Speed 10568.31 samples/sec   Loss 10.3006   LearningRate 0.0713   Epoch: 6   Global Step: 31520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:04,083-Speed 10649.28 samples/sec   Loss 10.4359   LearningRate 0.0713   Epoch: 6   Global Step: 31530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:05,046-Speed 10643.69 samples/sec   Loss 10.2803   LearningRate 0.0713   Epoch: 6   Global Step: 31540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:06,031-Speed 10402.03 samples/sec   Loss 10.2931   LearningRate 0.0712   Epoch: 6   Global Step: 31550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:07,003-Speed 10540.69 samples/sec   Loss 10.3570   LearningRate 0.0712   Epoch: 6   Global Step: 31560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:07,988-Speed 10401.69 samples/sec   Loss 10.4293   LearningRate 0.0712   Epoch: 6   Global Step: 31570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:08,967-Speed 10475.38 samples/sec   Loss 10.4620   LearningRate 0.0712   Epoch: 6   Global Step: 31580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:09,953-Speed 10384.28 samples/sec   Loss 10.3141   LearningRate 0.0712   Epoch: 6   Global Step: 31590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:10,921-Speed 10593.47 samples/sec   Loss 10.1388   LearningRate 0.0712   Epoch: 6   Global Step: 31600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:11,882-Speed 10669.48 samples/sec   Loss 10.2282   LearningRate 0.0712   Epoch: 6   Global Step: 31610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:12,863-Speed 10447.65 samples/sec   Loss 10.2095   LearningRate 0.0712   Epoch: 6   Global Step: 31620   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:25:13,840-Speed 10494.88 samples/sec   Loss 10.2469   LearningRate 0.0712   Epoch: 6   Global Step: 31630   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:25:14,806-Speed 10616.35 samples/sec   Loss 10.2406   LearningRate 0.0712   Epoch: 6   Global Step: 31640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:15,767-Speed 10660.85 samples/sec   Loss 10.2139   LearningRate 0.0712   Epoch: 6   Global Step: 31650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:16,725-Speed 10693.59 samples/sec   Loss 10.2661   LearningRate 0.0712   Epoch: 6   Global Step: 31660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:17,743-Speed 10073.31 samples/sec   Loss 10.3311   LearningRate 0.0711   Epoch: 6   Global Step: 31670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:18,689-Speed 10833.22 samples/sec   Loss 10.2075   LearningRate 0.0711   Epoch: 6   Global Step: 31680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:19,662-Speed 10527.54 samples/sec   Loss 10.3072   LearningRate 0.0711   Epoch: 6   Global Step: 31690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:20,618-Speed 10729.54 samples/sec   Loss 10.2505   LearningRate 0.0711   Epoch: 6   Global Step: 31700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:21,590-Speed 10535.32 samples/sec   Loss 10.2118   LearningRate 0.0711   Epoch: 6   Global Step: 31710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:22,603-Speed 10124.40 samples/sec   Loss 10.3388   LearningRate 0.0711   Epoch: 6   Global Step: 31720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:23,563-Speed 10681.63 samples/sec   Loss 10.2527   LearningRate 0.0711   Epoch: 6   Global Step: 31730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:24,512-Speed 10809.27 samples/sec   Loss 10.3465   LearningRate 0.0711   Epoch: 6   Global Step: 31740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:25,479-Speed 10595.24 samples/sec   Loss 10.2659   LearningRate 0.0711   Epoch: 6   Global Step: 31750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:26,463-Speed 10414.27 samples/sec   Loss 10.4114   LearningRate 0.0711   Epoch: 6   Global Step: 31760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:27,492-Speed 9959.20 samples/sec   Loss 10.3244   LearningRate 0.0711   Epoch: 6   Global Step: 31770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:28,455-Speed 10643.75 samples/sec   Loss 10.3550   LearningRate 0.0711   Epoch: 6   Global Step: 31780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:29,409-Speed 10746.91 samples/sec   Loss 10.3714   LearningRate 0.0710   Epoch: 6   Global Step: 31790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:30,340-Speed 11013.43 samples/sec   Loss 10.2278   LearningRate 0.0710   Epoch: 6   Global Step: 31800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:31,306-Speed 10611.12 samples/sec   Loss 10.5145   LearningRate 0.0710   Epoch: 6   Global Step: 31810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:32,287-Speed 10445.64 samples/sec   Loss 10.4586   LearningRate 0.0710   Epoch: 6   Global Step: 31820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:33,265-Speed 10490.40 samples/sec   Loss 10.2343   LearningRate 0.0710   Epoch: 6   Global Step: 31830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:34,214-Speed 10793.95 samples/sec   Loss 10.4162   LearningRate 0.0710   Epoch: 6   Global Step: 31840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:35,183-Speed 10570.86 samples/sec   Loss 10.3085   LearningRate 0.0710   Epoch: 6   Global Step: 31850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:36,161-Speed 10480.86 samples/sec   Loss 10.3527   LearningRate 0.0710   Epoch: 6   Global Step: 31860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:37,132-Speed 10564.29 samples/sec   Loss 10.4361   LearningRate 0.0710   Epoch: 6   Global Step: 31870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:38,095-Speed 10645.91 samples/sec   Loss 10.4388   LearningRate 0.0710   Epoch: 6   Global Step: 31880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:39,044-Speed 10795.91 samples/sec   Loss 10.3340   LearningRate 0.0710   Epoch: 6   Global Step: 31890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:40,013-Speed 10577.69 samples/sec   Loss 10.2340   LearningRate 0.0710   Epoch: 6   Global Step: 31900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:41,018-Speed 10199.07 samples/sec   Loss 10.3402   LearningRate 0.0709   Epoch: 6   Global Step: 31910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:41,983-Speed 10618.72 samples/sec   Loss 10.4681   LearningRate 0.0709   Epoch: 6   Global Step: 31920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:42,971-Speed 10376.15 samples/sec   Loss 10.3917   LearningRate 0.0709   Epoch: 6   Global Step: 31930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:25:43,925-Speed 10743.32 samples/sec   Loss 10.4095   LearningRate 0.0709   Epoch: 6   Global Step: 31940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:25:44,910-Speed 10406.16 samples/sec   Loss 10.5335   LearningRate 0.0709   Epoch: 6   Global Step: 31950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:25:45,842-Speed 10999.65 samples/sec   Loss 10.3279   LearningRate 0.0709   Epoch: 6   Global Step: 31960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:25:46,836-Speed 10306.12 samples/sec   Loss 10.4771   LearningRate 0.0709   Epoch: 6   Global Step: 31970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:25:47,815-Speed 10475.78 samples/sec   Loss 10.4007   LearningRate 0.0709   Epoch: 6   Global Step: 31980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:25:48,825-Speed 10143.70 samples/sec   Loss 10.3256   LearningRate 0.0709   Epoch: 6   Global Step: 31990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:25:49,788-Speed 10651.01 samples/sec   Loss 10.2751   LearningRate 0.0709   Epoch: 6   Global Step: 32000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:26:12,357-[lfw][32000]XNorm: 13.916358
Training: 2022-04-11 00:26:12,358-[lfw][32000]Accuracy-Flip: 0.99350+-0.00376
Training: 2022-04-11 00:26:12,358-[lfw][32000]Accuracy-Highest: 0.99517
Training: 2022-04-11 00:26:38,014-[cfp_fp][32000]XNorm: 11.734932
Training: 2022-04-11 00:26:38,015-[cfp_fp][32000]Accuracy-Flip: 0.93971+-0.01329
Training: 2022-04-11 00:26:38,015-[cfp_fp][32000]Accuracy-Highest: 0.94271
Training: 2022-04-11 00:27:00,437-[agedb_30][32000]XNorm: 13.497513
Training: 2022-04-11 00:27:00,437-[agedb_30][32000]Accuracy-Flip: 0.94983+-0.00979
Training: 2022-04-11 00:27:00,439-[agedb_30][32000]Accuracy-Highest: 0.95400
Training: 2022-04-11 00:27:01,366-Speed 143.06 samples/sec   Loss 10.3349   LearningRate 0.0709   Epoch: 6   Global Step: 32010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:02,343-Speed 10484.03 samples/sec   Loss 10.3843   LearningRate 0.0709   Epoch: 6   Global Step: 32020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:03,305-Speed 10664.36 samples/sec   Loss 10.3029   LearningRate 0.0708   Epoch: 6   Global Step: 32030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:04,277-Speed 10538.34 samples/sec   Loss 10.2761   LearningRate 0.0708   Epoch: 6   Global Step: 32040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:27:05,236-Speed 10687.26 samples/sec   Loss 10.4471   LearningRate 0.0708   Epoch: 6   Global Step: 32050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:27:06,195-Speed 10702.38 samples/sec   Loss 10.3709   LearningRate 0.0708   Epoch: 6   Global Step: 32060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:27:07,152-Speed 10718.93 samples/sec   Loss 10.6668   LearningRate 0.0708   Epoch: 6   Global Step: 32070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:27:08,174-Speed 10037.00 samples/sec   Loss 10.2822   LearningRate 0.0708   Epoch: 6   Global Step: 32080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:27:09,125-Speed 10801.59 samples/sec   Loss 10.4320   LearningRate 0.0708   Epoch: 6   Global Step: 32090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:27:10,064-Speed 10912.36 samples/sec   Loss 10.3538   LearningRate 0.0708   Epoch: 6   Global Step: 32100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:11,011-Speed 10824.27 samples/sec   Loss 10.3783   LearningRate 0.0708   Epoch: 6   Global Step: 32110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:12,008-Speed 10281.96 samples/sec   Loss 10.3734   LearningRate 0.0708   Epoch: 6   Global Step: 32120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:12,970-Speed 10649.87 samples/sec   Loss 10.3591   LearningRate 0.0708   Epoch: 6   Global Step: 32130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:13,956-Speed 10399.50 samples/sec   Loss 10.5541   LearningRate 0.0708   Epoch: 6   Global Step: 32140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:14,944-Speed 10372.50 samples/sec   Loss 10.3985   LearningRate 0.0707   Epoch: 6   Global Step: 32150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:15,904-Speed 10677.50 samples/sec   Loss 10.2830   LearningRate 0.0707   Epoch: 6   Global Step: 32160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:16,885-Speed 10440.18 samples/sec   Loss 10.3831   LearningRate 0.0707   Epoch: 6   Global Step: 32170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:17,827-Speed 10877.57 samples/sec   Loss 10.3355   LearningRate 0.0707   Epoch: 6   Global Step: 32180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:18,812-Speed 10412.54 samples/sec   Loss 10.4280   LearningRate 0.0707   Epoch: 6   Global Step: 32190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:19,780-Speed 10582.29 samples/sec   Loss 10.4673   LearningRate 0.0707   Epoch: 6   Global Step: 32200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:20,761-Speed 10446.09 samples/sec   Loss 10.2867   LearningRate 0.0707   Epoch: 6   Global Step: 32210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:21,730-Speed 10577.09 samples/sec   Loss 10.4300   LearningRate 0.0707   Epoch: 6   Global Step: 32220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:22,685-Speed 10738.60 samples/sec   Loss 10.5092   LearningRate 0.0707   Epoch: 6   Global Step: 32230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:23,636-Speed 10775.22 samples/sec   Loss 10.4829   LearningRate 0.0707   Epoch: 6   Global Step: 32240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:24,622-Speed 10404.36 samples/sec   Loss 10.5102   LearningRate 0.0707   Epoch: 6   Global Step: 32250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:25,580-Speed 10697.62 samples/sec   Loss 10.4486   LearningRate 0.0707   Epoch: 6   Global Step: 32260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:26,560-Speed 10458.42 samples/sec   Loss 10.1710   LearningRate 0.0706   Epoch: 6   Global Step: 32270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:27,501-Speed 10893.39 samples/sec   Loss 10.3365   LearningRate 0.0706   Epoch: 6   Global Step: 32280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:28,460-Speed 10676.90 samples/sec   Loss 10.4489   LearningRate 0.0706   Epoch: 6   Global Step: 32290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:29,456-Speed 10306.73 samples/sec   Loss 10.2091   LearningRate 0.0706   Epoch: 6   Global Step: 32300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:27:30,413-Speed 10720.59 samples/sec   Loss 10.4397   LearningRate 0.0706   Epoch: 6   Global Step: 32310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:27:31,366-Speed 10751.91 samples/sec   Loss 10.4592   LearningRate 0.0706   Epoch: 6   Global Step: 32320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:27:32,312-Speed 10833.20 samples/sec   Loss 10.4068   LearningRate 0.0706   Epoch: 6   Global Step: 32330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 00:27:33,281-Speed 10582.16 samples/sec   Loss 10.4576   LearningRate 0.0706   Epoch: 6   Global Step: 32340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 00:27:34,290-Speed 10154.08 samples/sec   Loss 10.4906   LearningRate 0.0706   Epoch: 6   Global Step: 32350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 00:27:35,248-Speed 10700.70 samples/sec   Loss 10.3955   LearningRate 0.0706   Epoch: 6   Global Step: 32360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 00:27:36,217-Speed 10578.52 samples/sec   Loss 10.4460   LearningRate 0.0706   Epoch: 6   Global Step: 32370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 00:27:37,186-Speed 10575.74 samples/sec   Loss 10.3995   LearningRate 0.0706   Epoch: 6   Global Step: 32380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 00:27:38,147-Speed 10663.83 samples/sec   Loss 10.4231   LearningRate 0.0705   Epoch: 6   Global Step: 32390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 00:27:39,106-Speed 10693.08 samples/sec   Loss 10.4866   LearningRate 0.0705   Epoch: 6   Global Step: 32400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 00:27:40,068-Speed 10655.09 samples/sec   Loss 10.3547   LearningRate 0.0705   Epoch: 6   Global Step: 32410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 00:27:41,060-Speed 10333.60 samples/sec   Loss 10.3296   LearningRate 0.0705   Epoch: 6   Global Step: 32420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 00:27:42,007-Speed 10818.18 samples/sec   Loss 10.6465   LearningRate 0.0705   Epoch: 6   Global Step: 32430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:42,948-Speed 10894.76 samples/sec   Loss 10.3338   LearningRate 0.0705   Epoch: 6   Global Step: 32440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:43,928-Speed 10466.25 samples/sec   Loss 10.3907   LearningRate 0.0705   Epoch: 6   Global Step: 32450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:44,901-Speed 10535.40 samples/sec   Loss 10.4616   LearningRate 0.0705   Epoch: 6   Global Step: 32460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:45,841-Speed 10905.68 samples/sec   Loss 10.4251   LearningRate 0.0705   Epoch: 6   Global Step: 32470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:46,805-Speed 10629.85 samples/sec   Loss 10.3946   LearningRate 0.0705   Epoch: 6   Global Step: 32480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:47,798-Speed 10346.69 samples/sec   Loss 10.5622   LearningRate 0.0705   Epoch: 6   Global Step: 32490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:48,714-Speed 11178.11 samples/sec   Loss 10.4081   LearningRate 0.0705   Epoch: 6   Global Step: 32500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:49,702-Speed 10373.84 samples/sec   Loss 10.4488   LearningRate 0.0704   Epoch: 6   Global Step: 32510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:50,657-Speed 10736.09 samples/sec   Loss 10.3181   LearningRate 0.0704   Epoch: 6   Global Step: 32520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:27:51,659-Speed 10233.19 samples/sec   Loss 10.3815   LearningRate 0.0704   Epoch: 6   Global Step: 32530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:27:52,617-Speed 10712.86 samples/sec   Loss 10.4851   LearningRate 0.0704   Epoch: 6   Global Step: 32540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:27:53,621-Speed 10212.49 samples/sec   Loss 10.3468   LearningRate 0.0704   Epoch: 6   Global Step: 32550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:27:54,583-Speed 10648.64 samples/sec   Loss 10.3350   LearningRate 0.0704   Epoch: 6   Global Step: 32560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:27:55,563-Speed 10457.46 samples/sec   Loss 10.3523   LearningRate 0.0704   Epoch: 6   Global Step: 32570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:27:56,529-Speed 10609.00 samples/sec   Loss 10.4690   LearningRate 0.0704   Epoch: 6   Global Step: 32580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:27:57,501-Speed 10539.13 samples/sec   Loss 10.2472   LearningRate 0.0704   Epoch: 6   Global Step: 32590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:27:58,493-Speed 10338.77 samples/sec   Loss 10.2872   LearningRate 0.0704   Epoch: 6   Global Step: 32600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:27:59,489-Speed 10285.63 samples/sec   Loss 10.4315   LearningRate 0.0704   Epoch: 6   Global Step: 32610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:00,438-Speed 10825.49 samples/sec   Loss 10.3002   LearningRate 0.0704   Epoch: 6   Global Step: 32620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:01,397-Speed 10682.81 samples/sec   Loss 10.2305   LearningRate 0.0703   Epoch: 6   Global Step: 32630   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:28:02,365-Speed 10579.78 samples/sec   Loss 10.3437   LearningRate 0.0703   Epoch: 6   Global Step: 32640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:03,320-Speed 10741.78 samples/sec   Loss 10.4109   LearningRate 0.0703   Epoch: 6   Global Step: 32650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:04,300-Speed 10454.06 samples/sec   Loss 10.4446   LearningRate 0.0703   Epoch: 6   Global Step: 32660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:05,282-Speed 10438.79 samples/sec   Loss 10.4433   LearningRate 0.0703   Epoch: 6   Global Step: 32670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:06,261-Speed 10468.85 samples/sec   Loss 10.2071   LearningRate 0.0703   Epoch: 6   Global Step: 32680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:07,242-Speed 10459.81 samples/sec   Loss 10.5229   LearningRate 0.0703   Epoch: 6   Global Step: 32690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:08,233-Speed 10337.92 samples/sec   Loss 10.4120   LearningRate 0.0703   Epoch: 6   Global Step: 32700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:09,196-Speed 10639.08 samples/sec   Loss 10.5112   LearningRate 0.0703   Epoch: 6   Global Step: 32710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:10,195-Speed 10270.85 samples/sec   Loss 10.3959   LearningRate 0.0703   Epoch: 6   Global Step: 32720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:11,168-Speed 10529.84 samples/sec   Loss 10.4350   LearningRate 0.0703   Epoch: 6   Global Step: 32730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:12,133-Speed 10623.20 samples/sec   Loss 10.3056   LearningRate 0.0703   Epoch: 6   Global Step: 32740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:13,063-Speed 11027.53 samples/sec   Loss 10.3970   LearningRate 0.0702   Epoch: 6   Global Step: 32750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:14,013-Speed 10798.57 samples/sec   Loss 10.3434   LearningRate 0.0702   Epoch: 6   Global Step: 32760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:14,966-Speed 10749.47 samples/sec   Loss 10.4059   LearningRate 0.0702   Epoch: 6   Global Step: 32770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:15,913-Speed 10820.52 samples/sec   Loss 10.6879   LearningRate 0.0702   Epoch: 6   Global Step: 32780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:16,896-Speed 10425.62 samples/sec   Loss 10.3221   LearningRate 0.0702   Epoch: 6   Global Step: 32790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:17,922-Speed 9990.40 samples/sec   Loss 10.4927   LearningRate 0.0702   Epoch: 6   Global Step: 32800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:18,874-Speed 10769.82 samples/sec   Loss 10.3618   LearningRate 0.0702   Epoch: 6   Global Step: 32810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:19,870-Speed 10294.84 samples/sec   Loss 10.3611   LearningRate 0.0702   Epoch: 6   Global Step: 32820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:20,840-Speed 10565.42 samples/sec   Loss 10.5329   LearningRate 0.0702   Epoch: 6   Global Step: 32830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:21,822-Speed 10435.81 samples/sec   Loss 10.2623   LearningRate 0.0702   Epoch: 6   Global Step: 32840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:22,823-Speed 10240.52 samples/sec   Loss 10.4428   LearningRate 0.0702   Epoch: 6   Global Step: 32850   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:28:23,762-Speed 10915.78 samples/sec   Loss 10.4350   LearningRate 0.0702   Epoch: 6   Global Step: 32860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:24,709-Speed 10825.61 samples/sec   Loss 10.5431   LearningRate 0.0701   Epoch: 6   Global Step: 32870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:25,694-Speed 10410.51 samples/sec   Loss 10.6219   LearningRate 0.0701   Epoch: 6   Global Step: 32880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:26,722-Speed 9962.37 samples/sec   Loss 10.4488   LearningRate 0.0701   Epoch: 6   Global Step: 32890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:27,708-Speed 10407.78 samples/sec   Loss 10.4967   LearningRate 0.0701   Epoch: 6   Global Step: 32900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:28,703-Speed 10297.30 samples/sec   Loss 10.2702   LearningRate 0.0701   Epoch: 6   Global Step: 32910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:29,643-Speed 10912.66 samples/sec   Loss 10.4269   LearningRate 0.0701   Epoch: 6   Global Step: 32920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:30,654-Speed 10164.27 samples/sec   Loss 10.4188   LearningRate 0.0701   Epoch: 6   Global Step: 32930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:31,637-Speed 10423.44 samples/sec   Loss 10.3441   LearningRate 0.0701   Epoch: 6   Global Step: 32940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:32,603-Speed 10614.43 samples/sec   Loss 10.3764   LearningRate 0.0701   Epoch: 6   Global Step: 32950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:33,541-Speed 10922.04 samples/sec   Loss 10.5070   LearningRate 0.0701   Epoch: 6   Global Step: 32960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:34,522-Speed 10452.21 samples/sec   Loss 10.2929   LearningRate 0.0701   Epoch: 6   Global Step: 32970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:35,486-Speed 10637.11 samples/sec   Loss 10.3391   LearningRate 0.0701   Epoch: 6   Global Step: 32980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:36,457-Speed 10549.66 samples/sec   Loss 10.3974   LearningRate 0.0700   Epoch: 6   Global Step: 32990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:37,427-Speed 10569.64 samples/sec   Loss 10.3652   LearningRate 0.0700   Epoch: 6   Global Step: 33000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:38,417-Speed 10348.59 samples/sec   Loss 10.2888   LearningRate 0.0700   Epoch: 6   Global Step: 33010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:39,383-Speed 10618.26 samples/sec   Loss 10.3690   LearningRate 0.0700   Epoch: 6   Global Step: 33020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:40,361-Speed 10475.49 samples/sec   Loss 10.1630   LearningRate 0.0700   Epoch: 6   Global Step: 33030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:41,319-Speed 10696.37 samples/sec   Loss 10.4351   LearningRate 0.0700   Epoch: 6   Global Step: 33040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:42,285-Speed 10616.09 samples/sec   Loss 10.3081   LearningRate 0.0700   Epoch: 6   Global Step: 33050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:43,298-Speed 10120.80 samples/sec   Loss 10.4131   LearningRate 0.0700   Epoch: 6   Global Step: 33060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:44,269-Speed 10555.61 samples/sec   Loss 10.4184   LearningRate 0.0700   Epoch: 6   Global Step: 33070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:45,231-Speed 10654.52 samples/sec   Loss 10.2257   LearningRate 0.0700   Epoch: 6   Global Step: 33080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:46,181-Speed 10786.33 samples/sec   Loss 10.4100   LearningRate 0.0700   Epoch: 6   Global Step: 33090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:47,123-Speed 10883.29 samples/sec   Loss 10.6398   LearningRate 0.0700   Epoch: 6   Global Step: 33100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:48,103-Speed 10455.08 samples/sec   Loss 10.4875   LearningRate 0.0699   Epoch: 6   Global Step: 33110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:49,084-Speed 10466.63 samples/sec   Loss 10.4153   LearningRate 0.0699   Epoch: 6   Global Step: 33120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:50,024-Speed 10908.63 samples/sec   Loss 10.4388   LearningRate 0.0699   Epoch: 6   Global Step: 33130   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:28:50,974-Speed 10784.97 samples/sec   Loss 10.4348   LearningRate 0.0699   Epoch: 6   Global Step: 33140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:51,940-Speed 10615.96 samples/sec   Loss 10.4041   LearningRate 0.0699   Epoch: 6   Global Step: 33150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:52,923-Speed 10440.21 samples/sec   Loss 10.5143   LearningRate 0.0699   Epoch: 6   Global Step: 33160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:53,875-Speed 10766.27 samples/sec   Loss 10.5087   LearningRate 0.0699   Epoch: 6   Global Step: 33170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:28:54,896-Speed 10031.79 samples/sec   Loss 10.4643   LearningRate 0.0699   Epoch: 6   Global Step: 33180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:55,836-Speed 10901.77 samples/sec   Loss 10.3826   LearningRate 0.0699   Epoch: 6   Global Step: 33190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:56,783-Speed 10827.28 samples/sec   Loss 10.4077   LearningRate 0.0699   Epoch: 6   Global Step: 33200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:57,734-Speed 10784.06 samples/sec   Loss 10.5574   LearningRate 0.0699   Epoch: 6   Global Step: 33210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:58,682-Speed 10813.10 samples/sec   Loss 10.4933   LearningRate 0.0699   Epoch: 6   Global Step: 33220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:28:59,676-Speed 10303.92 samples/sec   Loss 10.4136   LearningRate 0.0698   Epoch: 6   Global Step: 33230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:29:00,654-Speed 10483.32 samples/sec   Loss 10.1961   LearningRate 0.0698   Epoch: 6   Global Step: 33240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:29:01,625-Speed 10558.68 samples/sec   Loss 10.3310   LearningRate 0.0698   Epoch: 6   Global Step: 33250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:29:02,587-Speed 10661.07 samples/sec   Loss 10.3578   LearningRate 0.0698   Epoch: 6   Global Step: 33260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:29:03,551-Speed 10626.93 samples/sec   Loss 10.5498   LearningRate 0.0698   Epoch: 6   Global Step: 33270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:29:04,534-Speed 10430.06 samples/sec   Loss 10.2683   LearningRate 0.0698   Epoch: 6   Global Step: 33280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:05,482-Speed 10802.93 samples/sec   Loss 10.4513   LearningRate 0.0698   Epoch: 6   Global Step: 33290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:29:06,442-Speed 10675.13 samples/sec   Loss 10.6898   LearningRate 0.0698   Epoch: 6   Global Step: 33300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:29:07,454-Speed 10127.95 samples/sec   Loss 10.4862   LearningRate 0.0698   Epoch: 6   Global Step: 33310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:29:08,445-Speed 10345.16 samples/sec   Loss 10.1846   LearningRate 0.0698   Epoch: 6   Global Step: 33320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:29:09,443-Speed 10269.47 samples/sec   Loss 10.3636   LearningRate 0.0698   Epoch: 6   Global Step: 33330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:29:10,401-Speed 10700.92 samples/sec   Loss 10.5421   LearningRate 0.0698   Epoch: 6   Global Step: 33340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:29:11,386-Speed 10406.58 samples/sec   Loss 10.6266   LearningRate 0.0697   Epoch: 6   Global Step: 33350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:29:12,325-Speed 10920.83 samples/sec   Loss 10.3894   LearningRate 0.0697   Epoch: 6   Global Step: 33360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:29:13,279-Speed 10732.80 samples/sec   Loss 10.6262   LearningRate 0.0697   Epoch: 6   Global Step: 33370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:29:14,250-Speed 10556.51 samples/sec   Loss 10.3116   LearningRate 0.0697   Epoch: 6   Global Step: 33380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:29:15,214-Speed 10636.34 samples/sec   Loss 10.2096   LearningRate 0.0697   Epoch: 6   Global Step: 33390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:16,182-Speed 10582.32 samples/sec   Loss 10.5411   LearningRate 0.0697   Epoch: 6   Global Step: 33400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:17,149-Speed 10602.68 samples/sec   Loss 10.3898   LearningRate 0.0697   Epoch: 6   Global Step: 33410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:18,117-Speed 10592.19 samples/sec   Loss 10.2315   LearningRate 0.0697   Epoch: 6   Global Step: 33420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:19,071-Speed 10743.20 samples/sec   Loss 10.3346   LearningRate 0.0697   Epoch: 6   Global Step: 33430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:20,061-Speed 10352.72 samples/sec   Loss 10.4365   LearningRate 0.0697   Epoch: 6   Global Step: 33440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:21,020-Speed 10685.32 samples/sec   Loss 10.4816   LearningRate 0.0697   Epoch: 6   Global Step: 33450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:21,993-Speed 10524.58 samples/sec   Loss 10.2612   LearningRate 0.0697   Epoch: 6   Global Step: 33460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:22,952-Speed 10697.16 samples/sec   Loss 10.3869   LearningRate 0.0697   Epoch: 6   Global Step: 33470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:23,893-Speed 10887.91 samples/sec   Loss 10.2924   LearningRate 0.0696   Epoch: 6   Global Step: 33480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:24,866-Speed 10535.50 samples/sec   Loss 10.4838   LearningRate 0.0696   Epoch: 6   Global Step: 33490   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:29:25,840-Speed 10515.74 samples/sec   Loss 10.4723   LearningRate 0.0696   Epoch: 6   Global Step: 33500   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:29:26,783-Speed 10865.15 samples/sec   Loss 10.0324   LearningRate 0.0696   Epoch: 6   Global Step: 33510   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:29:27,707-Speed 11100.70 samples/sec   Loss 10.4713   LearningRate 0.0696   Epoch: 6   Global Step: 33520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:28,659-Speed 10766.14 samples/sec   Loss 10.3315   LearningRate 0.0696   Epoch: 6   Global Step: 33530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:29,615-Speed 10720.94 samples/sec   Loss 10.3917   LearningRate 0.0696   Epoch: 6   Global Step: 33540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:30,550-Speed 10950.12 samples/sec   Loss 10.2544   LearningRate 0.0696   Epoch: 6   Global Step: 33550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:31,608-Speed 9701.58 samples/sec   Loss 10.2470   LearningRate 0.0696   Epoch: 6   Global Step: 33560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:32,563-Speed 10731.12 samples/sec   Loss 10.3765   LearningRate 0.0696   Epoch: 6   Global Step: 33570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:33,469-Speed 11315.67 samples/sec   Loss 10.4989   LearningRate 0.0696   Epoch: 6   Global Step: 33580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:34,449-Speed 10465.27 samples/sec   Loss 10.5515   LearningRate 0.0696   Epoch: 6   Global Step: 33590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:35,454-Speed 10190.70 samples/sec   Loss 10.4054   LearningRate 0.0695   Epoch: 6   Global Step: 33600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:36,457-Speed 10219.90 samples/sec   Loss 10.4553   LearningRate 0.0695   Epoch: 6   Global Step: 33610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:37,423-Speed 10611.43 samples/sec   Loss 10.3029   LearningRate 0.0695   Epoch: 6   Global Step: 33620   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:29:38,397-Speed 10535.40 samples/sec   Loss 10.4562   LearningRate 0.0695   Epoch: 6   Global Step: 33630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:39,357-Speed 10674.43 samples/sec   Loss 10.2891   LearningRate 0.0695   Epoch: 6   Global Step: 33640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:40,328-Speed 10554.78 samples/sec   Loss 10.3862   LearningRate 0.0695   Epoch: 6   Global Step: 33650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:41,308-Speed 10454.53 samples/sec   Loss 10.6078   LearningRate 0.0695   Epoch: 6   Global Step: 33660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:42,271-Speed 10642.82 samples/sec   Loss 10.3704   LearningRate 0.0695   Epoch: 6   Global Step: 33670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:43,224-Speed 10759.59 samples/sec   Loss 10.3785   LearningRate 0.0695   Epoch: 6   Global Step: 33680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:44,195-Speed 10554.25 samples/sec   Loss 10.3661   LearningRate 0.0695   Epoch: 6   Global Step: 33690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:45,168-Speed 10536.80 samples/sec   Loss 10.4159   LearningRate 0.0695   Epoch: 6   Global Step: 33700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:46,135-Speed 10590.22 samples/sec   Loss 10.4230   LearningRate 0.0695   Epoch: 6   Global Step: 33710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:47,059-Speed 11102.65 samples/sec   Loss 10.1511   LearningRate 0.0694   Epoch: 6   Global Step: 33720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:48,021-Speed 10646.97 samples/sec   Loss 10.4018   LearningRate 0.0694   Epoch: 6   Global Step: 33730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:49,003-Speed 10439.62 samples/sec   Loss 10.2957   LearningRate 0.0694   Epoch: 6   Global Step: 33740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:49,979-Speed 10503.50 samples/sec   Loss 10.4847   LearningRate 0.0694   Epoch: 6   Global Step: 33750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:50,910-Speed 11017.79 samples/sec   Loss 10.3067   LearningRate 0.0694   Epoch: 6   Global Step: 33760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:51,858-Speed 10803.91 samples/sec   Loss 10.3686   LearningRate 0.0694   Epoch: 6   Global Step: 33770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:52,837-Speed 10473.03 samples/sec   Loss 10.5533   LearningRate 0.0694   Epoch: 6   Global Step: 33780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:53,789-Speed 10760.69 samples/sec   Loss 10.2918   LearningRate 0.0694   Epoch: 6   Global Step: 33790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:54,784-Speed 10303.94 samples/sec   Loss 10.3846   LearningRate 0.0694   Epoch: 6   Global Step: 33800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:55,732-Speed 10809.79 samples/sec   Loss 10.3508   LearningRate 0.0694   Epoch: 6   Global Step: 33810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:56,677-Speed 10842.40 samples/sec   Loss 10.5005   LearningRate 0.0694   Epoch: 6   Global Step: 33820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:57,632-Speed 10726.95 samples/sec   Loss 10.2332   LearningRate 0.0694   Epoch: 6   Global Step: 33830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:58,604-Speed 10548.34 samples/sec   Loss 10.2317   LearningRate 0.0693   Epoch: 6   Global Step: 33840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:29:59,605-Speed 10236.31 samples/sec   Loss 10.2020   LearningRate 0.0693   Epoch: 6   Global Step: 33850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:30:00,567-Speed 10661.97 samples/sec   Loss 10.3432   LearningRate 0.0693   Epoch: 6   Global Step: 33860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:30:01,538-Speed 10552.42 samples/sec   Loss 10.3792   LearningRate 0.0693   Epoch: 6   Global Step: 33870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:30:02,493-Speed 10729.25 samples/sec   Loss 10.4040   LearningRate 0.0693   Epoch: 6   Global Step: 33880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:30:03,471-Speed 10482.18 samples/sec   Loss 10.3875   LearningRate 0.0693   Epoch: 6   Global Step: 33890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:30:04,460-Speed 10364.03 samples/sec   Loss 10.3751   LearningRate 0.0693   Epoch: 6   Global Step: 33900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:30:05,409-Speed 10804.06 samples/sec   Loss 10.3393   LearningRate 0.0693   Epoch: 6   Global Step: 33910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:30:06,366-Speed 10713.68 samples/sec   Loss 10.2732   LearningRate 0.0693   Epoch: 6   Global Step: 33920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:30:07,325-Speed 10693.64 samples/sec   Loss 10.3862   LearningRate 0.0693   Epoch: 6   Global Step: 33930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:30:08,298-Speed 10526.71 samples/sec   Loss 10.2166   LearningRate 0.0693   Epoch: 6   Global Step: 33940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:30:09,298-Speed 10253.70 samples/sec   Loss 10.4314   LearningRate 0.0693   Epoch: 6   Global Step: 33950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:30:10,270-Speed 10552.58 samples/sec   Loss 10.1298   LearningRate 0.0692   Epoch: 6   Global Step: 33960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:30:11,236-Speed 10610.33 samples/sec   Loss 10.3645   LearningRate 0.0692   Epoch: 6   Global Step: 33970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:30:12,191-Speed 10729.57 samples/sec   Loss 10.3290   LearningRate 0.0692   Epoch: 6   Global Step: 33980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:30:13,141-Speed 10782.81 samples/sec   Loss 10.5502   LearningRate 0.0692   Epoch: 6   Global Step: 33990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:30:14,117-Speed 10498.86 samples/sec   Loss 10.4134   LearningRate 0.0692   Epoch: 6   Global Step: 34000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:30:36,727-[lfw][34000]XNorm: 14.047067
Training: 2022-04-11 00:30:36,728-[lfw][34000]Accuracy-Flip: 0.99500+-0.00357
Training: 2022-04-11 00:30:36,728-[lfw][34000]Accuracy-Highest: 0.99517
Training: 2022-04-11 00:31:02,078-[cfp_fp][34000]XNorm: 11.864357
Training: 2022-04-11 00:31:02,079-[cfp_fp][34000]Accuracy-Flip: 0.94686+-0.01124
Training: 2022-04-11 00:31:02,079-[cfp_fp][34000]Accuracy-Highest: 0.94686
Training: 2022-04-11 00:31:23,997-[agedb_30][34000]XNorm: 13.713110
Training: 2022-04-11 00:31:23,998-[agedb_30][34000]Accuracy-Flip: 0.95350+-0.00967
Training: 2022-04-11 00:31:23,998-[agedb_30][34000]Accuracy-Highest: 0.95400
Training: 2022-04-11 00:31:24,987-Speed 144.49 samples/sec   Loss 10.3233   LearningRate 0.0692   Epoch: 6   Global Step: 34010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:25,912-Speed 11080.72 samples/sec   Loss 10.3897   LearningRate 0.0692   Epoch: 6   Global Step: 34020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:26,854-Speed 10877.02 samples/sec   Loss 10.4314   LearningRate 0.0692   Epoch: 6   Global Step: 34030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:27,820-Speed 10602.09 samples/sec   Loss 10.5103   LearningRate 0.0692   Epoch: 6   Global Step: 34040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:28,787-Speed 10609.12 samples/sec   Loss 10.4267   LearningRate 0.0692   Epoch: 6   Global Step: 34050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:29,727-Speed 10909.06 samples/sec   Loss 10.3937   LearningRate 0.0692   Epoch: 6   Global Step: 34060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:30,699-Speed 10539.52 samples/sec   Loss 10.4893   LearningRate 0.0692   Epoch: 6   Global Step: 34070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:31,678-Speed 10476.29 samples/sec   Loss 10.3262   LearningRate 0.0691   Epoch: 6   Global Step: 34080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:32,673-Speed 10299.75 samples/sec   Loss 10.4659   LearningRate 0.0691   Epoch: 6   Global Step: 34090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:31:33,663-Speed 10358.16 samples/sec   Loss 10.3867   LearningRate 0.0691   Epoch: 6   Global Step: 34100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:31:34,631-Speed 10585.73 samples/sec   Loss 10.4140   LearningRate 0.0691   Epoch: 6   Global Step: 34110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:31:35,582-Speed 10781.80 samples/sec   Loss 10.4835   LearningRate 0.0691   Epoch: 6   Global Step: 34120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:31:36,553-Speed 10556.37 samples/sec   Loss 10.3131   LearningRate 0.0691   Epoch: 6   Global Step: 34130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:37,545-Speed 10333.17 samples/sec   Loss 10.4611   LearningRate 0.0691   Epoch: 6   Global Step: 34140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:38,515-Speed 10570.00 samples/sec   Loss 10.3194   LearningRate 0.0691   Epoch: 6   Global Step: 34150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:39,486-Speed 10551.76 samples/sec   Loss 10.5097   LearningRate 0.0691   Epoch: 6   Global Step: 34160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:40,463-Speed 10489.10 samples/sec   Loss 10.4815   LearningRate 0.0691   Epoch: 6   Global Step: 34170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:41,437-Speed 10524.60 samples/sec   Loss 10.3910   LearningRate 0.0691   Epoch: 6   Global Step: 34180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:42,426-Speed 10364.44 samples/sec   Loss 10.4849   LearningRate 0.0691   Epoch: 6   Global Step: 34190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:43,388-Speed 10665.88 samples/sec   Loss 10.1827   LearningRate 0.0690   Epoch: 6   Global Step: 34200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:44,352-Speed 10633.19 samples/sec   Loss 10.3348   LearningRate 0.0690   Epoch: 6   Global Step: 34210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:45,304-Speed 10758.49 samples/sec   Loss 10.2434   LearningRate 0.0690   Epoch: 6   Global Step: 34220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:46,253-Speed 10798.09 samples/sec   Loss 10.3749   LearningRate 0.0690   Epoch: 6   Global Step: 34230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:47,219-Speed 10744.28 samples/sec   Loss 10.2008   LearningRate 0.0690   Epoch: 6   Global Step: 34240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:48,217-Speed 10273.96 samples/sec   Loss 10.3148   LearningRate 0.0690   Epoch: 6   Global Step: 34250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:49,195-Speed 10486.82 samples/sec   Loss 10.2540   LearningRate 0.0690   Epoch: 6   Global Step: 34260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:50,144-Speed 10794.17 samples/sec   Loss 10.3308   LearningRate 0.0690   Epoch: 6   Global Step: 34270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:51,103-Speed 10689.83 samples/sec   Loss 10.2429   LearningRate 0.0690   Epoch: 6   Global Step: 34280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:52,077-Speed 10528.76 samples/sec   Loss 10.4349   LearningRate 0.0690   Epoch: 6   Global Step: 34290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:53,062-Speed 10404.78 samples/sec   Loss 10.1458   LearningRate 0.0690   Epoch: 6   Global Step: 34300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:54,053-Speed 10354.95 samples/sec   Loss 10.3695   LearningRate 0.0690   Epoch: 6   Global Step: 34310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:55,003-Speed 10783.62 samples/sec   Loss 10.2946   LearningRate 0.0690   Epoch: 6   Global Step: 34320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:31:55,955-Speed 10763.02 samples/sec   Loss 10.4044   LearningRate 0.0689   Epoch: 6   Global Step: 34330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:31:56,973-Speed 10068.34 samples/sec   Loss 10.3802   LearningRate 0.0689   Epoch: 6   Global Step: 34340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:31:57,945-Speed 10551.46 samples/sec   Loss 10.3352   LearningRate 0.0689   Epoch: 6   Global Step: 34350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:31:58,890-Speed 10851.33 samples/sec   Loss 10.4524   LearningRate 0.0689   Epoch: 6   Global Step: 34360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:31:59,869-Speed 10471.35 samples/sec   Loss 10.2530   LearningRate 0.0689   Epoch: 6   Global Step: 34370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:00,834-Speed 10615.90 samples/sec   Loss 10.5472   LearningRate 0.0689   Epoch: 6   Global Step: 34380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:01,785-Speed 10785.44 samples/sec   Loss 10.4499   LearningRate 0.0689   Epoch: 6   Global Step: 34390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:02,773-Speed 10376.84 samples/sec   Loss 10.4258   LearningRate 0.0689   Epoch: 6   Global Step: 34400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:03,715-Speed 10872.63 samples/sec   Loss 10.5188   LearningRate 0.0689   Epoch: 6   Global Step: 34410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:04,697-Speed 10441.97 samples/sec   Loss 10.3149   LearningRate 0.0689   Epoch: 6   Global Step: 34420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:05,666-Speed 10580.67 samples/sec   Loss 10.4384   LearningRate 0.0689   Epoch: 6   Global Step: 34430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:06,651-Speed 10407.90 samples/sec   Loss 10.3868   LearningRate 0.0689   Epoch: 6   Global Step: 34440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:07,619-Speed 10583.99 samples/sec   Loss 10.2840   LearningRate 0.0688   Epoch: 6   Global Step: 34450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:08,594-Speed 10512.41 samples/sec   Loss 10.4059   LearningRate 0.0688   Epoch: 6   Global Step: 34460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:09,580-Speed 10398.02 samples/sec   Loss 10.3071   LearningRate 0.0688   Epoch: 6   Global Step: 34470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:10,541-Speed 10658.37 samples/sec   Loss 10.4919   LearningRate 0.0688   Epoch: 6   Global Step: 34480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:11,542-Speed 10249.52 samples/sec   Loss 10.1936   LearningRate 0.0688   Epoch: 6   Global Step: 34490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:12,509-Speed 10599.61 samples/sec   Loss 10.1863   LearningRate 0.0688   Epoch: 6   Global Step: 34500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:13,472-Speed 10650.41 samples/sec   Loss 10.2652   LearningRate 0.0688   Epoch: 6   Global Step: 34510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:14,433-Speed 10671.87 samples/sec   Loss 10.3655   LearningRate 0.0688   Epoch: 6   Global Step: 34520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:15,389-Speed 10716.18 samples/sec   Loss 10.3944   LearningRate 0.0688   Epoch: 6   Global Step: 34530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:16,387-Speed 10269.24 samples/sec   Loss 10.4767   LearningRate 0.0688   Epoch: 6   Global Step: 34540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:17,366-Speed 10475.74 samples/sec   Loss 10.3537   LearningRate 0.0688   Epoch: 6   Global Step: 34550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:18,329-Speed 10632.90 samples/sec   Loss 10.3492   LearningRate 0.0688   Epoch: 6   Global Step: 34560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:19,303-Speed 10531.98 samples/sec   Loss 10.3317   LearningRate 0.0687   Epoch: 6   Global Step: 34570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:20,321-Speed 10066.39 samples/sec   Loss 10.3072   LearningRate 0.0687   Epoch: 6   Global Step: 34580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:21,293-Speed 10545.90 samples/sec   Loss 10.3295   LearningRate 0.0687   Epoch: 6   Global Step: 34590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:22,252-Speed 10691.66 samples/sec   Loss 10.3026   LearningRate 0.0687   Epoch: 6   Global Step: 34600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:23,218-Speed 10612.59 samples/sec   Loss 10.2869   LearningRate 0.0687   Epoch: 6   Global Step: 34610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:24,210-Speed 10331.11 samples/sec   Loss 10.2972   LearningRate 0.0687   Epoch: 6   Global Step: 34620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:25,163-Speed 10757.14 samples/sec   Loss 10.3231   LearningRate 0.0687   Epoch: 6   Global Step: 34630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:26,111-Speed 10818.91 samples/sec   Loss 10.3378   LearningRate 0.0687   Epoch: 6   Global Step: 34640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:27,096-Speed 10398.02 samples/sec   Loss 10.3555   LearningRate 0.0687   Epoch: 6   Global Step: 34650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:28,058-Speed 10661.14 samples/sec   Loss 10.3839   LearningRate 0.0687   Epoch: 6   Global Step: 34660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:29,128-Speed 9578.37 samples/sec   Loss 10.4385   LearningRate 0.0687   Epoch: 6   Global Step: 34670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:30,100-Speed 10554.22 samples/sec   Loss 10.3133   LearningRate 0.0687   Epoch: 6   Global Step: 34680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:31,037-Speed 10938.70 samples/sec   Loss 10.3966   LearningRate 0.0686   Epoch: 6   Global Step: 34690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:31,986-Speed 10799.05 samples/sec   Loss 10.3521   LearningRate 0.0686   Epoch: 6   Global Step: 34700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:32,955-Speed 10580.49 samples/sec   Loss 10.2727   LearningRate 0.0686   Epoch: 6   Global Step: 34710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:33,927-Speed 10548.89 samples/sec   Loss 10.2812   LearningRate 0.0686   Epoch: 6   Global Step: 34720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:34,892-Speed 10617.88 samples/sec   Loss 10.4155   LearningRate 0.0686   Epoch: 6   Global Step: 34730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:35,825-Speed 10996.54 samples/sec   Loss 10.3310   LearningRate 0.0686   Epoch: 6   Global Step: 34740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:36,796-Speed 10551.71 samples/sec   Loss 10.4616   LearningRate 0.0686   Epoch: 6   Global Step: 34750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:37,764-Speed 10586.40 samples/sec   Loss 10.2856   LearningRate 0.0686   Epoch: 6   Global Step: 34760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:38,758-Speed 10315.80 samples/sec   Loss 10.2182   LearningRate 0.0686   Epoch: 6   Global Step: 34770   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:32:39,694-Speed 10954.74 samples/sec   Loss 10.1925   LearningRate 0.0686   Epoch: 6   Global Step: 34780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:40,669-Speed 10507.99 samples/sec   Loss 10.4067   LearningRate 0.0686   Epoch: 6   Global Step: 34790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:41,651-Speed 10445.02 samples/sec   Loss 10.2306   LearningRate 0.0686   Epoch: 6   Global Step: 34800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:42,621-Speed 10565.94 samples/sec   Loss 10.4304   LearningRate 0.0685   Epoch: 6   Global Step: 34810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:43,585-Speed 10640.09 samples/sec   Loss 10.3535   LearningRate 0.0685   Epoch: 6   Global Step: 34820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:44,565-Speed 10457.86 samples/sec   Loss 10.3527   LearningRate 0.0685   Epoch: 6   Global Step: 34830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:45,509-Speed 10857.91 samples/sec   Loss 10.2596   LearningRate 0.0685   Epoch: 6   Global Step: 34840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:46,441-Speed 10994.56 samples/sec   Loss 10.2798   LearningRate 0.0685   Epoch: 6   Global Step: 34850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:47,451-Speed 10155.19 samples/sec   Loss 10.3630   LearningRate 0.0685   Epoch: 6   Global Step: 34860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:48,429-Speed 10481.84 samples/sec   Loss 10.3630   LearningRate 0.0685   Epoch: 6   Global Step: 34870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:49,389-Speed 10670.39 samples/sec   Loss 10.3272   LearningRate 0.0685   Epoch: 6   Global Step: 34880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:32:50,325-Speed 10948.25 samples/sec   Loss 10.2268   LearningRate 0.0685   Epoch: 6   Global Step: 34890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:51,295-Speed 10569.51 samples/sec   Loss 10.2660   LearningRate 0.0685   Epoch: 6   Global Step: 34900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:52,271-Speed 10499.20 samples/sec   Loss 10.3116   LearningRate 0.0685   Epoch: 6   Global Step: 34910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:53,264-Speed 10324.09 samples/sec   Loss 10.2674   LearningRate 0.0685   Epoch: 6   Global Step: 34920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:54,227-Speed 10646.97 samples/sec   Loss 10.4065   LearningRate 0.0685   Epoch: 6   Global Step: 34930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:55,175-Speed 10813.42 samples/sec   Loss 10.4379   LearningRate 0.0684   Epoch: 6   Global Step: 34940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:56,177-Speed 10223.63 samples/sec   Loss 10.3715   LearningRate 0.0684   Epoch: 6   Global Step: 34950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:57,144-Speed 10604.20 samples/sec   Loss 10.1988   LearningRate 0.0684   Epoch: 6   Global Step: 34960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:58,120-Speed 10499.23 samples/sec   Loss 10.4534   LearningRate 0.0684   Epoch: 6   Global Step: 34970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:32:59,096-Speed 10498.20 samples/sec   Loss 10.3681   LearningRate 0.0684   Epoch: 6   Global Step: 34980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:00,056-Speed 10674.95 samples/sec   Loss 10.2862   LearningRate 0.0684   Epoch: 6   Global Step: 34990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:01,026-Speed 10572.54 samples/sec   Loss 10.3889   LearningRate 0.0684   Epoch: 6   Global Step: 35000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:02,019-Speed 10313.94 samples/sec   Loss 10.3426   LearningRate 0.0684   Epoch: 6   Global Step: 35010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:02,976-Speed 10717.30 samples/sec   Loss 10.1994   LearningRate 0.0684   Epoch: 6   Global Step: 35020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:03,939-Speed 10649.31 samples/sec   Loss 10.2016   LearningRate 0.0684   Epoch: 6   Global Step: 35030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:04,920-Speed 10444.07 samples/sec   Loss 10.1392   LearningRate 0.0684   Epoch: 6   Global Step: 35040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:05,851-Speed 11013.52 samples/sec   Loss 10.3948   LearningRate 0.0684   Epoch: 6   Global Step: 35050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:06,821-Speed 10571.39 samples/sec   Loss 10.2877   LearningRate 0.0683   Epoch: 6   Global Step: 35060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:07,805-Speed 10409.95 samples/sec   Loss 10.3077   LearningRate 0.0683   Epoch: 6   Global Step: 35070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:08,764-Speed 10691.68 samples/sec   Loss 10.3674   LearningRate 0.0683   Epoch: 6   Global Step: 35080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:09,768-Speed 10209.92 samples/sec   Loss 10.2405   LearningRate 0.0683   Epoch: 6   Global Step: 35090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:10,724-Speed 10720.49 samples/sec   Loss 10.3305   LearningRate 0.0683   Epoch: 6   Global Step: 35100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:11,707-Speed 10431.97 samples/sec   Loss 10.3524   LearningRate 0.0683   Epoch: 6   Global Step: 35110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:12,712-Speed 10196.01 samples/sec   Loss 10.2589   LearningRate 0.0683   Epoch: 6   Global Step: 35120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:13,676-Speed 10630.02 samples/sec   Loss 10.3331   LearningRate 0.0683   Epoch: 6   Global Step: 35130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:14,625-Speed 10801.55 samples/sec   Loss 10.3880   LearningRate 0.0683   Epoch: 6   Global Step: 35140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:15,563-Speed 10934.97 samples/sec   Loss 10.3105   LearningRate 0.0683   Epoch: 6   Global Step: 35150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:16,557-Speed 10303.73 samples/sec   Loss 10.5280   LearningRate 0.0683   Epoch: 6   Global Step: 35160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:17,533-Speed 10501.82 samples/sec   Loss 10.2262   LearningRate 0.0683   Epoch: 6   Global Step: 35170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:18,499-Speed 10614.34 samples/sec   Loss 10.3714   LearningRate 0.0682   Epoch: 6   Global Step: 35180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:19,463-Speed 10633.19 samples/sec   Loss 10.3839   LearningRate 0.0682   Epoch: 6   Global Step: 35190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:20,439-Speed 10501.34 samples/sec   Loss 10.3208   LearningRate 0.0682   Epoch: 6   Global Step: 35200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:21,420-Speed 10453.84 samples/sec   Loss 10.2898   LearningRate 0.0682   Epoch: 6   Global Step: 35210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:22,354-Speed 10979.67 samples/sec   Loss 10.3078   LearningRate 0.0682   Epoch: 6   Global Step: 35220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:23,307-Speed 10752.03 samples/sec   Loss 10.3691   LearningRate 0.0682   Epoch: 6   Global Step: 35230   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:33:24,249-Speed 10875.04 samples/sec   Loss 10.4258   LearningRate 0.0682   Epoch: 6   Global Step: 35240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:25,252-Speed 10222.05 samples/sec   Loss 10.1321   LearningRate 0.0682   Epoch: 6   Global Step: 35250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:26,276-Speed 10004.81 samples/sec   Loss 10.3327   LearningRate 0.0682   Epoch: 6   Global Step: 35260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:27,245-Speed 10587.30 samples/sec   Loss 10.4956   LearningRate 0.0682   Epoch: 6   Global Step: 35270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:28,187-Speed 10878.87 samples/sec   Loss 10.3677   LearningRate 0.0682   Epoch: 6   Global Step: 35280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:29,141-Speed 10739.41 samples/sec   Loss 10.4817   LearningRate 0.0682   Epoch: 6   Global Step: 35290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:30,131-Speed 10347.81 samples/sec   Loss 10.4323   LearningRate 0.0681   Epoch: 6   Global Step: 35300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:31,084-Speed 10759.24 samples/sec   Loss 10.1754   LearningRate 0.0681   Epoch: 6   Global Step: 35310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:32,057-Speed 10532.17 samples/sec   Loss 10.1725   LearningRate 0.0681   Epoch: 6   Global Step: 35320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:33,018-Speed 10672.53 samples/sec   Loss 10.2387   LearningRate 0.0681   Epoch: 6   Global Step: 35330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:34,035-Speed 10080.45 samples/sec   Loss 10.2148   LearningRate 0.0681   Epoch: 6   Global Step: 35340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:35,011-Speed 10498.51 samples/sec   Loss 10.3760   LearningRate 0.0681   Epoch: 6   Global Step: 35350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:35,962-Speed 10782.05 samples/sec   Loss 10.1919   LearningRate 0.0681   Epoch: 6   Global Step: 35360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:36,914-Speed 10766.53 samples/sec   Loss 10.1641   LearningRate 0.0681   Epoch: 6   Global Step: 35370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:37,869-Speed 10728.81 samples/sec   Loss 10.2041   LearningRate 0.0681   Epoch: 6   Global Step: 35380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:38,897-Speed 9966.37 samples/sec   Loss 10.2763   LearningRate 0.0681   Epoch: 6   Global Step: 35390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:40,042-Speed 8982.91 samples/sec   Loss 10.3767   LearningRate 0.0681   Epoch: 6   Global Step: 35400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:50,525-Speed 977.16 samples/sec   Loss 9.9956   LearningRate 0.0681   Epoch: 7   Global Step: 35410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:51,531-Speed 10194.49 samples/sec   Loss 9.3909   LearningRate 0.0681   Epoch: 7   Global Step: 35420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:33:52,532-Speed 10239.18 samples/sec   Loss 9.2777   LearningRate 0.0680   Epoch: 7   Global Step: 35430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:53,547-Speed 10095.12 samples/sec   Loss 9.3655   LearningRate 0.0680   Epoch: 7   Global Step: 35440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:54,718-Speed 8759.40 samples/sec   Loss 9.3141   LearningRate 0.0680   Epoch: 7   Global Step: 35450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:55,917-Speed 8546.68 samples/sec   Loss 9.3856   LearningRate 0.0680   Epoch: 7   Global Step: 35460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:56,992-Speed 9528.75 samples/sec   Loss 9.3107   LearningRate 0.0680   Epoch: 7   Global Step: 35470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:57,955-Speed 10647.18 samples/sec   Loss 9.3316   LearningRate 0.0680   Epoch: 7   Global Step: 35480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:58,901-Speed 10844.68 samples/sec   Loss 9.3099   LearningRate 0.0680   Epoch: 7   Global Step: 35490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:33:59,861-Speed 10675.77 samples/sec   Loss 9.4530   LearningRate 0.0680   Epoch: 7   Global Step: 35500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:00,811-Speed 10782.43 samples/sec   Loss 9.4364   LearningRate 0.0680   Epoch: 7   Global Step: 35510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:01,835-Speed 10007.06 samples/sec   Loss 9.4463   LearningRate 0.0680   Epoch: 7   Global Step: 35520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:02,821-Speed 10401.80 samples/sec   Loss 9.5063   LearningRate 0.0680   Epoch: 7   Global Step: 35530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:03,794-Speed 10534.26 samples/sec   Loss 9.5311   LearningRate 0.0680   Epoch: 7   Global Step: 35540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:04,719-Speed 11078.93 samples/sec   Loss 9.5836   LearningRate 0.0679   Epoch: 7   Global Step: 35550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:05,679-Speed 10664.65 samples/sec   Loss 9.5640   LearningRate 0.0679   Epoch: 7   Global Step: 35560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:06,654-Speed 10514.51 samples/sec   Loss 9.5197   LearningRate 0.0679   Epoch: 7   Global Step: 35570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:07,637-Speed 10418.37 samples/sec   Loss 9.3733   LearningRate 0.0679   Epoch: 7   Global Step: 35580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:08,643-Speed 10198.72 samples/sec   Loss 9.5455   LearningRate 0.0679   Epoch: 7   Global Step: 35590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:09,591-Speed 10804.13 samples/sec   Loss 9.4067   LearningRate 0.0679   Epoch: 7   Global Step: 35600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:10,553-Speed 10655.50 samples/sec   Loss 9.5276   LearningRate 0.0679   Epoch: 7   Global Step: 35610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:11,518-Speed 10622.96 samples/sec   Loss 9.3252   LearningRate 0.0679   Epoch: 7   Global Step: 35620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:12,509-Speed 10336.23 samples/sec   Loss 9.4128   LearningRate 0.0679   Epoch: 7   Global Step: 35630   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:34:13,556-Speed 9787.53 samples/sec   Loss 9.5590   LearningRate 0.0679   Epoch: 7   Global Step: 35640   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:34:14,503-Speed 10824.83 samples/sec   Loss 9.5499   LearningRate 0.0679   Epoch: 7   Global Step: 35650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:15,431-Speed 11043.82 samples/sec   Loss 9.6269   LearningRate 0.0679   Epoch: 7   Global Step: 35660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:16,418-Speed 10389.49 samples/sec   Loss 9.4442   LearningRate 0.0678   Epoch: 7   Global Step: 35670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:17,410-Speed 10329.34 samples/sec   Loss 9.5589   LearningRate 0.0678   Epoch: 7   Global Step: 35680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:18,375-Speed 10620.38 samples/sec   Loss 9.7219   LearningRate 0.0678   Epoch: 7   Global Step: 35690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:19,348-Speed 10536.93 samples/sec   Loss 9.7471   LearningRate 0.0678   Epoch: 7   Global Step: 35700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:20,336-Speed 10377.22 samples/sec   Loss 9.5786   LearningRate 0.0678   Epoch: 7   Global Step: 35710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:21,320-Speed 10423.14 samples/sec   Loss 9.5717   LearningRate 0.0678   Epoch: 7   Global Step: 35720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:22,301-Speed 10449.69 samples/sec   Loss 9.6715   LearningRate 0.0678   Epoch: 7   Global Step: 35730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:23,294-Speed 10327.23 samples/sec   Loss 9.6217   LearningRate 0.0678   Epoch: 7   Global Step: 35740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:24,281-Speed 10400.16 samples/sec   Loss 9.6284   LearningRate 0.0678   Epoch: 7   Global Step: 35750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:25,216-Speed 10964.45 samples/sec   Loss 9.5345   LearningRate 0.0678   Epoch: 7   Global Step: 35760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:26,194-Speed 10487.25 samples/sec   Loss 9.5048   LearningRate 0.0678   Epoch: 7   Global Step: 35770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:27,152-Speed 10692.88 samples/sec   Loss 9.7301   LearningRate 0.0678   Epoch: 7   Global Step: 35780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:28,162-Speed 10146.22 samples/sec   Loss 9.4932   LearningRate 0.0677   Epoch: 7   Global Step: 35790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:29,122-Speed 10670.99 samples/sec   Loss 9.6980   LearningRate 0.0677   Epoch: 7   Global Step: 35800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:30,057-Speed 10975.83 samples/sec   Loss 9.6701   LearningRate 0.0677   Epoch: 7   Global Step: 35810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:31,081-Speed 10002.07 samples/sec   Loss 9.5332   LearningRate 0.0677   Epoch: 7   Global Step: 35820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:32,161-Speed 9497.33 samples/sec   Loss 9.6680   LearningRate 0.0677   Epoch: 7   Global Step: 35830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:33,265-Speed 9279.01 samples/sec   Loss 9.7069   LearningRate 0.0677   Epoch: 7   Global Step: 35840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:34,237-Speed 10550.72 samples/sec   Loss 9.7369   LearningRate 0.0677   Epoch: 7   Global Step: 35850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:35,170-Speed 10983.63 samples/sec   Loss 9.7600   LearningRate 0.0677   Epoch: 7   Global Step: 35860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:36,165-Speed 10294.62 samples/sec   Loss 9.7765   LearningRate 0.0677   Epoch: 7   Global Step: 35870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:37,215-Speed 9766.23 samples/sec   Loss 9.7177   LearningRate 0.0677   Epoch: 7   Global Step: 35880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:38,160-Speed 10843.76 samples/sec   Loss 9.6668   LearningRate 0.0677   Epoch: 7   Global Step: 35890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:39,101-Speed 10888.24 samples/sec   Loss 9.7673   LearningRate 0.0677   Epoch: 7   Global Step: 35900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:40,052-Speed 10774.33 samples/sec   Loss 9.6833   LearningRate 0.0677   Epoch: 7   Global Step: 35910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:34:41,015-Speed 10647.82 samples/sec   Loss 9.7617   LearningRate 0.0676   Epoch: 7   Global Step: 35920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:41,984-Speed 10573.85 samples/sec   Loss 9.7943   LearningRate 0.0676   Epoch: 7   Global Step: 35930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:42,943-Speed 10692.34 samples/sec   Loss 9.8047   LearningRate 0.0676   Epoch: 7   Global Step: 35940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:43,882-Speed 10920.94 samples/sec   Loss 9.6965   LearningRate 0.0676   Epoch: 7   Global Step: 35950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:44,804-Speed 11111.68 samples/sec   Loss 9.6909   LearningRate 0.0676   Epoch: 7   Global Step: 35960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:45,782-Speed 10482.07 samples/sec   Loss 9.8238   LearningRate 0.0676   Epoch: 7   Global Step: 35970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:46,775-Speed 10326.83 samples/sec   Loss 9.7101   LearningRate 0.0676   Epoch: 7   Global Step: 35980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:47,742-Speed 10594.13 samples/sec   Loss 9.6943   LearningRate 0.0676   Epoch: 7   Global Step: 35990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:34:48,677-Speed 10958.69 samples/sec   Loss 9.8606   LearningRate 0.0676   Epoch: 7   Global Step: 36000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:35:10,971-[lfw][36000]XNorm: 13.976989
Training: 2022-04-11 00:35:10,972-[lfw][36000]Accuracy-Flip: 0.99450+-0.00358
Training: 2022-04-11 00:35:10,972-[lfw][36000]Accuracy-Highest: 0.99517
Training: 2022-04-11 00:35:36,379-[cfp_fp][36000]XNorm: 11.838854
Training: 2022-04-11 00:35:36,380-[cfp_fp][36000]Accuracy-Flip: 0.94986+-0.01387
Training: 2022-04-11 00:35:36,380-[cfp_fp][36000]Accuracy-Highest: 0.94986
Training: 2022-04-11 00:35:58,393-[agedb_30][36000]XNorm: 13.613294
Training: 2022-04-11 00:35:58,393-[agedb_30][36000]Accuracy-Flip: 0.95267+-0.00987
Training: 2022-04-11 00:35:58,395-[agedb_30][36000]Accuracy-Highest: 0.95400
Training: 2022-04-11 00:35:59,348-Speed 144.90 samples/sec   Loss 9.8404   LearningRate 0.0676   Epoch: 7   Global Step: 36010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:36:00,335-Speed 10385.72 samples/sec   Loss 9.6737   LearningRate 0.0676   Epoch: 7   Global Step: 36020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:36:01,298-Speed 10646.27 samples/sec   Loss 9.8591   LearningRate 0.0676   Epoch: 7   Global Step: 36030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:36:02,268-Speed 10556.79 samples/sec   Loss 9.6740   LearningRate 0.0675   Epoch: 7   Global Step: 36040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:36:03,220-Speed 10764.57 samples/sec   Loss 9.7620   LearningRate 0.0675   Epoch: 7   Global Step: 36050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:36:04,195-Speed 10516.89 samples/sec   Loss 9.7476   LearningRate 0.0675   Epoch: 7   Global Step: 36060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:36:05,174-Speed 10473.91 samples/sec   Loss 9.8510   LearningRate 0.0675   Epoch: 7   Global Step: 36070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:36:06,141-Speed 10616.13 samples/sec   Loss 9.7701   LearningRate 0.0675   Epoch: 7   Global Step: 36080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:36:07,098-Speed 10701.70 samples/sec   Loss 9.9249   LearningRate 0.0675   Epoch: 7   Global Step: 36090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:36:08,132-Speed 9912.60 samples/sec   Loss 9.7357   LearningRate 0.0675   Epoch: 7   Global Step: 36100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:09,135-Speed 10229.62 samples/sec   Loss 10.0361   LearningRate 0.0675   Epoch: 7   Global Step: 36110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:10,096-Speed 10654.56 samples/sec   Loss 9.8498   LearningRate 0.0675   Epoch: 7   Global Step: 36120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:11,083-Speed 10394.90 samples/sec   Loss 10.1036   LearningRate 0.0675   Epoch: 7   Global Step: 36130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:12,074-Speed 10331.45 samples/sec   Loss 9.8616   LearningRate 0.0675   Epoch: 7   Global Step: 36140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:13,052-Speed 10485.71 samples/sec   Loss 9.8399   LearningRate 0.0675   Epoch: 7   Global Step: 36150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:14,014-Speed 10664.23 samples/sec   Loss 9.7760   LearningRate 0.0674   Epoch: 7   Global Step: 36160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:14,982-Speed 10584.49 samples/sec   Loss 9.6441   LearningRate 0.0674   Epoch: 7   Global Step: 36170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:15,925-Speed 10862.97 samples/sec   Loss 10.0105   LearningRate 0.0674   Epoch: 7   Global Step: 36180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:16,915-Speed 10359.33 samples/sec   Loss 9.9479   LearningRate 0.0674   Epoch: 7   Global Step: 36190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:17,874-Speed 10691.65 samples/sec   Loss 9.7262   LearningRate 0.0674   Epoch: 7   Global Step: 36200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:18,834-Speed 10677.18 samples/sec   Loss 9.9434   LearningRate 0.0674   Epoch: 7   Global Step: 36210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:19,804-Speed 10563.53 samples/sec   Loss 9.7726   LearningRate 0.0674   Epoch: 7   Global Step: 36220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:20,801-Speed 10277.72 samples/sec   Loss 9.8844   LearningRate 0.0674   Epoch: 7   Global Step: 36230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:21,753-Speed 10770.36 samples/sec   Loss 9.9055   LearningRate 0.0674   Epoch: 7   Global Step: 36240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:22,714-Speed 10667.53 samples/sec   Loss 10.0098   LearningRate 0.0674   Epoch: 7   Global Step: 36250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:23,674-Speed 10673.97 samples/sec   Loss 9.9627   LearningRate 0.0674   Epoch: 7   Global Step: 36260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:24,678-Speed 10202.87 samples/sec   Loss 9.9054   LearningRate 0.0674   Epoch: 7   Global Step: 36270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:25,622-Speed 10868.89 samples/sec   Loss 9.8749   LearningRate 0.0674   Epoch: 7   Global Step: 36280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:26,563-Speed 10888.21 samples/sec   Loss 10.0031   LearningRate 0.0673   Epoch: 7   Global Step: 36290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:27,533-Speed 10565.81 samples/sec   Loss 9.7795   LearningRate 0.0673   Epoch: 7   Global Step: 36300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:28,510-Speed 10493.85 samples/sec   Loss 9.8138   LearningRate 0.0673   Epoch: 7   Global Step: 36310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:29,493-Speed 10434.88 samples/sec   Loss 9.8410   LearningRate 0.0673   Epoch: 7   Global Step: 36320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:30,463-Speed 10561.62 samples/sec   Loss 9.9619   LearningRate 0.0673   Epoch: 7   Global Step: 36330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:31,420-Speed 10716.00 samples/sec   Loss 9.8367   LearningRate 0.0673   Epoch: 7   Global Step: 36340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:32,369-Speed 10790.35 samples/sec   Loss 9.7024   LearningRate 0.0673   Epoch: 7   Global Step: 36350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:33,323-Speed 10748.14 samples/sec   Loss 9.9980   LearningRate 0.0673   Epoch: 7   Global Step: 36360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:34,292-Speed 10582.74 samples/sec   Loss 9.9048   LearningRate 0.0673   Epoch: 7   Global Step: 36370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:35,258-Speed 10602.80 samples/sec   Loss 9.9199   LearningRate 0.0673   Epoch: 7   Global Step: 36380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:36,244-Speed 10397.98 samples/sec   Loss 9.9287   LearningRate 0.0673   Epoch: 7   Global Step: 36390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:37,164-Speed 11139.05 samples/sec   Loss 9.8469   LearningRate 0.0673   Epoch: 7   Global Step: 36400   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:36:38,127-Speed 10649.88 samples/sec   Loss 9.9206   LearningRate 0.0672   Epoch: 7   Global Step: 36410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:39,070-Speed 10863.39 samples/sec   Loss 9.9187   LearningRate 0.0672   Epoch: 7   Global Step: 36420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:40,116-Speed 9798.55 samples/sec   Loss 9.7674   LearningRate 0.0672   Epoch: 7   Global Step: 36430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:41,104-Speed 10381.65 samples/sec   Loss 9.9425   LearningRate 0.0672   Epoch: 7   Global Step: 36440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:42,091-Speed 10384.47 samples/sec   Loss 9.9561   LearningRate 0.0672   Epoch: 7   Global Step: 36450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:43,058-Speed 10599.62 samples/sec   Loss 9.8080   LearningRate 0.0672   Epoch: 7   Global Step: 36460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:44,048-Speed 10349.94 samples/sec   Loss 10.0591   LearningRate 0.0672   Epoch: 7   Global Step: 36470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:45,054-Speed 10195.15 samples/sec   Loss 9.8660   LearningRate 0.0672   Epoch: 7   Global Step: 36480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:46,025-Speed 10553.07 samples/sec   Loss 9.8063   LearningRate 0.0672   Epoch: 7   Global Step: 36490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:47,011-Speed 10397.23 samples/sec   Loss 9.7500   LearningRate 0.0672   Epoch: 7   Global Step: 36500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:47,969-Speed 10697.79 samples/sec   Loss 9.9745   LearningRate 0.0672   Epoch: 7   Global Step: 36510   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:36:48,949-Speed 10464.03 samples/sec   Loss 9.9973   LearningRate 0.0672   Epoch: 7   Global Step: 36520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:49,926-Speed 10492.54 samples/sec   Loss 9.8319   LearningRate 0.0671   Epoch: 7   Global Step: 36530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:50,896-Speed 10559.37 samples/sec   Loss 9.9965   LearningRate 0.0671   Epoch: 7   Global Step: 36540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:51,862-Speed 10612.41 samples/sec   Loss 10.0171   LearningRate 0.0671   Epoch: 7   Global Step: 36550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:52,821-Speed 10695.66 samples/sec   Loss 9.8521   LearningRate 0.0671   Epoch: 7   Global Step: 36560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:53,801-Speed 10457.72 samples/sec   Loss 10.0027   LearningRate 0.0671   Epoch: 7   Global Step: 36570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:54,834-Speed 9919.98 samples/sec   Loss 9.9839   LearningRate 0.0671   Epoch: 7   Global Step: 36580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:55,793-Speed 10692.89 samples/sec   Loss 9.9437   LearningRate 0.0671   Epoch: 7   Global Step: 36590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:56,781-Speed 10370.83 samples/sec   Loss 9.9835   LearningRate 0.0671   Epoch: 7   Global Step: 36600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:57,771-Speed 10351.79 samples/sec   Loss 10.0405   LearningRate 0.0671   Epoch: 7   Global Step: 36610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:36:58,747-Speed 10510.39 samples/sec   Loss 9.9394   LearningRate 0.0671   Epoch: 7   Global Step: 36620   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:36:59,686-Speed 10915.39 samples/sec   Loss 9.9264   LearningRate 0.0671   Epoch: 7   Global Step: 36630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:00,654-Speed 10579.97 samples/sec   Loss 9.9362   LearningRate 0.0671   Epoch: 7   Global Step: 36640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:01,612-Speed 10700.74 samples/sec   Loss 9.9261   LearningRate 0.0671   Epoch: 7   Global Step: 36650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:02,582-Speed 10563.56 samples/sec   Loss 9.8882   LearningRate 0.0670   Epoch: 7   Global Step: 36660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:03,611-Speed 9962.46 samples/sec   Loss 9.9360   LearningRate 0.0670   Epoch: 7   Global Step: 36670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:04,577-Speed 10612.90 samples/sec   Loss 10.0760   LearningRate 0.0670   Epoch: 7   Global Step: 36680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:05,542-Speed 10621.73 samples/sec   Loss 10.0469   LearningRate 0.0670   Epoch: 7   Global Step: 36690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:06,510-Speed 10584.00 samples/sec   Loss 10.0138   LearningRate 0.0670   Epoch: 7   Global Step: 36700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:07,482-Speed 10544.91 samples/sec   Loss 9.8143   LearningRate 0.0670   Epoch: 7   Global Step: 36710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:08,440-Speed 10692.93 samples/sec   Loss 10.0085   LearningRate 0.0670   Epoch: 7   Global Step: 36720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:09,420-Speed 10469.24 samples/sec   Loss 10.0900   LearningRate 0.0670   Epoch: 7   Global Step: 36730   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:37:10,373-Speed 10754.99 samples/sec   Loss 10.0626   LearningRate 0.0670   Epoch: 7   Global Step: 36740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:11,348-Speed 10516.57 samples/sec   Loss 10.0562   LearningRate 0.0670   Epoch: 7   Global Step: 36750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:12,343-Speed 10294.12 samples/sec   Loss 9.9697   LearningRate 0.0670   Epoch: 7   Global Step: 36760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:13,329-Speed 10391.78 samples/sec   Loss 9.8456   LearningRate 0.0670   Epoch: 7   Global Step: 36770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:14,287-Speed 10702.03 samples/sec   Loss 9.8633   LearningRate 0.0669   Epoch: 7   Global Step: 36780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:15,244-Speed 10717.65 samples/sec   Loss 9.9621   LearningRate 0.0669   Epoch: 7   Global Step: 36790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:16,198-Speed 10740.43 samples/sec   Loss 9.9915   LearningRate 0.0669   Epoch: 7   Global Step: 36800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:17,136-Speed 10931.87 samples/sec   Loss 9.9348   LearningRate 0.0669   Epoch: 7   Global Step: 36810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:18,148-Speed 10123.91 samples/sec   Loss 10.1110   LearningRate 0.0669   Epoch: 7   Global Step: 36820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:19,108-Speed 10676.78 samples/sec   Loss 9.9258   LearningRate 0.0669   Epoch: 7   Global Step: 36830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:20,079-Speed 10557.96 samples/sec   Loss 9.8660   LearningRate 0.0669   Epoch: 7   Global Step: 36840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:21,032-Speed 10747.46 samples/sec   Loss 9.9142   LearningRate 0.0669   Epoch: 7   Global Step: 36850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:22,004-Speed 10542.40 samples/sec   Loss 9.9228   LearningRate 0.0669   Epoch: 7   Global Step: 36860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:23,053-Speed 9771.42 samples/sec   Loss 9.9959   LearningRate 0.0669   Epoch: 7   Global Step: 36870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:23,982-Speed 11043.90 samples/sec   Loss 10.0037   LearningRate 0.0669   Epoch: 7   Global Step: 36880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:24,951-Speed 10579.33 samples/sec   Loss 9.9177   LearningRate 0.0669   Epoch: 7   Global Step: 36890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:25,944-Speed 10331.59 samples/sec   Loss 9.9869   LearningRate 0.0668   Epoch: 7   Global Step: 36900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:26,907-Speed 10640.37 samples/sec   Loss 10.0683   LearningRate 0.0668   Epoch: 7   Global Step: 36910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:27,868-Speed 10672.52 samples/sec   Loss 9.9346   LearningRate 0.0668   Epoch: 7   Global Step: 36920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:28,829-Speed 10658.03 samples/sec   Loss 10.0357   LearningRate 0.0668   Epoch: 7   Global Step: 36930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:29,791-Speed 10655.87 samples/sec   Loss 10.0185   LearningRate 0.0668   Epoch: 7   Global Step: 36940   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:37:30,786-Speed 10299.70 samples/sec   Loss 10.1574   LearningRate 0.0668   Epoch: 7   Global Step: 36950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:31,761-Speed 10508.73 samples/sec   Loss 9.9375   LearningRate 0.0668   Epoch: 7   Global Step: 36960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:32,745-Speed 10422.29 samples/sec   Loss 10.0617   LearningRate 0.0668   Epoch: 7   Global Step: 36970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:33,715-Speed 10565.27 samples/sec   Loss 9.9429   LearningRate 0.0668   Epoch: 7   Global Step: 36980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:34,666-Speed 10772.48 samples/sec   Loss 9.8598   LearningRate 0.0668   Epoch: 7   Global Step: 36990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:35,633-Speed 10602.83 samples/sec   Loss 9.9352   LearningRate 0.0668   Epoch: 7   Global Step: 37000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:36,645-Speed 10124.95 samples/sec   Loss 9.9204   LearningRate 0.0668   Epoch: 7   Global Step: 37010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:37,604-Speed 10688.73 samples/sec   Loss 10.1063   LearningRate 0.0668   Epoch: 7   Global Step: 37020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:38,550-Speed 10836.00 samples/sec   Loss 10.0872   LearningRate 0.0667   Epoch: 7   Global Step: 37030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:37:39,537-Speed 10379.51 samples/sec   Loss 9.9569   LearningRate 0.0667   Epoch: 7   Global Step: 37040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:37:40,492-Speed 10736.84 samples/sec   Loss 10.1126   LearningRate 0.0667   Epoch: 7   Global Step: 37050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:37:41,464-Speed 10536.88 samples/sec   Loss 10.1145   LearningRate 0.0667   Epoch: 7   Global Step: 37060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:37:42,450-Speed 10402.35 samples/sec   Loss 10.0701   LearningRate 0.0667   Epoch: 7   Global Step: 37070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:37:43,421-Speed 10554.32 samples/sec   Loss 10.1623   LearningRate 0.0667   Epoch: 7   Global Step: 37080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:37:44,412-Speed 10343.86 samples/sec   Loss 10.0460   LearningRate 0.0667   Epoch: 7   Global Step: 37090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:37:45,363-Speed 10773.51 samples/sec   Loss 9.8917   LearningRate 0.0667   Epoch: 7   Global Step: 37100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:37:46,332-Speed 10576.94 samples/sec   Loss 10.1760   LearningRate 0.0667   Epoch: 7   Global Step: 37110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:37:47,298-Speed 10604.54 samples/sec   Loss 10.0641   LearningRate 0.0667   Epoch: 7   Global Step: 37120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:37:48,254-Speed 10727.72 samples/sec   Loss 9.8485   LearningRate 0.0667   Epoch: 7   Global Step: 37130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:49,245-Speed 10337.66 samples/sec   Loss 9.9964   LearningRate 0.0667   Epoch: 7   Global Step: 37140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:37:50,200-Speed 10745.20 samples/sec   Loss 10.0304   LearningRate 0.0666   Epoch: 7   Global Step: 37150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:37:51,147-Speed 10819.01 samples/sec   Loss 9.9584   LearningRate 0.0666   Epoch: 7   Global Step: 37160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:37:52,096-Speed 10793.49 samples/sec   Loss 10.0274   LearningRate 0.0666   Epoch: 7   Global Step: 37170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:37:53,033-Speed 10942.46 samples/sec   Loss 9.9737   LearningRate 0.0666   Epoch: 7   Global Step: 37180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:37:54,006-Speed 10546.00 samples/sec   Loss 9.8762   LearningRate 0.0666   Epoch: 7   Global Step: 37190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:37:54,952-Speed 10831.65 samples/sec   Loss 10.1052   LearningRate 0.0666   Epoch: 7   Global Step: 37200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:37:55,904-Speed 10765.60 samples/sec   Loss 10.1306   LearningRate 0.0666   Epoch: 7   Global Step: 37210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:37:56,850-Speed 10837.31 samples/sec   Loss 10.0314   LearningRate 0.0666   Epoch: 7   Global Step: 37220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:37:57,849-Speed 10268.45 samples/sec   Loss 10.1585   LearningRate 0.0666   Epoch: 7   Global Step: 37230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:37:58,803-Speed 10736.88 samples/sec   Loss 10.0243   LearningRate 0.0666   Epoch: 7   Global Step: 37240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:37:59,770-Speed 10607.72 samples/sec   Loss 9.8407   LearningRate 0.0666   Epoch: 7   Global Step: 37250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:00,764-Speed 10313.81 samples/sec   Loss 9.9096   LearningRate 0.0666   Epoch: 7   Global Step: 37260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:01,748-Speed 10412.28 samples/sec   Loss 9.9769   LearningRate 0.0666   Epoch: 7   Global Step: 37270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:02,763-Speed 10101.37 samples/sec   Loss 9.9330   LearningRate 0.0665   Epoch: 7   Global Step: 37280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:03,738-Speed 10508.93 samples/sec   Loss 10.1174   LearningRate 0.0665   Epoch: 7   Global Step: 37290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:04,703-Speed 10623.05 samples/sec   Loss 9.8985   LearningRate 0.0665   Epoch: 7   Global Step: 37300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:05,666-Speed 10643.03 samples/sec   Loss 9.8763   LearningRate 0.0665   Epoch: 7   Global Step: 37310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:06,626-Speed 10674.78 samples/sec   Loss 9.9737   LearningRate 0.0665   Epoch: 7   Global Step: 37320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:07,628-Speed 10228.28 samples/sec   Loss 9.9376   LearningRate 0.0665   Epoch: 7   Global Step: 37330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:08,626-Speed 10274.42 samples/sec   Loss 10.1629   LearningRate 0.0665   Epoch: 7   Global Step: 37340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:09,604-Speed 10484.62 samples/sec   Loss 10.0791   LearningRate 0.0665   Epoch: 7   Global Step: 37350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:10,561-Speed 10707.43 samples/sec   Loss 10.0444   LearningRate 0.0665   Epoch: 7   Global Step: 37360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:38:11,531-Speed 10559.56 samples/sec   Loss 9.8854   LearningRate 0.0665   Epoch: 7   Global Step: 37370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:38:12,493-Speed 10659.99 samples/sec   Loss 9.8846   LearningRate 0.0665   Epoch: 7   Global Step: 37380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:38:13,488-Speed 10300.54 samples/sec   Loss 10.0501   LearningRate 0.0665   Epoch: 7   Global Step: 37390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:38:14,455-Speed 10602.53 samples/sec   Loss 10.0960   LearningRate 0.0664   Epoch: 7   Global Step: 37400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:38:15,398-Speed 10869.05 samples/sec   Loss 10.0403   LearningRate 0.0664   Epoch: 7   Global Step: 37410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:38:16,370-Speed 10552.32 samples/sec   Loss 9.9029   LearningRate 0.0664   Epoch: 7   Global Step: 37420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:38:17,339-Speed 10568.86 samples/sec   Loss 9.9963   LearningRate 0.0664   Epoch: 7   Global Step: 37430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:38:18,324-Speed 10408.20 samples/sec   Loss 10.0397   LearningRate 0.0664   Epoch: 7   Global Step: 37440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:38:19,290-Speed 10616.14 samples/sec   Loss 10.1391   LearningRate 0.0664   Epoch: 7   Global Step: 37450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:38:20,229-Speed 10923.65 samples/sec   Loss 9.9947   LearningRate 0.0664   Epoch: 7   Global Step: 37460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:21,209-Speed 10453.74 samples/sec   Loss 10.0737   LearningRate 0.0664   Epoch: 7   Global Step: 37470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:22,179-Speed 10564.57 samples/sec   Loss 10.1576   LearningRate 0.0664   Epoch: 7   Global Step: 37480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:38:23,170-Speed 10373.53 samples/sec   Loss 9.9266   LearningRate 0.0664   Epoch: 7   Global Step: 37490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:38:24,210-Speed 9860.76 samples/sec   Loss 10.0759   LearningRate 0.0664   Epoch: 7   Global Step: 37500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:38:25,201-Speed 10344.66 samples/sec   Loss 9.9738   LearningRate 0.0664   Epoch: 7   Global Step: 37510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:38:26,161-Speed 10675.58 samples/sec   Loss 9.9054   LearningRate 0.0663   Epoch: 7   Global Step: 37520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:38:27,128-Speed 10605.09 samples/sec   Loss 10.0753   LearningRate 0.0663   Epoch: 7   Global Step: 37530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:38:28,086-Speed 10697.12 samples/sec   Loss 10.1568   LearningRate 0.0663   Epoch: 7   Global Step: 37540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:38:29,041-Speed 10730.78 samples/sec   Loss 10.1263   LearningRate 0.0663   Epoch: 7   Global Step: 37550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:38:30,058-Speed 10080.79 samples/sec   Loss 10.1391   LearningRate 0.0663   Epoch: 7   Global Step: 37560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:38:31,009-Speed 10781.09 samples/sec   Loss 10.2118   LearningRate 0.0663   Epoch: 7   Global Step: 37570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:38:31,988-Speed 10462.70 samples/sec   Loss 10.1025   LearningRate 0.0663   Epoch: 7   Global Step: 37580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:32,968-Speed 10460.71 samples/sec   Loss 10.0953   LearningRate 0.0663   Epoch: 7   Global Step: 37590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:33,960-Speed 10334.36 samples/sec   Loss 9.9948   LearningRate 0.0663   Epoch: 7   Global Step: 37600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:34,892-Speed 10993.27 samples/sec   Loss 10.0325   LearningRate 0.0663   Epoch: 7   Global Step: 37610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:35,884-Speed 10328.46 samples/sec   Loss 10.1327   LearningRate 0.0663   Epoch: 7   Global Step: 37620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:36,866-Speed 10445.08 samples/sec   Loss 10.0627   LearningRate 0.0663   Epoch: 7   Global Step: 37630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:37,840-Speed 10523.45 samples/sec   Loss 10.0446   LearningRate 0.0663   Epoch: 7   Global Step: 37640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:38,820-Speed 10463.67 samples/sec   Loss 10.1598   LearningRate 0.0662   Epoch: 7   Global Step: 37650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:39,773-Speed 10775.10 samples/sec   Loss 10.0622   LearningRate 0.0662   Epoch: 7   Global Step: 37660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:40,777-Speed 10204.16 samples/sec   Loss 10.1022   LearningRate 0.0662   Epoch: 7   Global Step: 37670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:41,797-Speed 10054.77 samples/sec   Loss 10.0648   LearningRate 0.0662   Epoch: 7   Global Step: 37680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:42,768-Speed 10550.54 samples/sec   Loss 9.8934   LearningRate 0.0662   Epoch: 7   Global Step: 37690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:43,738-Speed 10569.80 samples/sec   Loss 10.1139   LearningRate 0.0662   Epoch: 7   Global Step: 37700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:44,719-Speed 10447.91 samples/sec   Loss 9.9924   LearningRate 0.0662   Epoch: 7   Global Step: 37710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:45,675-Speed 10714.63 samples/sec   Loss 9.9814   LearningRate 0.0662   Epoch: 7   Global Step: 37720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:46,633-Speed 10698.41 samples/sec   Loss 9.9186   LearningRate 0.0662   Epoch: 7   Global Step: 37730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:47,656-Speed 10015.29 samples/sec   Loss 9.9427   LearningRate 0.0662   Epoch: 7   Global Step: 37740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:48,638-Speed 10438.25 samples/sec   Loss 10.1374   LearningRate 0.0662   Epoch: 7   Global Step: 37750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:49,634-Speed 10298.15 samples/sec   Loss 9.9999   LearningRate 0.0662   Epoch: 7   Global Step: 37760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:50,601-Speed 10601.77 samples/sec   Loss 9.9915   LearningRate 0.0661   Epoch: 7   Global Step: 37770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:51,543-Speed 10880.83 samples/sec   Loss 10.0647   LearningRate 0.0661   Epoch: 7   Global Step: 37780   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:38:52,529-Speed 10388.56 samples/sec   Loss 10.1807   LearningRate 0.0661   Epoch: 7   Global Step: 37790   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:38:53,512-Speed 10432.10 samples/sec   Loss 9.9957   LearningRate 0.0661   Epoch: 7   Global Step: 37800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:54,488-Speed 10497.28 samples/sec   Loss 10.1225   LearningRate 0.0661   Epoch: 7   Global Step: 37810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:55,442-Speed 10752.82 samples/sec   Loss 10.1383   LearningRate 0.0661   Epoch: 7   Global Step: 37820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:56,392-Speed 10789.40 samples/sec   Loss 10.0471   LearningRate 0.0661   Epoch: 7   Global Step: 37830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:57,368-Speed 10501.50 samples/sec   Loss 10.0693   LearningRate 0.0661   Epoch: 7   Global Step: 37840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:58,371-Speed 10222.64 samples/sec   Loss 10.0368   LearningRate 0.0661   Epoch: 7   Global Step: 37850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:38:59,351-Speed 10460.51 samples/sec   Loss 9.9599   LearningRate 0.0661   Epoch: 7   Global Step: 37860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:39:00,305-Speed 10741.76 samples/sec   Loss 10.0577   LearningRate 0.0661   Epoch: 7   Global Step: 37870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:39:01,236-Speed 11015.69 samples/sec   Loss 10.0377   LearningRate 0.0661   Epoch: 7   Global Step: 37880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:39:02,213-Speed 10486.61 samples/sec   Loss 10.0665   LearningRate 0.0661   Epoch: 7   Global Step: 37890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:39:03,166-Speed 10754.47 samples/sec   Loss 9.8427   LearningRate 0.0660   Epoch: 7   Global Step: 37900   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:39:04,115-Speed 10795.65 samples/sec   Loss 10.1504   LearningRate 0.0660   Epoch: 7   Global Step: 37910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:39:05,080-Speed 10635.03 samples/sec   Loss 10.0626   LearningRate 0.0660   Epoch: 7   Global Step: 37920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:39:06,044-Speed 10633.46 samples/sec   Loss 10.1778   LearningRate 0.0660   Epoch: 7   Global Step: 37930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:39:06,998-Speed 10740.38 samples/sec   Loss 10.1982   LearningRate 0.0660   Epoch: 7   Global Step: 37940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:39:07,967-Speed 10580.03 samples/sec   Loss 9.8934   LearningRate 0.0660   Epoch: 7   Global Step: 37950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:39:08,949-Speed 10436.80 samples/sec   Loss 9.9860   LearningRate 0.0660   Epoch: 7   Global Step: 37960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:39:09,917-Speed 10589.32 samples/sec   Loss 10.0598   LearningRate 0.0660   Epoch: 7   Global Step: 37970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:39:10,909-Speed 10333.18 samples/sec   Loss 10.1240   LearningRate 0.0660   Epoch: 7   Global Step: 37980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:39:11,865-Speed 10726.40 samples/sec   Loss 10.0402   LearningRate 0.0660   Epoch: 7   Global Step: 37990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:39:12,842-Speed 10482.45 samples/sec   Loss 9.9569   LearningRate 0.0660   Epoch: 7   Global Step: 38000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:39:34,988-[lfw][38000]XNorm: 13.746561
Training: 2022-04-11 00:39:34,989-[lfw][38000]Accuracy-Flip: 0.99383+-0.00409
Training: 2022-04-11 00:39:34,990-[lfw][38000]Accuracy-Highest: 0.99517
Training: 2022-04-11 00:40:00,299-[cfp_fp][38000]XNorm: 11.714241
Training: 2022-04-11 00:40:00,305-[cfp_fp][38000]Accuracy-Flip: 0.94729+-0.01418
Training: 2022-04-11 00:40:00,306-[cfp_fp][38000]Accuracy-Highest: 0.94986
Training: 2022-04-11 00:40:22,812-[agedb_30][38000]XNorm: 13.470733
Training: 2022-04-11 00:40:22,813-[agedb_30][38000]Accuracy-Flip: 0.95033+-0.01132
Training: 2022-04-11 00:40:22,813-[agedb_30][38000]Accuracy-Highest: 0.95400
Training: 2022-04-11 00:40:23,764-Speed 144.39 samples/sec   Loss 10.1533   LearningRate 0.0660   Epoch: 7   Global Step: 38010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:40:24,735-Speed 10556.05 samples/sec   Loss 10.1899   LearningRate 0.0659   Epoch: 7   Global Step: 38020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:40:25,691-Speed 10726.71 samples/sec   Loss 10.2118   LearningRate 0.0659   Epoch: 7   Global Step: 38030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:40:26,646-Speed 10727.41 samples/sec   Loss 10.2464   LearningRate 0.0659   Epoch: 7   Global Step: 38040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:40:27,643-Speed 10281.48 samples/sec   Loss 10.2801   LearningRate 0.0659   Epoch: 7   Global Step: 38050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:40:28,622-Speed 10469.11 samples/sec   Loss 9.9599   LearningRate 0.0659   Epoch: 7   Global Step: 38060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:40:29,603-Speed 10457.71 samples/sec   Loss 10.2060   LearningRate 0.0659   Epoch: 7   Global Step: 38070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:40:30,574-Speed 10556.71 samples/sec   Loss 9.8707   LearningRate 0.0659   Epoch: 7   Global Step: 38080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:40:31,523-Speed 10795.47 samples/sec   Loss 9.9067   LearningRate 0.0659   Epoch: 7   Global Step: 38090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:40:32,550-Speed 9984.20 samples/sec   Loss 10.0861   LearningRate 0.0659   Epoch: 7   Global Step: 38100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:40:33,495-Speed 10854.81 samples/sec   Loss 10.1145   LearningRate 0.0659   Epoch: 7   Global Step: 38110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:40:34,462-Speed 10597.26 samples/sec   Loss 10.0320   LearningRate 0.0659   Epoch: 7   Global Step: 38120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:40:35,450-Speed 10370.41 samples/sec   Loss 9.9655   LearningRate 0.0659   Epoch: 7   Global Step: 38130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:40:36,393-Speed 10873.10 samples/sec   Loss 10.0578   LearningRate 0.0659   Epoch: 7   Global Step: 38140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:40:37,354-Speed 10658.70 samples/sec   Loss 10.3578   LearningRate 0.0658   Epoch: 7   Global Step: 38150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:40:38,398-Speed 9817.00 samples/sec   Loss 10.0701   LearningRate 0.0658   Epoch: 7   Global Step: 38160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:40:39,373-Speed 10514.26 samples/sec   Loss 9.8664   LearningRate 0.0658   Epoch: 7   Global Step: 38170   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:40:40,337-Speed 10636.52 samples/sec   Loss 10.0642   LearningRate 0.0658   Epoch: 7   Global Step: 38180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:40:41,326-Speed 10363.04 samples/sec   Loss 9.9667   LearningRate 0.0658   Epoch: 7   Global Step: 38190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:40:42,306-Speed 10459.94 samples/sec   Loss 10.1246   LearningRate 0.0658   Epoch: 7   Global Step: 38200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:40:43,249-Speed 10868.01 samples/sec   Loss 10.0277   LearningRate 0.0658   Epoch: 7   Global Step: 38210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:40:44,171-Speed 11127.49 samples/sec   Loss 9.9068   LearningRate 0.0658   Epoch: 7   Global Step: 38220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:40:45,140-Speed 10575.46 samples/sec   Loss 10.0261   LearningRate 0.0658   Epoch: 7   Global Step: 38230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:40:46,092-Speed 10759.75 samples/sec   Loss 10.0015   LearningRate 0.0658   Epoch: 7   Global Step: 38240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:40:47,077-Speed 10409.97 samples/sec   Loss 10.1654   LearningRate 0.0658   Epoch: 7   Global Step: 38250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:40:48,089-Speed 10129.22 samples/sec   Loss 10.0525   LearningRate 0.0658   Epoch: 7   Global Step: 38260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:40:49,030-Speed 10891.53 samples/sec   Loss 10.2000   LearningRate 0.0657   Epoch: 7   Global Step: 38270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:40:49,977-Speed 10838.09 samples/sec   Loss 10.0507   LearningRate 0.0657   Epoch: 7   Global Step: 38280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:40:50,957-Speed 10453.57 samples/sec   Loss 10.0707   LearningRate 0.0657   Epoch: 7   Global Step: 38290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:40:51,944-Speed 10392.72 samples/sec   Loss 10.0005   LearningRate 0.0657   Epoch: 7   Global Step: 38300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:40:52,892-Speed 10817.15 samples/sec   Loss 10.0747   LearningRate 0.0657   Epoch: 7   Global Step: 38310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:40:53,875-Speed 10414.93 samples/sec   Loss 10.1203   LearningRate 0.0657   Epoch: 7   Global Step: 38320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:40:54,836-Speed 10663.79 samples/sec   Loss 10.3204   LearningRate 0.0657   Epoch: 7   Global Step: 38330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:40:55,810-Speed 10527.44 samples/sec   Loss 10.0454   LearningRate 0.0657   Epoch: 7   Global Step: 38340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:40:56,827-Speed 10076.33 samples/sec   Loss 10.1358   LearningRate 0.0657   Epoch: 7   Global Step: 38350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:40:57,787-Speed 10684.64 samples/sec   Loss 9.9729   LearningRate 0.0657   Epoch: 7   Global Step: 38360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:40:58,741-Speed 10741.67 samples/sec   Loss 10.2883   LearningRate 0.0657   Epoch: 7   Global Step: 38370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:40:59,692-Speed 10771.08 samples/sec   Loss 10.1725   LearningRate 0.0657   Epoch: 7   Global Step: 38380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:00,690-Speed 10286.78 samples/sec   Loss 9.9250   LearningRate 0.0657   Epoch: 7   Global Step: 38390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:01,669-Speed 10467.96 samples/sec   Loss 10.0762   LearningRate 0.0656   Epoch: 7   Global Step: 38400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:02,619-Speed 10789.89 samples/sec   Loss 9.9416   LearningRate 0.0656   Epoch: 7   Global Step: 38410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:03,566-Speed 10818.65 samples/sec   Loss 10.0168   LearningRate 0.0656   Epoch: 7   Global Step: 38420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:04,527-Speed 10666.83 samples/sec   Loss 9.9150   LearningRate 0.0656   Epoch: 7   Global Step: 38430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:05,465-Speed 10923.64 samples/sec   Loss 10.1798   LearningRate 0.0656   Epoch: 7   Global Step: 38440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:06,399-Speed 10980.29 samples/sec   Loss 9.9690   LearningRate 0.0656   Epoch: 7   Global Step: 38450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:07,376-Speed 10495.50 samples/sec   Loss 9.9076   LearningRate 0.0656   Epoch: 7   Global Step: 38460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:08,344-Speed 10589.29 samples/sec   Loss 9.9959   LearningRate 0.0656   Epoch: 7   Global Step: 38470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:09,299-Speed 10731.62 samples/sec   Loss 10.1848   LearningRate 0.0656   Epoch: 7   Global Step: 38480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:10,266-Speed 10594.22 samples/sec   Loss 10.0214   LearningRate 0.0656   Epoch: 7   Global Step: 38490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:11,251-Speed 10411.93 samples/sec   Loss 10.0537   LearningRate 0.0656   Epoch: 7   Global Step: 38500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:41:12,248-Speed 10281.44 samples/sec   Loss 10.1860   LearningRate 0.0656   Epoch: 7   Global Step: 38510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:41:13,222-Speed 10526.27 samples/sec   Loss 10.1640   LearningRate 0.0655   Epoch: 7   Global Step: 38520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:41:14,196-Speed 10529.55 samples/sec   Loss 10.0347   LearningRate 0.0655   Epoch: 7   Global Step: 38530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:41:15,134-Speed 10927.45 samples/sec   Loss 10.0249   LearningRate 0.0655   Epoch: 7   Global Step: 38540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:41:16,121-Speed 10380.59 samples/sec   Loss 10.0845   LearningRate 0.0655   Epoch: 7   Global Step: 38550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:41:17,116-Speed 10299.71 samples/sec   Loss 9.9703   LearningRate 0.0655   Epoch: 7   Global Step: 38560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:41:18,073-Speed 10714.60 samples/sec   Loss 10.1312   LearningRate 0.0655   Epoch: 7   Global Step: 38570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:41:19,044-Speed 10554.59 samples/sec   Loss 10.0484   LearningRate 0.0655   Epoch: 7   Global Step: 38580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:41:20,035-Speed 10347.32 samples/sec   Loss 10.0934   LearningRate 0.0655   Epoch: 7   Global Step: 38590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:41:21,003-Speed 10591.26 samples/sec   Loss 10.0159   LearningRate 0.0655   Epoch: 7   Global Step: 38600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:21,965-Speed 10654.48 samples/sec   Loss 10.0505   LearningRate 0.0655   Epoch: 7   Global Step: 38610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:22,974-Speed 10191.65 samples/sec   Loss 10.1673   LearningRate 0.0655   Epoch: 7   Global Step: 38620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:23,966-Speed 10339.03 samples/sec   Loss 10.0645   LearningRate 0.0655   Epoch: 7   Global Step: 38630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:24,932-Speed 10610.68 samples/sec   Loss 10.3136   LearningRate 0.0655   Epoch: 7   Global Step: 38640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:41:25,915-Speed 10425.98 samples/sec   Loss 10.0154   LearningRate 0.0654   Epoch: 7   Global Step: 38650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:41:26,871-Speed 10713.43 samples/sec   Loss 9.9649   LearningRate 0.0654   Epoch: 7   Global Step: 38660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:41:27,837-Speed 10622.95 samples/sec   Loss 9.9431   LearningRate 0.0654   Epoch: 7   Global Step: 38670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:41:28,781-Speed 10854.54 samples/sec   Loss 10.2128   LearningRate 0.0654   Epoch: 7   Global Step: 38680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:41:29,748-Speed 10597.53 samples/sec   Loss 9.9356   LearningRate 0.0654   Epoch: 7   Global Step: 38690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:41:30,730-Speed 10438.97 samples/sec   Loss 9.9763   LearningRate 0.0654   Epoch: 7   Global Step: 38700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:41:31,667-Speed 10936.02 samples/sec   Loss 10.1117   LearningRate 0.0654   Epoch: 7   Global Step: 38710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:41:32,664-Speed 10284.05 samples/sec   Loss 10.1059   LearningRate 0.0654   Epoch: 7   Global Step: 38720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:41:33,662-Speed 10269.85 samples/sec   Loss 10.1881   LearningRate 0.0654   Epoch: 7   Global Step: 38730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:41:34,658-Speed 10285.53 samples/sec   Loss 10.1183   LearningRate 0.0654   Epoch: 7   Global Step: 38740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:35,622-Speed 10635.06 samples/sec   Loss 10.0099   LearningRate 0.0654   Epoch: 7   Global Step: 38750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:36,584-Speed 10661.80 samples/sec   Loss 10.0859   LearningRate 0.0654   Epoch: 7   Global Step: 38760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:37,545-Speed 10657.31 samples/sec   Loss 10.0232   LearningRate 0.0653   Epoch: 7   Global Step: 38770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:38,523-Speed 10483.31 samples/sec   Loss 10.1556   LearningRate 0.0653   Epoch: 7   Global Step: 38780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:39,503-Speed 10461.13 samples/sec   Loss 9.9472   LearningRate 0.0653   Epoch: 7   Global Step: 38790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:40,462-Speed 10688.66 samples/sec   Loss 9.9958   LearningRate 0.0653   Epoch: 7   Global Step: 38800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:41,420-Speed 10694.30 samples/sec   Loss 10.0857   LearningRate 0.0653   Epoch: 7   Global Step: 38810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:42,376-Speed 10728.76 samples/sec   Loss 9.8343   LearningRate 0.0653   Epoch: 7   Global Step: 38820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:43,356-Speed 10453.47 samples/sec   Loss 10.2630   LearningRate 0.0653   Epoch: 7   Global Step: 38830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:44,345-Speed 10365.26 samples/sec   Loss 9.9829   LearningRate 0.0653   Epoch: 7   Global Step: 38840   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:41:45,278-Speed 10982.82 samples/sec   Loss 9.9438   LearningRate 0.0653   Epoch: 7   Global Step: 38850   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:41:46,227-Speed 10793.99 samples/sec   Loss 9.9037   LearningRate 0.0653   Epoch: 7   Global Step: 38860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:47,220-Speed 10321.48 samples/sec   Loss 9.9725   LearningRate 0.0653   Epoch: 7   Global Step: 38870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:48,223-Speed 10223.42 samples/sec   Loss 10.1464   LearningRate 0.0653   Epoch: 7   Global Step: 38880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:49,172-Speed 10798.11 samples/sec   Loss 10.1497   LearningRate 0.0653   Epoch: 7   Global Step: 38890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:50,147-Speed 10513.89 samples/sec   Loss 9.8906   LearningRate 0.0652   Epoch: 7   Global Step: 38900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:51,118-Speed 10563.50 samples/sec   Loss 9.9511   LearningRate 0.0652   Epoch: 7   Global Step: 38910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:52,065-Speed 10821.60 samples/sec   Loss 9.8081   LearningRate 0.0652   Epoch: 7   Global Step: 38920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:53,015-Speed 10790.47 samples/sec   Loss 10.0432   LearningRate 0.0652   Epoch: 7   Global Step: 38930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:53,970-Speed 10734.31 samples/sec   Loss 9.9194   LearningRate 0.0652   Epoch: 7   Global Step: 38940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:54,948-Speed 10474.06 samples/sec   Loss 10.3118   LearningRate 0.0652   Epoch: 7   Global Step: 38950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:55,908-Speed 10684.58 samples/sec   Loss 10.1002   LearningRate 0.0652   Epoch: 7   Global Step: 38960   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:41:56,875-Speed 10595.49 samples/sec   Loss 9.9317   LearningRate 0.0652   Epoch: 7   Global Step: 38970   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:41:57,848-Speed 10528.52 samples/sec   Loss 10.1353   LearningRate 0.0652   Epoch: 7   Global Step: 38980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:58,844-Speed 10290.07 samples/sec   Loss 10.0983   LearningRate 0.0652   Epoch: 7   Global Step: 38990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:41:59,802-Speed 10697.54 samples/sec   Loss 10.0454   LearningRate 0.0652   Epoch: 7   Global Step: 39000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:42:00,768-Speed 10609.70 samples/sec   Loss 10.1361   LearningRate 0.0652   Epoch: 7   Global Step: 39010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:42:01,764-Speed 10300.07 samples/sec   Loss 10.0995   LearningRate 0.0651   Epoch: 7   Global Step: 39020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:42:02,753-Speed 10357.13 samples/sec   Loss 9.9502   LearningRate 0.0651   Epoch: 7   Global Step: 39030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:42:03,733-Speed 10463.76 samples/sec   Loss 10.1594   LearningRate 0.0651   Epoch: 7   Global Step: 39040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:42:04,697-Speed 10626.25 samples/sec   Loss 10.0950   LearningRate 0.0651   Epoch: 7   Global Step: 39050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:42:05,661-Speed 10641.00 samples/sec   Loss 10.0001   LearningRate 0.0651   Epoch: 7   Global Step: 39060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:42:06,630-Speed 10570.53 samples/sec   Loss 9.8723   LearningRate 0.0651   Epoch: 7   Global Step: 39070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:42:07,563-Speed 10985.31 samples/sec   Loss 9.9211   LearningRate 0.0651   Epoch: 7   Global Step: 39080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:42:08,538-Speed 10514.91 samples/sec   Loss 9.8202   LearningRate 0.0651   Epoch: 7   Global Step: 39090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:42:09,543-Speed 10196.18 samples/sec   Loss 10.1449   LearningRate 0.0651   Epoch: 7   Global Step: 39100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:10,495-Speed 10772.03 samples/sec   Loss 10.1999   LearningRate 0.0651   Epoch: 7   Global Step: 39110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:11,460-Speed 10622.68 samples/sec   Loss 10.0442   LearningRate 0.0651   Epoch: 7   Global Step: 39120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:12,441-Speed 10442.31 samples/sec   Loss 9.8522   LearningRate 0.0651   Epoch: 7   Global Step: 39130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:13,383-Speed 10877.44 samples/sec   Loss 10.0162   LearningRate 0.0651   Epoch: 7   Global Step: 39140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:14,368-Speed 10409.51 samples/sec   Loss 10.0443   LearningRate 0.0650   Epoch: 7   Global Step: 39150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:15,328-Speed 10679.32 samples/sec   Loss 10.0103   LearningRate 0.0650   Epoch: 7   Global Step: 39160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:16,301-Speed 10529.98 samples/sec   Loss 9.9816   LearningRate 0.0650   Epoch: 7   Global Step: 39170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:17,256-Speed 10732.40 samples/sec   Loss 10.0208   LearningRate 0.0650   Epoch: 7   Global Step: 39180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:18,232-Speed 10499.60 samples/sec   Loss 10.1027   LearningRate 0.0650   Epoch: 7   Global Step: 39190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:19,210-Speed 10486.59 samples/sec   Loss 9.9642   LearningRate 0.0650   Epoch: 7   Global Step: 39200   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:42:20,189-Speed 10462.00 samples/sec   Loss 10.0500   LearningRate 0.0650   Epoch: 7   Global Step: 39210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:21,172-Speed 10425.08 samples/sec   Loss 9.7986   LearningRate 0.0650   Epoch: 7   Global Step: 39220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:22,145-Speed 10530.59 samples/sec   Loss 10.2439   LearningRate 0.0650   Epoch: 7   Global Step: 39230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:23,095-Speed 10796.84 samples/sec   Loss 10.1178   LearningRate 0.0650   Epoch: 7   Global Step: 39240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:24,066-Speed 10561.43 samples/sec   Loss 10.0213   LearningRate 0.0650   Epoch: 7   Global Step: 39250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:25,021-Speed 10727.31 samples/sec   Loss 10.1539   LearningRate 0.0650   Epoch: 7   Global Step: 39260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:25,985-Speed 10634.80 samples/sec   Loss 9.9731   LearningRate 0.0649   Epoch: 7   Global Step: 39270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:26,957-Speed 10535.08 samples/sec   Loss 10.0446   LearningRate 0.0649   Epoch: 7   Global Step: 39280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:27,917-Speed 10685.60 samples/sec   Loss 10.0047   LearningRate 0.0649   Epoch: 7   Global Step: 39290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:28,901-Speed 10418.68 samples/sec   Loss 10.0027   LearningRate 0.0649   Epoch: 7   Global Step: 39300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:29,882-Speed 10447.69 samples/sec   Loss 9.8859   LearningRate 0.0649   Epoch: 7   Global Step: 39310   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:42:30,877-Speed 10303.21 samples/sec   Loss 9.9831   LearningRate 0.0649   Epoch: 7   Global Step: 39320   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:42:31,815-Speed 10928.82 samples/sec   Loss 9.9454   LearningRate 0.0649   Epoch: 7   Global Step: 39330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:32,780-Speed 10624.64 samples/sec   Loss 10.0783   LearningRate 0.0649   Epoch: 7   Global Step: 39340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:33,722-Speed 10878.52 samples/sec   Loss 9.9256   LearningRate 0.0649   Epoch: 7   Global Step: 39350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:34,649-Speed 11061.19 samples/sec   Loss 9.9859   LearningRate 0.0649   Epoch: 7   Global Step: 39360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:35,611-Speed 10650.22 samples/sec   Loss 9.9676   LearningRate 0.0649   Epoch: 7   Global Step: 39370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:36,564-Speed 10752.40 samples/sec   Loss 9.9513   LearningRate 0.0649   Epoch: 7   Global Step: 39380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:37,555-Speed 10345.91 samples/sec   Loss 9.8565   LearningRate 0.0649   Epoch: 7   Global Step: 39390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:38,517-Speed 10658.84 samples/sec   Loss 9.9635   LearningRate 0.0648   Epoch: 7   Global Step: 39400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:39,462-Speed 10841.67 samples/sec   Loss 10.2122   LearningRate 0.0648   Epoch: 7   Global Step: 39410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:40,419-Speed 10717.02 samples/sec   Loss 10.1264   LearningRate 0.0648   Epoch: 7   Global Step: 39420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:41,381-Speed 10647.32 samples/sec   Loss 9.9451   LearningRate 0.0648   Epoch: 7   Global Step: 39430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:42,327-Speed 10841.68 samples/sec   Loss 10.0547   LearningRate 0.0648   Epoch: 7   Global Step: 39440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:43,293-Speed 10610.31 samples/sec   Loss 10.0646   LearningRate 0.0648   Epoch: 7   Global Step: 39450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:44,263-Speed 10567.48 samples/sec   Loss 9.9906   LearningRate 0.0648   Epoch: 7   Global Step: 39460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:45,234-Speed 10554.63 samples/sec   Loss 10.0897   LearningRate 0.0648   Epoch: 7   Global Step: 39470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:46,239-Speed 10198.95 samples/sec   Loss 9.9797   LearningRate 0.0648   Epoch: 7   Global Step: 39480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:47,165-Speed 11068.95 samples/sec   Loss 10.0243   LearningRate 0.0648   Epoch: 7   Global Step: 39490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:48,138-Speed 10533.34 samples/sec   Loss 10.1523   LearningRate 0.0648   Epoch: 7   Global Step: 39500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:49,067-Speed 11027.53 samples/sec   Loss 10.1137   LearningRate 0.0648   Epoch: 7   Global Step: 39510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:50,027-Speed 10678.63 samples/sec   Loss 9.9024   LearningRate 0.0647   Epoch: 7   Global Step: 39520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:51,005-Speed 10478.79 samples/sec   Loss 10.0343   LearningRate 0.0647   Epoch: 7   Global Step: 39530   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:42:51,997-Speed 10328.95 samples/sec   Loss 9.9238   LearningRate 0.0647   Epoch: 7   Global Step: 39540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:52,974-Speed 10505.88 samples/sec   Loss 10.0485   LearningRate 0.0647   Epoch: 7   Global Step: 39550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:53,936-Speed 10659.68 samples/sec   Loss 10.1596   LearningRate 0.0647   Epoch: 7   Global Step: 39560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:54,919-Speed 10424.92 samples/sec   Loss 10.0089   LearningRate 0.0647   Epoch: 7   Global Step: 39570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:55,905-Speed 10397.73 samples/sec   Loss 10.0022   LearningRate 0.0647   Epoch: 7   Global Step: 39580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:56,884-Speed 10470.06 samples/sec   Loss 9.9891   LearningRate 0.0647   Epoch: 7   Global Step: 39590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:42:57,832-Speed 10821.11 samples/sec   Loss 9.9569   LearningRate 0.0647   Epoch: 7   Global Step: 39600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:42:58,811-Speed 10463.95 samples/sec   Loss 9.9827   LearningRate 0.0647   Epoch: 7   Global Step: 39610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:42:59,856-Speed 9805.61 samples/sec   Loss 10.1291   LearningRate 0.0647   Epoch: 7   Global Step: 39620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:00,829-Speed 10541.40 samples/sec   Loss 10.0163   LearningRate 0.0647   Epoch: 7   Global Step: 39630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:01,818-Speed 10357.81 samples/sec   Loss 9.9943   LearningRate 0.0647   Epoch: 7   Global Step: 39640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:02,799-Speed 10447.94 samples/sec   Loss 9.9622   LearningRate 0.0646   Epoch: 7   Global Step: 39650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:03,788-Speed 10366.75 samples/sec   Loss 9.9894   LearningRate 0.0646   Epoch: 7   Global Step: 39660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:04,762-Speed 10524.24 samples/sec   Loss 10.1106   LearningRate 0.0646   Epoch: 7   Global Step: 39670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:05,743-Speed 10450.59 samples/sec   Loss 10.0624   LearningRate 0.0646   Epoch: 7   Global Step: 39680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:06,697-Speed 10737.80 samples/sec   Loss 9.9091   LearningRate 0.0646   Epoch: 7   Global Step: 39690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:07,682-Speed 10404.77 samples/sec   Loss 10.0747   LearningRate 0.0646   Epoch: 7   Global Step: 39700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:43:08,650-Speed 10586.07 samples/sec   Loss 9.8618   LearningRate 0.0646   Epoch: 7   Global Step: 39710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:43:09,626-Speed 10507.63 samples/sec   Loss 9.9847   LearningRate 0.0646   Epoch: 7   Global Step: 39720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:43:10,580-Speed 10739.88 samples/sec   Loss 9.9508   LearningRate 0.0646   Epoch: 7   Global Step: 39730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:43:11,532-Speed 10768.81 samples/sec   Loss 10.0057   LearningRate 0.0646   Epoch: 7   Global Step: 39740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:43:12,537-Speed 10197.73 samples/sec   Loss 9.9441   LearningRate 0.0646   Epoch: 7   Global Step: 39750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:43:13,508-Speed 10564.13 samples/sec   Loss 10.0219   LearningRate 0.0646   Epoch: 7   Global Step: 39760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:43:14,474-Speed 10609.95 samples/sec   Loss 10.1336   LearningRate 0.0646   Epoch: 7   Global Step: 39770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:43:15,442-Speed 10579.42 samples/sec   Loss 9.9734   LearningRate 0.0645   Epoch: 7   Global Step: 39780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:43:16,412-Speed 10575.29 samples/sec   Loss 9.9426   LearningRate 0.0645   Epoch: 7   Global Step: 39790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:43:17,412-Speed 10248.62 samples/sec   Loss 10.1831   LearningRate 0.0645   Epoch: 7   Global Step: 39800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:43:18,388-Speed 10497.00 samples/sec   Loss 10.0873   LearningRate 0.0645   Epoch: 7   Global Step: 39810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:19,373-Speed 10413.21 samples/sec   Loss 10.1423   LearningRate 0.0645   Epoch: 7   Global Step: 39820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:20,360-Speed 10377.48 samples/sec   Loss 9.9819   LearningRate 0.0645   Epoch: 7   Global Step: 39830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:21,384-Speed 10013.68 samples/sec   Loss 10.0484   LearningRate 0.0645   Epoch: 7   Global Step: 39840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:22,347-Speed 10643.57 samples/sec   Loss 9.9230   LearningRate 0.0645   Epoch: 7   Global Step: 39850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:23,297-Speed 10796.46 samples/sec   Loss 10.1497   LearningRate 0.0645   Epoch: 7   Global Step: 39860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:24,264-Speed 10595.97 samples/sec   Loss 9.8905   LearningRate 0.0645   Epoch: 7   Global Step: 39870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:25,207-Speed 10866.52 samples/sec   Loss 9.9349   LearningRate 0.0645   Epoch: 7   Global Step: 39880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:26,176-Speed 10575.35 samples/sec   Loss 10.0418   LearningRate 0.0645   Epoch: 7   Global Step: 39890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:27,180-Speed 10211.50 samples/sec   Loss 9.9967   LearningRate 0.0644   Epoch: 7   Global Step: 39900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:28,108-Speed 11047.77 samples/sec   Loss 9.8960   LearningRate 0.0644   Epoch: 7   Global Step: 39910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:43:29,096-Speed 10369.14 samples/sec   Loss 9.8931   LearningRate 0.0644   Epoch: 7   Global Step: 39920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:43:30,085-Speed 10363.18 samples/sec   Loss 10.1500   LearningRate 0.0644   Epoch: 7   Global Step: 39930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:43:31,078-Speed 10326.07 samples/sec   Loss 9.9555   LearningRate 0.0644   Epoch: 7   Global Step: 39940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:43:32,053-Speed 10513.21 samples/sec   Loss 10.0922   LearningRate 0.0644   Epoch: 7   Global Step: 39950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:43:33,017-Speed 10638.55 samples/sec   Loss 10.0531   LearningRate 0.0644   Epoch: 7   Global Step: 39960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:43:33,987-Speed 10568.81 samples/sec   Loss 9.9234   LearningRate 0.0644   Epoch: 7   Global Step: 39970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:34,925-Speed 10922.29 samples/sec   Loss 9.8973   LearningRate 0.0644   Epoch: 7   Global Step: 39980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:35,879-Speed 10744.14 samples/sec   Loss 10.1563   LearningRate 0.0644   Epoch: 7   Global Step: 39990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:36,859-Speed 10459.20 samples/sec   Loss 9.9836   LearningRate 0.0644   Epoch: 7   Global Step: 40000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:43:58,941-[lfw][40000]XNorm: 13.688095
Training: 2022-04-11 00:43:58,942-[lfw][40000]Accuracy-Flip: 0.99433+-0.00367
Training: 2022-04-11 00:43:58,943-[lfw][40000]Accuracy-Highest: 0.99517
Training: 2022-04-11 00:44:24,173-[cfp_fp][40000]XNorm: 11.567385
Training: 2022-04-11 00:44:24,174-[cfp_fp][40000]Accuracy-Flip: 0.94714+-0.01125
Training: 2022-04-11 00:44:24,175-[cfp_fp][40000]Accuracy-Highest: 0.94986
Training: 2022-04-11 00:44:46,382-[agedb_30][40000]XNorm: 13.269620
Training: 2022-04-11 00:44:46,383-[agedb_30][40000]Accuracy-Flip: 0.95767+-0.01075
Training: 2022-04-11 00:44:46,383-[agedb_30][40000]Accuracy-Highest: 0.95767
Training: 2022-04-11 00:44:47,321-Speed 145.33 samples/sec   Loss 10.1668   LearningRate 0.0644   Epoch: 7   Global Step: 40010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:44:48,290-Speed 10575.01 samples/sec   Loss 9.9794   LearningRate 0.0644   Epoch: 7   Global Step: 40020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:44:49,276-Speed 10393.40 samples/sec   Loss 9.9483   LearningRate 0.0643   Epoch: 7   Global Step: 40030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:44:50,306-Speed 9950.28 samples/sec   Loss 9.9304   LearningRate 0.0643   Epoch: 7   Global Step: 40040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:44:51,255-Speed 10799.09 samples/sec   Loss 10.0378   LearningRate 0.0643   Epoch: 7   Global Step: 40050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:44:52,203-Speed 10808.43 samples/sec   Loss 9.9619   LearningRate 0.0643   Epoch: 7   Global Step: 40060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:44:53,173-Speed 10566.98 samples/sec   Loss 9.8814   LearningRate 0.0643   Epoch: 7   Global Step: 40070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:44:54,148-Speed 10656.86 samples/sec   Loss 9.9727   LearningRate 0.0643   Epoch: 7   Global Step: 40080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:44:55,092-Speed 10871.60 samples/sec   Loss 9.9910   LearningRate 0.0643   Epoch: 7   Global Step: 40090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:44:56,066-Speed 10519.69 samples/sec   Loss 9.9293   LearningRate 0.0643   Epoch: 7   Global Step: 40100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:44:57,033-Speed 10598.56 samples/sec   Loss 10.0087   LearningRate 0.0643   Epoch: 7   Global Step: 40110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:44:57,991-Speed 10704.41 samples/sec   Loss 10.1818   LearningRate 0.0643   Epoch: 7   Global Step: 40120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:44:59,036-Speed 9809.41 samples/sec   Loss 10.0536   LearningRate 0.0643   Epoch: 7   Global Step: 40130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:00,035-Speed 10258.27 samples/sec   Loss 9.9938   LearningRate 0.0643   Epoch: 7   Global Step: 40140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:00,997-Speed 10660.16 samples/sec   Loss 9.8652   LearningRate 0.0642   Epoch: 7   Global Step: 40150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:01,949-Speed 10759.62 samples/sec   Loss 10.1249   LearningRate 0.0642   Epoch: 7   Global Step: 40160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:02,970-Speed 10038.78 samples/sec   Loss 10.0534   LearningRate 0.0642   Epoch: 7   Global Step: 40170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:03,957-Speed 10385.90 samples/sec   Loss 9.9687   LearningRate 0.0642   Epoch: 7   Global Step: 40180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:04,887-Speed 11013.05 samples/sec   Loss 9.8896   LearningRate 0.0642   Epoch: 7   Global Step: 40190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:05,850-Speed 10643.20 samples/sec   Loss 9.8822   LearningRate 0.0642   Epoch: 7   Global Step: 40200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:06,812-Speed 10661.49 samples/sec   Loss 9.9139   LearningRate 0.0642   Epoch: 7   Global Step: 40210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:07,767-Speed 10724.00 samples/sec   Loss 9.9260   LearningRate 0.0642   Epoch: 7   Global Step: 40220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:08,771-Speed 10210.68 samples/sec   Loss 9.8098   LearningRate 0.0642   Epoch: 7   Global Step: 40230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:09,744-Speed 10540.25 samples/sec   Loss 9.9360   LearningRate 0.0642   Epoch: 7   Global Step: 40240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:10,723-Speed 10466.78 samples/sec   Loss 9.9495   LearningRate 0.0642   Epoch: 7   Global Step: 40250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:11,680-Speed 10704.98 samples/sec   Loss 10.0340   LearningRate 0.0642   Epoch: 7   Global Step: 40260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:12,623-Speed 10872.27 samples/sec   Loss 9.9878   LearningRate 0.0642   Epoch: 7   Global Step: 40270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:13,617-Speed 10314.01 samples/sec   Loss 9.9961   LearningRate 0.0641   Epoch: 7   Global Step: 40280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:14,620-Speed 10218.86 samples/sec   Loss 9.9013   LearningRate 0.0641   Epoch: 7   Global Step: 40290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:15,589-Speed 10574.29 samples/sec   Loss 9.9178   LearningRate 0.0641   Epoch: 7   Global Step: 40300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:16,550-Speed 10663.24 samples/sec   Loss 9.9620   LearningRate 0.0641   Epoch: 7   Global Step: 40310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:17,536-Speed 10400.68 samples/sec   Loss 9.9009   LearningRate 0.0641   Epoch: 7   Global Step: 40320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:18,537-Speed 10237.56 samples/sec   Loss 9.8966   LearningRate 0.0641   Epoch: 7   Global Step: 40330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:19,498-Speed 10671.35 samples/sec   Loss 10.0758   LearningRate 0.0641   Epoch: 7   Global Step: 40340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:20,463-Speed 10623.93 samples/sec   Loss 10.0699   LearningRate 0.0641   Epoch: 7   Global Step: 40350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:21,420-Speed 10707.73 samples/sec   Loss 9.7517   LearningRate 0.0641   Epoch: 7   Global Step: 40360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:22,399-Speed 10458.55 samples/sec   Loss 9.9856   LearningRate 0.0641   Epoch: 7   Global Step: 40370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:23,381-Speed 10438.71 samples/sec   Loss 9.9245   LearningRate 0.0641   Epoch: 7   Global Step: 40380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:24,352-Speed 10556.68 samples/sec   Loss 10.0627   LearningRate 0.0641   Epoch: 7   Global Step: 40390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:25,333-Speed 10455.05 samples/sec   Loss 9.7444   LearningRate 0.0641   Epoch: 7   Global Step: 40400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:26,313-Speed 10459.33 samples/sec   Loss 10.2271   LearningRate 0.0640   Epoch: 7   Global Step: 40410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:27,273-Speed 10672.31 samples/sec   Loss 9.9240   LearningRate 0.0640   Epoch: 7   Global Step: 40420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:28,274-Speed 10242.15 samples/sec   Loss 9.9512   LearningRate 0.0640   Epoch: 7   Global Step: 40430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:29,213-Speed 10914.93 samples/sec   Loss 9.9433   LearningRate 0.0640   Epoch: 7   Global Step: 40440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:30,153-Speed 10900.59 samples/sec   Loss 9.7700   LearningRate 0.0640   Epoch: 7   Global Step: 40450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:31,377-Speed 8373.69 samples/sec   Loss 10.0280   LearningRate 0.0640   Epoch: 7   Global Step: 40460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:40,912-Speed 1074.09 samples/sec   Loss 9.3378   LearningRate 0.0640   Epoch: 8   Global Step: 40470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:45:41,876-Speed 10633.77 samples/sec   Loss 8.9818   LearningRate 0.0640   Epoch: 8   Global Step: 40480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:42,970-Speed 9363.55 samples/sec   Loss 9.0556   LearningRate 0.0640   Epoch: 8   Global Step: 40490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:43,942-Speed 10550.63 samples/sec   Loss 9.0504   LearningRate 0.0640   Epoch: 8   Global Step: 40500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:45,194-Speed 8181.85 samples/sec   Loss 9.0069   LearningRate 0.0640   Epoch: 8   Global Step: 40510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:46,248-Speed 9721.53 samples/sec   Loss 8.8636   LearningRate 0.0640   Epoch: 8   Global Step: 40520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:47,312-Speed 9635.50 samples/sec   Loss 9.0942   LearningRate 0.0639   Epoch: 8   Global Step: 40530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:48,305-Speed 10329.46 samples/sec   Loss 9.0891   LearningRate 0.0639   Epoch: 8   Global Step: 40540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:49,330-Speed 9995.30 samples/sec   Loss 8.9431   LearningRate 0.0639   Epoch: 8   Global Step: 40550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:50,338-Speed 10171.45 samples/sec   Loss 8.9822   LearningRate 0.0639   Epoch: 8   Global Step: 40560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:51,297-Speed 10687.49 samples/sec   Loss 9.1670   LearningRate 0.0639   Epoch: 8   Global Step: 40570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:52,264-Speed 10595.30 samples/sec   Loss 9.1340   LearningRate 0.0639   Epoch: 8   Global Step: 40580   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:45:53,250-Speed 10396.20 samples/sec   Loss 9.2429   LearningRate 0.0639   Epoch: 8   Global Step: 40590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:54,266-Speed 10088.20 samples/sec   Loss 9.1484   LearningRate 0.0639   Epoch: 8   Global Step: 40600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:55,297-Speed 9940.20 samples/sec   Loss 9.0550   LearningRate 0.0639   Epoch: 8   Global Step: 40610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:56,315-Speed 10073.04 samples/sec   Loss 9.3053   LearningRate 0.0639   Epoch: 8   Global Step: 40620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:57,291-Speed 10499.32 samples/sec   Loss 9.1689   LearningRate 0.0639   Epoch: 8   Global Step: 40630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:58,262-Speed 10550.62 samples/sec   Loss 9.1944   LearningRate 0.0639   Epoch: 8   Global Step: 40640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:45:59,310-Speed 9783.80 samples/sec   Loss 9.2564   LearningRate 0.0639   Epoch: 8   Global Step: 40650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:00,301-Speed 10345.71 samples/sec   Loss 9.3012   LearningRate 0.0638   Epoch: 8   Global Step: 40660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:01,282-Speed 10440.92 samples/sec   Loss 9.3338   LearningRate 0.0638   Epoch: 8   Global Step: 40670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:02,265-Speed 10430.63 samples/sec   Loss 9.3175   LearningRate 0.0638   Epoch: 8   Global Step: 40680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:03,253-Speed 10380.10 samples/sec   Loss 9.0044   LearningRate 0.0638   Epoch: 8   Global Step: 40690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:04,241-Speed 10369.17 samples/sec   Loss 9.3657   LearningRate 0.0638   Epoch: 8   Global Step: 40700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:05,256-Speed 10093.75 samples/sec   Loss 9.3820   LearningRate 0.0638   Epoch: 8   Global Step: 40710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:06,245-Speed 10377.18 samples/sec   Loss 9.3185   LearningRate 0.0638   Epoch: 8   Global Step: 40720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:07,291-Speed 9794.37 samples/sec   Loss 9.1883   LearningRate 0.0638   Epoch: 8   Global Step: 40730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:08,369-Speed 9508.78 samples/sec   Loss 9.1564   LearningRate 0.0638   Epoch: 8   Global Step: 40740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:09,416-Speed 9789.40 samples/sec   Loss 9.1420   LearningRate 0.0638   Epoch: 8   Global Step: 40750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:10,407-Speed 10343.20 samples/sec   Loss 9.3036   LearningRate 0.0638   Epoch: 8   Global Step: 40760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:11,356-Speed 10802.84 samples/sec   Loss 9.3978   LearningRate 0.0638   Epoch: 8   Global Step: 40770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:12,360-Speed 10212.17 samples/sec   Loss 9.1432   LearningRate 0.0638   Epoch: 8   Global Step: 40780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:13,352-Speed 10332.52 samples/sec   Loss 9.3495   LearningRate 0.0637   Epoch: 8   Global Step: 40790   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:46:14,360-Speed 10163.41 samples/sec   Loss 9.1950   LearningRate 0.0637   Epoch: 8   Global Step: 40800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:15,331-Speed 10564.14 samples/sec   Loss 9.1479   LearningRate 0.0637   Epoch: 8   Global Step: 40810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:16,487-Speed 8869.26 samples/sec   Loss 9.1642   LearningRate 0.0637   Epoch: 8   Global Step: 40820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:17,440-Speed 10745.50 samples/sec   Loss 9.2801   LearningRate 0.0637   Epoch: 8   Global Step: 40830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:18,522-Speed 9478.16 samples/sec   Loss 9.5050   LearningRate 0.0637   Epoch: 8   Global Step: 40840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:19,518-Speed 10297.77 samples/sec   Loss 9.3401   LearningRate 0.0637   Epoch: 8   Global Step: 40850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:20,496-Speed 10469.13 samples/sec   Loss 9.3081   LearningRate 0.0637   Epoch: 8   Global Step: 40860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:21,491-Speed 10306.81 samples/sec   Loss 9.3964   LearningRate 0.0637   Epoch: 8   Global Step: 40870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:22,484-Speed 10323.40 samples/sec   Loss 9.5211   LearningRate 0.0637   Epoch: 8   Global Step: 40880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:23,485-Speed 10233.72 samples/sec   Loss 9.3090   LearningRate 0.0637   Epoch: 8   Global Step: 40890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:24,465-Speed 10466.81 samples/sec   Loss 9.4072   LearningRate 0.0637   Epoch: 8   Global Step: 40900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:25,433-Speed 10587.83 samples/sec   Loss 9.4726   LearningRate 0.0636   Epoch: 8   Global Step: 40910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:26,421-Speed 10374.81 samples/sec   Loss 9.3868   LearningRate 0.0636   Epoch: 8   Global Step: 40920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:27,437-Speed 10084.47 samples/sec   Loss 9.3926   LearningRate 0.0636   Epoch: 8   Global Step: 40930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:28,417-Speed 10455.14 samples/sec   Loss 9.4844   LearningRate 0.0636   Epoch: 8   Global Step: 40940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:29,408-Speed 10351.29 samples/sec   Loss 9.3355   LearningRate 0.0636   Epoch: 8   Global Step: 40950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:46:30,364-Speed 10715.03 samples/sec   Loss 9.3853   LearningRate 0.0636   Epoch: 8   Global Step: 40960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:46:31,334-Speed 10570.82 samples/sec   Loss 9.3622   LearningRate 0.0636   Epoch: 8   Global Step: 40970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:46:32,425-Speed 9393.26 samples/sec   Loss 9.4813   LearningRate 0.0636   Epoch: 8   Global Step: 40980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:46:33,440-Speed 10088.09 samples/sec   Loss 9.5344   LearningRate 0.0636   Epoch: 8   Global Step: 40990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:46:34,508-Speed 9601.47 samples/sec   Loss 9.1587   LearningRate 0.0636   Epoch: 8   Global Step: 41000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:46:35,480-Speed 10559.91 samples/sec   Loss 9.5262   LearningRate 0.0636   Epoch: 8   Global Step: 41010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:46:36,568-Speed 9427.19 samples/sec   Loss 9.4159   LearningRate 0.0636   Epoch: 8   Global Step: 41020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:46:37,546-Speed 10471.26 samples/sec   Loss 9.2582   LearningRate 0.0636   Epoch: 8   Global Step: 41030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:46:38,643-Speed 9345.23 samples/sec   Loss 9.4202   LearningRate 0.0635   Epoch: 8   Global Step: 41040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:46:39,701-Speed 9684.38 samples/sec   Loss 9.3430   LearningRate 0.0635   Epoch: 8   Global Step: 41050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:40,692-Speed 10347.79 samples/sec   Loss 9.4833   LearningRate 0.0635   Epoch: 8   Global Step: 41060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:41,741-Speed 9762.78 samples/sec   Loss 9.4113   LearningRate 0.0635   Epoch: 8   Global Step: 41070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:42,762-Speed 10039.25 samples/sec   Loss 9.4343   LearningRate 0.0635   Epoch: 8   Global Step: 41080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:43,822-Speed 9670.42 samples/sec   Loss 9.6010   LearningRate 0.0635   Epoch: 8   Global Step: 41090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:44,909-Speed 9439.72 samples/sec   Loss 9.4821   LearningRate 0.0635   Epoch: 8   Global Step: 41100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:45,869-Speed 10684.50 samples/sec   Loss 9.6359   LearningRate 0.0635   Epoch: 8   Global Step: 41110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:46,940-Speed 9572.05 samples/sec   Loss 9.3135   LearningRate 0.0635   Epoch: 8   Global Step: 41120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:46:47,995-Speed 9710.11 samples/sec   Loss 9.4771   LearningRate 0.0635   Epoch: 8   Global Step: 41130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:46:48,944-Speed 10798.19 samples/sec   Loss 9.5485   LearningRate 0.0635   Epoch: 8   Global Step: 41140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:46:49,925-Speed 10463.31 samples/sec   Loss 9.3653   LearningRate 0.0635   Epoch: 8   Global Step: 41150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:46:50,895-Speed 10579.44 samples/sec   Loss 9.3656   LearningRate 0.0635   Epoch: 8   Global Step: 41160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:46:52,022-Speed 9088.92 samples/sec   Loss 9.3489   LearningRate 0.0634   Epoch: 8   Global Step: 41170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:46:53,046-Speed 10006.98 samples/sec   Loss 9.5178   LearningRate 0.0634   Epoch: 8   Global Step: 41180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:46:54,220-Speed 8730.86 samples/sec   Loss 9.5685   LearningRate 0.0634   Epoch: 8   Global Step: 41190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:46:55,162-Speed 10891.43 samples/sec   Loss 9.6820   LearningRate 0.0634   Epoch: 8   Global Step: 41200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:46:56,123-Speed 10666.36 samples/sec   Loss 9.3976   LearningRate 0.0634   Epoch: 8   Global Step: 41210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:46:57,101-Speed 10471.50 samples/sec   Loss 9.5212   LearningRate 0.0634   Epoch: 8   Global Step: 41220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:58,076-Speed 10510.21 samples/sec   Loss 9.4845   LearningRate 0.0634   Epoch: 8   Global Step: 41230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:46:59,080-Speed 10211.50 samples/sec   Loss 9.4840   LearningRate 0.0634   Epoch: 8   Global Step: 41240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:00,032-Speed 10773.42 samples/sec   Loss 9.4881   LearningRate 0.0634   Epoch: 8   Global Step: 41250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:01,031-Speed 10253.96 samples/sec   Loss 9.7259   LearningRate 0.0634   Epoch: 8   Global Step: 41260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:01,999-Speed 10591.04 samples/sec   Loss 9.4652   LearningRate 0.0634   Epoch: 8   Global Step: 41270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:03,016-Speed 10078.16 samples/sec   Loss 9.5244   LearningRate 0.0634   Epoch: 8   Global Step: 41280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:04,003-Speed 10381.07 samples/sec   Loss 9.5838   LearningRate 0.0633   Epoch: 8   Global Step: 41290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:04,969-Speed 10611.29 samples/sec   Loss 9.7542   LearningRate 0.0633   Epoch: 8   Global Step: 41300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:05,946-Speed 10495.25 samples/sec   Loss 9.6478   LearningRate 0.0633   Epoch: 8   Global Step: 41310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:06,903-Speed 10703.68 samples/sec   Loss 9.7095   LearningRate 0.0633   Epoch: 8   Global Step: 41320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:47:07,856-Speed 10757.35 samples/sec   Loss 9.5533   LearningRate 0.0633   Epoch: 8   Global Step: 41330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:47:08,918-Speed 9651.72 samples/sec   Loss 9.5933   LearningRate 0.0633   Epoch: 8   Global Step: 41340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:47:09,886-Speed 10588.31 samples/sec   Loss 9.6034   LearningRate 0.0633   Epoch: 8   Global Step: 41350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:47:10,866-Speed 10456.24 samples/sec   Loss 9.5952   LearningRate 0.0633   Epoch: 8   Global Step: 41360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:47:11,831-Speed 10632.76 samples/sec   Loss 9.5439   LearningRate 0.0633   Epoch: 8   Global Step: 41370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:47:12,891-Speed 9663.89 samples/sec   Loss 9.4703   LearningRate 0.0633   Epoch: 8   Global Step: 41380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:47:13,896-Speed 10205.01 samples/sec   Loss 9.6838   LearningRate 0.0633   Epoch: 8   Global Step: 41390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:47:14,826-Speed 11012.19 samples/sec   Loss 9.4883   LearningRate 0.0633   Epoch: 8   Global Step: 41400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:47:15,861-Speed 9912.99 samples/sec   Loss 9.5785   LearningRate 0.0633   Epoch: 8   Global Step: 41410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:47:16,876-Speed 10092.41 samples/sec   Loss 9.6032   LearningRate 0.0632   Epoch: 8   Global Step: 41420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:17,935-Speed 9683.35 samples/sec   Loss 9.5790   LearningRate 0.0632   Epoch: 8   Global Step: 41430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:18,913-Speed 10484.72 samples/sec   Loss 9.6872   LearningRate 0.0632   Epoch: 8   Global Step: 41440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:19,867-Speed 10743.28 samples/sec   Loss 9.7483   LearningRate 0.0632   Epoch: 8   Global Step: 41450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:20,842-Speed 10510.09 samples/sec   Loss 9.6383   LearningRate 0.0632   Epoch: 8   Global Step: 41460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:21,821-Speed 10476.06 samples/sec   Loss 9.5975   LearningRate 0.0632   Epoch: 8   Global Step: 41470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:22,786-Speed 10623.75 samples/sec   Loss 9.5726   LearningRate 0.0632   Epoch: 8   Global Step: 41480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:23,754-Speed 10625.31 samples/sec   Loss 9.4004   LearningRate 0.0632   Epoch: 8   Global Step: 41490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:24,806-Speed 9742.57 samples/sec   Loss 9.7659   LearningRate 0.0632   Epoch: 8   Global Step: 41500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:25,823-Speed 10077.64 samples/sec   Loss 9.5927   LearningRate 0.0632   Epoch: 8   Global Step: 41510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:26,910-Speed 9432.89 samples/sec   Loss 9.6061   LearningRate 0.0632   Epoch: 8   Global Step: 41520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:27,842-Speed 10996.94 samples/sec   Loss 9.5599   LearningRate 0.0632   Epoch: 8   Global Step: 41530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:28,825-Speed 10428.63 samples/sec   Loss 9.6530   LearningRate 0.0632   Epoch: 8   Global Step: 41540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:29,837-Speed 10119.28 samples/sec   Loss 9.6185   LearningRate 0.0631   Epoch: 8   Global Step: 41550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:30,857-Speed 10052.52 samples/sec   Loss 9.7259   LearningRate 0.0631   Epoch: 8   Global Step: 41560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:31,865-Speed 10169.66 samples/sec   Loss 9.4701   LearningRate 0.0631   Epoch: 8   Global Step: 41570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:32,828-Speed 10652.05 samples/sec   Loss 9.5557   LearningRate 0.0631   Epoch: 8   Global Step: 41580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:33,784-Speed 10720.57 samples/sec   Loss 9.6071   LearningRate 0.0631   Epoch: 8   Global Step: 41590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:34,808-Speed 10001.20 samples/sec   Loss 9.6341   LearningRate 0.0631   Epoch: 8   Global Step: 41600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:35,793-Speed 10405.26 samples/sec   Loss 9.5470   LearningRate 0.0631   Epoch: 8   Global Step: 41610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:36,794-Speed 10238.91 samples/sec   Loss 9.7967   LearningRate 0.0631   Epoch: 8   Global Step: 41620   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:47:37,782-Speed 10376.51 samples/sec   Loss 9.7448   LearningRate 0.0631   Epoch: 8   Global Step: 41630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:38,770-Speed 10378.43 samples/sec   Loss 9.6286   LearningRate 0.0631   Epoch: 8   Global Step: 41640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:39,853-Speed 9460.17 samples/sec   Loss 9.5849   LearningRate 0.0631   Epoch: 8   Global Step: 41650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:40,830-Speed 10493.01 samples/sec   Loss 9.6884   LearningRate 0.0631   Epoch: 8   Global Step: 41660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:41,810-Speed 10456.66 samples/sec   Loss 9.4618   LearningRate 0.0630   Epoch: 8   Global Step: 41670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:42,811-Speed 10248.37 samples/sec   Loss 9.7100   LearningRate 0.0630   Epoch: 8   Global Step: 41680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:43,808-Speed 10272.50 samples/sec   Loss 9.5359   LearningRate 0.0630   Epoch: 8   Global Step: 41690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:44,788-Speed 10463.94 samples/sec   Loss 9.5160   LearningRate 0.0630   Epoch: 8   Global Step: 41700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:45,757-Speed 10570.48 samples/sec   Loss 9.6314   LearningRate 0.0630   Epoch: 8   Global Step: 41710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:46,774-Speed 10085.69 samples/sec   Loss 9.8173   LearningRate 0.0630   Epoch: 8   Global Step: 41720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:47,736-Speed 10647.66 samples/sec   Loss 9.6695   LearningRate 0.0630   Epoch: 8   Global Step: 41730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:48,686-Speed 10793.56 samples/sec   Loss 9.7639   LearningRate 0.0630   Epoch: 8   Global Step: 41740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:49,739-Speed 9732.75 samples/sec   Loss 9.6890   LearningRate 0.0630   Epoch: 8   Global Step: 41750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:50,706-Speed 10596.43 samples/sec   Loss 9.5360   LearningRate 0.0630   Epoch: 8   Global Step: 41760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:51,666-Speed 10682.61 samples/sec   Loss 9.9965   LearningRate 0.0630   Epoch: 8   Global Step: 41770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:52,703-Speed 9880.87 samples/sec   Loss 9.6333   LearningRate 0.0630   Epoch: 8   Global Step: 41780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:53,704-Speed 10240.35 samples/sec   Loss 9.7731   LearningRate 0.0630   Epoch: 8   Global Step: 41790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:54,696-Speed 10332.85 samples/sec   Loss 9.6950   LearningRate 0.0629   Epoch: 8   Global Step: 41800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:55,661-Speed 10617.17 samples/sec   Loss 9.6841   LearningRate 0.0629   Epoch: 8   Global Step: 41810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:56,701-Speed 9853.81 samples/sec   Loss 9.6201   LearningRate 0.0629   Epoch: 8   Global Step: 41820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:57,689-Speed 10365.55 samples/sec   Loss 9.6910   LearningRate 0.0629   Epoch: 8   Global Step: 41830   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:47:58,681-Speed 10340.07 samples/sec   Loss 9.8108   LearningRate 0.0629   Epoch: 8   Global Step: 41840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:47:59,630-Speed 10801.42 samples/sec   Loss 9.5547   LearningRate 0.0629   Epoch: 8   Global Step: 41850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:48:00,605-Speed 10506.75 samples/sec   Loss 9.5945   LearningRate 0.0629   Epoch: 8   Global Step: 41860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:48:01,558-Speed 10752.01 samples/sec   Loss 9.8780   LearningRate 0.0629   Epoch: 8   Global Step: 41870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:48:02,588-Speed 9956.61 samples/sec   Loss 9.5357   LearningRate 0.0629   Epoch: 8   Global Step: 41880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:48:03,547-Speed 10689.89 samples/sec   Loss 9.4886   LearningRate 0.0629   Epoch: 8   Global Step: 41890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:48:04,480-Speed 10974.85 samples/sec   Loss 9.6720   LearningRate 0.0629   Epoch: 8   Global Step: 41900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:48:05,395-Speed 11209.82 samples/sec   Loss 9.6730   LearningRate 0.0629   Epoch: 8   Global Step: 41910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:48:06,455-Speed 9662.51 samples/sec   Loss 9.5397   LearningRate 0.0629   Epoch: 8   Global Step: 41920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:48:07,444-Speed 10365.24 samples/sec   Loss 9.7210   LearningRate 0.0628   Epoch: 8   Global Step: 41930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:48:08,478-Speed 9914.76 samples/sec   Loss 9.7210   LearningRate 0.0628   Epoch: 8   Global Step: 41940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:48:09,457-Speed 10465.37 samples/sec   Loss 9.6728   LearningRate 0.0628   Epoch: 8   Global Step: 41950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:48:10,458-Speed 10241.96 samples/sec   Loss 9.7476   LearningRate 0.0628   Epoch: 8   Global Step: 41960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:48:11,417-Speed 10686.56 samples/sec   Loss 9.7817   LearningRate 0.0628   Epoch: 8   Global Step: 41970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:48:12,430-Speed 10115.96 samples/sec   Loss 9.8970   LearningRate 0.0628   Epoch: 8   Global Step: 41980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:48:13,436-Speed 10190.64 samples/sec   Loss 9.7841   LearningRate 0.0628   Epoch: 8   Global Step: 41990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:48:14,402-Speed 10612.78 samples/sec   Loss 9.6801   LearningRate 0.0628   Epoch: 8   Global Step: 42000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:48:36,796-[lfw][42000]XNorm: 13.711006
Training: 2022-04-11 00:48:36,797-[lfw][42000]Accuracy-Flip: 0.99483+-0.00376
Training: 2022-04-11 00:48:36,798-[lfw][42000]Accuracy-Highest: 0.99517
Training: 2022-04-11 00:49:02,184-[cfp_fp][42000]XNorm: 11.573321
Training: 2022-04-11 00:49:02,185-[cfp_fp][42000]Accuracy-Flip: 0.94500+-0.01105
Training: 2022-04-11 00:49:02,185-[cfp_fp][42000]Accuracy-Highest: 0.94986
Training: 2022-04-11 00:49:24,198-[agedb_30][42000]XNorm: 13.413920
Training: 2022-04-11 00:49:24,199-[agedb_30][42000]Accuracy-Flip: 0.95617+-0.01080
Training: 2022-04-11 00:49:24,200-[agedb_30][42000]Accuracy-Highest: 0.95767
Training: 2022-04-11 00:49:25,162-Speed 144.72 samples/sec   Loss 9.6105   LearningRate 0.0628   Epoch: 8   Global Step: 42010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:26,126-Speed 10641.51 samples/sec   Loss 9.7024   LearningRate 0.0628   Epoch: 8   Global Step: 42020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:27,091-Speed 10617.17 samples/sec   Loss 9.7107   LearningRate 0.0628   Epoch: 8   Global Step: 42030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:28,080-Speed 10363.12 samples/sec   Loss 9.6020   LearningRate 0.0628   Epoch: 8   Global Step: 42040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:29,098-Speed 10069.31 samples/sec   Loss 9.7571   LearningRate 0.0628   Epoch: 8   Global Step: 42050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:30,082-Speed 10420.00 samples/sec   Loss 9.6163   LearningRate 0.0627   Epoch: 8   Global Step: 42060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:31,067-Speed 10407.31 samples/sec   Loss 9.7673   LearningRate 0.0627   Epoch: 8   Global Step: 42070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:32,044-Speed 10494.36 samples/sec   Loss 9.7627   LearningRate 0.0627   Epoch: 8   Global Step: 42080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:49:33,014-Speed 10563.62 samples/sec   Loss 9.8066   LearningRate 0.0627   Epoch: 8   Global Step: 42090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:49:33,982-Speed 10595.77 samples/sec   Loss 9.5582   LearningRate 0.0627   Epoch: 8   Global Step: 42100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:49:34,926-Speed 10851.58 samples/sec   Loss 9.7499   LearningRate 0.0627   Epoch: 8   Global Step: 42110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:49:35,905-Speed 10467.56 samples/sec   Loss 9.6428   LearningRate 0.0627   Epoch: 8   Global Step: 42120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:49:36,874-Speed 10582.52 samples/sec   Loss 9.6669   LearningRate 0.0627   Epoch: 8   Global Step: 42130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:49:37,838-Speed 10631.02 samples/sec   Loss 9.5353   LearningRate 0.0627   Epoch: 8   Global Step: 42140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:49:38,837-Speed 10255.03 samples/sec   Loss 9.7391   LearningRate 0.0627   Epoch: 8   Global Step: 42150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:49:39,786-Speed 10806.59 samples/sec   Loss 9.6361   LearningRate 0.0627   Epoch: 8   Global Step: 42160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:49:40,728-Speed 10872.47 samples/sec   Loss 9.5706   LearningRate 0.0627   Epoch: 8   Global Step: 42170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:49:41,693-Speed 10627.39 samples/sec   Loss 9.6567   LearningRate 0.0627   Epoch: 8   Global Step: 42180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:42,651-Speed 10693.28 samples/sec   Loss 9.7343   LearningRate 0.0626   Epoch: 8   Global Step: 42190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:43,657-Speed 10189.47 samples/sec   Loss 9.6346   LearningRate 0.0626   Epoch: 8   Global Step: 42200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:44,621-Speed 10640.05 samples/sec   Loss 9.6889   LearningRate 0.0626   Epoch: 8   Global Step: 42210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:45,609-Speed 10373.10 samples/sec   Loss 9.6447   LearningRate 0.0626   Epoch: 8   Global Step: 42220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:46,575-Speed 10612.00 samples/sec   Loss 9.7625   LearningRate 0.0626   Epoch: 8   Global Step: 42230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:47,545-Speed 10565.89 samples/sec   Loss 9.7094   LearningRate 0.0626   Epoch: 8   Global Step: 42240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:48,494-Speed 10806.44 samples/sec   Loss 9.6933   LearningRate 0.0626   Epoch: 8   Global Step: 42250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:49,495-Speed 10234.62 samples/sec   Loss 9.6907   LearningRate 0.0626   Epoch: 8   Global Step: 42260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:50,448-Speed 10759.03 samples/sec   Loss 9.7125   LearningRate 0.0626   Epoch: 8   Global Step: 42270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:51,383-Speed 10960.31 samples/sec   Loss 9.6807   LearningRate 0.0626   Epoch: 8   Global Step: 42280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:52,387-Speed 10213.26 samples/sec   Loss 9.7492   LearningRate 0.0626   Epoch: 8   Global Step: 42290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:53,351-Speed 10636.49 samples/sec   Loss 9.7618   LearningRate 0.0626   Epoch: 8   Global Step: 42300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:54,330-Speed 10467.12 samples/sec   Loss 9.8217   LearningRate 0.0625   Epoch: 8   Global Step: 42310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:55,288-Speed 10701.77 samples/sec   Loss 9.6215   LearningRate 0.0625   Epoch: 8   Global Step: 42320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:56,232-Speed 10857.55 samples/sec   Loss 9.6340   LearningRate 0.0625   Epoch: 8   Global Step: 42330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:57,192-Speed 10669.21 samples/sec   Loss 9.6698   LearningRate 0.0625   Epoch: 8   Global Step: 42340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:58,145-Speed 10749.36 samples/sec   Loss 9.7507   LearningRate 0.0625   Epoch: 8   Global Step: 42350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:49:59,172-Speed 9991.87 samples/sec   Loss 9.9067   LearningRate 0.0625   Epoch: 8   Global Step: 42360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:50:00,161-Speed 10354.58 samples/sec   Loss 9.8968   LearningRate 0.0625   Epoch: 8   Global Step: 42370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:50:01,122-Speed 10678.23 samples/sec   Loss 9.7367   LearningRate 0.0625   Epoch: 8   Global Step: 42380   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:50:02,105-Speed 10417.85 samples/sec   Loss 9.7106   LearningRate 0.0625   Epoch: 8   Global Step: 42390   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:50:03,115-Speed 10142.34 samples/sec   Loss 9.6256   LearningRate 0.0625   Epoch: 8   Global Step: 42400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:50:04,159-Speed 9824.81 samples/sec   Loss 9.8068   LearningRate 0.0625   Epoch: 8   Global Step: 42410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:50:05,171-Speed 10123.80 samples/sec   Loss 9.6558   LearningRate 0.0625   Epoch: 8   Global Step: 42420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:50:06,113-Speed 10891.64 samples/sec   Loss 9.8197   LearningRate 0.0625   Epoch: 8   Global Step: 42430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:50:07,069-Speed 10716.21 samples/sec   Loss 9.7993   LearningRate 0.0624   Epoch: 8   Global Step: 42440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:50:08,062-Speed 10324.08 samples/sec   Loss 9.7888   LearningRate 0.0624   Epoch: 8   Global Step: 42450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:50:09,052-Speed 10355.23 samples/sec   Loss 9.8601   LearningRate 0.0624   Epoch: 8   Global Step: 42460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:50:10,059-Speed 10187.17 samples/sec   Loss 9.6421   LearningRate 0.0624   Epoch: 8   Global Step: 42470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:50:11,130-Speed 9571.17 samples/sec   Loss 9.8289   LearningRate 0.0624   Epoch: 8   Global Step: 42480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:50:12,073-Speed 10869.14 samples/sec   Loss 9.7397   LearningRate 0.0624   Epoch: 8   Global Step: 42490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:50:13,090-Speed 10073.05 samples/sec   Loss 9.6606   LearningRate 0.0624   Epoch: 8   Global Step: 42500   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:50:14,095-Speed 10197.11 samples/sec   Loss 9.7320   LearningRate 0.0624   Epoch: 8   Global Step: 42510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:50:15,072-Speed 10486.69 samples/sec   Loss 9.6916   LearningRate 0.0624   Epoch: 8   Global Step: 42520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:50:16,032-Speed 10689.98 samples/sec   Loss 9.8031   LearningRate 0.0624   Epoch: 8   Global Step: 42530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:16,980-Speed 10805.99 samples/sec   Loss 9.7450   LearningRate 0.0624   Epoch: 8   Global Step: 42540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:18,024-Speed 9817.63 samples/sec   Loss 9.6731   LearningRate 0.0624   Epoch: 8   Global Step: 42550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:19,044-Speed 10048.56 samples/sec   Loss 9.7571   LearningRate 0.0624   Epoch: 8   Global Step: 42560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:20,033-Speed 10364.35 samples/sec   Loss 9.7015   LearningRate 0.0623   Epoch: 8   Global Step: 42570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:21,002-Speed 10572.72 samples/sec   Loss 9.8021   LearningRate 0.0623   Epoch: 8   Global Step: 42580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:21,987-Speed 10404.78 samples/sec   Loss 9.5756   LearningRate 0.0623   Epoch: 8   Global Step: 42590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:22,950-Speed 10643.75 samples/sec   Loss 9.6417   LearningRate 0.0623   Epoch: 8   Global Step: 42600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:23,936-Speed 10392.27 samples/sec   Loss 9.9507   LearningRate 0.0623   Epoch: 8   Global Step: 42610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:24,927-Speed 10351.32 samples/sec   Loss 9.7314   LearningRate 0.0623   Epoch: 8   Global Step: 42620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:25,873-Speed 10836.09 samples/sec   Loss 9.6364   LearningRate 0.0623   Epoch: 8   Global Step: 42630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:50:26,826-Speed 10744.47 samples/sec   Loss 9.5545   LearningRate 0.0623   Epoch: 8   Global Step: 42640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:50:27,788-Speed 10660.18 samples/sec   Loss 9.6302   LearningRate 0.0623   Epoch: 8   Global Step: 42650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:50:28,866-Speed 9503.42 samples/sec   Loss 9.7506   LearningRate 0.0623   Epoch: 8   Global Step: 42660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:50:29,895-Speed 9969.10 samples/sec   Loss 9.8735   LearningRate 0.0623   Epoch: 8   Global Step: 42670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:50:30,944-Speed 9768.18 samples/sec   Loss 9.8128   LearningRate 0.0623   Epoch: 8   Global Step: 42680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:31,907-Speed 10639.22 samples/sec   Loss 9.7005   LearningRate 0.0623   Epoch: 8   Global Step: 42690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:32,934-Speed 9977.14 samples/sec   Loss 9.6300   LearningRate 0.0622   Epoch: 8   Global Step: 42700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:33,974-Speed 9858.29 samples/sec   Loss 9.7239   LearningRate 0.0622   Epoch: 8   Global Step: 42710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:34,945-Speed 10563.94 samples/sec   Loss 9.7932   LearningRate 0.0622   Epoch: 8   Global Step: 42720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:35,957-Speed 10121.92 samples/sec   Loss 9.8433   LearningRate 0.0622   Epoch: 8   Global Step: 42730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:36,953-Speed 10294.79 samples/sec   Loss 9.7486   LearningRate 0.0622   Epoch: 8   Global Step: 42740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:37,956-Speed 10220.45 samples/sec   Loss 9.9000   LearningRate 0.0622   Epoch: 8   Global Step: 42750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:39,025-Speed 9586.95 samples/sec   Loss 9.6509   LearningRate 0.0622   Epoch: 8   Global Step: 42760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:40,001-Speed 10496.30 samples/sec   Loss 9.7917   LearningRate 0.0622   Epoch: 8   Global Step: 42770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:40,963-Speed 10663.54 samples/sec   Loss 9.7369   LearningRate 0.0622   Epoch: 8   Global Step: 42780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:50:41,930-Speed 10600.69 samples/sec   Loss 9.7462   LearningRate 0.0622   Epoch: 8   Global Step: 42790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:42,903-Speed 10530.35 samples/sec   Loss 9.7486   LearningRate 0.0622   Epoch: 8   Global Step: 42800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 00:50:43,909-Speed 10184.06 samples/sec   Loss 9.7615   LearningRate 0.0622   Epoch: 8   Global Step: 42810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 00:50:44,851-Speed 10887.70 samples/sec   Loss 9.7462   LearningRate 0.0622   Epoch: 8   Global Step: 42820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 00:50:45,824-Speed 10536.15 samples/sec   Loss 9.7377   LearningRate 0.0621   Epoch: 8   Global Step: 42830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 00:50:46,810-Speed 10394.18 samples/sec   Loss 9.7267   LearningRate 0.0621   Epoch: 8   Global Step: 42840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 00:50:47,872-Speed 9649.16 samples/sec   Loss 9.8186   LearningRate 0.0621   Epoch: 8   Global Step: 42850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 00:50:48,875-Speed 10222.98 samples/sec   Loss 9.6422   LearningRate 0.0621   Epoch: 8   Global Step: 42860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 00:50:49,827-Speed 10765.31 samples/sec   Loss 9.7795   LearningRate 0.0621   Epoch: 8   Global Step: 42870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 00:50:50,820-Speed 10325.93 samples/sec   Loss 9.8648   LearningRate 0.0621   Epoch: 8   Global Step: 42880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 00:50:51,811-Speed 10341.21 samples/sec   Loss 9.7843   LearningRate 0.0621   Epoch: 8   Global Step: 42890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 00:50:52,742-Speed 11008.24 samples/sec   Loss 9.6885   LearningRate 0.0621   Epoch: 8   Global Step: 42900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:53,712-Speed 10568.77 samples/sec   Loss 9.7474   LearningRate 0.0621   Epoch: 8   Global Step: 42910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:54,737-Speed 10001.17 samples/sec   Loss 9.8350   LearningRate 0.0621   Epoch: 8   Global Step: 42920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:55,719-Speed 10430.85 samples/sec   Loss 9.8648   LearningRate 0.0621   Epoch: 8   Global Step: 42930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:56,744-Speed 10004.30 samples/sec   Loss 9.6915   LearningRate 0.0621   Epoch: 8   Global Step: 42940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:57,972-Speed 8339.33 samples/sec   Loss 9.7476   LearningRate 0.0620   Epoch: 8   Global Step: 42950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:58,957-Speed 10413.89 samples/sec   Loss 9.6947   LearningRate 0.0620   Epoch: 8   Global Step: 42960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:50:59,975-Speed 10070.53 samples/sec   Loss 9.6428   LearningRate 0.0620   Epoch: 8   Global Step: 42970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:00,943-Speed 10590.85 samples/sec   Loss 9.7000   LearningRate 0.0620   Epoch: 8   Global Step: 42980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:02,002-Speed 9676.78 samples/sec   Loss 9.6570   LearningRate 0.0620   Epoch: 8   Global Step: 42990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:03,065-Speed 9638.94 samples/sec   Loss 9.7050   LearningRate 0.0620   Epoch: 8   Global Step: 43000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:04,142-Speed 9523.50 samples/sec   Loss 9.6812   LearningRate 0.0620   Epoch: 8   Global Step: 43010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:05,215-Speed 9551.69 samples/sec   Loss 9.7222   LearningRate 0.0620   Epoch: 8   Global Step: 43020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:06,163-Speed 10806.38 samples/sec   Loss 9.7170   LearningRate 0.0620   Epoch: 8   Global Step: 43030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:07,192-Speed 9958.35 samples/sec   Loss 9.6970   LearningRate 0.0620   Epoch: 8   Global Step: 43040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:08,195-Speed 10216.36 samples/sec   Loss 9.7410   LearningRate 0.0620   Epoch: 8   Global Step: 43050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:09,160-Speed 10625.99 samples/sec   Loss 9.5817   LearningRate 0.0620   Epoch: 8   Global Step: 43060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:10,129-Speed 10590.33 samples/sec   Loss 9.6829   LearningRate 0.0620   Epoch: 8   Global Step: 43070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:11,124-Speed 10299.04 samples/sec   Loss 9.5053   LearningRate 0.0619   Epoch: 8   Global Step: 43080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:12,178-Speed 9731.35 samples/sec   Loss 9.9118   LearningRate 0.0619   Epoch: 8   Global Step: 43090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:13,136-Speed 10701.79 samples/sec   Loss 9.8704   LearningRate 0.0619   Epoch: 8   Global Step: 43100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:14,152-Speed 10084.92 samples/sec   Loss 9.6640   LearningRate 0.0619   Epoch: 8   Global Step: 43110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:15,121-Speed 10574.50 samples/sec   Loss 9.7436   LearningRate 0.0619   Epoch: 8   Global Step: 43120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:16,085-Speed 10630.48 samples/sec   Loss 9.5426   LearningRate 0.0619   Epoch: 8   Global Step: 43130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:17,091-Speed 10184.64 samples/sec   Loss 9.7070   LearningRate 0.0619   Epoch: 8   Global Step: 43140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:18,140-Speed 9776.50 samples/sec   Loss 9.7413   LearningRate 0.0619   Epoch: 8   Global Step: 43150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:19,253-Speed 9206.25 samples/sec   Loss 9.7817   LearningRate 0.0619   Epoch: 8   Global Step: 43160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:20,202-Speed 10803.20 samples/sec   Loss 9.9638   LearningRate 0.0619   Epoch: 8   Global Step: 43170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:21,201-Speed 10266.45 samples/sec   Loss 9.7366   LearningRate 0.0619   Epoch: 8   Global Step: 43180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:22,129-Speed 11038.27 samples/sec   Loss 9.8655   LearningRate 0.0619   Epoch: 8   Global Step: 43190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:23,162-Speed 9928.31 samples/sec   Loss 9.7693   LearningRate 0.0619   Epoch: 8   Global Step: 43200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:24,179-Speed 10073.62 samples/sec   Loss 9.7045   LearningRate 0.0618   Epoch: 8   Global Step: 43210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:25,173-Speed 10313.64 samples/sec   Loss 9.6734   LearningRate 0.0618   Epoch: 8   Global Step: 43220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:26,164-Speed 10338.03 samples/sec   Loss 9.7338   LearningRate 0.0618   Epoch: 8   Global Step: 43230   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:51:27,186-Speed 10028.47 samples/sec   Loss 9.5613   LearningRate 0.0618   Epoch: 8   Global Step: 43240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:28,124-Speed 10927.20 samples/sec   Loss 9.8836   LearningRate 0.0618   Epoch: 8   Global Step: 43250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:29,094-Speed 10572.41 samples/sec   Loss 9.6473   LearningRate 0.0618   Epoch: 8   Global Step: 43260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:30,100-Speed 10187.92 samples/sec   Loss 9.8819   LearningRate 0.0618   Epoch: 8   Global Step: 43270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:31,219-Speed 9160.06 samples/sec   Loss 9.6367   LearningRate 0.0618   Epoch: 8   Global Step: 43280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:32,278-Speed 9684.66 samples/sec   Loss 9.8112   LearningRate 0.0618   Epoch: 8   Global Step: 43290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:33,260-Speed 10430.18 samples/sec   Loss 9.6792   LearningRate 0.0618   Epoch: 8   Global Step: 43300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:34,338-Speed 9516.80 samples/sec   Loss 9.8370   LearningRate 0.0618   Epoch: 8   Global Step: 43310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:35,314-Speed 10503.14 samples/sec   Loss 9.7337   LearningRate 0.0618   Epoch: 8   Global Step: 43320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:36,271-Speed 10705.40 samples/sec   Loss 9.5240   LearningRate 0.0618   Epoch: 8   Global Step: 43330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:37,342-Speed 9570.10 samples/sec   Loss 9.8200   LearningRate 0.0617   Epoch: 8   Global Step: 43340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:38,342-Speed 10251.79 samples/sec   Loss 9.6980   LearningRate 0.0617   Epoch: 8   Global Step: 43350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:39,326-Speed 10414.69 samples/sec   Loss 9.8744   LearningRate 0.0617   Epoch: 8   Global Step: 43360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:40,340-Speed 10108.86 samples/sec   Loss 9.7646   LearningRate 0.0617   Epoch: 8   Global Step: 43370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:41,425-Speed 9440.44 samples/sec   Loss 9.5674   LearningRate 0.0617   Epoch: 8   Global Step: 43380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:42,474-Speed 9778.13 samples/sec   Loss 9.7877   LearningRate 0.0617   Epoch: 8   Global Step: 43390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:43,506-Speed 9932.22 samples/sec   Loss 9.7273   LearningRate 0.0617   Epoch: 8   Global Step: 43400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:44,503-Speed 10277.19 samples/sec   Loss 9.7119   LearningRate 0.0617   Epoch: 8   Global Step: 43410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:45,552-Speed 9771.99 samples/sec   Loss 9.6999   LearningRate 0.0617   Epoch: 8   Global Step: 43420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:46,556-Speed 10207.53 samples/sec   Loss 9.8710   LearningRate 0.0617   Epoch: 8   Global Step: 43430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:47,588-Speed 9927.45 samples/sec   Loss 9.7334   LearningRate 0.0617   Epoch: 8   Global Step: 43440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:48,558-Speed 10577.35 samples/sec   Loss 9.8314   LearningRate 0.0617   Epoch: 8   Global Step: 43450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:49,520-Speed 10653.24 samples/sec   Loss 9.7395   LearningRate 0.0617   Epoch: 8   Global Step: 43460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:50,499-Speed 10477.96 samples/sec   Loss 9.8475   LearningRate 0.0616   Epoch: 8   Global Step: 43470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:51,469-Speed 10560.84 samples/sec   Loss 9.8369   LearningRate 0.0616   Epoch: 8   Global Step: 43480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:52,418-Speed 10798.10 samples/sec   Loss 9.8243   LearningRate 0.0616   Epoch: 8   Global Step: 43490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:53,451-Speed 9926.24 samples/sec   Loss 9.7714   LearningRate 0.0616   Epoch: 8   Global Step: 43500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:54,414-Speed 10644.67 samples/sec   Loss 9.6738   LearningRate 0.0616   Epoch: 8   Global Step: 43510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:55,495-Speed 9482.99 samples/sec   Loss 9.6341   LearningRate 0.0616   Epoch: 8   Global Step: 43520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:56,507-Speed 10123.76 samples/sec   Loss 9.7583   LearningRate 0.0616   Epoch: 8   Global Step: 43530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:57,560-Speed 9732.53 samples/sec   Loss 9.7817   LearningRate 0.0616   Epoch: 8   Global Step: 43540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:51:58,553-Speed 10324.86 samples/sec   Loss 9.7879   LearningRate 0.0616   Epoch: 8   Global Step: 43550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:51:59,595-Speed 9831.24 samples/sec   Loss 9.8985   LearningRate 0.0616   Epoch: 8   Global Step: 43560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:00,682-Speed 9438.99 samples/sec   Loss 9.6515   LearningRate 0.0616   Epoch: 8   Global Step: 43570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:01,729-Speed 9790.67 samples/sec   Loss 9.6703   LearningRate 0.0616   Epoch: 8   Global Step: 43580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:02,759-Speed 9948.85 samples/sec   Loss 9.7110   LearningRate 0.0616   Epoch: 8   Global Step: 43590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:03,812-Speed 9735.76 samples/sec   Loss 9.7749   LearningRate 0.0615   Epoch: 8   Global Step: 43600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:04,773-Speed 10661.77 samples/sec   Loss 9.6925   LearningRate 0.0615   Epoch: 8   Global Step: 43610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:05,760-Speed 10386.09 samples/sec   Loss 9.6711   LearningRate 0.0615   Epoch: 8   Global Step: 43620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:06,889-Speed 9073.53 samples/sec   Loss 9.7685   LearningRate 0.0615   Epoch: 8   Global Step: 43630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:07,887-Speed 10270.17 samples/sec   Loss 9.7525   LearningRate 0.0615   Epoch: 8   Global Step: 43640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:08,832-Speed 10842.29 samples/sec   Loss 9.7720   LearningRate 0.0615   Epoch: 8   Global Step: 43650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:09,759-Speed 11063.73 samples/sec   Loss 9.7825   LearningRate 0.0615   Epoch: 8   Global Step: 43660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:10,733-Speed 10520.84 samples/sec   Loss 9.9079   LearningRate 0.0615   Epoch: 8   Global Step: 43670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:11,862-Speed 9094.91 samples/sec   Loss 9.7859   LearningRate 0.0615   Epoch: 8   Global Step: 43680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:12,869-Speed 10182.16 samples/sec   Loss 9.7167   LearningRate 0.0615   Epoch: 8   Global Step: 43690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:13,934-Speed 9617.51 samples/sec   Loss 9.9093   LearningRate 0.0615   Epoch: 8   Global Step: 43700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:14,893-Speed 10694.26 samples/sec   Loss 9.7896   LearningRate 0.0615   Epoch: 8   Global Step: 43710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:15,940-Speed 9783.52 samples/sec   Loss 9.7741   LearningRate 0.0615   Epoch: 8   Global Step: 43720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:16,954-Speed 10106.56 samples/sec   Loss 9.7791   LearningRate 0.0614   Epoch: 8   Global Step: 43730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:17,920-Speed 10611.75 samples/sec   Loss 9.8658   LearningRate 0.0614   Epoch: 8   Global Step: 43740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:18,934-Speed 10114.12 samples/sec   Loss 9.8341   LearningRate 0.0614   Epoch: 8   Global Step: 43750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:19,961-Speed 9982.55 samples/sec   Loss 9.9156   LearningRate 0.0614   Epoch: 8   Global Step: 43760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:20,939-Speed 10480.69 samples/sec   Loss 9.5365   LearningRate 0.0614   Epoch: 8   Global Step: 43770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:21,928-Speed 10366.56 samples/sec   Loss 9.9852   LearningRate 0.0614   Epoch: 8   Global Step: 43780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:22,942-Speed 10113.35 samples/sec   Loss 9.7051   LearningRate 0.0614   Epoch: 8   Global Step: 43790   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:52:23,891-Speed 10810.68 samples/sec   Loss 9.7513   LearningRate 0.0614   Epoch: 8   Global Step: 43800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:24,859-Speed 10581.70 samples/sec   Loss 10.0014   LearningRate 0.0614   Epoch: 8   Global Step: 43810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:25,825-Speed 10609.31 samples/sec   Loss 9.6631   LearningRate 0.0614   Epoch: 8   Global Step: 43820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:26,867-Speed 9840.02 samples/sec   Loss 9.7515   LearningRate 0.0614   Epoch: 8   Global Step: 43830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:27,832-Speed 10623.97 samples/sec   Loss 9.6507   LearningRate 0.0614   Epoch: 8   Global Step: 43840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:28,798-Speed 10604.70 samples/sec   Loss 9.8223   LearningRate 0.0614   Epoch: 8   Global Step: 43850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:29,878-Speed 9494.66 samples/sec   Loss 9.6708   LearningRate 0.0613   Epoch: 8   Global Step: 43860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:30,867-Speed 10373.17 samples/sec   Loss 9.5287   LearningRate 0.0613   Epoch: 8   Global Step: 43870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:31,867-Speed 10248.39 samples/sec   Loss 9.7352   LearningRate 0.0613   Epoch: 8   Global Step: 43880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:32,933-Speed 9616.01 samples/sec   Loss 9.7698   LearningRate 0.0613   Epoch: 8   Global Step: 43890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:33,918-Speed 10399.19 samples/sec   Loss 9.6535   LearningRate 0.0613   Epoch: 8   Global Step: 43900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:34,995-Speed 9515.72 samples/sec   Loss 9.7053   LearningRate 0.0613   Epoch: 8   Global Step: 43910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:36,007-Speed 10133.60 samples/sec   Loss 9.6828   LearningRate 0.0613   Epoch: 8   Global Step: 43920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:36,992-Speed 10399.55 samples/sec   Loss 9.6780   LearningRate 0.0613   Epoch: 8   Global Step: 43930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:37,952-Speed 10684.87 samples/sec   Loss 9.7535   LearningRate 0.0613   Epoch: 8   Global Step: 43940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:38,912-Speed 10672.38 samples/sec   Loss 9.8576   LearningRate 0.0613   Epoch: 8   Global Step: 43950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:39,920-Speed 10163.29 samples/sec   Loss 9.6753   LearningRate 0.0613   Epoch: 8   Global Step: 43960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:40,868-Speed 10816.58 samples/sec   Loss 9.6916   LearningRate 0.0613   Epoch: 8   Global Step: 43970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:52:41,910-Speed 9834.05 samples/sec   Loss 9.6977   LearningRate 0.0612   Epoch: 8   Global Step: 43980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:42,898-Speed 10379.09 samples/sec   Loss 9.6993   LearningRate 0.0612   Epoch: 8   Global Step: 43990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:52:43,947-Speed 9774.61 samples/sec   Loss 9.9725   LearningRate 0.0612   Epoch: 8   Global Step: 44000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:53:06,345-[lfw][44000]XNorm: 13.519775
Training: 2022-04-11 00:53:06,346-[lfw][44000]Accuracy-Flip: 0.99383+-0.00495
Training: 2022-04-11 00:53:06,347-[lfw][44000]Accuracy-Highest: 0.99517
Training: 2022-04-11 00:53:32,168-[cfp_fp][44000]XNorm: 11.554743
Training: 2022-04-11 00:53:32,170-[cfp_fp][44000]Accuracy-Flip: 0.94771+-0.01219
Training: 2022-04-11 00:53:32,170-[cfp_fp][44000]Accuracy-Highest: 0.94986
Training: 2022-04-11 00:53:54,509-[agedb_30][44000]XNorm: 13.255656
Training: 2022-04-11 00:53:54,510-[agedb_30][44000]Accuracy-Flip: 0.95617+-0.01057
Training: 2022-04-11 00:53:54,510-[agedb_30][44000]Accuracy-Highest: 0.95767
Training: 2022-04-11 00:53:55,475-Speed 143.16 samples/sec   Loss 9.7633   LearningRate 0.0612   Epoch: 8   Global Step: 44010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:53:56,467-Speed 10327.61 samples/sec   Loss 9.8492   LearningRate 0.0612   Epoch: 8   Global Step: 44020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:53:57,443-Speed 10504.81 samples/sec   Loss 9.8431   LearningRate 0.0612   Epoch: 8   Global Step: 44030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:53:58,451-Speed 10168.55 samples/sec   Loss 9.8347   LearningRate 0.0612   Epoch: 8   Global Step: 44040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:53:59,367-Speed 11195.88 samples/sec   Loss 9.7626   LearningRate 0.0612   Epoch: 8   Global Step: 44050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:00,347-Speed 10460.50 samples/sec   Loss 9.7431   LearningRate 0.0612   Epoch: 8   Global Step: 44060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:01,355-Speed 10162.23 samples/sec   Loss 9.6810   LearningRate 0.0612   Epoch: 8   Global Step: 44070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:02,362-Speed 10186.93 samples/sec   Loss 9.7458   LearningRate 0.0612   Epoch: 8   Global Step: 44080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:03,322-Speed 10677.21 samples/sec   Loss 9.6324   LearningRate 0.0612   Epoch: 8   Global Step: 44090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:04,307-Speed 10410.37 samples/sec   Loss 9.7629   LearningRate 0.0612   Epoch: 8   Global Step: 44100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:05,286-Speed 10462.17 samples/sec   Loss 9.7999   LearningRate 0.0611   Epoch: 8   Global Step: 44110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:06,294-Speed 10176.25 samples/sec   Loss 9.7177   LearningRate 0.0611   Epoch: 8   Global Step: 44120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:07,240-Speed 10836.90 samples/sec   Loss 9.8101   LearningRate 0.0611   Epoch: 8   Global Step: 44130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:08,229-Speed 10373.00 samples/sec   Loss 9.7598   LearningRate 0.0611   Epoch: 8   Global Step: 44140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:09,182-Speed 10755.79 samples/sec   Loss 9.8697   LearningRate 0.0611   Epoch: 8   Global Step: 44150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:10,187-Speed 10193.45 samples/sec   Loss 9.6244   LearningRate 0.0611   Epoch: 8   Global Step: 44160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:11,154-Speed 10604.69 samples/sec   Loss 9.6101   LearningRate 0.0611   Epoch: 8   Global Step: 44170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:12,086-Speed 10993.19 samples/sec   Loss 9.7716   LearningRate 0.0611   Epoch: 8   Global Step: 44180   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:54:13,015-Speed 11040.53 samples/sec   Loss 9.5995   LearningRate 0.0611   Epoch: 8   Global Step: 44190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:13,982-Speed 10593.53 samples/sec   Loss 9.7301   LearningRate 0.0611   Epoch: 8   Global Step: 44200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:14,964-Speed 10440.50 samples/sec   Loss 9.7646   LearningRate 0.0611   Epoch: 8   Global Step: 44210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:15,940-Speed 10501.57 samples/sec   Loss 9.8000   LearningRate 0.0611   Epoch: 8   Global Step: 44220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:16,894-Speed 10747.74 samples/sec   Loss 9.9499   LearningRate 0.0611   Epoch: 8   Global Step: 44230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:17,872-Speed 10484.53 samples/sec   Loss 9.7097   LearningRate 0.0610   Epoch: 8   Global Step: 44240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:18,836-Speed 10631.37 samples/sec   Loss 9.8728   LearningRate 0.0610   Epoch: 8   Global Step: 44250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:19,870-Speed 9916.51 samples/sec   Loss 9.6671   LearningRate 0.0610   Epoch: 8   Global Step: 44260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:20,885-Speed 10097.36 samples/sec   Loss 9.6842   LearningRate 0.0610   Epoch: 8   Global Step: 44270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:21,902-Speed 10076.39 samples/sec   Loss 9.7866   LearningRate 0.0610   Epoch: 8   Global Step: 44280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:22,937-Speed 9909.86 samples/sec   Loss 9.7240   LearningRate 0.0610   Epoch: 8   Global Step: 44290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:23,966-Speed 9962.81 samples/sec   Loss 9.6342   LearningRate 0.0610   Epoch: 8   Global Step: 44300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:24,935-Speed 10575.17 samples/sec   Loss 9.6742   LearningRate 0.0610   Epoch: 8   Global Step: 44310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:26,056-Speed 9141.97 samples/sec   Loss 9.7595   LearningRate 0.0610   Epoch: 8   Global Step: 44320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:27,168-Speed 9215.87 samples/sec   Loss 9.7315   LearningRate 0.0610   Epoch: 8   Global Step: 44330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:28,216-Speed 9786.92 samples/sec   Loss 9.9534   LearningRate 0.0610   Epoch: 8   Global Step: 44340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:29,187-Speed 10556.76 samples/sec   Loss 9.6595   LearningRate 0.0610   Epoch: 8   Global Step: 44350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:30,258-Speed 9569.98 samples/sec   Loss 9.6170   LearningRate 0.0610   Epoch: 8   Global Step: 44360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:31,286-Speed 9966.86 samples/sec   Loss 9.7781   LearningRate 0.0609   Epoch: 8   Global Step: 44370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:32,246-Speed 10677.98 samples/sec   Loss 9.6967   LearningRate 0.0609   Epoch: 8   Global Step: 44380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:33,226-Speed 10464.63 samples/sec   Loss 9.7295   LearningRate 0.0609   Epoch: 8   Global Step: 44390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:34,252-Speed 9995.89 samples/sec   Loss 9.5536   LearningRate 0.0609   Epoch: 8   Global Step: 44400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:35,224-Speed 10537.93 samples/sec   Loss 9.7795   LearningRate 0.0609   Epoch: 8   Global Step: 44410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:36,245-Speed 10040.16 samples/sec   Loss 9.7356   LearningRate 0.0609   Epoch: 8   Global Step: 44420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:37,277-Speed 9926.20 samples/sec   Loss 9.7394   LearningRate 0.0609   Epoch: 8   Global Step: 44430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:38,308-Speed 9944.17 samples/sec   Loss 9.5908   LearningRate 0.0609   Epoch: 8   Global Step: 44440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:39,306-Speed 10278.74 samples/sec   Loss 9.7198   LearningRate 0.0609   Epoch: 8   Global Step: 44450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:40,287-Speed 10441.03 samples/sec   Loss 9.7361   LearningRate 0.0609   Epoch: 8   Global Step: 44460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:41,334-Speed 9789.07 samples/sec   Loss 9.7807   LearningRate 0.0609   Epoch: 8   Global Step: 44470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:42,299-Speed 10635.76 samples/sec   Loss 9.7441   LearningRate 0.0609   Epoch: 8   Global Step: 44480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:43,263-Speed 10631.57 samples/sec   Loss 9.7506   LearningRate 0.0609   Epoch: 8   Global Step: 44490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:44,286-Speed 10017.44 samples/sec   Loss 9.6245   LearningRate 0.0608   Epoch: 8   Global Step: 44500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:45,241-Speed 10726.39 samples/sec   Loss 9.7004   LearningRate 0.0608   Epoch: 8   Global Step: 44510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:46,250-Speed 10163.51 samples/sec   Loss 9.6825   LearningRate 0.0608   Epoch: 8   Global Step: 44520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:47,294-Speed 9818.38 samples/sec   Loss 9.6555   LearningRate 0.0608   Epoch: 8   Global Step: 44530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:48,339-Speed 9809.10 samples/sec   Loss 9.8153   LearningRate 0.0608   Epoch: 8   Global Step: 44540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:49,399-Speed 9670.34 samples/sec   Loss 9.7045   LearningRate 0.0608   Epoch: 8   Global Step: 44550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:50,519-Speed 9150.80 samples/sec   Loss 9.8334   LearningRate 0.0608   Epoch: 8   Global Step: 44560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:51,517-Speed 10265.54 samples/sec   Loss 9.9131   LearningRate 0.0608   Epoch: 8   Global Step: 44570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:52,532-Speed 10096.91 samples/sec   Loss 9.7755   LearningRate 0.0608   Epoch: 8   Global Step: 44580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:53,635-Speed 9296.30 samples/sec   Loss 9.8749   LearningRate 0.0608   Epoch: 8   Global Step: 44590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:54,642-Speed 10177.94 samples/sec   Loss 9.8483   LearningRate 0.0608   Epoch: 8   Global Step: 44600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:55,623-Speed 10455.74 samples/sec   Loss 9.6195   LearningRate 0.0608   Epoch: 8   Global Step: 44610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:56,645-Speed 10022.57 samples/sec   Loss 9.7601   LearningRate 0.0608   Epoch: 8   Global Step: 44620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:54:57,646-Speed 10242.73 samples/sec   Loss 9.5816   LearningRate 0.0607   Epoch: 8   Global Step: 44630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:58,731-Speed 9449.04 samples/sec   Loss 9.7472   LearningRate 0.0607   Epoch: 8   Global Step: 44640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:54:59,751-Speed 10063.30 samples/sec   Loss 9.5177   LearningRate 0.0607   Epoch: 8   Global Step: 44650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:00,863-Speed 9215.48 samples/sec   Loss 9.6082   LearningRate 0.0607   Epoch: 8   Global Step: 44660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:01,817-Speed 10746.95 samples/sec   Loss 9.7711   LearningRate 0.0607   Epoch: 8   Global Step: 44670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:02,827-Speed 10144.62 samples/sec   Loss 9.8214   LearningRate 0.0607   Epoch: 8   Global Step: 44680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:03,902-Speed 9540.23 samples/sec   Loss 9.6758   LearningRate 0.0607   Epoch: 8   Global Step: 44690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:04,886-Speed 10410.87 samples/sec   Loss 9.7491   LearningRate 0.0607   Epoch: 8   Global Step: 44700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:05,865-Speed 10473.83 samples/sec   Loss 9.6055   LearningRate 0.0607   Epoch: 8   Global Step: 44710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:06,870-Speed 10194.70 samples/sec   Loss 9.6150   LearningRate 0.0607   Epoch: 8   Global Step: 44720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:07,889-Speed 10065.62 samples/sec   Loss 9.8075   LearningRate 0.0607   Epoch: 8   Global Step: 44730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:08,969-Speed 9485.80 samples/sec   Loss 9.7119   LearningRate 0.0607   Epoch: 8   Global Step: 44740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:10,016-Speed 9787.01 samples/sec   Loss 9.7382   LearningRate 0.0607   Epoch: 8   Global Step: 44750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:11,101-Speed 9447.44 samples/sec   Loss 9.8261   LearningRate 0.0606   Epoch: 8   Global Step: 44760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:12,084-Speed 10430.55 samples/sec   Loss 9.7441   LearningRate 0.0606   Epoch: 8   Global Step: 44770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:13,097-Speed 10117.38 samples/sec   Loss 9.7745   LearningRate 0.0606   Epoch: 8   Global Step: 44780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:14,134-Speed 9880.09 samples/sec   Loss 9.7307   LearningRate 0.0606   Epoch: 8   Global Step: 44790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:15,323-Speed 8615.83 samples/sec   Loss 9.7894   LearningRate 0.0606   Epoch: 8   Global Step: 44800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:16,305-Speed 10440.38 samples/sec   Loss 9.6714   LearningRate 0.0606   Epoch: 8   Global Step: 44810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:17,369-Speed 9636.12 samples/sec   Loss 9.6012   LearningRate 0.0606   Epoch: 8   Global Step: 44820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:18,407-Speed 9879.57 samples/sec   Loss 9.8126   LearningRate 0.0606   Epoch: 8   Global Step: 44830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:19,417-Speed 10145.02 samples/sec   Loss 9.7675   LearningRate 0.0606   Epoch: 8   Global Step: 44840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:20,355-Speed 10936.63 samples/sec   Loss 9.7231   LearningRate 0.0606   Epoch: 8   Global Step: 44850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:21,408-Speed 9728.04 samples/sec   Loss 9.6932   LearningRate 0.0606   Epoch: 8   Global Step: 44860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:22,444-Speed 9894.48 samples/sec   Loss 9.7445   LearningRate 0.0606   Epoch: 8   Global Step: 44870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:23,461-Speed 10085.22 samples/sec   Loss 9.7899   LearningRate 0.0606   Epoch: 8   Global Step: 44880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:24,460-Speed 10262.97 samples/sec   Loss 9.7504   LearningRate 0.0605   Epoch: 8   Global Step: 44890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:25,419-Speed 10686.47 samples/sec   Loss 9.6496   LearningRate 0.0605   Epoch: 8   Global Step: 44900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:26,415-Speed 10292.95 samples/sec   Loss 9.8717   LearningRate 0.0605   Epoch: 8   Global Step: 44910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:27,407-Speed 10335.87 samples/sec   Loss 9.7106   LearningRate 0.0605   Epoch: 8   Global Step: 44920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:28,543-Speed 9020.37 samples/sec   Loss 9.6998   LearningRate 0.0605   Epoch: 8   Global Step: 44930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:29,553-Speed 10138.48 samples/sec   Loss 9.7774   LearningRate 0.0605   Epoch: 8   Global Step: 44940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:30,507-Speed 10752.94 samples/sec   Loss 9.6507   LearningRate 0.0605   Epoch: 8   Global Step: 44950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:31,489-Speed 10435.56 samples/sec   Loss 9.7046   LearningRate 0.0605   Epoch: 8   Global Step: 44960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:32,502-Speed 10116.57 samples/sec   Loss 9.7499   LearningRate 0.0605   Epoch: 8   Global Step: 44970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:33,550-Speed 9795.46 samples/sec   Loss 9.8033   LearningRate 0.0605   Epoch: 8   Global Step: 44980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:34,526-Speed 10495.74 samples/sec   Loss 9.8300   LearningRate 0.0605   Epoch: 8   Global Step: 44990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:35,561-Speed 9908.84 samples/sec   Loss 9.6776   LearningRate 0.0605   Epoch: 8   Global Step: 45000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:36,542-Speed 10443.23 samples/sec   Loss 9.6419   LearningRate 0.0605   Epoch: 8   Global Step: 45010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:37,578-Speed 9892.22 samples/sec   Loss 9.6846   LearningRate 0.0604   Epoch: 8   Global Step: 45020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:38,567-Speed 10375.50 samples/sec   Loss 9.6921   LearningRate 0.0604   Epoch: 8   Global Step: 45030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:39,512-Speed 10846.11 samples/sec   Loss 9.8834   LearningRate 0.0604   Epoch: 8   Global Step: 45040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:40,487-Speed 10510.41 samples/sec   Loss 9.7000   LearningRate 0.0604   Epoch: 8   Global Step: 45050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:41,452-Speed 10622.65 samples/sec   Loss 9.5536   LearningRate 0.0604   Epoch: 8   Global Step: 45060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:42,415-Speed 10652.09 samples/sec   Loss 9.7441   LearningRate 0.0604   Epoch: 8   Global Step: 45070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:43,362-Speed 10823.26 samples/sec   Loss 9.5831   LearningRate 0.0604   Epoch: 8   Global Step: 45080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:44,344-Speed 10438.39 samples/sec   Loss 9.6089   LearningRate 0.0604   Epoch: 8   Global Step: 45090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:45,307-Speed 10641.01 samples/sec   Loss 9.7083   LearningRate 0.0604   Epoch: 8   Global Step: 45100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:46,294-Speed 10382.38 samples/sec   Loss 9.7274   LearningRate 0.0604   Epoch: 8   Global Step: 45110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:47,253-Speed 10701.68 samples/sec   Loss 9.6423   LearningRate 0.0604   Epoch: 8   Global Step: 45120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:48,251-Speed 10273.11 samples/sec   Loss 9.5320   LearningRate 0.0604   Epoch: 8   Global Step: 45130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:49,220-Speed 10575.59 samples/sec   Loss 9.6721   LearningRate 0.0604   Epoch: 8   Global Step: 45140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:50,193-Speed 10537.04 samples/sec   Loss 9.7256   LearningRate 0.0603   Epoch: 8   Global Step: 45150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:51,148-Speed 10729.79 samples/sec   Loss 9.6117   LearningRate 0.0603   Epoch: 8   Global Step: 45160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:52,119-Speed 10550.19 samples/sec   Loss 9.7265   LearningRate 0.0603   Epoch: 8   Global Step: 45170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:53,156-Speed 9881.94 samples/sec   Loss 9.6355   LearningRate 0.0603   Epoch: 8   Global Step: 45180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:55:54,182-Speed 9994.69 samples/sec   Loss 9.7760   LearningRate 0.0603   Epoch: 8   Global Step: 45190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:55,124-Speed 10882.02 samples/sec   Loss 9.6634   LearningRate 0.0603   Epoch: 8   Global Step: 45200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:56,094-Speed 10574.27 samples/sec   Loss 9.6429   LearningRate 0.0603   Epoch: 8   Global Step: 45210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:57,139-Speed 9807.01 samples/sec   Loss 9.7692   LearningRate 0.0603   Epoch: 8   Global Step: 45220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:58,109-Speed 10563.84 samples/sec   Loss 9.7752   LearningRate 0.0603   Epoch: 8   Global Step: 45230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:55:59,056-Speed 10827.74 samples/sec   Loss 9.8817   LearningRate 0.0603   Epoch: 8   Global Step: 45240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:00,050-Speed 10307.99 samples/sec   Loss 9.7315   LearningRate 0.0603   Epoch: 8   Global Step: 45250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:00,993-Speed 10865.06 samples/sec   Loss 9.7403   LearningRate 0.0603   Epoch: 8   Global Step: 45260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:02,081-Speed 9424.20 samples/sec   Loss 9.6133   LearningRate 0.0603   Epoch: 8   Global Step: 45270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:03,040-Speed 10687.47 samples/sec   Loss 9.7241   LearningRate 0.0602   Epoch: 8   Global Step: 45280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:04,001-Speed 10668.70 samples/sec   Loss 9.6113   LearningRate 0.0602   Epoch: 8   Global Step: 45290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:05,006-Speed 10203.38 samples/sec   Loss 9.6609   LearningRate 0.0602   Epoch: 8   Global Step: 45300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:05,955-Speed 10797.18 samples/sec   Loss 9.6636   LearningRate 0.0602   Epoch: 8   Global Step: 45310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:06,902-Speed 10820.96 samples/sec   Loss 9.7023   LearningRate 0.0602   Epoch: 8   Global Step: 45320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:07,863-Speed 10663.22 samples/sec   Loss 9.7731   LearningRate 0.0602   Epoch: 8   Global Step: 45330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:08,865-Speed 10238.03 samples/sec   Loss 9.6501   LearningRate 0.0602   Epoch: 8   Global Step: 45340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:09,835-Speed 10570.71 samples/sec   Loss 9.5407   LearningRate 0.0602   Epoch: 8   Global Step: 45350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:10,859-Speed 10003.45 samples/sec   Loss 9.6768   LearningRate 0.0602   Epoch: 8   Global Step: 45360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:11,824-Speed 10613.85 samples/sec   Loss 9.7551   LearningRate 0.0602   Epoch: 8   Global Step: 45370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:12,802-Speed 10486.94 samples/sec   Loss 9.7297   LearningRate 0.0602   Epoch: 8   Global Step: 45380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:13,751-Speed 10796.94 samples/sec   Loss 9.8672   LearningRate 0.0602   Epoch: 8   Global Step: 45390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:14,728-Speed 10494.34 samples/sec   Loss 9.8063   LearningRate 0.0602   Epoch: 8   Global Step: 45400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:15,715-Speed 10376.83 samples/sec   Loss 9.5605   LearningRate 0.0601   Epoch: 8   Global Step: 45410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:16,669-Speed 10747.87 samples/sec   Loss 9.6449   LearningRate 0.0601   Epoch: 8   Global Step: 45420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:17,662-Speed 10329.93 samples/sec   Loss 9.7545   LearningRate 0.0601   Epoch: 8   Global Step: 45430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:18,637-Speed 10521.48 samples/sec   Loss 9.6864   LearningRate 0.0601   Epoch: 8   Global Step: 45440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:19,630-Speed 10321.04 samples/sec   Loss 9.7576   LearningRate 0.0601   Epoch: 8   Global Step: 45450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:20,579-Speed 10797.96 samples/sec   Loss 9.7569   LearningRate 0.0601   Epoch: 8   Global Step: 45460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:21,519-Speed 10898.04 samples/sec   Loss 9.7564   LearningRate 0.0601   Epoch: 8   Global Step: 45470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:22,484-Speed 10628.42 samples/sec   Loss 9.7762   LearningRate 0.0601   Epoch: 8   Global Step: 45480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:23,474-Speed 10348.66 samples/sec   Loss 9.8571   LearningRate 0.0601   Epoch: 8   Global Step: 45490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:24,522-Speed 9783.70 samples/sec   Loss 9.7799   LearningRate 0.0601   Epoch: 8   Global Step: 45500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:25,564-Speed 9832.72 samples/sec   Loss 9.5967   LearningRate 0.0601   Epoch: 8   Global Step: 45510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:26,618-Speed 9727.58 samples/sec   Loss 9.5523   LearningRate 0.0601   Epoch: 8   Global Step: 45520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:36,273-Speed 1060.73 samples/sec   Loss 9.0258   LearningRate 0.0601   Epoch: 9   Global Step: 45530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:37,235-Speed 10660.71 samples/sec   Loss 8.6402   LearningRate 0.0600   Epoch: 9   Global Step: 45540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:38,221-Speed 10400.55 samples/sec   Loss 8.8071   LearningRate 0.0600   Epoch: 9   Global Step: 45550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:39,533-Speed 7807.01 samples/sec   Loss 8.8038   LearningRate 0.0600   Epoch: 9   Global Step: 45560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:40,557-Speed 10005.17 samples/sec   Loss 8.9540   LearningRate 0.0600   Epoch: 9   Global Step: 45570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:41,736-Speed 8699.23 samples/sec   Loss 8.6899   LearningRate 0.0600   Epoch: 9   Global Step: 45580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:42,692-Speed 10712.79 samples/sec   Loss 8.8626   LearningRate 0.0600   Epoch: 9   Global Step: 45590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:43,670-Speed 10489.12 samples/sec   Loss 8.9395   LearningRate 0.0600   Epoch: 9   Global Step: 45600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:44,668-Speed 10266.59 samples/sec   Loss 8.9482   LearningRate 0.0600   Epoch: 9   Global Step: 45610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:45,604-Speed 10976.49 samples/sec   Loss 8.7177   LearningRate 0.0600   Epoch: 9   Global Step: 45620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:46,638-Speed 9904.70 samples/sec   Loss 8.7260   LearningRate 0.0600   Epoch: 9   Global Step: 45630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:47,682-Speed 9824.82 samples/sec   Loss 8.8675   LearningRate 0.0600   Epoch: 9   Global Step: 45640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:48,670-Speed 10367.49 samples/sec   Loss 8.9316   LearningRate 0.0600   Epoch: 9   Global Step: 45650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:49,654-Speed 10416.24 samples/sec   Loss 8.6044   LearningRate 0.0600   Epoch: 9   Global Step: 45660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:56:50,605-Speed 10774.94 samples/sec   Loss 8.9202   LearningRate 0.0599   Epoch: 9   Global Step: 45670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:51,601-Speed 10295.75 samples/sec   Loss 8.9514   LearningRate 0.0599   Epoch: 9   Global Step: 45680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:52,583-Speed 10445.12 samples/sec   Loss 8.8124   LearningRate 0.0599   Epoch: 9   Global Step: 45690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:53,564-Speed 10448.17 samples/sec   Loss 8.8585   LearningRate 0.0599   Epoch: 9   Global Step: 45700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:54,542-Speed 10480.01 samples/sec   Loss 8.7785   LearningRate 0.0599   Epoch: 9   Global Step: 45710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:55,460-Speed 11167.94 samples/sec   Loss 8.9832   LearningRate 0.0599   Epoch: 9   Global Step: 45720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:56,425-Speed 10618.19 samples/sec   Loss 8.8279   LearningRate 0.0599   Epoch: 9   Global Step: 45730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:57,380-Speed 10739.71 samples/sec   Loss 8.8495   LearningRate 0.0599   Epoch: 9   Global Step: 45740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:58,333-Speed 10751.45 samples/sec   Loss 8.8576   LearningRate 0.0599   Epoch: 9   Global Step: 45750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:56:59,521-Speed 8624.78 samples/sec   Loss 9.0814   LearningRate 0.0599   Epoch: 9   Global Step: 45760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:57:00,618-Speed 9344.67 samples/sec   Loss 8.9256   LearningRate 0.0599   Epoch: 9   Global Step: 45770   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:57:01,712-Speed 9366.04 samples/sec   Loss 8.8964   LearningRate 0.0599   Epoch: 9   Global Step: 45780   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 00:57:02,686-Speed 10522.36 samples/sec   Loss 9.1220   LearningRate 0.0599   Epoch: 9   Global Step: 45790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:57:03,665-Speed 10468.27 samples/sec   Loss 8.9003   LearningRate 0.0598   Epoch: 9   Global Step: 45800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:57:04,626-Speed 10665.22 samples/sec   Loss 8.8731   LearningRate 0.0598   Epoch: 9   Global Step: 45810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:57:05,586-Speed 10667.83 samples/sec   Loss 9.0684   LearningRate 0.0598   Epoch: 9   Global Step: 45820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:57:06,599-Speed 10130.74 samples/sec   Loss 9.0974   LearningRate 0.0598   Epoch: 9   Global Step: 45830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:57:07,552-Speed 10748.83 samples/sec   Loss 8.9577   LearningRate 0.0598   Epoch: 9   Global Step: 45840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:57:08,553-Speed 10241.41 samples/sec   Loss 9.0029   LearningRate 0.0598   Epoch: 9   Global Step: 45850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:57:09,542-Speed 10357.25 samples/sec   Loss 9.0836   LearningRate 0.0598   Epoch: 9   Global Step: 45860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:57:10,478-Speed 10957.20 samples/sec   Loss 9.1273   LearningRate 0.0598   Epoch: 9   Global Step: 45870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:57:11,431-Speed 10767.16 samples/sec   Loss 8.9603   LearningRate 0.0598   Epoch: 9   Global Step: 45880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:57:12,407-Speed 10494.93 samples/sec   Loss 8.8293   LearningRate 0.0598   Epoch: 9   Global Step: 45890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:57:13,411-Speed 10228.58 samples/sec   Loss 9.0959   LearningRate 0.0598   Epoch: 9   Global Step: 45900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:57:14,391-Speed 10456.52 samples/sec   Loss 9.1097   LearningRate 0.0598   Epoch: 9   Global Step: 45910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:57:15,327-Speed 10946.64 samples/sec   Loss 9.1536   LearningRate 0.0598   Epoch: 9   Global Step: 45920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:57:16,346-Speed 10055.82 samples/sec   Loss 8.9611   LearningRate 0.0598   Epoch: 9   Global Step: 45930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:57:17,363-Speed 10081.76 samples/sec   Loss 9.0573   LearningRate 0.0597   Epoch: 9   Global Step: 45940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:57:18,281-Speed 11167.77 samples/sec   Loss 9.3124   LearningRate 0.0597   Epoch: 9   Global Step: 45950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:57:19,248-Speed 10600.10 samples/sec   Loss 9.0806   LearningRate 0.0597   Epoch: 9   Global Step: 45960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:57:20,188-Speed 10920.66 samples/sec   Loss 9.0726   LearningRate 0.0597   Epoch: 9   Global Step: 45970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:57:21,176-Speed 10385.01 samples/sec   Loss 8.9091   LearningRate 0.0597   Epoch: 9   Global Step: 45980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:57:22,115-Speed 10927.15 samples/sec   Loss 9.1465   LearningRate 0.0597   Epoch: 9   Global Step: 45990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:57:23,103-Speed 10370.61 samples/sec   Loss 9.1948   LearningRate 0.0597   Epoch: 9   Global Step: 46000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:57:45,220-[lfw][46000]XNorm: 13.362953
Training: 2022-04-11 00:57:45,220-[lfw][46000]Accuracy-Flip: 0.99383+-0.00350
Training: 2022-04-11 00:57:45,221-[lfw][46000]Accuracy-Highest: 0.99517
Training: 2022-04-11 00:58:10,406-[cfp_fp][46000]XNorm: 11.290274
Training: 2022-04-11 00:58:10,407-[cfp_fp][46000]Accuracy-Flip: 0.95386+-0.00979
Training: 2022-04-11 00:58:10,407-[cfp_fp][46000]Accuracy-Highest: 0.95386
Training: 2022-04-11 00:58:32,232-[agedb_30][46000]XNorm: 13.125063
Training: 2022-04-11 00:58:32,232-[agedb_30][46000]Accuracy-Flip: 0.95833+-0.01116
Training: 2022-04-11 00:58:32,232-[agedb_30][46000]Accuracy-Highest: 0.95833
Training: 2022-04-11 00:58:33,177-Speed 146.13 samples/sec   Loss 9.1093   LearningRate 0.0597   Epoch: 9   Global Step: 46010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:58:34,117-Speed 10901.50 samples/sec   Loss 9.0494   LearningRate 0.0597   Epoch: 9   Global Step: 46020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:58:35,112-Speed 10303.83 samples/sec   Loss 9.1909   LearningRate 0.0597   Epoch: 9   Global Step: 46030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:58:36,038-Speed 11074.26 samples/sec   Loss 9.0383   LearningRate 0.0597   Epoch: 9   Global Step: 46040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:58:36,999-Speed 10665.63 samples/sec   Loss 9.2041   LearningRate 0.0597   Epoch: 9   Global Step: 46050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:58:37,950-Speed 10776.16 samples/sec   Loss 9.1169   LearningRate 0.0597   Epoch: 9   Global Step: 46060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:58:38,926-Speed 10500.36 samples/sec   Loss 9.1955   LearningRate 0.0596   Epoch: 9   Global Step: 46070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:58:39,898-Speed 10555.00 samples/sec   Loss 9.1776   LearningRate 0.0596   Epoch: 9   Global Step: 46080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:58:40,842-Speed 10850.73 samples/sec   Loss 9.2510   LearningRate 0.0596   Epoch: 9   Global Step: 46090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:58:41,835-Speed 10325.00 samples/sec   Loss 9.2037   LearningRate 0.0596   Epoch: 9   Global Step: 46100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:58:42,787-Speed 10993.24 samples/sec   Loss 9.2025   LearningRate 0.0596   Epoch: 9   Global Step: 46110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:58:43,737-Speed 10791.42 samples/sec   Loss 9.3030   LearningRate 0.0596   Epoch: 9   Global Step: 46120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:58:44,687-Speed 10780.94 samples/sec   Loss 9.2259   LearningRate 0.0596   Epoch: 9   Global Step: 46130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:58:45,673-Speed 10395.87 samples/sec   Loss 9.2468   LearningRate 0.0596   Epoch: 9   Global Step: 46140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:58:46,668-Speed 10308.87 samples/sec   Loss 9.0517   LearningRate 0.0596   Epoch: 9   Global Step: 46150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:58:47,594-Speed 11073.66 samples/sec   Loss 9.2259   LearningRate 0.0596   Epoch: 9   Global Step: 46160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:58:48,546-Speed 10768.79 samples/sec   Loss 9.1273   LearningRate 0.0596   Epoch: 9   Global Step: 46170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:58:49,544-Speed 10267.46 samples/sec   Loss 9.0952   LearningRate 0.0596   Epoch: 9   Global Step: 46180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:58:50,499-Speed 10732.23 samples/sec   Loss 9.2640   LearningRate 0.0596   Epoch: 9   Global Step: 46190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:58:51,510-Speed 10142.37 samples/sec   Loss 9.1891   LearningRate 0.0595   Epoch: 9   Global Step: 46200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:58:52,470-Speed 10682.22 samples/sec   Loss 9.1890   LearningRate 0.0595   Epoch: 9   Global Step: 46210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:58:53,447-Speed 10490.00 samples/sec   Loss 9.2277   LearningRate 0.0595   Epoch: 9   Global Step: 46220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:58:54,417-Speed 10560.34 samples/sec   Loss 9.1303   LearningRate 0.0595   Epoch: 9   Global Step: 46230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:58:55,387-Speed 10571.13 samples/sec   Loss 9.2373   LearningRate 0.0595   Epoch: 9   Global Step: 46240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:58:56,341-Speed 10735.69 samples/sec   Loss 9.2901   LearningRate 0.0595   Epoch: 9   Global Step: 46250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:58:57,280-Speed 10916.87 samples/sec   Loss 9.3346   LearningRate 0.0595   Epoch: 9   Global Step: 46260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:58:58,244-Speed 10627.03 samples/sec   Loss 9.1605   LearningRate 0.0595   Epoch: 9   Global Step: 46270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:58:59,199-Speed 10734.10 samples/sec   Loss 9.1907   LearningRate 0.0595   Epoch: 9   Global Step: 46280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:00,145-Speed 10830.11 samples/sec   Loss 9.4091   LearningRate 0.0595   Epoch: 9   Global Step: 46290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:01,173-Speed 9967.52 samples/sec   Loss 9.3417   LearningRate 0.0595   Epoch: 9   Global Step: 46300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:02,134-Speed 10677.89 samples/sec   Loss 9.2515   LearningRate 0.0595   Epoch: 9   Global Step: 46310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:03,093-Speed 10689.49 samples/sec   Loss 9.2824   LearningRate 0.0595   Epoch: 9   Global Step: 46320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:04,021-Speed 11041.10 samples/sec   Loss 9.1618   LearningRate 0.0594   Epoch: 9   Global Step: 46330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:04,979-Speed 10703.60 samples/sec   Loss 9.2839   LearningRate 0.0594   Epoch: 9   Global Step: 46340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:05,984-Speed 10192.90 samples/sec   Loss 9.2442   LearningRate 0.0594   Epoch: 9   Global Step: 46350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:06,923-Speed 10916.40 samples/sec   Loss 9.2498   LearningRate 0.0594   Epoch: 9   Global Step: 46360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:07,883-Speed 10681.98 samples/sec   Loss 9.3860   LearningRate 0.0594   Epoch: 9   Global Step: 46370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:08,870-Speed 10382.42 samples/sec   Loss 9.4026   LearningRate 0.0594   Epoch: 9   Global Step: 46380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:09,856-Speed 10398.35 samples/sec   Loss 9.5177   LearningRate 0.0594   Epoch: 9   Global Step: 46390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:10,819-Speed 10639.30 samples/sec   Loss 9.1923   LearningRate 0.0594   Epoch: 9   Global Step: 46400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:11,755-Speed 10949.52 samples/sec   Loss 9.1118   LearningRate 0.0594   Epoch: 9   Global Step: 46410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:12,678-Speed 11107.13 samples/sec   Loss 9.1762   LearningRate 0.0594   Epoch: 9   Global Step: 46420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:13,656-Speed 10489.58 samples/sec   Loss 9.2711   LearningRate 0.0594   Epoch: 9   Global Step: 46430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:14,596-Speed 10902.27 samples/sec   Loss 9.2500   LearningRate 0.0594   Epoch: 9   Global Step: 46440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:15,547-Speed 10781.52 samples/sec   Loss 9.2544   LearningRate 0.0594   Epoch: 9   Global Step: 46450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:16,551-Speed 10202.58 samples/sec   Loss 9.3991   LearningRate 0.0593   Epoch: 9   Global Step: 46460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:17,498-Speed 10828.30 samples/sec   Loss 9.2919   LearningRate 0.0593   Epoch: 9   Global Step: 46470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:18,456-Speed 10696.79 samples/sec   Loss 9.2110   LearningRate 0.0593   Epoch: 9   Global Step: 46480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:19,453-Speed 10275.46 samples/sec   Loss 9.3145   LearningRate 0.0593   Epoch: 9   Global Step: 46490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:20,408-Speed 10733.37 samples/sec   Loss 9.3648   LearningRate 0.0593   Epoch: 9   Global Step: 46500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:21,365-Speed 10717.15 samples/sec   Loss 9.3273   LearningRate 0.0593   Epoch: 9   Global Step: 46510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:22,303-Speed 10918.05 samples/sec   Loss 9.4205   LearningRate 0.0593   Epoch: 9   Global Step: 46520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:23,240-Speed 10949.04 samples/sec   Loss 9.4126   LearningRate 0.0593   Epoch: 9   Global Step: 46530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:24,202-Speed 10661.35 samples/sec   Loss 9.3221   LearningRate 0.0593   Epoch: 9   Global Step: 46540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:25,154-Speed 10767.32 samples/sec   Loss 9.3453   LearningRate 0.0593   Epoch: 9   Global Step: 46550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:26,128-Speed 10521.69 samples/sec   Loss 9.4182   LearningRate 0.0593   Epoch: 9   Global Step: 46560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:27,073-Speed 10839.10 samples/sec   Loss 9.4773   LearningRate 0.0593   Epoch: 9   Global Step: 46570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:28,009-Speed 10958.53 samples/sec   Loss 9.2675   LearningRate 0.0593   Epoch: 9   Global Step: 46580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:28,970-Speed 10660.48 samples/sec   Loss 9.5013   LearningRate 0.0592   Epoch: 9   Global Step: 46590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:29,961-Speed 10345.20 samples/sec   Loss 9.1923   LearningRate 0.0592   Epoch: 9   Global Step: 46600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:30,920-Speed 10689.91 samples/sec   Loss 9.4675   LearningRate 0.0592   Epoch: 9   Global Step: 46610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:31,882-Speed 10657.48 samples/sec   Loss 9.2985   LearningRate 0.0592   Epoch: 9   Global Step: 46620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:32,837-Speed 10728.41 samples/sec   Loss 9.4376   LearningRate 0.0592   Epoch: 9   Global Step: 46630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:33,818-Speed 10452.21 samples/sec   Loss 9.2945   LearningRate 0.0592   Epoch: 9   Global Step: 46640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:34,794-Speed 10502.96 samples/sec   Loss 9.5370   LearningRate 0.0592   Epoch: 9   Global Step: 46650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:35,787-Speed 10318.66 samples/sec   Loss 9.5483   LearningRate 0.0592   Epoch: 9   Global Step: 46660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:36,754-Speed 10604.39 samples/sec   Loss 9.3582   LearningRate 0.0592   Epoch: 9   Global Step: 46670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:37,726-Speed 10538.81 samples/sec   Loss 9.5321   LearningRate 0.0592   Epoch: 9   Global Step: 46680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:38,653-Speed 11064.23 samples/sec   Loss 9.1667   LearningRate 0.0592   Epoch: 9   Global Step: 46690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:39,670-Speed 10072.38 samples/sec   Loss 9.3816   LearningRate 0.0592   Epoch: 9   Global Step: 46700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:40,656-Speed 10398.96 samples/sec   Loss 9.5211   LearningRate 0.0592   Epoch: 9   Global Step: 46710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:41,610-Speed 10736.99 samples/sec   Loss 9.4775   LearningRate 0.0591   Epoch: 9   Global Step: 46720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:42,552-Speed 10879.28 samples/sec   Loss 9.4852   LearningRate 0.0591   Epoch: 9   Global Step: 46730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:43,515-Speed 10649.48 samples/sec   Loss 9.4041   LearningRate 0.0591   Epoch: 9   Global Step: 46740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:44,468-Speed 10759.18 samples/sec   Loss 9.3222   LearningRate 0.0591   Epoch: 9   Global Step: 46750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:45,435-Speed 10591.82 samples/sec   Loss 9.4012   LearningRate 0.0591   Epoch: 9   Global Step: 46760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:46,411-Speed 10509.22 samples/sec   Loss 9.3463   LearningRate 0.0591   Epoch: 9   Global Step: 46770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:47,363-Speed 10763.62 samples/sec   Loss 9.4896   LearningRate 0.0591   Epoch: 9   Global Step: 46780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:48,317-Speed 10747.34 samples/sec   Loss 9.3741   LearningRate 0.0591   Epoch: 9   Global Step: 46790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:49,276-Speed 10695.99 samples/sec   Loss 9.3113   LearningRate 0.0591   Epoch: 9   Global Step: 46800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:50,218-Speed 10881.77 samples/sec   Loss 9.2395   LearningRate 0.0591   Epoch: 9   Global Step: 46810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:51,200-Speed 10440.48 samples/sec   Loss 9.2001   LearningRate 0.0591   Epoch: 9   Global Step: 46820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:52,180-Speed 10454.97 samples/sec   Loss 9.2507   LearningRate 0.0591   Epoch: 9   Global Step: 46830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 00:59:53,145-Speed 10623.03 samples/sec   Loss 9.3246   LearningRate 0.0591   Epoch: 9   Global Step: 46840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:54,107-Speed 10655.64 samples/sec   Loss 9.4321   LearningRate 0.0590   Epoch: 9   Global Step: 46850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:55,082-Speed 10512.37 samples/sec   Loss 9.5877   LearningRate 0.0590   Epoch: 9   Global Step: 46860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:56,032-Speed 10789.28 samples/sec   Loss 9.4562   LearningRate 0.0590   Epoch: 9   Global Step: 46870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:56,998-Speed 10600.42 samples/sec   Loss 9.4597   LearningRate 0.0590   Epoch: 9   Global Step: 46880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:57,925-Speed 11060.44 samples/sec   Loss 9.2587   LearningRate 0.0590   Epoch: 9   Global Step: 46890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:58,919-Speed 10312.46 samples/sec   Loss 9.3678   LearningRate 0.0590   Epoch: 9   Global Step: 46900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 00:59:59,896-Speed 10497.49 samples/sec   Loss 9.3705   LearningRate 0.0590   Epoch: 9   Global Step: 46910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:00,841-Speed 10839.99 samples/sec   Loss 9.3986   LearningRate 0.0590   Epoch: 9   Global Step: 46920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:01,809-Speed 10590.10 samples/sec   Loss 9.2538   LearningRate 0.0590   Epoch: 9   Global Step: 46930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:02,775-Speed 10607.33 samples/sec   Loss 9.4174   LearningRate 0.0590   Epoch: 9   Global Step: 46940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:03,765-Speed 10360.42 samples/sec   Loss 9.3956   LearningRate 0.0590   Epoch: 9   Global Step: 46950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:04,732-Speed 10598.02 samples/sec   Loss 9.2218   LearningRate 0.0590   Epoch: 9   Global Step: 46960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:05,698-Speed 10607.24 samples/sec   Loss 9.3793   LearningRate 0.0590   Epoch: 9   Global Step: 46970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:06,653-Speed 10734.39 samples/sec   Loss 9.4375   LearningRate 0.0590   Epoch: 9   Global Step: 46980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:07,654-Speed 10439.72 samples/sec   Loss 9.4234   LearningRate 0.0589   Epoch: 9   Global Step: 46990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:08,625-Speed 10562.87 samples/sec   Loss 9.4313   LearningRate 0.0589   Epoch: 9   Global Step: 47000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:09,563-Speed 10927.24 samples/sec   Loss 9.4277   LearningRate 0.0589   Epoch: 9   Global Step: 47010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:10,528-Speed 10620.23 samples/sec   Loss 9.4558   LearningRate 0.0589   Epoch: 9   Global Step: 47020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:11,520-Speed 10331.96 samples/sec   Loss 9.2369   LearningRate 0.0589   Epoch: 9   Global Step: 47030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:12,460-Speed 10901.07 samples/sec   Loss 9.3617   LearningRate 0.0589   Epoch: 9   Global Step: 47040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:13,374-Speed 11219.03 samples/sec   Loss 9.3842   LearningRate 0.0589   Epoch: 9   Global Step: 47050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:14,359-Speed 10403.45 samples/sec   Loss 9.4985   LearningRate 0.0589   Epoch: 9   Global Step: 47060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:15,317-Speed 10692.15 samples/sec   Loss 9.5231   LearningRate 0.0589   Epoch: 9   Global Step: 47070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:16,300-Speed 10429.84 samples/sec   Loss 9.7477   LearningRate 0.0589   Epoch: 9   Global Step: 47080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:17,281-Speed 10444.17 samples/sec   Loss 9.5845   LearningRate 0.0589   Epoch: 9   Global Step: 47090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:18,266-Speed 10416.84 samples/sec   Loss 9.4781   LearningRate 0.0589   Epoch: 9   Global Step: 47100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:19,219-Speed 10756.95 samples/sec   Loss 9.3481   LearningRate 0.0589   Epoch: 9   Global Step: 47110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:20,204-Speed 10406.06 samples/sec   Loss 9.6500   LearningRate 0.0588   Epoch: 9   Global Step: 47120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:21,163-Speed 10676.74 samples/sec   Loss 9.3258   LearningRate 0.0588   Epoch: 9   Global Step: 47130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:22,088-Speed 11086.15 samples/sec   Loss 9.2747   LearningRate 0.0588   Epoch: 9   Global Step: 47140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:23,073-Speed 10394.54 samples/sec   Loss 9.4832   LearningRate 0.0588   Epoch: 9   Global Step: 47150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:24,031-Speed 10709.38 samples/sec   Loss 9.3349   LearningRate 0.0588   Epoch: 9   Global Step: 47160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:24,979-Speed 10813.42 samples/sec   Loss 9.4603   LearningRate 0.0588   Epoch: 9   Global Step: 47170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:25,917-Speed 10929.36 samples/sec   Loss 9.4306   LearningRate 0.0588   Epoch: 9   Global Step: 47180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:26,884-Speed 10592.43 samples/sec   Loss 9.4969   LearningRate 0.0588   Epoch: 9   Global Step: 47190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:27,807-Speed 11104.08 samples/sec   Loss 9.3965   LearningRate 0.0588   Epoch: 9   Global Step: 47200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:28,815-Speed 10171.83 samples/sec   Loss 9.4738   LearningRate 0.0588   Epoch: 9   Global Step: 47210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:29,786-Speed 10560.59 samples/sec   Loss 9.5000   LearningRate 0.0588   Epoch: 9   Global Step: 47220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:30,750-Speed 10632.93 samples/sec   Loss 9.3348   LearningRate 0.0588   Epoch: 9   Global Step: 47230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:31,756-Speed 10181.10 samples/sec   Loss 9.4066   LearningRate 0.0588   Epoch: 9   Global Step: 47240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:32,737-Speed 10457.54 samples/sec   Loss 9.5215   LearningRate 0.0587   Epoch: 9   Global Step: 47250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:33,694-Speed 10704.74 samples/sec   Loss 9.3900   LearningRate 0.0587   Epoch: 9   Global Step: 47260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:34,639-Speed 10849.47 samples/sec   Loss 9.3237   LearningRate 0.0587   Epoch: 9   Global Step: 47270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:35,611-Speed 10546.37 samples/sec   Loss 9.4450   LearningRate 0.0587   Epoch: 9   Global Step: 47280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:36,602-Speed 10343.36 samples/sec   Loss 9.3849   LearningRate 0.0587   Epoch: 9   Global Step: 47290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:37,531-Speed 11035.29 samples/sec   Loss 9.3664   LearningRate 0.0587   Epoch: 9   Global Step: 47300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:38,482-Speed 10772.91 samples/sec   Loss 9.5333   LearningRate 0.0587   Epoch: 9   Global Step: 47310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:39,417-Speed 10969.87 samples/sec   Loss 9.3741   LearningRate 0.0587   Epoch: 9   Global Step: 47320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:40,360-Speed 10861.58 samples/sec   Loss 9.4557   LearningRate 0.0587   Epoch: 9   Global Step: 47330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:41,352-Speed 10336.07 samples/sec   Loss 9.2694   LearningRate 0.0587   Epoch: 9   Global Step: 47340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:42,331-Speed 10475.63 samples/sec   Loss 9.5005   LearningRate 0.0587   Epoch: 9   Global Step: 47350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:43,280-Speed 10793.83 samples/sec   Loss 9.4214   LearningRate 0.0587   Epoch: 9   Global Step: 47360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:44,256-Speed 10507.95 samples/sec   Loss 9.4020   LearningRate 0.0587   Epoch: 9   Global Step: 47370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:45,220-Speed 10624.57 samples/sec   Loss 9.3591   LearningRate 0.0586   Epoch: 9   Global Step: 47380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:46,178-Speed 10702.14 samples/sec   Loss 9.4670   LearningRate 0.0586   Epoch: 9   Global Step: 47390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:47,140-Speed 10659.72 samples/sec   Loss 9.5520   LearningRate 0.0586   Epoch: 9   Global Step: 47400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:48,106-Speed 10607.62 samples/sec   Loss 9.3738   LearningRate 0.0586   Epoch: 9   Global Step: 47410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:49,067-Speed 10667.97 samples/sec   Loss 9.5115   LearningRate 0.0586   Epoch: 9   Global Step: 47420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:50,030-Speed 10639.95 samples/sec   Loss 9.3091   LearningRate 0.0586   Epoch: 9   Global Step: 47430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:50,982-Speed 10759.48 samples/sec   Loss 9.5433   LearningRate 0.0586   Epoch: 9   Global Step: 47440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:51,967-Speed 10415.31 samples/sec   Loss 9.5387   LearningRate 0.0586   Epoch: 9   Global Step: 47450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:52,916-Speed 10805.84 samples/sec   Loss 9.4513   LearningRate 0.0586   Epoch: 9   Global Step: 47460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:53,890-Speed 10523.10 samples/sec   Loss 9.5691   LearningRate 0.0586   Epoch: 9   Global Step: 47470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:54,870-Speed 10455.12 samples/sec   Loss 9.4087   LearningRate 0.0586   Epoch: 9   Global Step: 47480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:55,809-Speed 10914.72 samples/sec   Loss 9.4142   LearningRate 0.0586   Epoch: 9   Global Step: 47490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:56,736-Speed 11059.56 samples/sec   Loss 9.4643   LearningRate 0.0586   Epoch: 9   Global Step: 47500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:57,719-Speed 10425.13 samples/sec   Loss 9.3685   LearningRate 0.0585   Epoch: 9   Global Step: 47510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:00:58,670-Speed 10774.68 samples/sec   Loss 9.4597   LearningRate 0.0585   Epoch: 9   Global Step: 47520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:00:59,609-Speed 10923.15 samples/sec   Loss 9.4085   LearningRate 0.0585   Epoch: 9   Global Step: 47530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:00,579-Speed 10561.83 samples/sec   Loss 9.5920   LearningRate 0.0585   Epoch: 9   Global Step: 47540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:01,537-Speed 10694.26 samples/sec   Loss 9.5188   LearningRate 0.0585   Epoch: 9   Global Step: 47550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:02,501-Speed 10640.15 samples/sec   Loss 9.5316   LearningRate 0.0585   Epoch: 9   Global Step: 47560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:03,471-Speed 10557.41 samples/sec   Loss 9.4408   LearningRate 0.0585   Epoch: 9   Global Step: 47570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:04,418-Speed 10824.89 samples/sec   Loss 9.4822   LearningRate 0.0585   Epoch: 9   Global Step: 47580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:05,363-Speed 10853.54 samples/sec   Loss 9.3632   LearningRate 0.0585   Epoch: 9   Global Step: 47590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:06,322-Speed 10684.64 samples/sec   Loss 9.4597   LearningRate 0.0585   Epoch: 9   Global Step: 47600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:07,247-Speed 11080.02 samples/sec   Loss 9.5810   LearningRate 0.0585   Epoch: 9   Global Step: 47610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:08,207-Speed 10676.00 samples/sec   Loss 9.5548   LearningRate 0.0585   Epoch: 9   Global Step: 47620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:09,240-Speed 9923.67 samples/sec   Loss 9.5891   LearningRate 0.0585   Epoch: 9   Global Step: 47630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:10,208-Speed 10586.42 samples/sec   Loss 9.6432   LearningRate 0.0585   Epoch: 9   Global Step: 47640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:11,155-Speed 10833.42 samples/sec   Loss 9.5403   LearningRate 0.0584   Epoch: 9   Global Step: 47650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:12,120-Speed 10610.19 samples/sec   Loss 9.5640   LearningRate 0.0584   Epoch: 9   Global Step: 47660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:13,090-Speed 10567.83 samples/sec   Loss 9.3558   LearningRate 0.0584   Epoch: 9   Global Step: 47670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:14,045-Speed 10734.95 samples/sec   Loss 9.4267   LearningRate 0.0584   Epoch: 9   Global Step: 47680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:14,999-Speed 10745.72 samples/sec   Loss 9.4486   LearningRate 0.0584   Epoch: 9   Global Step: 47690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:15,969-Speed 10567.28 samples/sec   Loss 9.4251   LearningRate 0.0584   Epoch: 9   Global Step: 47700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:16,932-Speed 10641.76 samples/sec   Loss 9.5574   LearningRate 0.0584   Epoch: 9   Global Step: 47710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:17,913-Speed 10441.52 samples/sec   Loss 9.4259   LearningRate 0.0584   Epoch: 9   Global Step: 47720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:01:18,831-Speed 11167.83 samples/sec   Loss 9.5239   LearningRate 0.0584   Epoch: 9   Global Step: 47730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:01:19,801-Speed 10574.49 samples/sec   Loss 9.6028   LearningRate 0.0584   Epoch: 9   Global Step: 47740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:01:20,771-Speed 10564.87 samples/sec   Loss 9.4151   LearningRate 0.0584   Epoch: 9   Global Step: 47750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:01:21,724-Speed 10754.97 samples/sec   Loss 9.4545   LearningRate 0.0584   Epoch: 9   Global Step: 47760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:01:22,692-Speed 10589.75 samples/sec   Loss 9.3554   LearningRate 0.0584   Epoch: 9   Global Step: 47770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:01:23,690-Speed 10265.16 samples/sec   Loss 9.4648   LearningRate 0.0583   Epoch: 9   Global Step: 47780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:01:24,649-Speed 10687.99 samples/sec   Loss 9.4539   LearningRate 0.0583   Epoch: 9   Global Step: 47790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:01:25,662-Speed 10122.10 samples/sec   Loss 9.5210   LearningRate 0.0583   Epoch: 9   Global Step: 47800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:01:26,609-Speed 10824.78 samples/sec   Loss 9.5389   LearningRate 0.0583   Epoch: 9   Global Step: 47810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:01:27,588-Speed 10468.51 samples/sec   Loss 9.5716   LearningRate 0.0583   Epoch: 9   Global Step: 47820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:28,560-Speed 10543.61 samples/sec   Loss 9.3151   LearningRate 0.0583   Epoch: 9   Global Step: 47830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:29,511-Speed 10790.48 samples/sec   Loss 9.4147   LearningRate 0.0583   Epoch: 9   Global Step: 47840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:30,476-Speed 10614.90 samples/sec   Loss 9.5499   LearningRate 0.0583   Epoch: 9   Global Step: 47850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:31,446-Speed 10569.50 samples/sec   Loss 9.4010   LearningRate 0.0583   Epoch: 9   Global Step: 47860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:32,486-Speed 9854.92 samples/sec   Loss 9.6894   LearningRate 0.0583   Epoch: 9   Global Step: 47870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:33,487-Speed 10242.26 samples/sec   Loss 9.4359   LearningRate 0.0583   Epoch: 9   Global Step: 47880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:34,456-Speed 10577.34 samples/sec   Loss 9.4599   LearningRate 0.0583   Epoch: 9   Global Step: 47890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:35,403-Speed 10815.88 samples/sec   Loss 9.4661   LearningRate 0.0583   Epoch: 9   Global Step: 47900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:36,356-Speed 10757.31 samples/sec   Loss 9.3523   LearningRate 0.0582   Epoch: 9   Global Step: 47910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:01:37,322-Speed 10614.59 samples/sec   Loss 9.3711   LearningRate 0.0582   Epoch: 9   Global Step: 47920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:01:38,260-Speed 10929.04 samples/sec   Loss 9.5837   LearningRate 0.0582   Epoch: 9   Global Step: 47930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:01:39,221-Speed 10661.69 samples/sec   Loss 9.5093   LearningRate 0.0582   Epoch: 9   Global Step: 47940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:01:40,195-Speed 10531.40 samples/sec   Loss 9.4529   LearningRate 0.0582   Epoch: 9   Global Step: 47950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:01:41,151-Speed 10714.41 samples/sec   Loss 9.6941   LearningRate 0.0582   Epoch: 9   Global Step: 47960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:01:42,101-Speed 10794.72 samples/sec   Loss 9.3685   LearningRate 0.0582   Epoch: 9   Global Step: 47970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:01:43,087-Speed 10400.95 samples/sec   Loss 9.6145   LearningRate 0.0582   Epoch: 9   Global Step: 47980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:01:44,085-Speed 10262.81 samples/sec   Loss 9.5190   LearningRate 0.0582   Epoch: 9   Global Step: 47990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:01:45,098-Speed 10125.54 samples/sec   Loss 9.5533   LearningRate 0.0582   Epoch: 9   Global Step: 48000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:02:12,450-[lfw][48000]XNorm: 13.420170
Training: 2022-04-11 01:02:12,451-[lfw][48000]Accuracy-Flip: 0.99550+-0.00435
Training: 2022-04-11 01:02:12,451-[lfw][48000]Accuracy-Highest: 0.99550
Training: 2022-04-11 01:02:37,843-[cfp_fp][48000]XNorm: 11.257112
Training: 2022-04-11 01:02:37,844-[cfp_fp][48000]Accuracy-Flip: 0.95114+-0.01198
Training: 2022-04-11 01:02:37,845-[cfp_fp][48000]Accuracy-Highest: 0.95386
Training: 2022-04-11 01:02:59,892-[agedb_30][48000]XNorm: 13.070618
Training: 2022-04-11 01:02:59,892-[agedb_30][48000]Accuracy-Flip: 0.96250+-0.01003
Training: 2022-04-11 01:02:59,893-[agedb_30][48000]Accuracy-Highest: 0.96250
Training: 2022-04-11 01:03:00,858-Speed 135.16 samples/sec   Loss 9.6621   LearningRate 0.0582   Epoch: 9   Global Step: 48010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:01,791-Speed 10988.80 samples/sec   Loss 9.3119   LearningRate 0.0582   Epoch: 9   Global Step: 48020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:02,765-Speed 10528.58 samples/sec   Loss 9.5004   LearningRate 0.0582   Epoch: 9   Global Step: 48030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:03,723-Speed 10688.65 samples/sec   Loss 9.5386   LearningRate 0.0581   Epoch: 9   Global Step: 48040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:04,699-Speed 10500.22 samples/sec   Loss 9.4296   LearningRate 0.0581   Epoch: 9   Global Step: 48050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:05,660-Speed 10665.08 samples/sec   Loss 9.4857   LearningRate 0.0581   Epoch: 9   Global Step: 48060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:06,655-Speed 10314.48 samples/sec   Loss 9.3060   LearningRate 0.0581   Epoch: 9   Global Step: 48070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:07,589-Speed 10971.06 samples/sec   Loss 9.4239   LearningRate 0.0581   Epoch: 9   Global Step: 48080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:08,534-Speed 10852.59 samples/sec   Loss 9.4776   LearningRate 0.0581   Epoch: 9   Global Step: 48090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:09,513-Speed 10470.39 samples/sec   Loss 9.4456   LearningRate 0.0581   Epoch: 9   Global Step: 48100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:10,478-Speed 10613.89 samples/sec   Loss 9.4361   LearningRate 0.0581   Epoch: 9   Global Step: 48110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:11,454-Speed 10502.41 samples/sec   Loss 9.6613   LearningRate 0.0581   Epoch: 9   Global Step: 48120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:12,406-Speed 10776.39 samples/sec   Loss 9.4191   LearningRate 0.0581   Epoch: 9   Global Step: 48130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:13,379-Speed 10535.08 samples/sec   Loss 9.6269   LearningRate 0.0581   Epoch: 9   Global Step: 48140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:14,332-Speed 10753.87 samples/sec   Loss 9.6039   LearningRate 0.0581   Epoch: 9   Global Step: 48150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:15,277-Speed 10842.28 samples/sec   Loss 9.4586   LearningRate 0.0581   Epoch: 9   Global Step: 48160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:16,252-Speed 10513.78 samples/sec   Loss 9.3706   LearningRate 0.0581   Epoch: 9   Global Step: 48170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:17,215-Speed 10643.92 samples/sec   Loss 9.5604   LearningRate 0.0580   Epoch: 9   Global Step: 48180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:18,174-Speed 10694.66 samples/sec   Loss 9.4503   LearningRate 0.0580   Epoch: 9   Global Step: 48190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:19,141-Speed 10590.62 samples/sec   Loss 9.5866   LearningRate 0.0580   Epoch: 9   Global Step: 48200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:20,110-Speed 10577.05 samples/sec   Loss 9.4528   LearningRate 0.0580   Epoch: 9   Global Step: 48210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:21,069-Speed 10690.00 samples/sec   Loss 9.4298   LearningRate 0.0580   Epoch: 9   Global Step: 48220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:22,046-Speed 10492.61 samples/sec   Loss 9.3025   LearningRate 0.0580   Epoch: 9   Global Step: 48230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:22,996-Speed 10795.27 samples/sec   Loss 9.6317   LearningRate 0.0580   Epoch: 9   Global Step: 48240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:23,963-Speed 10590.64 samples/sec   Loss 9.4524   LearningRate 0.0580   Epoch: 9   Global Step: 48250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:24,943-Speed 10466.98 samples/sec   Loss 9.2610   LearningRate 0.0580   Epoch: 9   Global Step: 48260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:25,947-Speed 10202.36 samples/sec   Loss 9.3035   LearningRate 0.0580   Epoch: 9   Global Step: 48270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:26,873-Speed 11077.13 samples/sec   Loss 9.2367   LearningRate 0.0580   Epoch: 9   Global Step: 48280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:27,846-Speed 10527.60 samples/sec   Loss 9.3779   LearningRate 0.0580   Epoch: 9   Global Step: 48290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:28,857-Speed 10147.87 samples/sec   Loss 9.4491   LearningRate 0.0580   Epoch: 9   Global Step: 48300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:29,826-Speed 10576.83 samples/sec   Loss 9.7100   LearningRate 0.0579   Epoch: 9   Global Step: 48310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:30,750-Speed 11088.14 samples/sec   Loss 9.5444   LearningRate 0.0579   Epoch: 9   Global Step: 48320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:31,727-Speed 10492.37 samples/sec   Loss 9.4768   LearningRate 0.0579   Epoch: 9   Global Step: 48330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:32,671-Speed 10859.38 samples/sec   Loss 9.4558   LearningRate 0.0579   Epoch: 9   Global Step: 48340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:33,620-Speed 10797.42 samples/sec   Loss 9.5899   LearningRate 0.0579   Epoch: 9   Global Step: 48350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:03:34,604-Speed 10413.79 samples/sec   Loss 9.4734   LearningRate 0.0579   Epoch: 9   Global Step: 48360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:03:35,580-Speed 10506.93 samples/sec   Loss 9.5374   LearningRate 0.0579   Epoch: 9   Global Step: 48370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:03:36,561-Speed 10439.52 samples/sec   Loss 9.3438   LearningRate 0.0579   Epoch: 9   Global Step: 48380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:03:37,547-Speed 10402.58 samples/sec   Loss 9.4443   LearningRate 0.0579   Epoch: 9   Global Step: 48390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:03:38,502-Speed 10733.21 samples/sec   Loss 9.5013   LearningRate 0.0579   Epoch: 9   Global Step: 48400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:03:39,472-Speed 10568.46 samples/sec   Loss 9.4765   LearningRate 0.0579   Epoch: 9   Global Step: 48410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:03:40,416-Speed 10865.69 samples/sec   Loss 9.4447   LearningRate 0.0579   Epoch: 9   Global Step: 48420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:03:41,395-Speed 10466.19 samples/sec   Loss 9.3272   LearningRate 0.0579   Epoch: 9   Global Step: 48430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:03:42,364-Speed 10581.33 samples/sec   Loss 9.5132   LearningRate 0.0578   Epoch: 9   Global Step: 48440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:03:43,321-Speed 10710.27 samples/sec   Loss 9.3836   LearningRate 0.0578   Epoch: 9   Global Step: 48450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:44,280-Speed 10686.60 samples/sec   Loss 9.5005   LearningRate 0.0578   Epoch: 9   Global Step: 48460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:45,245-Speed 10619.78 samples/sec   Loss 9.4377   LearningRate 0.0578   Epoch: 9   Global Step: 48470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:46,170-Speed 11082.20 samples/sec   Loss 9.4740   LearningRate 0.0578   Epoch: 9   Global Step: 48480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:47,156-Speed 10388.28 samples/sec   Loss 9.5309   LearningRate 0.0578   Epoch: 9   Global Step: 48490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:48,149-Speed 10329.18 samples/sec   Loss 9.3244   LearningRate 0.0578   Epoch: 9   Global Step: 48500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:49,151-Speed 10238.17 samples/sec   Loss 9.4760   LearningRate 0.0578   Epoch: 9   Global Step: 48510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:50,122-Speed 10553.64 samples/sec   Loss 9.5663   LearningRate 0.0578   Epoch: 9   Global Step: 48520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:51,170-Speed 9781.31 samples/sec   Loss 9.6248   LearningRate 0.0578   Epoch: 9   Global Step: 48530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:52,116-Speed 10835.34 samples/sec   Loss 9.5822   LearningRate 0.0578   Epoch: 9   Global Step: 48540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:53,155-Speed 9860.57 samples/sec   Loss 9.4404   LearningRate 0.0578   Epoch: 9   Global Step: 48550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:54,158-Speed 10224.85 samples/sec   Loss 9.5804   LearningRate 0.0578   Epoch: 9   Global Step: 48560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:03:55,120-Speed 10658.60 samples/sec   Loss 9.4907   LearningRate 0.0578   Epoch: 9   Global Step: 48570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:56,086-Speed 10614.13 samples/sec   Loss 9.5037   LearningRate 0.0577   Epoch: 9   Global Step: 48580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:57,041-Speed 10731.35 samples/sec   Loss 9.3754   LearningRate 0.0577   Epoch: 9   Global Step: 48590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:58,107-Speed 9616.84 samples/sec   Loss 9.4566   LearningRate 0.0577   Epoch: 9   Global Step: 48600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:03:59,082-Speed 10507.41 samples/sec   Loss 9.3235   LearningRate 0.0577   Epoch: 9   Global Step: 48610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:00,062-Speed 10463.96 samples/sec   Loss 9.3331   LearningRate 0.0577   Epoch: 9   Global Step: 48620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:01,043-Speed 10439.53 samples/sec   Loss 9.4880   LearningRate 0.0577   Epoch: 9   Global Step: 48630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:02,004-Speed 10669.55 samples/sec   Loss 9.4645   LearningRate 0.0577   Epoch: 9   Global Step: 48640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:02,996-Speed 10326.98 samples/sec   Loss 9.4438   LearningRate 0.0577   Epoch: 9   Global Step: 48650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:03,959-Speed 10646.81 samples/sec   Loss 9.4613   LearningRate 0.0577   Epoch: 9   Global Step: 48660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:04,902-Speed 10868.75 samples/sec   Loss 9.3881   LearningRate 0.0577   Epoch: 9   Global Step: 48670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:05,893-Speed 10346.71 samples/sec   Loss 9.5935   LearningRate 0.0577   Epoch: 9   Global Step: 48680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:06,853-Speed 10678.36 samples/sec   Loss 9.4562   LearningRate 0.0577   Epoch: 9   Global Step: 48690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:07,874-Speed 10035.77 samples/sec   Loss 9.5175   LearningRate 0.0577   Epoch: 9   Global Step: 48700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:08,860-Speed 10390.49 samples/sec   Loss 9.3964   LearningRate 0.0576   Epoch: 9   Global Step: 48710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:09,849-Speed 10365.85 samples/sec   Loss 9.4812   LearningRate 0.0576   Epoch: 9   Global Step: 48720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:10,818-Speed 10580.15 samples/sec   Loss 9.4840   LearningRate 0.0576   Epoch: 9   Global Step: 48730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:11,823-Speed 10199.83 samples/sec   Loss 9.3768   LearningRate 0.0576   Epoch: 9   Global Step: 48740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:12,789-Speed 10601.79 samples/sec   Loss 9.5204   LearningRate 0.0576   Epoch: 9   Global Step: 48750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:13,766-Speed 10494.42 samples/sec   Loss 9.5075   LearningRate 0.0576   Epoch: 9   Global Step: 48760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:14,775-Speed 10155.86 samples/sec   Loss 9.5501   LearningRate 0.0576   Epoch: 9   Global Step: 48770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:15,736-Speed 10672.13 samples/sec   Loss 9.4774   LearningRate 0.0576   Epoch: 9   Global Step: 48780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:16,676-Speed 10891.42 samples/sec   Loss 9.5827   LearningRate 0.0576   Epoch: 9   Global Step: 48790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:17,642-Speed 10613.77 samples/sec   Loss 9.4532   LearningRate 0.0576   Epoch: 9   Global Step: 48800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:18,601-Speed 10693.83 samples/sec   Loss 9.5286   LearningRate 0.0576   Epoch: 9   Global Step: 48810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:19,582-Speed 10452.33 samples/sec   Loss 9.4973   LearningRate 0.0576   Epoch: 9   Global Step: 48820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:20,519-Speed 10943.46 samples/sec   Loss 9.6429   LearningRate 0.0576   Epoch: 9   Global Step: 48830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:21,494-Speed 10509.46 samples/sec   Loss 9.3480   LearningRate 0.0575   Epoch: 9   Global Step: 48840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:22,532-Speed 9875.74 samples/sec   Loss 9.5398   LearningRate 0.0575   Epoch: 9   Global Step: 48850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:23,565-Speed 9923.53 samples/sec   Loss 9.3866   LearningRate 0.0575   Epoch: 9   Global Step: 48860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:24,549-Speed 10420.00 samples/sec   Loss 9.4087   LearningRate 0.0575   Epoch: 9   Global Step: 48870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:25,503-Speed 10742.93 samples/sec   Loss 9.3702   LearningRate 0.0575   Epoch: 9   Global Step: 48880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:26,429-Speed 11067.85 samples/sec   Loss 9.5158   LearningRate 0.0575   Epoch: 9   Global Step: 48890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:27,399-Speed 10570.69 samples/sec   Loss 9.4448   LearningRate 0.0575   Epoch: 9   Global Step: 48900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:28,385-Speed 10392.61 samples/sec   Loss 9.4979   LearningRate 0.0575   Epoch: 9   Global Step: 48910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:29,388-Speed 10222.19 samples/sec   Loss 9.3743   LearningRate 0.0575   Epoch: 9   Global Step: 48920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:30,420-Speed 9933.95 samples/sec   Loss 9.4750   LearningRate 0.0575   Epoch: 9   Global Step: 48930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:31,372-Speed 10766.54 samples/sec   Loss 9.2933   LearningRate 0.0575   Epoch: 9   Global Step: 48940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:32,369-Speed 10277.42 samples/sec   Loss 9.5108   LearningRate 0.0575   Epoch: 9   Global Step: 48950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:33,325-Speed 10718.86 samples/sec   Loss 9.4652   LearningRate 0.0575   Epoch: 9   Global Step: 48960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:34,289-Speed 10628.79 samples/sec   Loss 9.5250   LearningRate 0.0574   Epoch: 9   Global Step: 48970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:35,246-Speed 10717.49 samples/sec   Loss 9.5752   LearningRate 0.0574   Epoch: 9   Global Step: 48980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:36,195-Speed 10798.41 samples/sec   Loss 9.4739   LearningRate 0.0574   Epoch: 9   Global Step: 48990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:37,179-Speed 10419.14 samples/sec   Loss 9.4608   LearningRate 0.0574   Epoch: 9   Global Step: 49000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:38,166-Speed 10393.30 samples/sec   Loss 9.5303   LearningRate 0.0574   Epoch: 9   Global Step: 49010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:39,094-Speed 11034.57 samples/sec   Loss 9.6149   LearningRate 0.0574   Epoch: 9   Global Step: 49020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:40,060-Speed 10613.87 samples/sec   Loss 9.4598   LearningRate 0.0574   Epoch: 9   Global Step: 49030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:41,013-Speed 10751.50 samples/sec   Loss 9.4557   LearningRate 0.0574   Epoch: 9   Global Step: 49040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:42,014-Speed 10238.92 samples/sec   Loss 9.3940   LearningRate 0.0574   Epoch: 9   Global Step: 49050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:04:43,041-Speed 9984.11 samples/sec   Loss 9.2403   LearningRate 0.0574   Epoch: 9   Global Step: 49060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:43,995-Speed 10743.32 samples/sec   Loss 9.5035   LearningRate 0.0574   Epoch: 9   Global Step: 49070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:45,044-Speed 9769.53 samples/sec   Loss 9.6086   LearningRate 0.0574   Epoch: 9   Global Step: 49080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:45,983-Speed 10915.47 samples/sec   Loss 9.3707   LearningRate 0.0574   Epoch: 9   Global Step: 49090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:47,038-Speed 9715.82 samples/sec   Loss 9.4713   LearningRate 0.0574   Epoch: 9   Global Step: 49100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:48,039-Speed 10246.37 samples/sec   Loss 9.5375   LearningRate 0.0573   Epoch: 9   Global Step: 49110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:48,981-Speed 10885.42 samples/sec   Loss 9.4632   LearningRate 0.0573   Epoch: 9   Global Step: 49120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:49,955-Speed 10516.83 samples/sec   Loss 9.3892   LearningRate 0.0573   Epoch: 9   Global Step: 49130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:50,933-Speed 10485.31 samples/sec   Loss 9.4213   LearningRate 0.0573   Epoch: 9   Global Step: 49140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:51,903-Speed 10568.12 samples/sec   Loss 9.5662   LearningRate 0.0573   Epoch: 9   Global Step: 49150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:52,905-Speed 10230.03 samples/sec   Loss 9.2684   LearningRate 0.0573   Epoch: 9   Global Step: 49160   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 01:04:53,857-Speed 10761.86 samples/sec   Loss 9.5657   LearningRate 0.0573   Epoch: 9   Global Step: 49170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:54,831-Speed 10532.52 samples/sec   Loss 9.4504   LearningRate 0.0573   Epoch: 9   Global Step: 49180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:55,787-Speed 10710.27 samples/sec   Loss 9.4024   LearningRate 0.0573   Epoch: 9   Global Step: 49190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:56,781-Speed 10316.27 samples/sec   Loss 9.3417   LearningRate 0.0573   Epoch: 9   Global Step: 49200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:57,771-Speed 10354.41 samples/sec   Loss 9.5859   LearningRate 0.0573   Epoch: 9   Global Step: 49210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:58,744-Speed 10537.50 samples/sec   Loss 9.5411   LearningRate 0.0573   Epoch: 9   Global Step: 49220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:04:59,740-Speed 10281.72 samples/sec   Loss 9.5671   LearningRate 0.0573   Epoch: 9   Global Step: 49230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:00,713-Speed 10540.34 samples/sec   Loss 9.4329   LearningRate 0.0572   Epoch: 9   Global Step: 49240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:01,688-Speed 10513.78 samples/sec   Loss 9.3179   LearningRate 0.0572   Epoch: 9   Global Step: 49250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:02,655-Speed 10603.57 samples/sec   Loss 9.4432   LearningRate 0.0572   Epoch: 9   Global Step: 49260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:03,680-Speed 9993.63 samples/sec   Loss 9.4548   LearningRate 0.0572   Epoch: 9   Global Step: 49270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:04,635-Speed 10744.88 samples/sec   Loss 9.4162   LearningRate 0.0572   Epoch: 9   Global Step: 49280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:05,587-Speed 10763.54 samples/sec   Loss 9.3981   LearningRate 0.0572   Epoch: 9   Global Step: 49290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:06,548-Speed 10664.12 samples/sec   Loss 9.6242   LearningRate 0.0572   Epoch: 9   Global Step: 49300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:07,557-Speed 10163.74 samples/sec   Loss 9.5272   LearningRate 0.0572   Epoch: 9   Global Step: 49310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:08,542-Speed 10403.48 samples/sec   Loss 9.5238   LearningRate 0.0572   Epoch: 9   Global Step: 49320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:09,490-Speed 10814.10 samples/sec   Loss 9.3997   LearningRate 0.0572   Epoch: 9   Global Step: 49330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:10,462-Speed 10550.56 samples/sec   Loss 9.3529   LearningRate 0.0572   Epoch: 9   Global Step: 49340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:11,444-Speed 10432.91 samples/sec   Loss 9.5101   LearningRate 0.0572   Epoch: 9   Global Step: 49350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:12,406-Speed 10652.19 samples/sec   Loss 9.2859   LearningRate 0.0572   Epoch: 9   Global Step: 49360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:13,410-Speed 10210.96 samples/sec   Loss 9.3955   LearningRate 0.0572   Epoch: 9   Global Step: 49370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:14,379-Speed 10573.93 samples/sec   Loss 9.3602   LearningRate 0.0571   Epoch: 9   Global Step: 49380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:15,292-Speed 11231.90 samples/sec   Loss 9.4160   LearningRate 0.0571   Epoch: 9   Global Step: 49390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:16,253-Speed 10658.08 samples/sec   Loss 9.5218   LearningRate 0.0571   Epoch: 9   Global Step: 49400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:17,222-Speed 10578.81 samples/sec   Loss 9.4640   LearningRate 0.0571   Epoch: 9   Global Step: 49410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:18,179-Speed 10714.16 samples/sec   Loss 9.3394   LearningRate 0.0571   Epoch: 9   Global Step: 49420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:19,199-Speed 10048.02 samples/sec   Loss 9.3271   LearningRate 0.0571   Epoch: 9   Global Step: 49430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:20,168-Speed 10582.48 samples/sec   Loss 9.4602   LearningRate 0.0571   Epoch: 9   Global Step: 49440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:21,119-Speed 10775.76 samples/sec   Loss 9.5512   LearningRate 0.0571   Epoch: 9   Global Step: 49450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:22,083-Speed 10628.62 samples/sec   Loss 9.6251   LearningRate 0.0571   Epoch: 9   Global Step: 49460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:23,055-Speed 10540.96 samples/sec   Loss 9.5961   LearningRate 0.0571   Epoch: 9   Global Step: 49470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:24,055-Speed 10252.10 samples/sec   Loss 9.5467   LearningRate 0.0571   Epoch: 9   Global Step: 49480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:25,051-Speed 10289.99 samples/sec   Loss 9.4433   LearningRate 0.0571   Epoch: 9   Global Step: 49490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:26,022-Speed 10558.11 samples/sec   Loss 9.5018   LearningRate 0.0571   Epoch: 9   Global Step: 49500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:26,998-Speed 10504.34 samples/sec   Loss 9.3663   LearningRate 0.0570   Epoch: 9   Global Step: 49510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:27,950-Speed 10769.17 samples/sec   Loss 9.3270   LearningRate 0.0570   Epoch: 9   Global Step: 49520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:28,920-Speed 10562.89 samples/sec   Loss 9.4084   LearningRate 0.0570   Epoch: 9   Global Step: 49530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:29,892-Speed 10550.05 samples/sec   Loss 9.5990   LearningRate 0.0570   Epoch: 9   Global Step: 49540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:30,831-Speed 10913.09 samples/sec   Loss 9.4889   LearningRate 0.0570   Epoch: 9   Global Step: 49550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:31,747-Speed 11190.44 samples/sec   Loss 9.3469   LearningRate 0.0570   Epoch: 9   Global Step: 49560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:32,702-Speed 10724.99 samples/sec   Loss 9.5232   LearningRate 0.0570   Epoch: 9   Global Step: 49570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:33,647-Speed 10853.94 samples/sec   Loss 9.5244   LearningRate 0.0570   Epoch: 9   Global Step: 49580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:34,608-Speed 10658.57 samples/sec   Loss 9.4730   LearningRate 0.0570   Epoch: 9   Global Step: 49590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:35,603-Speed 10299.72 samples/sec   Loss 9.4635   LearningRate 0.0570   Epoch: 9   Global Step: 49600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:36,582-Speed 10470.54 samples/sec   Loss 9.4170   LearningRate 0.0570   Epoch: 9   Global Step: 49610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:37,529-Speed 10837.77 samples/sec   Loss 9.5870   LearningRate 0.0570   Epoch: 9   Global Step: 49620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:38,506-Speed 10490.25 samples/sec   Loss 9.5022   LearningRate 0.0570   Epoch: 9   Global Step: 49630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:39,489-Speed 10430.75 samples/sec   Loss 9.4104   LearningRate 0.0569   Epoch: 9   Global Step: 49640   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 01:05:40,444-Speed 10733.70 samples/sec   Loss 9.4469   LearningRate 0.0569   Epoch: 9   Global Step: 49650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:41,456-Speed 10127.15 samples/sec   Loss 9.4938   LearningRate 0.0569   Epoch: 9   Global Step: 49660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:42,389-Speed 10982.60 samples/sec   Loss 9.5899   LearningRate 0.0569   Epoch: 9   Global Step: 49670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:43,372-Speed 10434.07 samples/sec   Loss 9.5138   LearningRate 0.0569   Epoch: 9   Global Step: 49680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:44,322-Speed 10781.65 samples/sec   Loss 9.5459   LearningRate 0.0569   Epoch: 9   Global Step: 49690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:45,260-Speed 10966.16 samples/sec   Loss 9.5562   LearningRate 0.0569   Epoch: 9   Global Step: 49700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:46,232-Speed 10543.62 samples/sec   Loss 9.4795   LearningRate 0.0569   Epoch: 9   Global Step: 49710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:47,225-Speed 10326.35 samples/sec   Loss 9.6122   LearningRate 0.0569   Epoch: 9   Global Step: 49720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:48,204-Speed 10469.41 samples/sec   Loss 9.4682   LearningRate 0.0569   Epoch: 9   Global Step: 49730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:49,174-Speed 10572.60 samples/sec   Loss 9.3026   LearningRate 0.0569   Epoch: 9   Global Step: 49740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:50,120-Speed 10832.34 samples/sec   Loss 9.4313   LearningRate 0.0569   Epoch: 9   Global Step: 49750   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 01:05:51,085-Speed 10618.65 samples/sec   Loss 9.4658   LearningRate 0.0569   Epoch: 9   Global Step: 49760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:52,096-Speed 10140.19 samples/sec   Loss 9.5492   LearningRate 0.0569   Epoch: 9   Global Step: 49770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:53,084-Speed 10384.49 samples/sec   Loss 9.5027   LearningRate 0.0568   Epoch: 9   Global Step: 49780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:54,039-Speed 10729.69 samples/sec   Loss 9.4862   LearningRate 0.0568   Epoch: 9   Global Step: 49790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:54,991-Speed 10789.46 samples/sec   Loss 9.5137   LearningRate 0.0568   Epoch: 9   Global Step: 49800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:55,968-Speed 10484.83 samples/sec   Loss 9.4277   LearningRate 0.0568   Epoch: 9   Global Step: 49810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:05:56,937-Speed 10577.81 samples/sec   Loss 9.2811   LearningRate 0.0568   Epoch: 9   Global Step: 49820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:57,930-Speed 10330.60 samples/sec   Loss 9.5422   LearningRate 0.0568   Epoch: 9   Global Step: 49830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:58,887-Speed 10710.06 samples/sec   Loss 9.4269   LearningRate 0.0568   Epoch: 9   Global Step: 49840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:05:59,828-Speed 10886.81 samples/sec   Loss 9.5498   LearningRate 0.0568   Epoch: 9   Global Step: 49850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:06:00,785-Speed 10714.68 samples/sec   Loss 9.5502   LearningRate 0.0568   Epoch: 9   Global Step: 49860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:06:01,781-Speed 10293.18 samples/sec   Loss 9.5965   LearningRate 0.0568   Epoch: 9   Global Step: 49870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:06:02,708-Speed 11046.37 samples/sec   Loss 9.4744   LearningRate 0.0568   Epoch: 9   Global Step: 49880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:06:03,693-Speed 10405.69 samples/sec   Loss 9.4816   LearningRate 0.0568   Epoch: 9   Global Step: 49890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:06:04,647-Speed 10750.57 samples/sec   Loss 9.5382   LearningRate 0.0568   Epoch: 9   Global Step: 49900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:06:05,577-Speed 11013.40 samples/sec   Loss 9.4607   LearningRate 0.0567   Epoch: 9   Global Step: 49910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:06:06,545-Speed 10598.06 samples/sec   Loss 9.4808   LearningRate 0.0567   Epoch: 9   Global Step: 49920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:06:07,495-Speed 10785.03 samples/sec   Loss 9.4906   LearningRate 0.0567   Epoch: 9   Global Step: 49930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:06:08,456-Speed 10671.60 samples/sec   Loss 9.3300   LearningRate 0.0567   Epoch: 9   Global Step: 49940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:06:09,411-Speed 10735.21 samples/sec   Loss 9.3609   LearningRate 0.0567   Epoch: 9   Global Step: 49950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:06:10,367-Speed 10726.69 samples/sec   Loss 9.4285   LearningRate 0.0567   Epoch: 9   Global Step: 49960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:06:11,368-Speed 10235.17 samples/sec   Loss 9.5004   LearningRate 0.0567   Epoch: 9   Global Step: 49970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:06:12,372-Speed 10209.70 samples/sec   Loss 9.4301   LearningRate 0.0567   Epoch: 9   Global Step: 49980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:06:13,384-Speed 10121.10 samples/sec   Loss 9.2052   LearningRate 0.0567   Epoch: 9   Global Step: 49990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:06:14,373-Speed 10361.39 samples/sec   Loss 9.4418   LearningRate 0.0567   Epoch: 9   Global Step: 50000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:06:36,463-[lfw][50000]XNorm: 13.435060
Training: 2022-04-11 01:06:36,463-[lfw][50000]Accuracy-Flip: 0.99533+-0.00400
Training: 2022-04-11 01:06:36,464-[lfw][50000]Accuracy-Highest: 0.99550
Training: 2022-04-11 01:07:01,727-[cfp_fp][50000]XNorm: 11.457420
Training: 2022-04-11 01:07:01,728-[cfp_fp][50000]Accuracy-Flip: 0.94957+-0.01317
Training: 2022-04-11 01:07:01,728-[cfp_fp][50000]Accuracy-Highest: 0.95386
Training: 2022-04-11 01:07:23,887-[agedb_30][50000]XNorm: 13.175502
Training: 2022-04-11 01:07:23,888-[agedb_30][50000]Accuracy-Flip: 0.95933+-0.00886
Training: 2022-04-11 01:07:23,888-[agedb_30][50000]Accuracy-Highest: 0.96250
Training: 2022-04-11 01:07:24,824-Speed 145.35 samples/sec   Loss 9.3271   LearningRate 0.0567   Epoch: 9   Global Step: 50010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:07:25,745-Speed 11132.21 samples/sec   Loss 9.4073   LearningRate 0.0567   Epoch: 9   Global Step: 50020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:07:26,700-Speed 10726.77 samples/sec   Loss 9.4439   LearningRate 0.0567   Epoch: 9   Global Step: 50030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:07:27,726-Speed 9996.40 samples/sec   Loss 9.4366   LearningRate 0.0567   Epoch: 9   Global Step: 50040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:07:28,694-Speed 10583.18 samples/sec   Loss 9.4354   LearningRate 0.0566   Epoch: 9   Global Step: 50050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:29,681-Speed 10388.83 samples/sec   Loss 9.4315   LearningRate 0.0566   Epoch: 9   Global Step: 50060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:30,644-Speed 10644.08 samples/sec   Loss 9.3898   LearningRate 0.0566   Epoch: 9   Global Step: 50070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:31,611-Speed 10620.88 samples/sec   Loss 9.3880   LearningRate 0.0566   Epoch: 9   Global Step: 50080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:32,567-Speed 10720.05 samples/sec   Loss 9.4965   LearningRate 0.0566   Epoch: 9   Global Step: 50090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:33,566-Speed 10261.72 samples/sec   Loss 9.2348   LearningRate 0.0566   Epoch: 9   Global Step: 50100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:34,514-Speed 10810.05 samples/sec   Loss 9.4144   LearningRate 0.0566   Epoch: 9   Global Step: 50110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:35,471-Speed 10715.21 samples/sec   Loss 9.4479   LearningRate 0.0566   Epoch: 9   Global Step: 50120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:36,412-Speed 10887.69 samples/sec   Loss 9.2360   LearningRate 0.0566   Epoch: 9   Global Step: 50130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:37,380-Speed 10586.29 samples/sec   Loss 9.4779   LearningRate 0.0566   Epoch: 9   Global Step: 50140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:38,351-Speed 10561.82 samples/sec   Loss 9.5126   LearningRate 0.0566   Epoch: 9   Global Step: 50150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:39,342-Speed 10340.79 samples/sec   Loss 9.5332   LearningRate 0.0566   Epoch: 9   Global Step: 50160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:40,293-Speed 10774.14 samples/sec   Loss 9.3893   LearningRate 0.0566   Epoch: 9   Global Step: 50170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:41,255-Speed 10656.18 samples/sec   Loss 9.4580   LearningRate 0.0565   Epoch: 9   Global Step: 50180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:42,256-Speed 10242.66 samples/sec   Loss 9.4641   LearningRate 0.0565   Epoch: 9   Global Step: 50190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:43,230-Speed 10523.00 samples/sec   Loss 9.6130   LearningRate 0.0565   Epoch: 9   Global Step: 50200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:44,187-Speed 10719.76 samples/sec   Loss 9.5379   LearningRate 0.0565   Epoch: 9   Global Step: 50210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:45,137-Speed 10778.64 samples/sec   Loss 9.3800   LearningRate 0.0565   Epoch: 9   Global Step: 50220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:46,117-Speed 10464.20 samples/sec   Loss 9.3992   LearningRate 0.0565   Epoch: 9   Global Step: 50230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:47,117-Speed 10251.39 samples/sec   Loss 9.7511   LearningRate 0.0565   Epoch: 9   Global Step: 50240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:07:48,082-Speed 10632.20 samples/sec   Loss 9.5809   LearningRate 0.0565   Epoch: 9   Global Step: 50250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:07:49,018-Speed 10950.48 samples/sec   Loss 9.4708   LearningRate 0.0565   Epoch: 9   Global Step: 50260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:07:49,999-Speed 10451.66 samples/sec   Loss 9.2336   LearningRate 0.0565   Epoch: 9   Global Step: 50270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:07:50,977-Speed 10480.22 samples/sec   Loss 9.4339   LearningRate 0.0565   Epoch: 9   Global Step: 50280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:07:51,981-Speed 10206.33 samples/sec   Loss 9.4101   LearningRate 0.0565   Epoch: 9   Global Step: 50290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:07:52,918-Speed 10937.48 samples/sec   Loss 9.5315   LearningRate 0.0565   Epoch: 9   Global Step: 50300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:07:53,895-Speed 10498.30 samples/sec   Loss 9.5482   LearningRate 0.0565   Epoch: 9   Global Step: 50310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:07:54,829-Speed 10970.63 samples/sec   Loss 9.4408   LearningRate 0.0564   Epoch: 9   Global Step: 50320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:07:55,762-Speed 10983.01 samples/sec   Loss 9.5158   LearningRate 0.0564   Epoch: 9   Global Step: 50330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:07:56,700-Speed 10925.69 samples/sec   Loss 9.4567   LearningRate 0.0564   Epoch: 9   Global Step: 50340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:57,759-Speed 9680.13 samples/sec   Loss 9.5238   LearningRate 0.0564   Epoch: 9   Global Step: 50350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:58,715-Speed 10722.22 samples/sec   Loss 9.4622   LearningRate 0.0564   Epoch: 9   Global Step: 50360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:07:59,673-Speed 10693.99 samples/sec   Loss 9.5014   LearningRate 0.0564   Epoch: 9   Global Step: 50370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:00,670-Speed 10288.82 samples/sec   Loss 9.3918   LearningRate 0.0564   Epoch: 9   Global Step: 50380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:01,650-Speed 10449.46 samples/sec   Loss 9.6010   LearningRate 0.0564   Epoch: 9   Global Step: 50390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:02,613-Speed 10648.03 samples/sec   Loss 9.4741   LearningRate 0.0564   Epoch: 9   Global Step: 50400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:03,559-Speed 10836.07 samples/sec   Loss 9.2923   LearningRate 0.0564   Epoch: 9   Global Step: 50410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:04,473-Speed 11220.78 samples/sec   Loss 9.5230   LearningRate 0.0564   Epoch: 9   Global Step: 50420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:05,386-Speed 11220.24 samples/sec   Loss 9.4396   LearningRate 0.0564   Epoch: 9   Global Step: 50430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:06,385-Speed 10259.01 samples/sec   Loss 9.5964   LearningRate 0.0564   Epoch: 9   Global Step: 50440   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 01:08:07,347-Speed 10659.00 samples/sec   Loss 9.5392   LearningRate 0.0563   Epoch: 9   Global Step: 50450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:08,294-Speed 10838.16 samples/sec   Loss 9.3372   LearningRate 0.0563   Epoch: 9   Global Step: 50460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:09,256-Speed 10649.55 samples/sec   Loss 9.5245   LearningRate 0.0563   Epoch: 9   Global Step: 50470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:10,220-Speed 10630.94 samples/sec   Loss 9.2761   LearningRate 0.0563   Epoch: 9   Global Step: 50480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:11,176-Speed 10720.92 samples/sec   Loss 9.2673   LearningRate 0.0563   Epoch: 9   Global Step: 50490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:12,159-Speed 10432.79 samples/sec   Loss 9.4946   LearningRate 0.0563   Epoch: 9   Global Step: 50500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:13,108-Speed 10795.65 samples/sec   Loss 9.4139   LearningRate 0.0563   Epoch: 9   Global Step: 50510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:14,109-Speed 10237.26 samples/sec   Loss 9.5668   LearningRate 0.0563   Epoch: 9   Global Step: 50520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:15,088-Speed 10470.64 samples/sec   Loss 9.3612   LearningRate 0.0563   Epoch: 9   Global Step: 50530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:16,071-Speed 10422.74 samples/sec   Loss 9.3099   LearningRate 0.0563   Epoch: 9   Global Step: 50540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:17,052-Speed 10453.46 samples/sec   Loss 9.4210   LearningRate 0.0563   Epoch: 9   Global Step: 50550   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 01:08:17,983-Speed 11009.54 samples/sec   Loss 9.5723   LearningRate 0.0563   Epoch: 9   Global Step: 50560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:19,006-Speed 10015.26 samples/sec   Loss 9.5264   LearningRate 0.0563   Epoch: 9   Global Step: 50570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:20,124-Speed 9167.04 samples/sec   Loss 9.4903   LearningRate 0.0563   Epoch: 9   Global Step: 50580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:29,286-Speed 1117.81 samples/sec   Loss 8.3980   LearningRate 0.0562   Epoch: 10   Global Step: 50590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:30,310-Speed 10023.30 samples/sec   Loss 8.4949   LearningRate 0.0562   Epoch: 10   Global Step: 50600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:31,783-Speed 6954.16 samples/sec   Loss 8.5382   LearningRate 0.0562   Epoch: 10   Global Step: 50610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:32,872-Speed 9415.97 samples/sec   Loss 8.5626   LearningRate 0.0562   Epoch: 10   Global Step: 50620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:33,833-Speed 10664.90 samples/sec   Loss 8.5002   LearningRate 0.0562   Epoch: 10   Global Step: 50630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:34,867-Speed 9904.28 samples/sec   Loss 8.5571   LearningRate 0.0562   Epoch: 10   Global Step: 50640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:35,825-Speed 10692.73 samples/sec   Loss 8.5216   LearningRate 0.0562   Epoch: 10   Global Step: 50650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:36,783-Speed 10705.75 samples/sec   Loss 8.5414   LearningRate 0.0562   Epoch: 10   Global Step: 50660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:37,810-Speed 9980.37 samples/sec   Loss 8.6893   LearningRate 0.0562   Epoch: 10   Global Step: 50670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:38,823-Speed 10122.59 samples/sec   Loss 8.6400   LearningRate 0.0562   Epoch: 10   Global Step: 50680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:39,834-Speed 10152.82 samples/sec   Loss 8.5582   LearningRate 0.0562   Epoch: 10   Global Step: 50690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:40,849-Speed 10099.00 samples/sec   Loss 8.4546   LearningRate 0.0562   Epoch: 10   Global Step: 50700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:41,903-Speed 9722.37 samples/sec   Loss 8.5836   LearningRate 0.0562   Epoch: 10   Global Step: 50710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:42,951-Speed 9779.92 samples/sec   Loss 8.6703   LearningRate 0.0561   Epoch: 10   Global Step: 50720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:43,957-Speed 10189.71 samples/sec   Loss 8.6124   LearningRate 0.0561   Epoch: 10   Global Step: 50730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:44,936-Speed 10472.12 samples/sec   Loss 8.7645   LearningRate 0.0561   Epoch: 10   Global Step: 50740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:45,888-Speed 10766.23 samples/sec   Loss 8.7415   LearningRate 0.0561   Epoch: 10   Global Step: 50750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:46,836-Speed 10809.63 samples/sec   Loss 8.5404   LearningRate 0.0561   Epoch: 10   Global Step: 50760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:47,818-Speed 10441.19 samples/sec   Loss 8.6328   LearningRate 0.0561   Epoch: 10   Global Step: 50770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:48,772-Speed 10748.50 samples/sec   Loss 8.6123   LearningRate 0.0561   Epoch: 10   Global Step: 50780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:49,734-Speed 10655.94 samples/sec   Loss 8.7422   LearningRate 0.0561   Epoch: 10   Global Step: 50790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:50,714-Speed 10456.32 samples/sec   Loss 8.7611   LearningRate 0.0561   Epoch: 10   Global Step: 50800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:51,668-Speed 10745.17 samples/sec   Loss 8.7695   LearningRate 0.0561   Epoch: 10   Global Step: 50810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:52,940-Speed 8055.24 samples/sec   Loss 8.8833   LearningRate 0.0561   Epoch: 10   Global Step: 50820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:53,915-Speed 10511.06 samples/sec   Loss 8.8154   LearningRate 0.0561   Epoch: 10   Global Step: 50830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:54,837-Speed 11117.51 samples/sec   Loss 8.6316   LearningRate 0.0561   Epoch: 10   Global Step: 50840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:55,809-Speed 10570.97 samples/sec   Loss 8.7024   LearningRate 0.0561   Epoch: 10   Global Step: 50850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:56,801-Speed 10333.76 samples/sec   Loss 8.6498   LearningRate 0.0560   Epoch: 10   Global Step: 50860   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 01:08:57,770-Speed 10573.15 samples/sec   Loss 8.6981   LearningRate 0.0560   Epoch: 10   Global Step: 50870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:58,739-Speed 10578.30 samples/sec   Loss 8.7284   LearningRate 0.0560   Epoch: 10   Global Step: 50880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:08:59,701-Speed 10653.76 samples/sec   Loss 8.8233   LearningRate 0.0560   Epoch: 10   Global Step: 50890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:00,645-Speed 10858.66 samples/sec   Loss 8.9015   LearningRate 0.0560   Epoch: 10   Global Step: 50900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:01,584-Speed 10919.79 samples/sec   Loss 8.6095   LearningRate 0.0560   Epoch: 10   Global Step: 50910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:02,504-Speed 11131.61 samples/sec   Loss 8.7425   LearningRate 0.0560   Epoch: 10   Global Step: 50920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:03,481-Speed 10494.43 samples/sec   Loss 8.7168   LearningRate 0.0560   Epoch: 10   Global Step: 50930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:04,470-Speed 10371.51 samples/sec   Loss 8.8536   LearningRate 0.0560   Epoch: 10   Global Step: 50940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:05,418-Speed 10818.40 samples/sec   Loss 8.7053   LearningRate 0.0560   Epoch: 10   Global Step: 50950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:06,396-Speed 10476.78 samples/sec   Loss 8.8103   LearningRate 0.0560   Epoch: 10   Global Step: 50960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:07,383-Speed 10382.48 samples/sec   Loss 8.5546   LearningRate 0.0560   Epoch: 10   Global Step: 50970   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 01:09:08,370-Speed 10385.29 samples/sec   Loss 8.6113   LearningRate 0.0560   Epoch: 10   Global Step: 50980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:09,323-Speed 10763.61 samples/sec   Loss 8.9660   LearningRate 0.0559   Epoch: 10   Global Step: 50990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:10,293-Speed 10565.36 samples/sec   Loss 8.8369   LearningRate 0.0559   Epoch: 10   Global Step: 51000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:11,238-Speed 10845.63 samples/sec   Loss 9.0550   LearningRate 0.0559   Epoch: 10   Global Step: 51010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:12,214-Speed 10499.96 samples/sec   Loss 8.8968   LearningRate 0.0559   Epoch: 10   Global Step: 51020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:13,178-Speed 10632.23 samples/sec   Loss 8.9122   LearningRate 0.0559   Epoch: 10   Global Step: 51030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:14,155-Speed 10489.83 samples/sec   Loss 8.7763   LearningRate 0.0559   Epoch: 10   Global Step: 51040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:15,130-Speed 10517.49 samples/sec   Loss 8.7593   LearningRate 0.0559   Epoch: 10   Global Step: 51050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:16,114-Speed 10416.28 samples/sec   Loss 9.0658   LearningRate 0.0559   Epoch: 10   Global Step: 51060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:17,069-Speed 10735.42 samples/sec   Loss 8.7721   LearningRate 0.0559   Epoch: 10   Global Step: 51070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:18,013-Speed 10856.85 samples/sec   Loss 8.8708   LearningRate 0.0559   Epoch: 10   Global Step: 51080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:18,966-Speed 10757.28 samples/sec   Loss 8.8554   LearningRate 0.0559   Epoch: 10   Global Step: 51090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:19,978-Speed 10119.78 samples/sec   Loss 8.9020   LearningRate 0.0559   Epoch: 10   Global Step: 51100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:21,133-Speed 8878.43 samples/sec   Loss 8.7070   LearningRate 0.0559   Epoch: 10   Global Step: 51110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:22,115-Speed 10433.38 samples/sec   Loss 8.7672   LearningRate 0.0559   Epoch: 10   Global Step: 51120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:23,160-Speed 9808.54 samples/sec   Loss 8.9438   LearningRate 0.0558   Epoch: 10   Global Step: 51130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:24,176-Speed 10094.98 samples/sec   Loss 9.0600   LearningRate 0.0558   Epoch: 10   Global Step: 51140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:25,131-Speed 10734.94 samples/sec   Loss 8.7980   LearningRate 0.0558   Epoch: 10   Global Step: 51150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:26,084-Speed 10756.66 samples/sec   Loss 8.9491   LearningRate 0.0558   Epoch: 10   Global Step: 51160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:27,058-Speed 10526.99 samples/sec   Loss 8.9831   LearningRate 0.0558   Epoch: 10   Global Step: 51170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:28,034-Speed 10498.24 samples/sec   Loss 8.9706   LearningRate 0.0558   Epoch: 10   Global Step: 51180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:29,041-Speed 10176.02 samples/sec   Loss 9.1183   LearningRate 0.0558   Epoch: 10   Global Step: 51190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:30,012-Speed 10553.00 samples/sec   Loss 9.0860   LearningRate 0.0558   Epoch: 10   Global Step: 51200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:30,966-Speed 10743.87 samples/sec   Loss 8.9312   LearningRate 0.0558   Epoch: 10   Global Step: 51210   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 01:09:31,913-Speed 10827.37 samples/sec   Loss 9.0653   LearningRate 0.0558   Epoch: 10   Global Step: 51220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:32,873-Speed 10669.03 samples/sec   Loss 8.9598   LearningRate 0.0558   Epoch: 10   Global Step: 51230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:33,808-Speed 10965.49 samples/sec   Loss 9.1903   LearningRate 0.0558   Epoch: 10   Global Step: 51240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:34,802-Speed 10316.22 samples/sec   Loss 9.1012   LearningRate 0.0558   Epoch: 10   Global Step: 51250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:35,736-Speed 10977.00 samples/sec   Loss 9.1200   LearningRate 0.0557   Epoch: 10   Global Step: 51260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:36,694-Speed 10694.47 samples/sec   Loss 9.0018   LearningRate 0.0557   Epoch: 10   Global Step: 51270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:37,658-Speed 10632.78 samples/sec   Loss 9.0060   LearningRate 0.0557   Epoch: 10   Global Step: 51280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:38,655-Speed 10283.68 samples/sec   Loss 8.8962   LearningRate 0.0557   Epoch: 10   Global Step: 51290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:39,580-Speed 11085.48 samples/sec   Loss 8.9418   LearningRate 0.0557   Epoch: 10   Global Step: 51300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:40,547-Speed 10593.77 samples/sec   Loss 8.9059   LearningRate 0.0557   Epoch: 10   Global Step: 51310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:41,489-Speed 10889.47 samples/sec   Loss 8.9083   LearningRate 0.0557   Epoch: 10   Global Step: 51320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:42,473-Speed 10420.81 samples/sec   Loss 9.0274   LearningRate 0.0557   Epoch: 10   Global Step: 51330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:43,459-Speed 10391.03 samples/sec   Loss 9.0429   LearningRate 0.0557   Epoch: 10   Global Step: 51340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:44,419-Speed 10686.41 samples/sec   Loss 9.1215   LearningRate 0.0557   Epoch: 10   Global Step: 51350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:45,378-Speed 10675.38 samples/sec   Loss 8.9770   LearningRate 0.0557   Epoch: 10   Global Step: 51360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:46,367-Speed 10381.75 samples/sec   Loss 9.0822   LearningRate 0.0557   Epoch: 10   Global Step: 51370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:47,310-Speed 10864.82 samples/sec   Loss 9.0300   LearningRate 0.0557   Epoch: 10   Global Step: 51380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:48,246-Speed 10960.05 samples/sec   Loss 9.1009   LearningRate 0.0557   Epoch: 10   Global Step: 51390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:49,221-Speed 10532.40 samples/sec   Loss 9.1257   LearningRate 0.0556   Epoch: 10   Global Step: 51400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:50,176-Speed 10726.81 samples/sec   Loss 8.8820   LearningRate 0.0556   Epoch: 10   Global Step: 51410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:51,115-Speed 10922.96 samples/sec   Loss 9.0498   LearningRate 0.0556   Epoch: 10   Global Step: 51420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:52,096-Speed 10454.65 samples/sec   Loss 9.1402   LearningRate 0.0556   Epoch: 10   Global Step: 51430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:53,056-Speed 10670.32 samples/sec   Loss 8.9711   LearningRate 0.0556   Epoch: 10   Global Step: 51440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:09:53,987-Speed 11013.04 samples/sec   Loss 9.0326   LearningRate 0.0556   Epoch: 10   Global Step: 51450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:54,973-Speed 10393.67 samples/sec   Loss 9.1124   LearningRate 0.0556   Epoch: 10   Global Step: 51460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:55,929-Speed 10719.90 samples/sec   Loss 9.1683   LearningRate 0.0556   Epoch: 10   Global Step: 51470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:56,896-Speed 10605.71 samples/sec   Loss 8.9592   LearningRate 0.0556   Epoch: 10   Global Step: 51480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:57,819-Speed 11106.21 samples/sec   Loss 9.1820   LearningRate 0.0556   Epoch: 10   Global Step: 51490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:58,762-Speed 10866.81 samples/sec   Loss 8.9960   LearningRate 0.0556   Epoch: 10   Global Step: 51500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:09:59,742-Speed 10454.31 samples/sec   Loss 9.1280   LearningRate 0.0556   Epoch: 10   Global Step: 51510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:00,745-Speed 10236.86 samples/sec   Loss 9.1936   LearningRate 0.0556   Epoch: 10   Global Step: 51520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:01,691-Speed 10834.15 samples/sec   Loss 9.0871   LearningRate 0.0555   Epoch: 10   Global Step: 51530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:02,659-Speed 10591.72 samples/sec   Loss 9.0873   LearningRate 0.0555   Epoch: 10   Global Step: 51540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:03,588-Speed 11031.09 samples/sec   Loss 9.0866   LearningRate 0.0555   Epoch: 10   Global Step: 51550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:04,517-Speed 11027.84 samples/sec   Loss 9.0007   LearningRate 0.0555   Epoch: 10   Global Step: 51560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:05,466-Speed 10797.92 samples/sec   Loss 8.8930   LearningRate 0.0555   Epoch: 10   Global Step: 51570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:06,416-Speed 10791.90 samples/sec   Loss 9.1105   LearningRate 0.0555   Epoch: 10   Global Step: 51580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:07,429-Speed 10125.58 samples/sec   Loss 9.1016   LearningRate 0.0555   Epoch: 10   Global Step: 51590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:08,393-Speed 10624.24 samples/sec   Loss 9.0561   LearningRate 0.0555   Epoch: 10   Global Step: 51600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:09,344-Speed 10773.75 samples/sec   Loss 8.9950   LearningRate 0.0555   Epoch: 10   Global Step: 51610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:10,300-Speed 10729.89 samples/sec   Loss 8.9622   LearningRate 0.0555   Epoch: 10   Global Step: 51620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:11,229-Speed 11027.32 samples/sec   Loss 9.0864   LearningRate 0.0555   Epoch: 10   Global Step: 51630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:12,236-Speed 10183.89 samples/sec   Loss 9.1474   LearningRate 0.0555   Epoch: 10   Global Step: 51640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:13,198-Speed 10656.38 samples/sec   Loss 9.0886   LearningRate 0.0555   Epoch: 10   Global Step: 51650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:14,156-Speed 10689.94 samples/sec   Loss 8.8367   LearningRate 0.0555   Epoch: 10   Global Step: 51660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:15,137-Speed 10455.23 samples/sec   Loss 8.9247   LearningRate 0.0554   Epoch: 10   Global Step: 51670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:16,114-Speed 10485.73 samples/sec   Loss 9.1848   LearningRate 0.0554   Epoch: 10   Global Step: 51680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:17,081-Speed 10751.26 samples/sec   Loss 9.1317   LearningRate 0.0554   Epoch: 10   Global Step: 51690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:18,021-Speed 10907.79 samples/sec   Loss 9.1712   LearningRate 0.0554   Epoch: 10   Global Step: 51700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:18,937-Speed 11198.37 samples/sec   Loss 9.1784   LearningRate 0.0554   Epoch: 10   Global Step: 51710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:19,889-Speed 10759.79 samples/sec   Loss 9.1020   LearningRate 0.0554   Epoch: 10   Global Step: 51720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:20,859-Speed 10566.02 samples/sec   Loss 9.1756   LearningRate 0.0554   Epoch: 10   Global Step: 51730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:21,835-Speed 10506.63 samples/sec   Loss 9.2022   LearningRate 0.0554   Epoch: 10   Global Step: 51740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:22,778-Speed 10864.78 samples/sec   Loss 9.2430   LearningRate 0.0554   Epoch: 10   Global Step: 51750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:23,711-Speed 10985.27 samples/sec   Loss 9.1651   LearningRate 0.0554   Epoch: 10   Global Step: 51760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:24,669-Speed 10711.76 samples/sec   Loss 9.2187   LearningRate 0.0554   Epoch: 10   Global Step: 51770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:25,623-Speed 10737.77 samples/sec   Loss 9.1634   LearningRate 0.0554   Epoch: 10   Global Step: 51780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:26,572-Speed 10805.21 samples/sec   Loss 9.2634   LearningRate 0.0554   Epoch: 10   Global Step: 51790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:27,560-Speed 10376.14 samples/sec   Loss 9.1441   LearningRate 0.0553   Epoch: 10   Global Step: 51800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:28,534-Speed 10523.63 samples/sec   Loss 9.1421   LearningRate 0.0553   Epoch: 10   Global Step: 51810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:29,479-Speed 10850.10 samples/sec   Loss 9.0995   LearningRate 0.0553   Epoch: 10   Global Step: 51820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:30,410-Speed 11019.63 samples/sec   Loss 9.1495   LearningRate 0.0553   Epoch: 10   Global Step: 51830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:31,373-Speed 10637.70 samples/sec   Loss 9.0899   LearningRate 0.0553   Epoch: 10   Global Step: 51840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:32,300-Speed 11051.03 samples/sec   Loss 9.0534   LearningRate 0.0553   Epoch: 10   Global Step: 51850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:10:33,268-Speed 10586.06 samples/sec   Loss 9.3162   LearningRate 0.0553   Epoch: 10   Global Step: 51860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:34,226-Speed 10708.08 samples/sec   Loss 9.1557   LearningRate 0.0553   Epoch: 10   Global Step: 51870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:35,221-Speed 10304.98 samples/sec   Loss 9.1831   LearningRate 0.0553   Epoch: 10   Global Step: 51880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:36,161-Speed 10894.59 samples/sec   Loss 9.0619   LearningRate 0.0553   Epoch: 10   Global Step: 51890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:37,107-Speed 10832.11 samples/sec   Loss 9.1416   LearningRate 0.0553   Epoch: 10   Global Step: 51900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:38,058-Speed 10776.78 samples/sec   Loss 9.0439   LearningRate 0.0553   Epoch: 10   Global Step: 51910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:39,010-Speed 10776.72 samples/sec   Loss 9.1542   LearningRate 0.0553   Epoch: 10   Global Step: 51920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:40,006-Speed 10281.04 samples/sec   Loss 9.0706   LearningRate 0.0553   Epoch: 10   Global Step: 51930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:40,962-Speed 10744.40 samples/sec   Loss 9.1306   LearningRate 0.0552   Epoch: 10   Global Step: 51940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:41,928-Speed 10604.20 samples/sec   Loss 9.2938   LearningRate 0.0552   Epoch: 10   Global Step: 51950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:42,848-Speed 11147.75 samples/sec   Loss 9.2587   LearningRate 0.0552   Epoch: 10   Global Step: 51960   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 01:10:43,803-Speed 10740.22 samples/sec   Loss 9.2719   LearningRate 0.0552   Epoch: 10   Global Step: 51970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:44,760-Speed 10744.44 samples/sec   Loss 9.1701   LearningRate 0.0552   Epoch: 10   Global Step: 51980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:45,748-Speed 10376.29 samples/sec   Loss 9.0887   LearningRate 0.0552   Epoch: 10   Global Step: 51990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:10:46,704-Speed 10714.18 samples/sec   Loss 9.0610   LearningRate 0.0552   Epoch: 10   Global Step: 52000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:11:08,923-[lfw][52000]XNorm: 13.168803
Training: 2022-04-11 01:11:08,924-[lfw][52000]Accuracy-Flip: 0.99367+-0.00440
Training: 2022-04-11 01:11:08,924-[lfw][52000]Accuracy-Highest: 0.99550
Training: 2022-04-11 01:11:34,454-[cfp_fp][52000]XNorm: 11.183880
Training: 2022-04-11 01:11:34,455-[cfp_fp][52000]Accuracy-Flip: 0.95214+-0.01300
Training: 2022-04-11 01:11:34,456-[cfp_fp][52000]Accuracy-Highest: 0.95386
Training: 2022-04-11 01:11:56,638-[agedb_30][52000]XNorm: 12.893819
Training: 2022-04-11 01:11:56,639-[agedb_30][52000]Accuracy-Flip: 0.95883+-0.00907
Training: 2022-04-11 01:11:56,639-[agedb_30][52000]Accuracy-Highest: 0.96250
Training: 2022-04-11 01:11:57,602-Speed 144.44 samples/sec   Loss 9.1322   LearningRate 0.0552   Epoch: 10   Global Step: 52010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:11:58,601-Speed 10258.77 samples/sec   Loss 9.2563   LearningRate 0.0552   Epoch: 10   Global Step: 52020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:11:59,563-Speed 10663.94 samples/sec   Loss 9.0843   LearningRate 0.0552   Epoch: 10   Global Step: 52030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:12:00,505-Speed 10877.15 samples/sec   Loss 9.2294   LearningRate 0.0552   Epoch: 10   Global Step: 52040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:12:01,463-Speed 10694.77 samples/sec   Loss 9.1208   LearningRate 0.0552   Epoch: 10   Global Step: 52050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:12:02,433-Speed 10568.08 samples/sec   Loss 9.2598   LearningRate 0.0552   Epoch: 10   Global Step: 52060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:12:03,384-Speed 10786.36 samples/sec   Loss 9.1630   LearningRate 0.0552   Epoch: 10   Global Step: 52070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:12:04,295-Speed 11254.18 samples/sec   Loss 9.1838   LearningRate 0.0551   Epoch: 10   Global Step: 52080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:12:05,257-Speed 10655.49 samples/sec   Loss 9.0633   LearningRate 0.0551   Epoch: 10   Global Step: 52090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:12:06,216-Speed 10688.42 samples/sec   Loss 9.1471   LearningRate 0.0551   Epoch: 10   Global Step: 52100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:07,177-Speed 10660.32 samples/sec   Loss 9.1917   LearningRate 0.0551   Epoch: 10   Global Step: 52110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:08,133-Speed 10726.20 samples/sec   Loss 9.0724   LearningRate 0.0551   Epoch: 10   Global Step: 52120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:09,103-Speed 10570.61 samples/sec   Loss 9.1156   LearningRate 0.0551   Epoch: 10   Global Step: 52130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:10,079-Speed 10502.57 samples/sec   Loss 9.1922   LearningRate 0.0551   Epoch: 10   Global Step: 52140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:10,993-Speed 11211.34 samples/sec   Loss 9.1605   LearningRate 0.0551   Epoch: 10   Global Step: 52150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:11,975-Speed 10437.28 samples/sec   Loss 9.1302   LearningRate 0.0551   Epoch: 10   Global Step: 52160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:12,933-Speed 10696.19 samples/sec   Loss 9.1768   LearningRate 0.0551   Epoch: 10   Global Step: 52170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:13,908-Speed 10523.12 samples/sec   Loss 9.1483   LearningRate 0.0551   Epoch: 10   Global Step: 52180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:14,868-Speed 10673.17 samples/sec   Loss 9.2053   LearningRate 0.0551   Epoch: 10   Global Step: 52190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:15,827-Speed 10692.19 samples/sec   Loss 9.2542   LearningRate 0.0551   Epoch: 10   Global Step: 52200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:12:16,800-Speed 10529.35 samples/sec   Loss 9.0838   LearningRate 0.0550   Epoch: 10   Global Step: 52210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:12:17,752-Speed 10776.45 samples/sec   Loss 9.1885   LearningRate 0.0550   Epoch: 10   Global Step: 52220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:18,712-Speed 10670.73 samples/sec   Loss 9.2063   LearningRate 0.0550   Epoch: 10   Global Step: 52230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:19,686-Speed 10527.55 samples/sec   Loss 9.1485   LearningRate 0.0550   Epoch: 10   Global Step: 52240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:20,676-Speed 10352.84 samples/sec   Loss 9.1495   LearningRate 0.0550   Epoch: 10   Global Step: 52250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:21,632-Speed 10722.97 samples/sec   Loss 9.0601   LearningRate 0.0550   Epoch: 10   Global Step: 52260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:22,561-Speed 11036.58 samples/sec   Loss 9.2570   LearningRate 0.0550   Epoch: 10   Global Step: 52270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:23,552-Speed 10333.52 samples/sec   Loss 9.1038   LearningRate 0.0550   Epoch: 10   Global Step: 52280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:24,531-Speed 10470.82 samples/sec   Loss 9.2358   LearningRate 0.0550   Epoch: 10   Global Step: 52290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:25,467-Speed 10949.96 samples/sec   Loss 9.1353   LearningRate 0.0550   Epoch: 10   Global Step: 52300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:26,412-Speed 10854.53 samples/sec   Loss 9.0718   LearningRate 0.0550   Epoch: 10   Global Step: 52310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:27,361-Speed 10788.91 samples/sec   Loss 9.2102   LearningRate 0.0550   Epoch: 10   Global Step: 52320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:12:28,379-Speed 10072.41 samples/sec   Loss 9.2575   LearningRate 0.0550   Epoch: 10   Global Step: 52330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:12:29,337-Speed 10713.44 samples/sec   Loss 9.1017   LearningRate 0.0550   Epoch: 10   Global Step: 52340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:12:30,290-Speed 10752.78 samples/sec   Loss 9.1954   LearningRate 0.0549   Epoch: 10   Global Step: 52350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:31,258-Speed 10588.89 samples/sec   Loss 9.2969   LearningRate 0.0549   Epoch: 10   Global Step: 52360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:32,222-Speed 10632.07 samples/sec   Loss 9.1998   LearningRate 0.0549   Epoch: 10   Global Step: 52370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:33,239-Speed 10080.34 samples/sec   Loss 9.1717   LearningRate 0.0549   Epoch: 10   Global Step: 52380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:34,210-Speed 10557.63 samples/sec   Loss 8.9816   LearningRate 0.0549   Epoch: 10   Global Step: 52390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:35,187-Speed 10483.93 samples/sec   Loss 9.1918   LearningRate 0.0549   Epoch: 10   Global Step: 52400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:36,157-Speed 10561.92 samples/sec   Loss 9.1540   LearningRate 0.0549   Epoch: 10   Global Step: 52410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:37,114-Speed 10713.84 samples/sec   Loss 9.2742   LearningRate 0.0549   Epoch: 10   Global Step: 52420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:38,072-Speed 10701.59 samples/sec   Loss 9.2199   LearningRate 0.0549   Epoch: 10   Global Step: 52430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:39,032-Speed 10675.82 samples/sec   Loss 9.1370   LearningRate 0.0549   Epoch: 10   Global Step: 52440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:39,976-Speed 10849.27 samples/sec   Loss 9.3271   LearningRate 0.0549   Epoch: 10   Global Step: 52450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:12:40,917-Speed 10896.81 samples/sec   Loss 9.1594   LearningRate 0.0549   Epoch: 10   Global Step: 52460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:12:41,847-Speed 11014.11 samples/sec   Loss 9.1332   LearningRate 0.0549   Epoch: 10   Global Step: 52470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:12:42,836-Speed 10367.01 samples/sec   Loss 9.2095   LearningRate 0.0549   Epoch: 10   Global Step: 52480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:12:43,828-Speed 10328.56 samples/sec   Loss 9.2193   LearningRate 0.0548   Epoch: 10   Global Step: 52490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:12:44,781-Speed 10760.91 samples/sec   Loss 9.1000   LearningRate 0.0548   Epoch: 10   Global Step: 52500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:12:45,728-Speed 10826.78 samples/sec   Loss 9.0886   LearningRate 0.0548   Epoch: 10   Global Step: 52510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:12:46,689-Speed 10659.09 samples/sec   Loss 9.3091   LearningRate 0.0548   Epoch: 10   Global Step: 52520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:12:47,658-Speed 10578.65 samples/sec   Loss 9.2937   LearningRate 0.0548   Epoch: 10   Global Step: 52530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:48,661-Speed 10214.28 samples/sec   Loss 9.2749   LearningRate 0.0548   Epoch: 10   Global Step: 52540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:49,660-Speed 10267.65 samples/sec   Loss 9.2818   LearningRate 0.0548   Epoch: 10   Global Step: 52550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:50,633-Speed 10536.42 samples/sec   Loss 9.1975   LearningRate 0.0548   Epoch: 10   Global Step: 52560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:51,579-Speed 10835.18 samples/sec   Loss 9.2363   LearningRate 0.0548   Epoch: 10   Global Step: 52570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:52,576-Speed 10275.66 samples/sec   Loss 9.1770   LearningRate 0.0548   Epoch: 10   Global Step: 52580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:53,579-Speed 10224.65 samples/sec   Loss 9.1652   LearningRate 0.0548   Epoch: 10   Global Step: 52590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:54,554-Speed 10511.98 samples/sec   Loss 9.1718   LearningRate 0.0548   Epoch: 10   Global Step: 52600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:55,507-Speed 10748.57 samples/sec   Loss 9.2456   LearningRate 0.0548   Epoch: 10   Global Step: 52610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:56,709-Speed 8528.12 samples/sec   Loss 9.1238   LearningRate 0.0547   Epoch: 10   Global Step: 52620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:57,727-Speed 10068.79 samples/sec   Loss 9.2144   LearningRate 0.0547   Epoch: 10   Global Step: 52630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:12:58,746-Speed 10060.50 samples/sec   Loss 9.2645   LearningRate 0.0547   Epoch: 10   Global Step: 52640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:12:59,752-Speed 10189.36 samples/sec   Loss 9.1720   LearningRate 0.0547   Epoch: 10   Global Step: 52650   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:13:00,697-Speed 10849.45 samples/sec   Loss 9.2134   LearningRate 0.0547   Epoch: 10   Global Step: 52660   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:13:01,646-Speed 10793.22 samples/sec   Loss 9.0463   LearningRate 0.0547   Epoch: 10   Global Step: 52670   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:13:02,602-Speed 10727.59 samples/sec   Loss 9.4043   LearningRate 0.0547   Epoch: 10   Global Step: 52680   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:13:03,569-Speed 10595.68 samples/sec   Loss 9.0481   LearningRate 0.0547   Epoch: 10   Global Step: 52690   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:13:04,575-Speed 10185.78 samples/sec   Loss 9.3099   LearningRate 0.0547   Epoch: 10   Global Step: 52700   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:13:05,514-Speed 10911.76 samples/sec   Loss 9.2343   LearningRate 0.0547   Epoch: 10   Global Step: 52710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:13:06,455-Speed 10893.64 samples/sec   Loss 9.0905   LearningRate 0.0547   Epoch: 10   Global Step: 52720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:13:07,436-Speed 10451.22 samples/sec   Loss 9.1871   LearningRate 0.0547   Epoch: 10   Global Step: 52730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:13:08,383-Speed 10828.87 samples/sec   Loss 9.1300   LearningRate 0.0547   Epoch: 10   Global Step: 52740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:13:09,363-Speed 10465.80 samples/sec   Loss 9.1228   LearningRate 0.0547   Epoch: 10   Global Step: 52750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:10,312-Speed 10794.82 samples/sec   Loss 9.3586   LearningRate 0.0546   Epoch: 10   Global Step: 52760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:11,285-Speed 10537.60 samples/sec   Loss 9.1331   LearningRate 0.0546   Epoch: 10   Global Step: 52770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:12,271-Speed 10399.75 samples/sec   Loss 9.0864   LearningRate 0.0546   Epoch: 10   Global Step: 52780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:13,232-Speed 10672.24 samples/sec   Loss 9.2726   LearningRate 0.0546   Epoch: 10   Global Step: 52790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:14,225-Speed 10322.44 samples/sec   Loss 9.2820   LearningRate 0.0546   Epoch: 10   Global Step: 52800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:15,193-Speed 10593.24 samples/sec   Loss 9.1140   LearningRate 0.0546   Epoch: 10   Global Step: 52810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:16,145-Speed 10762.87 samples/sec   Loss 9.2911   LearningRate 0.0546   Epoch: 10   Global Step: 52820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:17,106-Speed 10665.75 samples/sec   Loss 9.2091   LearningRate 0.0546   Epoch: 10   Global Step: 52830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:18,098-Speed 10328.36 samples/sec   Loss 9.3583   LearningRate 0.0546   Epoch: 10   Global Step: 52840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:19,100-Speed 10231.32 samples/sec   Loss 9.3595   LearningRate 0.0546   Epoch: 10   Global Step: 52850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:20,078-Speed 10486.12 samples/sec   Loss 9.2912   LearningRate 0.0546   Epoch: 10   Global Step: 52860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:21,029-Speed 10780.55 samples/sec   Loss 9.2945   LearningRate 0.0546   Epoch: 10   Global Step: 52870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:21,979-Speed 10777.66 samples/sec   Loss 9.2150   LearningRate 0.0546   Epoch: 10   Global Step: 52880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:22,926-Speed 10828.98 samples/sec   Loss 9.4589   LearningRate 0.0546   Epoch: 10   Global Step: 52890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:23,848-Speed 11112.55 samples/sec   Loss 9.3847   LearningRate 0.0545   Epoch: 10   Global Step: 52900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:24,861-Speed 10121.76 samples/sec   Loss 9.1159   LearningRate 0.0545   Epoch: 10   Global Step: 52910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:25,840-Speed 10471.22 samples/sec   Loss 9.2633   LearningRate 0.0545   Epoch: 10   Global Step: 52920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:26,803-Speed 10636.72 samples/sec   Loss 9.2012   LearningRate 0.0545   Epoch: 10   Global Step: 52930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:27,769-Speed 10606.59 samples/sec   Loss 9.0857   LearningRate 0.0545   Epoch: 10   Global Step: 52940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:28,779-Speed 10154.24 samples/sec   Loss 9.3710   LearningRate 0.0545   Epoch: 10   Global Step: 52950   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 01:13:29,756-Speed 10490.99 samples/sec   Loss 9.3280   LearningRate 0.0545   Epoch: 10   Global Step: 52960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:30,692-Speed 10953.44 samples/sec   Loss 9.1394   LearningRate 0.0545   Epoch: 10   Global Step: 52970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:31,673-Speed 10450.40 samples/sec   Loss 9.3311   LearningRate 0.0545   Epoch: 10   Global Step: 52980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:32,634-Speed 10669.24 samples/sec   Loss 9.3400   LearningRate 0.0545   Epoch: 10   Global Step: 52990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:33,569-Speed 10967.01 samples/sec   Loss 9.1199   LearningRate 0.0545   Epoch: 10   Global Step: 53000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:34,520-Speed 10774.75 samples/sec   Loss 9.2692   LearningRate 0.0545   Epoch: 10   Global Step: 53010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:35,508-Speed 10370.29 samples/sec   Loss 9.2785   LearningRate 0.0545   Epoch: 10   Global Step: 53020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:36,488-Speed 10462.43 samples/sec   Loss 9.3650   LearningRate 0.0544   Epoch: 10   Global Step: 53030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:37,441-Speed 10744.47 samples/sec   Loss 9.2254   LearningRate 0.0544   Epoch: 10   Global Step: 53040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:38,446-Speed 10200.74 samples/sec   Loss 9.1594   LearningRate 0.0544   Epoch: 10   Global Step: 53050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:39,361-Speed 11215.19 samples/sec   Loss 9.2762   LearningRate 0.0544   Epoch: 10   Global Step: 53060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:40,357-Speed 10283.93 samples/sec   Loss 9.1919   LearningRate 0.0544   Epoch: 10   Global Step: 53070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:41,330-Speed 10537.51 samples/sec   Loss 9.2613   LearningRate 0.0544   Epoch: 10   Global Step: 53080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:42,311-Speed 10451.01 samples/sec   Loss 9.2118   LearningRate 0.0544   Epoch: 10   Global Step: 53090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:43,298-Speed 10379.57 samples/sec   Loss 9.1355   LearningRate 0.0544   Epoch: 10   Global Step: 53100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:44,265-Speed 10603.15 samples/sec   Loss 9.2664   LearningRate 0.0544   Epoch: 10   Global Step: 53110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:45,228-Speed 10642.82 samples/sec   Loss 9.3630   LearningRate 0.0544   Epoch: 10   Global Step: 53120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:46,145-Speed 11171.62 samples/sec   Loss 9.2980   LearningRate 0.0544   Epoch: 10   Global Step: 53130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:47,132-Speed 10388.86 samples/sec   Loss 9.2093   LearningRate 0.0544   Epoch: 10   Global Step: 53140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:48,157-Speed 10001.74 samples/sec   Loss 9.3080   LearningRate 0.0544   Epoch: 10   Global Step: 53150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:49,134-Speed 10497.70 samples/sec   Loss 9.2008   LearningRate 0.0544   Epoch: 10   Global Step: 53160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:50,110-Speed 10500.12 samples/sec   Loss 9.4293   LearningRate 0.0543   Epoch: 10   Global Step: 53170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:51,112-Speed 10234.39 samples/sec   Loss 9.1020   LearningRate 0.0543   Epoch: 10   Global Step: 53180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:52,116-Speed 10208.37 samples/sec   Loss 9.2151   LearningRate 0.0543   Epoch: 10   Global Step: 53190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:53,136-Speed 10047.24 samples/sec   Loss 9.2999   LearningRate 0.0543   Epoch: 10   Global Step: 53200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:54,106-Speed 10568.23 samples/sec   Loss 9.2055   LearningRate 0.0543   Epoch: 10   Global Step: 53210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:55,067-Speed 10655.41 samples/sec   Loss 9.2422   LearningRate 0.0543   Epoch: 10   Global Step: 53220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:56,021-Speed 10743.53 samples/sec   Loss 9.2974   LearningRate 0.0543   Epoch: 10   Global Step: 53230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:57,021-Speed 10252.92 samples/sec   Loss 9.2325   LearningRate 0.0543   Epoch: 10   Global Step: 53240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:13:57,970-Speed 10805.34 samples/sec   Loss 9.2261   LearningRate 0.0543   Epoch: 10   Global Step: 53250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:58,984-Speed 10113.08 samples/sec   Loss 9.1928   LearningRate 0.0543   Epoch: 10   Global Step: 53260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:13:59,944-Speed 10675.90 samples/sec   Loss 9.2024   LearningRate 0.0543   Epoch: 10   Global Step: 53270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:00,910-Speed 10609.00 samples/sec   Loss 9.1298   LearningRate 0.0543   Epoch: 10   Global Step: 53280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:01,881-Speed 10557.40 samples/sec   Loss 9.1962   LearningRate 0.0543   Epoch: 10   Global Step: 53290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:02,858-Speed 10481.98 samples/sec   Loss 9.2179   LearningRate 0.0543   Epoch: 10   Global Step: 53300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:03,783-Speed 11087.90 samples/sec   Loss 9.3738   LearningRate 0.0542   Epoch: 10   Global Step: 53310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:04,763-Speed 10465.14 samples/sec   Loss 9.0357   LearningRate 0.0542   Epoch: 10   Global Step: 53320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:05,727-Speed 10635.42 samples/sec   Loss 9.2147   LearningRate 0.0542   Epoch: 10   Global Step: 53330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:06,671-Speed 10852.94 samples/sec   Loss 9.2077   LearningRate 0.0542   Epoch: 10   Global Step: 53340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:07,656-Speed 10401.81 samples/sec   Loss 9.3537   LearningRate 0.0542   Epoch: 10   Global Step: 53350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:08,642-Speed 10397.42 samples/sec   Loss 9.2605   LearningRate 0.0542   Epoch: 10   Global Step: 53360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:09,616-Speed 10527.85 samples/sec   Loss 9.2242   LearningRate 0.0542   Epoch: 10   Global Step: 53370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:10,569-Speed 10753.18 samples/sec   Loss 9.1674   LearningRate 0.0542   Epoch: 10   Global Step: 53380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:11,533-Speed 10637.31 samples/sec   Loss 9.1189   LearningRate 0.0542   Epoch: 10   Global Step: 53390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:12,521-Speed 10372.37 samples/sec   Loss 9.1915   LearningRate 0.0542   Epoch: 10   Global Step: 53400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:13,509-Speed 10372.06 samples/sec   Loss 9.0380   LearningRate 0.0542   Epoch: 10   Global Step: 53410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:14,429-Speed 11140.68 samples/sec   Loss 9.3536   LearningRate 0.0542   Epoch: 10   Global Step: 53420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:15,390-Speed 10669.25 samples/sec   Loss 9.4027   LearningRate 0.0542   Epoch: 10   Global Step: 53430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:16,377-Speed 10388.54 samples/sec   Loss 9.2885   LearningRate 0.0541   Epoch: 10   Global Step: 53440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:17,349-Speed 10543.45 samples/sec   Loss 9.4197   LearningRate 0.0541   Epoch: 10   Global Step: 53450   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 01:14:18,320-Speed 10556.71 samples/sec   Loss 9.1377   LearningRate 0.0541   Epoch: 10   Global Step: 53460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:19,395-Speed 9536.11 samples/sec   Loss 9.2045   LearningRate 0.0541   Epoch: 10   Global Step: 53470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:20,356-Speed 10666.48 samples/sec   Loss 9.3402   LearningRate 0.0541   Epoch: 10   Global Step: 53480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:21,326-Speed 10567.02 samples/sec   Loss 9.4485   LearningRate 0.0541   Epoch: 10   Global Step: 53490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:22,282-Speed 10715.02 samples/sec   Loss 9.2920   LearningRate 0.0541   Epoch: 10   Global Step: 53500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:23,248-Speed 10615.26 samples/sec   Loss 9.3515   LearningRate 0.0541   Epoch: 10   Global Step: 53510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:24,212-Speed 10633.50 samples/sec   Loss 9.2464   LearningRate 0.0541   Epoch: 10   Global Step: 53520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:25,187-Speed 10512.15 samples/sec   Loss 9.4513   LearningRate 0.0541   Epoch: 10   Global Step: 53530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:26,137-Speed 10785.77 samples/sec   Loss 9.1355   LearningRate 0.0541   Epoch: 10   Global Step: 53540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:27,075-Speed 10919.48 samples/sec   Loss 9.3711   LearningRate 0.0541   Epoch: 10   Global Step: 53550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:28,068-Speed 10325.93 samples/sec   Loss 9.1463   LearningRate 0.0541   Epoch: 10   Global Step: 53560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:29,066-Speed 10271.46 samples/sec   Loss 9.3704   LearningRate 0.0541   Epoch: 10   Global Step: 53570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:30,024-Speed 10701.31 samples/sec   Loss 9.2161   LearningRate 0.0540   Epoch: 10   Global Step: 53580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:30,989-Speed 10616.54 samples/sec   Loss 9.2797   LearningRate 0.0540   Epoch: 10   Global Step: 53590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:31,958-Speed 10572.18 samples/sec   Loss 9.2162   LearningRate 0.0540   Epoch: 10   Global Step: 53600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:32,933-Speed 10519.22 samples/sec   Loss 9.1583   LearningRate 0.0540   Epoch: 10   Global Step: 53610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:33,885-Speed 10771.04 samples/sec   Loss 9.2744   LearningRate 0.0540   Epoch: 10   Global Step: 53620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:34,813-Speed 11041.19 samples/sec   Loss 9.2660   LearningRate 0.0540   Epoch: 10   Global Step: 53630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:35,803-Speed 10354.89 samples/sec   Loss 9.2958   LearningRate 0.0540   Epoch: 10   Global Step: 53640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:36,774-Speed 10552.44 samples/sec   Loss 9.4286   LearningRate 0.0540   Epoch: 10   Global Step: 53650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:37,748-Speed 10524.98 samples/sec   Loss 9.2644   LearningRate 0.0540   Epoch: 10   Global Step: 53660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:38,717-Speed 10585.99 samples/sec   Loss 9.2308   LearningRate 0.0540   Epoch: 10   Global Step: 53670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:39,670-Speed 10757.29 samples/sec   Loss 9.4121   LearningRate 0.0540   Epoch: 10   Global Step: 53680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:40,619-Speed 10799.00 samples/sec   Loss 9.1274   LearningRate 0.0540   Epoch: 10   Global Step: 53690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:41,555-Speed 10945.07 samples/sec   Loss 9.1749   LearningRate 0.0540   Epoch: 10   Global Step: 53700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:42,535-Speed 10459.50 samples/sec   Loss 9.2099   LearningRate 0.0540   Epoch: 10   Global Step: 53710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:43,473-Speed 10922.63 samples/sec   Loss 9.1744   LearningRate 0.0539   Epoch: 10   Global Step: 53720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:44,433-Speed 10686.34 samples/sec   Loss 9.1956   LearningRate 0.0539   Epoch: 10   Global Step: 53730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:45,397-Speed 10632.82 samples/sec   Loss 9.1655   LearningRate 0.0539   Epoch: 10   Global Step: 53740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:46,402-Speed 10207.40 samples/sec   Loss 9.2130   LearningRate 0.0539   Epoch: 10   Global Step: 53750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:47,396-Speed 10308.14 samples/sec   Loss 9.1634   LearningRate 0.0539   Epoch: 10   Global Step: 53760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:48,419-Speed 10014.94 samples/sec   Loss 9.2407   LearningRate 0.0539   Epoch: 10   Global Step: 53770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:49,399-Speed 10494.12 samples/sec   Loss 9.3239   LearningRate 0.0539   Epoch: 10   Global Step: 53780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:50,364-Speed 10624.37 samples/sec   Loss 9.2073   LearningRate 0.0539   Epoch: 10   Global Step: 53790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:51,323-Speed 10683.88 samples/sec   Loss 9.2637   LearningRate 0.0539   Epoch: 10   Global Step: 53800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:52,290-Speed 10593.43 samples/sec   Loss 9.1333   LearningRate 0.0539   Epoch: 10   Global Step: 53810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:53,266-Speed 10511.76 samples/sec   Loss 9.1584   LearningRate 0.0539   Epoch: 10   Global Step: 53820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:54,233-Speed 10594.69 samples/sec   Loss 9.4093   LearningRate 0.0539   Epoch: 10   Global Step: 53830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:14:55,211-Speed 10483.73 samples/sec   Loss 9.3102   LearningRate 0.0539   Epoch: 10   Global Step: 53840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:56,153-Speed 10869.99 samples/sec   Loss 9.2541   LearningRate 0.0539   Epoch: 10   Global Step: 53850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:57,115-Speed 10655.10 samples/sec   Loss 9.2154   LearningRate 0.0538   Epoch: 10   Global Step: 53860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:58,071-Speed 10726.02 samples/sec   Loss 9.3492   LearningRate 0.0538   Epoch: 10   Global Step: 53870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:59,045-Speed 10530.11 samples/sec   Loss 9.2726   LearningRate 0.0538   Epoch: 10   Global Step: 53880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:14:59,992-Speed 10823.42 samples/sec   Loss 9.2802   LearningRate 0.0538   Epoch: 10   Global Step: 53890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:15:00,949-Speed 10711.59 samples/sec   Loss 9.4473   LearningRate 0.0538   Epoch: 10   Global Step: 53900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:15:01,981-Speed 9940.57 samples/sec   Loss 9.1474   LearningRate 0.0538   Epoch: 10   Global Step: 53910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:15:02,933-Speed 10756.16 samples/sec   Loss 9.3862   LearningRate 0.0538   Epoch: 10   Global Step: 53920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:15:03,910-Speed 10497.31 samples/sec   Loss 9.2464   LearningRate 0.0538   Epoch: 10   Global Step: 53930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:15:04,879-Speed 10575.68 samples/sec   Loss 9.3095   LearningRate 0.0538   Epoch: 10   Global Step: 53940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:15:05,840-Speed 10662.32 samples/sec   Loss 9.1050   LearningRate 0.0538   Epoch: 10   Global Step: 53950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:15:06,784-Speed 10861.70 samples/sec   Loss 9.5209   LearningRate 0.0538   Epoch: 10   Global Step: 53960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:15:07,817-Speed 9917.88 samples/sec   Loss 9.2666   LearningRate 0.0538   Epoch: 10   Global Step: 53970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:15:08,778-Speed 10664.38 samples/sec   Loss 9.1465   LearningRate 0.0538   Epoch: 10   Global Step: 53980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:15:09,736-Speed 10709.93 samples/sec   Loss 9.4503   LearningRate 0.0538   Epoch: 10   Global Step: 53990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:15:10,694-Speed 10695.71 samples/sec   Loss 9.2869   LearningRate 0.0537   Epoch: 10   Global Step: 54000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:15:32,862-[lfw][54000]XNorm: 13.164772
Training: 2022-04-11 01:15:32,863-[lfw][54000]Accuracy-Flip: 0.99400+-0.00448
Training: 2022-04-11 01:15:32,864-[lfw][54000]Accuracy-Highest: 0.99550
Training: 2022-04-11 01:15:58,314-[cfp_fp][54000]XNorm: 11.064895
Training: 2022-04-11 01:15:58,315-[cfp_fp][54000]Accuracy-Flip: 0.95243+-0.01116
Training: 2022-04-11 01:15:58,316-[cfp_fp][54000]Accuracy-Highest: 0.95386
Training: 2022-04-11 01:16:20,430-[agedb_30][54000]XNorm: 12.765718
Training: 2022-04-11 01:16:20,431-[agedb_30][54000]Accuracy-Flip: 0.96133+-0.01040
Training: 2022-04-11 01:16:20,432-[agedb_30][54000]Accuracy-Highest: 0.96250
Training: 2022-04-11 01:16:21,372-Speed 144.88 samples/sec   Loss 9.4021   LearningRate 0.0537   Epoch: 10   Global Step: 54010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:16:22,337-Speed 10614.76 samples/sec   Loss 9.3064   LearningRate 0.0537   Epoch: 10   Global Step: 54020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:16:23,346-Speed 10161.56 samples/sec   Loss 9.1132   LearningRate 0.0537   Epoch: 10   Global Step: 54030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 01:16:24,316-Speed 10561.65 samples/sec   Loss 9.1738   LearningRate 0.0537   Epoch: 10   Global Step: 54040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:16:25,280-Speed 10637.67 samples/sec   Loss 9.1259   LearningRate 0.0537   Epoch: 10   Global Step: 54050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:16:26,229-Speed 10802.23 samples/sec   Loss 9.2121   LearningRate 0.0537   Epoch: 10   Global Step: 54060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:16:27,192-Speed 10643.42 samples/sec   Loss 9.2836   LearningRate 0.0537   Epoch: 10   Global Step: 54070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:16:28,147-Speed 10734.61 samples/sec   Loss 9.1037   LearningRate 0.0537   Epoch: 10   Global Step: 54080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:16:29,113-Speed 10616.78 samples/sec   Loss 9.1753   LearningRate 0.0537   Epoch: 10   Global Step: 54090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:16:30,058-Speed 10840.51 samples/sec   Loss 9.2923   LearningRate 0.0537   Epoch: 10   Global Step: 54100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:16:31,010-Speed 10772.01 samples/sec   Loss 9.1591   LearningRate 0.0537   Epoch: 10   Global Step: 54110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:16:31,958-Speed 10804.28 samples/sec   Loss 9.2494   LearningRate 0.0537   Epoch: 10   Global Step: 54120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:16:32,918-Speed 10683.79 samples/sec   Loss 9.1976   LearningRate 0.0536   Epoch: 10   Global Step: 54130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:16:33,875-Speed 10700.44 samples/sec   Loss 9.1454   LearningRate 0.0536   Epoch: 10   Global Step: 54140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:34,843-Speed 10598.39 samples/sec   Loss 9.2038   LearningRate 0.0536   Epoch: 10   Global Step: 54150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:35,793-Speed 10784.03 samples/sec   Loss 9.1050   LearningRate 0.0536   Epoch: 10   Global Step: 54160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:36,736-Speed 10862.49 samples/sec   Loss 9.1791   LearningRate 0.0536   Epoch: 10   Global Step: 54170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:37,725-Speed 10365.35 samples/sec   Loss 9.2795   LearningRate 0.0536   Epoch: 10   Global Step: 54180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:38,673-Speed 10819.49 samples/sec   Loss 9.2114   LearningRate 0.0536   Epoch: 10   Global Step: 54190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:39,640-Speed 10602.51 samples/sec   Loss 9.3198   LearningRate 0.0536   Epoch: 10   Global Step: 54200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:40,586-Speed 10833.53 samples/sec   Loss 9.2282   LearningRate 0.0536   Epoch: 10   Global Step: 54210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:41,561-Speed 10508.89 samples/sec   Loss 9.3572   LearningRate 0.0536   Epoch: 10   Global Step: 54220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:42,510-Speed 10797.91 samples/sec   Loss 9.4186   LearningRate 0.0536   Epoch: 10   Global Step: 54230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:43,477-Speed 10602.41 samples/sec   Loss 9.1827   LearningRate 0.0536   Epoch: 10   Global Step: 54240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:44,418-Speed 10896.13 samples/sec   Loss 9.1994   LearningRate 0.0536   Epoch: 10   Global Step: 54250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:45,383-Speed 10613.84 samples/sec   Loss 9.2957   LearningRate 0.0536   Epoch: 10   Global Step: 54260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:46,364-Speed 10449.44 samples/sec   Loss 9.2266   LearningRate 0.0535   Epoch: 10   Global Step: 54270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:47,334-Speed 10565.50 samples/sec   Loss 9.2822   LearningRate 0.0535   Epoch: 10   Global Step: 54280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:48,294-Speed 10684.02 samples/sec   Loss 9.2200   LearningRate 0.0535   Epoch: 10   Global Step: 54290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:49,233-Speed 10912.94 samples/sec   Loss 9.2269   LearningRate 0.0535   Epoch: 10   Global Step: 54300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:50,202-Speed 10577.19 samples/sec   Loss 9.1202   LearningRate 0.0535   Epoch: 10   Global Step: 54310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:51,157-Speed 10735.03 samples/sec   Loss 9.2290   LearningRate 0.0535   Epoch: 10   Global Step: 54320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:52,130-Speed 10529.04 samples/sec   Loss 9.4548   LearningRate 0.0535   Epoch: 10   Global Step: 54330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:53,058-Speed 11048.54 samples/sec   Loss 9.2119   LearningRate 0.0535   Epoch: 10   Global Step: 54340   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 01:16:54,028-Speed 10569.47 samples/sec   Loss 9.1089   LearningRate 0.0535   Epoch: 10   Global Step: 54350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:55,035-Speed 10173.67 samples/sec   Loss 9.3971   LearningRate 0.0535   Epoch: 10   Global Step: 54360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:56,003-Speed 10593.27 samples/sec   Loss 9.3553   LearningRate 0.0535   Epoch: 10   Global Step: 54370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:57,005-Speed 10228.20 samples/sec   Loss 9.2791   LearningRate 0.0535   Epoch: 10   Global Step: 54380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:16:57,974-Speed 10577.95 samples/sec   Loss 9.2493   LearningRate 0.0535   Epoch: 10   Global Step: 54390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:16:59,008-Speed 9911.08 samples/sec   Loss 9.3922   LearningRate 0.0535   Epoch: 10   Global Step: 54400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:00,003-Speed 10302.15 samples/sec   Loss 9.2480   LearningRate 0.0534   Epoch: 10   Global Step: 54410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:01,082-Speed 9494.29 samples/sec   Loss 9.2702   LearningRate 0.0534   Epoch: 10   Global Step: 54420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:02,075-Speed 10329.72 samples/sec   Loss 9.0921   LearningRate 0.0534   Epoch: 10   Global Step: 54430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:03,055-Speed 10457.42 samples/sec   Loss 9.2499   LearningRate 0.0534   Epoch: 10   Global Step: 54440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:03,979-Speed 11089.81 samples/sec   Loss 9.2572   LearningRate 0.0534   Epoch: 10   Global Step: 54450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:04,937-Speed 10705.85 samples/sec   Loss 9.2152   LearningRate 0.0534   Epoch: 10   Global Step: 54460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:05,948-Speed 10135.36 samples/sec   Loss 9.2643   LearningRate 0.0534   Epoch: 10   Global Step: 54470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:06,963-Speed 10102.67 samples/sec   Loss 9.2940   LearningRate 0.0534   Epoch: 10   Global Step: 54480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:07,930-Speed 10605.10 samples/sec   Loss 9.2424   LearningRate 0.0534   Epoch: 10   Global Step: 54490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:17:08,897-Speed 10593.01 samples/sec   Loss 9.1619   LearningRate 0.0534   Epoch: 10   Global Step: 54500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:17:09,847-Speed 10787.20 samples/sec   Loss 9.3011   LearningRate 0.0534   Epoch: 10   Global Step: 54510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:17:10,817-Speed 10569.45 samples/sec   Loss 9.2725   LearningRate 0.0534   Epoch: 10   Global Step: 54520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:17:11,769-Speed 10769.23 samples/sec   Loss 9.0996   LearningRate 0.0534   Epoch: 10   Global Step: 54530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:17:12,712-Speed 10876.19 samples/sec   Loss 9.2194   LearningRate 0.0534   Epoch: 10   Global Step: 54540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:17:13,675-Speed 10635.14 samples/sec   Loss 9.2766   LearningRate 0.0533   Epoch: 10   Global Step: 54550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:14,681-Speed 10188.92 samples/sec   Loss 9.1816   LearningRate 0.0533   Epoch: 10   Global Step: 54560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:15,638-Speed 10716.22 samples/sec   Loss 9.1224   LearningRate 0.0533   Epoch: 10   Global Step: 54570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:16,571-Speed 10984.51 samples/sec   Loss 9.3319   LearningRate 0.0533   Epoch: 10   Global Step: 54580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:17,523-Speed 10768.69 samples/sec   Loss 9.2380   LearningRate 0.0533   Epoch: 10   Global Step: 54590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:18,494-Speed 10552.91 samples/sec   Loss 9.1872   LearningRate 0.0533   Epoch: 10   Global Step: 54600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:19,492-Speed 10273.98 samples/sec   Loss 9.3556   LearningRate 0.0533   Epoch: 10   Global Step: 54610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:20,481-Speed 10364.36 samples/sec   Loss 9.2137   LearningRate 0.0533   Epoch: 10   Global Step: 54620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:21,449-Speed 10600.46 samples/sec   Loss 9.2825   LearningRate 0.0533   Epoch: 10   Global Step: 54630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:22,428-Speed 10463.16 samples/sec   Loss 9.2587   LearningRate 0.0533   Epoch: 10   Global Step: 54640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:23,397-Speed 10578.17 samples/sec   Loss 9.0091   LearningRate 0.0533   Epoch: 10   Global Step: 54650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:17:24,383-Speed 10388.00 samples/sec   Loss 9.2945   LearningRate 0.0533   Epoch: 10   Global Step: 54660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:17:25,357-Speed 10532.47 samples/sec   Loss 9.0506   LearningRate 0.0533   Epoch: 10   Global Step: 54670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:17:26,294-Speed 10946.14 samples/sec   Loss 9.3422   LearningRate 0.0533   Epoch: 10   Global Step: 54680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:17:27,267-Speed 10528.63 samples/sec   Loss 9.2817   LearningRate 0.0532   Epoch: 10   Global Step: 54690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:17:28,211-Speed 10859.39 samples/sec   Loss 9.0385   LearningRate 0.0532   Epoch: 10   Global Step: 54700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:17:29,201-Speed 10358.15 samples/sec   Loss 9.1881   LearningRate 0.0532   Epoch: 10   Global Step: 54710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:17:30,131-Speed 11031.07 samples/sec   Loss 9.3063   LearningRate 0.0532   Epoch: 10   Global Step: 54720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:31,091-Speed 10676.60 samples/sec   Loss 9.1721   LearningRate 0.0532   Epoch: 10   Global Step: 54730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:32,053-Speed 10654.56 samples/sec   Loss 9.0969   LearningRate 0.0532   Epoch: 10   Global Step: 54740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:33,036-Speed 10428.07 samples/sec   Loss 9.4106   LearningRate 0.0532   Epoch: 10   Global Step: 54750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:33,995-Speed 10695.65 samples/sec   Loss 9.2172   LearningRate 0.0532   Epoch: 10   Global Step: 54760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:34,939-Speed 10844.88 samples/sec   Loss 9.2561   LearningRate 0.0532   Epoch: 10   Global Step: 54770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:35,914-Speed 10545.33 samples/sec   Loss 9.3144   LearningRate 0.0532   Epoch: 10   Global Step: 54780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:36,900-Speed 10413.86 samples/sec   Loss 9.1536   LearningRate 0.0532   Epoch: 10   Global Step: 54790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:37,880-Speed 10463.16 samples/sec   Loss 9.3931   LearningRate 0.0532   Epoch: 10   Global Step: 54800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:38,841-Speed 10661.10 samples/sec   Loss 9.2820   LearningRate 0.0532   Epoch: 10   Global Step: 54810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:39,802-Speed 10673.82 samples/sec   Loss 9.2981   LearningRate 0.0532   Epoch: 10   Global Step: 54820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:17:40,787-Speed 10399.32 samples/sec   Loss 9.1561   LearningRate 0.0531   Epoch: 10   Global Step: 54830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:17:41,759-Speed 10546.11 samples/sec   Loss 9.2511   LearningRate 0.0531   Epoch: 10   Global Step: 54840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:17:42,727-Speed 10590.62 samples/sec   Loss 9.2127   LearningRate 0.0531   Epoch: 10   Global Step: 54850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:17:43,698-Speed 10555.27 samples/sec   Loss 9.3434   LearningRate 0.0531   Epoch: 10   Global Step: 54860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:44,668-Speed 10587.39 samples/sec   Loss 9.2458   LearningRate 0.0531   Epoch: 10   Global Step: 54870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:45,634-Speed 10605.33 samples/sec   Loss 9.3397   LearningRate 0.0531   Epoch: 10   Global Step: 54880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:46,597-Speed 10649.36 samples/sec   Loss 9.3107   LearningRate 0.0531   Epoch: 10   Global Step: 54890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:47,547-Speed 10788.41 samples/sec   Loss 9.2940   LearningRate 0.0531   Epoch: 10   Global Step: 54900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:48,537-Speed 10346.76 samples/sec   Loss 9.1340   LearningRate 0.0531   Epoch: 10   Global Step: 54910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:49,488-Speed 10778.56 samples/sec   Loss 9.1143   LearningRate 0.0531   Epoch: 10   Global Step: 54920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:50,469-Speed 10456.57 samples/sec   Loss 9.2526   LearningRate 0.0531   Epoch: 10   Global Step: 54930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:51,454-Speed 10403.82 samples/sec   Loss 9.2364   LearningRate 0.0531   Epoch: 10   Global Step: 54940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:52,461-Speed 10175.71 samples/sec   Loss 9.2884   LearningRate 0.0531   Epoch: 10   Global Step: 54950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 01:17:53,443-Speed 10453.22 samples/sec   Loss 9.3706   LearningRate 0.0530   Epoch: 10   Global Step: 54960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:17:54,380-Speed 10928.57 samples/sec   Loss 9.3238   LearningRate 0.0530   Epoch: 10   Global Step: 54970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:17:55,354-Speed 10521.12 samples/sec   Loss 9.2942   LearningRate 0.0530   Epoch: 10   Global Step: 54980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 01:17:56,298-Speed 10862.56 samples/sec   Loss 9.2167   LearningRate 0.0530   Epoch: 10   Global Step: 54990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:17:57,275-Speed 10491.57 samples/sec   Loss 9.3225   LearningRate 0.0530   Epoch: 10   Global Step: 55000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:17:58,232-Speed 10715.61 samples/sec   Loss 9.3712   LearningRate 0.0530   Epoch: 10   Global Step: 55010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:17:59,185-Speed 10752.93 samples/sec   Loss 9.2863   LearningRate 0.0530   Epoch: 10   Global Step: 55020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:00,166-Speed 10458.05 samples/sec   Loss 9.2958   LearningRate 0.0530   Epoch: 10   Global Step: 55030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:01,141-Speed 10514.23 samples/sec   Loss 9.1777   LearningRate 0.0530   Epoch: 10   Global Step: 55040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:02,090-Speed 10802.13 samples/sec   Loss 9.2144   LearningRate 0.0530   Epoch: 10   Global Step: 55050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:03,051-Speed 10660.60 samples/sec   Loss 9.1813   LearningRate 0.0530   Epoch: 10   Global Step: 55060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:04,033-Speed 10440.85 samples/sec   Loss 9.1581   LearningRate 0.0530   Epoch: 10   Global Step: 55070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:05,007-Speed 10521.12 samples/sec   Loss 9.2287   LearningRate 0.0530   Epoch: 10   Global Step: 55080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:05,981-Speed 10524.55 samples/sec   Loss 9.2891   LearningRate 0.0530   Epoch: 10   Global Step: 55090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:06,911-Speed 11019.21 samples/sec   Loss 9.1613   LearningRate 0.0529   Epoch: 10   Global Step: 55100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:07,914-Speed 10223.14 samples/sec   Loss 9.2835   LearningRate 0.0529   Epoch: 10   Global Step: 55110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:08,883-Speed 10575.03 samples/sec   Loss 9.3458   LearningRate 0.0529   Epoch: 10   Global Step: 55120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:09,842-Speed 10685.58 samples/sec   Loss 9.0511   LearningRate 0.0529   Epoch: 10   Global Step: 55130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:10,792-Speed 10786.77 samples/sec   Loss 9.2151   LearningRate 0.0529   Epoch: 10   Global Step: 55140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:11,775-Speed 10431.89 samples/sec   Loss 9.3542   LearningRate 0.0529   Epoch: 10   Global Step: 55150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:12,747-Speed 10542.22 samples/sec   Loss 9.2162   LearningRate 0.0529   Epoch: 10   Global Step: 55160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:13,694-Speed 10835.74 samples/sec   Loss 9.2864   LearningRate 0.0529   Epoch: 10   Global Step: 55170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:14,664-Speed 10566.47 samples/sec   Loss 9.1190   LearningRate 0.0529   Epoch: 10   Global Step: 55180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:15,639-Speed 10509.21 samples/sec   Loss 9.3117   LearningRate 0.0529   Epoch: 10   Global Step: 55190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:16,610-Speed 10557.64 samples/sec   Loss 9.2043   LearningRate 0.0529   Epoch: 10   Global Step: 55200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:17,591-Speed 10458.32 samples/sec   Loss 9.4401   LearningRate 0.0529   Epoch: 10   Global Step: 55210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:18,543-Speed 10762.74 samples/sec   Loss 9.2026   LearningRate 0.0529   Epoch: 10   Global Step: 55220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:19,585-Speed 9837.02 samples/sec   Loss 9.1817   LearningRate 0.0529   Epoch: 10   Global Step: 55230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:20,580-Speed 10303.85 samples/sec   Loss 9.0070   LearningRate 0.0528   Epoch: 10   Global Step: 55240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:21,542-Speed 10659.93 samples/sec   Loss 9.2141   LearningRate 0.0528   Epoch: 10   Global Step: 55250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:22,465-Speed 11103.35 samples/sec   Loss 9.2374   LearningRate 0.0528   Epoch: 10   Global Step: 55260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:23,422-Speed 10712.28 samples/sec   Loss 9.2370   LearningRate 0.0528   Epoch: 10   Global Step: 55270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:24,451-Speed 9956.60 samples/sec   Loss 9.2873   LearningRate 0.0528   Epoch: 10   Global Step: 55280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:25,380-Speed 11040.63 samples/sec   Loss 9.1886   LearningRate 0.0528   Epoch: 10   Global Step: 55290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:26,326-Speed 10831.80 samples/sec   Loss 9.3633   LearningRate 0.0528   Epoch: 10   Global Step: 55300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:27,229-Speed 11353.74 samples/sec   Loss 9.2457   LearningRate 0.0528   Epoch: 10   Global Step: 55310   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 01:18:28,198-Speed 10570.16 samples/sec   Loss 9.1983   LearningRate 0.0528   Epoch: 10   Global Step: 55320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:29,223-Speed 10009.11 samples/sec   Loss 9.2729   LearningRate 0.0528   Epoch: 10   Global Step: 55330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:30,164-Speed 10890.34 samples/sec   Loss 9.0111   LearningRate 0.0528   Epoch: 10   Global Step: 55340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:31,147-Speed 10421.81 samples/sec   Loss 9.2127   LearningRate 0.0528   Epoch: 10   Global Step: 55350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:32,093-Speed 10843.74 samples/sec   Loss 9.1494   LearningRate 0.0528   Epoch: 10   Global Step: 55360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:33,033-Speed 10897.10 samples/sec   Loss 9.1021   LearningRate 0.0528   Epoch: 10   Global Step: 55370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:33,989-Speed 10724.45 samples/sec   Loss 9.2095   LearningRate 0.0527   Epoch: 10   Global Step: 55380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:34,968-Speed 10468.33 samples/sec   Loss 9.1183   LearningRate 0.0527   Epoch: 10   Global Step: 55390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:35,928-Speed 10689.06 samples/sec   Loss 9.3631   LearningRate 0.0527   Epoch: 10   Global Step: 55400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:36,887-Speed 10676.94 samples/sec   Loss 9.2989   LearningRate 0.0527   Epoch: 10   Global Step: 55410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:37,884-Speed 10462.64 samples/sec   Loss 9.1601   LearningRate 0.0527   Epoch: 10   Global Step: 55420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:38,837-Speed 10763.83 samples/sec   Loss 9.3907   LearningRate 0.0527   Epoch: 10   Global Step: 55430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:39,803-Speed 10609.81 samples/sec   Loss 9.2181   LearningRate 0.0527   Epoch: 10   Global Step: 55440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:40,742-Speed 10917.52 samples/sec   Loss 9.0595   LearningRate 0.0527   Epoch: 10   Global Step: 55450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:41,730-Speed 10365.56 samples/sec   Loss 9.4201   LearningRate 0.0527   Epoch: 10   Global Step: 55460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:42,749-Speed 10061.05 samples/sec   Loss 9.2341   LearningRate 0.0527   Epoch: 10   Global Step: 55470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:43,726-Speed 10493.53 samples/sec   Loss 9.0983   LearningRate 0.0527   Epoch: 10   Global Step: 55480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:44,662-Speed 10951.37 samples/sec   Loss 9.1680   LearningRate 0.0527   Epoch: 10   Global Step: 55490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:45,597-Speed 10961.02 samples/sec   Loss 9.2427   LearningRate 0.0527   Epoch: 10   Global Step: 55500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:46,550-Speed 10761.83 samples/sec   Loss 9.3491   LearningRate 0.0527   Epoch: 10   Global Step: 55510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:47,515-Speed 10628.42 samples/sec   Loss 9.3240   LearningRate 0.0526   Epoch: 10   Global Step: 55520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:48,467-Speed 10759.03 samples/sec   Loss 9.3239   LearningRate 0.0526   Epoch: 10   Global Step: 55530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:18:49,512-Speed 9812.52 samples/sec   Loss 9.2052   LearningRate 0.0526   Epoch: 10   Global Step: 55540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:50,467-Speed 10737.72 samples/sec   Loss 9.1875   LearningRate 0.0526   Epoch: 10   Global Step: 55550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:51,428-Speed 10659.53 samples/sec   Loss 9.2174   LearningRate 0.0526   Epoch: 10   Global Step: 55560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:52,391-Speed 10645.36 samples/sec   Loss 9.3073   LearningRate 0.0526   Epoch: 10   Global Step: 55570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:53,385-Speed 10313.06 samples/sec   Loss 9.2812   LearningRate 0.0526   Epoch: 10   Global Step: 55580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:54,363-Speed 10476.16 samples/sec   Loss 9.3334   LearningRate 0.0526   Epoch: 10   Global Step: 55590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:55,355-Speed 10332.42 samples/sec   Loss 9.2493   LearningRate 0.0526   Epoch: 10   Global Step: 55600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:56,365-Speed 10148.20 samples/sec   Loss 9.2001   LearningRate 0.0526   Epoch: 10   Global Step: 55610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:57,368-Speed 10220.51 samples/sec   Loss 9.0864   LearningRate 0.0526   Epoch: 10   Global Step: 55620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:18:58,551-Speed 8668.19 samples/sec   Loss 9.2553   LearningRate 0.0526   Epoch: 10   Global Step: 55630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:19:09,233-Speed 958.79 samples/sec   Loss 9.0218   LearningRate 0.0526   Epoch: 11   Global Step: 55640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:10,259-Speed 9997.43 samples/sec   Loss 8.4440   LearningRate 0.0526   Epoch: 11   Global Step: 55650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:11,550-Speed 7939.28 samples/sec   Loss 8.4048   LearningRate 0.0525   Epoch: 11   Global Step: 55660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:12,627-Speed 9509.06 samples/sec   Loss 8.3010   LearningRate 0.0525   Epoch: 11   Global Step: 55670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:13,710-Speed 9465.95 samples/sec   Loss 8.2347   LearningRate 0.0525   Epoch: 11   Global Step: 55680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:14,790-Speed 9500.51 samples/sec   Loss 8.3781   LearningRate 0.0525   Epoch: 11   Global Step: 55690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:19:15,861-Speed 9562.19 samples/sec   Loss 8.3101   LearningRate 0.0525   Epoch: 11   Global Step: 55700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:19:16,943-Speed 9470.87 samples/sec   Loss 8.2810   LearningRate 0.0525   Epoch: 11   Global Step: 55710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:19:17,985-Speed 9842.77 samples/sec   Loss 8.4395   LearningRate 0.0525   Epoch: 11   Global Step: 55720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:19:18,960-Speed 10539.81 samples/sec   Loss 8.2408   LearningRate 0.0525   Epoch: 11   Global Step: 55730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:19:19,884-Speed 11089.97 samples/sec   Loss 8.2907   LearningRate 0.0525   Epoch: 11   Global Step: 55740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:19:20,866-Speed 10448.03 samples/sec   Loss 8.3799   LearningRate 0.0525   Epoch: 11   Global Step: 55750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:19:21,851-Speed 10406.02 samples/sec   Loss 8.2744   LearningRate 0.0525   Epoch: 11   Global Step: 55760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:19:22,886-Speed 9905.44 samples/sec   Loss 8.2713   LearningRate 0.0525   Epoch: 11   Global Step: 55770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:19:23,889-Speed 10215.40 samples/sec   Loss 8.5389   LearningRate 0.0525   Epoch: 11   Global Step: 55780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:19:24,853-Speed 10629.40 samples/sec   Loss 8.4130   LearningRate 0.0525   Epoch: 11   Global Step: 55790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:25,850-Speed 10288.13 samples/sec   Loss 8.2949   LearningRate 0.0524   Epoch: 11   Global Step: 55800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:26,860-Speed 10139.04 samples/sec   Loss 8.3605   LearningRate 0.0524   Epoch: 11   Global Step: 55810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:28,123-Speed 8119.31 samples/sec   Loss 8.3034   LearningRate 0.0524   Epoch: 11   Global Step: 55820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:29,083-Speed 10683.60 samples/sec   Loss 8.5243   LearningRate 0.0524   Epoch: 11   Global Step: 55830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:30,181-Speed 9337.05 samples/sec   Loss 8.5432   LearningRate 0.0524   Epoch: 11   Global Step: 55840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:31,216-Speed 9892.47 samples/sec   Loss 8.3682   LearningRate 0.0524   Epoch: 11   Global Step: 55850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:32,228-Speed 10134.52 samples/sec   Loss 8.4266   LearningRate 0.0524   Epoch: 11   Global Step: 55860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:33,257-Speed 9972.37 samples/sec   Loss 8.4751   LearningRate 0.0524   Epoch: 11   Global Step: 55870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:34,222-Speed 10615.91 samples/sec   Loss 8.4155   LearningRate 0.0524   Epoch: 11   Global Step: 55880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:35,281-Speed 9677.75 samples/sec   Loss 8.4968   LearningRate 0.0524   Epoch: 11   Global Step: 55890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:36,337-Speed 9700.92 samples/sec   Loss 8.4854   LearningRate 0.0524   Epoch: 11   Global Step: 55900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:37,372-Speed 9907.63 samples/sec   Loss 8.5057   LearningRate 0.0524   Epoch: 11   Global Step: 55910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:38,332-Speed 10683.43 samples/sec   Loss 8.5911   LearningRate 0.0524   Epoch: 11   Global Step: 55920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:39,360-Speed 9962.51 samples/sec   Loss 8.5395   LearningRate 0.0524   Epoch: 11   Global Step: 55930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:40,367-Speed 10183.28 samples/sec   Loss 8.4981   LearningRate 0.0523   Epoch: 11   Global Step: 55940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:19:41,440-Speed 9542.61 samples/sec   Loss 8.5352   LearningRate 0.0523   Epoch: 11   Global Step: 55950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:19:42,415-Speed 10524.66 samples/sec   Loss 8.7054   LearningRate 0.0523   Epoch: 11   Global Step: 55960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:19:43,413-Speed 10265.40 samples/sec   Loss 8.5471   LearningRate 0.0523   Epoch: 11   Global Step: 55970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:19:44,545-Speed 9057.99 samples/sec   Loss 8.5142   LearningRate 0.0523   Epoch: 11   Global Step: 55980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:19:45,655-Speed 9229.00 samples/sec   Loss 8.5500   LearningRate 0.0523   Epoch: 11   Global Step: 55990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:19:46,613-Speed 10703.17 samples/sec   Loss 8.6462   LearningRate 0.0523   Epoch: 11   Global Step: 56000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:20:09,051-[lfw][56000]XNorm: 12.979810
Training: 2022-04-11 01:20:09,052-[lfw][56000]Accuracy-Flip: 0.99500+-0.00459
Training: 2022-04-11 01:20:09,052-[lfw][56000]Accuracy-Highest: 0.99550
Training: 2022-04-11 01:20:34,477-[cfp_fp][56000]XNorm: 10.955253
Training: 2022-04-11 01:20:34,478-[cfp_fp][56000]Accuracy-Flip: 0.95486+-0.01023
Training: 2022-04-11 01:20:34,479-[cfp_fp][56000]Accuracy-Highest: 0.95486
Training: 2022-04-11 01:20:56,519-[agedb_30][56000]XNorm: 12.579340
Training: 2022-04-11 01:20:56,520-[agedb_30][56000]Accuracy-Flip: 0.96283+-0.00960
Training: 2022-04-11 01:20:56,520-[agedb_30][56000]Accuracy-Highest: 0.96283
Training: 2022-04-11 01:20:57,474-Speed 144.51 samples/sec   Loss 8.6760   LearningRate 0.0523   Epoch: 11   Global Step: 56010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:20:58,428-Speed 10745.31 samples/sec   Loss 8.5729   LearningRate 0.0523   Epoch: 11   Global Step: 56020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:20:59,387-Speed 10688.10 samples/sec   Loss 8.5040   LearningRate 0.0523   Epoch: 11   Global Step: 56030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:00,350-Speed 10645.94 samples/sec   Loss 8.7013   LearningRate 0.0523   Epoch: 11   Global Step: 56040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:01,308-Speed 10702.38 samples/sec   Loss 8.6081   LearningRate 0.0523   Epoch: 11   Global Step: 56050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:02,290-Speed 10434.85 samples/sec   Loss 8.6619   LearningRate 0.0523   Epoch: 11   Global Step: 56060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:03,249-Speed 10691.10 samples/sec   Loss 8.6512   LearningRate 0.0523   Epoch: 11   Global Step: 56070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:04,306-Speed 9706.05 samples/sec   Loss 8.6528   LearningRate 0.0522   Epoch: 11   Global Step: 56080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:05,256-Speed 10790.19 samples/sec   Loss 8.5023   LearningRate 0.0522   Epoch: 11   Global Step: 56090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:06,201-Speed 10843.14 samples/sec   Loss 8.7328   LearningRate 0.0522   Epoch: 11   Global Step: 56100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:07,181-Speed 10450.33 samples/sec   Loss 8.6144   LearningRate 0.0522   Epoch: 11   Global Step: 56110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:08,189-Speed 10168.49 samples/sec   Loss 8.6922   LearningRate 0.0522   Epoch: 11   Global Step: 56120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:09,186-Speed 10279.35 samples/sec   Loss 8.6875   LearningRate 0.0522   Epoch: 11   Global Step: 56130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:10,159-Speed 10542.57 samples/sec   Loss 8.6783   LearningRate 0.0522   Epoch: 11   Global Step: 56140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:11,182-Speed 10020.42 samples/sec   Loss 8.4754   LearningRate 0.0522   Epoch: 11   Global Step: 56150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:12,166-Speed 10422.96 samples/sec   Loss 8.6926   LearningRate 0.0522   Epoch: 11   Global Step: 56160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:13,246-Speed 9487.88 samples/sec   Loss 8.6587   LearningRate 0.0522   Epoch: 11   Global Step: 56170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:14,221-Speed 10503.35 samples/sec   Loss 8.6431   LearningRate 0.0522   Epoch: 11   Global Step: 56180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:15,210-Speed 10379.07 samples/sec   Loss 8.6996   LearningRate 0.0522   Epoch: 11   Global Step: 56190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:16,199-Speed 10366.97 samples/sec   Loss 8.8308   LearningRate 0.0522   Epoch: 11   Global Step: 56200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:17,204-Speed 10192.35 samples/sec   Loss 8.8101   LearningRate 0.0522   Epoch: 11   Global Step: 56210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:18,256-Speed 9751.90 samples/sec   Loss 8.7325   LearningRate 0.0521   Epoch: 11   Global Step: 56220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:19,337-Speed 9478.61 samples/sec   Loss 8.7337   LearningRate 0.0521   Epoch: 11   Global Step: 56230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:20,421-Speed 9457.68 samples/sec   Loss 8.5360   LearningRate 0.0521   Epoch: 11   Global Step: 56240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:21,387-Speed 10612.74 samples/sec   Loss 8.6273   LearningRate 0.0521   Epoch: 11   Global Step: 56250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:22,512-Speed 9114.02 samples/sec   Loss 8.6806   LearningRate 0.0521   Epoch: 11   Global Step: 56260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:23,475-Speed 10640.94 samples/sec   Loss 8.7645   LearningRate 0.0521   Epoch: 11   Global Step: 56270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:24,508-Speed 9917.83 samples/sec   Loss 8.6625   LearningRate 0.0521   Epoch: 11   Global Step: 56280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:25,595-Speed 9426.72 samples/sec   Loss 8.5967   LearningRate 0.0521   Epoch: 11   Global Step: 56290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:26,601-Speed 10192.63 samples/sec   Loss 8.7490   LearningRate 0.0521   Epoch: 11   Global Step: 56300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:27,575-Speed 10520.95 samples/sec   Loss 8.7972   LearningRate 0.0521   Epoch: 11   Global Step: 56310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:28,515-Speed 10910.17 samples/sec   Loss 8.7888   LearningRate 0.0521   Epoch: 11   Global Step: 56320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:29,592-Speed 9514.18 samples/sec   Loss 8.6277   LearningRate 0.0521   Epoch: 11   Global Step: 56330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:30,679-Speed 9427.20 samples/sec   Loss 8.6385   LearningRate 0.0521   Epoch: 11   Global Step: 56340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:31,722-Speed 9827.09 samples/sec   Loss 8.6367   LearningRate 0.0521   Epoch: 11   Global Step: 56350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:32,781-Speed 9680.49 samples/sec   Loss 8.6618   LearningRate 0.0520   Epoch: 11   Global Step: 56360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:33,803-Speed 10030.63 samples/sec   Loss 8.8346   LearningRate 0.0520   Epoch: 11   Global Step: 56370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:34,767-Speed 10634.30 samples/sec   Loss 8.8679   LearningRate 0.0520   Epoch: 11   Global Step: 56380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:35,820-Speed 9731.27 samples/sec   Loss 8.6333   LearningRate 0.0520   Epoch: 11   Global Step: 56390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:36,811-Speed 10351.69 samples/sec   Loss 8.7523   LearningRate 0.0520   Epoch: 11   Global Step: 56400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:37,849-Speed 9870.38 samples/sec   Loss 8.6056   LearningRate 0.0520   Epoch: 11   Global Step: 56410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:38,907-Speed 9694.50 samples/sec   Loss 8.8259   LearningRate 0.0520   Epoch: 11   Global Step: 56420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:39,942-Speed 9899.29 samples/sec   Loss 8.6877   LearningRate 0.0520   Epoch: 11   Global Step: 56430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:40,969-Speed 9990.09 samples/sec   Loss 8.7383   LearningRate 0.0520   Epoch: 11   Global Step: 56440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:41,996-Speed 9977.01 samples/sec   Loss 8.8548   LearningRate 0.0520   Epoch: 11   Global Step: 56450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:43,064-Speed 9594.92 samples/sec   Loss 8.9904   LearningRate 0.0520   Epoch: 11   Global Step: 56460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:44,055-Speed 10338.22 samples/sec   Loss 8.8811   LearningRate 0.0520   Epoch: 11   Global Step: 56470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:45,012-Speed 10717.89 samples/sec   Loss 8.8358   LearningRate 0.0520   Epoch: 11   Global Step: 56480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:46,027-Speed 10091.87 samples/sec   Loss 8.5913   LearningRate 0.0520   Epoch: 11   Global Step: 56490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:47,015-Speed 10377.67 samples/sec   Loss 8.6772   LearningRate 0.0519   Epoch: 11   Global Step: 56500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:48,021-Speed 10188.00 samples/sec   Loss 8.8448   LearningRate 0.0519   Epoch: 11   Global Step: 56510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:49,065-Speed 9822.30 samples/sec   Loss 8.8891   LearningRate 0.0519   Epoch: 11   Global Step: 56520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:50,093-Speed 9961.91 samples/sec   Loss 8.9355   LearningRate 0.0519   Epoch: 11   Global Step: 56530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:51,109-Speed 10095.83 samples/sec   Loss 8.8562   LearningRate 0.0519   Epoch: 11   Global Step: 56540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:52,157-Speed 9794.33 samples/sec   Loss 8.7482   LearningRate 0.0519   Epoch: 11   Global Step: 56550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:53,139-Speed 10437.21 samples/sec   Loss 8.8857   LearningRate 0.0519   Epoch: 11   Global Step: 56560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:54,138-Speed 10263.90 samples/sec   Loss 8.9209   LearningRate 0.0519   Epoch: 11   Global Step: 56570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:21:55,131-Speed 10315.95 samples/sec   Loss 8.8693   LearningRate 0.0519   Epoch: 11   Global Step: 56580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:56,279-Speed 8948.72 samples/sec   Loss 9.0426   LearningRate 0.0519   Epoch: 11   Global Step: 56590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:57,285-Speed 10189.89 samples/sec   Loss 9.0046   LearningRate 0.0519   Epoch: 11   Global Step: 56600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:58,235-Speed 10777.47 samples/sec   Loss 8.8091   LearningRate 0.0519   Epoch: 11   Global Step: 56610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:21:59,249-Speed 10109.87 samples/sec   Loss 8.7872   LearningRate 0.0519   Epoch: 11   Global Step: 56620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:00,355-Speed 9265.25 samples/sec   Loss 8.7669   LearningRate 0.0519   Epoch: 11   Global Step: 56630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:01,381-Speed 10002.55 samples/sec   Loss 8.8638   LearningRate 0.0518   Epoch: 11   Global Step: 56640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:02,383-Speed 10231.02 samples/sec   Loss 9.0142   LearningRate 0.0518   Epoch: 11   Global Step: 56650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:03,446-Speed 9645.60 samples/sec   Loss 8.7400   LearningRate 0.0518   Epoch: 11   Global Step: 56660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:04,459-Speed 10119.98 samples/sec   Loss 8.9024   LearningRate 0.0518   Epoch: 11   Global Step: 56670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:05,380-Speed 11131.69 samples/sec   Loss 8.8555   LearningRate 0.0518   Epoch: 11   Global Step: 56680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:06,422-Speed 9841.67 samples/sec   Loss 8.7900   LearningRate 0.0518   Epoch: 11   Global Step: 56690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:07,460-Speed 9865.87 samples/sec   Loss 8.8109   LearningRate 0.0518   Epoch: 11   Global Step: 56700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:08,459-Speed 10268.73 samples/sec   Loss 8.8193   LearningRate 0.0518   Epoch: 11   Global Step: 56710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:09,482-Speed 10021.39 samples/sec   Loss 8.8734   LearningRate 0.0518   Epoch: 11   Global Step: 56720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:10,464-Speed 10435.55 samples/sec   Loss 8.9858   LearningRate 0.0518   Epoch: 11   Global Step: 56730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:11,518-Speed 9729.87 samples/sec   Loss 8.7633   LearningRate 0.0518   Epoch: 11   Global Step: 56740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:12,428-Speed 11266.60 samples/sec   Loss 8.8384   LearningRate 0.0518   Epoch: 11   Global Step: 56750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:13,380-Speed 10761.83 samples/sec   Loss 8.7912   LearningRate 0.0518   Epoch: 11   Global Step: 56760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:14,343-Speed 10646.56 samples/sec   Loss 8.8994   LearningRate 0.0518   Epoch: 11   Global Step: 56770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:15,327-Speed 10411.15 samples/sec   Loss 8.8395   LearningRate 0.0517   Epoch: 11   Global Step: 56780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:16,386-Speed 9686.10 samples/sec   Loss 8.8853   LearningRate 0.0517   Epoch: 11   Global Step: 56790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:17,421-Speed 9900.32 samples/sec   Loss 8.8804   LearningRate 0.0517   Epoch: 11   Global Step: 56800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:18,476-Speed 9713.06 samples/sec   Loss 9.0414   LearningRate 0.0517   Epoch: 11   Global Step: 56810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:19,468-Speed 10336.91 samples/sec   Loss 8.8833   LearningRate 0.0517   Epoch: 11   Global Step: 56820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:20,498-Speed 9950.95 samples/sec   Loss 8.9177   LearningRate 0.0517   Epoch: 11   Global Step: 56830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:21,513-Speed 10103.74 samples/sec   Loss 8.9759   LearningRate 0.0517   Epoch: 11   Global Step: 56840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:22,603-Speed 9403.40 samples/sec   Loss 8.8307   LearningRate 0.0517   Epoch: 11   Global Step: 56850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:23,642-Speed 9869.86 samples/sec   Loss 8.8423   LearningRate 0.0517   Epoch: 11   Global Step: 56860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:24,601-Speed 10679.66 samples/sec   Loss 8.9864   LearningRate 0.0517   Epoch: 11   Global Step: 56870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:25,638-Speed 9883.50 samples/sec   Loss 8.9166   LearningRate 0.0517   Epoch: 11   Global Step: 56880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:26,661-Speed 10024.77 samples/sec   Loss 8.8451   LearningRate 0.0517   Epoch: 11   Global Step: 56890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:27,743-Speed 9471.23 samples/sec   Loss 9.0367   LearningRate 0.0517   Epoch: 11   Global Step: 56900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:28,816-Speed 9551.76 samples/sec   Loss 8.8639   LearningRate 0.0517   Epoch: 11   Global Step: 56910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:29,802-Speed 10392.23 samples/sec   Loss 8.8953   LearningRate 0.0516   Epoch: 11   Global Step: 56920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:30,923-Speed 9141.76 samples/sec   Loss 8.8927   LearningRate 0.0516   Epoch: 11   Global Step: 56930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:31,996-Speed 9554.05 samples/sec   Loss 8.9608   LearningRate 0.0516   Epoch: 11   Global Step: 56940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:32,987-Speed 10336.26 samples/sec   Loss 8.8106   LearningRate 0.0516   Epoch: 11   Global Step: 56950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:33,985-Speed 10275.15 samples/sec   Loss 8.9383   LearningRate 0.0516   Epoch: 11   Global Step: 56960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:34,928-Speed 10874.37 samples/sec   Loss 8.7407   LearningRate 0.0516   Epoch: 11   Global Step: 56970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:35,937-Speed 10162.88 samples/sec   Loss 8.8493   LearningRate 0.0516   Epoch: 11   Global Step: 56980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:36,987-Speed 9760.68 samples/sec   Loss 8.7440   LearningRate 0.0516   Epoch: 11   Global Step: 56990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:37,991-Speed 10211.37 samples/sec   Loss 9.0662   LearningRate 0.0516   Epoch: 11   Global Step: 57000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:39,021-Speed 9955.08 samples/sec   Loss 8.9595   LearningRate 0.0516   Epoch: 11   Global Step: 57010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:40,005-Speed 10410.51 samples/sec   Loss 9.0097   LearningRate 0.0516   Epoch: 11   Global Step: 57020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:41,019-Speed 10130.67 samples/sec   Loss 8.9520   LearningRate 0.0516   Epoch: 11   Global Step: 57030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:42,109-Speed 9405.26 samples/sec   Loss 8.9899   LearningRate 0.0516   Epoch: 11   Global Step: 57040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:43,196-Speed 9427.64 samples/sec   Loss 8.8660   LearningRate 0.0516   Epoch: 11   Global Step: 57050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:44,231-Speed 9902.18 samples/sec   Loss 8.9160   LearningRate 0.0515   Epoch: 11   Global Step: 57060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:45,208-Speed 10486.50 samples/sec   Loss 8.8767   LearningRate 0.0515   Epoch: 11   Global Step: 57070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:46,198-Speed 10355.63 samples/sec   Loss 8.8133   LearningRate 0.0515   Epoch: 11   Global Step: 57080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:47,244-Speed 9794.83 samples/sec   Loss 8.9264   LearningRate 0.0515   Epoch: 11   Global Step: 57090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:48,302-Speed 9686.67 samples/sec   Loss 9.0131   LearningRate 0.0515   Epoch: 11   Global Step: 57100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:49,286-Speed 10418.05 samples/sec   Loss 8.8813   LearningRate 0.0515   Epoch: 11   Global Step: 57110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:50,278-Speed 10332.02 samples/sec   Loss 9.0933   LearningRate 0.0515   Epoch: 11   Global Step: 57120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:51,295-Speed 10078.03 samples/sec   Loss 8.9489   LearningRate 0.0515   Epoch: 11   Global Step: 57130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:52,342-Speed 9794.05 samples/sec   Loss 8.8879   LearningRate 0.0515   Epoch: 11   Global Step: 57140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:53,369-Speed 9976.89 samples/sec   Loss 8.9021   LearningRate 0.0515   Epoch: 11   Global Step: 57150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:54,501-Speed 9052.81 samples/sec   Loss 8.9756   LearningRate 0.0515   Epoch: 11   Global Step: 57160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:55,469-Speed 10588.17 samples/sec   Loss 9.0460   LearningRate 0.0515   Epoch: 11   Global Step: 57170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:22:56,497-Speed 9968.34 samples/sec   Loss 8.9359   LearningRate 0.0515   Epoch: 11   Global Step: 57180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:57,480-Speed 10431.47 samples/sec   Loss 8.8654   LearningRate 0.0515   Epoch: 11   Global Step: 57190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:58,450-Speed 10573.45 samples/sec   Loss 9.0060   LearningRate 0.0514   Epoch: 11   Global Step: 57200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:22:59,564-Speed 9194.21 samples/sec   Loss 8.9412   LearningRate 0.0514   Epoch: 11   Global Step: 57210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:00,574-Speed 10150.18 samples/sec   Loss 8.9498   LearningRate 0.0514   Epoch: 11   Global Step: 57220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:01,591-Speed 10077.57 samples/sec   Loss 8.9652   LearningRate 0.0514   Epoch: 11   Global Step: 57230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:02,623-Speed 9928.05 samples/sec   Loss 8.8205   LearningRate 0.0514   Epoch: 11   Global Step: 57240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:03,657-Speed 9910.92 samples/sec   Loss 9.0311   LearningRate 0.0514   Epoch: 11   Global Step: 57250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:04,999-Speed 7637.74 samples/sec   Loss 8.9259   LearningRate 0.0514   Epoch: 11   Global Step: 57260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:06,102-Speed 9287.29 samples/sec   Loss 8.8891   LearningRate 0.0514   Epoch: 11   Global Step: 57270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:07,087-Speed 10409.79 samples/sec   Loss 8.7801   LearningRate 0.0514   Epoch: 11   Global Step: 57280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:23:08,161-Speed 9537.16 samples/sec   Loss 8.8820   LearningRate 0.0514   Epoch: 11   Global Step: 57290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:23:09,116-Speed 10733.42 samples/sec   Loss 9.0127   LearningRate 0.0514   Epoch: 11   Global Step: 57300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:23:10,114-Speed 10263.16 samples/sec   Loss 8.8723   LearningRate 0.0514   Epoch: 11   Global Step: 57310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:11,149-Speed 9899.06 samples/sec   Loss 8.9148   LearningRate 0.0514   Epoch: 11   Global Step: 57320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:12,359-Speed 8471.07 samples/sec   Loss 8.9021   LearningRate 0.0514   Epoch: 11   Global Step: 57330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:13,454-Speed 9354.00 samples/sec   Loss 8.9168   LearningRate 0.0513   Epoch: 11   Global Step: 57340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:14,469-Speed 10098.77 samples/sec   Loss 9.2319   LearningRate 0.0513   Epoch: 11   Global Step: 57350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:15,538-Speed 9587.66 samples/sec   Loss 9.0614   LearningRate 0.0513   Epoch: 11   Global Step: 57360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:16,512-Speed 10523.55 samples/sec   Loss 8.9375   LearningRate 0.0513   Epoch: 11   Global Step: 57370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:17,602-Speed 9395.39 samples/sec   Loss 8.8877   LearningRate 0.0513   Epoch: 11   Global Step: 57380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:18,642-Speed 9856.99 samples/sec   Loss 8.9448   LearningRate 0.0513   Epoch: 11   Global Step: 57390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:19,738-Speed 9349.43 samples/sec   Loss 9.0433   LearningRate 0.0513   Epoch: 11   Global Step: 57400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:20,746-Speed 10158.47 samples/sec   Loss 8.8568   LearningRate 0.0513   Epoch: 11   Global Step: 57410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:23:21,719-Speed 10535.80 samples/sec   Loss 8.9114   LearningRate 0.0513   Epoch: 11   Global Step: 57420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:23:22,680-Speed 10658.79 samples/sec   Loss 8.9261   LearningRate 0.0513   Epoch: 11   Global Step: 57430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:23:23,741-Speed 9661.27 samples/sec   Loss 9.1051   LearningRate 0.0513   Epoch: 11   Global Step: 57440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:23:24,785-Speed 9814.84 samples/sec   Loss 8.9949   LearningRate 0.0513   Epoch: 11   Global Step: 57450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:25,804-Speed 10057.55 samples/sec   Loss 9.0105   LearningRate 0.0513   Epoch: 11   Global Step: 57460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:26,873-Speed 9591.34 samples/sec   Loss 8.9821   LearningRate 0.0513   Epoch: 11   Global Step: 57470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:27,919-Speed 9788.88 samples/sec   Loss 9.2362   LearningRate 0.0513   Epoch: 11   Global Step: 57480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:28,879-Speed 10672.22 samples/sec   Loss 8.9889   LearningRate 0.0512   Epoch: 11   Global Step: 57490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:29,832-Speed 10754.24 samples/sec   Loss 9.0283   LearningRate 0.0512   Epoch: 11   Global Step: 57500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:30,857-Speed 9997.34 samples/sec   Loss 8.9835   LearningRate 0.0512   Epoch: 11   Global Step: 57510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:31,875-Speed 10069.90 samples/sec   Loss 9.0567   LearningRate 0.0512   Epoch: 11   Global Step: 57520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:32,978-Speed 9288.71 samples/sec   Loss 9.0544   LearningRate 0.0512   Epoch: 11   Global Step: 57530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:33,954-Speed 10501.06 samples/sec   Loss 8.8222   LearningRate 0.0512   Epoch: 11   Global Step: 57540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:35,017-Speed 9634.58 samples/sec   Loss 8.9860   LearningRate 0.0512   Epoch: 11   Global Step: 57550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:23:36,123-Speed 9262.00 samples/sec   Loss 8.8395   LearningRate 0.0512   Epoch: 11   Global Step: 57560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:23:37,148-Speed 9997.26 samples/sec   Loss 8.9762   LearningRate 0.0512   Epoch: 11   Global Step: 57570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:23:38,223-Speed 9534.18 samples/sec   Loss 9.0044   LearningRate 0.0512   Epoch: 11   Global Step: 57580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:23:39,288-Speed 9619.31 samples/sec   Loss 9.0222   LearningRate 0.0512   Epoch: 11   Global Step: 57590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:23:40,288-Speed 10243.67 samples/sec   Loss 9.0352   LearningRate 0.0512   Epoch: 11   Global Step: 57600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:23:41,423-Speed 9030.81 samples/sec   Loss 8.8907   LearningRate 0.0512   Epoch: 11   Global Step: 57610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:42,395-Speed 10540.31 samples/sec   Loss 9.0038   LearningRate 0.0512   Epoch: 11   Global Step: 57620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:43,413-Speed 10063.43 samples/sec   Loss 9.0012   LearningRate 0.0511   Epoch: 11   Global Step: 57630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:44,461-Speed 9783.46 samples/sec   Loss 8.9424   LearningRate 0.0511   Epoch: 11   Global Step: 57640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:45,540-Speed 9492.17 samples/sec   Loss 9.1848   LearningRate 0.0511   Epoch: 11   Global Step: 57650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:23:46,594-Speed 9726.90 samples/sec   Loss 8.9301   LearningRate 0.0511   Epoch: 11   Global Step: 57660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:23:47,563-Speed 10571.62 samples/sec   Loss 9.0319   LearningRate 0.0511   Epoch: 11   Global Step: 57670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:23:48,579-Speed 10087.45 samples/sec   Loss 9.0843   LearningRate 0.0511   Epoch: 11   Global Step: 57680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:23:49,528-Speed 10800.05 samples/sec   Loss 8.8197   LearningRate 0.0511   Epoch: 11   Global Step: 57690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:23:50,631-Speed 9281.57 samples/sec   Loss 8.9722   LearningRate 0.0511   Epoch: 11   Global Step: 57700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:23:51,670-Speed 9869.11 samples/sec   Loss 9.1464   LearningRate 0.0511   Epoch: 11   Global Step: 57710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:23:52,702-Speed 9929.89 samples/sec   Loss 9.0667   LearningRate 0.0511   Epoch: 11   Global Step: 57720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:23:53,756-Speed 9715.19 samples/sec   Loss 9.0094   LearningRate 0.0511   Epoch: 11   Global Step: 57730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:23:54,804-Speed 9774.73 samples/sec   Loss 9.1193   LearningRate 0.0511   Epoch: 11   Global Step: 57740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:23:55,901-Speed 9347.60 samples/sec   Loss 8.9799   LearningRate 0.0511   Epoch: 11   Global Step: 57750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:56,981-Speed 9479.73 samples/sec   Loss 8.9713   LearningRate 0.0511   Epoch: 11   Global Step: 57760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:57,940-Speed 10684.55 samples/sec   Loss 9.0009   LearningRate 0.0510   Epoch: 11   Global Step: 57770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:58,960-Speed 10045.22 samples/sec   Loss 8.9554   LearningRate 0.0510   Epoch: 11   Global Step: 57780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:23:59,983-Speed 10018.35 samples/sec   Loss 9.0443   LearningRate 0.0510   Epoch: 11   Global Step: 57790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:24:00,981-Speed 10265.12 samples/sec   Loss 9.0252   LearningRate 0.0510   Epoch: 11   Global Step: 57800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:24:02,028-Speed 9784.83 samples/sec   Loss 8.9609   LearningRate 0.0510   Epoch: 11   Global Step: 57810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:24:03,063-Speed 9903.21 samples/sec   Loss 9.0255   LearningRate 0.0510   Epoch: 11   Global Step: 57820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:24:04,063-Speed 10246.57 samples/sec   Loss 8.9554   LearningRate 0.0510   Epoch: 11   Global Step: 57830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:24:05,188-Speed 9107.27 samples/sec   Loss 9.1551   LearningRate 0.0510   Epoch: 11   Global Step: 57840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:24:06,197-Speed 10152.73 samples/sec   Loss 9.0022   LearningRate 0.0510   Epoch: 11   Global Step: 57850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:24:07,261-Speed 9640.14 samples/sec   Loss 8.9633   LearningRate 0.0510   Epoch: 11   Global Step: 57860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:24:08,213-Speed 10759.83 samples/sec   Loss 9.0128   LearningRate 0.0510   Epoch: 11   Global Step: 57870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:24:09,263-Speed 9762.93 samples/sec   Loss 9.0295   LearningRate 0.0510   Epoch: 11   Global Step: 57880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:24:10,289-Speed 9984.53 samples/sec   Loss 9.0919   LearningRate 0.0510   Epoch: 11   Global Step: 57890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:24:11,318-Speed 9956.66 samples/sec   Loss 9.0657   LearningRate 0.0510   Epoch: 11   Global Step: 57900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:24:12,359-Speed 9847.36 samples/sec   Loss 8.9266   LearningRate 0.0509   Epoch: 11   Global Step: 57910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:24:13,359-Speed 10242.41 samples/sec   Loss 9.0265   LearningRate 0.0509   Epoch: 11   Global Step: 57920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:24:14,473-Speed 9194.94 samples/sec   Loss 9.1390   LearningRate 0.0509   Epoch: 11   Global Step: 57930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:24:15,545-Speed 9568.16 samples/sec   Loss 9.1058   LearningRate 0.0509   Epoch: 11   Global Step: 57940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:24:16,552-Speed 10169.35 samples/sec   Loss 9.0623   LearningRate 0.0509   Epoch: 11   Global Step: 57950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:24:17,545-Speed 10324.80 samples/sec   Loss 9.0403   LearningRate 0.0509   Epoch: 11   Global Step: 57960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:24:18,607-Speed 9643.60 samples/sec   Loss 9.0654   LearningRate 0.0509   Epoch: 11   Global Step: 57970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:24:19,683-Speed 9520.41 samples/sec   Loss 9.0868   LearningRate 0.0509   Epoch: 11   Global Step: 57980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:24:20,691-Speed 10172.30 samples/sec   Loss 9.0902   LearningRate 0.0509   Epoch: 11   Global Step: 57990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:24:21,666-Speed 10514.40 samples/sec   Loss 9.1388   LearningRate 0.0509   Epoch: 11   Global Step: 58000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:24:44,125-[lfw][58000]XNorm: 12.760361
Training: 2022-04-11 01:24:44,126-[lfw][58000]Accuracy-Flip: 0.99467+-0.00386
Training: 2022-04-11 01:24:44,127-[lfw][58000]Accuracy-Highest: 0.99550
Training: 2022-04-11 01:25:17,038-[cfp_fp][58000]XNorm: 10.815280
Training: 2022-04-11 01:25:17,039-[cfp_fp][58000]Accuracy-Flip: 0.95029+-0.01521
Training: 2022-04-11 01:25:17,040-[cfp_fp][58000]Accuracy-Highest: 0.95486
Training: 2022-04-11 01:25:45,638-[agedb_30][58000]XNorm: 12.451436
Training: 2022-04-11 01:25:45,639-[agedb_30][58000]Accuracy-Flip: 0.96050+-0.00898
Training: 2022-04-11 01:25:45,639-[agedb_30][58000]Accuracy-Highest: 0.96283
Training: 2022-04-11 01:25:46,602-Speed 120.56 samples/sec   Loss 9.0742   LearningRate 0.0509   Epoch: 11   Global Step: 58010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:25:47,563-Speed 10659.17 samples/sec   Loss 9.2692   LearningRate 0.0509   Epoch: 11   Global Step: 58020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:25:48,531-Speed 10592.12 samples/sec   Loss 9.1477   LearningRate 0.0509   Epoch: 11   Global Step: 58030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:25:49,465-Speed 10970.82 samples/sec   Loss 9.0064   LearningRate 0.0509   Epoch: 11   Global Step: 58040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:25:50,441-Speed 10507.25 samples/sec   Loss 8.9645   LearningRate 0.0508   Epoch: 11   Global Step: 58050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:25:51,406-Speed 10623.70 samples/sec   Loss 9.0789   LearningRate 0.0508   Epoch: 11   Global Step: 58060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:25:52,398-Speed 10340.41 samples/sec   Loss 9.1403   LearningRate 0.0508   Epoch: 11   Global Step: 58070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:25:53,348-Speed 10781.86 samples/sec   Loss 9.0930   LearningRate 0.0508   Epoch: 11   Global Step: 58080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:25:54,385-Speed 9890.85 samples/sec   Loss 9.2201   LearningRate 0.0508   Epoch: 11   Global Step: 58090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:25:55,334-Speed 10806.91 samples/sec   Loss 8.8633   LearningRate 0.0508   Epoch: 11   Global Step: 58100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:25:56,307-Speed 10536.65 samples/sec   Loss 9.0203   LearningRate 0.0508   Epoch: 11   Global Step: 58110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:25:57,283-Speed 10495.66 samples/sec   Loss 9.0770   LearningRate 0.0508   Epoch: 11   Global Step: 58120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:25:58,265-Speed 10453.17 samples/sec   Loss 9.2271   LearningRate 0.0508   Epoch: 11   Global Step: 58130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:25:59,218-Speed 10750.47 samples/sec   Loss 9.0589   LearningRate 0.0508   Epoch: 11   Global Step: 58140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:00,200-Speed 10439.13 samples/sec   Loss 9.0281   LearningRate 0.0508   Epoch: 11   Global Step: 58150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:01,181-Speed 10450.32 samples/sec   Loss 9.0147   LearningRate 0.0508   Epoch: 11   Global Step: 58160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:02,176-Speed 10304.10 samples/sec   Loss 8.9619   LearningRate 0.0508   Epoch: 11   Global Step: 58170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:03,205-Speed 9956.89 samples/sec   Loss 9.0173   LearningRate 0.0508   Epoch: 11   Global Step: 58180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:04,221-Speed 10093.51 samples/sec   Loss 9.0322   LearningRate 0.0507   Epoch: 11   Global Step: 58190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:05,220-Speed 10257.23 samples/sec   Loss 9.1017   LearningRate 0.0507   Epoch: 11   Global Step: 58200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:06,234-Speed 10101.78 samples/sec   Loss 9.1057   LearningRate 0.0507   Epoch: 11   Global Step: 58210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:07,233-Speed 10260.09 samples/sec   Loss 8.8255   LearningRate 0.0507   Epoch: 11   Global Step: 58220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:08,203-Speed 10567.24 samples/sec   Loss 8.9266   LearningRate 0.0507   Epoch: 11   Global Step: 58230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:09,201-Speed 10268.91 samples/sec   Loss 8.9626   LearningRate 0.0507   Epoch: 11   Global Step: 58240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:10,246-Speed 9804.77 samples/sec   Loss 8.9371   LearningRate 0.0507   Epoch: 11   Global Step: 58250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:11,234-Speed 10377.67 samples/sec   Loss 9.0046   LearningRate 0.0507   Epoch: 11   Global Step: 58260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:12,313-Speed 9503.17 samples/sec   Loss 9.0296   LearningRate 0.0507   Epoch: 11   Global Step: 58270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:13,364-Speed 9745.84 samples/sec   Loss 9.1069   LearningRate 0.0507   Epoch: 11   Global Step: 58280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:14,429-Speed 9626.13 samples/sec   Loss 9.0483   LearningRate 0.0507   Epoch: 11   Global Step: 58290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:15,457-Speed 9960.91 samples/sec   Loss 9.0733   LearningRate 0.0507   Epoch: 11   Global Step: 58300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:16,523-Speed 9617.77 samples/sec   Loss 8.9193   LearningRate 0.0507   Epoch: 11   Global Step: 58310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:17,574-Speed 9757.48 samples/sec   Loss 9.1220   LearningRate 0.0507   Epoch: 11   Global Step: 58320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:18,542-Speed 10587.62 samples/sec   Loss 9.1335   LearningRate 0.0507   Epoch: 11   Global Step: 58330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:19,557-Speed 10094.85 samples/sec   Loss 8.9848   LearningRate 0.0506   Epoch: 11   Global Step: 58340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:20,606-Speed 9776.29 samples/sec   Loss 9.0986   LearningRate 0.0506   Epoch: 11   Global Step: 58350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:21,613-Speed 10179.35 samples/sec   Loss 8.9708   LearningRate 0.0506   Epoch: 11   Global Step: 58360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:22,707-Speed 9366.07 samples/sec   Loss 9.0758   LearningRate 0.0506   Epoch: 11   Global Step: 58370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:23,783-Speed 9530.28 samples/sec   Loss 9.1532   LearningRate 0.0506   Epoch: 11   Global Step: 58380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:24,883-Speed 9317.28 samples/sec   Loss 8.9627   LearningRate 0.0506   Epoch: 11   Global Step: 58390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:25,894-Speed 10137.62 samples/sec   Loss 9.1208   LearningRate 0.0506   Epoch: 11   Global Step: 58400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:26,905-Speed 10139.71 samples/sec   Loss 9.0724   LearningRate 0.0506   Epoch: 11   Global Step: 58410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:27,906-Speed 10251.42 samples/sec   Loss 8.9220   LearningRate 0.0506   Epoch: 11   Global Step: 58420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:28,909-Speed 10221.13 samples/sec   Loss 8.9479   LearningRate 0.0506   Epoch: 11   Global Step: 58430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:29,905-Speed 10293.61 samples/sec   Loss 9.1114   LearningRate 0.0506   Epoch: 11   Global Step: 58440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:30,917-Speed 10127.11 samples/sec   Loss 8.9374   LearningRate 0.0506   Epoch: 11   Global Step: 58450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:31,915-Speed 10268.18 samples/sec   Loss 9.0481   LearningRate 0.0506   Epoch: 11   Global Step: 58460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:32,940-Speed 10001.74 samples/sec   Loss 9.0842   LearningRate 0.0506   Epoch: 11   Global Step: 58470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:33,952-Speed 10128.05 samples/sec   Loss 8.9591   LearningRate 0.0505   Epoch: 11   Global Step: 58480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:34,940-Speed 10376.35 samples/sec   Loss 8.9999   LearningRate 0.0505   Epoch: 11   Global Step: 58490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:36,006-Speed 9608.92 samples/sec   Loss 8.9787   LearningRate 0.0505   Epoch: 11   Global Step: 58500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:37,058-Speed 9741.73 samples/sec   Loss 9.1353   LearningRate 0.0505   Epoch: 11   Global Step: 58510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:38,184-Speed 9106.96 samples/sec   Loss 9.1902   LearningRate 0.0505   Epoch: 11   Global Step: 58520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:39,210-Speed 9999.28 samples/sec   Loss 9.0187   LearningRate 0.0505   Epoch: 11   Global Step: 58530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:40,293-Speed 9463.00 samples/sec   Loss 9.1164   LearningRate 0.0505   Epoch: 11   Global Step: 58540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:41,284-Speed 10337.83 samples/sec   Loss 9.0030   LearningRate 0.0505   Epoch: 11   Global Step: 58550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:42,304-Speed 10044.10 samples/sec   Loss 9.2462   LearningRate 0.0505   Epoch: 11   Global Step: 58560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:43,263-Speed 10688.40 samples/sec   Loss 9.1476   LearningRate 0.0505   Epoch: 11   Global Step: 58570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:44,244-Speed 10451.22 samples/sec   Loss 9.2371   LearningRate 0.0505   Epoch: 11   Global Step: 58580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:45,348-Speed 9280.40 samples/sec   Loss 9.0836   LearningRate 0.0505   Epoch: 11   Global Step: 58590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:46,449-Speed 9314.29 samples/sec   Loss 8.8823   LearningRate 0.0505   Epoch: 11   Global Step: 58600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:47,453-Speed 10206.93 samples/sec   Loss 9.1472   LearningRate 0.0505   Epoch: 11   Global Step: 58610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:48,428-Speed 10513.58 samples/sec   Loss 9.0466   LearningRate 0.0504   Epoch: 11   Global Step: 58620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:49,410-Speed 10436.02 samples/sec   Loss 9.1059   LearningRate 0.0504   Epoch: 11   Global Step: 58630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:50,424-Speed 10109.05 samples/sec   Loss 9.0824   LearningRate 0.0504   Epoch: 11   Global Step: 58640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:26:51,439-Speed 10101.77 samples/sec   Loss 9.2579   LearningRate 0.0504   Epoch: 11   Global Step: 58650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:52,445-Speed 10186.21 samples/sec   Loss 9.2452   LearningRate 0.0504   Epoch: 11   Global Step: 58660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:53,419-Speed 10524.30 samples/sec   Loss 9.0110   LearningRate 0.0504   Epoch: 11   Global Step: 58670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:54,473-Speed 9721.34 samples/sec   Loss 9.0433   LearningRate 0.0504   Epoch: 11   Global Step: 58680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:55,546-Speed 9553.49 samples/sec   Loss 9.1451   LearningRate 0.0504   Epoch: 11   Global Step: 58690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:56,561-Speed 10100.44 samples/sec   Loss 9.1422   LearningRate 0.0504   Epoch: 11   Global Step: 58700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:57,648-Speed 9429.41 samples/sec   Loss 9.0509   LearningRate 0.0504   Epoch: 11   Global Step: 58710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:58,670-Speed 10027.22 samples/sec   Loss 9.1925   LearningRate 0.0504   Epoch: 11   Global Step: 58720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:26:59,643-Speed 10544.32 samples/sec   Loss 8.9269   LearningRate 0.0504   Epoch: 11   Global Step: 58730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:00,660-Speed 10081.72 samples/sec   Loss 8.9503   LearningRate 0.0504   Epoch: 11   Global Step: 58740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:01,662-Speed 10226.44 samples/sec   Loss 8.8637   LearningRate 0.0504   Epoch: 11   Global Step: 58750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:02,717-Speed 9708.36 samples/sec   Loss 8.9829   LearningRate 0.0503   Epoch: 11   Global Step: 58760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:03,711-Speed 10309.28 samples/sec   Loss 9.0182   LearningRate 0.0503   Epoch: 11   Global Step: 58770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:04,782-Speed 9574.13 samples/sec   Loss 8.9628   LearningRate 0.0503   Epoch: 11   Global Step: 58780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:05,828-Speed 9799.20 samples/sec   Loss 9.0443   LearningRate 0.0503   Epoch: 11   Global Step: 58790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:06,928-Speed 9330.19 samples/sec   Loss 8.9799   LearningRate 0.0503   Epoch: 11   Global Step: 58800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:07,887-Speed 10683.51 samples/sec   Loss 8.9628   LearningRate 0.0503   Epoch: 11   Global Step: 58810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:08,912-Speed 10002.69 samples/sec   Loss 9.1260   LearningRate 0.0503   Epoch: 11   Global Step: 58820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:27:09,886-Speed 10544.01 samples/sec   Loss 8.9534   LearningRate 0.0503   Epoch: 11   Global Step: 58830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:27:10,861-Speed 10507.59 samples/sec   Loss 8.9645   LearningRate 0.0503   Epoch: 11   Global Step: 58840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:27:11,975-Speed 9201.01 samples/sec   Loss 8.9352   LearningRate 0.0503   Epoch: 11   Global Step: 58850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:27:12,920-Speed 10846.07 samples/sec   Loss 9.2113   LearningRate 0.0503   Epoch: 11   Global Step: 58860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:27:13,923-Speed 10221.13 samples/sec   Loss 8.9645   LearningRate 0.0503   Epoch: 11   Global Step: 58870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:27:14,927-Speed 10210.63 samples/sec   Loss 8.9646   LearningRate 0.0503   Epoch: 11   Global Step: 58880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:27:15,997-Speed 9571.10 samples/sec   Loss 9.0865   LearningRate 0.0503   Epoch: 11   Global Step: 58890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:27:17,064-Speed 9612.42 samples/sec   Loss 8.9146   LearningRate 0.0503   Epoch: 11   Global Step: 58900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:27:18,224-Speed 8838.73 samples/sec   Loss 9.0622   LearningRate 0.0502   Epoch: 11   Global Step: 58910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:27:19,233-Speed 10155.62 samples/sec   Loss 9.1001   LearningRate 0.0502   Epoch: 11   Global Step: 58920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:20,251-Speed 10071.90 samples/sec   Loss 8.9528   LearningRate 0.0502   Epoch: 11   Global Step: 58930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:21,224-Speed 10541.65 samples/sec   Loss 9.0994   LearningRate 0.0502   Epoch: 11   Global Step: 58940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:22,216-Speed 10321.24 samples/sec   Loss 8.9437   LearningRate 0.0502   Epoch: 11   Global Step: 58950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:23,221-Speed 10205.02 samples/sec   Loss 8.9820   LearningRate 0.0502   Epoch: 11   Global Step: 58960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:24,154-Speed 10985.62 samples/sec   Loss 8.9779   LearningRate 0.0502   Epoch: 11   Global Step: 58970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:25,150-Speed 10291.09 samples/sec   Loss 8.9766   LearningRate 0.0502   Epoch: 11   Global Step: 58980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:26,115-Speed 10617.40 samples/sec   Loss 8.9928   LearningRate 0.0502   Epoch: 11   Global Step: 58990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:27,078-Speed 10640.40 samples/sec   Loss 9.1697   LearningRate 0.0502   Epoch: 11   Global Step: 59000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:28,138-Speed 9682.01 samples/sec   Loss 9.0379   LearningRate 0.0502   Epoch: 11   Global Step: 59010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:29,141-Speed 10215.16 samples/sec   Loss 9.1407   LearningRate 0.0502   Epoch: 11   Global Step: 59020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:30,219-Speed 9506.39 samples/sec   Loss 9.0990   LearningRate 0.0502   Epoch: 11   Global Step: 59030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:31,264-Speed 9807.23 samples/sec   Loss 8.9986   LearningRate 0.0502   Epoch: 11   Global Step: 59040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:32,367-Speed 9293.62 samples/sec   Loss 9.1535   LearningRate 0.0501   Epoch: 11   Global Step: 59050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:33,382-Speed 10099.24 samples/sec   Loss 8.8906   LearningRate 0.0501   Epoch: 11   Global Step: 59060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:34,376-Speed 10303.45 samples/sec   Loss 9.2699   LearningRate 0.0501   Epoch: 11   Global Step: 59070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:35,455-Speed 9502.90 samples/sec   Loss 9.2164   LearningRate 0.0501   Epoch: 11   Global Step: 59080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:36,419-Speed 10629.93 samples/sec   Loss 8.8462   LearningRate 0.0501   Epoch: 11   Global Step: 59090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:37,485-Speed 9612.03 samples/sec   Loss 9.0774   LearningRate 0.0501   Epoch: 11   Global Step: 59100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:38,458-Speed 10538.17 samples/sec   Loss 9.0394   LearningRate 0.0501   Epoch: 11   Global Step: 59110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:39,430-Speed 10542.62 samples/sec   Loss 9.0805   LearningRate 0.0501   Epoch: 11   Global Step: 59120   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 01:27:40,498-Speed 9591.67 samples/sec   Loss 8.8402   LearningRate 0.0501   Epoch: 11   Global Step: 59130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:41,575-Speed 9521.22 samples/sec   Loss 9.0502   LearningRate 0.0501   Epoch: 11   Global Step: 59140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:42,655-Speed 9485.09 samples/sec   Loss 8.9938   LearningRate 0.0501   Epoch: 11   Global Step: 59150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:43,748-Speed 9382.39 samples/sec   Loss 8.9374   LearningRate 0.0501   Epoch: 11   Global Step: 59160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:44,749-Speed 10233.76 samples/sec   Loss 9.0566   LearningRate 0.0501   Epoch: 11   Global Step: 59170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:45,795-Speed 9802.51 samples/sec   Loss 9.0638   LearningRate 0.0501   Epoch: 11   Global Step: 59180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:46,785-Speed 10341.68 samples/sec   Loss 8.9849   LearningRate 0.0500   Epoch: 11   Global Step: 59190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:47,788-Speed 10219.81 samples/sec   Loss 9.1211   LearningRate 0.0500   Epoch: 11   Global Step: 59200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:48,838-Speed 9754.08 samples/sec   Loss 8.9373   LearningRate 0.0500   Epoch: 11   Global Step: 59210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:49,861-Speed 10018.83 samples/sec   Loss 9.0096   LearningRate 0.0500   Epoch: 11   Global Step: 59220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:50,834-Speed 10531.07 samples/sec   Loss 9.0605   LearningRate 0.0500   Epoch: 11   Global Step: 59230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:51,885-Speed 9750.80 samples/sec   Loss 9.0365   LearningRate 0.0500   Epoch: 11   Global Step: 59240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:27:52,899-Speed 10114.48 samples/sec   Loss 9.0799   LearningRate 0.0500   Epoch: 11   Global Step: 59250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:53,923-Speed 10003.39 samples/sec   Loss 9.1955   LearningRate 0.0500   Epoch: 11   Global Step: 59260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:54,894-Speed 10552.98 samples/sec   Loss 9.0567   LearningRate 0.0500   Epoch: 11   Global Step: 59270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:55,992-Speed 9335.27 samples/sec   Loss 9.1220   LearningRate 0.0500   Epoch: 11   Global Step: 59280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:56,974-Speed 10445.24 samples/sec   Loss 8.9155   LearningRate 0.0500   Epoch: 11   Global Step: 59290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:58,107-Speed 9047.23 samples/sec   Loss 9.0998   LearningRate 0.0500   Epoch: 11   Global Step: 59300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:27:59,051-Speed 10852.88 samples/sec   Loss 9.1368   LearningRate 0.0500   Epoch: 11   Global Step: 59310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:00,095-Speed 9820.90 samples/sec   Loss 9.1908   LearningRate 0.0500   Epoch: 11   Global Step: 59320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:01,084-Speed 10352.13 samples/sec   Loss 9.2260   LearningRate 0.0499   Epoch: 11   Global Step: 59330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:02,215-Speed 9063.77 samples/sec   Loss 8.9765   LearningRate 0.0499   Epoch: 11   Global Step: 59340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:03,299-Speed 9453.95 samples/sec   Loss 9.0111   LearningRate 0.0499   Epoch: 11   Global Step: 59350   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 01:28:04,253-Speed 10742.88 samples/sec   Loss 9.1364   LearningRate 0.0499   Epoch: 11   Global Step: 59360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:28:05,297-Speed 9815.91 samples/sec   Loss 9.2166   LearningRate 0.0499   Epoch: 11   Global Step: 59370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:28:06,292-Speed 10291.60 samples/sec   Loss 9.0817   LearningRate 0.0499   Epoch: 11   Global Step: 59380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:28:07,346-Speed 9722.01 samples/sec   Loss 9.0842   LearningRate 0.0499   Epoch: 11   Global Step: 59390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:28:08,379-Speed 9918.25 samples/sec   Loss 8.9272   LearningRate 0.0499   Epoch: 11   Global Step: 59400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:28:09,378-Speed 10259.09 samples/sec   Loss 8.8773   LearningRate 0.0499   Epoch: 11   Global Step: 59410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:28:10,364-Speed 10384.73 samples/sec   Loss 8.9721   LearningRate 0.0499   Epoch: 11   Global Step: 59420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:28:11,362-Speed 10269.61 samples/sec   Loss 9.0774   LearningRate 0.0499   Epoch: 11   Global Step: 59430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:28:12,376-Speed 10111.20 samples/sec   Loss 9.0263   LearningRate 0.0499   Epoch: 11   Global Step: 59440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:28:13,442-Speed 9613.80 samples/sec   Loss 9.0441   LearningRate 0.0499   Epoch: 11   Global Step: 59450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:28:14,390-Speed 10804.70 samples/sec   Loss 9.2642   LearningRate 0.0499   Epoch: 11   Global Step: 59460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:15,501-Speed 9224.76 samples/sec   Loss 8.9411   LearningRate 0.0499   Epoch: 11   Global Step: 59470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:16,492-Speed 10343.10 samples/sec   Loss 9.1505   LearningRate 0.0498   Epoch: 11   Global Step: 59480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:17,566-Speed 9538.52 samples/sec   Loss 8.7706   LearningRate 0.0498   Epoch: 11   Global Step: 59490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:18,602-Speed 9891.30 samples/sec   Loss 8.9635   LearningRate 0.0498   Epoch: 11   Global Step: 59500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:19,595-Speed 10319.89 samples/sec   Loss 8.9098   LearningRate 0.0498   Epoch: 11   Global Step: 59510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:20,614-Speed 10051.35 samples/sec   Loss 9.0704   LearningRate 0.0498   Epoch: 11   Global Step: 59520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:21,584-Speed 10567.56 samples/sec   Loss 8.9537   LearningRate 0.0498   Epoch: 11   Global Step: 59530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:22,554-Speed 10562.25 samples/sec   Loss 9.0085   LearningRate 0.0498   Epoch: 11   Global Step: 59540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:23,560-Speed 10187.54 samples/sec   Loss 9.0930   LearningRate 0.0498   Epoch: 11   Global Step: 59550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:24,521-Speed 10663.61 samples/sec   Loss 9.1709   LearningRate 0.0498   Epoch: 11   Global Step: 59560   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 01:28:25,659-Speed 9021.33 samples/sec   Loss 9.0691   LearningRate 0.0498   Epoch: 11   Global Step: 59570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:26,654-Speed 10294.12 samples/sec   Loss 9.1332   LearningRate 0.0498   Epoch: 11   Global Step: 59580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:27,724-Speed 9592.91 samples/sec   Loss 8.9695   LearningRate 0.0498   Epoch: 11   Global Step: 59590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:28,758-Speed 9910.11 samples/sec   Loss 9.1355   LearningRate 0.0498   Epoch: 11   Global Step: 59600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:29,770-Speed 10130.68 samples/sec   Loss 9.0032   LearningRate 0.0498   Epoch: 11   Global Step: 59610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:30,877-Speed 9252.66 samples/sec   Loss 8.9150   LearningRate 0.0497   Epoch: 11   Global Step: 59620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:31,953-Speed 9529.07 samples/sec   Loss 9.0346   LearningRate 0.0497   Epoch: 11   Global Step: 59630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:32,908-Speed 10729.90 samples/sec   Loss 9.1191   LearningRate 0.0497   Epoch: 11   Global Step: 59640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:33,922-Speed 10099.95 samples/sec   Loss 9.0659   LearningRate 0.0497   Epoch: 11   Global Step: 59650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:34,928-Speed 10186.74 samples/sec   Loss 9.0403   LearningRate 0.0497   Epoch: 11   Global Step: 59660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:35,992-Speed 9624.93 samples/sec   Loss 9.2565   LearningRate 0.0497   Epoch: 11   Global Step: 59670   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 01:28:37,099-Speed 9259.84 samples/sec   Loss 8.9758   LearningRate 0.0497   Epoch: 11   Global Step: 59680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:38,144-Speed 9803.15 samples/sec   Loss 9.0119   LearningRate 0.0497   Epoch: 11   Global Step: 59690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:39,160-Speed 10092.37 samples/sec   Loss 9.1607   LearningRate 0.0497   Epoch: 11   Global Step: 59700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:40,150-Speed 10350.83 samples/sec   Loss 8.9419   LearningRate 0.0497   Epoch: 11   Global Step: 59710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:41,238-Speed 9412.11 samples/sec   Loss 8.8763   LearningRate 0.0497   Epoch: 11   Global Step: 59720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:42,268-Speed 9951.94 samples/sec   Loss 9.0449   LearningRate 0.0497   Epoch: 11   Global Step: 59730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:43,387-Speed 9154.09 samples/sec   Loss 8.9590   LearningRate 0.0497   Epoch: 11   Global Step: 59740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:44,338-Speed 10773.89 samples/sec   Loss 9.0723   LearningRate 0.0497   Epoch: 11   Global Step: 59750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:45,362-Speed 10009.46 samples/sec   Loss 8.9820   LearningRate 0.0496   Epoch: 11   Global Step: 59760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:46,420-Speed 9682.75 samples/sec   Loss 8.8646   LearningRate 0.0496   Epoch: 11   Global Step: 59770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:47,380-Speed 10676.87 samples/sec   Loss 8.8910   LearningRate 0.0496   Epoch: 11   Global Step: 59780   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 01:28:48,405-Speed 9999.31 samples/sec   Loss 9.1249   LearningRate 0.0496   Epoch: 11   Global Step: 59790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:49,383-Speed 10476.72 samples/sec   Loss 9.1269   LearningRate 0.0496   Epoch: 11   Global Step: 59800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:28:50,390-Speed 10178.71 samples/sec   Loss 8.9593   LearningRate 0.0496   Epoch: 11   Global Step: 59810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:28:51,444-Speed 9718.46 samples/sec   Loss 8.9911   LearningRate 0.0496   Epoch: 11   Global Step: 59820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:28:52,572-Speed 9088.90 samples/sec   Loss 8.9503   LearningRate 0.0496   Epoch: 11   Global Step: 59830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:28:53,652-Speed 9482.45 samples/sec   Loss 9.0504   LearningRate 0.0496   Epoch: 11   Global Step: 59840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:28:54,682-Speed 9952.60 samples/sec   Loss 9.0226   LearningRate 0.0496   Epoch: 11   Global Step: 59850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:28:55,734-Speed 9737.25 samples/sec   Loss 8.9622   LearningRate 0.0496   Epoch: 11   Global Step: 59860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:28:56,790-Speed 9710.50 samples/sec   Loss 9.0066   LearningRate 0.0496   Epoch: 11   Global Step: 59870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:28:57,879-Speed 9406.58 samples/sec   Loss 8.9249   LearningRate 0.0496   Epoch: 11   Global Step: 59880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:28:58,992-Speed 9208.83 samples/sec   Loss 8.9548   LearningRate 0.0496   Epoch: 11   Global Step: 59890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:28:59,988-Speed 10287.46 samples/sec   Loss 9.1831   LearningRate 0.0496   Epoch: 11   Global Step: 59900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:29:01,021-Speed 9918.69 samples/sec   Loss 9.1946   LearningRate 0.0495   Epoch: 11   Global Step: 59910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:29:02,014-Speed 10325.76 samples/sec   Loss 9.0308   LearningRate 0.0495   Epoch: 11   Global Step: 59920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:29:03,102-Speed 9419.49 samples/sec   Loss 8.8835   LearningRate 0.0495   Epoch: 11   Global Step: 59930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:29:04,055-Speed 10754.43 samples/sec   Loss 9.0114   LearningRate 0.0495   Epoch: 11   Global Step: 59940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:29:05,024-Speed 10574.88 samples/sec   Loss 9.0399   LearningRate 0.0495   Epoch: 11   Global Step: 59950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:29:06,030-Speed 10182.67 samples/sec   Loss 9.1540   LearningRate 0.0495   Epoch: 11   Global Step: 59960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:29:07,042-Speed 10123.99 samples/sec   Loss 9.0772   LearningRate 0.0495   Epoch: 11   Global Step: 59970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:29:08,046-Speed 10202.11 samples/sec   Loss 9.0819   LearningRate 0.0495   Epoch: 11   Global Step: 59980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:29:09,081-Speed 9904.63 samples/sec   Loss 9.2131   LearningRate 0.0495   Epoch: 11   Global Step: 59990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:29:10,207-Speed 9101.15 samples/sec   Loss 8.9941   LearningRate 0.0495   Epoch: 11   Global Step: 60000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:29:32,504-[lfw][60000]XNorm: 12.707251
Training: 2022-04-11 01:29:32,505-[lfw][60000]Accuracy-Flip: 0.99550+-0.00342
Training: 2022-04-11 01:29:32,505-[lfw][60000]Accuracy-Highest: 0.99550
Training: 2022-04-11 01:29:58,040-[cfp_fp][60000]XNorm: 10.692623
Training: 2022-04-11 01:29:58,041-[cfp_fp][60000]Accuracy-Flip: 0.95114+-0.01190
Training: 2022-04-11 01:29:58,042-[cfp_fp][60000]Accuracy-Highest: 0.95486
Training: 2022-04-11 01:30:20,214-[agedb_30][60000]XNorm: 12.380912
Training: 2022-04-11 01:30:20,214-[agedb_30][60000]Accuracy-Flip: 0.96183+-0.01015
Training: 2022-04-11 01:30:20,215-[agedb_30][60000]Accuracy-Highest: 0.96283
Training: 2022-04-11 01:30:21,193-Speed 144.26 samples/sec   Loss 8.9445   LearningRate 0.0495   Epoch: 11   Global Step: 60010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:22,155-Speed 10658.61 samples/sec   Loss 9.1639   LearningRate 0.0495   Epoch: 11   Global Step: 60020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:23,081-Speed 11075.57 samples/sec   Loss 9.0482   LearningRate 0.0495   Epoch: 11   Global Step: 60030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:24,068-Speed 10385.73 samples/sec   Loss 9.0573   LearningRate 0.0495   Epoch: 11   Global Step: 60040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:25,033-Speed 10618.19 samples/sec   Loss 9.1126   LearningRate 0.0494   Epoch: 11   Global Step: 60050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:26,003-Speed 10563.59 samples/sec   Loss 8.9930   LearningRate 0.0494   Epoch: 11   Global Step: 60060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:26,979-Speed 10494.40 samples/sec   Loss 8.9410   LearningRate 0.0494   Epoch: 11   Global Step: 60070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:27,965-Speed 10425.82 samples/sec   Loss 9.0849   LearningRate 0.0494   Epoch: 11   Global Step: 60080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:28,929-Speed 10639.26 samples/sec   Loss 8.9295   LearningRate 0.0494   Epoch: 11   Global Step: 60090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:29,895-Speed 10606.98 samples/sec   Loss 8.8953   LearningRate 0.0494   Epoch: 11   Global Step: 60100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:30,878-Speed 10421.26 samples/sec   Loss 9.3711   LearningRate 0.0494   Epoch: 11   Global Step: 60110   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 01:30:31,866-Speed 10375.12 samples/sec   Loss 9.0108   LearningRate 0.0494   Epoch: 11   Global Step: 60120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:32,827-Speed 10666.74 samples/sec   Loss 9.1725   LearningRate 0.0494   Epoch: 11   Global Step: 60130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:33,800-Speed 10529.24 samples/sec   Loss 9.0351   LearningRate 0.0494   Epoch: 11   Global Step: 60140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:34,785-Speed 10413.33 samples/sec   Loss 8.9799   LearningRate 0.0494   Epoch: 11   Global Step: 60150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:35,752-Speed 10593.64 samples/sec   Loss 9.1448   LearningRate 0.0494   Epoch: 11   Global Step: 60160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:36,711-Speed 10691.91 samples/sec   Loss 8.8786   LearningRate 0.0494   Epoch: 11   Global Step: 60170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:37,741-Speed 9953.44 samples/sec   Loss 9.0924   LearningRate 0.0494   Epoch: 11   Global Step: 60180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:38,705-Speed 10629.26 samples/sec   Loss 8.9189   LearningRate 0.0494   Epoch: 11   Global Step: 60190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:39,682-Speed 10494.75 samples/sec   Loss 8.7992   LearningRate 0.0493   Epoch: 11   Global Step: 60200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:40,627-Speed 10837.93 samples/sec   Loss 8.9365   LearningRate 0.0493   Epoch: 11   Global Step: 60210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:30:41,601-Speed 10533.12 samples/sec   Loss 8.9551   LearningRate 0.0493   Epoch: 11   Global Step: 60220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:30:42,551-Speed 10791.46 samples/sec   Loss 9.0252   LearningRate 0.0493   Epoch: 11   Global Step: 60230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:30:43,522-Speed 10553.63 samples/sec   Loss 8.9858   LearningRate 0.0493   Epoch: 11   Global Step: 60240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:30:44,481-Speed 10688.85 samples/sec   Loss 8.9482   LearningRate 0.0493   Epoch: 11   Global Step: 60250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:30:45,442-Speed 10672.77 samples/sec   Loss 8.8642   LearningRate 0.0493   Epoch: 11   Global Step: 60260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:30:46,376-Speed 10965.17 samples/sec   Loss 8.9807   LearningRate 0.0493   Epoch: 11   Global Step: 60270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:30:47,401-Speed 10008.17 samples/sec   Loss 9.0471   LearningRate 0.0493   Epoch: 11   Global Step: 60280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:30:48,419-Speed 10063.49 samples/sec   Loss 9.1295   LearningRate 0.0493   Epoch: 11   Global Step: 60290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:30:49,397-Speed 10485.93 samples/sec   Loss 9.0083   LearningRate 0.0493   Epoch: 11   Global Step: 60300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:30:50,431-Speed 9910.53 samples/sec   Loss 8.8892   LearningRate 0.0493   Epoch: 11   Global Step: 60310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:51,486-Speed 9712.59 samples/sec   Loss 9.1083   LearningRate 0.0493   Epoch: 11   Global Step: 60320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:52,464-Speed 10487.10 samples/sec   Loss 9.1594   LearningRate 0.0493   Epoch: 11   Global Step: 60330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:53,481-Speed 10074.14 samples/sec   Loss 9.0404   LearningRate 0.0492   Epoch: 11   Global Step: 60340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:30:54,456-Speed 10522.07 samples/sec   Loss 9.0514   LearningRate 0.0492   Epoch: 11   Global Step: 60350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:30:55,404-Speed 10803.14 samples/sec   Loss 8.9624   LearningRate 0.0492   Epoch: 11   Global Step: 60360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:30:56,373-Speed 10585.87 samples/sec   Loss 9.1501   LearningRate 0.0492   Epoch: 11   Global Step: 60370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:30:57,302-Speed 11023.95 samples/sec   Loss 9.0405   LearningRate 0.0492   Epoch: 11   Global Step: 60380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:30:58,287-Speed 10412.28 samples/sec   Loss 8.9771   LearningRate 0.0492   Epoch: 11   Global Step: 60390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:30:59,300-Speed 10117.82 samples/sec   Loss 8.9226   LearningRate 0.0492   Epoch: 11   Global Step: 60400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:31:00,360-Speed 9671.66 samples/sec   Loss 9.1647   LearningRate 0.0492   Epoch: 11   Global Step: 60410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:31:01,407-Speed 9778.62 samples/sec   Loss 8.9873   LearningRate 0.0492   Epoch: 11   Global Step: 60420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:31:02,427-Speed 10056.59 samples/sec   Loss 9.2072   LearningRate 0.0492   Epoch: 11   Global Step: 60430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:31:03,486-Speed 9683.78 samples/sec   Loss 9.0195   LearningRate 0.0492   Epoch: 11   Global Step: 60440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:31:04,473-Speed 10376.50 samples/sec   Loss 8.9921   LearningRate 0.0492   Epoch: 11   Global Step: 60450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:05,434-Speed 10669.01 samples/sec   Loss 9.1119   LearningRate 0.0492   Epoch: 11   Global Step: 60460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:06,402-Speed 10593.97 samples/sec   Loss 9.0853   LearningRate 0.0492   Epoch: 11   Global Step: 60470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:07,410-Speed 10166.26 samples/sec   Loss 9.0440   LearningRate 0.0491   Epoch: 11   Global Step: 60480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:08,383-Speed 10538.47 samples/sec   Loss 8.8625   LearningRate 0.0491   Epoch: 11   Global Step: 60490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:09,366-Speed 10427.10 samples/sec   Loss 9.0451   LearningRate 0.0491   Epoch: 11   Global Step: 60500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:10,365-Speed 10256.52 samples/sec   Loss 8.8914   LearningRate 0.0491   Epoch: 11   Global Step: 60510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:11,324-Speed 10696.43 samples/sec   Loss 9.1657   LearningRate 0.0491   Epoch: 11   Global Step: 60520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:12,241-Speed 11177.52 samples/sec   Loss 8.9011   LearningRate 0.0491   Epoch: 11   Global Step: 60530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:13,232-Speed 10337.86 samples/sec   Loss 8.9766   LearningRate 0.0491   Epoch: 11   Global Step: 60540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:14,222-Speed 10358.12 samples/sec   Loss 9.0658   LearningRate 0.0491   Epoch: 11   Global Step: 60550   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 01:31:15,166-Speed 10867.54 samples/sec   Loss 9.0224   LearningRate 0.0491   Epoch: 11   Global Step: 60560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:16,122-Speed 10714.11 samples/sec   Loss 9.0806   LearningRate 0.0491   Epoch: 11   Global Step: 60570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:17,163-Speed 9843.00 samples/sec   Loss 9.0480   LearningRate 0.0491   Epoch: 11   Global Step: 60580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:18,153-Speed 10362.04 samples/sec   Loss 9.1134   LearningRate 0.0491   Epoch: 11   Global Step: 60590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:19,134-Speed 10448.13 samples/sec   Loss 8.9825   LearningRate 0.0491   Epoch: 11   Global Step: 60600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:20,085-Speed 10774.79 samples/sec   Loss 9.0044   LearningRate 0.0491   Epoch: 11   Global Step: 60610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:21,116-Speed 9941.33 samples/sec   Loss 8.9930   LearningRate 0.0491   Epoch: 11   Global Step: 60620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:22,082-Speed 10613.40 samples/sec   Loss 9.0828   LearningRate 0.0490   Epoch: 11   Global Step: 60630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:23,069-Speed 10386.70 samples/sec   Loss 9.0854   LearningRate 0.0490   Epoch: 11   Global Step: 60640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:24,058-Speed 10362.61 samples/sec   Loss 8.9269   LearningRate 0.0490   Epoch: 11   Global Step: 60650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:31:25,027-Speed 10580.68 samples/sec   Loss 9.2051   LearningRate 0.0490   Epoch: 11   Global Step: 60660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:31:26,002-Speed 10521.61 samples/sec   Loss 8.9355   LearningRate 0.0490   Epoch: 11   Global Step: 60670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:31:26,956-Speed 10747.21 samples/sec   Loss 9.0408   LearningRate 0.0490   Epoch: 11   Global Step: 60680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:31:28,085-Speed 9079.80 samples/sec   Loss 9.0344   LearningRate 0.0490   Epoch: 11   Global Step: 60690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:31:40,853-Speed 802.23 samples/sec   Loss 8.6386   LearningRate 0.0490   Epoch: 12   Global Step: 60700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:31:41,850-Speed 10282.38 samples/sec   Loss 7.9257   LearningRate 0.0490   Epoch: 12   Global Step: 60710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:31:42,976-Speed 9099.61 samples/sec   Loss 8.0197   LearningRate 0.0490   Epoch: 12   Global Step: 60720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:31:44,031-Speed 9714.04 samples/sec   Loss 7.9740   LearningRate 0.0490   Epoch: 12   Global Step: 60730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:31:45,036-Speed 10209.36 samples/sec   Loss 7.9975   LearningRate 0.0490   Epoch: 12   Global Step: 60740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:31:46,034-Speed 10271.21 samples/sec   Loss 8.0229   LearningRate 0.0490   Epoch: 12   Global Step: 60750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:47,001-Speed 10600.08 samples/sec   Loss 7.9738   LearningRate 0.0490   Epoch: 12   Global Step: 60760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:47,959-Speed 10703.60 samples/sec   Loss 8.2355   LearningRate 0.0489   Epoch: 12   Global Step: 60770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:48,931-Speed 10554.16 samples/sec   Loss 8.1487   LearningRate 0.0489   Epoch: 12   Global Step: 60780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:49,899-Speed 10588.10 samples/sec   Loss 8.1344   LearningRate 0.0489   Epoch: 12   Global Step: 60790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:50,863-Speed 10632.25 samples/sec   Loss 8.3310   LearningRate 0.0489   Epoch: 12   Global Step: 60800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:51,858-Speed 10303.64 samples/sec   Loss 8.1334   LearningRate 0.0489   Epoch: 12   Global Step: 60810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:52,782-Speed 11088.26 samples/sec   Loss 8.1279   LearningRate 0.0489   Epoch: 12   Global Step: 60820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:53,736-Speed 10749.20 samples/sec   Loss 8.1669   LearningRate 0.0489   Epoch: 12   Global Step: 60830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:54,820-Speed 9453.04 samples/sec   Loss 8.1849   LearningRate 0.0489   Epoch: 12   Global Step: 60840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:55,782-Speed 10652.56 samples/sec   Loss 8.3881   LearningRate 0.0489   Epoch: 12   Global Step: 60850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:56,855-Speed 9552.72 samples/sec   Loss 8.2589   LearningRate 0.0489   Epoch: 12   Global Step: 60860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:57,817-Speed 10662.64 samples/sec   Loss 8.1264   LearningRate 0.0489   Epoch: 12   Global Step: 60870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:58,759-Speed 10890.60 samples/sec   Loss 8.2091   LearningRate 0.0489   Epoch: 12   Global Step: 60880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:31:59,749-Speed 10345.98 samples/sec   Loss 8.1333   LearningRate 0.0489   Epoch: 12   Global Step: 60890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:00,715-Speed 10613.08 samples/sec   Loss 8.3086   LearningRate 0.0489   Epoch: 12   Global Step: 60900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:01,685-Speed 10566.75 samples/sec   Loss 8.4528   LearningRate 0.0489   Epoch: 12   Global Step: 60910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:02,644-Speed 10680.85 samples/sec   Loss 8.1889   LearningRate 0.0488   Epoch: 12   Global Step: 60920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:03,638-Speed 10310.57 samples/sec   Loss 8.2589   LearningRate 0.0488   Epoch: 12   Global Step: 60930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:04,650-Speed 10137.81 samples/sec   Loss 8.2540   LearningRate 0.0488   Epoch: 12   Global Step: 60940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:05,589-Speed 10911.82 samples/sec   Loss 8.2795   LearningRate 0.0488   Epoch: 12   Global Step: 60950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:06,533-Speed 10869.14 samples/sec   Loss 8.1846   LearningRate 0.0488   Epoch: 12   Global Step: 60960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:07,539-Speed 10182.95 samples/sec   Loss 8.2849   LearningRate 0.0488   Epoch: 12   Global Step: 60970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:08,551-Speed 10128.46 samples/sec   Loss 8.3682   LearningRate 0.0488   Epoch: 12   Global Step: 60980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:09,522-Speed 10562.54 samples/sec   Loss 8.2581   LearningRate 0.0488   Epoch: 12   Global Step: 60990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:10,487-Speed 10615.66 samples/sec   Loss 8.2902   LearningRate 0.0488   Epoch: 12   Global Step: 61000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:11,443-Speed 10718.76 samples/sec   Loss 8.4222   LearningRate 0.0488   Epoch: 12   Global Step: 61010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:12,478-Speed 9909.05 samples/sec   Loss 8.3578   LearningRate 0.0488   Epoch: 12   Global Step: 61020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:13,536-Speed 9683.74 samples/sec   Loss 8.3664   LearningRate 0.0488   Epoch: 12   Global Step: 61030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:14,583-Speed 9784.85 samples/sec   Loss 8.3856   LearningRate 0.0488   Epoch: 12   Global Step: 61040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:15,585-Speed 10233.00 samples/sec   Loss 8.3550   LearningRate 0.0488   Epoch: 12   Global Step: 61050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:16,659-Speed 9540.57 samples/sec   Loss 8.2192   LearningRate 0.0487   Epoch: 12   Global Step: 61060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:17,729-Speed 9585.94 samples/sec   Loss 8.4071   LearningRate 0.0487   Epoch: 12   Global Step: 61070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:18,712-Speed 10424.32 samples/sec   Loss 8.3242   LearningRate 0.0487   Epoch: 12   Global Step: 61080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:19,714-Speed 10230.76 samples/sec   Loss 8.5220   LearningRate 0.0487   Epoch: 12   Global Step: 61090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:20,653-Speed 10909.08 samples/sec   Loss 8.3878   LearningRate 0.0487   Epoch: 12   Global Step: 61100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:21,624-Speed 10559.85 samples/sec   Loss 8.2915   LearningRate 0.0487   Epoch: 12   Global Step: 61110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:22,593-Speed 10582.91 samples/sec   Loss 8.3061   LearningRate 0.0487   Epoch: 12   Global Step: 61120   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 01:32:23,542-Speed 10829.91 samples/sec   Loss 8.5394   LearningRate 0.0487   Epoch: 12   Global Step: 61130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:24,537-Speed 10303.12 samples/sec   Loss 8.2972   LearningRate 0.0487   Epoch: 12   Global Step: 61140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:25,464-Speed 11079.69 samples/sec   Loss 8.3601   LearningRate 0.0487   Epoch: 12   Global Step: 61150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:26,405-Speed 10890.35 samples/sec   Loss 8.4235   LearningRate 0.0487   Epoch: 12   Global Step: 61160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:27,390-Speed 10408.96 samples/sec   Loss 8.4897   LearningRate 0.0487   Epoch: 12   Global Step: 61170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:28,344-Speed 10748.00 samples/sec   Loss 8.5125   LearningRate 0.0487   Epoch: 12   Global Step: 61180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:29,304-Speed 10683.19 samples/sec   Loss 8.4782   LearningRate 0.0487   Epoch: 12   Global Step: 61190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:30,266-Speed 10647.04 samples/sec   Loss 8.5224   LearningRate 0.0487   Epoch: 12   Global Step: 61200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:31,277-Speed 10137.97 samples/sec   Loss 8.2071   LearningRate 0.0486   Epoch: 12   Global Step: 61210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:32,251-Speed 10519.53 samples/sec   Loss 8.5612   LearningRate 0.0486   Epoch: 12   Global Step: 61220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:33,199-Speed 10812.60 samples/sec   Loss 8.3839   LearningRate 0.0486   Epoch: 12   Global Step: 61230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:34,189-Speed 10348.27 samples/sec   Loss 8.3154   LearningRate 0.0486   Epoch: 12   Global Step: 61240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:35,186-Speed 10295.34 samples/sec   Loss 8.5408   LearningRate 0.0486   Epoch: 12   Global Step: 61250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:36,142-Speed 10723.97 samples/sec   Loss 8.3086   LearningRate 0.0486   Epoch: 12   Global Step: 61260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:37,114-Speed 10545.50 samples/sec   Loss 8.5197   LearningRate 0.0486   Epoch: 12   Global Step: 61270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:38,099-Speed 10406.32 samples/sec   Loss 8.6678   LearningRate 0.0486   Epoch: 12   Global Step: 61280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:39,057-Speed 10700.94 samples/sec   Loss 8.2656   LearningRate 0.0486   Epoch: 12   Global Step: 61290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:40,044-Speed 10377.79 samples/sec   Loss 8.5861   LearningRate 0.0486   Epoch: 12   Global Step: 61300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:41,028-Speed 10426.15 samples/sec   Loss 8.2331   LearningRate 0.0486   Epoch: 12   Global Step: 61310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:41,957-Speed 11022.60 samples/sec   Loss 8.5622   LearningRate 0.0486   Epoch: 12   Global Step: 61320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:42,970-Speed 10121.59 samples/sec   Loss 8.3467   LearningRate 0.0486   Epoch: 12   Global Step: 61330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:44,002-Speed 9978.83 samples/sec   Loss 8.5927   LearningRate 0.0486   Epoch: 12   Global Step: 61340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:44,947-Speed 10839.88 samples/sec   Loss 8.4581   LearningRate 0.0485   Epoch: 12   Global Step: 61350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:45,925-Speed 10486.63 samples/sec   Loss 8.6643   LearningRate 0.0485   Epoch: 12   Global Step: 61360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:46,886-Speed 10670.21 samples/sec   Loss 8.6011   LearningRate 0.0485   Epoch: 12   Global Step: 61370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:47,873-Speed 10390.55 samples/sec   Loss 8.5206   LearningRate 0.0485   Epoch: 12   Global Step: 61380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:48,832-Speed 10687.53 samples/sec   Loss 8.5713   LearningRate 0.0485   Epoch: 12   Global Step: 61390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:49,790-Speed 10696.02 samples/sec   Loss 8.6910   LearningRate 0.0485   Epoch: 12   Global Step: 61400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:50,786-Speed 10289.84 samples/sec   Loss 8.5903   LearningRate 0.0485   Epoch: 12   Global Step: 61410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:51,738-Speed 10761.41 samples/sec   Loss 8.3619   LearningRate 0.0485   Epoch: 12   Global Step: 61420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:52,668-Speed 11025.93 samples/sec   Loss 8.4033   LearningRate 0.0485   Epoch: 12   Global Step: 61430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:53,608-Speed 10905.24 samples/sec   Loss 8.7129   LearningRate 0.0485   Epoch: 12   Global Step: 61440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:54,571-Speed 10644.07 samples/sec   Loss 8.6660   LearningRate 0.0485   Epoch: 12   Global Step: 61450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:55,509-Speed 10927.32 samples/sec   Loss 8.5768   LearningRate 0.0485   Epoch: 12   Global Step: 61460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:32:56,492-Speed 10435.10 samples/sec   Loss 8.5732   LearningRate 0.0485   Epoch: 12   Global Step: 61470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:57,457-Speed 10621.41 samples/sec   Loss 8.5202   LearningRate 0.0485   Epoch: 12   Global Step: 61480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:58,403-Speed 10841.62 samples/sec   Loss 8.5923   LearningRate 0.0485   Epoch: 12   Global Step: 61490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:32:59,386-Speed 10425.36 samples/sec   Loss 8.4749   LearningRate 0.0484   Epoch: 12   Global Step: 61500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:00,370-Speed 10414.64 samples/sec   Loss 8.7867   LearningRate 0.0484   Epoch: 12   Global Step: 61510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:01,341-Speed 10563.49 samples/sec   Loss 8.6736   LearningRate 0.0484   Epoch: 12   Global Step: 61520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:02,338-Speed 10276.02 samples/sec   Loss 8.6736   LearningRate 0.0484   Epoch: 12   Global Step: 61530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:03,333-Speed 10303.65 samples/sec   Loss 8.6549   LearningRate 0.0484   Epoch: 12   Global Step: 61540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:04,324-Speed 10343.07 samples/sec   Loss 8.7044   LearningRate 0.0484   Epoch: 12   Global Step: 61550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:05,289-Speed 10629.93 samples/sec   Loss 8.5550   LearningRate 0.0484   Epoch: 12   Global Step: 61560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:06,215-Speed 11068.47 samples/sec   Loss 8.5765   LearningRate 0.0484   Epoch: 12   Global Step: 61570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:33:07,222-Speed 10170.54 samples/sec   Loss 8.6415   LearningRate 0.0484   Epoch: 12   Global Step: 61580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:33:08,166-Speed 10867.67 samples/sec   Loss 8.7258   LearningRate 0.0484   Epoch: 12   Global Step: 61590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:33:09,151-Speed 10399.56 samples/sec   Loss 8.7813   LearningRate 0.0484   Epoch: 12   Global Step: 61600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:33:10,150-Speed 10266.23 samples/sec   Loss 8.6492   LearningRate 0.0484   Epoch: 12   Global Step: 61610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:33:11,101-Speed 10779.01 samples/sec   Loss 8.5411   LearningRate 0.0484   Epoch: 12   Global Step: 61620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:33:12,051-Speed 10780.84 samples/sec   Loss 8.5936   LearningRate 0.0484   Epoch: 12   Global Step: 61630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:33:13,036-Speed 10408.01 samples/sec   Loss 8.5446   LearningRate 0.0483   Epoch: 12   Global Step: 61640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:33:13,987-Speed 10776.62 samples/sec   Loss 8.6445   LearningRate 0.0483   Epoch: 12   Global Step: 61650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:33:14,916-Speed 11036.50 samples/sec   Loss 8.7461   LearningRate 0.0483   Epoch: 12   Global Step: 61660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:33:15,851-Speed 10957.56 samples/sec   Loss 8.6732   LearningRate 0.0483   Epoch: 12   Global Step: 61670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:33:16,805-Speed 10742.91 samples/sec   Loss 8.9387   LearningRate 0.0483   Epoch: 12   Global Step: 61680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:33:17,824-Speed 10064.78 samples/sec   Loss 8.6242   LearningRate 0.0483   Epoch: 12   Global Step: 61690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:33:18,765-Speed 10887.14 samples/sec   Loss 8.7721   LearningRate 0.0483   Epoch: 12   Global Step: 61700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:33:19,739-Speed 10528.04 samples/sec   Loss 8.6844   LearningRate 0.0483   Epoch: 12   Global Step: 61710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:33:20,763-Speed 10006.99 samples/sec   Loss 8.6100   LearningRate 0.0483   Epoch: 12   Global Step: 61720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:33:21,707-Speed 10871.52 samples/sec   Loss 8.5769   LearningRate 0.0483   Epoch: 12   Global Step: 61730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:22,635-Speed 11032.97 samples/sec   Loss 8.8060   LearningRate 0.0483   Epoch: 12   Global Step: 61740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:23,607-Speed 10549.35 samples/sec   Loss 8.5361   LearningRate 0.0483   Epoch: 12   Global Step: 61750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:24,572-Speed 10622.26 samples/sec   Loss 8.6198   LearningRate 0.0483   Epoch: 12   Global Step: 61760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:25,543-Speed 10559.24 samples/sec   Loss 8.7478   LearningRate 0.0483   Epoch: 12   Global Step: 61770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:26,516-Speed 10524.88 samples/sec   Loss 8.6060   LearningRate 0.0483   Epoch: 12   Global Step: 61780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:27,487-Speed 10563.69 samples/sec   Loss 8.6300   LearningRate 0.0482   Epoch: 12   Global Step: 61790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:28,468-Speed 10450.53 samples/sec   Loss 8.6046   LearningRate 0.0482   Epoch: 12   Global Step: 61800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:29,460-Speed 10330.51 samples/sec   Loss 8.7169   LearningRate 0.0482   Epoch: 12   Global Step: 61810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:30,431-Speed 10558.68 samples/sec   Loss 8.7751   LearningRate 0.0482   Epoch: 12   Global Step: 61820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:31,368-Speed 10936.46 samples/sec   Loss 8.7209   LearningRate 0.0482   Epoch: 12   Global Step: 61830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:33:32,314-Speed 10829.50 samples/sec   Loss 8.6983   LearningRate 0.0482   Epoch: 12   Global Step: 61840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:33:33,336-Speed 10036.08 samples/sec   Loss 8.7451   LearningRate 0.0482   Epoch: 12   Global Step: 61850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:33:34,296-Speed 10672.36 samples/sec   Loss 8.6384   LearningRate 0.0482   Epoch: 12   Global Step: 61860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:33:35,235-Speed 10917.41 samples/sec   Loss 8.8179   LearningRate 0.0482   Epoch: 12   Global Step: 61870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:33:36,220-Speed 10414.20 samples/sec   Loss 8.7187   LearningRate 0.0482   Epoch: 12   Global Step: 61880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:37,232-Speed 10120.92 samples/sec   Loss 8.7822   LearningRate 0.0482   Epoch: 12   Global Step: 61890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:38,179-Speed 10829.11 samples/sec   Loss 8.8697   LearningRate 0.0482   Epoch: 12   Global Step: 61900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:39,175-Speed 10287.93 samples/sec   Loss 8.6799   LearningRate 0.0482   Epoch: 12   Global Step: 61910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:40,150-Speed 10514.86 samples/sec   Loss 8.8107   LearningRate 0.0482   Epoch: 12   Global Step: 61920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:41,137-Speed 10389.53 samples/sec   Loss 8.8296   LearningRate 0.0481   Epoch: 12   Global Step: 61930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:42,142-Speed 10198.05 samples/sec   Loss 8.7109   LearningRate 0.0481   Epoch: 12   Global Step: 61940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:43,114-Speed 10545.28 samples/sec   Loss 8.7246   LearningRate 0.0481   Epoch: 12   Global Step: 61950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:44,080-Speed 10609.56 samples/sec   Loss 8.7828   LearningRate 0.0481   Epoch: 12   Global Step: 61960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:45,032-Speed 10769.41 samples/sec   Loss 8.8618   LearningRate 0.0481   Epoch: 12   Global Step: 61970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:33:45,998-Speed 10604.39 samples/sec   Loss 8.7179   LearningRate 0.0481   Epoch: 12   Global Step: 61980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:33:46,968-Speed 10567.08 samples/sec   Loss 8.6561   LearningRate 0.0481   Epoch: 12   Global Step: 61990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:33:47,921-Speed 10759.28 samples/sec   Loss 8.6207   LearningRate 0.0481   Epoch: 12   Global Step: 62000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:34:17,276-[lfw][62000]XNorm: 12.745948
Training: 2022-04-11 01:34:17,277-[lfw][62000]Accuracy-Flip: 0.99517+-0.00418
Training: 2022-04-11 01:34:17,277-[lfw][62000]Accuracy-Highest: 0.99550
Training: 2022-04-11 01:34:46,316-[cfp_fp][62000]XNorm: 10.733094
Training: 2022-04-11 01:34:46,317-[cfp_fp][62000]Accuracy-Flip: 0.94943+-0.01440
Training: 2022-04-11 01:34:46,317-[cfp_fp][62000]Accuracy-Highest: 0.95486
Training: 2022-04-11 01:35:08,465-[agedb_30][62000]XNorm: 12.372078
Training: 2022-04-11 01:35:08,465-[agedb_30][62000]Accuracy-Flip: 0.96283+-0.00922
Training: 2022-04-11 01:35:08,466-[agedb_30][62000]Accuracy-Highest: 0.96283
Training: 2022-04-11 01:35:09,416-Speed 125.65 samples/sec   Loss 8.7536   LearningRate 0.0481   Epoch: 12   Global Step: 62010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:35:10,402-Speed 10387.08 samples/sec   Loss 8.7112   LearningRate 0.0481   Epoch: 12   Global Step: 62020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:35:11,372-Speed 10570.23 samples/sec   Loss 8.7480   LearningRate 0.0481   Epoch: 12   Global Step: 62030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:35:12,314-Speed 10873.01 samples/sec   Loss 8.7481   LearningRate 0.0481   Epoch: 12   Global Step: 62040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:35:13,318-Speed 10215.91 samples/sec   Loss 8.5977   LearningRate 0.0481   Epoch: 12   Global Step: 62050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:35:14,300-Speed 10441.72 samples/sec   Loss 8.5381   LearningRate 0.0481   Epoch: 12   Global Step: 62060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:35:15,248-Speed 10816.29 samples/sec   Loss 8.6703   LearningRate 0.0481   Epoch: 12   Global Step: 62070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:16,258-Speed 10146.64 samples/sec   Loss 8.8430   LearningRate 0.0480   Epoch: 12   Global Step: 62080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:17,221-Speed 10635.48 samples/sec   Loss 8.6937   LearningRate 0.0480   Epoch: 12   Global Step: 62090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:18,163-Speed 10891.70 samples/sec   Loss 8.6774   LearningRate 0.0480   Epoch: 12   Global Step: 62100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:19,124-Speed 10660.16 samples/sec   Loss 8.6665   LearningRate 0.0480   Epoch: 12   Global Step: 62110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:20,081-Speed 10710.88 samples/sec   Loss 8.7143   LearningRate 0.0480   Epoch: 12   Global Step: 62120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:21,090-Speed 10158.18 samples/sec   Loss 8.7415   LearningRate 0.0480   Epoch: 12   Global Step: 62130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:22,061-Speed 10562.31 samples/sec   Loss 8.8475   LearningRate 0.0480   Epoch: 12   Global Step: 62140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:22,991-Speed 11020.09 samples/sec   Loss 8.6876   LearningRate 0.0480   Epoch: 12   Global Step: 62150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:23,966-Speed 10518.31 samples/sec   Loss 8.6479   LearningRate 0.0480   Epoch: 12   Global Step: 62160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:24,901-Speed 10955.04 samples/sec   Loss 8.7544   LearningRate 0.0480   Epoch: 12   Global Step: 62170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:35:25,833-Speed 10998.49 samples/sec   Loss 8.6049   LearningRate 0.0480   Epoch: 12   Global Step: 62180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:35:26,792-Speed 10688.27 samples/sec   Loss 8.7924   LearningRate 0.0480   Epoch: 12   Global Step: 62190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:35:27,757-Speed 10618.17 samples/sec   Loss 8.7997   LearningRate 0.0480   Epoch: 12   Global Step: 62200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:35:28,716-Speed 10684.34 samples/sec   Loss 8.7814   LearningRate 0.0480   Epoch: 12   Global Step: 62210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:35:29,682-Speed 10617.54 samples/sec   Loss 8.5800   LearningRate 0.0480   Epoch: 12   Global Step: 62220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:35:30,656-Speed 10520.81 samples/sec   Loss 8.5628   LearningRate 0.0479   Epoch: 12   Global Step: 62230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:35:31,613-Speed 10713.81 samples/sec   Loss 8.8516   LearningRate 0.0479   Epoch: 12   Global Step: 62240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:35:32,586-Speed 10531.27 samples/sec   Loss 8.7443   LearningRate 0.0479   Epoch: 12   Global Step: 62250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:35:33,548-Speed 10659.54 samples/sec   Loss 8.9929   LearningRate 0.0479   Epoch: 12   Global Step: 62260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:35:34,553-Speed 10198.83 samples/sec   Loss 8.6452   LearningRate 0.0479   Epoch: 12   Global Step: 62270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:35,518-Speed 10619.22 samples/sec   Loss 8.8053   LearningRate 0.0479   Epoch: 12   Global Step: 62280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:36,499-Speed 10453.36 samples/sec   Loss 8.7809   LearningRate 0.0479   Epoch: 12   Global Step: 62290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:37,473-Speed 10524.93 samples/sec   Loss 8.7404   LearningRate 0.0479   Epoch: 12   Global Step: 62300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:38,437-Speed 10630.94 samples/sec   Loss 8.8096   LearningRate 0.0479   Epoch: 12   Global Step: 62310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:39,413-Speed 10499.61 samples/sec   Loss 8.7473   LearningRate 0.0479   Epoch: 12   Global Step: 62320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:40,383-Speed 10594.08 samples/sec   Loss 8.8085   LearningRate 0.0479   Epoch: 12   Global Step: 62330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:41,344-Speed 10667.38 samples/sec   Loss 8.8529   LearningRate 0.0479   Epoch: 12   Global Step: 62340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:42,358-Speed 10116.08 samples/sec   Loss 8.8691   LearningRate 0.0479   Epoch: 12   Global Step: 62350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:43,338-Speed 10454.15 samples/sec   Loss 8.8041   LearningRate 0.0479   Epoch: 12   Global Step: 62360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:44,312-Speed 10524.22 samples/sec   Loss 8.8533   LearningRate 0.0478   Epoch: 12   Global Step: 62370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:45,277-Speed 10625.73 samples/sec   Loss 8.8192   LearningRate 0.0478   Epoch: 12   Global Step: 62380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:46,232-Speed 10726.99 samples/sec   Loss 8.8448   LearningRate 0.0478   Epoch: 12   Global Step: 62390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:47,218-Speed 10395.69 samples/sec   Loss 8.6003   LearningRate 0.0478   Epoch: 12   Global Step: 62400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:48,276-Speed 9691.78 samples/sec   Loss 8.6537   LearningRate 0.0478   Epoch: 12   Global Step: 62410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:49,257-Speed 10442.33 samples/sec   Loss 8.8679   LearningRate 0.0478   Epoch: 12   Global Step: 62420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:50,212-Speed 10732.07 samples/sec   Loss 8.7012   LearningRate 0.0478   Epoch: 12   Global Step: 62430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:51,214-Speed 10231.05 samples/sec   Loss 8.6617   LearningRate 0.0478   Epoch: 12   Global Step: 62440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:52,184-Speed 10570.41 samples/sec   Loss 8.9019   LearningRate 0.0478   Epoch: 12   Global Step: 62450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:53,166-Speed 10437.08 samples/sec   Loss 8.8374   LearningRate 0.0478   Epoch: 12   Global Step: 62460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:35:54,140-Speed 10522.33 samples/sec   Loss 8.7469   LearningRate 0.0478   Epoch: 12   Global Step: 62470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:35:55,094-Speed 10745.11 samples/sec   Loss 8.8400   LearningRate 0.0478   Epoch: 12   Global Step: 62480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:35:56,140-Speed 9793.60 samples/sec   Loss 8.9130   LearningRate 0.0478   Epoch: 12   Global Step: 62490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:35:57,137-Speed 10303.08 samples/sec   Loss 8.8533   LearningRate 0.0478   Epoch: 12   Global Step: 62500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:35:58,068-Speed 11022.46 samples/sec   Loss 8.7376   LearningRate 0.0478   Epoch: 12   Global Step: 62510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:35:59,032-Speed 10631.94 samples/sec   Loss 8.8075   LearningRate 0.0477   Epoch: 12   Global Step: 62520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:35:59,989-Speed 10709.32 samples/sec   Loss 8.7988   LearningRate 0.0477   Epoch: 12   Global Step: 62530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:00,974-Speed 10410.17 samples/sec   Loss 8.8039   LearningRate 0.0477   Epoch: 12   Global Step: 62540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:01,947-Speed 10530.38 samples/sec   Loss 8.8720   LearningRate 0.0477   Epoch: 12   Global Step: 62550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:02,967-Speed 10047.77 samples/sec   Loss 8.8139   LearningRate 0.0477   Epoch: 12   Global Step: 62560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:03,944-Speed 10495.48 samples/sec   Loss 8.5812   LearningRate 0.0477   Epoch: 12   Global Step: 62570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:36:04,914-Speed 10569.74 samples/sec   Loss 8.8617   LearningRate 0.0477   Epoch: 12   Global Step: 62580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:36:05,845-Speed 11008.78 samples/sec   Loss 8.8584   LearningRate 0.0477   Epoch: 12   Global Step: 62590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:36:06,804-Speed 10685.09 samples/sec   Loss 8.9258   LearningRate 0.0477   Epoch: 12   Global Step: 62600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:36:07,830-Speed 10004.53 samples/sec   Loss 8.9722   LearningRate 0.0477   Epoch: 12   Global Step: 62610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:08,789-Speed 10685.15 samples/sec   Loss 8.7342   LearningRate 0.0477   Epoch: 12   Global Step: 62620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:09,721-Speed 10995.67 samples/sec   Loss 8.8208   LearningRate 0.0477   Epoch: 12   Global Step: 62630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:10,705-Speed 10417.22 samples/sec   Loss 8.8577   LearningRate 0.0477   Epoch: 12   Global Step: 62640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:11,726-Speed 10034.05 samples/sec   Loss 8.8003   LearningRate 0.0477   Epoch: 12   Global Step: 62650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:12,681-Speed 10741.15 samples/sec   Loss 8.7861   LearningRate 0.0477   Epoch: 12   Global Step: 62660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:13,644-Speed 10633.41 samples/sec   Loss 8.8674   LearningRate 0.0476   Epoch: 12   Global Step: 62670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:14,627-Speed 10430.75 samples/sec   Loss 8.5937   LearningRate 0.0476   Epoch: 12   Global Step: 62680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:15,582-Speed 10741.33 samples/sec   Loss 8.7876   LearningRate 0.0476   Epoch: 12   Global Step: 62690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:16,561-Speed 10462.41 samples/sec   Loss 8.7140   LearningRate 0.0476   Epoch: 12   Global Step: 62700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:17,612-Speed 9757.15 samples/sec   Loss 8.9243   LearningRate 0.0476   Epoch: 12   Global Step: 62710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:36:18,550-Speed 10927.24 samples/sec   Loss 9.0334   LearningRate 0.0476   Epoch: 12   Global Step: 62720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:36:19,478-Speed 11033.35 samples/sec   Loss 8.8439   LearningRate 0.0476   Epoch: 12   Global Step: 62730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:36:20,527-Speed 10081.07 samples/sec   Loss 8.9993   LearningRate 0.0476   Epoch: 12   Global Step: 62740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:36:21,491-Speed 10629.38 samples/sec   Loss 8.7561   LearningRate 0.0476   Epoch: 12   Global Step: 62750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:36:22,450-Speed 10685.82 samples/sec   Loss 8.8667   LearningRate 0.0476   Epoch: 12   Global Step: 62760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:36:23,391-Speed 10899.56 samples/sec   Loss 8.9395   LearningRate 0.0476   Epoch: 12   Global Step: 62770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:36:24,396-Speed 10192.99 samples/sec   Loss 8.7752   LearningRate 0.0476   Epoch: 12   Global Step: 62780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:36:25,335-Speed 10923.76 samples/sec   Loss 8.7909   LearningRate 0.0476   Epoch: 12   Global Step: 62790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:36:26,279-Speed 10864.33 samples/sec   Loss 8.8709   LearningRate 0.0476   Epoch: 12   Global Step: 62800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:27,254-Speed 10517.47 samples/sec   Loss 8.7621   LearningRate 0.0475   Epoch: 12   Global Step: 62810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:28,229-Speed 10507.85 samples/sec   Loss 8.8930   LearningRate 0.0475   Epoch: 12   Global Step: 62820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:29,196-Speed 10605.99 samples/sec   Loss 8.7840   LearningRate 0.0475   Epoch: 12   Global Step: 62830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:30,161-Speed 10627.98 samples/sec   Loss 8.7680   LearningRate 0.0475   Epoch: 12   Global Step: 62840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:31,125-Speed 10621.28 samples/sec   Loss 8.8862   LearningRate 0.0475   Epoch: 12   Global Step: 62850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:32,139-Speed 10117.01 samples/sec   Loss 8.7417   LearningRate 0.0475   Epoch: 12   Global Step: 62860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:33,113-Speed 10529.85 samples/sec   Loss 8.8671   LearningRate 0.0475   Epoch: 12   Global Step: 62870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:34,056-Speed 10860.28 samples/sec   Loss 8.8121   LearningRate 0.0475   Epoch: 12   Global Step: 62880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:35,023-Speed 10598.05 samples/sec   Loss 8.8397   LearningRate 0.0475   Epoch: 12   Global Step: 62890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:35,972-Speed 10799.11 samples/sec   Loss 8.7838   LearningRate 0.0475   Epoch: 12   Global Step: 62900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:36:36,911-Speed 10920.71 samples/sec   Loss 8.8257   LearningRate 0.0475   Epoch: 12   Global Step: 62910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:36:37,894-Speed 10430.91 samples/sec   Loss 8.8636   LearningRate 0.0475   Epoch: 12   Global Step: 62920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:36:38,865-Speed 10557.72 samples/sec   Loss 8.6106   LearningRate 0.0475   Epoch: 12   Global Step: 62930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:36:39,827-Speed 10654.67 samples/sec   Loss 8.8107   LearningRate 0.0475   Epoch: 12   Global Step: 62940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:36:40,762-Speed 10965.29 samples/sec   Loss 8.8474   LearningRate 0.0475   Epoch: 12   Global Step: 62950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:36:41,742-Speed 10463.46 samples/sec   Loss 8.8710   LearningRate 0.0474   Epoch: 12   Global Step: 62960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:36:42,666-Speed 11090.79 samples/sec   Loss 8.7250   LearningRate 0.0474   Epoch: 12   Global Step: 62970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:36:43,642-Speed 10498.66 samples/sec   Loss 8.7313   LearningRate 0.0474   Epoch: 12   Global Step: 62980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:36:44,622-Speed 10457.20 samples/sec   Loss 8.8798   LearningRate 0.0474   Epoch: 12   Global Step: 62990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:36:45,562-Speed 10910.80 samples/sec   Loss 8.8480   LearningRate 0.0474   Epoch: 12   Global Step: 63000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:46,479-Speed 11178.53 samples/sec   Loss 8.8625   LearningRate 0.0474   Epoch: 12   Global Step: 63010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:47,507-Speed 9974.05 samples/sec   Loss 8.8269   LearningRate 0.0474   Epoch: 12   Global Step: 63020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:48,504-Speed 10281.09 samples/sec   Loss 8.7109   LearningRate 0.0474   Epoch: 12   Global Step: 63030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:49,433-Speed 11035.08 samples/sec   Loss 8.9265   LearningRate 0.0474   Epoch: 12   Global Step: 63040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:50,382-Speed 10805.13 samples/sec   Loss 8.8307   LearningRate 0.0474   Epoch: 12   Global Step: 63050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:51,370-Speed 10369.46 samples/sec   Loss 8.8277   LearningRate 0.0474   Epoch: 12   Global Step: 63060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:52,343-Speed 10532.96 samples/sec   Loss 8.9414   LearningRate 0.0474   Epoch: 12   Global Step: 63070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:53,277-Speed 10977.30 samples/sec   Loss 8.8249   LearningRate 0.0474   Epoch: 12   Global Step: 63080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:54,224-Speed 10825.76 samples/sec   Loss 8.7547   LearningRate 0.0474   Epoch: 12   Global Step: 63090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:55,188-Speed 10631.41 samples/sec   Loss 8.9333   LearningRate 0.0474   Epoch: 12   Global Step: 63100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:36:56,144-Speed 10718.64 samples/sec   Loss 8.7511   LearningRate 0.0473   Epoch: 12   Global Step: 63110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:57,134-Speed 10354.73 samples/sec   Loss 8.9420   LearningRate 0.0473   Epoch: 12   Global Step: 63120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:58,166-Speed 9937.41 samples/sec   Loss 8.7576   LearningRate 0.0473   Epoch: 12   Global Step: 63130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:36:59,162-Speed 10291.99 samples/sec   Loss 8.9197   LearningRate 0.0473   Epoch: 12   Global Step: 63140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:00,169-Speed 10175.80 samples/sec   Loss 8.7235   LearningRate 0.0473   Epoch: 12   Global Step: 63150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:01,182-Speed 10116.88 samples/sec   Loss 8.7998   LearningRate 0.0473   Epoch: 12   Global Step: 63160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:02,177-Speed 10308.38 samples/sec   Loss 8.7827   LearningRate 0.0473   Epoch: 12   Global Step: 63170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:03,125-Speed 10813.03 samples/sec   Loss 8.7776   LearningRate 0.0473   Epoch: 12   Global Step: 63180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:04,117-Speed 10331.33 samples/sec   Loss 8.8916   LearningRate 0.0473   Epoch: 12   Global Step: 63190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:05,069-Speed 10763.58 samples/sec   Loss 8.8789   LearningRate 0.0473   Epoch: 12   Global Step: 63200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:06,037-Speed 10586.61 samples/sec   Loss 8.8732   LearningRate 0.0473   Epoch: 12   Global Step: 63210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:07,010-Speed 10540.75 samples/sec   Loss 8.8347   LearningRate 0.0473   Epoch: 12   Global Step: 63220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:07,950-Speed 10898.75 samples/sec   Loss 8.8802   LearningRate 0.0473   Epoch: 12   Global Step: 63230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:08,956-Speed 10193.27 samples/sec   Loss 8.7902   LearningRate 0.0473   Epoch: 12   Global Step: 63240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:09,931-Speed 10510.07 samples/sec   Loss 8.8357   LearningRate 0.0472   Epoch: 12   Global Step: 63250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:10,894-Speed 10640.47 samples/sec   Loss 8.7332   LearningRate 0.0472   Epoch: 12   Global Step: 63260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:11,844-Speed 10785.55 samples/sec   Loss 8.8264   LearningRate 0.0472   Epoch: 12   Global Step: 63270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:12,828-Speed 10415.01 samples/sec   Loss 8.7616   LearningRate 0.0472   Epoch: 12   Global Step: 63280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:13,809-Speed 10449.39 samples/sec   Loss 8.9673   LearningRate 0.0472   Epoch: 12   Global Step: 63290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:14,760-Speed 10777.39 samples/sec   Loss 8.8647   LearningRate 0.0472   Epoch: 12   Global Step: 63300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:15,704-Speed 10862.59 samples/sec   Loss 8.8832   LearningRate 0.0472   Epoch: 12   Global Step: 63310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:16,632-Speed 11042.32 samples/sec   Loss 8.7660   LearningRate 0.0472   Epoch: 12   Global Step: 63320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:17,627-Speed 10305.97 samples/sec   Loss 8.7605   LearningRate 0.0472   Epoch: 12   Global Step: 63330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:18,595-Speed 10590.61 samples/sec   Loss 8.8079   LearningRate 0.0472   Epoch: 12   Global Step: 63340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:19,563-Speed 10599.59 samples/sec   Loss 8.9350   LearningRate 0.0472   Epoch: 12   Global Step: 63350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:20,567-Speed 10206.97 samples/sec   Loss 8.8793   LearningRate 0.0472   Epoch: 12   Global Step: 63360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:21,568-Speed 10247.10 samples/sec   Loss 8.7443   LearningRate 0.0472   Epoch: 12   Global Step: 63370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:22,540-Speed 10538.47 samples/sec   Loss 8.7929   LearningRate 0.0472   Epoch: 12   Global Step: 63380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:23,492-Speed 10772.16 samples/sec   Loss 8.7547   LearningRate 0.0472   Epoch: 12   Global Step: 63390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:24,480-Speed 10366.25 samples/sec   Loss 8.7555   LearningRate 0.0471   Epoch: 12   Global Step: 63400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:25,433-Speed 10762.78 samples/sec   Loss 8.8118   LearningRate 0.0471   Epoch: 12   Global Step: 63410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:26,430-Speed 10284.02 samples/sec   Loss 9.0722   LearningRate 0.0471   Epoch: 12   Global Step: 63420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:27,353-Speed 11103.48 samples/sec   Loss 8.8650   LearningRate 0.0471   Epoch: 12   Global Step: 63430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:28,286-Speed 10987.23 samples/sec   Loss 8.7485   LearningRate 0.0471   Epoch: 12   Global Step: 63440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:29,265-Speed 10460.22 samples/sec   Loss 8.9413   LearningRate 0.0471   Epoch: 12   Global Step: 63450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:30,255-Speed 10359.34 samples/sec   Loss 8.9859   LearningRate 0.0471   Epoch: 12   Global Step: 63460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:31,263-Speed 10174.61 samples/sec   Loss 8.9300   LearningRate 0.0471   Epoch: 12   Global Step: 63470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:32,214-Speed 10779.25 samples/sec   Loss 8.8909   LearningRate 0.0471   Epoch: 12   Global Step: 63480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:33,223-Speed 10157.46 samples/sec   Loss 8.9269   LearningRate 0.0471   Epoch: 12   Global Step: 63490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:34,201-Speed 10483.60 samples/sec   Loss 8.8308   LearningRate 0.0471   Epoch: 12   Global Step: 63500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:35,125-Speed 11084.27 samples/sec   Loss 8.8331   LearningRate 0.0471   Epoch: 12   Global Step: 63510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:36,106-Speed 10454.98 samples/sec   Loss 8.8616   LearningRate 0.0471   Epoch: 12   Global Step: 63520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:37,074-Speed 10593.42 samples/sec   Loss 8.7480   LearningRate 0.0471   Epoch: 12   Global Step: 63530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:38,073-Speed 10255.39 samples/sec   Loss 9.0292   LearningRate 0.0471   Epoch: 12   Global Step: 63540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:39,043-Speed 10566.96 samples/sec   Loss 8.8564   LearningRate 0.0470   Epoch: 12   Global Step: 63550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:40,048-Speed 10202.03 samples/sec   Loss 8.9249   LearningRate 0.0470   Epoch: 12   Global Step: 63560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:37:41,007-Speed 10688.21 samples/sec   Loss 8.8680   LearningRate 0.0470   Epoch: 12   Global Step: 63570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:37:41,984-Speed 10484.83 samples/sec   Loss 8.6513   LearningRate 0.0470   Epoch: 12   Global Step: 63580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:37:42,936-Speed 10773.03 samples/sec   Loss 8.9125   LearningRate 0.0470   Epoch: 12   Global Step: 63590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:37:43,905-Speed 10569.63 samples/sec   Loss 8.7016   LearningRate 0.0470   Epoch: 12   Global Step: 63600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:37:44,887-Speed 10436.70 samples/sec   Loss 8.8189   LearningRate 0.0470   Epoch: 12   Global Step: 63610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:37:45,841-Speed 10747.94 samples/sec   Loss 8.7841   LearningRate 0.0470   Epoch: 12   Global Step: 63620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:37:46,803-Speed 10649.08 samples/sec   Loss 8.8161   LearningRate 0.0470   Epoch: 12   Global Step: 63630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:37:47,792-Speed 10371.96 samples/sec   Loss 8.8665   LearningRate 0.0470   Epoch: 12   Global Step: 63640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:37:48,751-Speed 10677.51 samples/sec   Loss 8.9180   LearningRate 0.0470   Epoch: 12   Global Step: 63650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:37:49,751-Speed 10250.87 samples/sec   Loss 8.7211   LearningRate 0.0470   Epoch: 12   Global Step: 63660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:37:50,722-Speed 10560.32 samples/sec   Loss 8.6741   LearningRate 0.0470   Epoch: 12   Global Step: 63670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:51,665-Speed 10867.39 samples/sec   Loss 8.7538   LearningRate 0.0470   Epoch: 12   Global Step: 63680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:52,631-Speed 10604.77 samples/sec   Loss 8.7435   LearningRate 0.0470   Epoch: 12   Global Step: 63690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:53,632-Speed 10243.10 samples/sec   Loss 8.9613   LearningRate 0.0469   Epoch: 12   Global Step: 63700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:54,585-Speed 10765.34 samples/sec   Loss 8.9152   LearningRate 0.0469   Epoch: 12   Global Step: 63710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:55,511-Speed 11066.13 samples/sec   Loss 8.6255   LearningRate 0.0469   Epoch: 12   Global Step: 63720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:56,487-Speed 10508.94 samples/sec   Loss 8.8400   LearningRate 0.0469   Epoch: 12   Global Step: 63730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:57,433-Speed 10823.64 samples/sec   Loss 8.6904   LearningRate 0.0469   Epoch: 12   Global Step: 63740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:58,408-Speed 10514.69 samples/sec   Loss 8.7851   LearningRate 0.0469   Epoch: 12   Global Step: 63750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:37:59,360-Speed 10773.64 samples/sec   Loss 8.8560   LearningRate 0.0469   Epoch: 12   Global Step: 63760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:38:00,307-Speed 10819.50 samples/sec   Loss 8.7805   LearningRate 0.0469   Epoch: 12   Global Step: 63770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:38:01,292-Speed 10409.32 samples/sec   Loss 8.6672   LearningRate 0.0469   Epoch: 12   Global Step: 63780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:38:02,255-Speed 10648.29 samples/sec   Loss 8.9147   LearningRate 0.0469   Epoch: 12   Global Step: 63790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:38:03,198-Speed 10869.81 samples/sec   Loss 8.9553   LearningRate 0.0469   Epoch: 12   Global Step: 63800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:38:04,187-Speed 10356.61 samples/sec   Loss 9.0431   LearningRate 0.0469   Epoch: 12   Global Step: 63810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:38:05,177-Speed 10360.78 samples/sec   Loss 8.9005   LearningRate 0.0469   Epoch: 12   Global Step: 63820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:38:06,152-Speed 10512.95 samples/sec   Loss 8.7378   LearningRate 0.0469   Epoch: 12   Global Step: 63830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:38:07,093-Speed 10888.67 samples/sec   Loss 8.8374   LearningRate 0.0468   Epoch: 12   Global Step: 63840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:38:08,074-Speed 10448.94 samples/sec   Loss 8.9053   LearningRate 0.0468   Epoch: 12   Global Step: 63850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:38:09,068-Speed 10314.15 samples/sec   Loss 8.8982   LearningRate 0.0468   Epoch: 12   Global Step: 63860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:38:10,001-Speed 10988.31 samples/sec   Loss 8.9224   LearningRate 0.0468   Epoch: 12   Global Step: 63870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:38:10,981-Speed 10452.14 samples/sec   Loss 8.9199   LearningRate 0.0468   Epoch: 12   Global Step: 63880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:38:11,955-Speed 10535.15 samples/sec   Loss 8.8844   LearningRate 0.0468   Epoch: 12   Global Step: 63890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:38:12,985-Speed 9945.50 samples/sec   Loss 8.9930   LearningRate 0.0468   Epoch: 12   Global Step: 63900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:38:13,966-Speed 10450.48 samples/sec   Loss 8.8716   LearningRate 0.0468   Epoch: 12   Global Step: 63910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:38:14,947-Speed 10446.38 samples/sec   Loss 8.8143   LearningRate 0.0468   Epoch: 12   Global Step: 63920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:38:15,924-Speed 10497.69 samples/sec   Loss 8.8433   LearningRate 0.0468   Epoch: 12   Global Step: 63930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:38:16,885-Speed 10665.34 samples/sec   Loss 8.6881   LearningRate 0.0468   Epoch: 12   Global Step: 63940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:38:17,864-Speed 10467.53 samples/sec   Loss 8.6106   LearningRate 0.0468   Epoch: 12   Global Step: 63950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:38:18,850-Speed 10394.03 samples/sec   Loss 8.9521   LearningRate 0.0468   Epoch: 12   Global Step: 63960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:38:19,810-Speed 10686.19 samples/sec   Loss 8.8653   LearningRate 0.0468   Epoch: 12   Global Step: 63970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:38:20,783-Speed 10530.84 samples/sec   Loss 8.8001   LearningRate 0.0468   Epoch: 12   Global Step: 63980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:38:21,755-Speed 10549.92 samples/sec   Loss 8.8639   LearningRate 0.0467   Epoch: 12   Global Step: 63990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:38:22,708-Speed 10762.22 samples/sec   Loss 8.6873   LearningRate 0.0467   Epoch: 12   Global Step: 64000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:38:45,052-[lfw][64000]XNorm: 12.742657
Training: 2022-04-11 01:38:45,052-[lfw][64000]Accuracy-Flip: 0.99450+-0.00388
Training: 2022-04-11 01:38:45,052-[lfw][64000]Accuracy-Highest: 0.99550
Training: 2022-04-11 01:39:13,134-[cfp_fp][64000]XNorm: 10.786077
Training: 2022-04-11 01:39:13,135-[cfp_fp][64000]Accuracy-Flip: 0.95657+-0.01101
Training: 2022-04-11 01:39:13,135-[cfp_fp][64000]Accuracy-Highest: 0.95657
Training: 2022-04-11 01:39:35,257-[agedb_30][64000]XNorm: 12.451322
Training: 2022-04-11 01:39:35,257-[agedb_30][64000]Accuracy-Flip: 0.96117+-0.00830
Training: 2022-04-11 01:39:35,258-[agedb_30][64000]Accuracy-Highest: 0.96283
Training: 2022-04-11 01:39:36,195-Speed 139.35 samples/sec   Loss 8.9279   LearningRate 0.0467   Epoch: 12   Global Step: 64010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:39:37,167-Speed 10543.61 samples/sec   Loss 8.8046   LearningRate 0.0467   Epoch: 12   Global Step: 64020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:39:38,114-Speed 10826.65 samples/sec   Loss 8.7948   LearningRate 0.0467   Epoch: 12   Global Step: 64030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:39:39,048-Speed 10969.90 samples/sec   Loss 8.9975   LearningRate 0.0467   Epoch: 12   Global Step: 64040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:39:40,039-Speed 10347.83 samples/sec   Loss 8.9727   LearningRate 0.0467   Epoch: 12   Global Step: 64050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:39:41,046-Speed 10174.33 samples/sec   Loss 8.8282   LearningRate 0.0467   Epoch: 12   Global Step: 64060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:39:41,987-Speed 10893.24 samples/sec   Loss 8.8132   LearningRate 0.0467   Epoch: 12   Global Step: 64070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:39:42,947-Speed 10687.09 samples/sec   Loss 8.7761   LearningRate 0.0467   Epoch: 12   Global Step: 64080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:39:43,991-Speed 9814.19 samples/sec   Loss 8.9252   LearningRate 0.0467   Epoch: 12   Global Step: 64090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:39:44,986-Speed 10303.32 samples/sec   Loss 8.8094   LearningRate 0.0467   Epoch: 12   Global Step: 64100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:39:45,959-Speed 10536.36 samples/sec   Loss 8.7345   LearningRate 0.0467   Epoch: 12   Global Step: 64110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:39:46,950-Speed 10342.60 samples/sec   Loss 8.7239   LearningRate 0.0467   Epoch: 12   Global Step: 64120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:39:47,938-Speed 10372.73 samples/sec   Loss 8.8766   LearningRate 0.0467   Epoch: 12   Global Step: 64130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:39:48,886-Speed 10817.36 samples/sec   Loss 8.9163   LearningRate 0.0466   Epoch: 12   Global Step: 64140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:39:49,864-Speed 10476.31 samples/sec   Loss 8.8201   LearningRate 0.0466   Epoch: 12   Global Step: 64150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:39:50,865-Speed 10239.27 samples/sec   Loss 8.7918   LearningRate 0.0466   Epoch: 12   Global Step: 64160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:39:51,804-Speed 10914.87 samples/sec   Loss 8.8518   LearningRate 0.0466   Epoch: 12   Global Step: 64170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:39:52,770-Speed 10616.64 samples/sec   Loss 8.8778   LearningRate 0.0466   Epoch: 12   Global Step: 64180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:39:53,757-Speed 10380.39 samples/sec   Loss 8.8396   LearningRate 0.0466   Epoch: 12   Global Step: 64190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:39:54,780-Speed 10024.91 samples/sec   Loss 8.7421   LearningRate 0.0466   Epoch: 12   Global Step: 64200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:39:55,739-Speed 10689.68 samples/sec   Loss 8.6359   LearningRate 0.0466   Epoch: 12   Global Step: 64210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:39:56,707-Speed 10579.21 samples/sec   Loss 8.9494   LearningRate 0.0466   Epoch: 12   Global Step: 64220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:39:57,665-Speed 10698.29 samples/sec   Loss 8.7352   LearningRate 0.0466   Epoch: 12   Global Step: 64230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:39:58,652-Speed 10395.41 samples/sec   Loss 8.8266   LearningRate 0.0466   Epoch: 12   Global Step: 64240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:39:59,581-Speed 11032.11 samples/sec   Loss 8.8529   LearningRate 0.0466   Epoch: 12   Global Step: 64250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:00,528-Speed 10817.85 samples/sec   Loss 8.7999   LearningRate 0.0466   Epoch: 12   Global Step: 64260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:01,466-Speed 10929.29 samples/sec   Loss 8.9200   LearningRate 0.0466   Epoch: 12   Global Step: 64270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:02,442-Speed 10496.77 samples/sec   Loss 8.8689   LearningRate 0.0466   Epoch: 12   Global Step: 64280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:03,441-Speed 10266.32 samples/sec   Loss 8.7302   LearningRate 0.0465   Epoch: 12   Global Step: 64290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:04,412-Speed 10552.84 samples/sec   Loss 8.8434   LearningRate 0.0465   Epoch: 12   Global Step: 64300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:05,410-Speed 10264.56 samples/sec   Loss 8.8329   LearningRate 0.0465   Epoch: 12   Global Step: 64310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:06,356-Speed 10838.58 samples/sec   Loss 8.8969   LearningRate 0.0465   Epoch: 12   Global Step: 64320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:07,315-Speed 10684.52 samples/sec   Loss 8.8512   LearningRate 0.0465   Epoch: 12   Global Step: 64330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:08,305-Speed 10358.09 samples/sec   Loss 8.8763   LearningRate 0.0465   Epoch: 12   Global Step: 64340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:09,345-Speed 9852.68 samples/sec   Loss 8.8243   LearningRate 0.0465   Epoch: 12   Global Step: 64350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:10,314-Speed 10585.97 samples/sec   Loss 9.0078   LearningRate 0.0465   Epoch: 12   Global Step: 64360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:11,348-Speed 9904.79 samples/sec   Loss 8.7946   LearningRate 0.0465   Epoch: 12   Global Step: 64370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:12,344-Speed 10296.18 samples/sec   Loss 8.7750   LearningRate 0.0465   Epoch: 12   Global Step: 64380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:13,321-Speed 10484.21 samples/sec   Loss 8.8235   LearningRate 0.0465   Epoch: 12   Global Step: 64390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:14,272-Speed 10776.49 samples/sec   Loss 8.9123   LearningRate 0.0465   Epoch: 12   Global Step: 64400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:15,213-Speed 10895.61 samples/sec   Loss 8.7716   LearningRate 0.0465   Epoch: 12   Global Step: 64410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:16,196-Speed 10428.59 samples/sec   Loss 8.8920   LearningRate 0.0465   Epoch: 12   Global Step: 64420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:17,157-Speed 10653.26 samples/sec   Loss 8.9357   LearningRate 0.0465   Epoch: 12   Global Step: 64430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:18,124-Speed 10609.32 samples/sec   Loss 8.9031   LearningRate 0.0464   Epoch: 12   Global Step: 64440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:19,093-Speed 10573.31 samples/sec   Loss 8.7076   LearningRate 0.0464   Epoch: 12   Global Step: 64450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:20,104-Speed 10133.30 samples/sec   Loss 8.7474   LearningRate 0.0464   Epoch: 12   Global Step: 64460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:21,087-Speed 10435.35 samples/sec   Loss 8.8020   LearningRate 0.0464   Epoch: 12   Global Step: 64470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:22,048-Speed 10669.22 samples/sec   Loss 8.9222   LearningRate 0.0464   Epoch: 12   Global Step: 64480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:23,013-Speed 10616.66 samples/sec   Loss 8.7944   LearningRate 0.0464   Epoch: 12   Global Step: 64490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:24,030-Speed 10082.13 samples/sec   Loss 8.9733   LearningRate 0.0464   Epoch: 12   Global Step: 64500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:25,011-Speed 10449.30 samples/sec   Loss 8.8049   LearningRate 0.0464   Epoch: 12   Global Step: 64510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:25,969-Speed 10692.89 samples/sec   Loss 8.8643   LearningRate 0.0464   Epoch: 12   Global Step: 64520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:26,934-Speed 10624.69 samples/sec   Loss 8.7515   LearningRate 0.0464   Epoch: 12   Global Step: 64530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:27,901-Speed 10597.72 samples/sec   Loss 8.9090   LearningRate 0.0464   Epoch: 12   Global Step: 64540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:28,883-Speed 10438.10 samples/sec   Loss 8.7828   LearningRate 0.0464   Epoch: 12   Global Step: 64550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:29,867-Speed 10413.01 samples/sec   Loss 8.8567   LearningRate 0.0464   Epoch: 12   Global Step: 64560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:30,845-Speed 10479.16 samples/sec   Loss 8.8775   LearningRate 0.0464   Epoch: 12   Global Step: 64570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:31,856-Speed 10138.72 samples/sec   Loss 8.8094   LearningRate 0.0463   Epoch: 12   Global Step: 64580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:32,839-Speed 10432.66 samples/sec   Loss 8.9268   LearningRate 0.0463   Epoch: 12   Global Step: 64590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:33,774-Speed 10963.65 samples/sec   Loss 8.7298   LearningRate 0.0463   Epoch: 12   Global Step: 64600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:34,688-Speed 11213.27 samples/sec   Loss 8.8644   LearningRate 0.0463   Epoch: 12   Global Step: 64610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:35,670-Speed 10427.60 samples/sec   Loss 8.7720   LearningRate 0.0463   Epoch: 12   Global Step: 64620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:36,636-Speed 10617.68 samples/sec   Loss 8.8612   LearningRate 0.0463   Epoch: 12   Global Step: 64630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:37,588-Speed 10782.66 samples/sec   Loss 8.9606   LearningRate 0.0463   Epoch: 12   Global Step: 64640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:38,499-Speed 11255.60 samples/sec   Loss 8.6825   LearningRate 0.0463   Epoch: 12   Global Step: 64650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:39,556-Speed 9694.04 samples/sec   Loss 8.8709   LearningRate 0.0463   Epoch: 12   Global Step: 64660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:40,552-Speed 10295.36 samples/sec   Loss 8.7860   LearningRate 0.0463   Epoch: 12   Global Step: 64670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:41,488-Speed 10945.17 samples/sec   Loss 8.8787   LearningRate 0.0463   Epoch: 12   Global Step: 64680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:42,543-Speed 9714.82 samples/sec   Loss 8.7057   LearningRate 0.0463   Epoch: 12   Global Step: 64690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:43,540-Speed 10283.97 samples/sec   Loss 8.7729   LearningRate 0.0463   Epoch: 12   Global Step: 64700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:44,457-Speed 11176.00 samples/sec   Loss 8.7871   LearningRate 0.0463   Epoch: 12   Global Step: 64710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:45,406-Speed 10799.75 samples/sec   Loss 8.8836   LearningRate 0.0463   Epoch: 12   Global Step: 64720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:46,351-Speed 10838.20 samples/sec   Loss 8.8993   LearningRate 0.0462   Epoch: 12   Global Step: 64730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:47,361-Speed 10147.49 samples/sec   Loss 8.9378   LearningRate 0.0462   Epoch: 12   Global Step: 64740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:48,326-Speed 10627.72 samples/sec   Loss 8.8390   LearningRate 0.0462   Epoch: 12   Global Step: 64750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:49,263-Speed 10939.97 samples/sec   Loss 8.8912   LearningRate 0.0462   Epoch: 12   Global Step: 64760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:50,223-Speed 10677.88 samples/sec   Loss 8.8063   LearningRate 0.0462   Epoch: 12   Global Step: 64770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:51,250-Speed 9968.24 samples/sec   Loss 8.7007   LearningRate 0.0462   Epoch: 12   Global Step: 64780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:40:52,188-Speed 10943.52 samples/sec   Loss 8.9070   LearningRate 0.0462   Epoch: 12   Global Step: 64790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:53,163-Speed 10516.64 samples/sec   Loss 8.9101   LearningRate 0.0462   Epoch: 12   Global Step: 64800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:54,070-Speed 11293.97 samples/sec   Loss 9.0354   LearningRate 0.0462   Epoch: 12   Global Step: 64810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:55,087-Speed 10079.96 samples/sec   Loss 8.9793   LearningRate 0.0462   Epoch: 12   Global Step: 64820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:56,069-Speed 10429.21 samples/sec   Loss 8.8194   LearningRate 0.0462   Epoch: 12   Global Step: 64830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:57,011-Speed 10889.17 samples/sec   Loss 8.8769   LearningRate 0.0462   Epoch: 12   Global Step: 64840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:57,982-Speed 10553.17 samples/sec   Loss 8.8169   LearningRate 0.0462   Epoch: 12   Global Step: 64850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:58,972-Speed 10355.32 samples/sec   Loss 8.7502   LearningRate 0.0462   Epoch: 12   Global Step: 64860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:40:59,932-Speed 10675.22 samples/sec   Loss 8.7717   LearningRate 0.0462   Epoch: 12   Global Step: 64870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:00,911-Speed 10472.21 samples/sec   Loss 8.8232   LearningRate 0.0461   Epoch: 12   Global Step: 64880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:01,851-Speed 10896.12 samples/sec   Loss 8.8373   LearningRate 0.0461   Epoch: 12   Global Step: 64890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:41:02,840-Speed 10364.56 samples/sec   Loss 8.9084   LearningRate 0.0461   Epoch: 12   Global Step: 64900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:41:03,826-Speed 10403.57 samples/sec   Loss 8.7719   LearningRate 0.0461   Epoch: 12   Global Step: 64910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:41:04,752-Speed 11058.39 samples/sec   Loss 8.6381   LearningRate 0.0461   Epoch: 12   Global Step: 64920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:41:05,701-Speed 10811.15 samples/sec   Loss 8.9137   LearningRate 0.0461   Epoch: 12   Global Step: 64930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:41:06,723-Speed 10026.47 samples/sec   Loss 8.7396   LearningRate 0.0461   Epoch: 12   Global Step: 64940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:41:07,659-Speed 10961.52 samples/sec   Loss 8.8798   LearningRate 0.0461   Epoch: 12   Global Step: 64950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:08,672-Speed 10110.79 samples/sec   Loss 8.7785   LearningRate 0.0461   Epoch: 12   Global Step: 64960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:09,678-Speed 10192.36 samples/sec   Loss 8.9846   LearningRate 0.0461   Epoch: 12   Global Step: 64970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:10,643-Speed 10623.01 samples/sec   Loss 8.8820   LearningRate 0.0461   Epoch: 12   Global Step: 64980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:11,601-Speed 10690.13 samples/sec   Loss 8.6807   LearningRate 0.0461   Epoch: 12   Global Step: 64990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:12,663-Speed 9661.33 samples/sec   Loss 8.8135   LearningRate 0.0461   Epoch: 12   Global Step: 65000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:13,629-Speed 10611.51 samples/sec   Loss 8.8878   LearningRate 0.0461   Epoch: 12   Global Step: 65010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:14,597-Speed 10588.34 samples/sec   Loss 8.8830   LearningRate 0.0461   Epoch: 12   Global Step: 65020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:15,578-Speed 10442.01 samples/sec   Loss 8.9168   LearningRate 0.0460   Epoch: 12   Global Step: 65030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:16,571-Speed 10330.61 samples/sec   Loss 8.9054   LearningRate 0.0460   Epoch: 12   Global Step: 65040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:17,519-Speed 10803.33 samples/sec   Loss 8.8556   LearningRate 0.0460   Epoch: 12   Global Step: 65050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:41:18,485-Speed 10616.90 samples/sec   Loss 8.9157   LearningRate 0.0460   Epoch: 12   Global Step: 65060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:41:19,499-Speed 10104.11 samples/sec   Loss 8.8241   LearningRate 0.0460   Epoch: 12   Global Step: 65070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:41:20,470-Speed 10565.42 samples/sec   Loss 8.8798   LearningRate 0.0460   Epoch: 12   Global Step: 65080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:41:21,395-Speed 11082.00 samples/sec   Loss 8.7731   LearningRate 0.0460   Epoch: 12   Global Step: 65090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:22,331-Speed 10951.84 samples/sec   Loss 8.6930   LearningRate 0.0460   Epoch: 12   Global Step: 65100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:23,373-Speed 9837.59 samples/sec   Loss 8.6403   LearningRate 0.0460   Epoch: 12   Global Step: 65110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:24,332-Speed 10691.48 samples/sec   Loss 8.8083   LearningRate 0.0460   Epoch: 12   Global Step: 65120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:25,274-Speed 10881.10 samples/sec   Loss 8.9132   LearningRate 0.0460   Epoch: 12   Global Step: 65130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:26,226-Speed 10763.92 samples/sec   Loss 8.7749   LearningRate 0.0460   Epoch: 12   Global Step: 65140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:27,192-Speed 10602.18 samples/sec   Loss 8.8410   LearningRate 0.0460   Epoch: 12   Global Step: 65150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:28,189-Speed 10287.99 samples/sec   Loss 8.9745   LearningRate 0.0460   Epoch: 12   Global Step: 65160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:29,143-Speed 10751.75 samples/sec   Loss 8.7887   LearningRate 0.0460   Epoch: 12   Global Step: 65170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:30,086-Speed 10861.02 samples/sec   Loss 8.8558   LearningRate 0.0459   Epoch: 12   Global Step: 65180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:31,083-Speed 10285.13 samples/sec   Loss 8.8663   LearningRate 0.0459   Epoch: 12   Global Step: 65190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:32,011-Speed 11044.08 samples/sec   Loss 8.8871   LearningRate 0.0459   Epoch: 12   Global Step: 65200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:32,967-Speed 10727.89 samples/sec   Loss 8.9181   LearningRate 0.0459   Epoch: 12   Global Step: 65210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:33,895-Speed 11044.33 samples/sec   Loss 8.9216   LearningRate 0.0459   Epoch: 12   Global Step: 65220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:34,938-Speed 9825.82 samples/sec   Loss 8.8072   LearningRate 0.0459   Epoch: 12   Global Step: 65230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:35,896-Speed 10700.27 samples/sec   Loss 8.7856   LearningRate 0.0459   Epoch: 12   Global Step: 65240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:36,880-Speed 10405.73 samples/sec   Loss 8.7400   LearningRate 0.0459   Epoch: 12   Global Step: 65250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:37,886-Speed 10191.14 samples/sec   Loss 8.7216   LearningRate 0.0459   Epoch: 12   Global Step: 65260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:38,806-Speed 11147.20 samples/sec   Loss 8.8151   LearningRate 0.0459   Epoch: 12   Global Step: 65270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:39,823-Speed 10081.15 samples/sec   Loss 8.9632   LearningRate 0.0459   Epoch: 12   Global Step: 65280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:40,789-Speed 10616.24 samples/sec   Loss 8.8222   LearningRate 0.0459   Epoch: 12   Global Step: 65290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:41,714-Speed 11086.82 samples/sec   Loss 8.9154   LearningRate 0.0459   Epoch: 12   Global Step: 65300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:42,690-Speed 10498.96 samples/sec   Loss 8.8859   LearningRate 0.0459   Epoch: 12   Global Step: 65310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:43,692-Speed 10228.66 samples/sec   Loss 8.8140   LearningRate 0.0459   Epoch: 12   Global Step: 65320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:44,651-Speed 10684.36 samples/sec   Loss 8.9115   LearningRate 0.0458   Epoch: 12   Global Step: 65330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:45,614-Speed 10648.49 samples/sec   Loss 8.7541   LearningRate 0.0458   Epoch: 12   Global Step: 65340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:46,571-Speed 10700.86 samples/sec   Loss 8.8093   LearningRate 0.0458   Epoch: 12   Global Step: 65350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:47,584-Speed 10133.66 samples/sec   Loss 8.6735   LearningRate 0.0458   Epoch: 12   Global Step: 65360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:48,551-Speed 10600.23 samples/sec   Loss 8.9605   LearningRate 0.0458   Epoch: 12   Global Step: 65370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:49,557-Speed 10178.96 samples/sec   Loss 8.7226   LearningRate 0.0458   Epoch: 12   Global Step: 65380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:50,538-Speed 10451.32 samples/sec   Loss 8.7877   LearningRate 0.0458   Epoch: 12   Global Step: 65390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:51,505-Speed 10604.32 samples/sec   Loss 8.7860   LearningRate 0.0458   Epoch: 12   Global Step: 65400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:41:52,446-Speed 10895.38 samples/sec   Loss 8.7230   LearningRate 0.0458   Epoch: 12   Global Step: 65410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:53,396-Speed 10779.76 samples/sec   Loss 8.8632   LearningRate 0.0458   Epoch: 12   Global Step: 65420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:54,387-Speed 10351.62 samples/sec   Loss 8.7383   LearningRate 0.0458   Epoch: 12   Global Step: 65430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:55,332-Speed 10842.92 samples/sec   Loss 8.7183   LearningRate 0.0458   Epoch: 12   Global Step: 65440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:56,271-Speed 10910.56 samples/sec   Loss 8.8138   LearningRate 0.0458   Epoch: 12   Global Step: 65450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:57,297-Speed 9995.19 samples/sec   Loss 8.8331   LearningRate 0.0458   Epoch: 12   Global Step: 65460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:58,266-Speed 10579.93 samples/sec   Loss 8.7747   LearningRate 0.0458   Epoch: 12   Global Step: 65470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:41:59,239-Speed 10536.98 samples/sec   Loss 8.7867   LearningRate 0.0457   Epoch: 12   Global Step: 65480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:42:00,214-Speed 10502.29 samples/sec   Loss 9.0563   LearningRate 0.0457   Epoch: 12   Global Step: 65490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:42:01,224-Speed 10153.77 samples/sec   Loss 8.7729   LearningRate 0.0457   Epoch: 12   Global Step: 65500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:42:02,192-Speed 10582.27 samples/sec   Loss 8.8836   LearningRate 0.0457   Epoch: 12   Global Step: 65510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:03,136-Speed 10863.22 samples/sec   Loss 8.8060   LearningRate 0.0457   Epoch: 12   Global Step: 65520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:04,138-Speed 10221.53 samples/sec   Loss 8.8003   LearningRate 0.0457   Epoch: 12   Global Step: 65530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:05,128-Speed 10356.93 samples/sec   Loss 9.0305   LearningRate 0.0457   Epoch: 12   Global Step: 65540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:06,080-Speed 10768.26 samples/sec   Loss 8.6902   LearningRate 0.0457   Epoch: 12   Global Step: 65550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:07,060-Speed 10460.62 samples/sec   Loss 8.7407   LearningRate 0.0457   Epoch: 12   Global Step: 65560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:08,039-Speed 10470.23 samples/sec   Loss 8.6517   LearningRate 0.0457   Epoch: 12   Global Step: 65570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:08,991-Speed 10773.70 samples/sec   Loss 8.8955   LearningRate 0.0457   Epoch: 12   Global Step: 65580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:09,935-Speed 10849.77 samples/sec   Loss 8.8827   LearningRate 0.0457   Epoch: 12   Global Step: 65590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:10,923-Speed 10374.95 samples/sec   Loss 8.9101   LearningRate 0.0457   Epoch: 12   Global Step: 65600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:11,913-Speed 10356.66 samples/sec   Loss 8.7448   LearningRate 0.0457   Epoch: 12   Global Step: 65610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:42:12,892-Speed 10464.46 samples/sec   Loss 8.8881   LearningRate 0.0457   Epoch: 12   Global Step: 65620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:13,892-Speed 10244.09 samples/sec   Loss 8.9642   LearningRate 0.0456   Epoch: 12   Global Step: 65630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:14,872-Speed 10461.74 samples/sec   Loss 8.8797   LearningRate 0.0456   Epoch: 12   Global Step: 65640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:15,852-Speed 10463.80 samples/sec   Loss 8.9005   LearningRate 0.0456   Epoch: 12   Global Step: 65650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:16,834-Speed 10437.13 samples/sec   Loss 8.7337   LearningRate 0.0456   Epoch: 12   Global Step: 65660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:17,837-Speed 10223.34 samples/sec   Loss 8.9238   LearningRate 0.0456   Epoch: 12   Global Step: 65670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:18,765-Speed 11054.28 samples/sec   Loss 8.9212   LearningRate 0.0456   Epoch: 12   Global Step: 65680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:19,721-Speed 10730.05 samples/sec   Loss 8.7842   LearningRate 0.0456   Epoch: 12   Global Step: 65690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:20,708-Speed 10382.98 samples/sec   Loss 8.8528   LearningRate 0.0456   Epoch: 12   Global Step: 65700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:21,669-Speed 10670.96 samples/sec   Loss 8.7086   LearningRate 0.0456   Epoch: 12   Global Step: 65710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:22,660-Speed 10344.19 samples/sec   Loss 8.7943   LearningRate 0.0456   Epoch: 12   Global Step: 65720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:42:23,642-Speed 10428.32 samples/sec   Loss 9.0571   LearningRate 0.0456   Epoch: 12   Global Step: 65730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:42:24,685-Speed 9830.21 samples/sec   Loss 8.8555   LearningRate 0.0456   Epoch: 12   Global Step: 65740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:42:25,653-Speed 10595.68 samples/sec   Loss 8.8923   LearningRate 0.0456   Epoch: 12   Global Step: 65750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:42:35,771-Speed 1012.16 samples/sec   Loss 8.2785   LearningRate 0.0456   Epoch: 13   Global Step: 65760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:37,054-Speed 7989.38 samples/sec   Loss 7.8894   LearningRate 0.0456   Epoch: 13   Global Step: 65770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:38,177-Speed 9129.45 samples/sec   Loss 8.0134   LearningRate 0.0455   Epoch: 13   Global Step: 65780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:39,124-Speed 10828.60 samples/sec   Loss 7.7954   LearningRate 0.0455   Epoch: 13   Global Step: 65790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:40,178-Speed 9720.34 samples/sec   Loss 7.9053   LearningRate 0.0455   Epoch: 13   Global Step: 65800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:41,165-Speed 10379.75 samples/sec   Loss 7.8644   LearningRate 0.0455   Epoch: 13   Global Step: 65810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:42,228-Speed 9649.26 samples/sec   Loss 7.9980   LearningRate 0.0455   Epoch: 13   Global Step: 65820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:43,242-Speed 10127.64 samples/sec   Loss 7.8054   LearningRate 0.0455   Epoch: 13   Global Step: 65830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:44,213-Speed 10555.80 samples/sec   Loss 7.9100   LearningRate 0.0455   Epoch: 13   Global Step: 65840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:45,182-Speed 10576.31 samples/sec   Loss 7.9865   LearningRate 0.0455   Epoch: 13   Global Step: 65850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:46,166-Speed 10415.21 samples/sec   Loss 8.0813   LearningRate 0.0455   Epoch: 13   Global Step: 65860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:42:47,133-Speed 10597.77 samples/sec   Loss 8.0097   LearningRate 0.0455   Epoch: 13   Global Step: 65870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:48,100-Speed 10610.52 samples/sec   Loss 7.8697   LearningRate 0.0455   Epoch: 13   Global Step: 65880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:49,428-Speed 7710.14 samples/sec   Loss 7.9782   LearningRate 0.0455   Epoch: 13   Global Step: 65890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:50,410-Speed 10446.04 samples/sec   Loss 7.9542   LearningRate 0.0455   Epoch: 13   Global Step: 65900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:51,350-Speed 10913.58 samples/sec   Loss 8.0448   LearningRate 0.0455   Epoch: 13   Global Step: 65910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:52,407-Speed 9701.74 samples/sec   Loss 7.9021   LearningRate 0.0455   Epoch: 13   Global Step: 65920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:53,432-Speed 10000.16 samples/sec   Loss 8.1040   LearningRate 0.0454   Epoch: 13   Global Step: 65930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:54,376-Speed 10859.77 samples/sec   Loss 7.9538   LearningRate 0.0454   Epoch: 13   Global Step: 65940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:55,332-Speed 10720.70 samples/sec   Loss 8.0518   LearningRate 0.0454   Epoch: 13   Global Step: 65950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:56,297-Speed 10616.47 samples/sec   Loss 8.1660   LearningRate 0.0454   Epoch: 13   Global Step: 65960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:57,269-Speed 10543.52 samples/sec   Loss 8.0342   LearningRate 0.0454   Epoch: 13   Global Step: 65970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:42:58,206-Speed 10943.98 samples/sec   Loss 8.1688   LearningRate 0.0454   Epoch: 13   Global Step: 65980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:42:59,200-Speed 10317.73 samples/sec   Loss 8.1131   LearningRate 0.0454   Epoch: 13   Global Step: 65990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:43:00,223-Speed 10014.90 samples/sec   Loss 8.1039   LearningRate 0.0454   Epoch: 13   Global Step: 66000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:43:22,218-[lfw][66000]XNorm: 12.636602
Training: 2022-04-11 01:43:22,218-[lfw][66000]Accuracy-Flip: 0.99583+-0.00367
Training: 2022-04-11 01:43:22,219-[lfw][66000]Accuracy-Highest: 0.99583
Training: 2022-04-11 01:43:47,661-[cfp_fp][66000]XNorm: 10.710675
Training: 2022-04-11 01:43:47,662-[cfp_fp][66000]Accuracy-Flip: 0.95743+-0.01331
Training: 2022-04-11 01:43:47,663-[cfp_fp][66000]Accuracy-Highest: 0.95743
Training: 2022-04-11 01:44:09,716-[agedb_30][66000]XNorm: 12.305493
Training: 2022-04-11 01:44:09,717-[agedb_30][66000]Accuracy-Flip: 0.95917+-0.00995
Training: 2022-04-11 01:44:09,718-[agedb_30][66000]Accuracy-Highest: 0.96283
Training: 2022-04-11 01:44:10,671-Speed 145.36 samples/sec   Loss 8.0227   LearningRate 0.0454   Epoch: 13   Global Step: 66010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:44:11,607-Speed 10944.55 samples/sec   Loss 8.0856   LearningRate 0.0454   Epoch: 13   Global Step: 66020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:44:12,585-Speed 10479.37 samples/sec   Loss 8.0919   LearningRate 0.0454   Epoch: 13   Global Step: 66030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:44:13,560-Speed 10513.19 samples/sec   Loss 8.0918   LearningRate 0.0454   Epoch: 13   Global Step: 66040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:44:14,546-Speed 10393.06 samples/sec   Loss 8.1874   LearningRate 0.0454   Epoch: 13   Global Step: 66050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:44:15,487-Speed 10892.67 samples/sec   Loss 8.0547   LearningRate 0.0454   Epoch: 13   Global Step: 66060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:44:16,510-Speed 10019.02 samples/sec   Loss 7.9170   LearningRate 0.0454   Epoch: 13   Global Step: 66070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:44:17,483-Speed 10535.56 samples/sec   Loss 8.1358   LearningRate 0.0453   Epoch: 13   Global Step: 66080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:44:18,451-Speed 10586.80 samples/sec   Loss 8.0619   LearningRate 0.0453   Epoch: 13   Global Step: 66090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:44:19,433-Speed 10439.67 samples/sec   Loss 8.2737   LearningRate 0.0453   Epoch: 13   Global Step: 66100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:44:20,447-Speed 10109.64 samples/sec   Loss 8.1355   LearningRate 0.0453   Epoch: 13   Global Step: 66110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:44:21,417-Speed 10575.55 samples/sec   Loss 8.4375   LearningRate 0.0453   Epoch: 13   Global Step: 66120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:44:22,393-Speed 10490.65 samples/sec   Loss 8.2059   LearningRate 0.0453   Epoch: 13   Global Step: 66130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:44:23,350-Speed 10719.51 samples/sec   Loss 8.2172   LearningRate 0.0453   Epoch: 13   Global Step: 66140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:44:24,343-Speed 10324.32 samples/sec   Loss 8.1408   LearningRate 0.0453   Epoch: 13   Global Step: 66150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:44:25,328-Speed 10408.43 samples/sec   Loss 8.2321   LearningRate 0.0453   Epoch: 13   Global Step: 66160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:44:26,289-Speed 10664.78 samples/sec   Loss 8.2300   LearningRate 0.0453   Epoch: 13   Global Step: 66170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:44:27,246-Speed 10708.90 samples/sec   Loss 8.1906   LearningRate 0.0453   Epoch: 13   Global Step: 66180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:44:28,224-Speed 10484.35 samples/sec   Loss 8.2265   LearningRate 0.0453   Epoch: 13   Global Step: 66190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:44:29,157-Speed 10973.39 samples/sec   Loss 8.3399   LearningRate 0.0453   Epoch: 13   Global Step: 66200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:30,109-Speed 10766.16 samples/sec   Loss 8.3044   LearningRate 0.0453   Epoch: 13   Global Step: 66210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:31,119-Speed 10148.19 samples/sec   Loss 7.8974   LearningRate 0.0453   Epoch: 13   Global Step: 66220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:32,111-Speed 10339.68 samples/sec   Loss 8.3361   LearningRate 0.0452   Epoch: 13   Global Step: 66230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:33,080-Speed 10575.47 samples/sec   Loss 8.2042   LearningRate 0.0452   Epoch: 13   Global Step: 66240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:34,061-Speed 10446.77 samples/sec   Loss 8.0520   LearningRate 0.0452   Epoch: 13   Global Step: 66250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:35,019-Speed 10708.49 samples/sec   Loss 8.1668   LearningRate 0.0452   Epoch: 13   Global Step: 66260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:35,972-Speed 10749.41 samples/sec   Loss 8.3058   LearningRate 0.0452   Epoch: 13   Global Step: 66270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:36,960-Speed 10375.22 samples/sec   Loss 8.3324   LearningRate 0.0452   Epoch: 13   Global Step: 66280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:37,903-Speed 10870.14 samples/sec   Loss 8.1947   LearningRate 0.0452   Epoch: 13   Global Step: 66290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:38,865-Speed 10658.52 samples/sec   Loss 8.2554   LearningRate 0.0452   Epoch: 13   Global Step: 66300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:39,833-Speed 10582.78 samples/sec   Loss 8.4090   LearningRate 0.0452   Epoch: 13   Global Step: 66310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:40,818-Speed 10412.72 samples/sec   Loss 8.2859   LearningRate 0.0452   Epoch: 13   Global Step: 66320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:41,777-Speed 10684.71 samples/sec   Loss 8.3219   LearningRate 0.0452   Epoch: 13   Global Step: 66330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:42,712-Speed 10962.14 samples/sec   Loss 8.1942   LearningRate 0.0452   Epoch: 13   Global Step: 66340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:43,678-Speed 10604.71 samples/sec   Loss 8.2270   LearningRate 0.0452   Epoch: 13   Global Step: 66350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:44:44,681-Speed 10216.59 samples/sec   Loss 8.3369   LearningRate 0.0452   Epoch: 13   Global Step: 66360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:44:45,626-Speed 10843.80 samples/sec   Loss 8.4362   LearningRate 0.0452   Epoch: 13   Global Step: 66370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:44:46,602-Speed 10504.78 samples/sec   Loss 8.4117   LearningRate 0.0451   Epoch: 13   Global Step: 66380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:44:47,576-Speed 10517.54 samples/sec   Loss 8.3795   LearningRate 0.0451   Epoch: 13   Global Step: 66390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:44:48,686-Speed 9243.80 samples/sec   Loss 8.3950   LearningRate 0.0451   Epoch: 13   Global Step: 66400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:44:49,676-Speed 10352.85 samples/sec   Loss 8.3123   LearningRate 0.0451   Epoch: 13   Global Step: 66410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:44:50,713-Speed 9880.95 samples/sec   Loss 8.3075   LearningRate 0.0451   Epoch: 13   Global Step: 66420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:44:51,696-Speed 10427.33 samples/sec   Loss 8.1181   LearningRate 0.0451   Epoch: 13   Global Step: 66430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:44:52,669-Speed 10530.67 samples/sec   Loss 8.3238   LearningRate 0.0451   Epoch: 13   Global Step: 66440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:44:53,638-Speed 10576.95 samples/sec   Loss 8.3383   LearningRate 0.0451   Epoch: 13   Global Step: 66450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:54,620-Speed 10440.46 samples/sec   Loss 8.4095   LearningRate 0.0451   Epoch: 13   Global Step: 66460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:55,574-Speed 10742.45 samples/sec   Loss 8.2806   LearningRate 0.0451   Epoch: 13   Global Step: 66470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:56,556-Speed 10429.67 samples/sec   Loss 8.4602   LearningRate 0.0451   Epoch: 13   Global Step: 66480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:57,552-Speed 10288.09 samples/sec   Loss 8.3715   LearningRate 0.0451   Epoch: 13   Global Step: 66490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:58,501-Speed 10823.43 samples/sec   Loss 8.2232   LearningRate 0.0451   Epoch: 13   Global Step: 66500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:44:59,458-Speed 10704.41 samples/sec   Loss 8.3298   LearningRate 0.0451   Epoch: 13   Global Step: 66510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:00,448-Speed 10381.02 samples/sec   Loss 8.4358   LearningRate 0.0451   Epoch: 13   Global Step: 66520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:01,441-Speed 10321.85 samples/sec   Loss 8.1998   LearningRate 0.0450   Epoch: 13   Global Step: 66530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:02,412-Speed 10566.18 samples/sec   Loss 8.2828   LearningRate 0.0450   Epoch: 13   Global Step: 66540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:03,374-Speed 10647.03 samples/sec   Loss 8.2646   LearningRate 0.0450   Epoch: 13   Global Step: 66550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:04,353-Speed 10471.46 samples/sec   Loss 8.4215   LearningRate 0.0450   Epoch: 13   Global Step: 66560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:05,307-Speed 10740.84 samples/sec   Loss 8.3722   LearningRate 0.0450   Epoch: 13   Global Step: 66570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:06,264-Speed 10713.44 samples/sec   Loss 8.3011   LearningRate 0.0450   Epoch: 13   Global Step: 66580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:07,252-Speed 10377.16 samples/sec   Loss 8.2523   LearningRate 0.0450   Epoch: 13   Global Step: 66590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:45:08,188-Speed 10949.65 samples/sec   Loss 8.5085   LearningRate 0.0450   Epoch: 13   Global Step: 66600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:45:09,151-Speed 10635.52 samples/sec   Loss 8.5506   LearningRate 0.0450   Epoch: 13   Global Step: 66610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:45:10,095-Speed 10865.62 samples/sec   Loss 8.4629   LearningRate 0.0450   Epoch: 13   Global Step: 66620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:45:11,062-Speed 10588.27 samples/sec   Loss 8.3852   LearningRate 0.0450   Epoch: 13   Global Step: 66630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:45:12,030-Speed 10591.28 samples/sec   Loss 8.4959   LearningRate 0.0450   Epoch: 13   Global Step: 66640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:45:13,051-Speed 10046.14 samples/sec   Loss 8.3505   LearningRate 0.0450   Epoch: 13   Global Step: 66650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:45:13,974-Speed 11101.73 samples/sec   Loss 8.4407   LearningRate 0.0450   Epoch: 13   Global Step: 66660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:45:14,915-Speed 10891.83 samples/sec   Loss 8.1555   LearningRate 0.0450   Epoch: 13   Global Step: 66670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:45:15,856-Speed 10886.10 samples/sec   Loss 8.3479   LearningRate 0.0449   Epoch: 13   Global Step: 66680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:45:16,826-Speed 10561.48 samples/sec   Loss 8.5452   LearningRate 0.0449   Epoch: 13   Global Step: 66690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:17,781-Speed 10732.74 samples/sec   Loss 8.3575   LearningRate 0.0449   Epoch: 13   Global Step: 66700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:18,756-Speed 10512.77 samples/sec   Loss 8.5513   LearningRate 0.0449   Epoch: 13   Global Step: 66710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:19,756-Speed 10260.89 samples/sec   Loss 8.4316   LearningRate 0.0449   Epoch: 13   Global Step: 66720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:20,684-Speed 11057.06 samples/sec   Loss 8.3981   LearningRate 0.0449   Epoch: 13   Global Step: 66730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:21,647-Speed 10645.71 samples/sec   Loss 8.3463   LearningRate 0.0449   Epoch: 13   Global Step: 66740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:22,597-Speed 10787.05 samples/sec   Loss 8.4133   LearningRate 0.0449   Epoch: 13   Global Step: 66750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:23,597-Speed 10245.19 samples/sec   Loss 8.4357   LearningRate 0.0449   Epoch: 13   Global Step: 66760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:24,568-Speed 10559.56 samples/sec   Loss 8.3558   LearningRate 0.0449   Epoch: 13   Global Step: 66770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:25,499-Speed 11010.94 samples/sec   Loss 8.2961   LearningRate 0.0449   Epoch: 13   Global Step: 66780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:26,475-Speed 10499.90 samples/sec   Loss 8.5576   LearningRate 0.0449   Epoch: 13   Global Step: 66790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:27,529-Speed 9723.87 samples/sec   Loss 8.3674   LearningRate 0.0449   Epoch: 13   Global Step: 66800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:28,516-Speed 10385.88 samples/sec   Loss 8.5250   LearningRate 0.0449   Epoch: 13   Global Step: 66810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:29,480-Speed 10630.12 samples/sec   Loss 8.4504   LearningRate 0.0449   Epoch: 13   Global Step: 66820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:30,464-Speed 10436.07 samples/sec   Loss 8.4151   LearningRate 0.0448   Epoch: 13   Global Step: 66830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:31,418-Speed 10748.32 samples/sec   Loss 8.4869   LearningRate 0.0448   Epoch: 13   Global Step: 66840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:32,376-Speed 10700.25 samples/sec   Loss 8.5334   LearningRate 0.0448   Epoch: 13   Global Step: 66850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:33,383-Speed 10177.07 samples/sec   Loss 8.4958   LearningRate 0.0448   Epoch: 13   Global Step: 66860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:34,357-Speed 10520.25 samples/sec   Loss 8.5222   LearningRate 0.0448   Epoch: 13   Global Step: 66870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:35,276-Speed 11157.90 samples/sec   Loss 8.4640   LearningRate 0.0448   Epoch: 13   Global Step: 66880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:36,256-Speed 10467.15 samples/sec   Loss 8.4728   LearningRate 0.0448   Epoch: 13   Global Step: 66890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:37,205-Speed 10791.49 samples/sec   Loss 8.4463   LearningRate 0.0448   Epoch: 13   Global Step: 66900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:38,209-Speed 10206.89 samples/sec   Loss 8.3584   LearningRate 0.0448   Epoch: 13   Global Step: 66910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:39,196-Speed 10384.87 samples/sec   Loss 8.5967   LearningRate 0.0448   Epoch: 13   Global Step: 66920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:40,165-Speed 10583.27 samples/sec   Loss 8.4645   LearningRate 0.0448   Epoch: 13   Global Step: 66930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:41,140-Speed 10502.44 samples/sec   Loss 8.6496   LearningRate 0.0448   Epoch: 13   Global Step: 66940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:42,138-Speed 10280.24 samples/sec   Loss 8.5241   LearningRate 0.0448   Epoch: 13   Global Step: 66950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:43,078-Speed 10900.17 samples/sec   Loss 8.4197   LearningRate 0.0448   Epoch: 13   Global Step: 66960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:44,065-Speed 10381.38 samples/sec   Loss 8.3798   LearningRate 0.0448   Epoch: 13   Global Step: 66970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:45,048-Speed 10423.41 samples/sec   Loss 8.5144   LearningRate 0.0447   Epoch: 13   Global Step: 66980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:46,004-Speed 10723.27 samples/sec   Loss 8.6112   LearningRate 0.0447   Epoch: 13   Global Step: 66990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:46,965-Speed 10666.49 samples/sec   Loss 8.4248   LearningRate 0.0447   Epoch: 13   Global Step: 67000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:47,905-Speed 10907.79 samples/sec   Loss 8.4395   LearningRate 0.0447   Epoch: 13   Global Step: 67010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:48,837-Speed 10994.71 samples/sec   Loss 8.5577   LearningRate 0.0447   Epoch: 13   Global Step: 67020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:49,787-Speed 10789.24 samples/sec   Loss 8.4020   LearningRate 0.0447   Epoch: 13   Global Step: 67030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:50,850-Speed 9640.64 samples/sec   Loss 8.5303   LearningRate 0.0447   Epoch: 13   Global Step: 67040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:51,810-Speed 10672.74 samples/sec   Loss 8.5605   LearningRate 0.0447   Epoch: 13   Global Step: 67050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:52,780-Speed 10565.66 samples/sec   Loss 8.5134   LearningRate 0.0447   Epoch: 13   Global Step: 67060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:53,731-Speed 10782.13 samples/sec   Loss 8.5676   LearningRate 0.0447   Epoch: 13   Global Step: 67070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:54,686-Speed 10731.83 samples/sec   Loss 8.5371   LearningRate 0.0447   Epoch: 13   Global Step: 67080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:55,645-Speed 10691.50 samples/sec   Loss 8.6782   LearningRate 0.0447   Epoch: 13   Global Step: 67090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:56,569-Speed 11092.32 samples/sec   Loss 8.5583   LearningRate 0.0447   Epoch: 13   Global Step: 67100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:57,545-Speed 10505.84 samples/sec   Loss 8.4960   LearningRate 0.0447   Epoch: 13   Global Step: 67110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:45:58,493-Speed 10814.33 samples/sec   Loss 8.4493   LearningRate 0.0447   Epoch: 13   Global Step: 67120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:45:59,484-Speed 10348.19 samples/sec   Loss 8.4757   LearningRate 0.0446   Epoch: 13   Global Step: 67130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:00,414-Speed 11019.41 samples/sec   Loss 8.4234   LearningRate 0.0446   Epoch: 13   Global Step: 67140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:01,363-Speed 10805.79 samples/sec   Loss 8.6353   LearningRate 0.0446   Epoch: 13   Global Step: 67150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:02,376-Speed 10116.67 samples/sec   Loss 8.4711   LearningRate 0.0446   Epoch: 13   Global Step: 67160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:03,383-Speed 10171.74 samples/sec   Loss 8.5969   LearningRate 0.0446   Epoch: 13   Global Step: 67170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:04,321-Speed 10925.88 samples/sec   Loss 8.4081   LearningRate 0.0446   Epoch: 13   Global Step: 67180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:05,268-Speed 10826.12 samples/sec   Loss 8.6352   LearningRate 0.0446   Epoch: 13   Global Step: 67190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:06,239-Speed 10552.26 samples/sec   Loss 8.4838   LearningRate 0.0446   Epoch: 13   Global Step: 67200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:07,241-Speed 10229.35 samples/sec   Loss 8.5653   LearningRate 0.0446   Epoch: 13   Global Step: 67210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:08,178-Speed 10948.93 samples/sec   Loss 8.5038   LearningRate 0.0446   Epoch: 13   Global Step: 67220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:09,158-Speed 10458.49 samples/sec   Loss 8.5254   LearningRate 0.0446   Epoch: 13   Global Step: 67230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:46:10,141-Speed 10419.17 samples/sec   Loss 8.5366   LearningRate 0.0446   Epoch: 13   Global Step: 67240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:11,077-Speed 10953.36 samples/sec   Loss 8.4518   LearningRate 0.0446   Epoch: 13   Global Step: 67250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:12,031-Speed 10741.63 samples/sec   Loss 8.3498   LearningRate 0.0446   Epoch: 13   Global Step: 67260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:13,052-Speed 10041.40 samples/sec   Loss 8.6144   LearningRate 0.0446   Epoch: 13   Global Step: 67270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:14,008-Speed 10721.48 samples/sec   Loss 8.4584   LearningRate 0.0445   Epoch: 13   Global Step: 67280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:14,991-Speed 10428.98 samples/sec   Loss 8.3820   LearningRate 0.0445   Epoch: 13   Global Step: 67290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:15,959-Speed 10586.28 samples/sec   Loss 8.4683   LearningRate 0.0445   Epoch: 13   Global Step: 67300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:16,926-Speed 10604.80 samples/sec   Loss 8.6406   LearningRate 0.0445   Epoch: 13   Global Step: 67310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:17,949-Speed 10011.87 samples/sec   Loss 8.5655   LearningRate 0.0445   Epoch: 13   Global Step: 67320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:18,912-Speed 10639.88 samples/sec   Loss 8.6229   LearningRate 0.0445   Epoch: 13   Global Step: 67330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:19,914-Speed 10229.27 samples/sec   Loss 8.6377   LearningRate 0.0445   Epoch: 13   Global Step: 67340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:46:20,876-Speed 10659.54 samples/sec   Loss 8.5831   LearningRate 0.0445   Epoch: 13   Global Step: 67350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:46:21,977-Speed 9306.16 samples/sec   Loss 8.4236   LearningRate 0.0445   Epoch: 13   Global Step: 67360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:46:22,934-Speed 10707.70 samples/sec   Loss 8.6229   LearningRate 0.0445   Epoch: 13   Global Step: 67370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:46:23,881-Speed 10833.92 samples/sec   Loss 8.6075   LearningRate 0.0445   Epoch: 13   Global Step: 67380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:46:24,829-Speed 10810.25 samples/sec   Loss 8.6672   LearningRate 0.0445   Epoch: 13   Global Step: 67390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:46:25,764-Speed 10951.39 samples/sec   Loss 8.6712   LearningRate 0.0445   Epoch: 13   Global Step: 67400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:46:26,703-Speed 10913.62 samples/sec   Loss 8.5905   LearningRate 0.0445   Epoch: 13   Global Step: 67410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:46:27,690-Speed 10387.79 samples/sec   Loss 8.5872   LearningRate 0.0445   Epoch: 13   Global Step: 67420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:46:28,656-Speed 10619.28 samples/sec   Loss 8.5788   LearningRate 0.0445   Epoch: 13   Global Step: 67430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:46:29,582-Speed 11064.27 samples/sec   Loss 8.5184   LearningRate 0.0444   Epoch: 13   Global Step: 67440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:46:30,515-Speed 10981.23 samples/sec   Loss 8.6191   LearningRate 0.0444   Epoch: 13   Global Step: 67450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:46:31,536-Speed 10044.14 samples/sec   Loss 8.4685   LearningRate 0.0444   Epoch: 13   Global Step: 67460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:46:32,525-Speed 10356.98 samples/sec   Loss 8.7816   LearningRate 0.0444   Epoch: 13   Global Step: 67470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:46:33,488-Speed 10643.45 samples/sec   Loss 8.5758   LearningRate 0.0444   Epoch: 13   Global Step: 67480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:46:34,458-Speed 10565.14 samples/sec   Loss 8.5276   LearningRate 0.0444   Epoch: 13   Global Step: 67490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:46:35,430-Speed 10547.08 samples/sec   Loss 8.8214   LearningRate 0.0444   Epoch: 13   Global Step: 67500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:46:36,384-Speed 10745.08 samples/sec   Loss 8.4782   LearningRate 0.0444   Epoch: 13   Global Step: 67510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:37,364-Speed 10463.38 samples/sec   Loss 8.6583   LearningRate 0.0444   Epoch: 13   Global Step: 67520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:38,302-Speed 10919.05 samples/sec   Loss 8.5267   LearningRate 0.0444   Epoch: 13   Global Step: 67530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:39,299-Speed 10277.96 samples/sec   Loss 8.5368   LearningRate 0.0444   Epoch: 13   Global Step: 67540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:40,325-Speed 9987.31 samples/sec   Loss 8.5749   LearningRate 0.0444   Epoch: 13   Global Step: 67550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:41,259-Speed 10981.79 samples/sec   Loss 8.7005   LearningRate 0.0444   Epoch: 13   Global Step: 67560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:42,210-Speed 10770.19 samples/sec   Loss 8.5462   LearningRate 0.0444   Epoch: 13   Global Step: 67570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:43,163-Speed 10755.42 samples/sec   Loss 8.5244   LearningRate 0.0444   Epoch: 13   Global Step: 67580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:44,189-Speed 9994.70 samples/sec   Loss 8.6151   LearningRate 0.0443   Epoch: 13   Global Step: 67590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:45,134-Speed 10841.99 samples/sec   Loss 8.7101   LearningRate 0.0443   Epoch: 13   Global Step: 67600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:46:46,080-Speed 10834.39 samples/sec   Loss 8.6372   LearningRate 0.0443   Epoch: 13   Global Step: 67610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:46:47,045-Speed 10620.39 samples/sec   Loss 8.5497   LearningRate 0.0443   Epoch: 13   Global Step: 67620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:46:48,047-Speed 10225.05 samples/sec   Loss 8.4664   LearningRate 0.0443   Epoch: 13   Global Step: 67630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:46:49,009-Speed 10649.59 samples/sec   Loss 8.7686   LearningRate 0.0443   Epoch: 13   Global Step: 67640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:46:49,964-Speed 10741.22 samples/sec   Loss 8.7998   LearningRate 0.0443   Epoch: 13   Global Step: 67650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:46:50,885-Speed 11127.84 samples/sec   Loss 8.7073   LearningRate 0.0443   Epoch: 13   Global Step: 67660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:46:51,874-Speed 10353.56 samples/sec   Loss 8.4649   LearningRate 0.0443   Epoch: 13   Global Step: 67670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:46:52,882-Speed 10169.30 samples/sec   Loss 8.6462   LearningRate 0.0443   Epoch: 13   Global Step: 67680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:46:53,847-Speed 10630.30 samples/sec   Loss 8.6859   LearningRate 0.0443   Epoch: 13   Global Step: 67690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:46:54,819-Speed 10535.28 samples/sec   Loss 8.8165   LearningRate 0.0443   Epoch: 13   Global Step: 67700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:46:55,802-Speed 10425.86 samples/sec   Loss 8.6803   LearningRate 0.0443   Epoch: 13   Global Step: 67710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:46:56,829-Speed 9984.20 samples/sec   Loss 8.5994   LearningRate 0.0443   Epoch: 13   Global Step: 67720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:46:57,804-Speed 10509.86 samples/sec   Loss 8.4914   LearningRate 0.0443   Epoch: 13   Global Step: 67730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:46:58,773-Speed 10580.22 samples/sec   Loss 8.8042   LearningRate 0.0442   Epoch: 13   Global Step: 67740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:46:59,756-Speed 10416.35 samples/sec   Loss 8.6616   LearningRate 0.0442   Epoch: 13   Global Step: 67750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:47:00,698-Speed 10879.91 samples/sec   Loss 8.4127   LearningRate 0.0442   Epoch: 13   Global Step: 67760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:01,671-Speed 10539.87 samples/sec   Loss 8.6389   LearningRate 0.0442   Epoch: 13   Global Step: 67770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:02,665-Speed 10309.15 samples/sec   Loss 8.5367   LearningRate 0.0442   Epoch: 13   Global Step: 67780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:03,590-Speed 11089.48 samples/sec   Loss 8.4954   LearningRate 0.0442   Epoch: 13   Global Step: 67790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:04,548-Speed 10697.68 samples/sec   Loss 8.5796   LearningRate 0.0442   Epoch: 13   Global Step: 67800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:05,511-Speed 10635.61 samples/sec   Loss 8.6884   LearningRate 0.0442   Epoch: 13   Global Step: 67810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:06,518-Speed 10179.77 samples/sec   Loss 8.5692   LearningRate 0.0442   Epoch: 13   Global Step: 67820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:07,544-Speed 9993.36 samples/sec   Loss 8.6092   LearningRate 0.0442   Epoch: 13   Global Step: 67830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:08,510-Speed 10604.41 samples/sec   Loss 8.4640   LearningRate 0.0442   Epoch: 13   Global Step: 67840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:09,493-Speed 10428.09 samples/sec   Loss 8.5830   LearningRate 0.0442   Epoch: 13   Global Step: 67850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:10,468-Speed 10502.80 samples/sec   Loss 8.6122   LearningRate 0.0442   Epoch: 13   Global Step: 67860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:47:11,420-Speed 10769.77 samples/sec   Loss 8.6275   LearningRate 0.0442   Epoch: 13   Global Step: 67870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:47:12,381-Speed 10665.52 samples/sec   Loss 8.6333   LearningRate 0.0442   Epoch: 13   Global Step: 67880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:47:13,357-Speed 10506.37 samples/sec   Loss 8.5930   LearningRate 0.0441   Epoch: 13   Global Step: 67890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:47:14,315-Speed 10696.94 samples/sec   Loss 8.5562   LearningRate 0.0441   Epoch: 13   Global Step: 67900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:15,288-Speed 10535.33 samples/sec   Loss 8.5385   LearningRate 0.0441   Epoch: 13   Global Step: 67910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:16,261-Speed 10539.67 samples/sec   Loss 8.7306   LearningRate 0.0441   Epoch: 13   Global Step: 67920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:17,217-Speed 10715.09 samples/sec   Loss 8.5511   LearningRate 0.0441   Epoch: 13   Global Step: 67930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:18,188-Speed 10552.17 samples/sec   Loss 8.8040   LearningRate 0.0441   Epoch: 13   Global Step: 67940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:19,148-Speed 10682.84 samples/sec   Loss 8.7824   LearningRate 0.0441   Epoch: 13   Global Step: 67950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:20,217-Speed 9587.15 samples/sec   Loss 8.5524   LearningRate 0.0441   Epoch: 13   Global Step: 67960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:21,172-Speed 10725.80 samples/sec   Loss 8.5837   LearningRate 0.0441   Epoch: 13   Global Step: 67970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:22,132-Speed 10678.11 samples/sec   Loss 8.7758   LearningRate 0.0441   Epoch: 13   Global Step: 67980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:23,143-Speed 10141.74 samples/sec   Loss 8.6875   LearningRate 0.0441   Epoch: 13   Global Step: 67990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:24,128-Speed 10402.21 samples/sec   Loss 8.6176   LearningRate 0.0441   Epoch: 13   Global Step: 68000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:47:46,410-[lfw][68000]XNorm: 12.240071
Training: 2022-04-11 01:47:46,410-[lfw][68000]Accuracy-Flip: 0.99433+-0.00367
Training: 2022-04-11 01:47:46,411-[lfw][68000]Accuracy-Highest: 0.99583
Training: 2022-04-11 01:48:12,064-[cfp_fp][68000]XNorm: 10.309925
Training: 2022-04-11 01:48:12,065-[cfp_fp][68000]Accuracy-Flip: 0.95129+-0.01003
Training: 2022-04-11 01:48:12,066-[cfp_fp][68000]Accuracy-Highest: 0.95743
Training: 2022-04-11 01:48:34,331-[agedb_30][68000]XNorm: 11.937241
Training: 2022-04-11 01:48:34,332-[agedb_30][68000]Accuracy-Flip: 0.96150+-0.01034
Training: 2022-04-11 01:48:34,333-[agedb_30][68000]Accuracy-Highest: 0.96283
Training: 2022-04-11 01:48:35,266-Speed 143.95 samples/sec   Loss 8.6894   LearningRate 0.0441   Epoch: 13   Global Step: 68010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:48:36,198-Speed 11002.09 samples/sec   Loss 8.6705   LearningRate 0.0441   Epoch: 13   Global Step: 68020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:48:37,161-Speed 10638.07 samples/sec   Loss 8.5281   LearningRate 0.0441   Epoch: 13   Global Step: 68030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:48:38,137-Speed 10498.74 samples/sec   Loss 8.6475   LearningRate 0.0440   Epoch: 13   Global Step: 68040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:48:39,166-Speed 9959.98 samples/sec   Loss 8.7876   LearningRate 0.0440   Epoch: 13   Global Step: 68050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:48:40,081-Speed 11210.54 samples/sec   Loss 8.8756   LearningRate 0.0440   Epoch: 13   Global Step: 68060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:48:41,026-Speed 10848.58 samples/sec   Loss 8.5909   LearningRate 0.0440   Epoch: 13   Global Step: 68070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:48:41,999-Speed 10531.45 samples/sec   Loss 8.6318   LearningRate 0.0440   Epoch: 13   Global Step: 68080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:48:42,981-Speed 10441.35 samples/sec   Loss 8.7489   LearningRate 0.0440   Epoch: 13   Global Step: 68090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:48:43,967-Speed 10389.46 samples/sec   Loss 8.6050   LearningRate 0.0440   Epoch: 13   Global Step: 68100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:48:44,914-Speed 10842.87 samples/sec   Loss 8.5843   LearningRate 0.0440   Epoch: 13   Global Step: 68110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:48:45,858-Speed 10862.28 samples/sec   Loss 8.7609   LearningRate 0.0440   Epoch: 13   Global Step: 68120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:48:46,834-Speed 10495.11 samples/sec   Loss 8.5180   LearningRate 0.0440   Epoch: 13   Global Step: 68130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:48:47,858-Speed 10013.18 samples/sec   Loss 8.6060   LearningRate 0.0440   Epoch: 13   Global Step: 68140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:48:48,890-Speed 9926.91 samples/sec   Loss 8.6484   LearningRate 0.0440   Epoch: 13   Global Step: 68150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:48:49,854-Speed 10639.27 samples/sec   Loss 8.5535   LearningRate 0.0440   Epoch: 13   Global Step: 68160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:48:50,865-Speed 10137.50 samples/sec   Loss 8.7342   LearningRate 0.0440   Epoch: 13   Global Step: 68170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:48:51,864-Speed 10259.42 samples/sec   Loss 8.7819   LearningRate 0.0440   Epoch: 13   Global Step: 68180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:48:52,831-Speed 10604.77 samples/sec   Loss 8.6367   LearningRate 0.0440   Epoch: 13   Global Step: 68190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:48:53,826-Speed 10297.48 samples/sec   Loss 8.6796   LearningRate 0.0439   Epoch: 13   Global Step: 68200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:48:54,796-Speed 10572.16 samples/sec   Loss 8.3857   LearningRate 0.0439   Epoch: 13   Global Step: 68210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:48:55,716-Speed 11132.41 samples/sec   Loss 8.6127   LearningRate 0.0439   Epoch: 13   Global Step: 68220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:48:56,765-Speed 9777.20 samples/sec   Loss 8.7308   LearningRate 0.0439   Epoch: 13   Global Step: 68230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:48:57,728-Speed 10640.05 samples/sec   Loss 8.6740   LearningRate 0.0439   Epoch: 13   Global Step: 68240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:48:58,690-Speed 10647.53 samples/sec   Loss 8.6052   LearningRate 0.0439   Epoch: 13   Global Step: 68250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:48:59,699-Speed 10157.69 samples/sec   Loss 8.6974   LearningRate 0.0439   Epoch: 13   Global Step: 68260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:49:00,687-Speed 10385.21 samples/sec   Loss 8.7121   LearningRate 0.0439   Epoch: 13   Global Step: 68270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:49:01,649-Speed 10650.31 samples/sec   Loss 8.4517   LearningRate 0.0439   Epoch: 13   Global Step: 68280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:02,601-Speed 10773.23 samples/sec   Loss 8.7952   LearningRate 0.0439   Epoch: 13   Global Step: 68290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:03,564-Speed 10639.07 samples/sec   Loss 8.5941   LearningRate 0.0439   Epoch: 13   Global Step: 68300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:04,533-Speed 10568.68 samples/sec   Loss 8.6472   LearningRate 0.0439   Epoch: 13   Global Step: 68310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:05,490-Speed 10714.58 samples/sec   Loss 8.7733   LearningRate 0.0439   Epoch: 13   Global Step: 68320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:06,496-Speed 10188.86 samples/sec   Loss 8.5955   LearningRate 0.0439   Epoch: 13   Global Step: 68330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:07,451-Speed 10738.51 samples/sec   Loss 8.6452   LearningRate 0.0439   Epoch: 13   Global Step: 68340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:08,384-Speed 10977.92 samples/sec   Loss 8.7219   LearningRate 0.0438   Epoch: 13   Global Step: 68350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:09,401-Speed 10082.93 samples/sec   Loss 8.6346   LearningRate 0.0438   Epoch: 13   Global Step: 68360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:10,418-Speed 10071.16 samples/sec   Loss 8.6916   LearningRate 0.0438   Epoch: 13   Global Step: 68370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:11,400-Speed 10451.64 samples/sec   Loss 8.6649   LearningRate 0.0438   Epoch: 13   Global Step: 68380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:49:12,336-Speed 10949.72 samples/sec   Loss 8.6709   LearningRate 0.0438   Epoch: 13   Global Step: 68390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:49:13,319-Speed 10430.48 samples/sec   Loss 8.7257   LearningRate 0.0438   Epoch: 13   Global Step: 68400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:49:14,313-Speed 10308.42 samples/sec   Loss 8.5294   LearningRate 0.0438   Epoch: 13   Global Step: 68410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:49:15,270-Speed 10722.18 samples/sec   Loss 8.6245   LearningRate 0.0438   Epoch: 13   Global Step: 68420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:49:16,232-Speed 10641.93 samples/sec   Loss 8.6332   LearningRate 0.0438   Epoch: 13   Global Step: 68430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:49:17,208-Speed 10501.12 samples/sec   Loss 8.6560   LearningRate 0.0438   Epoch: 13   Global Step: 68440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:49:18,204-Speed 10290.50 samples/sec   Loss 8.6500   LearningRate 0.0438   Epoch: 13   Global Step: 68450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:49:19,142-Speed 10927.25 samples/sec   Loss 8.7715   LearningRate 0.0438   Epoch: 13   Global Step: 68460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:49:20,107-Speed 10621.59 samples/sec   Loss 8.5850   LearningRate 0.0438   Epoch: 13   Global Step: 68470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:49:21,099-Speed 10335.31 samples/sec   Loss 8.6034   LearningRate 0.0438   Epoch: 13   Global Step: 68480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:22,070-Speed 10553.60 samples/sec   Loss 8.5599   LearningRate 0.0438   Epoch: 13   Global Step: 68490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:23,035-Speed 10626.80 samples/sec   Loss 8.6202   LearningRate 0.0437   Epoch: 13   Global Step: 68500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:23,992-Speed 10719.21 samples/sec   Loss 8.5893   LearningRate 0.0437   Epoch: 13   Global Step: 68510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:24,921-Speed 11033.96 samples/sec   Loss 8.6429   LearningRate 0.0437   Epoch: 13   Global Step: 68520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:25,881-Speed 10672.24 samples/sec   Loss 8.4829   LearningRate 0.0437   Epoch: 13   Global Step: 68530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:26,851-Speed 10560.76 samples/sec   Loss 8.4685   LearningRate 0.0437   Epoch: 13   Global Step: 68540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:27,804-Speed 10762.01 samples/sec   Loss 8.7174   LearningRate 0.0437   Epoch: 13   Global Step: 68550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:28,708-Speed 11339.10 samples/sec   Loss 8.5849   LearningRate 0.0437   Epoch: 13   Global Step: 68560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:29,674-Speed 10602.29 samples/sec   Loss 8.6078   LearningRate 0.0437   Epoch: 13   Global Step: 68570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:30,681-Speed 10176.90 samples/sec   Loss 8.6346   LearningRate 0.0437   Epoch: 13   Global Step: 68580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:49:31,658-Speed 10497.79 samples/sec   Loss 8.6130   LearningRate 0.0437   Epoch: 13   Global Step: 68590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:32,572-Speed 11217.46 samples/sec   Loss 8.5664   LearningRate 0.0437   Epoch: 13   Global Step: 68600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:49:33,522-Speed 10785.61 samples/sec   Loss 8.6987   LearningRate 0.0437   Epoch: 13   Global Step: 68610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:49:34,506-Speed 10430.13 samples/sec   Loss 8.6742   LearningRate 0.0437   Epoch: 13   Global Step: 68620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:49:35,486-Speed 10462.48 samples/sec   Loss 8.6449   LearningRate 0.0437   Epoch: 13   Global Step: 68630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:49:36,447-Speed 10673.99 samples/sec   Loss 8.6774   LearningRate 0.0437   Epoch: 13   Global Step: 68640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:49:37,436-Speed 10367.42 samples/sec   Loss 8.7460   LearningRate 0.0437   Epoch: 13   Global Step: 68650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:49:38,422-Speed 10392.13 samples/sec   Loss 8.6132   LearningRate 0.0436   Epoch: 13   Global Step: 68660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:49:39,343-Speed 11127.44 samples/sec   Loss 8.8664   LearningRate 0.0436   Epoch: 13   Global Step: 68670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:49:40,302-Speed 10717.57 samples/sec   Loss 8.6879   LearningRate 0.0436   Epoch: 13   Global Step: 68680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:49:41,291-Speed 10356.17 samples/sec   Loss 8.6458   LearningRate 0.0436   Epoch: 13   Global Step: 68690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:49:42,285-Speed 10310.80 samples/sec   Loss 8.7490   LearningRate 0.0436   Epoch: 13   Global Step: 68700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:43,229-Speed 10859.19 samples/sec   Loss 8.7588   LearningRate 0.0436   Epoch: 13   Global Step: 68710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:44,159-Speed 11025.82 samples/sec   Loss 8.8112   LearningRate 0.0436   Epoch: 13   Global Step: 68720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:45,150-Speed 10342.08 samples/sec   Loss 8.5478   LearningRate 0.0436   Epoch: 13   Global Step: 68730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:46,092-Speed 10873.67 samples/sec   Loss 8.6643   LearningRate 0.0436   Epoch: 13   Global Step: 68740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:47,026-Speed 10978.03 samples/sec   Loss 8.6858   LearningRate 0.0436   Epoch: 13   Global Step: 68750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:47,975-Speed 10800.59 samples/sec   Loss 8.7942   LearningRate 0.0436   Epoch: 13   Global Step: 68760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:48,971-Speed 10292.54 samples/sec   Loss 8.6450   LearningRate 0.0436   Epoch: 13   Global Step: 68770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:50,018-Speed 9783.98 samples/sec   Loss 8.6525   LearningRate 0.0436   Epoch: 13   Global Step: 68780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:50,991-Speed 10543.21 samples/sec   Loss 8.5669   LearningRate 0.0436   Epoch: 13   Global Step: 68790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:49:51,934-Speed 10872.47 samples/sec   Loss 8.6341   LearningRate 0.0436   Epoch: 13   Global Step: 68800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:49:52,921-Speed 10379.05 samples/sec   Loss 8.8258   LearningRate 0.0435   Epoch: 13   Global Step: 68810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:49:53,855-Speed 10979.90 samples/sec   Loss 8.6915   LearningRate 0.0435   Epoch: 13   Global Step: 68820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:49:54,804-Speed 10797.30 samples/sec   Loss 8.7125   LearningRate 0.0435   Epoch: 13   Global Step: 68830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:49:55,765-Speed 10668.67 samples/sec   Loss 8.6341   LearningRate 0.0435   Epoch: 13   Global Step: 68840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:49:56,736-Speed 10547.27 samples/sec   Loss 8.7231   LearningRate 0.0435   Epoch: 13   Global Step: 68850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:49:57,765-Speed 9953.90 samples/sec   Loss 8.6855   LearningRate 0.0435   Epoch: 13   Global Step: 68860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:49:58,724-Speed 10696.02 samples/sec   Loss 8.7742   LearningRate 0.0435   Epoch: 13   Global Step: 68870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:49:59,686-Speed 10646.07 samples/sec   Loss 8.6978   LearningRate 0.0435   Epoch: 13   Global Step: 68880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:00,776-Speed 9407.95 samples/sec   Loss 8.6629   LearningRate 0.0435   Epoch: 13   Global Step: 68890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:01,769-Speed 10324.57 samples/sec   Loss 8.6556   LearningRate 0.0435   Epoch: 13   Global Step: 68900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:02,739-Speed 10573.56 samples/sec   Loss 8.6831   LearningRate 0.0435   Epoch: 13   Global Step: 68910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:03,723-Speed 10408.15 samples/sec   Loss 8.6536   LearningRate 0.0435   Epoch: 13   Global Step: 68920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:04,755-Speed 9934.29 samples/sec   Loss 8.7356   LearningRate 0.0435   Epoch: 13   Global Step: 68930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:05,698-Speed 10862.61 samples/sec   Loss 8.6848   LearningRate 0.0435   Epoch: 13   Global Step: 68940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:06,634-Speed 10956.67 samples/sec   Loss 8.6612   LearningRate 0.0435   Epoch: 13   Global Step: 68950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:07,567-Speed 10974.58 samples/sec   Loss 8.9875   LearningRate 0.0434   Epoch: 13   Global Step: 68960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:08,581-Speed 10116.15 samples/sec   Loss 8.5768   LearningRate 0.0434   Epoch: 13   Global Step: 68970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:09,546-Speed 10623.34 samples/sec   Loss 8.7513   LearningRate 0.0434   Epoch: 13   Global Step: 68980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:10,514-Speed 10587.31 samples/sec   Loss 8.7759   LearningRate 0.0434   Epoch: 13   Global Step: 68990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:11,457-Speed 10866.95 samples/sec   Loss 8.6856   LearningRate 0.0434   Epoch: 13   Global Step: 69000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:12,451-Speed 10305.95 samples/sec   Loss 8.7078   LearningRate 0.0434   Epoch: 13   Global Step: 69010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:13,427-Speed 10530.30 samples/sec   Loss 8.7372   LearningRate 0.0434   Epoch: 13   Global Step: 69020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:14,390-Speed 10642.61 samples/sec   Loss 8.7222   LearningRate 0.0434   Epoch: 13   Global Step: 69030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:15,321-Speed 11012.40 samples/sec   Loss 8.8240   LearningRate 0.0434   Epoch: 13   Global Step: 69040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:16,261-Speed 10905.17 samples/sec   Loss 8.6266   LearningRate 0.0434   Epoch: 13   Global Step: 69050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:17,202-Speed 10894.12 samples/sec   Loss 8.7626   LearningRate 0.0434   Epoch: 13   Global Step: 69060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:18,165-Speed 10644.77 samples/sec   Loss 8.5760   LearningRate 0.0434   Epoch: 13   Global Step: 69070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:19,133-Speed 10588.76 samples/sec   Loss 8.8034   LearningRate 0.0434   Epoch: 13   Global Step: 69080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:20,106-Speed 10523.39 samples/sec   Loss 8.5885   LearningRate 0.0434   Epoch: 13   Global Step: 69090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:21,105-Speed 10261.66 samples/sec   Loss 8.5427   LearningRate 0.0434   Epoch: 13   Global Step: 69100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:22,070-Speed 10622.01 samples/sec   Loss 8.7636   LearningRate 0.0434   Epoch: 13   Global Step: 69110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:23,022-Speed 10768.97 samples/sec   Loss 8.7422   LearningRate 0.0433   Epoch: 13   Global Step: 69120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:23,959-Speed 10943.74 samples/sec   Loss 8.6899   LearningRate 0.0433   Epoch: 13   Global Step: 69130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:24,927-Speed 10588.36 samples/sec   Loss 8.7501   LearningRate 0.0433   Epoch: 13   Global Step: 69140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:25,865-Speed 10915.60 samples/sec   Loss 8.8091   LearningRate 0.0433   Epoch: 13   Global Step: 69150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:26,844-Speed 10479.72 samples/sec   Loss 8.6038   LearningRate 0.0433   Epoch: 13   Global Step: 69160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:27,805-Speed 10662.05 samples/sec   Loss 8.5058   LearningRate 0.0433   Epoch: 13   Global Step: 69170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:28,771-Speed 10603.21 samples/sec   Loss 8.7241   LearningRate 0.0433   Epoch: 13   Global Step: 69180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:29,746-Speed 10510.76 samples/sec   Loss 8.6623   LearningRate 0.0433   Epoch: 13   Global Step: 69190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:30,794-Speed 9782.02 samples/sec   Loss 8.7943   LearningRate 0.0433   Epoch: 13   Global Step: 69200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:31,756-Speed 10657.40 samples/sec   Loss 8.7037   LearningRate 0.0433   Epoch: 13   Global Step: 69210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:32,725-Speed 10570.83 samples/sec   Loss 8.5025   LearningRate 0.0433   Epoch: 13   Global Step: 69220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:33,647-Speed 11122.41 samples/sec   Loss 8.8383   LearningRate 0.0433   Epoch: 13   Global Step: 69230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:34,661-Speed 10101.62 samples/sec   Loss 8.7563   LearningRate 0.0433   Epoch: 13   Global Step: 69240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:35,634-Speed 10537.02 samples/sec   Loss 8.6459   LearningRate 0.0433   Epoch: 13   Global Step: 69250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:36,605-Speed 10558.25 samples/sec   Loss 8.7623   LearningRate 0.0433   Epoch: 13   Global Step: 69260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:37,565-Speed 10674.95 samples/sec   Loss 8.6620   LearningRate 0.0432   Epoch: 13   Global Step: 69270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:38,532-Speed 10601.12 samples/sec   Loss 8.7856   LearningRate 0.0432   Epoch: 13   Global Step: 69280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:39,533-Speed 10240.42 samples/sec   Loss 8.7027   LearningRate 0.0432   Epoch: 13   Global Step: 69290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:40,506-Speed 10533.70 samples/sec   Loss 8.6763   LearningRate 0.0432   Epoch: 13   Global Step: 69300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:41,480-Speed 10520.21 samples/sec   Loss 8.5972   LearningRate 0.0432   Epoch: 13   Global Step: 69310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:42,473-Speed 10325.01 samples/sec   Loss 8.8300   LearningRate 0.0432   Epoch: 13   Global Step: 69320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:43,444-Speed 10561.76 samples/sec   Loss 8.6940   LearningRate 0.0432   Epoch: 13   Global Step: 69330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:44,370-Speed 11064.92 samples/sec   Loss 8.8050   LearningRate 0.0432   Epoch: 13   Global Step: 69340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:45,331-Speed 10659.88 samples/sec   Loss 8.7342   LearningRate 0.0432   Epoch: 13   Global Step: 69350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:46,260-Speed 11029.28 samples/sec   Loss 8.6211   LearningRate 0.0432   Epoch: 13   Global Step: 69360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:47,243-Speed 10432.40 samples/sec   Loss 8.8783   LearningRate 0.0432   Epoch: 13   Global Step: 69370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:48,188-Speed 10851.83 samples/sec   Loss 8.6633   LearningRate 0.0432   Epoch: 13   Global Step: 69380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:49,165-Speed 10491.16 samples/sec   Loss 8.6657   LearningRate 0.0432   Epoch: 13   Global Step: 69390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:50,152-Speed 10384.03 samples/sec   Loss 8.7280   LearningRate 0.0432   Epoch: 13   Global Step: 69400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:51,149-Speed 10276.27 samples/sec   Loss 8.7165   LearningRate 0.0432   Epoch: 13   Global Step: 69410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:52,124-Speed 10511.94 samples/sec   Loss 8.7396   LearningRate 0.0431   Epoch: 13   Global Step: 69420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:50:53,081-Speed 10709.17 samples/sec   Loss 8.6322   LearningRate 0.0431   Epoch: 13   Global Step: 69430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:54,108-Speed 9976.77 samples/sec   Loss 8.7866   LearningRate 0.0431   Epoch: 13   Global Step: 69440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:55,072-Speed 10643.54 samples/sec   Loss 8.6249   LearningRate 0.0431   Epoch: 13   Global Step: 69450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:50:55,987-Speed 11197.21 samples/sec   Loss 8.9262   LearningRate 0.0431   Epoch: 13   Global Step: 69460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:50:56,949-Speed 10652.26 samples/sec   Loss 8.5332   LearningRate 0.0431   Epoch: 13   Global Step: 69470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:50:57,912-Speed 10650.25 samples/sec   Loss 8.7102   LearningRate 0.0431   Epoch: 13   Global Step: 69480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:50:58,836-Speed 11100.77 samples/sec   Loss 8.7360   LearningRate 0.0431   Epoch: 13   Global Step: 69490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:50:59,832-Speed 10295.55 samples/sec   Loss 8.6049   LearningRate 0.0431   Epoch: 13   Global Step: 69500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:51:00,774-Speed 10880.46 samples/sec   Loss 8.6409   LearningRate 0.0431   Epoch: 13   Global Step: 69510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:51:01,762-Speed 10377.61 samples/sec   Loss 8.7277   LearningRate 0.0431   Epoch: 13   Global Step: 69520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:51:02,708-Speed 10828.96 samples/sec   Loss 8.5530   LearningRate 0.0431   Epoch: 13   Global Step: 69530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:51:03,648-Speed 10906.13 samples/sec   Loss 8.6168   LearningRate 0.0431   Epoch: 13   Global Step: 69540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:51:04,623-Speed 10506.08 samples/sec   Loss 8.5663   LearningRate 0.0431   Epoch: 13   Global Step: 69550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:51:05,578-Speed 10734.96 samples/sec   Loss 8.6984   LearningRate 0.0431   Epoch: 13   Global Step: 69560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:06,550-Speed 10542.28 samples/sec   Loss 8.7590   LearningRate 0.0431   Epoch: 13   Global Step: 69570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:07,571-Speed 10038.79 samples/sec   Loss 8.6398   LearningRate 0.0430   Epoch: 13   Global Step: 69580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:08,555-Speed 10414.12 samples/sec   Loss 8.8192   LearningRate 0.0430   Epoch: 13   Global Step: 69590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:09,522-Speed 10600.48 samples/sec   Loss 8.6544   LearningRate 0.0430   Epoch: 13   Global Step: 69600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:10,477-Speed 10736.37 samples/sec   Loss 8.8555   LearningRate 0.0430   Epoch: 13   Global Step: 69610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:11,492-Speed 10097.20 samples/sec   Loss 8.6611   LearningRate 0.0430   Epoch: 13   Global Step: 69620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:12,440-Speed 10809.89 samples/sec   Loss 8.7454   LearningRate 0.0430   Epoch: 13   Global Step: 69630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:13,383-Speed 10872.19 samples/sec   Loss 8.6418   LearningRate 0.0430   Epoch: 13   Global Step: 69640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:14,330-Speed 10825.84 samples/sec   Loss 8.6653   LearningRate 0.0430   Epoch: 13   Global Step: 69650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:15,246-Speed 11181.87 samples/sec   Loss 8.7946   LearningRate 0.0430   Epoch: 13   Global Step: 69660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:51:16,233-Speed 10385.48 samples/sec   Loss 8.8718   LearningRate 0.0430   Epoch: 13   Global Step: 69670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:51:17,258-Speed 10001.08 samples/sec   Loss 8.5933   LearningRate 0.0430   Epoch: 13   Global Step: 69680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:51:18,174-Speed 11187.49 samples/sec   Loss 8.6346   LearningRate 0.0430   Epoch: 13   Global Step: 69690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:19,143-Speed 10574.83 samples/sec   Loss 8.6070   LearningRate 0.0430   Epoch: 13   Global Step: 69700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:20,106-Speed 10644.52 samples/sec   Loss 8.5921   LearningRate 0.0430   Epoch: 13   Global Step: 69710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:21,036-Speed 11014.75 samples/sec   Loss 8.7315   LearningRate 0.0430   Epoch: 13   Global Step: 69720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:21,953-Speed 11178.33 samples/sec   Loss 8.5331   LearningRate 0.0429   Epoch: 13   Global Step: 69730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:22,932-Speed 10462.05 samples/sec   Loss 8.8030   LearningRate 0.0429   Epoch: 13   Global Step: 69740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:23,901-Speed 10589.84 samples/sec   Loss 8.6115   LearningRate 0.0429   Epoch: 13   Global Step: 69750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:24,892-Speed 10335.17 samples/sec   Loss 8.8206   LearningRate 0.0429   Epoch: 13   Global Step: 69760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:25,814-Speed 11110.42 samples/sec   Loss 8.7404   LearningRate 0.0429   Epoch: 13   Global Step: 69770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:26,778-Speed 10647.70 samples/sec   Loss 8.6111   LearningRate 0.0429   Epoch: 13   Global Step: 69780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:27,736-Speed 10698.77 samples/sec   Loss 8.4466   LearningRate 0.0429   Epoch: 13   Global Step: 69790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:51:28,704-Speed 10593.02 samples/sec   Loss 8.6499   LearningRate 0.0429   Epoch: 13   Global Step: 69800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:51:29,674-Speed 10562.41 samples/sec   Loss 8.7963   LearningRate 0.0429   Epoch: 13   Global Step: 69810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:51:30,633-Speed 10688.52 samples/sec   Loss 8.5287   LearningRate 0.0429   Epoch: 13   Global Step: 69820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:51:31,610-Speed 10496.98 samples/sec   Loss 8.7342   LearningRate 0.0429   Epoch: 13   Global Step: 69830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:51:32,590-Speed 10457.74 samples/sec   Loss 8.4462   LearningRate 0.0429   Epoch: 13   Global Step: 69840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:51:33,575-Speed 10401.24 samples/sec   Loss 8.7720   LearningRate 0.0429   Epoch: 13   Global Step: 69850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:51:34,519-Speed 10855.11 samples/sec   Loss 8.6765   LearningRate 0.0429   Epoch: 13   Global Step: 69860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:51:35,445-Speed 11062.81 samples/sec   Loss 8.6747   LearningRate 0.0429   Epoch: 13   Global Step: 69870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:51:36,416-Speed 10561.12 samples/sec   Loss 8.7438   LearningRate 0.0429   Epoch: 13   Global Step: 69880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:37,372-Speed 10719.55 samples/sec   Loss 8.6946   LearningRate 0.0428   Epoch: 13   Global Step: 69890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:38,347-Speed 10517.64 samples/sec   Loss 8.5681   LearningRate 0.0428   Epoch: 13   Global Step: 69900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:39,314-Speed 10598.35 samples/sec   Loss 8.5760   LearningRate 0.0428   Epoch: 13   Global Step: 69910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:51:40,293-Speed 10459.57 samples/sec   Loss 8.6091   LearningRate 0.0428   Epoch: 13   Global Step: 69920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:51:41,272-Speed 10470.75 samples/sec   Loss 8.6125   LearningRate 0.0428   Epoch: 13   Global Step: 69930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:51:42,247-Speed 10511.31 samples/sec   Loss 8.5504   LearningRate 0.0428   Epoch: 13   Global Step: 69940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:51:43,235-Speed 10370.51 samples/sec   Loss 8.6895   LearningRate 0.0428   Epoch: 13   Global Step: 69950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:51:44,222-Speed 10385.83 samples/sec   Loss 8.6657   LearningRate 0.0428   Epoch: 13   Global Step: 69960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:51:45,203-Speed 10452.15 samples/sec   Loss 8.7447   LearningRate 0.0428   Epoch: 13   Global Step: 69970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:51:46,115-Speed 11240.16 samples/sec   Loss 8.6962   LearningRate 0.0428   Epoch: 13   Global Step: 69980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:51:47,099-Speed 10413.69 samples/sec   Loss 8.7019   LearningRate 0.0428   Epoch: 13   Global Step: 69990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:51:48,146-Speed 9792.96 samples/sec   Loss 8.7064   LearningRate 0.0428   Epoch: 13   Global Step: 70000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:52:10,424-[lfw][70000]XNorm: 12.308245
Training: 2022-04-11 01:52:10,425-[lfw][70000]Accuracy-Flip: 0.99467+-0.00420
Training: 2022-04-11 01:52:10,425-[lfw][70000]Accuracy-Highest: 0.99583
Training: 2022-04-11 01:52:35,872-[cfp_fp][70000]XNorm: 10.418238
Training: 2022-04-11 01:52:35,873-[cfp_fp][70000]Accuracy-Flip: 0.95186+-0.01044
Training: 2022-04-11 01:52:35,874-[cfp_fp][70000]Accuracy-Highest: 0.95743
Training: 2022-04-11 01:52:58,187-[agedb_30][70000]XNorm: 12.007892
Training: 2022-04-11 01:52:58,188-[agedb_30][70000]Accuracy-Flip: 0.95767+-0.01014
Training: 2022-04-11 01:52:58,188-[agedb_30][70000]Accuracy-Highest: 0.96283
Training: 2022-04-11 01:52:59,165-Speed 144.19 samples/sec   Loss 8.6801   LearningRate 0.0428   Epoch: 13   Global Step: 70010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:53:00,132-Speed 10601.32 samples/sec   Loss 8.6591   LearningRate 0.0428   Epoch: 13   Global Step: 70020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:01,079-Speed 10816.64 samples/sec   Loss 8.6381   LearningRate 0.0428   Epoch: 13   Global Step: 70030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:02,095-Speed 10086.94 samples/sec   Loss 8.5477   LearningRate 0.0427   Epoch: 13   Global Step: 70040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:03,088-Speed 10324.38 samples/sec   Loss 8.5850   LearningRate 0.0427   Epoch: 13   Global Step: 70050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:04,038-Speed 10796.02 samples/sec   Loss 8.6641   LearningRate 0.0427   Epoch: 13   Global Step: 70060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:05,026-Speed 10372.95 samples/sec   Loss 8.7908   LearningRate 0.0427   Epoch: 13   Global Step: 70070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:05,961-Speed 10955.12 samples/sec   Loss 8.5567   LearningRate 0.0427   Epoch: 13   Global Step: 70080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:06,928-Speed 10597.41 samples/sec   Loss 8.7550   LearningRate 0.0427   Epoch: 13   Global Step: 70090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:07,906-Speed 10487.98 samples/sec   Loss 8.8399   LearningRate 0.0427   Epoch: 13   Global Step: 70100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:08,852-Speed 10833.40 samples/sec   Loss 8.6772   LearningRate 0.0427   Epoch: 13   Global Step: 70110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:09,824-Speed 10541.23 samples/sec   Loss 8.5561   LearningRate 0.0427   Epoch: 13   Global Step: 70120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:10,805-Speed 10450.52 samples/sec   Loss 8.5047   LearningRate 0.0427   Epoch: 13   Global Step: 70130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:11,853-Speed 9784.27 samples/sec   Loss 8.4899   LearningRate 0.0427   Epoch: 13   Global Step: 70140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:12,824-Speed 10550.55 samples/sec   Loss 8.7340   LearningRate 0.0427   Epoch: 13   Global Step: 70150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:13,820-Speed 10287.42 samples/sec   Loss 8.5725   LearningRate 0.0427   Epoch: 13   Global Step: 70160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:14,790-Speed 10569.34 samples/sec   Loss 8.7715   LearningRate 0.0427   Epoch: 13   Global Step: 70170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:15,759-Speed 10576.61 samples/sec   Loss 8.5632   LearningRate 0.0427   Epoch: 13   Global Step: 70180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:16,729-Speed 10563.84 samples/sec   Loss 8.7262   LearningRate 0.0427   Epoch: 13   Global Step: 70190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:17,683-Speed 10742.08 samples/sec   Loss 8.7288   LearningRate 0.0426   Epoch: 13   Global Step: 70200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:18,635-Speed 10764.39 samples/sec   Loss 8.6172   LearningRate 0.0426   Epoch: 13   Global Step: 70210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:19,578-Speed 10877.38 samples/sec   Loss 8.7756   LearningRate 0.0426   Epoch: 13   Global Step: 70220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:20,556-Speed 10477.86 samples/sec   Loss 8.6668   LearningRate 0.0426   Epoch: 13   Global Step: 70230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:21,530-Speed 10524.74 samples/sec   Loss 8.7375   LearningRate 0.0426   Epoch: 13   Global Step: 70240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:22,535-Speed 10200.19 samples/sec   Loss 8.7346   LearningRate 0.0426   Epoch: 13   Global Step: 70250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:23,494-Speed 10686.37 samples/sec   Loss 8.6058   LearningRate 0.0426   Epoch: 13   Global Step: 70260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:24,490-Speed 10293.80 samples/sec   Loss 8.7232   LearningRate 0.0426   Epoch: 13   Global Step: 70270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:25,448-Speed 10698.08 samples/sec   Loss 8.6855   LearningRate 0.0426   Epoch: 13   Global Step: 70280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:26,407-Speed 10691.43 samples/sec   Loss 8.4866   LearningRate 0.0426   Epoch: 13   Global Step: 70290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:27,358-Speed 10789.55 samples/sec   Loss 8.6034   LearningRate 0.0426   Epoch: 13   Global Step: 70300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:28,383-Speed 9998.86 samples/sec   Loss 8.6458   LearningRate 0.0426   Epoch: 13   Global Step: 70310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:29,345-Speed 10651.78 samples/sec   Loss 8.6946   LearningRate 0.0426   Epoch: 13   Global Step: 70320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:30,336-Speed 10346.61 samples/sec   Loss 8.4781   LearningRate 0.0426   Epoch: 13   Global Step: 70330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:31,286-Speed 10784.97 samples/sec   Loss 8.5506   LearningRate 0.0426   Epoch: 13   Global Step: 70340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:32,257-Speed 10553.96 samples/sec   Loss 8.7635   LearningRate 0.0425   Epoch: 13   Global Step: 70350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:33,284-Speed 9985.96 samples/sec   Loss 8.7161   LearningRate 0.0425   Epoch: 13   Global Step: 70360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:34,292-Speed 10168.39 samples/sec   Loss 8.5298   LearningRate 0.0425   Epoch: 13   Global Step: 70370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:53:35,260-Speed 10592.12 samples/sec   Loss 8.5980   LearningRate 0.0425   Epoch: 13   Global Step: 70380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:36,205-Speed 10847.70 samples/sec   Loss 8.6659   LearningRate 0.0425   Epoch: 13   Global Step: 70390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:37,114-Speed 11273.36 samples/sec   Loss 8.6202   LearningRate 0.0425   Epoch: 13   Global Step: 70400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:38,156-Speed 9842.45 samples/sec   Loss 8.7510   LearningRate 0.0425   Epoch: 13   Global Step: 70410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:39,137-Speed 10448.31 samples/sec   Loss 8.7447   LearningRate 0.0425   Epoch: 13   Global Step: 70420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:40,121-Speed 10409.71 samples/sec   Loss 8.7191   LearningRate 0.0425   Epoch: 13   Global Step: 70430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:41,128-Speed 10173.29 samples/sec   Loss 8.4603   LearningRate 0.0425   Epoch: 13   Global Step: 70440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:42,070-Speed 10887.06 samples/sec   Loss 8.6035   LearningRate 0.0425   Epoch: 13   Global Step: 70450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:43,016-Speed 10830.85 samples/sec   Loss 8.6372   LearningRate 0.0425   Epoch: 13   Global Step: 70460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:43,993-Speed 10499.58 samples/sec   Loss 8.7330   LearningRate 0.0425   Epoch: 13   Global Step: 70470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:44,944-Speed 10772.25 samples/sec   Loss 8.4884   LearningRate 0.0425   Epoch: 13   Global Step: 70480   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 01:53:45,860-Speed 11195.90 samples/sec   Loss 8.5181   LearningRate 0.0425   Epoch: 13   Global Step: 70490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:46,854-Speed 10313.60 samples/sec   Loss 8.6136   LearningRate 0.0425   Epoch: 13   Global Step: 70500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:47,785-Speed 11007.73 samples/sec   Loss 8.6831   LearningRate 0.0424   Epoch: 13   Global Step: 70510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:48,749-Speed 10626.25 samples/sec   Loss 8.6545   LearningRate 0.0424   Epoch: 13   Global Step: 70520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:49,752-Speed 10216.33 samples/sec   Loss 8.4586   LearningRate 0.0424   Epoch: 13   Global Step: 70530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:50,755-Speed 10227.46 samples/sec   Loss 8.6436   LearningRate 0.0424   Epoch: 13   Global Step: 70540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:51,712-Speed 10712.31 samples/sec   Loss 8.5304   LearningRate 0.0424   Epoch: 13   Global Step: 70550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:52,657-Speed 10841.27 samples/sec   Loss 8.6267   LearningRate 0.0424   Epoch: 13   Global Step: 70560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:53,641-Speed 10409.33 samples/sec   Loss 8.7489   LearningRate 0.0424   Epoch: 13   Global Step: 70570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:54,673-Speed 9943.74 samples/sec   Loss 8.5631   LearningRate 0.0424   Epoch: 13   Global Step: 70580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:55,620-Speed 10815.49 samples/sec   Loss 8.6129   LearningRate 0.0424   Epoch: 13   Global Step: 70590   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 01:53:56,562-Speed 10878.63 samples/sec   Loss 8.8080   LearningRate 0.0424   Epoch: 13   Global Step: 70600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:57,526-Speed 10628.22 samples/sec   Loss 8.7493   LearningRate 0.0424   Epoch: 13   Global Step: 70610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:58,592-Speed 9622.23 samples/sec   Loss 8.7142   LearningRate 0.0424   Epoch: 13   Global Step: 70620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:53:59,530-Speed 10935.49 samples/sec   Loss 8.5695   LearningRate 0.0424   Epoch: 13   Global Step: 70630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:54:00,477-Speed 10825.16 samples/sec   Loss 8.6005   LearningRate 0.0424   Epoch: 13   Global Step: 70640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:54:01,423-Speed 10840.25 samples/sec   Loss 8.5286   LearningRate 0.0424   Epoch: 13   Global Step: 70650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:54:02,487-Speed 9634.85 samples/sec   Loss 8.5702   LearningRate 0.0423   Epoch: 13   Global Step: 70660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:54:03,428-Speed 10892.73 samples/sec   Loss 8.7434   LearningRate 0.0423   Epoch: 13   Global Step: 70670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:04,374-Speed 10825.75 samples/sec   Loss 8.5954   LearningRate 0.0423   Epoch: 13   Global Step: 70680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:05,316-Speed 10876.88 samples/sec   Loss 8.7130   LearningRate 0.0423   Epoch: 13   Global Step: 70690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:06,281-Speed 10626.00 samples/sec   Loss 8.4587   LearningRate 0.0423   Epoch: 13   Global Step: 70700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:07,271-Speed 10350.65 samples/sec   Loss 8.6818   LearningRate 0.0423   Epoch: 13   Global Step: 70710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:08,232-Speed 10669.97 samples/sec   Loss 8.7831   LearningRate 0.0423   Epoch: 13   Global Step: 70720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:09,175-Speed 10866.00 samples/sec   Loss 8.6882   LearningRate 0.0423   Epoch: 13   Global Step: 70730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:10,153-Speed 10474.82 samples/sec   Loss 8.6524   LearningRate 0.0423   Epoch: 13   Global Step: 70740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:11,106-Speed 10757.63 samples/sec   Loss 8.6059   LearningRate 0.0423   Epoch: 13   Global Step: 70750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:12,074-Speed 10594.34 samples/sec   Loss 8.6984   LearningRate 0.0423   Epoch: 13   Global Step: 70760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:13,050-Speed 10498.00 samples/sec   Loss 8.7725   LearningRate 0.0423   Epoch: 13   Global Step: 70770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:54:14,049-Speed 10263.67 samples/sec   Loss 8.6743   LearningRate 0.0423   Epoch: 13   Global Step: 70780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:54:15,020-Speed 10555.43 samples/sec   Loss 8.7195   LearningRate 0.0423   Epoch: 13   Global Step: 70790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:15,981-Speed 10664.44 samples/sec   Loss 8.7149   LearningRate 0.0423   Epoch: 13   Global Step: 70800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:54:16,988-Speed 10168.21 samples/sec   Loss 8.6318   LearningRate 0.0423   Epoch: 13   Global Step: 70810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:54:29,262-Speed 834.41 samples/sec   Loss 7.9807   LearningRate 0.0422   Epoch: 14   Global Step: 70820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:54:30,412-Speed 8918.65 samples/sec   Loss 7.7430   LearningRate 0.0422   Epoch: 14   Global Step: 70830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:54:31,398-Speed 10395.71 samples/sec   Loss 7.6086   LearningRate 0.0422   Epoch: 14   Global Step: 70840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:54:32,478-Speed 9488.00 samples/sec   Loss 7.7947   LearningRate 0.0422   Epoch: 14   Global Step: 70850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:54:33,710-Speed 8322.83 samples/sec   Loss 7.7010   LearningRate 0.0422   Epoch: 14   Global Step: 70860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:54:34,683-Speed 10544.14 samples/sec   Loss 7.8495   LearningRate 0.0422   Epoch: 14   Global Step: 70870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:54:35,674-Speed 10332.85 samples/sec   Loss 7.6792   LearningRate 0.0422   Epoch: 14   Global Step: 70880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:54:36,651-Speed 10496.06 samples/sec   Loss 7.7567   LearningRate 0.0422   Epoch: 14   Global Step: 70890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:54:37,629-Speed 10478.05 samples/sec   Loss 7.6976   LearningRate 0.0422   Epoch: 14   Global Step: 70900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:38,577-Speed 10814.68 samples/sec   Loss 7.8000   LearningRate 0.0422   Epoch: 14   Global Step: 70910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:39,546-Speed 10585.34 samples/sec   Loss 8.0440   LearningRate 0.0422   Epoch: 14   Global Step: 70920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:40,515-Speed 10572.71 samples/sec   Loss 7.8332   LearningRate 0.0422   Epoch: 14   Global Step: 70930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:41,466-Speed 10784.08 samples/sec   Loss 7.7631   LearningRate 0.0422   Epoch: 14   Global Step: 70940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:42,498-Speed 9933.70 samples/sec   Loss 7.9086   LearningRate 0.0422   Epoch: 14   Global Step: 70950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:43,470-Speed 10539.10 samples/sec   Loss 7.8678   LearningRate 0.0422   Epoch: 14   Global Step: 70960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:44,472-Speed 10223.03 samples/sec   Loss 7.7945   LearningRate 0.0421   Epoch: 14   Global Step: 70970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:45,440-Speed 10598.47 samples/sec   Loss 7.6558   LearningRate 0.0421   Epoch: 14   Global Step: 70980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:46,564-Speed 9118.64 samples/sec   Loss 7.8168   LearningRate 0.0421   Epoch: 14   Global Step: 70990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:47,550-Speed 10384.14 samples/sec   Loss 7.7659   LearningRate 0.0421   Epoch: 14   Global Step: 71000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:48,652-Speed 9305.27 samples/sec   Loss 7.8875   LearningRate 0.0421   Epoch: 14   Global Step: 71010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:49,653-Speed 10238.86 samples/sec   Loss 7.9050   LearningRate 0.0421   Epoch: 14   Global Step: 71020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:50,587-Speed 10975.95 samples/sec   Loss 7.8540   LearningRate 0.0421   Epoch: 14   Global Step: 71030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:51,529-Speed 10882.96 samples/sec   Loss 7.9254   LearningRate 0.0421   Epoch: 14   Global Step: 71040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:52,530-Speed 10231.02 samples/sec   Loss 7.9023   LearningRate 0.0421   Epoch: 14   Global Step: 71050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:53,516-Speed 10392.93 samples/sec   Loss 7.9209   LearningRate 0.0421   Epoch: 14   Global Step: 71060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:54,509-Speed 10333.50 samples/sec   Loss 7.8110   LearningRate 0.0421   Epoch: 14   Global Step: 71070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:55,428-Speed 11144.07 samples/sec   Loss 7.7876   LearningRate 0.0421   Epoch: 14   Global Step: 71080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:56,401-Speed 10535.45 samples/sec   Loss 7.9488   LearningRate 0.0421   Epoch: 14   Global Step: 71090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:54:57,413-Speed 10126.71 samples/sec   Loss 8.0141   LearningRate 0.0421   Epoch: 14   Global Step: 71100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:54:58,327-Speed 11213.22 samples/sec   Loss 7.8469   LearningRate 0.0421   Epoch: 14   Global Step: 71110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:54:59,295-Speed 10596.69 samples/sec   Loss 7.9695   LearningRate 0.0421   Epoch: 14   Global Step: 71120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:00,318-Speed 10014.52 samples/sec   Loss 7.9271   LearningRate 0.0420   Epoch: 14   Global Step: 71130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:01,279-Speed 10658.82 samples/sec   Loss 8.0830   LearningRate 0.0420   Epoch: 14   Global Step: 71140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:02,245-Speed 10610.71 samples/sec   Loss 8.0402   LearningRate 0.0420   Epoch: 14   Global Step: 71150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:03,242-Speed 10284.48 samples/sec   Loss 8.0815   LearningRate 0.0420   Epoch: 14   Global Step: 71160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:04,210-Speed 10590.81 samples/sec   Loss 7.9276   LearningRate 0.0420   Epoch: 14   Global Step: 71170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:05,257-Speed 9783.69 samples/sec   Loss 7.9485   LearningRate 0.0420   Epoch: 14   Global Step: 71180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:06,276-Speed 10061.33 samples/sec   Loss 7.7802   LearningRate 0.0420   Epoch: 14   Global Step: 71190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:07,240-Speed 10626.44 samples/sec   Loss 7.9783   LearningRate 0.0420   Epoch: 14   Global Step: 71200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:08,241-Speed 10246.23 samples/sec   Loss 8.0467   LearningRate 0.0420   Epoch: 14   Global Step: 71210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:09,226-Speed 10400.20 samples/sec   Loss 8.0454   LearningRate 0.0420   Epoch: 14   Global Step: 71220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:55:10,228-Speed 10224.31 samples/sec   Loss 8.0076   LearningRate 0.0420   Epoch: 14   Global Step: 71230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:55:11,293-Speed 9615.98 samples/sec   Loss 8.0691   LearningRate 0.0420   Epoch: 14   Global Step: 71240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:55:12,289-Speed 10304.32 samples/sec   Loss 7.9723   LearningRate 0.0420   Epoch: 14   Global Step: 71250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:55:13,247-Speed 10701.58 samples/sec   Loss 8.0285   LearningRate 0.0420   Epoch: 14   Global Step: 71260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:55:14,208-Speed 10672.31 samples/sec   Loss 8.1383   LearningRate 0.0420   Epoch: 14   Global Step: 71270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:55:15,193-Speed 10405.49 samples/sec   Loss 8.0056   LearningRate 0.0419   Epoch: 14   Global Step: 71280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:55:16,162-Speed 10575.28 samples/sec   Loss 8.0818   LearningRate 0.0419   Epoch: 14   Global Step: 71290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:55:17,187-Speed 9997.81 samples/sec   Loss 8.1743   LearningRate 0.0419   Epoch: 14   Global Step: 71300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:55:18,148-Speed 10668.34 samples/sec   Loss 7.9416   LearningRate 0.0419   Epoch: 14   Global Step: 71310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:19,087-Speed 10914.53 samples/sec   Loss 7.9677   LearningRate 0.0419   Epoch: 14   Global Step: 71320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:55:20,079-Speed 10325.79 samples/sec   Loss 8.0732   LearningRate 0.0419   Epoch: 14   Global Step: 71330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:55:21,089-Speed 10153.75 samples/sec   Loss 8.0108   LearningRate 0.0419   Epoch: 14   Global Step: 71340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:55:22,149-Speed 9669.76 samples/sec   Loss 7.9905   LearningRate 0.0419   Epoch: 14   Global Step: 71350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:55:23,174-Speed 10008.40 samples/sec   Loss 8.1428   LearningRate 0.0419   Epoch: 14   Global Step: 71360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:55:24,127-Speed 10761.35 samples/sec   Loss 8.1745   LearningRate 0.0419   Epoch: 14   Global Step: 71370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:55:25,066-Speed 10915.97 samples/sec   Loss 8.1960   LearningRate 0.0419   Epoch: 14   Global Step: 71380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:55:25,998-Speed 11001.43 samples/sec   Loss 7.9748   LearningRate 0.0419   Epoch: 14   Global Step: 71390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:55:26,920-Speed 11112.89 samples/sec   Loss 8.1899   LearningRate 0.0419   Epoch: 14   Global Step: 71400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:55:27,893-Speed 10527.11 samples/sec   Loss 8.0686   LearningRate 0.0419   Epoch: 14   Global Step: 71410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:55:28,848-Speed 10736.24 samples/sec   Loss 8.2197   LearningRate 0.0419   Epoch: 14   Global Step: 71420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:55:29,837-Speed 10369.14 samples/sec   Loss 8.2318   LearningRate 0.0419   Epoch: 14   Global Step: 71430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:55:30,793-Speed 10724.60 samples/sec   Loss 8.1679   LearningRate 0.0418   Epoch: 14   Global Step: 71440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:55:31,793-Speed 10247.05 samples/sec   Loss 8.0770   LearningRate 0.0418   Epoch: 14   Global Step: 71450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:55:32,798-Speed 10192.39 samples/sec   Loss 8.0710   LearningRate 0.0418   Epoch: 14   Global Step: 71460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:55:33,765-Speed 10599.44 samples/sec   Loss 8.1211   LearningRate 0.0418   Epoch: 14   Global Step: 71470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:55:34,745-Speed 10454.87 samples/sec   Loss 8.2211   LearningRate 0.0418   Epoch: 14   Global Step: 71480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:55:35,688-Speed 10871.41 samples/sec   Loss 8.1490   LearningRate 0.0418   Epoch: 14   Global Step: 71490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:55:36,632-Speed 10856.34 samples/sec   Loss 8.1113   LearningRate 0.0418   Epoch: 14   Global Step: 71500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:55:37,586-Speed 10746.49 samples/sec   Loss 8.1609   LearningRate 0.0418   Epoch: 14   Global Step: 71510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:55:38,557-Speed 10552.40 samples/sec   Loss 8.1246   LearningRate 0.0418   Epoch: 14   Global Step: 71520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:39,491-Speed 10977.63 samples/sec   Loss 7.9331   LearningRate 0.0418   Epoch: 14   Global Step: 71530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:40,468-Speed 10489.60 samples/sec   Loss 8.0695   LearningRate 0.0418   Epoch: 14   Global Step: 71540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:41,427-Speed 10697.72 samples/sec   Loss 8.1825   LearningRate 0.0418   Epoch: 14   Global Step: 71550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:42,422-Speed 10292.60 samples/sec   Loss 8.1234   LearningRate 0.0418   Epoch: 14   Global Step: 71560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:43,430-Speed 10170.22 samples/sec   Loss 8.2122   LearningRate 0.0418   Epoch: 14   Global Step: 71570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:44,400-Speed 10573.21 samples/sec   Loss 8.1577   LearningRate 0.0418   Epoch: 14   Global Step: 71580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:45,355-Speed 10734.97 samples/sec   Loss 8.3291   LearningRate 0.0418   Epoch: 14   Global Step: 71590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:46,296-Speed 10881.29 samples/sec   Loss 8.1715   LearningRate 0.0417   Epoch: 14   Global Step: 71600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:47,251-Speed 10734.25 samples/sec   Loss 8.3232   LearningRate 0.0417   Epoch: 14   Global Step: 71610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:48,197-Speed 10836.44 samples/sec   Loss 8.2066   LearningRate 0.0417   Epoch: 14   Global Step: 71620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:55:49,184-Speed 10379.68 samples/sec   Loss 8.1730   LearningRate 0.0417   Epoch: 14   Global Step: 71630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:55:50,155-Speed 10555.67 samples/sec   Loss 8.1461   LearningRate 0.0417   Epoch: 14   Global Step: 71640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:55:51,114-Speed 10690.17 samples/sec   Loss 8.1768   LearningRate 0.0417   Epoch: 14   Global Step: 71650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:55:52,108-Speed 10317.28 samples/sec   Loss 8.2214   LearningRate 0.0417   Epoch: 14   Global Step: 71660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:55:53,050-Speed 10876.29 samples/sec   Loss 8.1116   LearningRate 0.0417   Epoch: 14   Global Step: 71670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:55:54,057-Speed 10173.31 samples/sec   Loss 8.3045   LearningRate 0.0417   Epoch: 14   Global Step: 71680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:55:55,030-Speed 10538.70 samples/sec   Loss 8.3209   LearningRate 0.0417   Epoch: 14   Global Step: 71690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:55:56,033-Speed 10218.63 samples/sec   Loss 8.2095   LearningRate 0.0417   Epoch: 14   Global Step: 71700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:55:56,994-Speed 10663.80 samples/sec   Loss 8.1033   LearningRate 0.0417   Epoch: 14   Global Step: 71710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:55:57,961-Speed 10604.10 samples/sec   Loss 8.4062   LearningRate 0.0417   Epoch: 14   Global Step: 71720   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 01:55:58,936-Speed 10509.05 samples/sec   Loss 8.4373   LearningRate 0.0417   Epoch: 14   Global Step: 71730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:55:59,922-Speed 10398.24 samples/sec   Loss 8.2631   LearningRate 0.0417   Epoch: 14   Global Step: 71740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:00,860-Speed 10921.43 samples/sec   Loss 8.1052   LearningRate 0.0416   Epoch: 14   Global Step: 71750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:01,930-Speed 9585.95 samples/sec   Loss 8.3483   LearningRate 0.0416   Epoch: 14   Global Step: 71760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:02,924-Speed 10307.72 samples/sec   Loss 8.2037   LearningRate 0.0416   Epoch: 14   Global Step: 71770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:03,869-Speed 10850.97 samples/sec   Loss 8.0989   LearningRate 0.0416   Epoch: 14   Global Step: 71780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:04,834-Speed 10611.61 samples/sec   Loss 8.2926   LearningRate 0.0416   Epoch: 14   Global Step: 71790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:05,796-Speed 10659.16 samples/sec   Loss 8.4320   LearningRate 0.0416   Epoch: 14   Global Step: 71800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:06,761-Speed 10623.08 samples/sec   Loss 8.2404   LearningRate 0.0416   Epoch: 14   Global Step: 71810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:07,713-Speed 10766.85 samples/sec   Loss 8.3293   LearningRate 0.0416   Epoch: 14   Global Step: 71820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:08,693-Speed 10459.57 samples/sec   Loss 8.1477   LearningRate 0.0416   Epoch: 14   Global Step: 71830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:56:09,644-Speed 10776.77 samples/sec   Loss 8.4163   LearningRate 0.0416   Epoch: 14   Global Step: 71840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:56:10,623-Speed 10465.83 samples/sec   Loss 8.3240   LearningRate 0.0416   Epoch: 14   Global Step: 71850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:11,622-Speed 10262.09 samples/sec   Loss 8.4479   LearningRate 0.0416   Epoch: 14   Global Step: 71860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:12,601-Speed 10464.09 samples/sec   Loss 8.2722   LearningRate 0.0416   Epoch: 14   Global Step: 71870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:13,598-Speed 10329.92 samples/sec   Loss 8.4479   LearningRate 0.0416   Epoch: 14   Global Step: 71880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:14,584-Speed 10397.57 samples/sec   Loss 8.2478   LearningRate 0.0416   Epoch: 14   Global Step: 71890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:15,557-Speed 10525.15 samples/sec   Loss 8.3534   LearningRate 0.0416   Epoch: 14   Global Step: 71900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:16,641-Speed 9461.69 samples/sec   Loss 8.4748   LearningRate 0.0415   Epoch: 14   Global Step: 71910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:17,597-Speed 10717.60 samples/sec   Loss 8.2024   LearningRate 0.0415   Epoch: 14   Global Step: 71920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:18,571-Speed 10524.42 samples/sec   Loss 8.2457   LearningRate 0.0415   Epoch: 14   Global Step: 71930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:19,523-Speed 10766.64 samples/sec   Loss 8.2631   LearningRate 0.0415   Epoch: 14   Global Step: 71940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:20,543-Speed 10046.12 samples/sec   Loss 8.1636   LearningRate 0.0415   Epoch: 14   Global Step: 71950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:56:21,508-Speed 10626.97 samples/sec   Loss 8.4666   LearningRate 0.0415   Epoch: 14   Global Step: 71960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:56:22,463-Speed 10736.27 samples/sec   Loss 8.3684   LearningRate 0.0415   Epoch: 14   Global Step: 71970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:56:23,446-Speed 10423.52 samples/sec   Loss 8.0566   LearningRate 0.0415   Epoch: 14   Global Step: 71980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:56:24,426-Speed 10457.75 samples/sec   Loss 8.3710   LearningRate 0.0415   Epoch: 14   Global Step: 71990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:25,408-Speed 10439.31 samples/sec   Loss 8.2383   LearningRate 0.0415   Epoch: 14   Global Step: 72000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:56:47,725-[lfw][72000]XNorm: 11.983744
Training: 2022-04-11 01:56:47,726-[lfw][72000]Accuracy-Flip: 0.99517+-0.00411
Training: 2022-04-11 01:56:47,727-[lfw][72000]Accuracy-Highest: 0.99583
Training: 2022-04-11 01:57:13,391-[cfp_fp][72000]XNorm: 10.168832
Training: 2022-04-11 01:57:13,392-[cfp_fp][72000]Accuracy-Flip: 0.95586+-0.01070
Training: 2022-04-11 01:57:13,394-[cfp_fp][72000]Accuracy-Highest: 0.95743
Training: 2022-04-11 01:57:36,004-[agedb_30][72000]XNorm: 11.668525
Training: 2022-04-11 01:57:36,005-[agedb_30][72000]Accuracy-Flip: 0.96183+-0.00713
Training: 2022-04-11 01:57:36,006-[agedb_30][72000]Accuracy-Highest: 0.96283
Training: 2022-04-11 01:57:36,970-Speed 143.10 samples/sec   Loss 8.3715   LearningRate 0.0415   Epoch: 14   Global Step: 72010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:57:37,907-Speed 10935.96 samples/sec   Loss 8.3645   LearningRate 0.0415   Epoch: 14   Global Step: 72020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:57:38,879-Speed 10541.23 samples/sec   Loss 8.3448   LearningRate 0.0415   Epoch: 14   Global Step: 72030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:57:39,813-Speed 10972.74 samples/sec   Loss 8.3180   LearningRate 0.0415   Epoch: 14   Global Step: 72040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:57:40,797-Speed 10421.44 samples/sec   Loss 8.2819   LearningRate 0.0415   Epoch: 14   Global Step: 72050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:57:41,756-Speed 10692.99 samples/sec   Loss 8.2719   LearningRate 0.0415   Epoch: 14   Global Step: 72060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:57:42,744-Speed 10373.12 samples/sec   Loss 8.2257   LearningRate 0.0414   Epoch: 14   Global Step: 72070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:57:43,713-Speed 10581.71 samples/sec   Loss 8.2587   LearningRate 0.0414   Epoch: 14   Global Step: 72080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:57:44,676-Speed 10649.02 samples/sec   Loss 8.2298   LearningRate 0.0414   Epoch: 14   Global Step: 72090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:57:45,627-Speed 10765.73 samples/sec   Loss 8.3024   LearningRate 0.0414   Epoch: 14   Global Step: 72100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:57:46,564-Speed 10943.36 samples/sec   Loss 8.3310   LearningRate 0.0414   Epoch: 14   Global Step: 72110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:57:47,609-Speed 9809.89 samples/sec   Loss 8.3975   LearningRate 0.0414   Epoch: 14   Global Step: 72120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:57:48,572-Speed 10649.19 samples/sec   Loss 8.3782   LearningRate 0.0414   Epoch: 14   Global Step: 72130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:57:49,529-Speed 10703.45 samples/sec   Loss 8.2885   LearningRate 0.0414   Epoch: 14   Global Step: 72140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:57:50,490-Speed 10669.40 samples/sec   Loss 8.2633   LearningRate 0.0414   Epoch: 14   Global Step: 72150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:57:51,452-Speed 10650.68 samples/sec   Loss 8.2294   LearningRate 0.0414   Epoch: 14   Global Step: 72160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:57:52,420-Speed 10591.22 samples/sec   Loss 8.2922   LearningRate 0.0414   Epoch: 14   Global Step: 72170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:57:53,372-Speed 10775.79 samples/sec   Loss 8.4681   LearningRate 0.0414   Epoch: 14   Global Step: 72180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:57:54,269-Speed 11414.48 samples/sec   Loss 8.3408   LearningRate 0.0414   Epoch: 14   Global Step: 72190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:57:55,224-Speed 10732.29 samples/sec   Loss 8.3588   LearningRate 0.0414   Epoch: 14   Global Step: 72200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:57:56,216-Speed 10330.38 samples/sec   Loss 8.2344   LearningRate 0.0414   Epoch: 14   Global Step: 72210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:57:57,203-Speed 10387.36 samples/sec   Loss 8.3474   LearningRate 0.0414   Epoch: 14   Global Step: 72220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:57:58,124-Speed 11118.63 samples/sec   Loss 8.3323   LearningRate 0.0413   Epoch: 14   Global Step: 72230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:57:59,109-Speed 10409.59 samples/sec   Loss 8.3246   LearningRate 0.0413   Epoch: 14   Global Step: 72240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:58:00,072-Speed 10641.25 samples/sec   Loss 8.3311   LearningRate 0.0413   Epoch: 14   Global Step: 72250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:58:01,017-Speed 10856.60 samples/sec   Loss 8.4539   LearningRate 0.0413   Epoch: 14   Global Step: 72260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:58:01,987-Speed 10559.11 samples/sec   Loss 8.4411   LearningRate 0.0413   Epoch: 14   Global Step: 72270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:58:02,940-Speed 10756.83 samples/sec   Loss 8.4035   LearningRate 0.0413   Epoch: 14   Global Step: 72280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:58:03,895-Speed 10738.85 samples/sec   Loss 8.5189   LearningRate 0.0413   Epoch: 14   Global Step: 72290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:58:04,856-Speed 10658.41 samples/sec   Loss 8.3862   LearningRate 0.0413   Epoch: 14   Global Step: 72300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:05,811-Speed 10733.95 samples/sec   Loss 8.2525   LearningRate 0.0413   Epoch: 14   Global Step: 72310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:06,775-Speed 10633.65 samples/sec   Loss 8.2406   LearningRate 0.0413   Epoch: 14   Global Step: 72320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:07,722-Speed 10818.86 samples/sec   Loss 8.3888   LearningRate 0.0413   Epoch: 14   Global Step: 72330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:08,682-Speed 10684.37 samples/sec   Loss 8.3058   LearningRate 0.0413   Epoch: 14   Global Step: 72340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:09,670-Speed 10367.23 samples/sec   Loss 8.3778   LearningRate 0.0413   Epoch: 14   Global Step: 72350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:10,627-Speed 10706.75 samples/sec   Loss 8.3810   LearningRate 0.0413   Epoch: 14   Global Step: 72360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:11,567-Speed 10898.71 samples/sec   Loss 8.3713   LearningRate 0.0413   Epoch: 14   Global Step: 72370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:12,595-Speed 9971.89 samples/sec   Loss 8.4460   LearningRate 0.0412   Epoch: 14   Global Step: 72380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:13,520-Speed 11084.38 samples/sec   Loss 8.4802   LearningRate 0.0412   Epoch: 14   Global Step: 72390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:14,476-Speed 10716.06 samples/sec   Loss 8.2073   LearningRate 0.0412   Epoch: 14   Global Step: 72400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:58:15,460-Speed 10414.39 samples/sec   Loss 8.3069   LearningRate 0.0412   Epoch: 14   Global Step: 72410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:58:16,430-Speed 10565.48 samples/sec   Loss 8.4734   LearningRate 0.0412   Epoch: 14   Global Step: 72420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:58:17,380-Speed 10792.77 samples/sec   Loss 8.4489   LearningRate 0.0412   Epoch: 14   Global Step: 72430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:58:18,280-Speed 11388.29 samples/sec   Loss 8.4994   LearningRate 0.0412   Epoch: 14   Global Step: 72440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:58:19,204-Speed 11103.35 samples/sec   Loss 8.4784   LearningRate 0.0412   Epoch: 14   Global Step: 72450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:58:20,170-Speed 10609.35 samples/sec   Loss 8.3329   LearningRate 0.0412   Epoch: 14   Global Step: 72460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:58:21,165-Speed 10299.40 samples/sec   Loss 8.2945   LearningRate 0.0412   Epoch: 14   Global Step: 72470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:58:22,131-Speed 10612.38 samples/sec   Loss 8.3602   LearningRate 0.0412   Epoch: 14   Global Step: 72480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:58:23,098-Speed 10589.76 samples/sec   Loss 8.2683   LearningRate 0.0412   Epoch: 14   Global Step: 72490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:58:24,108-Speed 10151.82 samples/sec   Loss 8.3697   LearningRate 0.0412   Epoch: 14   Global Step: 72500   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 01:58:25,056-Speed 10806.57 samples/sec   Loss 8.4709   LearningRate 0.0412   Epoch: 14   Global Step: 72510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:25,969-Speed 11222.53 samples/sec   Loss 8.3824   LearningRate 0.0412   Epoch: 14   Global Step: 72520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:26,939-Speed 10578.46 samples/sec   Loss 8.3541   LearningRate 0.0412   Epoch: 14   Global Step: 72530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:27,888-Speed 10789.93 samples/sec   Loss 8.4045   LearningRate 0.0411   Epoch: 14   Global Step: 72540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:28,889-Speed 10244.86 samples/sec   Loss 8.4739   LearningRate 0.0411   Epoch: 14   Global Step: 72550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:29,862-Speed 10535.73 samples/sec   Loss 8.3690   LearningRate 0.0411   Epoch: 14   Global Step: 72560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:30,818-Speed 10718.31 samples/sec   Loss 8.5983   LearningRate 0.0411   Epoch: 14   Global Step: 72570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:31,746-Speed 11048.79 samples/sec   Loss 8.2674   LearningRate 0.0411   Epoch: 14   Global Step: 72580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:32,761-Speed 10100.82 samples/sec   Loss 8.4218   LearningRate 0.0411   Epoch: 14   Global Step: 72590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:33,756-Speed 10299.99 samples/sec   Loss 8.5016   LearningRate 0.0411   Epoch: 14   Global Step: 72600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:34,709-Speed 10754.87 samples/sec   Loss 8.2164   LearningRate 0.0411   Epoch: 14   Global Step: 72610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:58:35,683-Speed 10530.83 samples/sec   Loss 8.5110   LearningRate 0.0411   Epoch: 14   Global Step: 72620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:58:36,664-Speed 10449.88 samples/sec   Loss 8.3219   LearningRate 0.0411   Epoch: 14   Global Step: 72630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:58:37,579-Speed 11191.87 samples/sec   Loss 8.4239   LearningRate 0.0411   Epoch: 14   Global Step: 72640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:58:38,536-Speed 10712.28 samples/sec   Loss 8.4822   LearningRate 0.0411   Epoch: 14   Global Step: 72650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:58:39,480-Speed 10859.39 samples/sec   Loss 8.5115   LearningRate 0.0411   Epoch: 14   Global Step: 72660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:40,422-Speed 10882.10 samples/sec   Loss 8.2182   LearningRate 0.0411   Epoch: 14   Global Step: 72670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:58:41,417-Speed 10290.72 samples/sec   Loss 8.3764   LearningRate 0.0411   Epoch: 14   Global Step: 72680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:58:42,444-Speed 9976.36 samples/sec   Loss 8.3165   LearningRate 0.0411   Epoch: 14   Global Step: 72690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:58:43,398-Speed 10747.49 samples/sec   Loss 8.3154   LearningRate 0.0410   Epoch: 14   Global Step: 72700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:58:44,376-Speed 10483.05 samples/sec   Loss 8.4431   LearningRate 0.0410   Epoch: 14   Global Step: 72710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:58:45,314-Speed 10922.67 samples/sec   Loss 8.3981   LearningRate 0.0410   Epoch: 14   Global Step: 72720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:58:46,246-Speed 10994.27 samples/sec   Loss 8.3787   LearningRate 0.0410   Epoch: 14   Global Step: 72730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:58:47,194-Speed 10809.26 samples/sec   Loss 8.4132   LearningRate 0.0410   Epoch: 14   Global Step: 72740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:58:48,173-Speed 10479.50 samples/sec   Loss 8.4279   LearningRate 0.0410   Epoch: 14   Global Step: 72750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:58:49,097-Speed 11099.36 samples/sec   Loss 8.4513   LearningRate 0.0410   Epoch: 14   Global Step: 72760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:58:50,010-Speed 11222.70 samples/sec   Loss 8.4667   LearningRate 0.0410   Epoch: 14   Global Step: 72770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:50,962-Speed 10759.85 samples/sec   Loss 8.3909   LearningRate 0.0410   Epoch: 14   Global Step: 72780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:51,942-Speed 10461.44 samples/sec   Loss 8.3856   LearningRate 0.0410   Epoch: 14   Global Step: 72790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:52,881-Speed 10910.03 samples/sec   Loss 8.3914   LearningRate 0.0410   Epoch: 14   Global Step: 72800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:53,831-Speed 10797.40 samples/sec   Loss 8.3610   LearningRate 0.0410   Epoch: 14   Global Step: 72810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:54,768-Speed 10939.73 samples/sec   Loss 8.4160   LearningRate 0.0410   Epoch: 14   Global Step: 72820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:55,728-Speed 10676.17 samples/sec   Loss 8.4518   LearningRate 0.0410   Epoch: 14   Global Step: 72830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:56,671-Speed 10860.42 samples/sec   Loss 8.3354   LearningRate 0.0410   Epoch: 14   Global Step: 72840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:57,672-Speed 10245.87 samples/sec   Loss 8.3841   LearningRate 0.0410   Epoch: 14   Global Step: 72850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:58,619-Speed 10827.09 samples/sec   Loss 8.4571   LearningRate 0.0409   Epoch: 14   Global Step: 72860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:58:59,543-Speed 11084.03 samples/sec   Loss 8.4919   LearningRate 0.0409   Epoch: 14   Global Step: 72870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:00,505-Speed 10648.64 samples/sec   Loss 8.6333   LearningRate 0.0409   Epoch: 14   Global Step: 72880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:01,476-Speed 10567.03 samples/sec   Loss 8.5750   LearningRate 0.0409   Epoch: 14   Global Step: 72890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:02,455-Speed 10463.18 samples/sec   Loss 8.4245   LearningRate 0.0409   Epoch: 14   Global Step: 72900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:03,403-Speed 10810.65 samples/sec   Loss 8.5281   LearningRate 0.0409   Epoch: 14   Global Step: 72910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:04,373-Speed 10565.84 samples/sec   Loss 8.5299   LearningRate 0.0409   Epoch: 14   Global Step: 72920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:05,341-Speed 10589.43 samples/sec   Loss 8.4342   LearningRate 0.0409   Epoch: 14   Global Step: 72930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:06,321-Speed 10457.36 samples/sec   Loss 8.3942   LearningRate 0.0409   Epoch: 14   Global Step: 72940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:07,287-Speed 10605.50 samples/sec   Loss 8.4616   LearningRate 0.0409   Epoch: 14   Global Step: 72950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:08,244-Speed 10716.72 samples/sec   Loss 8.3946   LearningRate 0.0409   Epoch: 14   Global Step: 72960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:09,205-Speed 10660.63 samples/sec   Loss 8.5312   LearningRate 0.0409   Epoch: 14   Global Step: 72970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:59:10,182-Speed 10500.86 samples/sec   Loss 8.4950   LearningRate 0.0409   Epoch: 14   Global Step: 72980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:59:11,120-Speed 10933.93 samples/sec   Loss 8.4722   LearningRate 0.0409   Epoch: 14   Global Step: 72990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:12,071-Speed 10772.08 samples/sec   Loss 8.5611   LearningRate 0.0409   Epoch: 14   Global Step: 73000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:13,020-Speed 10807.68 samples/sec   Loss 8.4925   LearningRate 0.0408   Epoch: 14   Global Step: 73010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:14,021-Speed 10236.77 samples/sec   Loss 8.5201   LearningRate 0.0408   Epoch: 14   Global Step: 73020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:15,011-Speed 10357.64 samples/sec   Loss 8.2257   LearningRate 0.0408   Epoch: 14   Global Step: 73030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:15,963-Speed 10765.16 samples/sec   Loss 8.4593   LearningRate 0.0408   Epoch: 14   Global Step: 73040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:16,919-Speed 10725.91 samples/sec   Loss 8.4682   LearningRate 0.0408   Epoch: 14   Global Step: 73050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:17,866-Speed 10820.20 samples/sec   Loss 8.4542   LearningRate 0.0408   Epoch: 14   Global Step: 73060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:18,842-Speed 10494.13 samples/sec   Loss 8.4390   LearningRate 0.0408   Epoch: 14   Global Step: 73070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:19,839-Speed 10281.38 samples/sec   Loss 8.4962   LearningRate 0.0408   Epoch: 14   Global Step: 73080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:20,757-Speed 11165.98 samples/sec   Loss 8.5614   LearningRate 0.0408   Epoch: 14   Global Step: 73090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:59:21,682-Speed 11081.68 samples/sec   Loss 8.6048   LearningRate 0.0408   Epoch: 14   Global Step: 73100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:59:22,642-Speed 10669.96 samples/sec   Loss 8.4966   LearningRate 0.0408   Epoch: 14   Global Step: 73110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:59:23,637-Speed 10308.08 samples/sec   Loss 8.5109   LearningRate 0.0408   Epoch: 14   Global Step: 73120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:59:24,631-Speed 10306.27 samples/sec   Loss 8.6090   LearningRate 0.0408   Epoch: 14   Global Step: 73130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:59:25,550-Speed 11149.59 samples/sec   Loss 8.4890   LearningRate 0.0408   Epoch: 14   Global Step: 73140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:59:26,481-Speed 11016.85 samples/sec   Loss 8.3591   LearningRate 0.0408   Epoch: 14   Global Step: 73150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:59:27,509-Speed 9966.74 samples/sec   Loss 8.4182   LearningRate 0.0408   Epoch: 14   Global Step: 73160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:59:28,461-Speed 10762.21 samples/sec   Loss 8.5047   LearningRate 0.0407   Epoch: 14   Global Step: 73170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:59:29,440-Speed 10468.21 samples/sec   Loss 8.5277   LearningRate 0.0407   Epoch: 14   Global Step: 73180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:59:30,415-Speed 10510.57 samples/sec   Loss 8.4883   LearningRate 0.0407   Epoch: 14   Global Step: 73190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:59:31,373-Speed 10726.47 samples/sec   Loss 8.3779   LearningRate 0.0407   Epoch: 14   Global Step: 73200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 01:59:32,319-Speed 10831.98 samples/sec   Loss 8.4534   LearningRate 0.0407   Epoch: 14   Global Step: 73210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:33,287-Speed 10589.63 samples/sec   Loss 8.4282   LearningRate 0.0407   Epoch: 14   Global Step: 73220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:34,362-Speed 9537.63 samples/sec   Loss 8.5722   LearningRate 0.0407   Epoch: 14   Global Step: 73230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:35,322-Speed 10673.14 samples/sec   Loss 8.4290   LearningRate 0.0407   Epoch: 14   Global Step: 73240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:36,310-Speed 10378.25 samples/sec   Loss 8.4142   LearningRate 0.0407   Epoch: 14   Global Step: 73250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:37,281-Speed 10554.38 samples/sec   Loss 8.3293   LearningRate 0.0407   Epoch: 14   Global Step: 73260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:38,237-Speed 10720.26 samples/sec   Loss 8.4302   LearningRate 0.0407   Epoch: 14   Global Step: 73270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:39,217-Speed 10451.06 samples/sec   Loss 8.5045   LearningRate 0.0407   Epoch: 14   Global Step: 73280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:40,197-Speed 10465.90 samples/sec   Loss 8.4338   LearningRate 0.0407   Epoch: 14   Global Step: 73290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:41,145-Speed 10810.85 samples/sec   Loss 8.5505   LearningRate 0.0407   Epoch: 14   Global Step: 73300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:42,095-Speed 10782.24 samples/sec   Loss 8.3732   LearningRate 0.0407   Epoch: 14   Global Step: 73310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:59:43,078-Speed 10426.01 samples/sec   Loss 8.5957   LearningRate 0.0407   Epoch: 14   Global Step: 73320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:59:44,053-Speed 10511.00 samples/sec   Loss 8.3429   LearningRate 0.0406   Epoch: 14   Global Step: 73330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:59:44,982-Speed 11029.92 samples/sec   Loss 8.3656   LearningRate 0.0406   Epoch: 14   Global Step: 73340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:59:45,963-Speed 10444.28 samples/sec   Loss 8.4477   LearningRate 0.0406   Epoch: 14   Global Step: 73350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:59:46,944-Speed 10450.17 samples/sec   Loss 8.4172   LearningRate 0.0406   Epoch: 14   Global Step: 73360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:59:47,919-Speed 10510.69 samples/sec   Loss 8.4525   LearningRate 0.0406   Epoch: 14   Global Step: 73370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:59:48,916-Speed 10282.96 samples/sec   Loss 8.5633   LearningRate 0.0406   Epoch: 14   Global Step: 73380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:59:49,881-Speed 10625.39 samples/sec   Loss 8.4140   LearningRate 0.0406   Epoch: 14   Global Step: 73390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:59:50,841-Speed 10669.87 samples/sec   Loss 8.4685   LearningRate 0.0406   Epoch: 14   Global Step: 73400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:59:51,808-Speed 10599.88 samples/sec   Loss 8.3860   LearningRate 0.0406   Epoch: 14   Global Step: 73410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 01:59:52,815-Speed 10185.91 samples/sec   Loss 8.6291   LearningRate 0.0406   Epoch: 14   Global Step: 73420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:53,782-Speed 10603.48 samples/sec   Loss 8.5837   LearningRate 0.0406   Epoch: 14   Global Step: 73430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:54,743-Speed 10656.92 samples/sec   Loss 8.3892   LearningRate 0.0406   Epoch: 14   Global Step: 73440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:55,692-Speed 10799.25 samples/sec   Loss 8.4895   LearningRate 0.0406   Epoch: 14   Global Step: 73450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:56,652-Speed 10678.65 samples/sec   Loss 8.4677   LearningRate 0.0406   Epoch: 14   Global Step: 73460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:57,632-Speed 10463.88 samples/sec   Loss 8.6016   LearningRate 0.0406   Epoch: 14   Global Step: 73470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:58,577-Speed 10843.52 samples/sec   Loss 8.5026   LearningRate 0.0406   Epoch: 14   Global Step: 73480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 01:59:59,512-Speed 10971.21 samples/sec   Loss 8.2880   LearningRate 0.0405   Epoch: 14   Global Step: 73490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:00:00,462-Speed 10779.45 samples/sec   Loss 8.3647   LearningRate 0.0405   Epoch: 14   Global Step: 73500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:00:01,447-Speed 10409.80 samples/sec   Loss 8.5719   LearningRate 0.0405   Epoch: 14   Global Step: 73510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:00:02,408-Speed 10657.25 samples/sec   Loss 8.3985   LearningRate 0.0405   Epoch: 14   Global Step: 73520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:03,367-Speed 10689.95 samples/sec   Loss 8.5468   LearningRate 0.0405   Epoch: 14   Global Step: 73530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:04,326-Speed 10687.40 samples/sec   Loss 8.4737   LearningRate 0.0405   Epoch: 14   Global Step: 73540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:05,307-Speed 10449.23 samples/sec   Loss 8.5248   LearningRate 0.0405   Epoch: 14   Global Step: 73550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:06,293-Speed 10393.01 samples/sec   Loss 8.6001   LearningRate 0.0405   Epoch: 14   Global Step: 73560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:07,258-Speed 10618.30 samples/sec   Loss 8.4411   LearningRate 0.0405   Epoch: 14   Global Step: 73570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:08,223-Speed 10626.90 samples/sec   Loss 8.5803   LearningRate 0.0405   Epoch: 14   Global Step: 73580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:09,182-Speed 10685.78 samples/sec   Loss 8.5993   LearningRate 0.0405   Epoch: 14   Global Step: 73590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:10,123-Speed 10889.45 samples/sec   Loss 8.4112   LearningRate 0.0405   Epoch: 14   Global Step: 73600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:11,093-Speed 10572.38 samples/sec   Loss 8.5734   LearningRate 0.0405   Epoch: 14   Global Step: 73610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:12,040-Speed 10817.46 samples/sec   Loss 8.2832   LearningRate 0.0405   Epoch: 14   Global Step: 73620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:13,005-Speed 10616.85 samples/sec   Loss 8.6731   LearningRate 0.0405   Epoch: 14   Global Step: 73630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:13,950-Speed 10843.91 samples/sec   Loss 8.4471   LearningRate 0.0405   Epoch: 14   Global Step: 73640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:14,977-Speed 9976.28 samples/sec   Loss 8.5902   LearningRate 0.0404   Epoch: 14   Global Step: 73650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:15,940-Speed 10653.59 samples/sec   Loss 8.4836   LearningRate 0.0404   Epoch: 14   Global Step: 73660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:16,899-Speed 10676.78 samples/sec   Loss 8.2598   LearningRate 0.0404   Epoch: 14   Global Step: 73670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:17,877-Speed 10486.76 samples/sec   Loss 8.5225   LearningRate 0.0404   Epoch: 14   Global Step: 73680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:18,847-Speed 10573.68 samples/sec   Loss 8.3974   LearningRate 0.0404   Epoch: 14   Global Step: 73690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:19,821-Speed 10520.73 samples/sec   Loss 8.4584   LearningRate 0.0404   Epoch: 14   Global Step: 73700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:20,803-Speed 10435.44 samples/sec   Loss 8.4746   LearningRate 0.0404   Epoch: 14   Global Step: 73710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:21,765-Speed 10649.07 samples/sec   Loss 8.4312   LearningRate 0.0404   Epoch: 14   Global Step: 73720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:00:22,705-Speed 10908.01 samples/sec   Loss 8.5789   LearningRate 0.0404   Epoch: 14   Global Step: 73730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:00:23,695-Speed 10345.11 samples/sec   Loss 8.5717   LearningRate 0.0404   Epoch: 14   Global Step: 73740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:00:24,710-Speed 10098.47 samples/sec   Loss 8.5104   LearningRate 0.0404   Epoch: 14   Global Step: 73750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:00:25,661-Speed 10778.88 samples/sec   Loss 8.5358   LearningRate 0.0404   Epoch: 14   Global Step: 73760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:00:26,617-Speed 10727.15 samples/sec   Loss 8.6178   LearningRate 0.0404   Epoch: 14   Global Step: 73770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:00:27,623-Speed 10176.42 samples/sec   Loss 8.6458   LearningRate 0.0404   Epoch: 14   Global Step: 73780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:00:28,611-Speed 10377.73 samples/sec   Loss 8.4011   LearningRate 0.0404   Epoch: 14   Global Step: 73790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:00:29,554-Speed 10874.97 samples/sec   Loss 8.4671   LearningRate 0.0404   Epoch: 14   Global Step: 73800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:00:30,537-Speed 10418.08 samples/sec   Loss 8.4021   LearningRate 0.0403   Epoch: 14   Global Step: 73810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:00:31,508-Speed 10553.48 samples/sec   Loss 8.4200   LearningRate 0.0403   Epoch: 14   Global Step: 73820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:32,462-Speed 10749.29 samples/sec   Loss 8.4039   LearningRate 0.0403   Epoch: 14   Global Step: 73830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:33,444-Speed 10437.22 samples/sec   Loss 8.7032   LearningRate 0.0403   Epoch: 14   Global Step: 73840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:34,417-Speed 10534.21 samples/sec   Loss 8.5162   LearningRate 0.0403   Epoch: 14   Global Step: 73850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:35,361-Speed 10854.87 samples/sec   Loss 8.5269   LearningRate 0.0403   Epoch: 14   Global Step: 73860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:36,332-Speed 10555.18 samples/sec   Loss 8.5411   LearningRate 0.0403   Epoch: 14   Global Step: 73870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:37,310-Speed 10481.04 samples/sec   Loss 8.5327   LearningRate 0.0403   Epoch: 14   Global Step: 73880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:38,231-Speed 11128.13 samples/sec   Loss 8.4447   LearningRate 0.0403   Epoch: 14   Global Step: 73890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:39,224-Speed 10325.37 samples/sec   Loss 8.5404   LearningRate 0.0403   Epoch: 14   Global Step: 73900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:40,204-Speed 10455.74 samples/sec   Loss 8.3973   LearningRate 0.0403   Epoch: 14   Global Step: 73910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:41,158-Speed 10731.58 samples/sec   Loss 8.5087   LearningRate 0.0403   Epoch: 14   Global Step: 73920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:42,121-Speed 10641.53 samples/sec   Loss 8.5669   LearningRate 0.0403   Epoch: 14   Global Step: 73930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:43,151-Speed 9957.77 samples/sec   Loss 8.4408   LearningRate 0.0403   Epoch: 14   Global Step: 73940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:44,086-Speed 10958.05 samples/sec   Loss 8.5842   LearningRate 0.0403   Epoch: 14   Global Step: 73950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:45,068-Speed 10434.65 samples/sec   Loss 8.3617   LearningRate 0.0403   Epoch: 14   Global Step: 73960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:46,012-Speed 10849.58 samples/sec   Loss 8.5227   LearningRate 0.0402   Epoch: 14   Global Step: 73970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:46,981-Speed 10584.56 samples/sec   Loss 8.3831   LearningRate 0.0402   Epoch: 14   Global Step: 73980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:47,911-Speed 11014.80 samples/sec   Loss 8.6644   LearningRate 0.0402   Epoch: 14   Global Step: 73990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:00:48,846-Speed 10970.09 samples/sec   Loss 8.4232   LearningRate 0.0402   Epoch: 14   Global Step: 74000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:01:11,411-[lfw][74000]XNorm: 12.128293
Training: 2022-04-11 02:01:11,411-[lfw][74000]Accuracy-Flip: 0.99417+-0.00389
Training: 2022-04-11 02:01:11,412-[lfw][74000]Accuracy-Highest: 0.99583
Training: 2022-04-11 02:01:36,967-[cfp_fp][74000]XNorm: 10.287101
Training: 2022-04-11 02:01:36,968-[cfp_fp][74000]Accuracy-Flip: 0.95600+-0.00939
Training: 2022-04-11 02:01:36,969-[cfp_fp][74000]Accuracy-Highest: 0.95743
Training: 2022-04-11 02:01:58,963-[agedb_30][74000]XNorm: 11.929308
Training: 2022-04-11 02:01:58,964-[agedb_30][74000]Accuracy-Flip: 0.95933+-0.00929
Training: 2022-04-11 02:01:58,964-[agedb_30][74000]Accuracy-Highest: 0.96283
Training: 2022-04-11 02:01:59,949-Speed 144.02 samples/sec   Loss 8.5072   LearningRate 0.0402   Epoch: 14   Global Step: 74010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:02:00,872-Speed 11096.42 samples/sec   Loss 8.6129   LearningRate 0.0402   Epoch: 14   Global Step: 74020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:02:01,819-Speed 10820.46 samples/sec   Loss 8.4859   LearningRate 0.0402   Epoch: 14   Global Step: 74030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:02,815-Speed 10291.20 samples/sec   Loss 8.5321   LearningRate 0.0402   Epoch: 14   Global Step: 74040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:03,750-Speed 10961.37 samples/sec   Loss 8.4867   LearningRate 0.0402   Epoch: 14   Global Step: 74050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:04,710-Speed 10680.89 samples/sec   Loss 8.5407   LearningRate 0.0402   Epoch: 14   Global Step: 74060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:05,680-Speed 10561.79 samples/sec   Loss 8.5618   LearningRate 0.0402   Epoch: 14   Global Step: 74070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:06,641-Speed 10664.35 samples/sec   Loss 8.6553   LearningRate 0.0402   Epoch: 14   Global Step: 74080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:07,586-Speed 10849.53 samples/sec   Loss 8.4842   LearningRate 0.0402   Epoch: 14   Global Step: 74090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:08,553-Speed 10600.43 samples/sec   Loss 8.4993   LearningRate 0.0402   Epoch: 14   Global Step: 74100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:09,506-Speed 10759.71 samples/sec   Loss 8.4780   LearningRate 0.0402   Epoch: 14   Global Step: 74110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:10,476-Speed 10568.47 samples/sec   Loss 8.5195   LearningRate 0.0402   Epoch: 14   Global Step: 74120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:11,462-Speed 10383.26 samples/sec   Loss 8.4076   LearningRate 0.0401   Epoch: 14   Global Step: 74130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:02:12,387-Speed 11087.65 samples/sec   Loss 8.5742   LearningRate 0.0401   Epoch: 14   Global Step: 74140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:02:13,337-Speed 10781.66 samples/sec   Loss 8.4970   LearningRate 0.0401   Epoch: 14   Global Step: 74150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:02:14,298-Speed 10680.36 samples/sec   Loss 8.5491   LearningRate 0.0401   Epoch: 14   Global Step: 74160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:02:15,266-Speed 10581.50 samples/sec   Loss 8.4646   LearningRate 0.0401   Epoch: 14   Global Step: 74170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:16,226-Speed 10677.54 samples/sec   Loss 8.3629   LearningRate 0.0401   Epoch: 14   Global Step: 74180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:17,186-Speed 10678.44 samples/sec   Loss 8.4956   LearningRate 0.0401   Epoch: 14   Global Step: 74190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:18,152-Speed 10609.71 samples/sec   Loss 8.5196   LearningRate 0.0401   Epoch: 14   Global Step: 74200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:19,130-Speed 10479.39 samples/sec   Loss 8.5476   LearningRate 0.0401   Epoch: 14   Global Step: 74210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:20,129-Speed 10259.48 samples/sec   Loss 8.5698   LearningRate 0.0401   Epoch: 14   Global Step: 74220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:21,111-Speed 10439.55 samples/sec   Loss 8.5576   LearningRate 0.0401   Epoch: 14   Global Step: 74230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:22,054-Speed 10872.34 samples/sec   Loss 8.3739   LearningRate 0.0401   Epoch: 14   Global Step: 74240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:23,033-Speed 10473.16 samples/sec   Loss 8.5920   LearningRate 0.0401   Epoch: 14   Global Step: 74250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:24,041-Speed 10167.31 samples/sec   Loss 8.5547   LearningRate 0.0401   Epoch: 14   Global Step: 74260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:25,021-Speed 10455.22 samples/sec   Loss 8.4893   LearningRate 0.0401   Epoch: 14   Global Step: 74270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:02:25,965-Speed 10855.19 samples/sec   Loss 8.6005   LearningRate 0.0401   Epoch: 14   Global Step: 74280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:02:26,934-Speed 10580.01 samples/sec   Loss 8.3653   LearningRate 0.0400   Epoch: 14   Global Step: 74290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:02:27,904-Speed 10568.87 samples/sec   Loss 8.5778   LearningRate 0.0400   Epoch: 14   Global Step: 74300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:02:28,893-Speed 10362.10 samples/sec   Loss 8.7062   LearningRate 0.0400   Epoch: 14   Global Step: 74310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:02:29,832-Speed 10912.38 samples/sec   Loss 8.3957   LearningRate 0.0400   Epoch: 14   Global Step: 74320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:02:30,771-Speed 10919.79 samples/sec   Loss 8.5234   LearningRate 0.0400   Epoch: 14   Global Step: 74330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:02:31,730-Speed 10686.89 samples/sec   Loss 8.4588   LearningRate 0.0400   Epoch: 14   Global Step: 74340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:32,685-Speed 10731.07 samples/sec   Loss 8.5398   LearningRate 0.0400   Epoch: 14   Global Step: 74350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:33,589-Speed 11338.40 samples/sec   Loss 8.6582   LearningRate 0.0400   Epoch: 14   Global Step: 74360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:34,563-Speed 10528.75 samples/sec   Loss 8.4925   LearningRate 0.0400   Epoch: 14   Global Step: 74370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:35,497-Speed 10967.21 samples/sec   Loss 8.4136   LearningRate 0.0400   Epoch: 14   Global Step: 74380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:36,481-Speed 10427.70 samples/sec   Loss 8.5530   LearningRate 0.0400   Epoch: 14   Global Step: 74390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:37,430-Speed 10797.97 samples/sec   Loss 8.4617   LearningRate 0.0400   Epoch: 14   Global Step: 74400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:38,391-Speed 10662.84 samples/sec   Loss 8.6660   LearningRate 0.0400   Epoch: 14   Global Step: 74410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:39,380-Speed 10366.20 samples/sec   Loss 8.4855   LearningRate 0.0400   Epoch: 14   Global Step: 74420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:40,341-Speed 10662.36 samples/sec   Loss 8.4909   LearningRate 0.0400   Epoch: 14   Global Step: 74430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:41,319-Speed 10481.55 samples/sec   Loss 8.4869   LearningRate 0.0400   Epoch: 14   Global Step: 74440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:02:42,278-Speed 10689.28 samples/sec   Loss 8.4314   LearningRate 0.0399   Epoch: 14   Global Step: 74450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:02:43,256-Speed 10471.14 samples/sec   Loss 8.4390   LearningRate 0.0399   Epoch: 14   Global Step: 74460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:02:44,273-Speed 10103.73 samples/sec   Loss 8.5528   LearningRate 0.0399   Epoch: 14   Global Step: 74470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:02:45,230-Speed 10720.25 samples/sec   Loss 8.6407   LearningRate 0.0399   Epoch: 14   Global Step: 74480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:02:46,147-Speed 11172.89 samples/sec   Loss 8.4780   LearningRate 0.0399   Epoch: 14   Global Step: 74490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:02:47,106-Speed 10692.69 samples/sec   Loss 8.3978   LearningRate 0.0399   Epoch: 14   Global Step: 74500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:48,050-Speed 10850.44 samples/sec   Loss 8.3328   LearningRate 0.0399   Epoch: 14   Global Step: 74510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:49,005-Speed 10740.98 samples/sec   Loss 8.4167   LearningRate 0.0399   Epoch: 14   Global Step: 74520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:49,952-Speed 10824.31 samples/sec   Loss 8.4122   LearningRate 0.0399   Epoch: 14   Global Step: 74530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:50,945-Speed 10316.60 samples/sec   Loss 8.4578   LearningRate 0.0399   Epoch: 14   Global Step: 74540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:51,894-Speed 10806.08 samples/sec   Loss 8.3495   LearningRate 0.0399   Epoch: 14   Global Step: 74550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:52,852-Speed 10696.87 samples/sec   Loss 8.6350   LearningRate 0.0399   Epoch: 14   Global Step: 74560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:02:53,821-Speed 10572.60 samples/sec   Loss 8.5575   LearningRate 0.0399   Epoch: 14   Global Step: 74570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:02:54,805-Speed 10413.83 samples/sec   Loss 8.5382   LearningRate 0.0399   Epoch: 14   Global Step: 74580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:02:55,727-Speed 11115.09 samples/sec   Loss 8.6444   LearningRate 0.0399   Epoch: 14   Global Step: 74590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:02:56,788-Speed 9654.56 samples/sec   Loss 8.5216   LearningRate 0.0399   Epoch: 14   Global Step: 74600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:02:57,710-Speed 11123.28 samples/sec   Loss 8.5118   LearningRate 0.0398   Epoch: 14   Global Step: 74610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:02:58,682-Speed 10546.54 samples/sec   Loss 8.4119   LearningRate 0.0398   Epoch: 14   Global Step: 74620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:02:59,659-Speed 10482.07 samples/sec   Loss 8.3134   LearningRate 0.0398   Epoch: 14   Global Step: 74630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:03:00,632-Speed 10601.68 samples/sec   Loss 8.3552   LearningRate 0.0398   Epoch: 14   Global Step: 74640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:03:01,583-Speed 10772.88 samples/sec   Loss 8.5750   LearningRate 0.0398   Epoch: 14   Global Step: 74650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:03:02,518-Speed 10965.80 samples/sec   Loss 8.5338   LearningRate 0.0398   Epoch: 14   Global Step: 74660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:03:03,474-Speed 10725.72 samples/sec   Loss 8.4939   LearningRate 0.0398   Epoch: 14   Global Step: 74670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:04,479-Speed 10187.53 samples/sec   Loss 8.4043   LearningRate 0.0398   Epoch: 14   Global Step: 74680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:05,437-Speed 10699.27 samples/sec   Loss 8.3576   LearningRate 0.0398   Epoch: 14   Global Step: 74690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:06,372-Speed 10964.69 samples/sec   Loss 8.4171   LearningRate 0.0398   Epoch: 14   Global Step: 74700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:07,308-Speed 10962.44 samples/sec   Loss 8.5490   LearningRate 0.0398   Epoch: 14   Global Step: 74710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:08,266-Speed 10700.59 samples/sec   Loss 8.5196   LearningRate 0.0398   Epoch: 14   Global Step: 74720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:09,243-Speed 10486.95 samples/sec   Loss 8.4800   LearningRate 0.0398   Epoch: 14   Global Step: 74730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:10,209-Speed 10606.60 samples/sec   Loss 8.5235   LearningRate 0.0398   Epoch: 14   Global Step: 74740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:11,181-Speed 10549.39 samples/sec   Loss 8.2950   LearningRate 0.0398   Epoch: 14   Global Step: 74750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:12,129-Speed 10810.52 samples/sec   Loss 8.3771   LearningRate 0.0398   Epoch: 14   Global Step: 74760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:13,116-Speed 10386.93 samples/sec   Loss 8.4428   LearningRate 0.0397   Epoch: 14   Global Step: 74770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:14,054-Speed 10929.17 samples/sec   Loss 8.6318   LearningRate 0.0397   Epoch: 14   Global Step: 74780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:15,024-Speed 10557.31 samples/sec   Loss 8.3192   LearningRate 0.0397   Epoch: 14   Global Step: 74790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:15,973-Speed 10802.70 samples/sec   Loss 8.4127   LearningRate 0.0397   Epoch: 14   Global Step: 74800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:17,053-Speed 9493.43 samples/sec   Loss 8.5123   LearningRate 0.0397   Epoch: 14   Global Step: 74810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:18,002-Speed 10797.54 samples/sec   Loss 8.6442   LearningRate 0.0397   Epoch: 14   Global Step: 74820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:18,977-Speed 10517.44 samples/sec   Loss 8.4432   LearningRate 0.0397   Epoch: 14   Global Step: 74830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:19,985-Speed 10180.24 samples/sec   Loss 8.3726   LearningRate 0.0397   Epoch: 14   Global Step: 74840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:21,065-Speed 9492.27 samples/sec   Loss 8.4415   LearningRate 0.0397   Epoch: 14   Global Step: 74850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:22,033-Speed 10592.22 samples/sec   Loss 8.5866   LearningRate 0.0397   Epoch: 14   Global Step: 74860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:23,010-Speed 10485.39 samples/sec   Loss 8.4157   LearningRate 0.0397   Epoch: 14   Global Step: 74870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:23,940-Speed 11025.18 samples/sec   Loss 8.4506   LearningRate 0.0397   Epoch: 14   Global Step: 74880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:24,926-Speed 10384.95 samples/sec   Loss 8.2515   LearningRate 0.0397   Epoch: 14   Global Step: 74890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:25,858-Speed 10998.69 samples/sec   Loss 8.4526   LearningRate 0.0397   Epoch: 14   Global Step: 74900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:26,788-Speed 11026.42 samples/sec   Loss 8.3716   LearningRate 0.0397   Epoch: 14   Global Step: 74910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:27,771-Speed 10421.89 samples/sec   Loss 8.3829   LearningRate 0.0397   Epoch: 14   Global Step: 74920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:28,736-Speed 10618.29 samples/sec   Loss 8.5262   LearningRate 0.0396   Epoch: 14   Global Step: 74930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:29,704-Speed 10589.81 samples/sec   Loss 8.7954   LearningRate 0.0396   Epoch: 14   Global Step: 74940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:30,647-Speed 10867.09 samples/sec   Loss 8.3704   LearningRate 0.0396   Epoch: 14   Global Step: 74950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:31,615-Speed 10594.70 samples/sec   Loss 8.3779   LearningRate 0.0396   Epoch: 14   Global Step: 74960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:32,553-Speed 10918.47 samples/sec   Loss 8.5466   LearningRate 0.0396   Epoch: 14   Global Step: 74970   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 02:03:33,497-Speed 10864.45 samples/sec   Loss 8.5832   LearningRate 0.0396   Epoch: 14   Global Step: 74980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:34,448-Speed 10775.74 samples/sec   Loss 8.5555   LearningRate 0.0396   Epoch: 14   Global Step: 74990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:35,357-Speed 11275.58 samples/sec   Loss 8.5113   LearningRate 0.0396   Epoch: 14   Global Step: 75000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:36,332-Speed 10511.67 samples/sec   Loss 8.4331   LearningRate 0.0396   Epoch: 14   Global Step: 75010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:37,298-Speed 10607.53 samples/sec   Loss 8.3680   LearningRate 0.0396   Epoch: 14   Global Step: 75020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:38,289-Speed 10343.80 samples/sec   Loss 8.5002   LearningRate 0.0396   Epoch: 14   Global Step: 75030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:39,253-Speed 10632.92 samples/sec   Loss 8.4395   LearningRate 0.0396   Epoch: 14   Global Step: 75040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:40,209-Speed 10726.71 samples/sec   Loss 8.3967   LearningRate 0.0396   Epoch: 14   Global Step: 75050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:41,254-Speed 9798.52 samples/sec   Loss 8.4305   LearningRate 0.0396   Epoch: 14   Global Step: 75060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:42,197-Speed 10878.62 samples/sec   Loss 8.5040   LearningRate 0.0396   Epoch: 14   Global Step: 75070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:43,173-Speed 10511.63 samples/sec   Loss 8.4194   LearningRate 0.0396   Epoch: 14   Global Step: 75080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:44,120-Speed 10826.11 samples/sec   Loss 8.4131   LearningRate 0.0395   Epoch: 14   Global Step: 75090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:45,083-Speed 10635.21 samples/sec   Loss 8.4219   LearningRate 0.0395   Epoch: 14   Global Step: 75100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:46,089-Speed 10186.99 samples/sec   Loss 8.4048   LearningRate 0.0395   Epoch: 14   Global Step: 75110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:47,102-Speed 10120.93 samples/sec   Loss 8.4619   LearningRate 0.0395   Epoch: 14   Global Step: 75120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:48,083-Speed 10451.61 samples/sec   Loss 8.5409   LearningRate 0.0395   Epoch: 14   Global Step: 75130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:49,035-Speed 10764.63 samples/sec   Loss 8.4809   LearningRate 0.0395   Epoch: 14   Global Step: 75140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:49,998-Speed 10642.65 samples/sec   Loss 8.4902   LearningRate 0.0395   Epoch: 14   Global Step: 75150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:51,052-Speed 9727.25 samples/sec   Loss 8.4495   LearningRate 0.0395   Epoch: 14   Global Step: 75160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:51,961-Speed 11270.05 samples/sec   Loss 8.5993   LearningRate 0.0395   Epoch: 14   Global Step: 75170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:03:52,902-Speed 10900.55 samples/sec   Loss 8.4441   LearningRate 0.0395   Epoch: 14   Global Step: 75180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:53,899-Speed 10273.25 samples/sec   Loss 8.5453   LearningRate 0.0395   Epoch: 14   Global Step: 75190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:54,849-Speed 10798.29 samples/sec   Loss 8.4445   LearningRate 0.0395   Epoch: 14   Global Step: 75200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:55,774-Speed 11079.53 samples/sec   Loss 8.4468   LearningRate 0.0395   Epoch: 14   Global Step: 75210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:56,711-Speed 10928.41 samples/sec   Loss 8.4183   LearningRate 0.0395   Epoch: 14   Global Step: 75220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:57,663-Speed 10773.10 samples/sec   Loss 8.5535   LearningRate 0.0395   Epoch: 14   Global Step: 75230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:58,639-Speed 10493.79 samples/sec   Loss 8.3841   LearningRate 0.0395   Epoch: 14   Global Step: 75240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:03:59,609-Speed 10575.27 samples/sec   Loss 8.3285   LearningRate 0.0394   Epoch: 14   Global Step: 75250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:00,577-Speed 10589.31 samples/sec   Loss 8.6714   LearningRate 0.0394   Epoch: 14   Global Step: 75260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:01,534-Speed 10709.46 samples/sec   Loss 8.4871   LearningRate 0.0394   Epoch: 14   Global Step: 75270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:02,506-Speed 10545.98 samples/sec   Loss 8.5122   LearningRate 0.0394   Epoch: 14   Global Step: 75280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:03,472-Speed 10602.33 samples/sec   Loss 8.5390   LearningRate 0.0394   Epoch: 14   Global Step: 75290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:04,418-Speed 10837.53 samples/sec   Loss 8.5432   LearningRate 0.0394   Epoch: 14   Global Step: 75300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:05,377-Speed 10683.45 samples/sec   Loss 8.6354   LearningRate 0.0394   Epoch: 14   Global Step: 75310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:06,327-Speed 10786.53 samples/sec   Loss 8.5362   LearningRate 0.0394   Epoch: 14   Global Step: 75320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:07,260-Speed 10993.68 samples/sec   Loss 8.5109   LearningRate 0.0394   Epoch: 14   Global Step: 75330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:08,258-Speed 10274.66 samples/sec   Loss 8.6918   LearningRate 0.0394   Epoch: 14   Global Step: 75340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:09,224-Speed 10606.68 samples/sec   Loss 8.6220   LearningRate 0.0394   Epoch: 14   Global Step: 75350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:10,180-Speed 10724.52 samples/sec   Loss 8.5828   LearningRate 0.0394   Epoch: 14   Global Step: 75360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:11,145-Speed 10621.76 samples/sec   Loss 8.6110   LearningRate 0.0394   Epoch: 14   Global Step: 75370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:12,085-Speed 10898.42 samples/sec   Loss 8.4249   LearningRate 0.0394   Epoch: 14   Global Step: 75380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:13,054-Speed 10575.53 samples/sec   Loss 8.3592   LearningRate 0.0394   Epoch: 14   Global Step: 75390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:14,027-Speed 10540.50 samples/sec   Loss 8.5624   LearningRate 0.0394   Epoch: 14   Global Step: 75400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:14,959-Speed 10999.99 samples/sec   Loss 8.4417   LearningRate 0.0393   Epoch: 14   Global Step: 75410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:15,877-Speed 11156.96 samples/sec   Loss 8.4092   LearningRate 0.0393   Epoch: 14   Global Step: 75420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:16,859-Speed 10441.63 samples/sec   Loss 8.5282   LearningRate 0.0393   Epoch: 14   Global Step: 75430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:17,795-Speed 10944.59 samples/sec   Loss 8.5602   LearningRate 0.0393   Epoch: 14   Global Step: 75440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:18,783-Speed 10375.81 samples/sec   Loss 8.4787   LearningRate 0.0393   Epoch: 14   Global Step: 75450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:19,732-Speed 10810.63 samples/sec   Loss 8.5978   LearningRate 0.0393   Epoch: 14   Global Step: 75460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:20,699-Speed 10599.76 samples/sec   Loss 8.4117   LearningRate 0.0393   Epoch: 14   Global Step: 75470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:21,660-Speed 10674.97 samples/sec   Loss 8.4153   LearningRate 0.0393   Epoch: 14   Global Step: 75480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:22,610-Speed 10777.86 samples/sec   Loss 8.5115   LearningRate 0.0393   Epoch: 14   Global Step: 75490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:23,527-Speed 11184.43 samples/sec   Loss 8.4768   LearningRate 0.0393   Epoch: 14   Global Step: 75500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:24,457-Speed 11015.42 samples/sec   Loss 8.4595   LearningRate 0.0393   Epoch: 14   Global Step: 75510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:25,443-Speed 10392.04 samples/sec   Loss 8.5057   LearningRate 0.0393   Epoch: 14   Global Step: 75520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:26,400-Speed 10717.11 samples/sec   Loss 8.3892   LearningRate 0.0393   Epoch: 14   Global Step: 75530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:27,363-Speed 10640.75 samples/sec   Loss 8.4076   LearningRate 0.0393   Epoch: 14   Global Step: 75540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:28,317-Speed 10742.23 samples/sec   Loss 8.2960   LearningRate 0.0393   Epoch: 14   Global Step: 75550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:29,310-Speed 10324.93 samples/sec   Loss 8.3891   LearningRate 0.0393   Epoch: 14   Global Step: 75560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:30,268-Speed 10701.13 samples/sec   Loss 8.4111   LearningRate 0.0392   Epoch: 14   Global Step: 75570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:31,229-Speed 10668.75 samples/sec   Loss 8.5410   LearningRate 0.0392   Epoch: 14   Global Step: 75580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:32,177-Speed 10804.24 samples/sec   Loss 8.3376   LearningRate 0.0392   Epoch: 14   Global Step: 75590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:33,135-Speed 10702.17 samples/sec   Loss 8.4508   LearningRate 0.0392   Epoch: 14   Global Step: 75600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:34,082-Speed 10818.61 samples/sec   Loss 8.4286   LearningRate 0.0392   Epoch: 14   Global Step: 75610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:35,072-Speed 10358.78 samples/sec   Loss 8.4891   LearningRate 0.0392   Epoch: 14   Global Step: 75620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:36,040-Speed 10588.29 samples/sec   Loss 8.6603   LearningRate 0.0392   Epoch: 14   Global Step: 75630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:36,978-Speed 10923.49 samples/sec   Loss 8.4690   LearningRate 0.0392   Epoch: 14   Global Step: 75640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:37,939-Speed 10667.76 samples/sec   Loss 8.4473   LearningRate 0.0392   Epoch: 14   Global Step: 75650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:38,904-Speed 10626.94 samples/sec   Loss 8.2732   LearningRate 0.0392   Epoch: 14   Global Step: 75660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:39,887-Speed 10418.34 samples/sec   Loss 8.4928   LearningRate 0.0392   Epoch: 14   Global Step: 75670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:40,871-Speed 10417.47 samples/sec   Loss 8.5748   LearningRate 0.0392   Epoch: 14   Global Step: 75680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:41,799-Speed 11044.67 samples/sec   Loss 8.6401   LearningRate 0.0392   Epoch: 14   Global Step: 75690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:42,777-Speed 10485.49 samples/sec   Loss 8.4907   LearningRate 0.0392   Epoch: 14   Global Step: 75700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:43,724-Speed 10820.96 samples/sec   Loss 8.5234   LearningRate 0.0392   Epoch: 14   Global Step: 75710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:44,672-Speed 10824.88 samples/sec   Loss 8.4615   LearningRate 0.0392   Epoch: 14   Global Step: 75720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:45,624-Speed 10759.42 samples/sec   Loss 8.4475   LearningRate 0.0391   Epoch: 14   Global Step: 75730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:46,590-Speed 10605.78 samples/sec   Loss 8.5640   LearningRate 0.0391   Epoch: 14   Global Step: 75740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:47,554-Speed 10647.05 samples/sec   Loss 8.4586   LearningRate 0.0391   Epoch: 14   Global Step: 75750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:48,499-Speed 10855.31 samples/sec   Loss 8.5174   LearningRate 0.0391   Epoch: 14   Global Step: 75760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:49,461-Speed 10660.69 samples/sec   Loss 8.7121   LearningRate 0.0391   Epoch: 14   Global Step: 75770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:50,450-Speed 10357.59 samples/sec   Loss 8.4612   LearningRate 0.0391   Epoch: 14   Global Step: 75780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:51,433-Speed 10429.44 samples/sec   Loss 8.4027   LearningRate 0.0391   Epoch: 14   Global Step: 75790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:52,383-Speed 10785.91 samples/sec   Loss 8.6034   LearningRate 0.0391   Epoch: 14   Global Step: 75800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:04:53,350-Speed 10603.67 samples/sec   Loss 8.4413   LearningRate 0.0391   Epoch: 14   Global Step: 75810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:54,325-Speed 10522.21 samples/sec   Loss 8.5648   LearningRate 0.0391   Epoch: 14   Global Step: 75820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:55,257-Speed 11003.20 samples/sec   Loss 8.4972   LearningRate 0.0391   Epoch: 14   Global Step: 75830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:56,188-Speed 11003.38 samples/sec   Loss 8.3739   LearningRate 0.0391   Epoch: 14   Global Step: 75840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:57,219-Speed 9936.47 samples/sec   Loss 8.4900   LearningRate 0.0391   Epoch: 14   Global Step: 75850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:58,260-Speed 9846.20 samples/sec   Loss 8.4347   LearningRate 0.0391   Epoch: 14   Global Step: 75860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:04:59,196-Speed 10955.37 samples/sec   Loss 8.5027   LearningRate 0.0391   Epoch: 14   Global Step: 75870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:05:11,527-Speed 830.60 samples/sec   Loss 7.5139   LearningRate 0.0391   Epoch: 15   Global Step: 75880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:05:12,585-Speed 9688.29 samples/sec   Loss 7.5872   LearningRate 0.0391   Epoch: 15   Global Step: 75890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:05:13,718-Speed 9044.13 samples/sec   Loss 7.6100   LearningRate 0.0390   Epoch: 15   Global Step: 75900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:05:14,700-Speed 10437.17 samples/sec   Loss 7.6321   LearningRate 0.0390   Epoch: 15   Global Step: 75910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:05:15,741-Speed 9844.10 samples/sec   Loss 7.5859   LearningRate 0.0390   Epoch: 15   Global Step: 75920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:05:16,765-Speed 10015.45 samples/sec   Loss 7.6228   LearningRate 0.0390   Epoch: 15   Global Step: 75930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:05:17,748-Speed 10420.92 samples/sec   Loss 7.4269   LearningRate 0.0390   Epoch: 15   Global Step: 75940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:05:18,705-Speed 10715.60 samples/sec   Loss 7.4506   LearningRate 0.0390   Epoch: 15   Global Step: 75950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:05:19,660-Speed 10743.30 samples/sec   Loss 7.6274   LearningRate 0.0390   Epoch: 15   Global Step: 75960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:05:20,621-Speed 10657.58 samples/sec   Loss 7.6605   LearningRate 0.0390   Epoch: 15   Global Step: 75970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:05:21,624-Speed 10221.34 samples/sec   Loss 7.4249   LearningRate 0.0390   Epoch: 15   Global Step: 75980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:05:22,627-Speed 10218.22 samples/sec   Loss 7.5561   LearningRate 0.0390   Epoch: 15   Global Step: 75990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:05:23,631-Speed 10210.94 samples/sec   Loss 7.5143   LearningRate 0.0390   Epoch: 15   Global Step: 76000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:05:46,048-[lfw][76000]XNorm: 11.891736
Training: 2022-04-11 02:05:46,049-[lfw][76000]Accuracy-Flip: 0.99617+-0.00395
Training: 2022-04-11 02:05:46,050-[lfw][76000]Accuracy-Highest: 0.99617
Training: 2022-04-11 02:06:11,580-[cfp_fp][76000]XNorm: 10.120627
Training: 2022-04-11 02:06:11,581-[cfp_fp][76000]Accuracy-Flip: 0.95629+-0.01436
Training: 2022-04-11 02:06:11,582-[cfp_fp][76000]Accuracy-Highest: 0.95743
Training: 2022-04-11 02:06:33,835-[agedb_30][76000]XNorm: 11.594925
Training: 2022-04-11 02:06:33,836-[agedb_30][76000]Accuracy-Flip: 0.96017+-0.00899
Training: 2022-04-11 02:06:33,837-[agedb_30][76000]Accuracy-Highest: 0.96283
Training: 2022-04-11 02:06:34,826-Speed 143.83 samples/sec   Loss 7.5216   LearningRate 0.0390   Epoch: 15   Global Step: 76010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:06:35,754-Speed 11040.00 samples/sec   Loss 7.5780   LearningRate 0.0390   Epoch: 15   Global Step: 76020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:06:36,750-Speed 10288.30 samples/sec   Loss 7.6834   LearningRate 0.0390   Epoch: 15   Global Step: 76030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:06:37,762-Speed 10125.44 samples/sec   Loss 7.5553   LearningRate 0.0390   Epoch: 15   Global Step: 76040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:06:38,765-Speed 10216.30 samples/sec   Loss 7.7609   LearningRate 0.0390   Epoch: 15   Global Step: 76050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:06:39,718-Speed 10764.76 samples/sec   Loss 7.6815   LearningRate 0.0389   Epoch: 15   Global Step: 76060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:06:40,699-Speed 10450.91 samples/sec   Loss 7.5504   LearningRate 0.0389   Epoch: 15   Global Step: 76070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:06:41,678-Speed 10474.47 samples/sec   Loss 7.6840   LearningRate 0.0389   Epoch: 15   Global Step: 76080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:06:42,626-Speed 10809.95 samples/sec   Loss 7.6291   LearningRate 0.0389   Epoch: 15   Global Step: 76090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:06:43,592-Speed 10602.39 samples/sec   Loss 7.8349   LearningRate 0.0389   Epoch: 15   Global Step: 76100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:06:44,600-Speed 10164.03 samples/sec   Loss 7.9269   LearningRate 0.0389   Epoch: 15   Global Step: 76110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:06:45,563-Speed 10643.32 samples/sec   Loss 7.6827   LearningRate 0.0389   Epoch: 15   Global Step: 76120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:06:46,540-Speed 10492.36 samples/sec   Loss 7.5759   LearningRate 0.0389   Epoch: 15   Global Step: 76130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:06:47,500-Speed 10675.50 samples/sec   Loss 7.6487   LearningRate 0.0389   Epoch: 15   Global Step: 76140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:06:48,437-Speed 10940.99 samples/sec   Loss 7.7153   LearningRate 0.0389   Epoch: 15   Global Step: 76150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:06:49,448-Speed 10137.16 samples/sec   Loss 7.6037   LearningRate 0.0389   Epoch: 15   Global Step: 76160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:06:50,407-Speed 10685.99 samples/sec   Loss 7.5950   LearningRate 0.0389   Epoch: 15   Global Step: 76170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:06:51,353-Speed 10836.26 samples/sec   Loss 7.7961   LearningRate 0.0389   Epoch: 15   Global Step: 76180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:06:52,281-Speed 11047.30 samples/sec   Loss 7.8316   LearningRate 0.0389   Epoch: 15   Global Step: 76190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:06:53,268-Speed 10404.27 samples/sec   Loss 7.7434   LearningRate 0.0389   Epoch: 15   Global Step: 76200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:06:54,201-Speed 10980.92 samples/sec   Loss 7.8928   LearningRate 0.0389   Epoch: 15   Global Step: 76210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:06:55,136-Speed 10968.77 samples/sec   Loss 7.7199   LearningRate 0.0388   Epoch: 15   Global Step: 76220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:06:56,071-Speed 10955.85 samples/sec   Loss 7.7476   LearningRate 0.0388   Epoch: 15   Global Step: 76230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:06:57,067-Speed 10286.68 samples/sec   Loss 7.7828   LearningRate 0.0388   Epoch: 15   Global Step: 76240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:06:58,028-Speed 10674.29 samples/sec   Loss 7.7608   LearningRate 0.0388   Epoch: 15   Global Step: 76250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:06:58,999-Speed 10559.21 samples/sec   Loss 7.8027   LearningRate 0.0388   Epoch: 15   Global Step: 76260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:06:59,971-Speed 10539.19 samples/sec   Loss 7.5989   LearningRate 0.0388   Epoch: 15   Global Step: 76270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:07:00,957-Speed 10397.62 samples/sec   Loss 7.8359   LearningRate 0.0388   Epoch: 15   Global Step: 76280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:07:01,927-Speed 10569.78 samples/sec   Loss 7.7989   LearningRate 0.0388   Epoch: 15   Global Step: 76290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:07:02,889-Speed 10656.34 samples/sec   Loss 7.8364   LearningRate 0.0388   Epoch: 15   Global Step: 76300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:07:03,944-Speed 9713.85 samples/sec   Loss 7.6545   LearningRate 0.0388   Epoch: 15   Global Step: 76310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:07:04,898-Speed 10748.90 samples/sec   Loss 7.8439   LearningRate 0.0388   Epoch: 15   Global Step: 76320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:07:05,861-Speed 10638.43 samples/sec   Loss 7.9014   LearningRate 0.0388   Epoch: 15   Global Step: 76330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:07:06,951-Speed 9400.91 samples/sec   Loss 7.8639   LearningRate 0.0388   Epoch: 15   Global Step: 76340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:07:07,929-Speed 10487.74 samples/sec   Loss 7.7950   LearningRate 0.0388   Epoch: 15   Global Step: 76350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:09,093-Speed 8803.15 samples/sec   Loss 7.9258   LearningRate 0.0388   Epoch: 15   Global Step: 76360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:10,072-Speed 10467.55 samples/sec   Loss 7.7372   LearningRate 0.0388   Epoch: 15   Global Step: 76370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:11,074-Speed 10225.30 samples/sec   Loss 7.8601   LearningRate 0.0387   Epoch: 15   Global Step: 76380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:12,031-Speed 10707.32 samples/sec   Loss 7.7812   LearningRate 0.0387   Epoch: 15   Global Step: 76390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:13,019-Speed 10380.26 samples/sec   Loss 7.9568   LearningRate 0.0387   Epoch: 15   Global Step: 76400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:13,998-Speed 10468.98 samples/sec   Loss 7.9245   LearningRate 0.0387   Epoch: 15   Global Step: 76410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:14,956-Speed 10700.79 samples/sec   Loss 7.7774   LearningRate 0.0387   Epoch: 15   Global Step: 76420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:15,922-Speed 10608.35 samples/sec   Loss 7.9130   LearningRate 0.0387   Epoch: 15   Global Step: 76430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:16,878-Speed 10717.49 samples/sec   Loss 7.9567   LearningRate 0.0387   Epoch: 15   Global Step: 76440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:17,866-Speed 10378.21 samples/sec   Loss 8.0692   LearningRate 0.0387   Epoch: 15   Global Step: 76450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:07:18,808-Speed 10888.21 samples/sec   Loss 7.8602   LearningRate 0.0387   Epoch: 15   Global Step: 76460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:19,764-Speed 10712.62 samples/sec   Loss 8.0239   LearningRate 0.0387   Epoch: 15   Global Step: 76470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:20,746-Speed 10435.77 samples/sec   Loss 8.1004   LearningRate 0.0387   Epoch: 15   Global Step: 76480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:21,772-Speed 9990.06 samples/sec   Loss 7.9429   LearningRate 0.0387   Epoch: 15   Global Step: 76490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:22,706-Speed 10980.09 samples/sec   Loss 8.0858   LearningRate 0.0387   Epoch: 15   Global Step: 76500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:23,634-Speed 11048.37 samples/sec   Loss 7.9460   LearningRate 0.0387   Epoch: 15   Global Step: 76510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:24,622-Speed 10370.85 samples/sec   Loss 7.9391   LearningRate 0.0387   Epoch: 15   Global Step: 76520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:25,553-Speed 11006.69 samples/sec   Loss 8.0495   LearningRate 0.0387   Epoch: 15   Global Step: 76530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:26,485-Speed 11003.59 samples/sec   Loss 8.0062   LearningRate 0.0386   Epoch: 15   Global Step: 76540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:27,422-Speed 10937.06 samples/sec   Loss 7.9785   LearningRate 0.0386   Epoch: 15   Global Step: 76550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:28,373-Speed 10783.07 samples/sec   Loss 7.9808   LearningRate 0.0386   Epoch: 15   Global Step: 76560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:07:29,397-Speed 10001.64 samples/sec   Loss 7.7795   LearningRate 0.0386   Epoch: 15   Global Step: 76570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:07:30,344-Speed 10825.66 samples/sec   Loss 8.0062   LearningRate 0.0386   Epoch: 15   Global Step: 76580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:07:31,302-Speed 10698.92 samples/sec   Loss 8.0687   LearningRate 0.0386   Epoch: 15   Global Step: 76590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:07:32,268-Speed 10604.92 samples/sec   Loss 7.9430   LearningRate 0.0386   Epoch: 15   Global Step: 76600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:07:33,258-Speed 10362.17 samples/sec   Loss 7.9894   LearningRate 0.0386   Epoch: 15   Global Step: 76610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:07:34,225-Speed 10589.86 samples/sec   Loss 8.1346   LearningRate 0.0386   Epoch: 15   Global Step: 76620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:35,165-Speed 10908.84 samples/sec   Loss 7.9642   LearningRate 0.0386   Epoch: 15   Global Step: 76630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:36,143-Speed 10474.58 samples/sec   Loss 7.8871   LearningRate 0.0386   Epoch: 15   Global Step: 76640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:37,133-Speed 10357.51 samples/sec   Loss 7.9907   LearningRate 0.0386   Epoch: 15   Global Step: 76650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:38,095-Speed 10653.90 samples/sec   Loss 8.0469   LearningRate 0.0386   Epoch: 15   Global Step: 76660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:07:39,024-Speed 11030.73 samples/sec   Loss 7.9019   LearningRate 0.0386   Epoch: 15   Global Step: 76670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:07:40,009-Speed 10405.70 samples/sec   Loss 7.9917   LearningRate 0.0386   Epoch: 15   Global Step: 76680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:07:40,968-Speed 10692.47 samples/sec   Loss 8.0731   LearningRate 0.0386   Epoch: 15   Global Step: 76690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:07:41,915-Speed 10827.20 samples/sec   Loss 8.1419   LearningRate 0.0386   Epoch: 15   Global Step: 76700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:07:42,893-Speed 10482.76 samples/sec   Loss 8.0337   LearningRate 0.0385   Epoch: 15   Global Step: 76710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:07:43,865-Speed 10546.65 samples/sec   Loss 8.0770   LearningRate 0.0385   Epoch: 15   Global Step: 76720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:07:44,792-Speed 11066.46 samples/sec   Loss 8.0116   LearningRate 0.0385   Epoch: 15   Global Step: 76730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:07:45,711-Speed 11152.11 samples/sec   Loss 8.1100   LearningRate 0.0385   Epoch: 15   Global Step: 76740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:07:46,679-Speed 10594.66 samples/sec   Loss 8.0307   LearningRate 0.0385   Epoch: 15   Global Step: 76750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:07:47,766-Speed 9422.87 samples/sec   Loss 8.1846   LearningRate 0.0385   Epoch: 15   Global Step: 76760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:48,724-Speed 10706.12 samples/sec   Loss 8.0424   LearningRate 0.0385   Epoch: 15   Global Step: 76770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:49,679-Speed 10733.06 samples/sec   Loss 8.0412   LearningRate 0.0385   Epoch: 15   Global Step: 76780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:50,674-Speed 10290.32 samples/sec   Loss 8.0746   LearningRate 0.0385   Epoch: 15   Global Step: 76790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:07:51,622-Speed 10812.80 samples/sec   Loss 8.1301   LearningRate 0.0385   Epoch: 15   Global Step: 76800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:07:52,582-Speed 10683.64 samples/sec   Loss 8.0740   LearningRate 0.0385   Epoch: 15   Global Step: 76810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:07:53,515-Speed 10982.73 samples/sec   Loss 8.1916   LearningRate 0.0385   Epoch: 15   Global Step: 76820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:07:54,502-Speed 10383.78 samples/sec   Loss 8.0323   LearningRate 0.0385   Epoch: 15   Global Step: 76830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:07:55,462-Speed 10688.95 samples/sec   Loss 8.0847   LearningRate 0.0385   Epoch: 15   Global Step: 76840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:07:56,434-Speed 10541.72 samples/sec   Loss 8.0545   LearningRate 0.0385   Epoch: 15   Global Step: 76850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:07:57,365-Speed 11024.51 samples/sec   Loss 8.1600   LearningRate 0.0385   Epoch: 15   Global Step: 76860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:07:58,343-Speed 10471.30 samples/sec   Loss 8.0665   LearningRate 0.0384   Epoch: 15   Global Step: 76870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:07:59,308-Speed 10622.54 samples/sec   Loss 8.1121   LearningRate 0.0384   Epoch: 15   Global Step: 76880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:08:00,256-Speed 10808.60 samples/sec   Loss 8.0285   LearningRate 0.0384   Epoch: 15   Global Step: 76890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:08:01,219-Speed 10643.34 samples/sec   Loss 8.2223   LearningRate 0.0384   Epoch: 15   Global Step: 76900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:02,204-Speed 10410.04 samples/sec   Loss 8.1522   LearningRate 0.0384   Epoch: 15   Global Step: 76910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:03,126-Speed 11117.26 samples/sec   Loss 8.2589   LearningRate 0.0384   Epoch: 15   Global Step: 76920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:04,076-Speed 10789.78 samples/sec   Loss 7.9996   LearningRate 0.0384   Epoch: 15   Global Step: 76930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:05,014-Speed 10925.11 samples/sec   Loss 8.1444   LearningRate 0.0384   Epoch: 15   Global Step: 76940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:05,915-Speed 11379.81 samples/sec   Loss 8.1122   LearningRate 0.0384   Epoch: 15   Global Step: 76950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:06,895-Speed 10451.90 samples/sec   Loss 8.1644   LearningRate 0.0384   Epoch: 15   Global Step: 76960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:07,842-Speed 10818.32 samples/sec   Loss 8.1464   LearningRate 0.0384   Epoch: 15   Global Step: 76970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:08,802-Speed 10676.54 samples/sec   Loss 8.0515   LearningRate 0.0384   Epoch: 15   Global Step: 76980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:09,737-Speed 10966.64 samples/sec   Loss 8.1908   LearningRate 0.0384   Epoch: 15   Global Step: 76990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:10,655-Speed 11164.00 samples/sec   Loss 8.1648   LearningRate 0.0384   Epoch: 15   Global Step: 77000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:11,617-Speed 10649.30 samples/sec   Loss 8.2153   LearningRate 0.0384   Epoch: 15   Global Step: 77010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:12,545-Speed 11054.89 samples/sec   Loss 8.2401   LearningRate 0.0384   Epoch: 15   Global Step: 77020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:13,479-Speed 10970.39 samples/sec   Loss 8.1512   LearningRate 0.0383   Epoch: 15   Global Step: 77030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:14,451-Speed 10544.82 samples/sec   Loss 8.1395   LearningRate 0.0383   Epoch: 15   Global Step: 77040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:15,450-Speed 10255.79 samples/sec   Loss 8.4174   LearningRate 0.0383   Epoch: 15   Global Step: 77050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:16,395-Speed 10857.78 samples/sec   Loss 8.2048   LearningRate 0.0383   Epoch: 15   Global Step: 77060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:17,369-Speed 10515.12 samples/sec   Loss 8.1282   LearningRate 0.0383   Epoch: 15   Global Step: 77070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:18,398-Speed 9964.01 samples/sec   Loss 8.1206   LearningRate 0.0383   Epoch: 15   Global Step: 77080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:19,348-Speed 10792.30 samples/sec   Loss 8.1306   LearningRate 0.0383   Epoch: 15   Global Step: 77090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:20,292-Speed 10858.66 samples/sec   Loss 8.0693   LearningRate 0.0383   Epoch: 15   Global Step: 77100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:08:21,213-Speed 11115.54 samples/sec   Loss 8.1602   LearningRate 0.0383   Epoch: 15   Global Step: 77110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:08:22,207-Speed 10319.32 samples/sec   Loss 8.1492   LearningRate 0.0383   Epoch: 15   Global Step: 77120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:08:23,173-Speed 10606.37 samples/sec   Loss 8.0166   LearningRate 0.0383   Epoch: 15   Global Step: 77130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:24,119-Speed 10834.08 samples/sec   Loss 8.0312   LearningRate 0.0383   Epoch: 15   Global Step: 77140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:25,082-Speed 10638.09 samples/sec   Loss 8.1673   LearningRate 0.0383   Epoch: 15   Global Step: 77150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:26,057-Speed 10510.83 samples/sec   Loss 8.0991   LearningRate 0.0383   Epoch: 15   Global Step: 77160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:27,032-Speed 10516.53 samples/sec   Loss 8.2005   LearningRate 0.0383   Epoch: 15   Global Step: 77170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:27,960-Speed 11049.98 samples/sec   Loss 8.0863   LearningRate 0.0383   Epoch: 15   Global Step: 77180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:28,949-Speed 10358.41 samples/sec   Loss 8.0792   LearningRate 0.0383   Epoch: 15   Global Step: 77190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:29,898-Speed 10805.20 samples/sec   Loss 8.0899   LearningRate 0.0382   Epoch: 15   Global Step: 77200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:30,882-Speed 10412.48 samples/sec   Loss 8.1369   LearningRate 0.0382   Epoch: 15   Global Step: 77210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:31,819-Speed 10933.32 samples/sec   Loss 8.2872   LearningRate 0.0382   Epoch: 15   Global Step: 77220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:32,735-Speed 11193.61 samples/sec   Loss 8.1826   LearningRate 0.0382   Epoch: 15   Global Step: 77230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:08:33,655-Speed 11144.68 samples/sec   Loss 8.1883   LearningRate 0.0382   Epoch: 15   Global Step: 77240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:08:34,627-Speed 10543.08 samples/sec   Loss 8.2156   LearningRate 0.0382   Epoch: 15   Global Step: 77250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:08:35,584-Speed 10711.92 samples/sec   Loss 8.2274   LearningRate 0.0382   Epoch: 15   Global Step: 77260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:08:36,599-Speed 10093.77 samples/sec   Loss 8.1599   LearningRate 0.0382   Epoch: 15   Global Step: 77270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:08:37,584-Speed 10397.81 samples/sec   Loss 8.0547   LearningRate 0.0382   Epoch: 15   Global Step: 77280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:08:38,552-Speed 10596.89 samples/sec   Loss 8.0778   LearningRate 0.0382   Epoch: 15   Global Step: 77290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:39,478-Speed 11070.67 samples/sec   Loss 8.3523   LearningRate 0.0382   Epoch: 15   Global Step: 77300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:40,417-Speed 10923.27 samples/sec   Loss 8.0890   LearningRate 0.0382   Epoch: 15   Global Step: 77310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:41,401-Speed 10414.86 samples/sec   Loss 8.1627   LearningRate 0.0382   Epoch: 15   Global Step: 77320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:42,399-Speed 10273.42 samples/sec   Loss 8.2883   LearningRate 0.0382   Epoch: 15   Global Step: 77330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:43,335-Speed 10945.53 samples/sec   Loss 8.2505   LearningRate 0.0382   Epoch: 15   Global Step: 77340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:44,269-Speed 10971.94 samples/sec   Loss 8.3235   LearningRate 0.0382   Epoch: 15   Global Step: 77350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:45,213-Speed 10860.80 samples/sec   Loss 8.2874   LearningRate 0.0381   Epoch: 15   Global Step: 77360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:08:46,168-Speed 10726.79 samples/sec   Loss 8.1502   LearningRate 0.0381   Epoch: 15   Global Step: 77370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:08:47,223-Speed 9719.95 samples/sec   Loss 8.2556   LearningRate 0.0381   Epoch: 15   Global Step: 77380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:08:48,185-Speed 10647.35 samples/sec   Loss 8.1880   LearningRate 0.0381   Epoch: 15   Global Step: 77390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:08:49,155-Speed 10568.34 samples/sec   Loss 8.1228   LearningRate 0.0381   Epoch: 15   Global Step: 77400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:08:50,100-Speed 10838.70 samples/sec   Loss 8.2375   LearningRate 0.0381   Epoch: 15   Global Step: 77410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:08:51,182-Speed 9480.15 samples/sec   Loss 8.1676   LearningRate 0.0381   Epoch: 15   Global Step: 77420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:08:52,128-Speed 10833.68 samples/sec   Loss 8.1397   LearningRate 0.0381   Epoch: 15   Global Step: 77430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:08:53,060-Speed 10999.76 samples/sec   Loss 8.1763   LearningRate 0.0381   Epoch: 15   Global Step: 77440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:08:53,993-Speed 10981.79 samples/sec   Loss 8.1621   LearningRate 0.0381   Epoch: 15   Global Step: 77450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:08:54,934-Speed 10894.92 samples/sec   Loss 8.3057   LearningRate 0.0381   Epoch: 15   Global Step: 77460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:55,877-Speed 10861.46 samples/sec   Loss 8.0951   LearningRate 0.0381   Epoch: 15   Global Step: 77470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:56,833-Speed 10720.80 samples/sec   Loss 8.2683   LearningRate 0.0381   Epoch: 15   Global Step: 77480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:57,787-Speed 10742.77 samples/sec   Loss 8.2284   LearningRate 0.0381   Epoch: 15   Global Step: 77490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:58,775-Speed 10371.73 samples/sec   Loss 8.1270   LearningRate 0.0381   Epoch: 15   Global Step: 77500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:08:59,726-Speed 10784.01 samples/sec   Loss 8.2645   LearningRate 0.0381   Epoch: 15   Global Step: 77510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:00,661-Speed 10962.45 samples/sec   Loss 8.2926   LearningRate 0.0380   Epoch: 15   Global Step: 77520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:01,574-Speed 11230.03 samples/sec   Loss 8.2564   LearningRate 0.0380   Epoch: 15   Global Step: 77530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:02,494-Speed 11130.10 samples/sec   Loss 8.1807   LearningRate 0.0380   Epoch: 15   Global Step: 77540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:03,448-Speed 10751.54 samples/sec   Loss 8.1047   LearningRate 0.0380   Epoch: 15   Global Step: 77550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:04,400-Speed 10759.34 samples/sec   Loss 8.1743   LearningRate 0.0380   Epoch: 15   Global Step: 77560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:09:05,385-Speed 10408.34 samples/sec   Loss 8.2436   LearningRate 0.0380   Epoch: 15   Global Step: 77570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:09:06,378-Speed 10328.23 samples/sec   Loss 8.2198   LearningRate 0.0380   Epoch: 15   Global Step: 77580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:09:07,317-Speed 10915.52 samples/sec   Loss 8.1388   LearningRate 0.0380   Epoch: 15   Global Step: 77590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:09:08,267-Speed 10788.17 samples/sec   Loss 8.2448   LearningRate 0.0380   Epoch: 15   Global Step: 77600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:09:09,209-Speed 10875.83 samples/sec   Loss 8.1396   LearningRate 0.0380   Epoch: 15   Global Step: 77610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:09:10,158-Speed 10800.53 samples/sec   Loss 8.2005   LearningRate 0.0380   Epoch: 15   Global Step: 77620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:09:11,105-Speed 10823.41 samples/sec   Loss 8.3221   LearningRate 0.0380   Epoch: 15   Global Step: 77630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:09:12,024-Speed 11156.12 samples/sec   Loss 8.2800   LearningRate 0.0380   Epoch: 15   Global Step: 77640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:09:12,936-Speed 11247.29 samples/sec   Loss 8.3029   LearningRate 0.0380   Epoch: 15   Global Step: 77650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:09:13,903-Speed 10591.31 samples/sec   Loss 8.3464   LearningRate 0.0380   Epoch: 15   Global Step: 77660   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 02:09:14,859-Speed 10722.53 samples/sec   Loss 8.1903   LearningRate 0.0380   Epoch: 15   Global Step: 77670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:09:15,839-Speed 10462.69 samples/sec   Loss 8.2836   LearningRate 0.0380   Epoch: 15   Global Step: 77680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:09:16,814-Speed 10521.59 samples/sec   Loss 8.1104   LearningRate 0.0379   Epoch: 15   Global Step: 77690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:09:17,740-Speed 11072.06 samples/sec   Loss 8.2224   LearningRate 0.0379   Epoch: 15   Global Step: 77700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:09:18,676-Speed 10950.43 samples/sec   Loss 8.2243   LearningRate 0.0379   Epoch: 15   Global Step: 77710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:09:19,657-Speed 10437.73 samples/sec   Loss 8.2485   LearningRate 0.0379   Epoch: 15   Global Step: 77720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:09:20,637-Speed 10460.95 samples/sec   Loss 8.2218   LearningRate 0.0379   Epoch: 15   Global Step: 77730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:09:21,614-Speed 10487.17 samples/sec   Loss 8.2626   LearningRate 0.0379   Epoch: 15   Global Step: 77740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:22,554-Speed 10912.24 samples/sec   Loss 8.3214   LearningRate 0.0379   Epoch: 15   Global Step: 77750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:23,529-Speed 10518.05 samples/sec   Loss 8.3609   LearningRate 0.0379   Epoch: 15   Global Step: 77760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:24,506-Speed 10487.05 samples/sec   Loss 8.2523   LearningRate 0.0379   Epoch: 15   Global Step: 77770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:25,452-Speed 10839.95 samples/sec   Loss 8.3131   LearningRate 0.0379   Epoch: 15   Global Step: 77780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:26,408-Speed 10717.25 samples/sec   Loss 8.0112   LearningRate 0.0379   Epoch: 15   Global Step: 77790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:27,363-Speed 10729.77 samples/sec   Loss 8.1944   LearningRate 0.0379   Epoch: 15   Global Step: 77800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:28,312-Speed 10804.84 samples/sec   Loss 8.1261   LearningRate 0.0379   Epoch: 15   Global Step: 77810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:29,263-Speed 10778.56 samples/sec   Loss 8.0693   LearningRate 0.0379   Epoch: 15   Global Step: 77820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:30,179-Speed 11192.00 samples/sec   Loss 8.3333   LearningRate 0.0379   Epoch: 15   Global Step: 77830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:31,151-Speed 10544.08 samples/sec   Loss 8.2710   LearningRate 0.0379   Epoch: 15   Global Step: 77840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:09:32,132-Speed 10476.49 samples/sec   Loss 8.3582   LearningRate 0.0378   Epoch: 15   Global Step: 77850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:09:33,085-Speed 10755.95 samples/sec   Loss 8.2557   LearningRate 0.0378   Epoch: 15   Global Step: 77860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:09:34,039-Speed 10742.32 samples/sec   Loss 8.2503   LearningRate 0.0378   Epoch: 15   Global Step: 77870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:35,036-Speed 10279.98 samples/sec   Loss 8.1783   LearningRate 0.0378   Epoch: 15   Global Step: 77880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:35,983-Speed 10825.13 samples/sec   Loss 8.2093   LearningRate 0.0378   Epoch: 15   Global Step: 77890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:37,003-Speed 10047.47 samples/sec   Loss 8.2258   LearningRate 0.0378   Epoch: 15   Global Step: 77900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:37,970-Speed 10600.54 samples/sec   Loss 8.2039   LearningRate 0.0378   Epoch: 15   Global Step: 77910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:38,938-Speed 10584.96 samples/sec   Loss 8.2784   LearningRate 0.0378   Epoch: 15   Global Step: 77920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:39,895-Speed 10886.42 samples/sec   Loss 8.2133   LearningRate 0.0378   Epoch: 15   Global Step: 77930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:40,834-Speed 10926.08 samples/sec   Loss 8.3067   LearningRate 0.0378   Epoch: 15   Global Step: 77940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:41,795-Speed 10666.12 samples/sec   Loss 8.3522   LearningRate 0.0378   Epoch: 15   Global Step: 77950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:09:42,720-Speed 11080.12 samples/sec   Loss 8.2865   LearningRate 0.0378   Epoch: 15   Global Step: 77960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:09:43,697-Speed 10485.51 samples/sec   Loss 8.3802   LearningRate 0.0378   Epoch: 15   Global Step: 77970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:09:44,653-Speed 10727.28 samples/sec   Loss 8.2496   LearningRate 0.0378   Epoch: 15   Global Step: 77980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:09:45,586-Speed 10981.81 samples/sec   Loss 8.2757   LearningRate 0.0378   Epoch: 15   Global Step: 77990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:09:46,563-Speed 10491.45 samples/sec   Loss 8.3611   LearningRate 0.0378   Epoch: 15   Global Step: 78000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:10:08,832-[lfw][78000]XNorm: 11.745637
Training: 2022-04-11 02:10:08,833-[lfw][78000]Accuracy-Flip: 0.99600+-0.00318
Training: 2022-04-11 02:10:08,834-[lfw][78000]Accuracy-Highest: 0.99617
Training: 2022-04-11 02:10:34,387-[cfp_fp][78000]XNorm: 9.997324
Training: 2022-04-11 02:10:34,388-[cfp_fp][78000]Accuracy-Flip: 0.95629+-0.00846
Training: 2022-04-11 02:10:34,389-[cfp_fp][78000]Accuracy-Highest: 0.95743
Training: 2022-04-11 02:10:56,584-[agedb_30][78000]XNorm: 11.563045
Training: 2022-04-11 02:10:56,586-[agedb_30][78000]Accuracy-Flip: 0.96183+-0.00984
Training: 2022-04-11 02:10:56,586-[agedb_30][78000]Accuracy-Highest: 0.96283
Training: 2022-04-11 02:10:57,552-Speed 144.25 samples/sec   Loss 8.3809   LearningRate 0.0378   Epoch: 15   Global Step: 78010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:10:58,476-Speed 11094.58 samples/sec   Loss 8.2072   LearningRate 0.0377   Epoch: 15   Global Step: 78020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:10:59,420-Speed 10859.57 samples/sec   Loss 8.1978   LearningRate 0.0377   Epoch: 15   Global Step: 78030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:11:00,368-Speed 10804.22 samples/sec   Loss 8.2093   LearningRate 0.0377   Epoch: 15   Global Step: 78040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:11:01,330-Speed 10657.01 samples/sec   Loss 8.3914   LearningRate 0.0377   Epoch: 15   Global Step: 78050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:11:02,442-Speed 9216.98 samples/sec   Loss 8.3506   LearningRate 0.0377   Epoch: 15   Global Step: 78060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:03,425-Speed 10432.68 samples/sec   Loss 8.3444   LearningRate 0.0377   Epoch: 15   Global Step: 78070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:04,399-Speed 10522.34 samples/sec   Loss 8.2479   LearningRate 0.0377   Epoch: 15   Global Step: 78080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:05,387-Speed 10375.98 samples/sec   Loss 8.1599   LearningRate 0.0377   Epoch: 15   Global Step: 78090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:06,362-Speed 10516.74 samples/sec   Loss 8.2898   LearningRate 0.0377   Epoch: 15   Global Step: 78100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:07,293-Speed 11010.06 samples/sec   Loss 8.2242   LearningRate 0.0377   Epoch: 15   Global Step: 78110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:08,194-Speed 11368.20 samples/sec   Loss 7.9907   LearningRate 0.0377   Epoch: 15   Global Step: 78120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:09,128-Speed 10976.79 samples/sec   Loss 8.3784   LearningRate 0.0377   Epoch: 15   Global Step: 78130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:10,088-Speed 10671.99 samples/sec   Loss 8.4239   LearningRate 0.0377   Epoch: 15   Global Step: 78140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:11,015-Speed 11051.31 samples/sec   Loss 8.2928   LearningRate 0.0377   Epoch: 15   Global Step: 78150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:11,976-Speed 10668.29 samples/sec   Loss 8.3264   LearningRate 0.0377   Epoch: 15   Global Step: 78160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:11:12,933-Speed 10709.10 samples/sec   Loss 8.3455   LearningRate 0.0377   Epoch: 15   Global Step: 78170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:13,936-Speed 10216.65 samples/sec   Loss 8.4282   LearningRate 0.0376   Epoch: 15   Global Step: 78180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:14,877-Speed 10899.06 samples/sec   Loss 8.2261   LearningRate 0.0376   Epoch: 15   Global Step: 78190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:15,811-Speed 10973.92 samples/sec   Loss 8.2493   LearningRate 0.0376   Epoch: 15   Global Step: 78200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:16,776-Speed 10619.62 samples/sec   Loss 8.2861   LearningRate 0.0376   Epoch: 15   Global Step: 78210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:17,783-Speed 10171.67 samples/sec   Loss 8.3057   LearningRate 0.0376   Epoch: 15   Global Step: 78220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:18,719-Speed 10957.62 samples/sec   Loss 8.4566   LearningRate 0.0376   Epoch: 15   Global Step: 78230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:19,667-Speed 10812.68 samples/sec   Loss 8.3177   LearningRate 0.0376   Epoch: 15   Global Step: 78240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:20,625-Speed 10697.28 samples/sec   Loss 8.3495   LearningRate 0.0376   Epoch: 15   Global Step: 78250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:21,602-Speed 10486.81 samples/sec   Loss 8.2995   LearningRate 0.0376   Epoch: 15   Global Step: 78260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:22,534-Speed 10997.44 samples/sec   Loss 8.4406   LearningRate 0.0376   Epoch: 15   Global Step: 78270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:11:23,467-Speed 10991.00 samples/sec   Loss 8.2647   LearningRate 0.0376   Epoch: 15   Global Step: 78280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:11:24,450-Speed 10433.33 samples/sec   Loss 8.2003   LearningRate 0.0376   Epoch: 15   Global Step: 78290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:25,397-Speed 10814.53 samples/sec   Loss 8.3677   LearningRate 0.0376   Epoch: 15   Global Step: 78300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:26,353-Speed 10720.45 samples/sec   Loss 8.4785   LearningRate 0.0376   Epoch: 15   Global Step: 78310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:27,323-Speed 10568.18 samples/sec   Loss 8.1368   LearningRate 0.0376   Epoch: 15   Global Step: 78320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:28,277-Speed 10745.98 samples/sec   Loss 8.3875   LearningRate 0.0376   Epoch: 15   Global Step: 78330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:29,275-Speed 10264.71 samples/sec   Loss 8.3523   LearningRate 0.0376   Epoch: 15   Global Step: 78340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:30,246-Speed 10561.07 samples/sec   Loss 8.3065   LearningRate 0.0375   Epoch: 15   Global Step: 78350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:31,155-Speed 11271.12 samples/sec   Loss 8.3070   LearningRate 0.0375   Epoch: 15   Global Step: 78360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:32,080-Speed 11081.38 samples/sec   Loss 8.3403   LearningRate 0.0375   Epoch: 15   Global Step: 78370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:33,053-Speed 10536.44 samples/sec   Loss 8.1601   LearningRate 0.0375   Epoch: 15   Global Step: 78380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:34,018-Speed 10609.34 samples/sec   Loss 8.3876   LearningRate 0.0375   Epoch: 15   Global Step: 78390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:11:34,971-Speed 10759.51 samples/sec   Loss 8.5418   LearningRate 0.0375   Epoch: 15   Global Step: 78400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:11:35,940-Speed 10570.83 samples/sec   Loss 8.2494   LearningRate 0.0375   Epoch: 15   Global Step: 78410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:11:36,925-Speed 10418.32 samples/sec   Loss 8.2062   LearningRate 0.0375   Epoch: 15   Global Step: 78420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:11:37,883-Speed 10690.73 samples/sec   Loss 8.1367   LearningRate 0.0375   Epoch: 15   Global Step: 78430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:11:38,829-Speed 10834.51 samples/sec   Loss 8.3085   LearningRate 0.0375   Epoch: 15   Global Step: 78440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:11:39,804-Speed 10511.53 samples/sec   Loss 8.2847   LearningRate 0.0375   Epoch: 15   Global Step: 78450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:11:40,720-Speed 11197.22 samples/sec   Loss 8.2024   LearningRate 0.0375   Epoch: 15   Global Step: 78460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:11:41,659-Speed 10909.99 samples/sec   Loss 8.2553   LearningRate 0.0375   Epoch: 15   Global Step: 78470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:11:42,597-Speed 10929.87 samples/sec   Loss 8.3534   LearningRate 0.0375   Epoch: 15   Global Step: 78480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:43,584-Speed 10380.75 samples/sec   Loss 8.2223   LearningRate 0.0375   Epoch: 15   Global Step: 78490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:44,511-Speed 11056.57 samples/sec   Loss 8.2610   LearningRate 0.0375   Epoch: 15   Global Step: 78500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:45,426-Speed 11191.69 samples/sec   Loss 8.1964   LearningRate 0.0374   Epoch: 15   Global Step: 78510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:46,398-Speed 10553.17 samples/sec   Loss 8.3242   LearningRate 0.0374   Epoch: 15   Global Step: 78520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:47,367-Speed 10574.10 samples/sec   Loss 8.1788   LearningRate 0.0374   Epoch: 15   Global Step: 78530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:48,303-Speed 10957.94 samples/sec   Loss 8.2628   LearningRate 0.0374   Epoch: 15   Global Step: 78540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:49,267-Speed 10630.24 samples/sec   Loss 8.1710   LearningRate 0.0374   Epoch: 15   Global Step: 78550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:50,313-Speed 9792.51 samples/sec   Loss 8.2980   LearningRate 0.0374   Epoch: 15   Global Step: 78560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:51,286-Speed 10532.01 samples/sec   Loss 8.4179   LearningRate 0.0374   Epoch: 15   Global Step: 78570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:11:52,295-Speed 10163.15 samples/sec   Loss 8.0450   LearningRate 0.0374   Epoch: 15   Global Step: 78580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:11:53,263-Speed 10589.37 samples/sec   Loss 8.1974   LearningRate 0.0374   Epoch: 15   Global Step: 78590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:11:54,233-Speed 10565.23 samples/sec   Loss 8.1964   LearningRate 0.0374   Epoch: 15   Global Step: 78600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:11:55,161-Speed 11039.64 samples/sec   Loss 8.1991   LearningRate 0.0374   Epoch: 15   Global Step: 78610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:11:56,108-Speed 10825.28 samples/sec   Loss 8.3211   LearningRate 0.0374   Epoch: 15   Global Step: 78620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:11:57,061-Speed 10760.41 samples/sec   Loss 8.3567   LearningRate 0.0374   Epoch: 15   Global Step: 78630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:11:58,071-Speed 10143.31 samples/sec   Loss 8.2187   LearningRate 0.0374   Epoch: 15   Global Step: 78640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:11:59,010-Speed 10924.18 samples/sec   Loss 8.1965   LearningRate 0.0374   Epoch: 15   Global Step: 78650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:11:59,975-Speed 10621.19 samples/sec   Loss 8.3174   LearningRate 0.0374   Epoch: 15   Global Step: 78660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:00,933-Speed 10693.39 samples/sec   Loss 8.2617   LearningRate 0.0374   Epoch: 15   Global Step: 78670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:01,900-Speed 10601.43 samples/sec   Loss 8.3478   LearningRate 0.0373   Epoch: 15   Global Step: 78680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:02,860-Speed 10681.15 samples/sec   Loss 8.2136   LearningRate 0.0373   Epoch: 15   Global Step: 78690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:03,815-Speed 10730.31 samples/sec   Loss 8.4573   LearningRate 0.0373   Epoch: 15   Global Step: 78700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:04,785-Speed 10567.91 samples/sec   Loss 8.3885   LearningRate 0.0373   Epoch: 15   Global Step: 78710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:05,743-Speed 10708.08 samples/sec   Loss 8.2967   LearningRate 0.0373   Epoch: 15   Global Step: 78720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:06,710-Speed 10594.91 samples/sec   Loss 8.4236   LearningRate 0.0373   Epoch: 15   Global Step: 78730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:07,683-Speed 10528.17 samples/sec   Loss 8.3454   LearningRate 0.0373   Epoch: 15   Global Step: 78740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:08,710-Speed 9982.50 samples/sec   Loss 8.1832   LearningRate 0.0373   Epoch: 15   Global Step: 78750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:09,670-Speed 10681.89 samples/sec   Loss 8.3466   LearningRate 0.0373   Epoch: 15   Global Step: 78760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:10,609-Speed 10908.60 samples/sec   Loss 8.2837   LearningRate 0.0373   Epoch: 15   Global Step: 78770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:11,564-Speed 10727.19 samples/sec   Loss 8.3392   LearningRate 0.0373   Epoch: 15   Global Step: 78780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:12,501-Speed 10944.98 samples/sec   Loss 8.3045   LearningRate 0.0373   Epoch: 15   Global Step: 78790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:13,478-Speed 10491.67 samples/sec   Loss 8.3422   LearningRate 0.0373   Epoch: 15   Global Step: 78800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:14,446-Speed 10588.27 samples/sec   Loss 8.2685   LearningRate 0.0373   Epoch: 15   Global Step: 78810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:15,415-Speed 10581.71 samples/sec   Loss 8.2577   LearningRate 0.0373   Epoch: 15   Global Step: 78820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:16,370-Speed 10732.58 samples/sec   Loss 8.2989   LearningRate 0.0373   Epoch: 15   Global Step: 78830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:17,349-Speed 10478.07 samples/sec   Loss 8.2610   LearningRate 0.0372   Epoch: 15   Global Step: 78840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:18,304-Speed 10733.89 samples/sec   Loss 8.1065   LearningRate 0.0372   Epoch: 15   Global Step: 78850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:19,247-Speed 10868.70 samples/sec   Loss 8.4442   LearningRate 0.0372   Epoch: 15   Global Step: 78860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:20,227-Speed 10456.74 samples/sec   Loss 8.4212   LearningRate 0.0372   Epoch: 15   Global Step: 78870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:21,167-Speed 10898.52 samples/sec   Loss 8.3321   LearningRate 0.0372   Epoch: 15   Global Step: 78880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:22,076-Speed 11276.67 samples/sec   Loss 8.3781   LearningRate 0.0372   Epoch: 15   Global Step: 78890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:23,038-Speed 10662.76 samples/sec   Loss 8.1608   LearningRate 0.0372   Epoch: 15   Global Step: 78900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:24,017-Speed 10469.96 samples/sec   Loss 8.3234   LearningRate 0.0372   Epoch: 15   Global Step: 78910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:24,953-Speed 10958.17 samples/sec   Loss 8.4391   LearningRate 0.0372   Epoch: 15   Global Step: 78920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:25,909-Speed 10726.19 samples/sec   Loss 8.3958   LearningRate 0.0372   Epoch: 15   Global Step: 78930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:26,898-Speed 10364.40 samples/sec   Loss 8.3041   LearningRate 0.0372   Epoch: 15   Global Step: 78940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:27,853-Speed 10728.20 samples/sec   Loss 8.1945   LearningRate 0.0372   Epoch: 15   Global Step: 78950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:28,788-Speed 10959.83 samples/sec   Loss 8.3922   LearningRate 0.0372   Epoch: 15   Global Step: 78960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:29,742-Speed 10739.27 samples/sec   Loss 8.4093   LearningRate 0.0372   Epoch: 15   Global Step: 78970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:30,685-Speed 10879.69 samples/sec   Loss 8.3823   LearningRate 0.0372   Epoch: 15   Global Step: 78980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:31,661-Speed 10497.26 samples/sec   Loss 8.2874   LearningRate 0.0372   Epoch: 15   Global Step: 78990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:32,614-Speed 10753.38 samples/sec   Loss 8.1210   LearningRate 0.0372   Epoch: 15   Global Step: 79000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:33,588-Speed 10521.45 samples/sec   Loss 8.4294   LearningRate 0.0371   Epoch: 15   Global Step: 79010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:34,578-Speed 10354.14 samples/sec   Loss 8.1255   LearningRate 0.0371   Epoch: 15   Global Step: 79020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:35,525-Speed 10819.57 samples/sec   Loss 8.3856   LearningRate 0.0371   Epoch: 15   Global Step: 79030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:36,484-Speed 10684.99 samples/sec   Loss 8.2348   LearningRate 0.0371   Epoch: 15   Global Step: 79040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:37,494-Speed 10152.87 samples/sec   Loss 8.2371   LearningRate 0.0371   Epoch: 15   Global Step: 79050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:38,467-Speed 10529.68 samples/sec   Loss 8.1134   LearningRate 0.0371   Epoch: 15   Global Step: 79060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:39,414-Speed 10825.34 samples/sec   Loss 8.4692   LearningRate 0.0371   Epoch: 15   Global Step: 79070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:40,350-Speed 10949.40 samples/sec   Loss 8.3098   LearningRate 0.0371   Epoch: 15   Global Step: 79080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:41,333-Speed 10423.50 samples/sec   Loss 8.3494   LearningRate 0.0371   Epoch: 15   Global Step: 79090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:42,279-Speed 10842.29 samples/sec   Loss 8.2118   LearningRate 0.0371   Epoch: 15   Global Step: 79100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:43,230-Speed 10779.40 samples/sec   Loss 8.4125   LearningRate 0.0371   Epoch: 15   Global Step: 79110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:44,191-Speed 10659.78 samples/sec   Loss 8.2974   LearningRate 0.0371   Epoch: 15   Global Step: 79120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:45,161-Speed 10563.32 samples/sec   Loss 8.6083   LearningRate 0.0371   Epoch: 15   Global Step: 79130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:46,080-Speed 11158.59 samples/sec   Loss 8.4961   LearningRate 0.0371   Epoch: 15   Global Step: 79140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:47,060-Speed 10460.85 samples/sec   Loss 8.3374   LearningRate 0.0371   Epoch: 15   Global Step: 79150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:48,010-Speed 10783.31 samples/sec   Loss 8.2997   LearningRate 0.0371   Epoch: 15   Global Step: 79160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:48,986-Speed 10504.00 samples/sec   Loss 8.3830   LearningRate 0.0371   Epoch: 15   Global Step: 79170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:49,961-Speed 10512.87 samples/sec   Loss 8.3616   LearningRate 0.0370   Epoch: 15   Global Step: 79180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:50,924-Speed 10643.96 samples/sec   Loss 8.2494   LearningRate 0.0370   Epoch: 15   Global Step: 79190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:12:51,857-Speed 10981.61 samples/sec   Loss 8.2303   LearningRate 0.0370   Epoch: 15   Global Step: 79200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:52,831-Speed 10519.07 samples/sec   Loss 8.5787   LearningRate 0.0370   Epoch: 15   Global Step: 79210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:53,814-Speed 10439.98 samples/sec   Loss 8.3242   LearningRate 0.0370   Epoch: 15   Global Step: 79220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:54,779-Speed 10623.06 samples/sec   Loss 8.2796   LearningRate 0.0370   Epoch: 15   Global Step: 79230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:55,765-Speed 10402.87 samples/sec   Loss 8.2958   LearningRate 0.0370   Epoch: 15   Global Step: 79240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:56,722-Speed 10713.10 samples/sec   Loss 8.4021   LearningRate 0.0370   Epoch: 15   Global Step: 79250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:57,816-Speed 9361.81 samples/sec   Loss 8.1200   LearningRate 0.0370   Epoch: 15   Global Step: 79260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:58,766-Speed 10792.18 samples/sec   Loss 8.3448   LearningRate 0.0370   Epoch: 15   Global Step: 79270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:12:59,730-Speed 10632.36 samples/sec   Loss 8.2510   LearningRate 0.0370   Epoch: 15   Global Step: 79280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:00,747-Speed 10071.33 samples/sec   Loss 8.3801   LearningRate 0.0370   Epoch: 15   Global Step: 79290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:01,766-Speed 10067.54 samples/sec   Loss 8.3998   LearningRate 0.0370   Epoch: 15   Global Step: 79300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:02,723-Speed 10710.07 samples/sec   Loss 8.3863   LearningRate 0.0370   Epoch: 15   Global Step: 79310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:03,687-Speed 10629.78 samples/sec   Loss 8.4736   LearningRate 0.0370   Epoch: 15   Global Step: 79320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:04,649-Speed 10645.83 samples/sec   Loss 8.3771   LearningRate 0.0370   Epoch: 15   Global Step: 79330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:05,614-Speed 10630.31 samples/sec   Loss 8.4946   LearningRate 0.0369   Epoch: 15   Global Step: 79340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:06,596-Speed 10441.39 samples/sec   Loss 8.4659   LearningRate 0.0369   Epoch: 15   Global Step: 79350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:07,552-Speed 10721.35 samples/sec   Loss 8.2399   LearningRate 0.0369   Epoch: 15   Global Step: 79360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:08,486-Speed 10972.30 samples/sec   Loss 8.5249   LearningRate 0.0369   Epoch: 15   Global Step: 79370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:09,463-Speed 10495.82 samples/sec   Loss 8.3117   LearningRate 0.0369   Epoch: 15   Global Step: 79380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:10,422-Speed 10686.79 samples/sec   Loss 8.2474   LearningRate 0.0369   Epoch: 15   Global Step: 79390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:11,398-Speed 10504.74 samples/sec   Loss 8.2081   LearningRate 0.0369   Epoch: 15   Global Step: 79400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:12,351-Speed 10751.36 samples/sec   Loss 8.2738   LearningRate 0.0369   Epoch: 15   Global Step: 79410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:13,332-Speed 10446.62 samples/sec   Loss 8.1964   LearningRate 0.0369   Epoch: 15   Global Step: 79420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:14,302-Speed 10562.81 samples/sec   Loss 8.4813   LearningRate 0.0369   Epoch: 15   Global Step: 79430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:15,252-Speed 10791.32 samples/sec   Loss 8.3487   LearningRate 0.0369   Epoch: 15   Global Step: 79440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:16,203-Speed 10773.44 samples/sec   Loss 8.3799   LearningRate 0.0369   Epoch: 15   Global Step: 79450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:17,151-Speed 10816.34 samples/sec   Loss 8.1672   LearningRate 0.0369   Epoch: 15   Global Step: 79460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:18,110-Speed 10687.69 samples/sec   Loss 8.3497   LearningRate 0.0369   Epoch: 15   Global Step: 79470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:19,088-Speed 10484.74 samples/sec   Loss 8.2121   LearningRate 0.0369   Epoch: 15   Global Step: 79480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:20,033-Speed 10845.21 samples/sec   Loss 8.3764   LearningRate 0.0369   Epoch: 15   Global Step: 79490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:20,985-Speed 10757.66 samples/sec   Loss 8.3280   LearningRate 0.0369   Epoch: 15   Global Step: 79500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:21,968-Speed 10429.29 samples/sec   Loss 8.4190   LearningRate 0.0368   Epoch: 15   Global Step: 79510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:22,915-Speed 10828.71 samples/sec   Loss 8.2660   LearningRate 0.0368   Epoch: 15   Global Step: 79520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:23,863-Speed 10809.68 samples/sec   Loss 8.3846   LearningRate 0.0368   Epoch: 15   Global Step: 79530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:24,866-Speed 10217.61 samples/sec   Loss 8.3706   LearningRate 0.0368   Epoch: 15   Global Step: 79540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:25,838-Speed 10538.51 samples/sec   Loss 8.4503   LearningRate 0.0368   Epoch: 15   Global Step: 79550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:26,832-Speed 10312.98 samples/sec   Loss 8.4300   LearningRate 0.0368   Epoch: 15   Global Step: 79560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:27,807-Speed 10511.84 samples/sec   Loss 8.4507   LearningRate 0.0368   Epoch: 15   Global Step: 79570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:28,791-Speed 10417.11 samples/sec   Loss 8.1984   LearningRate 0.0368   Epoch: 15   Global Step: 79580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:29,782-Speed 10335.37 samples/sec   Loss 8.2516   LearningRate 0.0368   Epoch: 15   Global Step: 79590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:30,730-Speed 10816.18 samples/sec   Loss 8.2083   LearningRate 0.0368   Epoch: 15   Global Step: 79600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:31,734-Speed 10205.93 samples/sec   Loss 8.2706   LearningRate 0.0368   Epoch: 15   Global Step: 79610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:32,642-Speed 11291.20 samples/sec   Loss 8.4051   LearningRate 0.0368   Epoch: 15   Global Step: 79620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:33,594-Speed 10764.35 samples/sec   Loss 8.3024   LearningRate 0.0368   Epoch: 15   Global Step: 79630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:34,537-Speed 10868.83 samples/sec   Loss 8.3037   LearningRate 0.0368   Epoch: 15   Global Step: 79640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:35,494-Speed 10703.72 samples/sec   Loss 8.3799   LearningRate 0.0368   Epoch: 15   Global Step: 79650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:36,420-Speed 11065.07 samples/sec   Loss 8.4212   LearningRate 0.0368   Epoch: 15   Global Step: 79660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:37,368-Speed 10818.54 samples/sec   Loss 8.3240   LearningRate 0.0368   Epoch: 15   Global Step: 79670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:38,285-Speed 11181.93 samples/sec   Loss 8.4435   LearningRate 0.0367   Epoch: 15   Global Step: 79680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:39,273-Speed 10377.31 samples/sec   Loss 8.3403   LearningRate 0.0367   Epoch: 15   Global Step: 79690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:40,206-Speed 10982.39 samples/sec   Loss 8.3475   LearningRate 0.0367   Epoch: 15   Global Step: 79700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:41,115-Speed 11281.21 samples/sec   Loss 8.4630   LearningRate 0.0367   Epoch: 15   Global Step: 79710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:42,188-Speed 9546.69 samples/sec   Loss 8.3090   LearningRate 0.0367   Epoch: 15   Global Step: 79720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:43,159-Speed 10557.99 samples/sec   Loss 8.3191   LearningRate 0.0367   Epoch: 15   Global Step: 79730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:44,087-Speed 11049.96 samples/sec   Loss 8.4495   LearningRate 0.0367   Epoch: 15   Global Step: 79740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:45,043-Speed 10717.99 samples/sec   Loss 8.2872   LearningRate 0.0367   Epoch: 15   Global Step: 79750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:45,987-Speed 10858.04 samples/sec   Loss 8.5067   LearningRate 0.0367   Epoch: 15   Global Step: 79760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:46,933-Speed 10839.61 samples/sec   Loss 8.2770   LearningRate 0.0367   Epoch: 15   Global Step: 79770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:47,854-Speed 11123.93 samples/sec   Loss 8.3164   LearningRate 0.0367   Epoch: 15   Global Step: 79780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:48,841-Speed 10375.82 samples/sec   Loss 8.3233   LearningRate 0.0367   Epoch: 15   Global Step: 79790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:49,815-Speed 10524.48 samples/sec   Loss 8.4478   LearningRate 0.0367   Epoch: 15   Global Step: 79800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:50,764-Speed 10807.53 samples/sec   Loss 8.3863   LearningRate 0.0367   Epoch: 15   Global Step: 79810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:51,728-Speed 10632.92 samples/sec   Loss 8.5091   LearningRate 0.0367   Epoch: 15   Global Step: 79820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:52,674-Speed 10830.15 samples/sec   Loss 8.3020   LearningRate 0.0367   Epoch: 15   Global Step: 79830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:53,671-Speed 10277.57 samples/sec   Loss 8.1923   LearningRate 0.0366   Epoch: 15   Global Step: 79840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:54,598-Speed 11062.46 samples/sec   Loss 8.4349   LearningRate 0.0366   Epoch: 15   Global Step: 79850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:55,525-Speed 11054.99 samples/sec   Loss 8.2956   LearningRate 0.0366   Epoch: 15   Global Step: 79860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:56,442-Speed 11172.37 samples/sec   Loss 8.0922   LearningRate 0.0366   Epoch: 15   Global Step: 79870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:13:57,364-Speed 11118.72 samples/sec   Loss 8.3400   LearningRate 0.0366   Epoch: 15   Global Step: 79880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:58,340-Speed 10505.97 samples/sec   Loss 8.2617   LearningRate 0.0366   Epoch: 15   Global Step: 79890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:13:59,283-Speed 10863.58 samples/sec   Loss 8.2513   LearningRate 0.0366   Epoch: 15   Global Step: 79900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:14:00,257-Speed 10531.17 samples/sec   Loss 8.0825   LearningRate 0.0366   Epoch: 15   Global Step: 79910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:14:01,209-Speed 10759.96 samples/sec   Loss 8.5339   LearningRate 0.0366   Epoch: 15   Global Step: 79920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:14:02,208-Speed 10258.09 samples/sec   Loss 8.3180   LearningRate 0.0366   Epoch: 15   Global Step: 79930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:14:03,132-Speed 11093.44 samples/sec   Loss 8.3989   LearningRate 0.0366   Epoch: 15   Global Step: 79940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:14:04,099-Speed 10601.45 samples/sec   Loss 8.4009   LearningRate 0.0366   Epoch: 15   Global Step: 79950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:14:05,019-Speed 11136.34 samples/sec   Loss 8.3640   LearningRate 0.0366   Epoch: 15   Global Step: 79960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:14:05,923-Speed 11332.44 samples/sec   Loss 8.1319   LearningRate 0.0366   Epoch: 15   Global Step: 79970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:14:06,899-Speed 10508.11 samples/sec   Loss 8.5013   LearningRate 0.0366   Epoch: 15   Global Step: 79980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:14:07,808-Speed 11272.01 samples/sec   Loss 8.4277   LearningRate 0.0366   Epoch: 15   Global Step: 79990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:14:08,754-Speed 10834.82 samples/sec   Loss 8.4499   LearningRate 0.0366   Epoch: 15   Global Step: 80000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:14:30,904-[lfw][80000]XNorm: 11.777881
Training: 2022-04-11 02:14:30,905-[lfw][80000]Accuracy-Flip: 0.99533+-0.00393
Training: 2022-04-11 02:14:30,906-[lfw][80000]Accuracy-Highest: 0.99617
Training: 2022-04-11 02:14:56,668-[cfp_fp][80000]XNorm: 9.892101
Training: 2022-04-11 02:14:56,669-[cfp_fp][80000]Accuracy-Flip: 0.95557+-0.01174
Training: 2022-04-11 02:14:56,670-[cfp_fp][80000]Accuracy-Highest: 0.95743
Training: 2022-04-11 02:15:19,103-[agedb_30][80000]XNorm: 11.400444
Training: 2022-04-11 02:15:19,103-[agedb_30][80000]Accuracy-Flip: 0.96433+-0.00926
Training: 2022-04-11 02:15:19,104-[agedb_30][80000]Accuracy-Highest: 0.96433
Training: 2022-04-11 02:15:20,036-Speed 143.66 samples/sec   Loss 8.3795   LearningRate 0.0365   Epoch: 15   Global Step: 80010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:20,995-Speed 10682.90 samples/sec   Loss 8.3771   LearningRate 0.0365   Epoch: 15   Global Step: 80020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:21,933-Speed 10940.48 samples/sec   Loss 8.4001   LearningRate 0.0365   Epoch: 15   Global Step: 80030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:22,918-Speed 10411.49 samples/sec   Loss 8.3056   LearningRate 0.0365   Epoch: 15   Global Step: 80040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:15:23,866-Speed 10808.98 samples/sec   Loss 8.3034   LearningRate 0.0365   Epoch: 15   Global Step: 80050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:15:24,914-Speed 9789.46 samples/sec   Loss 8.2930   LearningRate 0.0365   Epoch: 15   Global Step: 80060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:15:25,867-Speed 10752.50 samples/sec   Loss 8.2409   LearningRate 0.0365   Epoch: 15   Global Step: 80070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:15:26,857-Speed 10354.91 samples/sec   Loss 8.1486   LearningRate 0.0365   Epoch: 15   Global Step: 80080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:27,844-Speed 10387.61 samples/sec   Loss 8.3338   LearningRate 0.0365   Epoch: 15   Global Step: 80090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:28,779-Speed 10974.72 samples/sec   Loss 8.3258   LearningRate 0.0365   Epoch: 15   Global Step: 80100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:29,754-Speed 10508.40 samples/sec   Loss 8.3962   LearningRate 0.0365   Epoch: 15   Global Step: 80110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:30,793-Speed 9872.25 samples/sec   Loss 8.3473   LearningRate 0.0365   Epoch: 15   Global Step: 80120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:31,753-Speed 10677.44 samples/sec   Loss 8.3899   LearningRate 0.0365   Epoch: 15   Global Step: 80130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:32,715-Speed 10653.85 samples/sec   Loss 8.3422   LearningRate 0.0365   Epoch: 15   Global Step: 80140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:33,677-Speed 10654.49 samples/sec   Loss 8.1827   LearningRate 0.0365   Epoch: 15   Global Step: 80150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:15:34,690-Speed 10118.66 samples/sec   Loss 8.2518   LearningRate 0.0365   Epoch: 15   Global Step: 80160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:15:35,636-Speed 10842.82 samples/sec   Loss 8.3426   LearningRate 0.0365   Epoch: 15   Global Step: 80170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:15:36,569-Speed 10982.80 samples/sec   Loss 8.3283   LearningRate 0.0364   Epoch: 15   Global Step: 80180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:15:37,493-Speed 11087.83 samples/sec   Loss 8.2275   LearningRate 0.0364   Epoch: 15   Global Step: 80190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:15:38,438-Speed 10845.61 samples/sec   Loss 8.3334   LearningRate 0.0364   Epoch: 15   Global Step: 80200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:15:39,408-Speed 10573.52 samples/sec   Loss 8.3956   LearningRate 0.0364   Epoch: 15   Global Step: 80210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:15:40,406-Speed 10264.01 samples/sec   Loss 8.2628   LearningRate 0.0364   Epoch: 15   Global Step: 80220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:15:41,386-Speed 10470.30 samples/sec   Loss 8.4290   LearningRate 0.0364   Epoch: 15   Global Step: 80230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:15:42,361-Speed 10502.96 samples/sec   Loss 8.3664   LearningRate 0.0364   Epoch: 15   Global Step: 80240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:15:43,287-Speed 11077.14 samples/sec   Loss 8.3528   LearningRate 0.0364   Epoch: 15   Global Step: 80250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:44,243-Speed 10725.06 samples/sec   Loss 8.3168   LearningRate 0.0364   Epoch: 15   Global Step: 80260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:45,175-Speed 10995.89 samples/sec   Loss 8.4217   LearningRate 0.0364   Epoch: 15   Global Step: 80270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:46,110-Speed 10961.17 samples/sec   Loss 8.3702   LearningRate 0.0364   Epoch: 15   Global Step: 80280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:47,072-Speed 10650.33 samples/sec   Loss 8.3574   LearningRate 0.0364   Epoch: 15   Global Step: 80290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:48,045-Speed 10535.03 samples/sec   Loss 8.3868   LearningRate 0.0364   Epoch: 15   Global Step: 80300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:49,001-Speed 10717.59 samples/sec   Loss 8.2970   LearningRate 0.0364   Epoch: 15   Global Step: 80310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:49,932-Speed 11011.73 samples/sec   Loss 8.3588   LearningRate 0.0364   Epoch: 15   Global Step: 80320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:50,867-Speed 10964.93 samples/sec   Loss 8.3474   LearningRate 0.0364   Epoch: 15   Global Step: 80330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:51,829-Speed 10652.76 samples/sec   Loss 8.3071   LearningRate 0.0363   Epoch: 15   Global Step: 80340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:52,789-Speed 10673.29 samples/sec   Loss 8.3454   LearningRate 0.0363   Epoch: 15   Global Step: 80350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:15:53,746-Speed 10715.59 samples/sec   Loss 8.3620   LearningRate 0.0363   Epoch: 15   Global Step: 80360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:15:54,745-Speed 10255.74 samples/sec   Loss 8.2136   LearningRate 0.0363   Epoch: 15   Global Step: 80370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:15:55,719-Speed 10527.69 samples/sec   Loss 8.2666   LearningRate 0.0363   Epoch: 15   Global Step: 80380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:56,631-Speed 11233.68 samples/sec   Loss 8.2972   LearningRate 0.0363   Epoch: 15   Global Step: 80390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:57,616-Speed 10400.84 samples/sec   Loss 8.4101   LearningRate 0.0363   Epoch: 15   Global Step: 80400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:58,522-Speed 11328.35 samples/sec   Loss 8.3147   LearningRate 0.0363   Epoch: 15   Global Step: 80410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:15:59,468-Speed 10839.05 samples/sec   Loss 8.4518   LearningRate 0.0363   Epoch: 15   Global Step: 80420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:00,431-Speed 10643.14 samples/sec   Loss 8.3266   LearningRate 0.0363   Epoch: 15   Global Step: 80430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:01,387-Speed 10717.91 samples/sec   Loss 8.3078   LearningRate 0.0363   Epoch: 15   Global Step: 80440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:02,442-Speed 9719.65 samples/sec   Loss 8.0762   LearningRate 0.0363   Epoch: 15   Global Step: 80450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:03,403-Speed 10659.38 samples/sec   Loss 8.4010   LearningRate 0.0363   Epoch: 15   Global Step: 80460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:04,332-Speed 11028.90 samples/sec   Loss 8.4391   LearningRate 0.0363   Epoch: 15   Global Step: 80470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:05,305-Speed 10540.74 samples/sec   Loss 8.3727   LearningRate 0.0363   Epoch: 15   Global Step: 80480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:16:06,272-Speed 10589.63 samples/sec   Loss 8.2834   LearningRate 0.0363   Epoch: 15   Global Step: 80490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:16:07,261-Speed 10374.26 samples/sec   Loss 8.2696   LearningRate 0.0363   Epoch: 15   Global Step: 80500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:08,212-Speed 10779.39 samples/sec   Loss 8.3312   LearningRate 0.0362   Epoch: 15   Global Step: 80510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:09,177-Speed 10619.58 samples/sec   Loss 8.5398   LearningRate 0.0362   Epoch: 15   Global Step: 80520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:10,205-Speed 9966.90 samples/sec   Loss 8.3002   LearningRate 0.0362   Epoch: 15   Global Step: 80530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:11,253-Speed 9785.05 samples/sec   Loss 8.3416   LearningRate 0.0362   Epoch: 15   Global Step: 80540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:12,203-Speed 10792.48 samples/sec   Loss 8.5030   LearningRate 0.0362   Epoch: 15   Global Step: 80550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:13,128-Speed 11086.44 samples/sec   Loss 8.3568   LearningRate 0.0362   Epoch: 15   Global Step: 80560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:14,084-Speed 10715.44 samples/sec   Loss 8.3440   LearningRate 0.0362   Epoch: 15   Global Step: 80570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:15,049-Speed 10620.17 samples/sec   Loss 8.2292   LearningRate 0.0362   Epoch: 15   Global Step: 80580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:16,017-Speed 10583.17 samples/sec   Loss 8.3018   LearningRate 0.0362   Epoch: 15   Global Step: 80590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:16,990-Speed 10535.39 samples/sec   Loss 8.3347   LearningRate 0.0362   Epoch: 15   Global Step: 80600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:16:17,913-Speed 11114.46 samples/sec   Loss 8.2967   LearningRate 0.0362   Epoch: 15   Global Step: 80610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:16:18,870-Speed 10702.62 samples/sec   Loss 8.3914   LearningRate 0.0362   Epoch: 15   Global Step: 80620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:16:19,845-Speed 10517.46 samples/sec   Loss 8.2365   LearningRate 0.0362   Epoch: 15   Global Step: 80630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:16:20,784-Speed 10913.50 samples/sec   Loss 8.2444   LearningRate 0.0362   Epoch: 15   Global Step: 80640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:16:21,763-Speed 10479.30 samples/sec   Loss 8.3360   LearningRate 0.0362   Epoch: 15   Global Step: 80650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:22,726-Speed 10648.45 samples/sec   Loss 8.3261   LearningRate 0.0362   Epoch: 15   Global Step: 80660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:23,648-Speed 11111.89 samples/sec   Loss 8.2946   LearningRate 0.0362   Epoch: 15   Global Step: 80670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:24,627-Speed 10464.73 samples/sec   Loss 8.3116   LearningRate 0.0361   Epoch: 15   Global Step: 80680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:25,596-Speed 10581.72 samples/sec   Loss 8.3277   LearningRate 0.0361   Epoch: 15   Global Step: 80690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:26,548-Speed 10774.47 samples/sec   Loss 8.3024   LearningRate 0.0361   Epoch: 15   Global Step: 80700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:27,526-Speed 10476.65 samples/sec   Loss 8.2708   LearningRate 0.0361   Epoch: 15   Global Step: 80710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:28,494-Speed 10588.79 samples/sec   Loss 8.3662   LearningRate 0.0361   Epoch: 15   Global Step: 80720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:29,428-Speed 10986.12 samples/sec   Loss 8.3444   LearningRate 0.0361   Epoch: 15   Global Step: 80730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:30,383-Speed 10731.82 samples/sec   Loss 8.3735   LearningRate 0.0361   Epoch: 15   Global Step: 80740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:31,321-Speed 10927.17 samples/sec   Loss 8.3569   LearningRate 0.0361   Epoch: 15   Global Step: 80750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:16:32,325-Speed 10206.94 samples/sec   Loss 8.3704   LearningRate 0.0361   Epoch: 15   Global Step: 80760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:16:33,273-Speed 10814.49 samples/sec   Loss 8.4598   LearningRate 0.0361   Epoch: 15   Global Step: 80770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:16:34,215-Speed 10878.10 samples/sec   Loss 8.3969   LearningRate 0.0361   Epoch: 15   Global Step: 80780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:16:35,167-Speed 10775.09 samples/sec   Loss 8.2868   LearningRate 0.0361   Epoch: 15   Global Step: 80790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:16:36,162-Speed 10299.71 samples/sec   Loss 8.3165   LearningRate 0.0361   Epoch: 15   Global Step: 80800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:16:37,150-Speed 10369.28 samples/sec   Loss 8.4130   LearningRate 0.0361   Epoch: 15   Global Step: 80810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:16:38,127-Speed 10490.47 samples/sec   Loss 8.1918   LearningRate 0.0361   Epoch: 15   Global Step: 80820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:16:39,082-Speed 10733.45 samples/sec   Loss 8.3730   LearningRate 0.0361   Epoch: 15   Global Step: 80830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:16:40,055-Speed 10529.49 samples/sec   Loss 8.3754   LearningRate 0.0361   Epoch: 15   Global Step: 80840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:16:41,031-Speed 10508.06 samples/sec   Loss 8.3402   LearningRate 0.0360   Epoch: 15   Global Step: 80850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:16:41,992-Speed 10665.60 samples/sec   Loss 8.3437   LearningRate 0.0360   Epoch: 15   Global Step: 80860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:16:42,970-Speed 10477.80 samples/sec   Loss 8.2358   LearningRate 0.0360   Epoch: 15   Global Step: 80870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:16:43,946-Speed 10504.64 samples/sec   Loss 8.3478   LearningRate 0.0360   Epoch: 15   Global Step: 80880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:16:44,882-Speed 10948.47 samples/sec   Loss 8.1607   LearningRate 0.0360   Epoch: 15   Global Step: 80890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:16:45,827-Speed 10845.95 samples/sec   Loss 8.3513   LearningRate 0.0360   Epoch: 15   Global Step: 80900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:46,785-Speed 10705.06 samples/sec   Loss 8.4453   LearningRate 0.0360   Epoch: 15   Global Step: 80910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:16:47,792-Speed 10177.20 samples/sec   Loss 8.3359   LearningRate 0.0360   Epoch: 15   Global Step: 80920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:01,497-Speed 747.22 samples/sec   Loss 8.1603   LearningRate 0.0360   Epoch: 16   Global Step: 80930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:02,679-Speed 8682.99 samples/sec   Loss 7.2747   LearningRate 0.0360   Epoch: 16   Global Step: 80940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:03,646-Speed 10589.80 samples/sec   Loss 7.5232   LearningRate 0.0360   Epoch: 16   Global Step: 80950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:04,721-Speed 9536.66 samples/sec   Loss 7.2677   LearningRate 0.0360   Epoch: 16   Global Step: 80960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:05,680-Speed 10691.09 samples/sec   Loss 7.4077   LearningRate 0.0360   Epoch: 16   Global Step: 80970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:06,634-Speed 10741.89 samples/sec   Loss 7.5158   LearningRate 0.0360   Epoch: 16   Global Step: 80980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:07,698-Speed 9638.65 samples/sec   Loss 7.4588   LearningRate 0.0360   Epoch: 16   Global Step: 80990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:08,672-Speed 10520.42 samples/sec   Loss 7.3490   LearningRate 0.0360   Epoch: 16   Global Step: 81000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:17:09,689-Speed 10076.15 samples/sec   Loss 7.4365   LearningRate 0.0360   Epoch: 16   Global Step: 81010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:17:10,630-Speed 10900.11 samples/sec   Loss 7.4325   LearningRate 0.0359   Epoch: 16   Global Step: 81020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:11,590-Speed 10677.48 samples/sec   Loss 7.3053   LearningRate 0.0359   Epoch: 16   Global Step: 81030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:12,552-Speed 10654.49 samples/sec   Loss 7.4645   LearningRate 0.0359   Epoch: 16   Global Step: 81040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:13,644-Speed 9388.06 samples/sec   Loss 7.4248   LearningRate 0.0359   Epoch: 16   Global Step: 81050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:14,580-Speed 10947.02 samples/sec   Loss 7.5049   LearningRate 0.0359   Epoch: 16   Global Step: 81060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:15,550-Speed 10572.13 samples/sec   Loss 7.5050   LearningRate 0.0359   Epoch: 16   Global Step: 81070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:16,477-Speed 11051.90 samples/sec   Loss 7.5457   LearningRate 0.0359   Epoch: 16   Global Step: 81080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:17,478-Speed 10240.92 samples/sec   Loss 7.3703   LearningRate 0.0359   Epoch: 16   Global Step: 81090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:18,491-Speed 10118.04 samples/sec   Loss 7.5410   LearningRate 0.0359   Epoch: 16   Global Step: 81100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:19,452-Speed 10662.25 samples/sec   Loss 7.5532   LearningRate 0.0359   Epoch: 16   Global Step: 81110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:20,456-Speed 10219.92 samples/sec   Loss 7.5333   LearningRate 0.0359   Epoch: 16   Global Step: 81120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:17:21,410-Speed 10734.73 samples/sec   Loss 7.4451   LearningRate 0.0359   Epoch: 16   Global Step: 81130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:17:22,402-Speed 10341.31 samples/sec   Loss 7.4278   LearningRate 0.0359   Epoch: 16   Global Step: 81140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:17:23,407-Speed 10200.86 samples/sec   Loss 7.3813   LearningRate 0.0359   Epoch: 16   Global Step: 81150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:17:24,346-Speed 10917.54 samples/sec   Loss 7.5122   LearningRate 0.0359   Epoch: 16   Global Step: 81160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:25,358-Speed 10124.71 samples/sec   Loss 7.5499   LearningRate 0.0359   Epoch: 16   Global Step: 81170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:26,337-Speed 10465.66 samples/sec   Loss 7.6648   LearningRate 0.0359   Epoch: 16   Global Step: 81180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:27,282-Speed 10853.99 samples/sec   Loss 7.5017   LearningRate 0.0358   Epoch: 16   Global Step: 81190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:28,237-Speed 10726.84 samples/sec   Loss 7.6280   LearningRate 0.0358   Epoch: 16   Global Step: 81200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:29,259-Speed 10030.28 samples/sec   Loss 7.6057   LearningRate 0.0358   Epoch: 16   Global Step: 81210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:30,187-Speed 11051.29 samples/sec   Loss 7.5892   LearningRate 0.0358   Epoch: 16   Global Step: 81220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:31,124-Speed 10939.78 samples/sec   Loss 7.5211   LearningRate 0.0358   Epoch: 16   Global Step: 81230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:32,136-Speed 10130.82 samples/sec   Loss 7.4636   LearningRate 0.0358   Epoch: 16   Global Step: 81240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:33,110-Speed 10528.34 samples/sec   Loss 7.7048   LearningRate 0.0358   Epoch: 16   Global Step: 81250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:34,062-Speed 10767.63 samples/sec   Loss 7.5893   LearningRate 0.0358   Epoch: 16   Global Step: 81260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:17:35,023-Speed 10665.41 samples/sec   Loss 7.5174   LearningRate 0.0358   Epoch: 16   Global Step: 81270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:17:36,000-Speed 10479.14 samples/sec   Loss 7.4736   LearningRate 0.0358   Epoch: 16   Global Step: 81280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:17:37,049-Speed 9774.56 samples/sec   Loss 7.7731   LearningRate 0.0358   Epoch: 16   Global Step: 81290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:17:38,001-Speed 10771.33 samples/sec   Loss 7.6760   LearningRate 0.0358   Epoch: 16   Global Step: 81300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:17:38,963-Speed 10647.88 samples/sec   Loss 7.5698   LearningRate 0.0358   Epoch: 16   Global Step: 81310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:17:39,975-Speed 10129.71 samples/sec   Loss 7.8043   LearningRate 0.0358   Epoch: 16   Global Step: 81320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:17:40,940-Speed 10617.53 samples/sec   Loss 7.6311   LearningRate 0.0358   Epoch: 16   Global Step: 81330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:17:41,905-Speed 10628.66 samples/sec   Loss 7.6757   LearningRate 0.0358   Epoch: 16   Global Step: 81340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:17:42,869-Speed 10634.09 samples/sec   Loss 7.7854   LearningRate 0.0358   Epoch: 16   Global Step: 81350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:17:43,875-Speed 10187.53 samples/sec   Loss 7.7666   LearningRate 0.0357   Epoch: 16   Global Step: 81360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:44,856-Speed 10448.91 samples/sec   Loss 7.7708   LearningRate 0.0357   Epoch: 16   Global Step: 81370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:45,844-Speed 10366.87 samples/sec   Loss 7.6933   LearningRate 0.0357   Epoch: 16   Global Step: 81380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:46,791-Speed 10830.21 samples/sec   Loss 7.6312   LearningRate 0.0357   Epoch: 16   Global Step: 81390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:47,781-Speed 10346.14 samples/sec   Loss 7.5531   LearningRate 0.0357   Epoch: 16   Global Step: 81400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:48,816-Speed 9903.55 samples/sec   Loss 7.6072   LearningRate 0.0357   Epoch: 16   Global Step: 81410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:49,810-Speed 10317.71 samples/sec   Loss 7.7764   LearningRate 0.0357   Epoch: 16   Global Step: 81420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:50,769-Speed 10682.01 samples/sec   Loss 7.6958   LearningRate 0.0357   Epoch: 16   Global Step: 81430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:51,732-Speed 10642.73 samples/sec   Loss 7.7091   LearningRate 0.0357   Epoch: 16   Global Step: 81440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:52,747-Speed 10106.00 samples/sec   Loss 7.8159   LearningRate 0.0357   Epoch: 16   Global Step: 81450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:53,700-Speed 10752.45 samples/sec   Loss 7.6932   LearningRate 0.0357   Epoch: 16   Global Step: 81460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:17:54,646-Speed 10832.86 samples/sec   Loss 7.8884   LearningRate 0.0357   Epoch: 16   Global Step: 81470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:17:55,585-Speed 10915.12 samples/sec   Loss 7.5513   LearningRate 0.0357   Epoch: 16   Global Step: 81480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:17:56,564-Speed 10463.57 samples/sec   Loss 7.8038   LearningRate 0.0357   Epoch: 16   Global Step: 81490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:17:57,468-Speed 11360.36 samples/sec   Loss 7.9059   LearningRate 0.0357   Epoch: 16   Global Step: 81500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:58,397-Speed 11036.80 samples/sec   Loss 7.7533   LearningRate 0.0357   Epoch: 16   Global Step: 81510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:17:59,329-Speed 10990.69 samples/sec   Loss 7.7185   LearningRate 0.0356   Epoch: 16   Global Step: 81520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:00,312-Speed 10432.49 samples/sec   Loss 7.7267   LearningRate 0.0356   Epoch: 16   Global Step: 81530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:01,217-Speed 11334.82 samples/sec   Loss 7.7076   LearningRate 0.0356   Epoch: 16   Global Step: 81540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:02,178-Speed 10681.78 samples/sec   Loss 7.8794   LearningRate 0.0356   Epoch: 16   Global Step: 81550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:03,200-Speed 10029.65 samples/sec   Loss 7.8551   LearningRate 0.0356   Epoch: 16   Global Step: 81560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:04,144-Speed 10848.67 samples/sec   Loss 7.8044   LearningRate 0.0356   Epoch: 16   Global Step: 81570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:05,107-Speed 10648.49 samples/sec   Loss 7.7969   LearningRate 0.0356   Epoch: 16   Global Step: 81580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:18:06,091-Speed 10406.62 samples/sec   Loss 7.7788   LearningRate 0.0356   Epoch: 16   Global Step: 81590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:18:07,133-Speed 9837.36 samples/sec   Loss 7.8082   LearningRate 0.0356   Epoch: 16   Global Step: 81600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:18:08,083-Speed 10787.95 samples/sec   Loss 7.7597   LearningRate 0.0356   Epoch: 16   Global Step: 81610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:18:09,045-Speed 10654.93 samples/sec   Loss 7.7378   LearningRate 0.0356   Epoch: 16   Global Step: 81620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:18:10,031-Speed 10392.07 samples/sec   Loss 7.7619   LearningRate 0.0356   Epoch: 16   Global Step: 81630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:18:10,991-Speed 10689.10 samples/sec   Loss 7.9085   LearningRate 0.0356   Epoch: 16   Global Step: 81640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:18:11,930-Speed 10908.26 samples/sec   Loss 7.8460   LearningRate 0.0356   Epoch: 16   Global Step: 81650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:18:12,877-Speed 10823.05 samples/sec   Loss 7.8427   LearningRate 0.0356   Epoch: 16   Global Step: 81660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:18:13,836-Speed 10686.69 samples/sec   Loss 7.8603   LearningRate 0.0356   Epoch: 16   Global Step: 81670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:18:14,784-Speed 10810.42 samples/sec   Loss 7.8773   LearningRate 0.0356   Epoch: 16   Global Step: 81680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:15,732-Speed 10817.36 samples/sec   Loss 7.9058   LearningRate 0.0355   Epoch: 16   Global Step: 81690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:16,682-Speed 10784.29 samples/sec   Loss 7.8472   LearningRate 0.0355   Epoch: 16   Global Step: 81700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:17,726-Speed 9817.88 samples/sec   Loss 7.6720   LearningRate 0.0355   Epoch: 16   Global Step: 81710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:18,746-Speed 10048.24 samples/sec   Loss 8.0281   LearningRate 0.0355   Epoch: 16   Global Step: 81720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:19,703-Speed 10712.38 samples/sec   Loss 7.7838   LearningRate 0.0355   Epoch: 16   Global Step: 81730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:20,642-Speed 10915.69 samples/sec   Loss 8.0224   LearningRate 0.0355   Epoch: 16   Global Step: 81740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:21,646-Speed 10201.98 samples/sec   Loss 7.9761   LearningRate 0.0355   Epoch: 16   Global Step: 81750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:22,607-Speed 10676.66 samples/sec   Loss 8.0214   LearningRate 0.0355   Epoch: 16   Global Step: 81760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:23,565-Speed 10705.57 samples/sec   Loss 7.9081   LearningRate 0.0355   Epoch: 16   Global Step: 81770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:24,516-Speed 10772.30 samples/sec   Loss 7.8878   LearningRate 0.0355   Epoch: 16   Global Step: 81780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:18:25,468-Speed 10758.23 samples/sec   Loss 7.8232   LearningRate 0.0355   Epoch: 16   Global Step: 81790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:18:26,426-Speed 10711.98 samples/sec   Loss 7.8853   LearningRate 0.0355   Epoch: 16   Global Step: 81800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:18:27,380-Speed 10743.60 samples/sec   Loss 7.6958   LearningRate 0.0355   Epoch: 16   Global Step: 81810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:18:28,322-Speed 10885.35 samples/sec   Loss 7.9368   LearningRate 0.0355   Epoch: 16   Global Step: 81820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:29,288-Speed 10608.44 samples/sec   Loss 7.9947   LearningRate 0.0355   Epoch: 16   Global Step: 81830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:30,311-Speed 10017.66 samples/sec   Loss 7.9418   LearningRate 0.0355   Epoch: 16   Global Step: 81840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:31,265-Speed 10745.92 samples/sec   Loss 7.8983   LearningRate 0.0355   Epoch: 16   Global Step: 81850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:32,199-Speed 10972.38 samples/sec   Loss 7.9580   LearningRate 0.0354   Epoch: 16   Global Step: 81860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:33,115-Speed 11192.13 samples/sec   Loss 7.8970   LearningRate 0.0354   Epoch: 16   Global Step: 81870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:34,063-Speed 10808.52 samples/sec   Loss 8.0814   LearningRate 0.0354   Epoch: 16   Global Step: 81880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:35,074-Speed 10146.17 samples/sec   Loss 7.8828   LearningRate 0.0354   Epoch: 16   Global Step: 81890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:36,047-Speed 10532.48 samples/sec   Loss 8.0887   LearningRate 0.0354   Epoch: 16   Global Step: 81900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:36,989-Speed 10882.11 samples/sec   Loss 7.8232   LearningRate 0.0354   Epoch: 16   Global Step: 81910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:37,983-Speed 10313.48 samples/sec   Loss 7.9330   LearningRate 0.0354   Epoch: 16   Global Step: 81920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:18:38,929-Speed 10833.38 samples/sec   Loss 7.9618   LearningRate 0.0354   Epoch: 16   Global Step: 81930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:18:39,851-Speed 11115.75 samples/sec   Loss 7.9454   LearningRate 0.0354   Epoch: 16   Global Step: 81940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:18:40,804-Speed 10750.82 samples/sec   Loss 7.9643   LearningRate 0.0354   Epoch: 16   Global Step: 81950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:41,765-Speed 10668.47 samples/sec   Loss 7.9093   LearningRate 0.0354   Epoch: 16   Global Step: 81960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:42,722-Speed 10707.05 samples/sec   Loss 8.0254   LearningRate 0.0354   Epoch: 16   Global Step: 81970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:43,672-Speed 10793.36 samples/sec   Loss 7.9481   LearningRate 0.0354   Epoch: 16   Global Step: 81980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:18:44,604-Speed 11010.29 samples/sec   Loss 7.9118   LearningRate 0.0354   Epoch: 16   Global Step: 81990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:18:45,566-Speed 10648.75 samples/sec   Loss 8.0285   LearningRate 0.0354   Epoch: 16   Global Step: 82000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:19:07,749-[lfw][82000]XNorm: 11.725746
Training: 2022-04-11 02:19:07,749-[lfw][82000]Accuracy-Flip: 0.99450+-0.00388
Training: 2022-04-11 02:19:07,750-[lfw][82000]Accuracy-Highest: 0.99617
Training: 2022-04-11 02:19:33,541-[cfp_fp][82000]XNorm: 9.963313
Training: 2022-04-11 02:19:33,542-[cfp_fp][82000]Accuracy-Flip: 0.95371+-0.01021
Training: 2022-04-11 02:19:33,543-[cfp_fp][82000]Accuracy-Highest: 0.95743
Training: 2022-04-11 02:19:55,402-[agedb_30][82000]XNorm: 11.459319
Training: 2022-04-11 02:19:55,403-[agedb_30][82000]Accuracy-Flip: 0.96517+-0.00950
Training: 2022-04-11 02:19:55,403-[agedb_30][82000]Accuracy-Highest: 0.96517
Training: 2022-04-11 02:19:56,350-Speed 144.67 samples/sec   Loss 7.9272   LearningRate 0.0354   Epoch: 16   Global Step: 82010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:19:57,294-Speed 10866.64 samples/sec   Loss 7.9355   LearningRate 0.0354   Epoch: 16   Global Step: 82020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:19:58,231-Speed 10935.41 samples/sec   Loss 7.9810   LearningRate 0.0353   Epoch: 16   Global Step: 82030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:19:59,240-Speed 10155.77 samples/sec   Loss 8.0349   LearningRate 0.0353   Epoch: 16   Global Step: 82040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:00,256-Speed 10088.97 samples/sec   Loss 7.8076   LearningRate 0.0353   Epoch: 16   Global Step: 82050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:20:01,190-Speed 10979.54 samples/sec   Loss 7.8377   LearningRate 0.0353   Epoch: 16   Global Step: 82060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:20:02,207-Speed 10072.48 samples/sec   Loss 7.9067   LearningRate 0.0353   Epoch: 16   Global Step: 82070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:20:03,171-Speed 10649.79 samples/sec   Loss 8.1863   LearningRate 0.0353   Epoch: 16   Global Step: 82080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:20:04,129-Speed 10691.97 samples/sec   Loss 8.0049   LearningRate 0.0353   Epoch: 16   Global Step: 82090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:05,104-Speed 10513.65 samples/sec   Loss 8.0116   LearningRate 0.0353   Epoch: 16   Global Step: 82100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:06,067-Speed 10644.12 samples/sec   Loss 7.9843   LearningRate 0.0353   Epoch: 16   Global Step: 82110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:07,008-Speed 10884.28 samples/sec   Loss 7.8081   LearningRate 0.0353   Epoch: 16   Global Step: 82120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:08,031-Speed 10018.17 samples/sec   Loss 8.0413   LearningRate 0.0353   Epoch: 16   Global Step: 82130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:08,989-Speed 10702.47 samples/sec   Loss 7.9233   LearningRate 0.0353   Epoch: 16   Global Step: 82140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:09,967-Speed 10476.00 samples/sec   Loss 8.0642   LearningRate 0.0353   Epoch: 16   Global Step: 82150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:10,971-Speed 10207.29 samples/sec   Loss 7.9970   LearningRate 0.0353   Epoch: 16   Global Step: 82160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:11,944-Speed 10537.20 samples/sec   Loss 7.9730   LearningRate 0.0353   Epoch: 16   Global Step: 82170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:12,907-Speed 10639.97 samples/sec   Loss 8.0779   LearningRate 0.0353   Epoch: 16   Global Step: 82180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:13,868-Speed 10665.58 samples/sec   Loss 7.9810   LearningRate 0.0353   Epoch: 16   Global Step: 82190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 02:20:14,831-Speed 10646.09 samples/sec   Loss 8.0424   LearningRate 0.0352   Epoch: 16   Global Step: 82200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:15,782-Speed 10784.00 samples/sec   Loss 7.8130   LearningRate 0.0352   Epoch: 16   Global Step: 82210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:20:16,715-Speed 10987.45 samples/sec   Loss 8.0546   LearningRate 0.0352   Epoch: 16   Global Step: 82220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:20:17,740-Speed 9997.21 samples/sec   Loss 7.9984   LearningRate 0.0352   Epoch: 16   Global Step: 82230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:20:18,700-Speed 10681.69 samples/sec   Loss 8.1229   LearningRate 0.0352   Epoch: 16   Global Step: 82240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:20:19,672-Speed 10542.88 samples/sec   Loss 8.0101   LearningRate 0.0352   Epoch: 16   Global Step: 82250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:20:20,645-Speed 10528.89 samples/sec   Loss 8.1236   LearningRate 0.0352   Epoch: 16   Global Step: 82260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:20:21,644-Speed 10263.96 samples/sec   Loss 8.1773   LearningRate 0.0352   Epoch: 16   Global Step: 82270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:20:22,590-Speed 10835.52 samples/sec   Loss 7.9117   LearningRate 0.0352   Epoch: 16   Global Step: 82280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:20:23,569-Speed 10465.20 samples/sec   Loss 8.0137   LearningRate 0.0352   Epoch: 16   Global Step: 82290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:20:24,534-Speed 10625.38 samples/sec   Loss 8.0492   LearningRate 0.0352   Epoch: 16   Global Step: 82300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:20:25,489-Speed 10736.88 samples/sec   Loss 8.0346   LearningRate 0.0352   Epoch: 16   Global Step: 82310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:26,419-Speed 11020.61 samples/sec   Loss 8.1699   LearningRate 0.0352   Epoch: 16   Global Step: 82320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:27,400-Speed 10440.04 samples/sec   Loss 8.0944   LearningRate 0.0352   Epoch: 16   Global Step: 82330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:28,332-Speed 10998.54 samples/sec   Loss 7.9113   LearningRate 0.0352   Epoch: 16   Global Step: 82340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:29,276-Speed 10854.08 samples/sec   Loss 7.9096   LearningRate 0.0352   Epoch: 16   Global Step: 82350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:30,256-Speed 10467.45 samples/sec   Loss 7.9846   LearningRate 0.0352   Epoch: 16   Global Step: 82360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:31,218-Speed 10653.47 samples/sec   Loss 7.8998   LearningRate 0.0351   Epoch: 16   Global Step: 82370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:32,157-Speed 10921.20 samples/sec   Loss 8.0450   LearningRate 0.0351   Epoch: 16   Global Step: 82380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:33,126-Speed 10574.03 samples/sec   Loss 8.0572   LearningRate 0.0351   Epoch: 16   Global Step: 82390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:34,074-Speed 10832.78 samples/sec   Loss 8.0029   LearningRate 0.0351   Epoch: 16   Global Step: 82400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:35,151-Speed 9513.63 samples/sec   Loss 7.9478   LearningRate 0.0351   Epoch: 16   Global Step: 82410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:20:36,112-Speed 10671.34 samples/sec   Loss 8.0218   LearningRate 0.0351   Epoch: 16   Global Step: 82420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:20:37,077-Speed 10620.46 samples/sec   Loss 7.9272   LearningRate 0.0351   Epoch: 16   Global Step: 82430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:20:38,032-Speed 10736.28 samples/sec   Loss 8.1203   LearningRate 0.0351   Epoch: 16   Global Step: 82440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:20:38,960-Speed 11039.38 samples/sec   Loss 7.9419   LearningRate 0.0351   Epoch: 16   Global Step: 82450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:20:39,897-Speed 10939.58 samples/sec   Loss 8.0410   LearningRate 0.0351   Epoch: 16   Global Step: 82460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:20:40,849-Speed 10781.77 samples/sec   Loss 7.8911   LearningRate 0.0351   Epoch: 16   Global Step: 82470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:20:41,863-Speed 10103.04 samples/sec   Loss 7.9479   LearningRate 0.0351   Epoch: 16   Global Step: 82480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:20:42,832-Speed 10581.04 samples/sec   Loss 8.0696   LearningRate 0.0351   Epoch: 16   Global Step: 82490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:20:43,760-Speed 11043.28 samples/sec   Loss 8.0006   LearningRate 0.0351   Epoch: 16   Global Step: 82500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 02:20:44,818-Speed 9686.96 samples/sec   Loss 7.9241   LearningRate 0.0351   Epoch: 16   Global Step: 82510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:45,805-Speed 10374.98 samples/sec   Loss 8.1283   LearningRate 0.0351   Epoch: 16   Global Step: 82520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 02:20:46,795-Speed 10351.91 samples/sec   Loss 8.0655   LearningRate 0.0351   Epoch: 16   Global Step: 82530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:20:47,761-Speed 10610.45 samples/sec   Loss 8.1575   LearningRate 0.0351   Epoch: 16   Global Step: 82540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:20:48,721-Speed 10680.69 samples/sec   Loss 8.0922   LearningRate 0.0350   Epoch: 16   Global Step: 82550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:20:49,683-Speed 10652.90 samples/sec   Loss 8.1659   LearningRate 0.0350   Epoch: 16   Global Step: 82560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:20:50,728-Speed 9810.46 samples/sec   Loss 8.1160   LearningRate 0.0350   Epoch: 16   Global Step: 82570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:20:51,762-Speed 9912.15 samples/sec   Loss 8.0003   LearningRate 0.0350   Epoch: 16   Global Step: 82580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:20:52,713-Speed 10776.07 samples/sec   Loss 8.0787   LearningRate 0.0350   Epoch: 16   Global Step: 82590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:20:53,632-Speed 11152.40 samples/sec   Loss 8.0685   LearningRate 0.0350   Epoch: 16   Global Step: 82600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:20:54,594-Speed 10657.22 samples/sec   Loss 8.2739   LearningRate 0.0350   Epoch: 16   Global Step: 82610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:20:55,566-Speed 10537.06 samples/sec   Loss 7.9726   LearningRate 0.0350   Epoch: 16   Global Step: 82620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:20:56,573-Speed 10184.75 samples/sec   Loss 8.1234   LearningRate 0.0350   Epoch: 16   Global Step: 82630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:20:57,536-Speed 10652.68 samples/sec   Loss 7.8891   LearningRate 0.0350   Epoch: 16   Global Step: 82640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:20:58,502-Speed 10603.24 samples/sec   Loss 8.0747   LearningRate 0.0350   Epoch: 16   Global Step: 82650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:20:59,484-Speed 10440.41 samples/sec   Loss 8.0115   LearningRate 0.0350   Epoch: 16   Global Step: 82660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:00,453-Speed 10582.81 samples/sec   Loss 8.0892   LearningRate 0.0350   Epoch: 16   Global Step: 82670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:01,430-Speed 10484.31 samples/sec   Loss 7.9400   LearningRate 0.0350   Epoch: 16   Global Step: 82680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:02,441-Speed 10149.09 samples/sec   Loss 8.0672   LearningRate 0.0350   Epoch: 16   Global Step: 82690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:03,392-Speed 10774.84 samples/sec   Loss 7.9767   LearningRate 0.0350   Epoch: 16   Global Step: 82700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:04,348-Speed 10722.13 samples/sec   Loss 8.0181   LearningRate 0.0350   Epoch: 16   Global Step: 82710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:21:05,288-Speed 10905.60 samples/sec   Loss 8.0275   LearningRate 0.0349   Epoch: 16   Global Step: 82720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:21:06,248-Speed 10674.87 samples/sec   Loss 8.1015   LearningRate 0.0349   Epoch: 16   Global Step: 82730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:21:07,216-Speed 10589.22 samples/sec   Loss 8.0104   LearningRate 0.0349   Epoch: 16   Global Step: 82740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:21:08,147-Speed 11008.48 samples/sec   Loss 7.9862   LearningRate 0.0349   Epoch: 16   Global Step: 82750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:21:09,127-Speed 10464.69 samples/sec   Loss 7.8878   LearningRate 0.0349   Epoch: 16   Global Step: 82760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:21:10,055-Speed 11041.40 samples/sec   Loss 8.0938   LearningRate 0.0349   Epoch: 16   Global Step: 82770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:21:11,015-Speed 10670.83 samples/sec   Loss 8.0642   LearningRate 0.0349   Epoch: 16   Global Step: 82780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:21:11,913-Speed 11416.32 samples/sec   Loss 8.0903   LearningRate 0.0349   Epoch: 16   Global Step: 82790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:21:12,880-Speed 10599.29 samples/sec   Loss 8.0229   LearningRate 0.0349   Epoch: 16   Global Step: 82800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:21:13,803-Speed 11102.41 samples/sec   Loss 7.9555   LearningRate 0.0349   Epoch: 16   Global Step: 82810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:14,724-Speed 11139.91 samples/sec   Loss 8.1535   LearningRate 0.0349   Epoch: 16   Global Step: 82820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:15,665-Speed 10884.52 samples/sec   Loss 8.0794   LearningRate 0.0349   Epoch: 16   Global Step: 82830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:16,682-Speed 10080.12 samples/sec   Loss 8.1169   LearningRate 0.0349   Epoch: 16   Global Step: 82840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:17,655-Speed 10537.35 samples/sec   Loss 8.0348   LearningRate 0.0349   Epoch: 16   Global Step: 82850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:18,583-Speed 11050.07 samples/sec   Loss 8.1964   LearningRate 0.0349   Epoch: 16   Global Step: 82860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:19,495-Speed 11231.84 samples/sec   Loss 8.2122   LearningRate 0.0349   Epoch: 16   Global Step: 82870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:20,409-Speed 11212.83 samples/sec   Loss 8.2332   LearningRate 0.0349   Epoch: 16   Global Step: 82880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:21,491-Speed 9474.58 samples/sec   Loss 8.1156   LearningRate 0.0348   Epoch: 16   Global Step: 82890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:22,434-Speed 10869.21 samples/sec   Loss 8.1228   LearningRate 0.0348   Epoch: 16   Global Step: 82900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:23,397-Speed 10661.34 samples/sec   Loss 8.2026   LearningRate 0.0348   Epoch: 16   Global Step: 82910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:21:24,355-Speed 10689.22 samples/sec   Loss 8.0786   LearningRate 0.0348   Epoch: 16   Global Step: 82920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:21:25,318-Speed 10647.78 samples/sec   Loss 8.0670   LearningRate 0.0348   Epoch: 16   Global Step: 82930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:21:26,283-Speed 10624.20 samples/sec   Loss 8.1323   LearningRate 0.0348   Epoch: 16   Global Step: 82940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:27,221-Speed 10919.82 samples/sec   Loss 8.0566   LearningRate 0.0348   Epoch: 16   Global Step: 82950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:28,151-Speed 11017.96 samples/sec   Loss 8.1559   LearningRate 0.0348   Epoch: 16   Global Step: 82960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:29,129-Speed 10478.98 samples/sec   Loss 8.2178   LearningRate 0.0348   Epoch: 16   Global Step: 82970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:30,028-Speed 11404.84 samples/sec   Loss 8.3120   LearningRate 0.0348   Epoch: 16   Global Step: 82980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:31,002-Speed 10520.43 samples/sec   Loss 8.1906   LearningRate 0.0348   Epoch: 16   Global Step: 82990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:31,942-Speed 10911.74 samples/sec   Loss 7.9857   LearningRate 0.0348   Epoch: 16   Global Step: 83000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:32,866-Speed 11092.88 samples/sec   Loss 7.9690   LearningRate 0.0348   Epoch: 16   Global Step: 83010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:33,790-Speed 11082.85 samples/sec   Loss 8.2966   LearningRate 0.0348   Epoch: 16   Global Step: 83020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:34,740-Speed 10795.82 samples/sec   Loss 8.0852   LearningRate 0.0348   Epoch: 16   Global Step: 83030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:35,645-Speed 11330.71 samples/sec   Loss 8.1779   LearningRate 0.0348   Epoch: 16   Global Step: 83040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:21:36,588-Speed 10877.59 samples/sec   Loss 8.0690   LearningRate 0.0348   Epoch: 16   Global Step: 83050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:21:37,556-Speed 10592.01 samples/sec   Loss 8.0809   LearningRate 0.0347   Epoch: 16   Global Step: 83060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:21:38,466-Speed 11261.48 samples/sec   Loss 7.9925   LearningRate 0.0347   Epoch: 16   Global Step: 83070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:21:39,468-Speed 10223.31 samples/sec   Loss 8.1606   LearningRate 0.0347   Epoch: 16   Global Step: 83080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:40,415-Speed 10832.13 samples/sec   Loss 8.0919   LearningRate 0.0347   Epoch: 16   Global Step: 83090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:41,402-Speed 10380.02 samples/sec   Loss 8.1699   LearningRate 0.0347   Epoch: 16   Global Step: 83100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:42,360-Speed 10692.28 samples/sec   Loss 8.1044   LearningRate 0.0347   Epoch: 16   Global Step: 83110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:43,319-Speed 10686.99 samples/sec   Loss 8.0374   LearningRate 0.0347   Epoch: 16   Global Step: 83120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:44,303-Speed 10417.99 samples/sec   Loss 8.0425   LearningRate 0.0347   Epoch: 16   Global Step: 83130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:45,234-Speed 11015.55 samples/sec   Loss 8.1163   LearningRate 0.0347   Epoch: 16   Global Step: 83140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:46,190-Speed 10709.73 samples/sec   Loss 8.0819   LearningRate 0.0347   Epoch: 16   Global Step: 83150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:47,145-Speed 10738.85 samples/sec   Loss 8.1210   LearningRate 0.0347   Epoch: 16   Global Step: 83160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:48,082-Speed 10935.60 samples/sec   Loss 8.0809   LearningRate 0.0347   Epoch: 16   Global Step: 83170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:49,023-Speed 10883.39 samples/sec   Loss 8.0237   LearningRate 0.0347   Epoch: 16   Global Step: 83180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:21:49,975-Speed 10774.67 samples/sec   Loss 8.1184   LearningRate 0.0347   Epoch: 16   Global Step: 83190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:21:50,922-Speed 10823.10 samples/sec   Loss 8.1217   LearningRate 0.0347   Epoch: 16   Global Step: 83200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:21:51,856-Speed 10980.43 samples/sec   Loss 8.1351   LearningRate 0.0347   Epoch: 16   Global Step: 83210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:52,822-Speed 10612.91 samples/sec   Loss 8.3991   LearningRate 0.0347   Epoch: 16   Global Step: 83220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:53,804-Speed 10428.20 samples/sec   Loss 8.0754   LearningRate 0.0346   Epoch: 16   Global Step: 83230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:54,831-Speed 9988.68 samples/sec   Loss 8.0671   LearningRate 0.0346   Epoch: 16   Global Step: 83240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:55,763-Speed 10986.25 samples/sec   Loss 7.8900   LearningRate 0.0346   Epoch: 16   Global Step: 83250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:56,729-Speed 10615.41 samples/sec   Loss 8.0846   LearningRate 0.0346   Epoch: 16   Global Step: 83260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:57,730-Speed 10238.56 samples/sec   Loss 8.1805   LearningRate 0.0346   Epoch: 16   Global Step: 83270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:58,696-Speed 10617.96 samples/sec   Loss 8.1219   LearningRate 0.0346   Epoch: 16   Global Step: 83280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:21:59,676-Speed 10451.69 samples/sec   Loss 8.0862   LearningRate 0.0346   Epoch: 16   Global Step: 83290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:00,661-Speed 10404.30 samples/sec   Loss 8.1859   LearningRate 0.0346   Epoch: 16   Global Step: 83300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:01,568-Speed 11306.43 samples/sec   Loss 8.1537   LearningRate 0.0346   Epoch: 16   Global Step: 83310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:02,523-Speed 10733.80 samples/sec   Loss 8.0766   LearningRate 0.0346   Epoch: 16   Global Step: 83320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:03,543-Speed 10049.64 samples/sec   Loss 8.0794   LearningRate 0.0346   Epoch: 16   Global Step: 83330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:04,480-Speed 10936.52 samples/sec   Loss 8.1370   LearningRate 0.0346   Epoch: 16   Global Step: 83340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:05,450-Speed 10574.45 samples/sec   Loss 8.1970   LearningRate 0.0346   Epoch: 16   Global Step: 83350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:06,405-Speed 10724.13 samples/sec   Loss 8.0014   LearningRate 0.0346   Epoch: 16   Global Step: 83360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:07,428-Speed 10024.19 samples/sec   Loss 8.1409   LearningRate 0.0346   Epoch: 16   Global Step: 83370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:08,376-Speed 10814.94 samples/sec   Loss 8.0884   LearningRate 0.0346   Epoch: 16   Global Step: 83380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:09,353-Speed 10481.60 samples/sec   Loss 7.9739   LearningRate 0.0346   Epoch: 16   Global Step: 83390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:10,378-Speed 10002.75 samples/sec   Loss 8.1822   LearningRate 0.0345   Epoch: 16   Global Step: 83400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:11,336-Speed 10702.15 samples/sec   Loss 8.1234   LearningRate 0.0345   Epoch: 16   Global Step: 83410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:12,259-Speed 11119.56 samples/sec   Loss 8.0453   LearningRate 0.0345   Epoch: 16   Global Step: 83420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:13,216-Speed 10706.98 samples/sec   Loss 8.1295   LearningRate 0.0345   Epoch: 16   Global Step: 83430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:14,175-Speed 10693.94 samples/sec   Loss 8.1859   LearningRate 0.0345   Epoch: 16   Global Step: 83440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:15,092-Speed 11179.18 samples/sec   Loss 8.0438   LearningRate 0.0345   Epoch: 16   Global Step: 83450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:16,039-Speed 10831.21 samples/sec   Loss 8.1163   LearningRate 0.0345   Epoch: 16   Global Step: 83460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:16,985-Speed 10821.54 samples/sec   Loss 8.1097   LearningRate 0.0345   Epoch: 16   Global Step: 83470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:17,981-Speed 10301.49 samples/sec   Loss 8.2388   LearningRate 0.0345   Epoch: 16   Global Step: 83480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:18,948-Speed 10597.77 samples/sec   Loss 8.0558   LearningRate 0.0345   Epoch: 16   Global Step: 83490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:19,901-Speed 10756.18 samples/sec   Loss 8.1165   LearningRate 0.0345   Epoch: 16   Global Step: 83500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:20,866-Speed 10616.82 samples/sec   Loss 8.1869   LearningRate 0.0345   Epoch: 16   Global Step: 83510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:22:21,854-Speed 10375.76 samples/sec   Loss 8.0490   LearningRate 0.0345   Epoch: 16   Global Step: 83520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:22:22,808-Speed 10746.49 samples/sec   Loss 8.0899   LearningRate 0.0345   Epoch: 16   Global Step: 83530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:22:23,753-Speed 10844.16 samples/sec   Loss 7.9405   LearningRate 0.0345   Epoch: 16   Global Step: 83540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:24,738-Speed 10407.54 samples/sec   Loss 7.9508   LearningRate 0.0345   Epoch: 16   Global Step: 83550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:25,725-Speed 10388.43 samples/sec   Loss 8.3564   LearningRate 0.0345   Epoch: 16   Global Step: 83560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:26,685-Speed 10678.71 samples/sec   Loss 8.2140   LearningRate 0.0345   Epoch: 16   Global Step: 83570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:27,658-Speed 10526.89 samples/sec   Loss 8.2366   LearningRate 0.0344   Epoch: 16   Global Step: 83580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:28,566-Speed 11282.94 samples/sec   Loss 8.2550   LearningRate 0.0344   Epoch: 16   Global Step: 83590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:29,519-Speed 10758.33 samples/sec   Loss 8.2869   LearningRate 0.0344   Epoch: 16   Global Step: 83600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:30,438-Speed 11154.06 samples/sec   Loss 8.0395   LearningRate 0.0344   Epoch: 16   Global Step: 83610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:31,396-Speed 10690.45 samples/sec   Loss 8.1223   LearningRate 0.0344   Epoch: 16   Global Step: 83620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:32,363-Speed 10608.46 samples/sec   Loss 7.9767   LearningRate 0.0344   Epoch: 16   Global Step: 83630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:33,325-Speed 10656.33 samples/sec   Loss 8.1472   LearningRate 0.0344   Epoch: 16   Global Step: 83640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:22:34,280-Speed 10733.34 samples/sec   Loss 8.1584   LearningRate 0.0344   Epoch: 16   Global Step: 83650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:22:35,268-Speed 10376.14 samples/sec   Loss 8.3080   LearningRate 0.0344   Epoch: 16   Global Step: 83660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:22:36,224-Speed 10720.30 samples/sec   Loss 8.2604   LearningRate 0.0344   Epoch: 16   Global Step: 83670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:22:37,203-Speed 10465.82 samples/sec   Loss 8.1565   LearningRate 0.0344   Epoch: 16   Global Step: 83680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:22:38,145-Speed 10879.66 samples/sec   Loss 8.0373   LearningRate 0.0344   Epoch: 16   Global Step: 83690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:22:39,084-Speed 10915.89 samples/sec   Loss 7.8918   LearningRate 0.0344   Epoch: 16   Global Step: 83700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:40,004-Speed 11137.26 samples/sec   Loss 8.1605   LearningRate 0.0344   Epoch: 16   Global Step: 83710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:40,985-Speed 10452.15 samples/sec   Loss 8.2149   LearningRate 0.0344   Epoch: 16   Global Step: 83720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:41,897-Speed 11239.64 samples/sec   Loss 8.1473   LearningRate 0.0344   Epoch: 16   Global Step: 83730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:42,850-Speed 10758.41 samples/sec   Loss 8.2065   LearningRate 0.0344   Epoch: 16   Global Step: 83740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:43,834-Speed 10413.50 samples/sec   Loss 8.2096   LearningRate 0.0343   Epoch: 16   Global Step: 83750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:22:44,866-Speed 9930.70 samples/sec   Loss 8.0354   LearningRate 0.0343   Epoch: 16   Global Step: 83760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:22:45,832-Speed 10618.01 samples/sec   Loss 8.0623   LearningRate 0.0343   Epoch: 16   Global Step: 83770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:22:46,786-Speed 10735.08 samples/sec   Loss 8.2085   LearningRate 0.0343   Epoch: 16   Global Step: 83780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:22:47,765-Speed 10477.43 samples/sec   Loss 8.1962   LearningRate 0.0343   Epoch: 16   Global Step: 83790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:22:48,731-Speed 10610.88 samples/sec   Loss 8.2673   LearningRate 0.0343   Epoch: 16   Global Step: 83800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:22:49,678-Speed 10821.89 samples/sec   Loss 8.0321   LearningRate 0.0343   Epoch: 16   Global Step: 83810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:22:50,665-Speed 10383.99 samples/sec   Loss 8.0078   LearningRate 0.0343   Epoch: 16   Global Step: 83820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:22:51,630-Speed 10622.10 samples/sec   Loss 8.1390   LearningRate 0.0343   Epoch: 16   Global Step: 83830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:22:52,569-Speed 10919.43 samples/sec   Loss 8.1022   LearningRate 0.0343   Epoch: 16   Global Step: 83840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:22:53,589-Speed 10043.60 samples/sec   Loss 8.2048   LearningRate 0.0343   Epoch: 16   Global Step: 83850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:54,507-Speed 11168.86 samples/sec   Loss 8.0926   LearningRate 0.0343   Epoch: 16   Global Step: 83860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:55,464-Speed 10703.09 samples/sec   Loss 8.1579   LearningRate 0.0343   Epoch: 16   Global Step: 83870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:56,397-Speed 10981.13 samples/sec   Loss 8.2558   LearningRate 0.0343   Epoch: 16   Global Step: 83880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:57,392-Speed 10298.72 samples/sec   Loss 8.1927   LearningRate 0.0343   Epoch: 16   Global Step: 83890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:58,354-Speed 10662.11 samples/sec   Loss 8.1734   LearningRate 0.0343   Epoch: 16   Global Step: 83900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:22:59,302-Speed 10809.41 samples/sec   Loss 8.0826   LearningRate 0.0343   Epoch: 16   Global Step: 83910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:23:00,308-Speed 10190.71 samples/sec   Loss 8.2213   LearningRate 0.0342   Epoch: 16   Global Step: 83920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:23:01,302-Speed 10305.87 samples/sec   Loss 8.1552   LearningRate 0.0342   Epoch: 16   Global Step: 83930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:23:02,296-Speed 10310.90 samples/sec   Loss 8.1987   LearningRate 0.0342   Epoch: 16   Global Step: 83940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:23:03,209-Speed 11226.65 samples/sec   Loss 8.1330   LearningRate 0.0342   Epoch: 16   Global Step: 83950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:23:04,172-Speed 10644.41 samples/sec   Loss 8.2178   LearningRate 0.0342   Epoch: 16   Global Step: 83960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:23:05,126-Speed 10737.21 samples/sec   Loss 8.2374   LearningRate 0.0342   Epoch: 16   Global Step: 83970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:23:06,068-Speed 10883.61 samples/sec   Loss 8.0874   LearningRate 0.0342   Epoch: 16   Global Step: 83980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:23:06,977-Speed 11279.68 samples/sec   Loss 8.2414   LearningRate 0.0342   Epoch: 16   Global Step: 83990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:23:07,946-Speed 10576.31 samples/sec   Loss 8.2714   LearningRate 0.0342   Epoch: 16   Global Step: 84000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:23:30,487-[lfw][84000]XNorm: 11.601733
Training: 2022-04-11 02:23:30,488-[lfw][84000]Accuracy-Flip: 0.99617+-0.00269
Training: 2022-04-11 02:23:30,488-[lfw][84000]Accuracy-Highest: 0.99617
Training: 2022-04-11 02:23:56,010-[cfp_fp][84000]XNorm: 9.872416
Training: 2022-04-11 02:23:56,011-[cfp_fp][84000]Accuracy-Flip: 0.95843+-0.01033
Training: 2022-04-11 02:23:56,011-[cfp_fp][84000]Accuracy-Highest: 0.95843
Training: 2022-04-11 02:24:18,065-[agedb_30][84000]XNorm: 11.297971
Training: 2022-04-11 02:24:18,065-[agedb_30][84000]Accuracy-Flip: 0.96217+-0.00937
Training: 2022-04-11 02:24:18,066-[agedb_30][84000]Accuracy-Highest: 0.96517
Training: 2022-04-11 02:24:18,976-Speed 144.17 samples/sec   Loss 8.3250   LearningRate 0.0342   Epoch: 16   Global Step: 84010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:19,899-Speed 11101.10 samples/sec   Loss 8.1928   LearningRate 0.0342   Epoch: 16   Global Step: 84020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:20,894-Speed 10292.28 samples/sec   Loss 8.1547   LearningRate 0.0342   Epoch: 16   Global Step: 84030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:21,862-Speed 10592.72 samples/sec   Loss 8.1477   LearningRate 0.0342   Epoch: 16   Global Step: 84040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:22,871-Speed 10161.10 samples/sec   Loss 8.3595   LearningRate 0.0342   Epoch: 16   Global Step: 84050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:24:23,817-Speed 10833.75 samples/sec   Loss 8.1941   LearningRate 0.0342   Epoch: 16   Global Step: 84060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:24:24,832-Speed 10088.80 samples/sec   Loss 8.0889   LearningRate 0.0342   Epoch: 16   Global Step: 84070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:24:25,735-Speed 11355.37 samples/sec   Loss 8.1604   LearningRate 0.0342   Epoch: 16   Global Step: 84080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:24:26,751-Speed 10090.04 samples/sec   Loss 7.9844   LearningRate 0.0341   Epoch: 16   Global Step: 84090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:24:27,711-Speed 10684.67 samples/sec   Loss 8.3444   LearningRate 0.0341   Epoch: 16   Global Step: 84100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:24:28,680-Speed 10569.40 samples/sec   Loss 8.1872   LearningRate 0.0341   Epoch: 16   Global Step: 84110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:24:29,661-Speed 10450.08 samples/sec   Loss 8.0038   LearningRate 0.0341   Epoch: 16   Global Step: 84120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:30,606-Speed 10848.65 samples/sec   Loss 8.0637   LearningRate 0.0341   Epoch: 16   Global Step: 84130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:31,577-Speed 10551.67 samples/sec   Loss 8.2962   LearningRate 0.0341   Epoch: 16   Global Step: 84140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:32,528-Speed 10790.48 samples/sec   Loss 8.1952   LearningRate 0.0341   Epoch: 16   Global Step: 84150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:33,511-Speed 10428.51 samples/sec   Loss 8.2675   LearningRate 0.0341   Epoch: 16   Global Step: 84160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:34,452-Speed 10891.88 samples/sec   Loss 8.1230   LearningRate 0.0341   Epoch: 16   Global Step: 84170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:35,394-Speed 10884.48 samples/sec   Loss 8.0483   LearningRate 0.0341   Epoch: 16   Global Step: 84180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:36,363-Speed 10572.43 samples/sec   Loss 8.3028   LearningRate 0.0341   Epoch: 16   Global Step: 84190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:37,335-Speed 10546.48 samples/sec   Loss 8.3067   LearningRate 0.0341   Epoch: 16   Global Step: 84200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:38,316-Speed 10450.14 samples/sec   Loss 8.1113   LearningRate 0.0341   Epoch: 16   Global Step: 84210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:39,278-Speed 10657.24 samples/sec   Loss 8.1090   LearningRate 0.0341   Epoch: 16   Global Step: 84220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:24:40,189-Speed 11248.96 samples/sec   Loss 8.2288   LearningRate 0.0341   Epoch: 16   Global Step: 84230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:41,199-Speed 10141.68 samples/sec   Loss 8.1944   LearningRate 0.0341   Epoch: 16   Global Step: 84240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:42,141-Speed 10882.79 samples/sec   Loss 8.1134   LearningRate 0.0341   Epoch: 16   Global Step: 84250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:43,116-Speed 10511.83 samples/sec   Loss 8.2293   LearningRate 0.0341   Epoch: 16   Global Step: 84260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:44,076-Speed 10677.36 samples/sec   Loss 8.2095   LearningRate 0.0340   Epoch: 16   Global Step: 84270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:45,022-Speed 10843.88 samples/sec   Loss 8.0735   LearningRate 0.0340   Epoch: 16   Global Step: 84280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:45,976-Speed 10744.32 samples/sec   Loss 7.9412   LearningRate 0.0340   Epoch: 16   Global Step: 84290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:46,958-Speed 10434.58 samples/sec   Loss 8.1108   LearningRate 0.0340   Epoch: 16   Global Step: 84300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:47,923-Speed 10616.10 samples/sec   Loss 8.0927   LearningRate 0.0340   Epoch: 16   Global Step: 84310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:48,855-Speed 11000.64 samples/sec   Loss 8.2192   LearningRate 0.0340   Epoch: 16   Global Step: 84320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:49,840-Speed 10406.06 samples/sec   Loss 8.2718   LearningRate 0.0340   Epoch: 16   Global Step: 84330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:24:50,807-Speed 10608.91 samples/sec   Loss 8.3110   LearningRate 0.0340   Epoch: 16   Global Step: 84340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:24:51,758-Speed 10770.56 samples/sec   Loss 8.1079   LearningRate 0.0340   Epoch: 16   Global Step: 84350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:24:52,768-Speed 10146.27 samples/sec   Loss 8.1564   LearningRate 0.0340   Epoch: 16   Global Step: 84360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:24:53,735-Speed 10598.47 samples/sec   Loss 8.1284   LearningRate 0.0340   Epoch: 16   Global Step: 84370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:24:54,689-Speed 10742.23 samples/sec   Loss 8.1121   LearningRate 0.0340   Epoch: 16   Global Step: 84380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:24:55,606-Speed 11181.94 samples/sec   Loss 8.1869   LearningRate 0.0340   Epoch: 16   Global Step: 84390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:24:56,566-Speed 10677.11 samples/sec   Loss 8.1393   LearningRate 0.0340   Epoch: 16   Global Step: 84400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:24:57,532-Speed 10612.80 samples/sec   Loss 8.1675   LearningRate 0.0340   Epoch: 16   Global Step: 84410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:24:58,480-Speed 10809.40 samples/sec   Loss 8.2796   LearningRate 0.0340   Epoch: 16   Global Step: 84420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:24:59,425-Speed 10845.74 samples/sec   Loss 8.2311   LearningRate 0.0340   Epoch: 16   Global Step: 84430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:25:00,402-Speed 10486.39 samples/sec   Loss 8.1699   LearningRate 0.0339   Epoch: 16   Global Step: 84440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:01,328-Speed 11075.26 samples/sec   Loss 8.1995   LearningRate 0.0339   Epoch: 16   Global Step: 84450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:02,318-Speed 10349.81 samples/sec   Loss 8.2830   LearningRate 0.0339   Epoch: 16   Global Step: 84460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:03,243-Speed 11077.10 samples/sec   Loss 8.1307   LearningRate 0.0339   Epoch: 16   Global Step: 84470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:04,226-Speed 10430.27 samples/sec   Loss 8.1193   LearningRate 0.0339   Epoch: 16   Global Step: 84480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:05,162-Speed 10947.42 samples/sec   Loss 8.0729   LearningRate 0.0339   Epoch: 16   Global Step: 84490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:06,097-Speed 10962.73 samples/sec   Loss 8.1601   LearningRate 0.0339   Epoch: 16   Global Step: 84500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:07,201-Speed 9286.34 samples/sec   Loss 8.2551   LearningRate 0.0339   Epoch: 16   Global Step: 84510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:08,170-Speed 10580.90 samples/sec   Loss 8.0305   LearningRate 0.0339   Epoch: 16   Global Step: 84520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:09,146-Speed 10502.02 samples/sec   Loss 8.1655   LearningRate 0.0339   Epoch: 16   Global Step: 84530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:10,125-Speed 10460.64 samples/sec   Loss 8.3148   LearningRate 0.0339   Epoch: 16   Global Step: 84540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:11,087-Speed 10665.55 samples/sec   Loss 8.0336   LearningRate 0.0339   Epoch: 16   Global Step: 84550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:12,034-Speed 10821.24 samples/sec   Loss 8.2652   LearningRate 0.0339   Epoch: 16   Global Step: 84560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:13,015-Speed 10446.86 samples/sec   Loss 8.1037   LearningRate 0.0339   Epoch: 16   Global Step: 84570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:13,956-Speed 10892.33 samples/sec   Loss 8.2029   LearningRate 0.0339   Epoch: 16   Global Step: 84580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:14,922-Speed 10612.81 samples/sec   Loss 8.1041   LearningRate 0.0339   Epoch: 16   Global Step: 84590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:15,820-Speed 11415.05 samples/sec   Loss 8.3137   LearningRate 0.0339   Epoch: 16   Global Step: 84600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:16,806-Speed 10389.36 samples/sec   Loss 8.0476   LearningRate 0.0338   Epoch: 16   Global Step: 84610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:17,743-Speed 10946.85 samples/sec   Loss 8.2697   LearningRate 0.0338   Epoch: 16   Global Step: 84620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:18,694-Speed 10768.82 samples/sec   Loss 8.2048   LearningRate 0.0338   Epoch: 16   Global Step: 84630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:19,659-Speed 10626.49 samples/sec   Loss 8.2230   LearningRate 0.0338   Epoch: 16   Global Step: 84640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:20,650-Speed 10340.05 samples/sec   Loss 8.1579   LearningRate 0.0338   Epoch: 16   Global Step: 84650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:21,608-Speed 10700.84 samples/sec   Loss 7.9842   LearningRate 0.0338   Epoch: 16   Global Step: 84660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:22,559-Speed 10774.44 samples/sec   Loss 8.1682   LearningRate 0.0338   Epoch: 16   Global Step: 84670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:24,306-Speed 5863.24 samples/sec   Loss 8.1160   LearningRate 0.0338   Epoch: 16   Global Step: 84680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:25,243-Speed 10945.87 samples/sec   Loss 8.1610   LearningRate 0.0338   Epoch: 16   Global Step: 84690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:26,215-Speed 10545.87 samples/sec   Loss 8.0175   LearningRate 0.0338   Epoch: 16   Global Step: 84700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:27,167-Speed 10766.28 samples/sec   Loss 8.1002   LearningRate 0.0338   Epoch: 16   Global Step: 84710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:28,182-Speed 10094.89 samples/sec   Loss 8.0955   LearningRate 0.0338   Epoch: 16   Global Step: 84720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:29,151-Speed 10574.85 samples/sec   Loss 8.1424   LearningRate 0.0338   Epoch: 16   Global Step: 84730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:30,131-Speed 10462.19 samples/sec   Loss 8.1043   LearningRate 0.0338   Epoch: 16   Global Step: 84740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:31,085-Speed 10740.05 samples/sec   Loss 8.1906   LearningRate 0.0338   Epoch: 16   Global Step: 84750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:32,088-Speed 10224.87 samples/sec   Loss 8.2261   LearningRate 0.0338   Epoch: 16   Global Step: 84760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:33,114-Speed 9984.04 samples/sec   Loss 8.1223   LearningRate 0.0338   Epoch: 16   Global Step: 84770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:34,063-Speed 10807.93 samples/sec   Loss 8.1398   LearningRate 0.0338   Epoch: 16   Global Step: 84780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:35,000-Speed 10941.04 samples/sec   Loss 8.1001   LearningRate 0.0337   Epoch: 16   Global Step: 84790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:35,943-Speed 10859.85 samples/sec   Loss 8.1104   LearningRate 0.0337   Epoch: 16   Global Step: 84800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:36,883-Speed 10905.77 samples/sec   Loss 8.2883   LearningRate 0.0337   Epoch: 16   Global Step: 84810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:37,841-Speed 10701.37 samples/sec   Loss 8.1354   LearningRate 0.0337   Epoch: 16   Global Step: 84820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:38,795-Speed 10734.44 samples/sec   Loss 8.2754   LearningRate 0.0337   Epoch: 16   Global Step: 84830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:39,738-Speed 10868.35 samples/sec   Loss 8.1598   LearningRate 0.0337   Epoch: 16   Global Step: 84840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:40,688-Speed 10802.57 samples/sec   Loss 8.0616   LearningRate 0.0337   Epoch: 16   Global Step: 84850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:41,648-Speed 10687.05 samples/sec   Loss 8.2459   LearningRate 0.0337   Epoch: 16   Global Step: 84860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:42,572-Speed 11095.56 samples/sec   Loss 8.1297   LearningRate 0.0337   Epoch: 16   Global Step: 84870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:43,510-Speed 10928.07 samples/sec   Loss 8.1169   LearningRate 0.0337   Epoch: 16   Global Step: 84880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:44,501-Speed 10335.79 samples/sec   Loss 8.1044   LearningRate 0.0337   Epoch: 16   Global Step: 84890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:45,461-Speed 10685.78 samples/sec   Loss 8.2086   LearningRate 0.0337   Epoch: 16   Global Step: 84900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:46,456-Speed 10297.07 samples/sec   Loss 8.0949   LearningRate 0.0337   Epoch: 16   Global Step: 84910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:47,433-Speed 10489.37 samples/sec   Loss 8.3712   LearningRate 0.0337   Epoch: 16   Global Step: 84920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:25:48,426-Speed 10319.76 samples/sec   Loss 8.4092   LearningRate 0.0337   Epoch: 16   Global Step: 84930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:49,393-Speed 10600.09 samples/sec   Loss 8.0098   LearningRate 0.0337   Epoch: 16   Global Step: 84940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:50,320-Speed 11051.92 samples/sec   Loss 8.1948   LearningRate 0.0337   Epoch: 16   Global Step: 84950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:51,315-Speed 10300.38 samples/sec   Loss 8.1511   LearningRate 0.0336   Epoch: 16   Global Step: 84960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:52,271-Speed 10727.09 samples/sec   Loss 8.0987   LearningRate 0.0336   Epoch: 16   Global Step: 84970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:53,264-Speed 10310.49 samples/sec   Loss 8.2035   LearningRate 0.0336   Epoch: 16   Global Step: 84980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:54,256-Speed 10333.99 samples/sec   Loss 8.0204   LearningRate 0.0336   Epoch: 16   Global Step: 84990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:55,216-Speed 10680.04 samples/sec   Loss 8.2770   LearningRate 0.0336   Epoch: 16   Global Step: 85000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:56,148-Speed 10999.53 samples/sec   Loss 8.1259   LearningRate 0.0336   Epoch: 16   Global Step: 85010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:57,103-Speed 10727.92 samples/sec   Loss 8.2363   LearningRate 0.0336   Epoch: 16   Global Step: 85020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:58,065-Speed 10661.62 samples/sec   Loss 8.0066   LearningRate 0.0336   Epoch: 16   Global Step: 85030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:59,011-Speed 10825.61 samples/sec   Loss 8.0599   LearningRate 0.0336   Epoch: 16   Global Step: 85040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:25:59,949-Speed 10939.90 samples/sec   Loss 8.0827   LearningRate 0.0336   Epoch: 16   Global Step: 85050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:00,854-Speed 11317.61 samples/sec   Loss 8.0831   LearningRate 0.0336   Epoch: 16   Global Step: 85060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:01,839-Speed 10403.19 samples/sec   Loss 8.0622   LearningRate 0.0336   Epoch: 16   Global Step: 85070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:02,750-Speed 11256.57 samples/sec   Loss 8.1221   LearningRate 0.0336   Epoch: 16   Global Step: 85080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:03,683-Speed 10981.65 samples/sec   Loss 8.1704   LearningRate 0.0336   Epoch: 16   Global Step: 85090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:04,638-Speed 10732.66 samples/sec   Loss 8.1740   LearningRate 0.0336   Epoch: 16   Global Step: 85100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:05,573-Speed 10973.48 samples/sec   Loss 8.2333   LearningRate 0.0336   Epoch: 16   Global Step: 85110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:06,551-Speed 10477.02 samples/sec   Loss 8.0645   LearningRate 0.0336   Epoch: 16   Global Step: 85120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:07,540-Speed 10361.62 samples/sec   Loss 8.3839   LearningRate 0.0336   Epoch: 16   Global Step: 85130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:08,483-Speed 10890.95 samples/sec   Loss 8.1516   LearningRate 0.0335   Epoch: 16   Global Step: 85140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:09,438-Speed 10724.19 samples/sec   Loss 8.2180   LearningRate 0.0335   Epoch: 16   Global Step: 85150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:10,443-Speed 10206.19 samples/sec   Loss 8.2913   LearningRate 0.0335   Epoch: 16   Global Step: 85160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:11,428-Speed 10395.70 samples/sec   Loss 8.2667   LearningRate 0.0335   Epoch: 16   Global Step: 85170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:12,361-Speed 10987.32 samples/sec   Loss 8.2323   LearningRate 0.0335   Epoch: 16   Global Step: 85180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:13,307-Speed 10836.09 samples/sec   Loss 8.0956   LearningRate 0.0335   Epoch: 16   Global Step: 85190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:14,299-Speed 10326.76 samples/sec   Loss 8.1022   LearningRate 0.0335   Epoch: 16   Global Step: 85200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:15,246-Speed 10824.03 samples/sec   Loss 7.8689   LearningRate 0.0335   Epoch: 16   Global Step: 85210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:16,191-Speed 10844.76 samples/sec   Loss 8.2137   LearningRate 0.0335   Epoch: 16   Global Step: 85220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:17,163-Speed 10543.48 samples/sec   Loss 8.2089   LearningRate 0.0335   Epoch: 16   Global Step: 85230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:18,154-Speed 10343.16 samples/sec   Loss 8.1583   LearningRate 0.0335   Epoch: 16   Global Step: 85240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:19,123-Speed 10584.11 samples/sec   Loss 8.0658   LearningRate 0.0335   Epoch: 16   Global Step: 85250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:20,102-Speed 10497.68 samples/sec   Loss 8.1238   LearningRate 0.0335   Epoch: 16   Global Step: 85260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:21,078-Speed 10500.24 samples/sec   Loss 8.1290   LearningRate 0.0335   Epoch: 16   Global Step: 85270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:22,076-Speed 10272.52 samples/sec   Loss 8.1222   LearningRate 0.0335   Epoch: 16   Global Step: 85280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:22,967-Speed 11505.93 samples/sec   Loss 8.0419   LearningRate 0.0335   Epoch: 16   Global Step: 85290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:23,932-Speed 10618.06 samples/sec   Loss 8.1500   LearningRate 0.0335   Epoch: 16   Global Step: 85300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:24,933-Speed 10243.14 samples/sec   Loss 8.2053   LearningRate 0.0334   Epoch: 16   Global Step: 85310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:25,894-Speed 10661.26 samples/sec   Loss 8.0244   LearningRate 0.0334   Epoch: 16   Global Step: 85320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:26,822-Speed 11041.79 samples/sec   Loss 8.0724   LearningRate 0.0334   Epoch: 16   Global Step: 85330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:27,832-Speed 10155.05 samples/sec   Loss 8.0407   LearningRate 0.0334   Epoch: 16   Global Step: 85340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:28,822-Speed 10353.71 samples/sec   Loss 8.2551   LearningRate 0.0334   Epoch: 16   Global Step: 85350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:29,814-Speed 10325.84 samples/sec   Loss 7.9288   LearningRate 0.0334   Epoch: 16   Global Step: 85360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:30,781-Speed 10602.50 samples/sec   Loss 8.1403   LearningRate 0.0334   Epoch: 16   Global Step: 85370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:31,758-Speed 10493.49 samples/sec   Loss 8.2790   LearningRate 0.0334   Epoch: 16   Global Step: 85380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:32,713-Speed 10728.69 samples/sec   Loss 8.3232   LearningRate 0.0334   Epoch: 16   Global Step: 85390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:33,696-Speed 10424.53 samples/sec   Loss 8.2426   LearningRate 0.0334   Epoch: 16   Global Step: 85400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:34,681-Speed 10414.06 samples/sec   Loss 7.9938   LearningRate 0.0334   Epoch: 16   Global Step: 85410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:35,607-Speed 11059.80 samples/sec   Loss 8.2017   LearningRate 0.0334   Epoch: 16   Global Step: 85420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:36,556-Speed 10805.08 samples/sec   Loss 8.2349   LearningRate 0.0334   Epoch: 16   Global Step: 85430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:37,544-Speed 10366.97 samples/sec   Loss 8.2474   LearningRate 0.0334   Epoch: 16   Global Step: 85440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:38,541-Speed 10281.74 samples/sec   Loss 8.1390   LearningRate 0.0334   Epoch: 16   Global Step: 85450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:39,468-Speed 11061.89 samples/sec   Loss 8.1261   LearningRate 0.0334   Epoch: 16   Global Step: 85460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:40,462-Speed 10309.55 samples/sec   Loss 8.2747   LearningRate 0.0334   Epoch: 16   Global Step: 85470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:41,467-Speed 10204.29 samples/sec   Loss 8.2474   LearningRate 0.0334   Epoch: 16   Global Step: 85480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:42,431-Speed 10637.25 samples/sec   Loss 8.3169   LearningRate 0.0333   Epoch: 16   Global Step: 85490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:43,461-Speed 9948.01 samples/sec   Loss 8.2534   LearningRate 0.0333   Epoch: 16   Global Step: 85500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:44,422-Speed 10668.39 samples/sec   Loss 8.0892   LearningRate 0.0333   Epoch: 16   Global Step: 85510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:45,367-Speed 10847.69 samples/sec   Loss 8.2591   LearningRate 0.0333   Epoch: 16   Global Step: 85520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:46,327-Speed 10674.63 samples/sec   Loss 8.2847   LearningRate 0.0333   Epoch: 16   Global Step: 85530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:47,284-Speed 10706.26 samples/sec   Loss 8.1833   LearningRate 0.0333   Epoch: 16   Global Step: 85540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:48,278-Speed 10311.90 samples/sec   Loss 8.1321   LearningRate 0.0333   Epoch: 16   Global Step: 85550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:49,252-Speed 10539.33 samples/sec   Loss 8.2354   LearningRate 0.0333   Epoch: 16   Global Step: 85560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:50,187-Speed 10960.19 samples/sec   Loss 8.0178   LearningRate 0.0333   Epoch: 16   Global Step: 85570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:51,138-Speed 10774.93 samples/sec   Loss 8.2604   LearningRate 0.0333   Epoch: 16   Global Step: 85580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:26:52,123-Speed 10406.69 samples/sec   Loss 8.1000   LearningRate 0.0333   Epoch: 16   Global Step: 85590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:53,084-Speed 10661.94 samples/sec   Loss 8.0581   LearningRate 0.0333   Epoch: 16   Global Step: 85600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:54,037-Speed 10761.38 samples/sec   Loss 7.9739   LearningRate 0.0333   Epoch: 16   Global Step: 85610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:54,950-Speed 11221.80 samples/sec   Loss 8.2163   LearningRate 0.0333   Epoch: 16   Global Step: 85620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:55,910-Speed 10674.01 samples/sec   Loss 8.3079   LearningRate 0.0333   Epoch: 16   Global Step: 85630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:56,895-Speed 10407.66 samples/sec   Loss 8.1497   LearningRate 0.0333   Epoch: 16   Global Step: 85640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:57,835-Speed 10901.05 samples/sec   Loss 8.2011   LearningRate 0.0333   Epoch: 16   Global Step: 85650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:58,803-Speed 10589.00 samples/sec   Loss 8.1647   LearningRate 0.0332   Epoch: 16   Global Step: 85660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:26:59,772-Speed 10571.81 samples/sec   Loss 8.2731   LearningRate 0.0332   Epoch: 16   Global Step: 85670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:00,750-Speed 10488.39 samples/sec   Loss 7.9811   LearningRate 0.0332   Epoch: 16   Global Step: 85680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:01,719-Speed 10567.21 samples/sec   Loss 8.1033   LearningRate 0.0332   Epoch: 16   Global Step: 85690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:27:02,692-Speed 10541.03 samples/sec   Loss 8.0395   LearningRate 0.0332   Epoch: 16   Global Step: 85700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:27:03,680-Speed 10368.39 samples/sec   Loss 8.1330   LearningRate 0.0332   Epoch: 16   Global Step: 85710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:27:04,618-Speed 10939.28 samples/sec   Loss 8.2237   LearningRate 0.0332   Epoch: 16   Global Step: 85720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:27:05,590-Speed 10545.88 samples/sec   Loss 8.0393   LearningRate 0.0332   Epoch: 16   Global Step: 85730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:27:06,495-Speed 11327.31 samples/sec   Loss 8.0833   LearningRate 0.0332   Epoch: 16   Global Step: 85740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:27:07,448-Speed 10751.36 samples/sec   Loss 8.0671   LearningRate 0.0332   Epoch: 16   Global Step: 85750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:27:08,428-Speed 10461.31 samples/sec   Loss 8.1284   LearningRate 0.0332   Epoch: 16   Global Step: 85760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:27:09,407-Speed 10469.89 samples/sec   Loss 8.2233   LearningRate 0.0332   Epoch: 16   Global Step: 85770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:27:10,380-Speed 10525.83 samples/sec   Loss 7.9519   LearningRate 0.0332   Epoch: 16   Global Step: 85780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:11,368-Speed 10378.42 samples/sec   Loss 8.1725   LearningRate 0.0332   Epoch: 16   Global Step: 85790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:12,281-Speed 11225.29 samples/sec   Loss 8.2222   LearningRate 0.0332   Epoch: 16   Global Step: 85800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:13,255-Speed 10527.15 samples/sec   Loss 8.0989   LearningRate 0.0332   Epoch: 16   Global Step: 85810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:14,262-Speed 10169.97 samples/sec   Loss 8.2872   LearningRate 0.0332   Epoch: 16   Global Step: 85820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:15,237-Speed 10518.27 samples/sec   Loss 8.1984   LearningRate 0.0332   Epoch: 16   Global Step: 85830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:16,202-Speed 10627.77 samples/sec   Loss 8.2029   LearningRate 0.0331   Epoch: 16   Global Step: 85840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:17,150-Speed 10807.15 samples/sec   Loss 8.2156   LearningRate 0.0331   Epoch: 16   Global Step: 85850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:18,104-Speed 10733.36 samples/sec   Loss 8.2468   LearningRate 0.0331   Epoch: 16   Global Step: 85860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:19,092-Speed 10384.49 samples/sec   Loss 8.1143   LearningRate 0.0331   Epoch: 16   Global Step: 85870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:20,097-Speed 10199.37 samples/sec   Loss 8.1146   LearningRate 0.0331   Epoch: 16   Global Step: 85880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:21,023-Speed 11069.91 samples/sec   Loss 8.1121   LearningRate 0.0331   Epoch: 16   Global Step: 85890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:21,992-Speed 10581.20 samples/sec   Loss 8.0947   LearningRate 0.0331   Epoch: 16   Global Step: 85900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:22,929-Speed 10944.19 samples/sec   Loss 8.3661   LearningRate 0.0331   Epoch: 16   Global Step: 85910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:23,872-Speed 10867.20 samples/sec   Loss 8.1288   LearningRate 0.0331   Epoch: 16   Global Step: 85920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:24,816-Speed 10856.52 samples/sec   Loss 8.1229   LearningRate 0.0331   Epoch: 16   Global Step: 85930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:25,749-Speed 10991.09 samples/sec   Loss 8.2432   LearningRate 0.0331   Epoch: 16   Global Step: 85940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:26,699-Speed 10783.79 samples/sec   Loss 8.1859   LearningRate 0.0331   Epoch: 16   Global Step: 85950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:27,695-Speed 10289.72 samples/sec   Loss 8.1343   LearningRate 0.0331   Epoch: 16   Global Step: 85960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:28,635-Speed 10900.15 samples/sec   Loss 8.0430   LearningRate 0.0331   Epoch: 16   Global Step: 85970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:27:29,703-Speed 9603.23 samples/sec   Loss 8.4154   LearningRate 0.0331   Epoch: 16   Global Step: 85980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:27:42,000-Speed 832.98 samples/sec   Loss 7.6941   LearningRate 0.0331   Epoch: 17   Global Step: 85990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:27:43,313-Speed 7808.78 samples/sec   Loss 7.2998   LearningRate 0.0331   Epoch: 17   Global Step: 86000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:28:05,714-[lfw][86000]XNorm: 11.665890
Training: 2022-04-11 02:28:05,714-[lfw][86000]Accuracy-Flip: 0.99517+-0.00302
Training: 2022-04-11 02:28:05,715-[lfw][86000]Accuracy-Highest: 0.99617
Training: 2022-04-11 02:28:31,299-[cfp_fp][86000]XNorm: 9.861430
Training: 2022-04-11 02:28:31,300-[cfp_fp][86000]Accuracy-Flip: 0.96071+-0.00990
Training: 2022-04-11 02:28:31,300-[cfp_fp][86000]Accuracy-Highest: 0.96071
Training: 2022-04-11 02:28:53,407-[agedb_30][86000]XNorm: 11.299735
Training: 2022-04-11 02:28:53,407-[agedb_30][86000]Accuracy-Flip: 0.96183+-0.00941
Training: 2022-04-11 02:28:53,407-[agedb_30][86000]Accuracy-Highest: 0.96517
Training: 2022-04-11 02:28:54,368-Speed 144.12 samples/sec   Loss 7.3007   LearningRate 0.0330   Epoch: 17   Global Step: 86010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:28:55,326-Speed 10697.84 samples/sec   Loss 7.0764   LearningRate 0.0330   Epoch: 17   Global Step: 86020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:28:56,361-Speed 9898.49 samples/sec   Loss 7.3132   LearningRate 0.0330   Epoch: 17   Global Step: 86030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:28:57,383-Speed 10025.13 samples/sec   Loss 7.1316   LearningRate 0.0330   Epoch: 17   Global Step: 86040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:28:58,372-Speed 10369.70 samples/sec   Loss 7.3736   LearningRate 0.0330   Epoch: 17   Global Step: 86050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:28:59,366-Speed 10308.70 samples/sec   Loss 7.3006   LearningRate 0.0330   Epoch: 17   Global Step: 86060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:00,325-Speed 10693.52 samples/sec   Loss 7.3271   LearningRate 0.0330   Epoch: 17   Global Step: 86070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:29:01,299-Speed 10518.96 samples/sec   Loss 7.3654   LearningRate 0.0330   Epoch: 17   Global Step: 86080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:29:02,257-Speed 10697.58 samples/sec   Loss 7.3442   LearningRate 0.0330   Epoch: 17   Global Step: 86090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:29:03,313-Speed 9713.09 samples/sec   Loss 7.3592   LearningRate 0.0330   Epoch: 17   Global Step: 86100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:29:04,295-Speed 10443.53 samples/sec   Loss 7.3576   LearningRate 0.0330   Epoch: 17   Global Step: 86110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:29:05,256-Speed 10664.39 samples/sec   Loss 7.3823   LearningRate 0.0330   Epoch: 17   Global Step: 86120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:29:06,230-Speed 10519.46 samples/sec   Loss 7.3522   LearningRate 0.0330   Epoch: 17   Global Step: 86130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:29:07,186-Speed 10719.91 samples/sec   Loss 7.3446   LearningRate 0.0330   Epoch: 17   Global Step: 86140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:29:08,171-Speed 10401.61 samples/sec   Loss 7.3751   LearningRate 0.0330   Epoch: 17   Global Step: 86150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:29:09,133-Speed 10661.72 samples/sec   Loss 7.4614   LearningRate 0.0330   Epoch: 17   Global Step: 86160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:29:10,101-Speed 10579.04 samples/sec   Loss 7.2584   LearningRate 0.0330   Epoch: 17   Global Step: 86170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:11,139-Speed 9875.17 samples/sec   Loss 7.2668   LearningRate 0.0330   Epoch: 17   Global Step: 86180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:12,092-Speed 10757.63 samples/sec   Loss 7.3193   LearningRate 0.0329   Epoch: 17   Global Step: 86190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:13,048-Speed 10726.96 samples/sec   Loss 7.3922   LearningRate 0.0329   Epoch: 17   Global Step: 86200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:13,998-Speed 10792.10 samples/sec   Loss 7.2407   LearningRate 0.0329   Epoch: 17   Global Step: 86210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:14,927-Speed 11030.49 samples/sec   Loss 7.3217   LearningRate 0.0329   Epoch: 17   Global Step: 86220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:29:15,858-Speed 11002.00 samples/sec   Loss 7.3660   LearningRate 0.0329   Epoch: 17   Global Step: 86230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:29:16,827-Speed 10583.81 samples/sec   Loss 7.4073   LearningRate 0.0329   Epoch: 17   Global Step: 86240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:29:17,805-Speed 10478.76 samples/sec   Loss 7.5542   LearningRate 0.0329   Epoch: 17   Global Step: 86250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:29:18,749-Speed 10859.70 samples/sec   Loss 7.3487   LearningRate 0.0329   Epoch: 17   Global Step: 86260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:29:19,727-Speed 10482.29 samples/sec   Loss 7.2827   LearningRate 0.0329   Epoch: 17   Global Step: 86270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:29:20,727-Speed 10243.87 samples/sec   Loss 7.3169   LearningRate 0.0329   Epoch: 17   Global Step: 86280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:29:21,791-Speed 9636.21 samples/sec   Loss 7.5037   LearningRate 0.0329   Epoch: 17   Global Step: 86290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:29:22,807-Speed 10085.74 samples/sec   Loss 7.4083   LearningRate 0.0329   Epoch: 17   Global Step: 86300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:29:23,854-Speed 9791.65 samples/sec   Loss 7.4344   LearningRate 0.0329   Epoch: 17   Global Step: 86310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:29:24,926-Speed 9555.72 samples/sec   Loss 7.4971   LearningRate 0.0329   Epoch: 17   Global Step: 86320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:25,870-Speed 10862.06 samples/sec   Loss 7.4523   LearningRate 0.0329   Epoch: 17   Global Step: 86330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:26,851-Speed 10442.71 samples/sec   Loss 7.5918   LearningRate 0.0329   Epoch: 17   Global Step: 86340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:27,919-Speed 9606.13 samples/sec   Loss 7.4385   LearningRate 0.0329   Epoch: 17   Global Step: 86350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:28,892-Speed 10536.46 samples/sec   Loss 7.4714   LearningRate 0.0329   Epoch: 17   Global Step: 86360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:29,869-Speed 10485.71 samples/sec   Loss 7.4776   LearningRate 0.0328   Epoch: 17   Global Step: 86370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:30,838-Speed 10569.47 samples/sec   Loss 7.4897   LearningRate 0.0328   Epoch: 17   Global Step: 86380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:31,784-Speed 10838.62 samples/sec   Loss 7.4684   LearningRate 0.0328   Epoch: 17   Global Step: 86390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:32,752-Speed 10590.38 samples/sec   Loss 7.3824   LearningRate 0.0328   Epoch: 17   Global Step: 86400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:33,772-Speed 10041.10 samples/sec   Loss 7.3731   LearningRate 0.0328   Epoch: 17   Global Step: 86410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:34,740-Speed 10591.64 samples/sec   Loss 7.4944   LearningRate 0.0328   Epoch: 17   Global Step: 86420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:29:35,692-Speed 10768.49 samples/sec   Loss 7.5101   LearningRate 0.0328   Epoch: 17   Global Step: 86430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:29:36,657-Speed 10622.50 samples/sec   Loss 7.4424   LearningRate 0.0328   Epoch: 17   Global Step: 86440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:29:37,602-Speed 10844.47 samples/sec   Loss 7.5423   LearningRate 0.0328   Epoch: 17   Global Step: 86450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:29:38,597-Speed 10299.22 samples/sec   Loss 7.5242   LearningRate 0.0328   Epoch: 17   Global Step: 86460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:29:39,512-Speed 11200.91 samples/sec   Loss 7.4178   LearningRate 0.0328   Epoch: 17   Global Step: 86470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:29:40,476-Speed 10637.26 samples/sec   Loss 7.6013   LearningRate 0.0328   Epoch: 17   Global Step: 86480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:29:41,404-Speed 11043.02 samples/sec   Loss 7.4996   LearningRate 0.0328   Epoch: 17   Global Step: 86490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:29:42,427-Speed 10019.55 samples/sec   Loss 7.5475   LearningRate 0.0328   Epoch: 17   Global Step: 86500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:29:43,391-Speed 10631.62 samples/sec   Loss 7.5024   LearningRate 0.0328   Epoch: 17   Global Step: 86510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:44,362-Speed 10563.14 samples/sec   Loss 7.6078   LearningRate 0.0328   Epoch: 17   Global Step: 86520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:45,315-Speed 10746.43 samples/sec   Loss 7.7925   LearningRate 0.0328   Epoch: 17   Global Step: 86530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:46,285-Speed 10564.78 samples/sec   Loss 7.4899   LearningRate 0.0327   Epoch: 17   Global Step: 86540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:47,235-Speed 10790.42 samples/sec   Loss 7.6210   LearningRate 0.0327   Epoch: 17   Global Step: 86550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:48,164-Speed 11038.54 samples/sec   Loss 7.6193   LearningRate 0.0327   Epoch: 17   Global Step: 86560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:49,134-Speed 10558.33 samples/sec   Loss 7.8330   LearningRate 0.0327   Epoch: 17   Global Step: 86570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:50,149-Speed 10098.76 samples/sec   Loss 7.6206   LearningRate 0.0327   Epoch: 17   Global Step: 86580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:51,091-Speed 10877.83 samples/sec   Loss 7.5105   LearningRate 0.0327   Epoch: 17   Global Step: 86590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:52,044-Speed 10752.60 samples/sec   Loss 7.6336   LearningRate 0.0327   Epoch: 17   Global Step: 86600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:29:53,026-Speed 10438.38 samples/sec   Loss 7.5850   LearningRate 0.0327   Epoch: 17   Global Step: 86610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:29:54,000-Speed 10523.92 samples/sec   Loss 7.5993   LearningRate 0.0327   Epoch: 17   Global Step: 86620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:29:54,980-Speed 10460.95 samples/sec   Loss 7.6591   LearningRate 0.0327   Epoch: 17   Global Step: 86630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:29:55,937-Speed 10713.47 samples/sec   Loss 7.5932   LearningRate 0.0327   Epoch: 17   Global Step: 86640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:29:56,902-Speed 10612.90 samples/sec   Loss 7.5112   LearningRate 0.0327   Epoch: 17   Global Step: 86650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:29:57,893-Speed 10349.36 samples/sec   Loss 7.6185   LearningRate 0.0327   Epoch: 17   Global Step: 86660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:29:58,843-Speed 10791.56 samples/sec   Loss 7.7038   LearningRate 0.0327   Epoch: 17   Global Step: 86670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:29:59,879-Speed 9888.91 samples/sec   Loss 7.7270   LearningRate 0.0327   Epoch: 17   Global Step: 86680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:30:00,884-Speed 10198.74 samples/sec   Loss 7.7724   LearningRate 0.0327   Epoch: 17   Global Step: 86690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:30:01,868-Speed 10413.27 samples/sec   Loss 7.6464   LearningRate 0.0327   Epoch: 17   Global Step: 86700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:30:02,833-Speed 10627.07 samples/sec   Loss 7.7343   LearningRate 0.0327   Epoch: 17   Global Step: 86710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:30:03,758-Speed 11081.12 samples/sec   Loss 7.6649   LearningRate 0.0326   Epoch: 17   Global Step: 86720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:30:04,668-Speed 11259.24 samples/sec   Loss 7.6777   LearningRate 0.0326   Epoch: 17   Global Step: 86730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:30:05,631-Speed 10643.68 samples/sec   Loss 7.5468   LearningRate 0.0326   Epoch: 17   Global Step: 86740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:30:06,607-Speed 10508.69 samples/sec   Loss 7.5800   LearningRate 0.0326   Epoch: 17   Global Step: 86750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:30:07,562-Speed 10734.42 samples/sec   Loss 7.7717   LearningRate 0.0326   Epoch: 17   Global Step: 86760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:30:08,509-Speed 10826.60 samples/sec   Loss 7.6863   LearningRate 0.0326   Epoch: 17   Global Step: 86770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:30:09,482-Speed 10532.91 samples/sec   Loss 7.7470   LearningRate 0.0326   Epoch: 17   Global Step: 86780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:30:10,446-Speed 10632.99 samples/sec   Loss 7.6984   LearningRate 0.0326   Epoch: 17   Global Step: 86790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:30:11,480-Speed 9910.58 samples/sec   Loss 7.6978   LearningRate 0.0326   Epoch: 17   Global Step: 86800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:30:12,427-Speed 10834.44 samples/sec   Loss 7.7363   LearningRate 0.0326   Epoch: 17   Global Step: 86810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:30:13,377-Speed 10779.23 samples/sec   Loss 7.6206   LearningRate 0.0326   Epoch: 17   Global Step: 86820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:30:14,337-Speed 10680.15 samples/sec   Loss 7.7635   LearningRate 0.0326   Epoch: 17   Global Step: 86830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:15,295-Speed 10699.84 samples/sec   Loss 7.6907   LearningRate 0.0326   Epoch: 17   Global Step: 86840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:16,245-Speed 10789.37 samples/sec   Loss 7.7418   LearningRate 0.0326   Epoch: 17   Global Step: 86850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:17,192-Speed 10821.14 samples/sec   Loss 7.7807   LearningRate 0.0326   Epoch: 17   Global Step: 86860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:30:18,147-Speed 10737.20 samples/sec   Loss 7.6899   LearningRate 0.0326   Epoch: 17   Global Step: 86870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:30:19,092-Speed 10855.91 samples/sec   Loss 7.6752   LearningRate 0.0326   Epoch: 17   Global Step: 86880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:30:20,075-Speed 10420.04 samples/sec   Loss 7.7037   LearningRate 0.0326   Epoch: 17   Global Step: 86890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:30:21,086-Speed 10144.11 samples/sec   Loss 7.6041   LearningRate 0.0325   Epoch: 17   Global Step: 86900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:30:22,053-Speed 10596.70 samples/sec   Loss 7.6934   LearningRate 0.0325   Epoch: 17   Global Step: 86910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:30:23,010-Speed 10715.57 samples/sec   Loss 7.6666   LearningRate 0.0325   Epoch: 17   Global Step: 86920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:30:23,999-Speed 10360.34 samples/sec   Loss 7.7125   LearningRate 0.0325   Epoch: 17   Global Step: 86930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:30:24,983-Speed 10422.75 samples/sec   Loss 7.6312   LearningRate 0.0325   Epoch: 17   Global Step: 86940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:30:25,958-Speed 10521.58 samples/sec   Loss 7.7032   LearningRate 0.0325   Epoch: 17   Global Step: 86950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:30:26,910-Speed 10778.56 samples/sec   Loss 7.6077   LearningRate 0.0325   Epoch: 17   Global Step: 86960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:27,921-Speed 10135.04 samples/sec   Loss 7.6927   LearningRate 0.0325   Epoch: 17   Global Step: 86970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:28,881-Speed 10673.73 samples/sec   Loss 7.8360   LearningRate 0.0325   Epoch: 17   Global Step: 86980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:29,825-Speed 10857.18 samples/sec   Loss 7.8534   LearningRate 0.0325   Epoch: 17   Global Step: 86990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:30,797-Speed 10549.00 samples/sec   Loss 7.7497   LearningRate 0.0325   Epoch: 17   Global Step: 87000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:31,741-Speed 10862.66 samples/sec   Loss 7.7235   LearningRate 0.0325   Epoch: 17   Global Step: 87010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:32,673-Speed 10992.96 samples/sec   Loss 7.7839   LearningRate 0.0325   Epoch: 17   Global Step: 87020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:33,645-Speed 10550.00 samples/sec   Loss 7.5639   LearningRate 0.0325   Epoch: 17   Global Step: 87030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:34,618-Speed 10534.48 samples/sec   Loss 7.7651   LearningRate 0.0325   Epoch: 17   Global Step: 87040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:35,636-Speed 10065.97 samples/sec   Loss 7.8036   LearningRate 0.0325   Epoch: 17   Global Step: 87050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:36,611-Speed 10504.98 samples/sec   Loss 7.7759   LearningRate 0.0325   Epoch: 17   Global Step: 87060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:30:37,570-Speed 10696.66 samples/sec   Loss 7.7350   LearningRate 0.0324   Epoch: 17   Global Step: 87070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:30:38,572-Speed 10226.75 samples/sec   Loss 7.6974   LearningRate 0.0324   Epoch: 17   Global Step: 87080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:30:39,558-Speed 10397.28 samples/sec   Loss 7.6504   LearningRate 0.0324   Epoch: 17   Global Step: 87090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:30:40,579-Speed 10038.61 samples/sec   Loss 7.7696   LearningRate 0.0324   Epoch: 17   Global Step: 87100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:30:41,570-Speed 10346.42 samples/sec   Loss 7.7029   LearningRate 0.0324   Epoch: 17   Global Step: 87110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:42,530-Speed 10673.49 samples/sec   Loss 7.8269   LearningRate 0.0324   Epoch: 17   Global Step: 87120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:43,499-Speed 10579.67 samples/sec   Loss 7.9104   LearningRate 0.0324   Epoch: 17   Global Step: 87130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:44,478-Speed 10459.41 samples/sec   Loss 7.7732   LearningRate 0.0324   Epoch: 17   Global Step: 87140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:45,417-Speed 10915.22 samples/sec   Loss 7.8519   LearningRate 0.0324   Epoch: 17   Global Step: 87150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:46,358-Speed 10895.08 samples/sec   Loss 7.8496   LearningRate 0.0324   Epoch: 17   Global Step: 87160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:47,368-Speed 10149.03 samples/sec   Loss 7.7685   LearningRate 0.0324   Epoch: 17   Global Step: 87170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:48,308-Speed 10905.05 samples/sec   Loss 7.8947   LearningRate 0.0324   Epoch: 17   Global Step: 87180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:49,256-Speed 10810.44 samples/sec   Loss 7.6810   LearningRate 0.0324   Epoch: 17   Global Step: 87190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:50,220-Speed 10634.51 samples/sec   Loss 7.7743   LearningRate 0.0324   Epoch: 17   Global Step: 87200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:51,215-Speed 10299.39 samples/sec   Loss 7.8272   LearningRate 0.0324   Epoch: 17   Global Step: 87210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:30:52,173-Speed 10705.32 samples/sec   Loss 7.8397   LearningRate 0.0324   Epoch: 17   Global Step: 87220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:30:53,155-Speed 10441.57 samples/sec   Loss 7.8398   LearningRate 0.0324   Epoch: 17   Global Step: 87230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:30:54,091-Speed 10946.53 samples/sec   Loss 7.6569   LearningRate 0.0324   Epoch: 17   Global Step: 87240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:30:55,048-Speed 10706.53 samples/sec   Loss 7.7373   LearningRate 0.0323   Epoch: 17   Global Step: 87250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:30:56,009-Speed 10664.30 samples/sec   Loss 7.8233   LearningRate 0.0323   Epoch: 17   Global Step: 87260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:30:56,972-Speed 10649.46 samples/sec   Loss 7.8422   LearningRate 0.0323   Epoch: 17   Global Step: 87270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:30:57,904-Speed 10992.65 samples/sec   Loss 7.8654   LearningRate 0.0323   Epoch: 17   Global Step: 87280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:58,870-Speed 10611.87 samples/sec   Loss 7.6846   LearningRate 0.0323   Epoch: 17   Global Step: 87290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:30:59,858-Speed 10371.22 samples/sec   Loss 7.9722   LearningRate 0.0323   Epoch: 17   Global Step: 87300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:00,835-Speed 10487.81 samples/sec   Loss 7.8165   LearningRate 0.0323   Epoch: 17   Global Step: 87310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:01,799-Speed 10641.02 samples/sec   Loss 7.8026   LearningRate 0.0323   Epoch: 17   Global Step: 87320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:02,805-Speed 10185.66 samples/sec   Loss 7.9241   LearningRate 0.0323   Epoch: 17   Global Step: 87330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:03,748-Speed 10872.02 samples/sec   Loss 7.9246   LearningRate 0.0323   Epoch: 17   Global Step: 87340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:04,700-Speed 10772.22 samples/sec   Loss 7.8186   LearningRate 0.0323   Epoch: 17   Global Step: 87350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:05,658-Speed 10694.68 samples/sec   Loss 7.6258   LearningRate 0.0323   Epoch: 17   Global Step: 87360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:06,617-Speed 10691.04 samples/sec   Loss 7.8614   LearningRate 0.0323   Epoch: 17   Global Step: 87370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:07,549-Speed 10991.28 samples/sec   Loss 7.8096   LearningRate 0.0323   Epoch: 17   Global Step: 87380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:31:08,500-Speed 10780.79 samples/sec   Loss 7.9117   LearningRate 0.0323   Epoch: 17   Global Step: 87390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:31:09,426-Speed 11074.48 samples/sec   Loss 7.6722   LearningRate 0.0323   Epoch: 17   Global Step: 87400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:31:10,416-Speed 10349.43 samples/sec   Loss 7.7600   LearningRate 0.0323   Epoch: 17   Global Step: 87410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:11,359-Speed 10885.34 samples/sec   Loss 7.8698   LearningRate 0.0323   Epoch: 17   Global Step: 87420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:12,314-Speed 10731.46 samples/sec   Loss 7.9063   LearningRate 0.0322   Epoch: 17   Global Step: 87430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:13,283-Speed 10583.05 samples/sec   Loss 7.8597   LearningRate 0.0322   Epoch: 17   Global Step: 87440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:14,256-Speed 10529.42 samples/sec   Loss 7.7572   LearningRate 0.0322   Epoch: 17   Global Step: 87450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:15,260-Speed 10216.50 samples/sec   Loss 7.9818   LearningRate 0.0322   Epoch: 17   Global Step: 87460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:16,172-Speed 11236.57 samples/sec   Loss 7.8422   LearningRate 0.0322   Epoch: 17   Global Step: 87470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:17,123-Speed 10775.59 samples/sec   Loss 7.9607   LearningRate 0.0322   Epoch: 17   Global Step: 87480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:18,085-Speed 10652.48 samples/sec   Loss 7.9047   LearningRate 0.0322   Epoch: 17   Global Step: 87490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:19,104-Speed 10059.34 samples/sec   Loss 7.9060   LearningRate 0.0322   Epoch: 17   Global Step: 87500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:20,035-Speed 10998.49 samples/sec   Loss 7.8423   LearningRate 0.0322   Epoch: 17   Global Step: 87510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:31:20,981-Speed 10845.80 samples/sec   Loss 7.9958   LearningRate 0.0322   Epoch: 17   Global Step: 87520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:31:21,906-Speed 11072.90 samples/sec   Loss 7.8562   LearningRate 0.0322   Epoch: 17   Global Step: 87530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:31:22,923-Speed 10082.25 samples/sec   Loss 7.9992   LearningRate 0.0322   Epoch: 17   Global Step: 87540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:23,890-Speed 10602.64 samples/sec   Loss 7.9931   LearningRate 0.0322   Epoch: 17   Global Step: 87550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:24,859-Speed 10572.94 samples/sec   Loss 7.7982   LearningRate 0.0322   Epoch: 17   Global Step: 87560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:25,834-Speed 10511.03 samples/sec   Loss 7.8464   LearningRate 0.0322   Epoch: 17   Global Step: 87570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:26,800-Speed 10613.09 samples/sec   Loss 7.9031   LearningRate 0.0322   Epoch: 17   Global Step: 87580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:27,769-Speed 10578.01 samples/sec   Loss 8.1290   LearningRate 0.0322   Epoch: 17   Global Step: 87590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:28,732-Speed 10641.41 samples/sec   Loss 7.9081   LearningRate 0.0322   Epoch: 17   Global Step: 87600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:29,675-Speed 10867.31 samples/sec   Loss 7.8311   LearningRate 0.0321   Epoch: 17   Global Step: 87610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:30,618-Speed 10876.61 samples/sec   Loss 8.0443   LearningRate 0.0321   Epoch: 17   Global Step: 87620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:31,568-Speed 10790.60 samples/sec   Loss 7.8687   LearningRate 0.0321   Epoch: 17   Global Step: 87630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:32,530-Speed 10650.31 samples/sec   Loss 8.0336   LearningRate 0.0321   Epoch: 17   Global Step: 87640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:31:33,469-Speed 10917.67 samples/sec   Loss 7.9326   LearningRate 0.0321   Epoch: 17   Global Step: 87650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:31:34,544-Speed 9535.54 samples/sec   Loss 7.8992   LearningRate 0.0321   Epoch: 17   Global Step: 87660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:31:35,461-Speed 11178.84 samples/sec   Loss 8.0220   LearningRate 0.0321   Epoch: 17   Global Step: 87670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:31:36,387-Speed 11070.08 samples/sec   Loss 7.9122   LearningRate 0.0321   Epoch: 17   Global Step: 87680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:31:37,350-Speed 10651.08 samples/sec   Loss 8.1272   LearningRate 0.0321   Epoch: 17   Global Step: 87690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:31:38,332-Speed 10432.93 samples/sec   Loss 7.9334   LearningRate 0.0321   Epoch: 17   Global Step: 87700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:31:39,285-Speed 10760.12 samples/sec   Loss 7.9742   LearningRate 0.0321   Epoch: 17   Global Step: 87710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:31:40,296-Speed 10143.05 samples/sec   Loss 8.0092   LearningRate 0.0321   Epoch: 17   Global Step: 87720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:31:41,276-Speed 10454.00 samples/sec   Loss 7.9760   LearningRate 0.0321   Epoch: 17   Global Step: 87730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:31:42,231-Speed 10737.48 samples/sec   Loss 7.9249   LearningRate 0.0321   Epoch: 17   Global Step: 87740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:31:43,187-Speed 10719.92 samples/sec   Loss 8.0012   LearningRate 0.0321   Epoch: 17   Global Step: 87750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:44,188-Speed 10238.45 samples/sec   Loss 7.9397   LearningRate 0.0321   Epoch: 17   Global Step: 87760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:45,175-Speed 10382.68 samples/sec   Loss 7.7862   LearningRate 0.0321   Epoch: 17   Global Step: 87770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:46,125-Speed 10784.64 samples/sec   Loss 7.9468   LearningRate 0.0321   Epoch: 17   Global Step: 87780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:47,101-Speed 10500.81 samples/sec   Loss 7.7853   LearningRate 0.0320   Epoch: 17   Global Step: 87790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:48,225-Speed 9114.68 samples/sec   Loss 7.7411   LearningRate 0.0320   Epoch: 17   Global Step: 87800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:49,185-Speed 10679.22 samples/sec   Loss 7.9343   LearningRate 0.0320   Epoch: 17   Global Step: 87810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:50,142-Speed 10720.57 samples/sec   Loss 7.8737   LearningRate 0.0320   Epoch: 17   Global Step: 87820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:51,084-Speed 10879.65 samples/sec   Loss 7.8490   LearningRate 0.0320   Epoch: 17   Global Step: 87830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:52,049-Speed 10621.31 samples/sec   Loss 7.9562   LearningRate 0.0320   Epoch: 17   Global Step: 87840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:31:53,025-Speed 10502.83 samples/sec   Loss 7.9500   LearningRate 0.0320   Epoch: 17   Global Step: 87850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:31:53,962-Speed 10937.41 samples/sec   Loss 7.9576   LearningRate 0.0320   Epoch: 17   Global Step: 87860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:31:54,973-Speed 10136.77 samples/sec   Loss 7.9987   LearningRate 0.0320   Epoch: 17   Global Step: 87870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:31:56,613-Speed 6243.78 samples/sec   Loss 8.0326   LearningRate 0.0320   Epoch: 17   Global Step: 87880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:31:57,882-Speed 8078.62 samples/sec   Loss 7.9108   LearningRate 0.0320   Epoch: 17   Global Step: 87890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:31:58,837-Speed 10735.47 samples/sec   Loss 7.8855   LearningRate 0.0320   Epoch: 17   Global Step: 87900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:32:00,814-Speed 5181.04 samples/sec   Loss 8.0185   LearningRate 0.0320   Epoch: 17   Global Step: 87910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:32:01,763-Speed 10799.34 samples/sec   Loss 7.8291   LearningRate 0.0320   Epoch: 17   Global Step: 87920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:32:02,738-Speed 10518.16 samples/sec   Loss 7.9517   LearningRate 0.0320   Epoch: 17   Global Step: 87930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:32:03,726-Speed 10368.65 samples/sec   Loss 7.8243   LearningRate 0.0320   Epoch: 17   Global Step: 87940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:32:04,713-Speed 10379.86 samples/sec   Loss 7.9671   LearningRate 0.0320   Epoch: 17   Global Step: 87950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:32:05,685-Speed 10541.59 samples/sec   Loss 7.8804   LearningRate 0.0319   Epoch: 17   Global Step: 87960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:32:06,660-Speed 10519.74 samples/sec   Loss 7.8827   LearningRate 0.0319   Epoch: 17   Global Step: 87970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:32:07,617-Speed 10710.55 samples/sec   Loss 7.9166   LearningRate 0.0319   Epoch: 17   Global Step: 87980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:32:08,574-Speed 10707.20 samples/sec   Loss 7.9512   LearningRate 0.0319   Epoch: 17   Global Step: 87990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:32:09,565-Speed 10336.78 samples/sec   Loss 7.9723   LearningRate 0.0319   Epoch: 17   Global Step: 88000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:32:31,787-[lfw][88000]XNorm: 11.375489
Training: 2022-04-11 02:32:31,788-[lfw][88000]Accuracy-Flip: 0.99650+-0.00329
Training: 2022-04-11 02:32:31,789-[lfw][88000]Accuracy-Highest: 0.99650
Training: 2022-04-11 02:32:57,288-[cfp_fp][88000]XNorm: 9.641358
Training: 2022-04-11 02:32:57,289-[cfp_fp][88000]Accuracy-Flip: 0.95357+-0.01005
Training: 2022-04-11 02:32:57,290-[cfp_fp][88000]Accuracy-Highest: 0.96071
Training: 2022-04-11 02:33:19,370-[agedb_30][88000]XNorm: 11.041657
Training: 2022-04-11 02:33:19,370-[agedb_30][88000]Accuracy-Flip: 0.96267+-0.00961
Training: 2022-04-11 02:33:19,371-[agedb_30][88000]Accuracy-Highest: 0.96517
Training: 2022-04-11 02:33:20,328-Speed 144.71 samples/sec   Loss 7.9998   LearningRate 0.0319   Epoch: 17   Global Step: 88010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:33:21,277-Speed 10804.46 samples/sec   Loss 7.9851   LearningRate 0.0319   Epoch: 17   Global Step: 88020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:33:22,241-Speed 10625.93 samples/sec   Loss 7.9029   LearningRate 0.0319   Epoch: 17   Global Step: 88030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:33:23,271-Speed 9951.23 samples/sec   Loss 7.9178   LearningRate 0.0319   Epoch: 17   Global Step: 88040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:33:24,225-Speed 10747.68 samples/sec   Loss 7.9448   LearningRate 0.0319   Epoch: 17   Global Step: 88050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:33:25,171-Speed 10844.34 samples/sec   Loss 7.8962   LearningRate 0.0319   Epoch: 17   Global Step: 88060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:33:26,123-Speed 10762.00 samples/sec   Loss 7.9222   LearningRate 0.0319   Epoch: 17   Global Step: 88070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:27,105-Speed 10437.44 samples/sec   Loss 7.9304   LearningRate 0.0319   Epoch: 17   Global Step: 88080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:28,154-Speed 9768.91 samples/sec   Loss 7.9261   LearningRate 0.0319   Epoch: 17   Global Step: 88090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:29,104-Speed 10795.71 samples/sec   Loss 7.9884   LearningRate 0.0319   Epoch: 17   Global Step: 88100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:30,086-Speed 10427.57 samples/sec   Loss 7.7363   LearningRate 0.0319   Epoch: 17   Global Step: 88110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:31,084-Speed 10265.43 samples/sec   Loss 7.9789   LearningRate 0.0319   Epoch: 17   Global Step: 88120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:32,019-Speed 10972.85 samples/sec   Loss 8.0171   LearningRate 0.0319   Epoch: 17   Global Step: 88130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:32,948-Speed 11029.31 samples/sec   Loss 7.9140   LearningRate 0.0318   Epoch: 17   Global Step: 88140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:33,918-Speed 10566.15 samples/sec   Loss 7.8735   LearningRate 0.0318   Epoch: 17   Global Step: 88150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:34,877-Speed 10690.01 samples/sec   Loss 7.8267   LearningRate 0.0318   Epoch: 17   Global Step: 88160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:35,825-Speed 10810.04 samples/sec   Loss 7.9062   LearningRate 0.0318   Epoch: 17   Global Step: 88170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:33:36,826-Speed 10237.85 samples/sec   Loss 7.9925   LearningRate 0.0318   Epoch: 17   Global Step: 88180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:33:37,794-Speed 10582.99 samples/sec   Loss 7.9154   LearningRate 0.0318   Epoch: 17   Global Step: 88190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:33:38,772-Speed 10482.78 samples/sec   Loss 7.9090   LearningRate 0.0318   Epoch: 17   Global Step: 88200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:39,689-Speed 11184.30 samples/sec   Loss 7.7994   LearningRate 0.0318   Epoch: 17   Global Step: 88210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:33:40,687-Speed 10273.80 samples/sec   Loss 8.1107   LearningRate 0.0318   Epoch: 17   Global Step: 88220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:33:41,652-Speed 10616.58 samples/sec   Loss 7.8703   LearningRate 0.0318   Epoch: 17   Global Step: 88230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:33:42,634-Speed 10432.90 samples/sec   Loss 8.0215   LearningRate 0.0318   Epoch: 17   Global Step: 88240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:33:43,731-Speed 9343.42 samples/sec   Loss 7.8591   LearningRate 0.0318   Epoch: 17   Global Step: 88250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:33:44,714-Speed 10426.26 samples/sec   Loss 8.1726   LearningRate 0.0318   Epoch: 17   Global Step: 88260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:33:45,683-Speed 10583.62 samples/sec   Loss 7.9408   LearningRate 0.0318   Epoch: 17   Global Step: 88270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:33:46,653-Speed 10557.42 samples/sec   Loss 7.9080   LearningRate 0.0318   Epoch: 17   Global Step: 88280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:33:47,676-Speed 10018.53 samples/sec   Loss 8.0455   LearningRate 0.0318   Epoch: 17   Global Step: 88290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:33:48,642-Speed 10615.41 samples/sec   Loss 7.9524   LearningRate 0.0318   Epoch: 17   Global Step: 88300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:33:49,588-Speed 10832.16 samples/sec   Loss 7.8469   LearningRate 0.0318   Epoch: 17   Global Step: 88310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:50,510-Speed 11130.82 samples/sec   Loss 7.8604   LearningRate 0.0317   Epoch: 17   Global Step: 88320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:51,484-Speed 10515.98 samples/sec   Loss 7.9353   LearningRate 0.0317   Epoch: 17   Global Step: 88330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:52,463-Speed 10470.85 samples/sec   Loss 7.9882   LearningRate 0.0317   Epoch: 17   Global Step: 88340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:53,428-Speed 10627.79 samples/sec   Loss 8.0140   LearningRate 0.0317   Epoch: 17   Global Step: 88350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:54,384-Speed 10725.86 samples/sec   Loss 7.9550   LearningRate 0.0317   Epoch: 17   Global Step: 88360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:55,333-Speed 10798.29 samples/sec   Loss 7.9247   LearningRate 0.0317   Epoch: 17   Global Step: 88370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:56,293-Speed 10681.48 samples/sec   Loss 7.9787   LearningRate 0.0317   Epoch: 17   Global Step: 88380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:57,288-Speed 10298.00 samples/sec   Loss 7.9391   LearningRate 0.0317   Epoch: 17   Global Step: 88390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:58,269-Speed 10460.32 samples/sec   Loss 7.8842   LearningRate 0.0317   Epoch: 17   Global Step: 88400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:33:59,217-Speed 10817.81 samples/sec   Loss 8.0821   LearningRate 0.0317   Epoch: 17   Global Step: 88410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:00,185-Speed 10584.94 samples/sec   Loss 8.0111   LearningRate 0.0317   Epoch: 17   Global Step: 88420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:01,124-Speed 10916.79 samples/sec   Loss 7.9596   LearningRate 0.0317   Epoch: 17   Global Step: 88430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:02,062-Speed 10937.67 samples/sec   Loss 8.0068   LearningRate 0.0317   Epoch: 17   Global Step: 88440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:03,031-Speed 10570.93 samples/sec   Loss 8.0585   LearningRate 0.0317   Epoch: 17   Global Step: 88450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:04,018-Speed 10391.67 samples/sec   Loss 7.9305   LearningRate 0.0317   Epoch: 17   Global Step: 88460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:04,998-Speed 10455.65 samples/sec   Loss 8.0603   LearningRate 0.0317   Epoch: 17   Global Step: 88470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:05,954-Speed 10733.29 samples/sec   Loss 8.0232   LearningRate 0.0317   Epoch: 17   Global Step: 88480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:06,894-Speed 10902.29 samples/sec   Loss 7.8869   LearningRate 0.0317   Epoch: 17   Global Step: 88490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:07,874-Speed 10451.46 samples/sec   Loss 8.0314   LearningRate 0.0316   Epoch: 17   Global Step: 88500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:08,850-Speed 10505.24 samples/sec   Loss 7.9116   LearningRate 0.0316   Epoch: 17   Global Step: 88510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:09,869-Speed 10075.90 samples/sec   Loss 7.9575   LearningRate 0.0316   Epoch: 17   Global Step: 88520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:10,825-Speed 10727.05 samples/sec   Loss 7.9609   LearningRate 0.0316   Epoch: 17   Global Step: 88530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:11,808-Speed 10420.64 samples/sec   Loss 7.7937   LearningRate 0.0316   Epoch: 17   Global Step: 88540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:12,779-Speed 10549.44 samples/sec   Loss 7.9972   LearningRate 0.0316   Epoch: 17   Global Step: 88550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:13,701-Speed 11128.07 samples/sec   Loss 7.8960   LearningRate 0.0316   Epoch: 17   Global Step: 88560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:14,701-Speed 10247.29 samples/sec   Loss 8.0601   LearningRate 0.0316   Epoch: 17   Global Step: 88570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:15,641-Speed 10904.71 samples/sec   Loss 7.9392   LearningRate 0.0316   Epoch: 17   Global Step: 88580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:16,626-Speed 10408.18 samples/sec   Loss 7.9041   LearningRate 0.0316   Epoch: 17   Global Step: 88590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:17,564-Speed 10930.23 samples/sec   Loss 7.8819   LearningRate 0.0316   Epoch: 17   Global Step: 88600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:18,541-Speed 10498.26 samples/sec   Loss 7.9206   LearningRate 0.0316   Epoch: 17   Global Step: 88610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:19,545-Speed 10209.40 samples/sec   Loss 8.0029   LearningRate 0.0316   Epoch: 17   Global Step: 88620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:20,488-Speed 10862.19 samples/sec   Loss 8.0448   LearningRate 0.0316   Epoch: 17   Global Step: 88630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:21,465-Speed 10490.86 samples/sec   Loss 8.0363   LearningRate 0.0316   Epoch: 17   Global Step: 88640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:22,436-Speed 10558.47 samples/sec   Loss 8.0793   LearningRate 0.0316   Epoch: 17   Global Step: 88650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:23,647-Speed 8460.09 samples/sec   Loss 7.8004   LearningRate 0.0316   Epoch: 17   Global Step: 88660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:24,720-Speed 9558.53 samples/sec   Loss 7.9467   LearningRate 0.0316   Epoch: 17   Global Step: 88670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:25,667-Speed 10831.87 samples/sec   Loss 7.9594   LearningRate 0.0315   Epoch: 17   Global Step: 88680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:26,636-Speed 10570.62 samples/sec   Loss 8.0287   LearningRate 0.0315   Epoch: 17   Global Step: 88690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:27,603-Speed 10601.20 samples/sec   Loss 7.8098   LearningRate 0.0315   Epoch: 17   Global Step: 88700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:28,605-Speed 10238.50 samples/sec   Loss 8.0681   LearningRate 0.0315   Epoch: 17   Global Step: 88710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:29,571-Speed 10608.65 samples/sec   Loss 7.8254   LearningRate 0.0315   Epoch: 17   Global Step: 88720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:30,528-Speed 10716.20 samples/sec   Loss 8.0615   LearningRate 0.0315   Epoch: 17   Global Step: 88730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:31,544-Speed 10079.58 samples/sec   Loss 8.0687   LearningRate 0.0315   Epoch: 17   Global Step: 88740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:32,544-Speed 10251.40 samples/sec   Loss 7.9823   LearningRate 0.0315   Epoch: 17   Global Step: 88750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:33,489-Speed 10847.50 samples/sec   Loss 7.9590   LearningRate 0.0315   Epoch: 17   Global Step: 88760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:34,433-Speed 10863.41 samples/sec   Loss 7.9824   LearningRate 0.0315   Epoch: 17   Global Step: 88770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:35,375-Speed 10875.08 samples/sec   Loss 7.8156   LearningRate 0.0315   Epoch: 17   Global Step: 88780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:36,302-Speed 11057.16 samples/sec   Loss 7.9998   LearningRate 0.0315   Epoch: 17   Global Step: 88790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:37,277-Speed 10519.35 samples/sec   Loss 8.0066   LearningRate 0.0315   Epoch: 17   Global Step: 88800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:38,244-Speed 10599.69 samples/sec   Loss 8.0465   LearningRate 0.0315   Epoch: 17   Global Step: 88810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:39,175-Speed 11000.18 samples/sec   Loss 7.9382   LearningRate 0.0315   Epoch: 17   Global Step: 88820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:40,136-Speed 10675.87 samples/sec   Loss 7.9826   LearningRate 0.0315   Epoch: 17   Global Step: 88830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:41,113-Speed 10496.90 samples/sec   Loss 8.0902   LearningRate 0.0315   Epoch: 17   Global Step: 88840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:42,092-Speed 10473.22 samples/sec   Loss 7.8941   LearningRate 0.0315   Epoch: 17   Global Step: 88850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:43,065-Speed 10538.80 samples/sec   Loss 7.8809   LearningRate 0.0314   Epoch: 17   Global Step: 88860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:44,055-Speed 10341.64 samples/sec   Loss 8.0484   LearningRate 0.0314   Epoch: 17   Global Step: 88870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:45,011-Speed 10723.50 samples/sec   Loss 8.1135   LearningRate 0.0314   Epoch: 17   Global Step: 88880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:45,960-Speed 10804.89 samples/sec   Loss 7.8983   LearningRate 0.0314   Epoch: 17   Global Step: 88890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:46,943-Speed 10420.42 samples/sec   Loss 7.8081   LearningRate 0.0314   Epoch: 17   Global Step: 88900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:47,979-Speed 9899.99 samples/sec   Loss 7.9831   LearningRate 0.0314   Epoch: 17   Global Step: 88910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:48,953-Speed 10529.03 samples/sec   Loss 8.1406   LearningRate 0.0314   Epoch: 17   Global Step: 88920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:34:49,888-Speed 10953.59 samples/sec   Loss 7.9731   LearningRate 0.0314   Epoch: 17   Global Step: 88930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:50,873-Speed 10404.78 samples/sec   Loss 7.9748   LearningRate 0.0314   Epoch: 17   Global Step: 88940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:51,850-Speed 10500.11 samples/sec   Loss 8.0691   LearningRate 0.0314   Epoch: 17   Global Step: 88950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:52,821-Speed 10556.14 samples/sec   Loss 7.9164   LearningRate 0.0314   Epoch: 17   Global Step: 88960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:53,800-Speed 10462.04 samples/sec   Loss 8.0341   LearningRate 0.0314   Epoch: 17   Global Step: 88970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:54,793-Speed 10325.37 samples/sec   Loss 8.1903   LearningRate 0.0314   Epoch: 17   Global Step: 88980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:55,717-Speed 11085.15 samples/sec   Loss 8.0639   LearningRate 0.0314   Epoch: 17   Global Step: 88990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:56,684-Speed 10599.65 samples/sec   Loss 7.8501   LearningRate 0.0314   Epoch: 17   Global Step: 89000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:57,655-Speed 10560.75 samples/sec   Loss 8.0296   LearningRate 0.0314   Epoch: 17   Global Step: 89010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:58,637-Speed 10438.30 samples/sec   Loss 8.1045   LearningRate 0.0314   Epoch: 17   Global Step: 89020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:34:59,625-Speed 10367.46 samples/sec   Loss 7.9832   LearningRate 0.0314   Epoch: 17   Global Step: 89030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:00,580-Speed 10734.61 samples/sec   Loss 8.0693   LearningRate 0.0313   Epoch: 17   Global Step: 89040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:01,554-Speed 10528.41 samples/sec   Loss 7.9843   LearningRate 0.0313   Epoch: 17   Global Step: 89050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:02,526-Speed 10544.96 samples/sec   Loss 7.9429   LearningRate 0.0313   Epoch: 17   Global Step: 89060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:03,487-Speed 10659.06 samples/sec   Loss 7.9538   LearningRate 0.0313   Epoch: 17   Global Step: 89070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:04,431-Speed 10859.57 samples/sec   Loss 8.1033   LearningRate 0.0313   Epoch: 17   Global Step: 89080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:05,404-Speed 10538.03 samples/sec   Loss 8.1719   LearningRate 0.0313   Epoch: 17   Global Step: 89090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:06,321-Speed 11170.08 samples/sec   Loss 8.0423   LearningRate 0.0313   Epoch: 17   Global Step: 89100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:07,262-Speed 10882.98 samples/sec   Loss 8.0261   LearningRate 0.0313   Epoch: 17   Global Step: 89110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:08,246-Speed 10419.37 samples/sec   Loss 8.0791   LearningRate 0.0313   Epoch: 17   Global Step: 89120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:09,250-Speed 10212.66 samples/sec   Loss 8.0168   LearningRate 0.0313   Epoch: 17   Global Step: 89130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:10,178-Speed 11043.73 samples/sec   Loss 8.0844   LearningRate 0.0313   Epoch: 17   Global Step: 89140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:11,151-Speed 10542.24 samples/sec   Loss 7.7198   LearningRate 0.0313   Epoch: 17   Global Step: 89150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:12,121-Speed 10564.40 samples/sec   Loss 7.9897   LearningRate 0.0313   Epoch: 17   Global Step: 89160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:13,082-Speed 10676.38 samples/sec   Loss 7.9529   LearningRate 0.0313   Epoch: 17   Global Step: 89170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:14,034-Speed 10758.44 samples/sec   Loss 8.1946   LearningRate 0.0313   Epoch: 17   Global Step: 89180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:14,959-Speed 11085.63 samples/sec   Loss 8.1140   LearningRate 0.0313   Epoch: 17   Global Step: 89190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:15,860-Speed 11376.17 samples/sec   Loss 7.9951   LearningRate 0.0313   Epoch: 17   Global Step: 89200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:16,854-Speed 10309.06 samples/sec   Loss 7.9835   LearningRate 0.0313   Epoch: 17   Global Step: 89210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:17,831-Speed 10495.04 samples/sec   Loss 8.1475   LearningRate 0.0312   Epoch: 17   Global Step: 89220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:18,782-Speed 10784.67 samples/sec   Loss 7.9847   LearningRate 0.0312   Epoch: 17   Global Step: 89230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:19,761-Speed 10465.52 samples/sec   Loss 7.9079   LearningRate 0.0312   Epoch: 17   Global Step: 89240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:20,706-Speed 10844.39 samples/sec   Loss 8.0516   LearningRate 0.0312   Epoch: 17   Global Step: 89250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:21,658-Speed 10766.80 samples/sec   Loss 8.0040   LearningRate 0.0312   Epoch: 17   Global Step: 89260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:22,651-Speed 10316.70 samples/sec   Loss 8.1143   LearningRate 0.0312   Epoch: 17   Global Step: 89270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:23,582-Speed 11011.52 samples/sec   Loss 8.0404   LearningRate 0.0312   Epoch: 17   Global Step: 89280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:24,538-Speed 10721.98 samples/sec   Loss 7.8949   LearningRate 0.0312   Epoch: 17   Global Step: 89290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:25,505-Speed 10599.48 samples/sec   Loss 8.0372   LearningRate 0.0312   Epoch: 17   Global Step: 89300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:26,501-Speed 10296.78 samples/sec   Loss 7.7686   LearningRate 0.0312   Epoch: 17   Global Step: 89310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:27,510-Speed 10152.59 samples/sec   Loss 8.1062   LearningRate 0.0312   Epoch: 17   Global Step: 89320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:28,417-Speed 11306.45 samples/sec   Loss 7.9896   LearningRate 0.0312   Epoch: 17   Global Step: 89330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:29,369-Speed 10772.65 samples/sec   Loss 8.0165   LearningRate 0.0312   Epoch: 17   Global Step: 89340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:30,302-Speed 10987.60 samples/sec   Loss 8.1504   LearningRate 0.0312   Epoch: 17   Global Step: 89350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:31,289-Speed 10379.44 samples/sec   Loss 7.9456   LearningRate 0.0312   Epoch: 17   Global Step: 89360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:32,211-Speed 11121.22 samples/sec   Loss 8.0056   LearningRate 0.0312   Epoch: 17   Global Step: 89370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:33,122-Speed 11248.83 samples/sec   Loss 8.1111   LearningRate 0.0312   Epoch: 17   Global Step: 89380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:34,106-Speed 10416.36 samples/sec   Loss 7.8617   LearningRate 0.0312   Epoch: 17   Global Step: 89390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:35,105-Speed 10264.18 samples/sec   Loss 8.1012   LearningRate 0.0312   Epoch: 17   Global Step: 89400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:36,039-Speed 10979.06 samples/sec   Loss 7.9303   LearningRate 0.0311   Epoch: 17   Global Step: 89410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:37,026-Speed 10378.34 samples/sec   Loss 7.8572   LearningRate 0.0311   Epoch: 17   Global Step: 89420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:38,085-Speed 9683.43 samples/sec   Loss 8.0740   LearningRate 0.0311   Epoch: 17   Global Step: 89430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:39,019-Speed 10972.82 samples/sec   Loss 7.8290   LearningRate 0.0311   Epoch: 17   Global Step: 89440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:39,950-Speed 11009.48 samples/sec   Loss 7.9116   LearningRate 0.0311   Epoch: 17   Global Step: 89450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:40,909-Speed 10689.95 samples/sec   Loss 8.0898   LearningRate 0.0311   Epoch: 17   Global Step: 89460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:41,921-Speed 10134.73 samples/sec   Loss 8.0628   LearningRate 0.0311   Epoch: 17   Global Step: 89470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:42,846-Speed 11075.41 samples/sec   Loss 7.9219   LearningRate 0.0311   Epoch: 17   Global Step: 89480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:43,779-Speed 10987.43 samples/sec   Loss 8.0565   LearningRate 0.0311   Epoch: 17   Global Step: 89490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:44,798-Speed 10051.68 samples/sec   Loss 7.9559   LearningRate 0.0311   Epoch: 17   Global Step: 89500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:45,772-Speed 10525.23 samples/sec   Loss 8.0268   LearningRate 0.0311   Epoch: 17   Global Step: 89510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:46,747-Speed 10513.68 samples/sec   Loss 8.0414   LearningRate 0.0311   Epoch: 17   Global Step: 89520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:47,684-Speed 10947.40 samples/sec   Loss 8.0186   LearningRate 0.0311   Epoch: 17   Global Step: 89530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:48,614-Speed 11016.80 samples/sec   Loss 7.9506   LearningRate 0.0311   Epoch: 17   Global Step: 89540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:49,673-Speed 9678.68 samples/sec   Loss 7.8575   LearningRate 0.0311   Epoch: 17   Global Step: 89550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:50,609-Speed 10956.63 samples/sec   Loss 8.0111   LearningRate 0.0311   Epoch: 17   Global Step: 89560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:51,547-Speed 10925.40 samples/sec   Loss 7.9636   LearningRate 0.0311   Epoch: 17   Global Step: 89570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:52,483-Speed 10945.72 samples/sec   Loss 8.0231   LearningRate 0.0311   Epoch: 17   Global Step: 89580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:53,500-Speed 10076.90 samples/sec   Loss 8.0251   LearningRate 0.0310   Epoch: 17   Global Step: 89590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:54,436-Speed 10959.31 samples/sec   Loss 7.9113   LearningRate 0.0310   Epoch: 17   Global Step: 89600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:55,377-Speed 10883.17 samples/sec   Loss 7.9956   LearningRate 0.0310   Epoch: 17   Global Step: 89610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:35:56,326-Speed 10810.34 samples/sec   Loss 7.9378   LearningRate 0.0310   Epoch: 17   Global Step: 89620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:57,279-Speed 10748.70 samples/sec   Loss 7.9884   LearningRate 0.0310   Epoch: 17   Global Step: 89630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:58,278-Speed 10258.89 samples/sec   Loss 8.0378   LearningRate 0.0310   Epoch: 17   Global Step: 89640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:35:59,276-Speed 10275.24 samples/sec   Loss 8.0268   LearningRate 0.0310   Epoch: 17   Global Step: 89650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:36:00,191-Speed 11199.07 samples/sec   Loss 7.8730   LearningRate 0.0310   Epoch: 17   Global Step: 89660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:36:01,141-Speed 10785.04 samples/sec   Loss 8.0342   LearningRate 0.0310   Epoch: 17   Global Step: 89670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:36:02,121-Speed 10467.42 samples/sec   Loss 8.0556   LearningRate 0.0310   Epoch: 17   Global Step: 89680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:36:03,077-Speed 10726.53 samples/sec   Loss 8.1002   LearningRate 0.0310   Epoch: 17   Global Step: 89690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:36:04,041-Speed 10633.62 samples/sec   Loss 8.1514   LearningRate 0.0310   Epoch: 17   Global Step: 89700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:36:05,016-Speed 10510.57 samples/sec   Loss 7.9470   LearningRate 0.0310   Epoch: 17   Global Step: 89710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:36:05,953-Speed 10933.13 samples/sec   Loss 8.0014   LearningRate 0.0310   Epoch: 17   Global Step: 89720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:36:06,889-Speed 10953.20 samples/sec   Loss 8.0408   LearningRate 0.0310   Epoch: 17   Global Step: 89730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:36:07,875-Speed 10404.37 samples/sec   Loss 7.8990   LearningRate 0.0310   Epoch: 17   Global Step: 89740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:36:08,876-Speed 10238.40 samples/sec   Loss 7.8807   LearningRate 0.0310   Epoch: 17   Global Step: 89750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:36:09,818-Speed 10886.54 samples/sec   Loss 8.0738   LearningRate 0.0310   Epoch: 17   Global Step: 89760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:36:10,807-Speed 10355.84 samples/sec   Loss 7.8177   LearningRate 0.0309   Epoch: 17   Global Step: 89770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:36:11,854-Speed 9790.43 samples/sec   Loss 8.0566   LearningRate 0.0309   Epoch: 17   Global Step: 89780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:36:12,782-Speed 11048.29 samples/sec   Loss 7.9197   LearningRate 0.0309   Epoch: 17   Global Step: 89790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:36:13,840-Speed 9687.37 samples/sec   Loss 7.9673   LearningRate 0.0309   Epoch: 17   Global Step: 89800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:36:14,802-Speed 10659.51 samples/sec   Loss 8.0321   LearningRate 0.0309   Epoch: 17   Global Step: 89810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:36:15,735-Speed 10982.70 samples/sec   Loss 7.9702   LearningRate 0.0309   Epoch: 17   Global Step: 89820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:36:16,684-Speed 10811.14 samples/sec   Loss 8.0865   LearningRate 0.0309   Epoch: 17   Global Step: 89830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:36:17,603-Speed 11144.80 samples/sec   Loss 8.0430   LearningRate 0.0309   Epoch: 17   Global Step: 89840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:36:18,660-Speed 9694.35 samples/sec   Loss 8.0079   LearningRate 0.0309   Epoch: 17   Global Step: 89850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:36:19,600-Speed 10913.49 samples/sec   Loss 8.0131   LearningRate 0.0309   Epoch: 17   Global Step: 89860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:36:20,537-Speed 10933.67 samples/sec   Loss 7.9799   LearningRate 0.0309   Epoch: 17   Global Step: 89870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:36:21,532-Speed 10297.07 samples/sec   Loss 7.9576   LearningRate 0.0309   Epoch: 17   Global Step: 89880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:36:22,494-Speed 10666.19 samples/sec   Loss 7.9035   LearningRate 0.0309   Epoch: 17   Global Step: 89890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:36:23,433-Speed 10908.40 samples/sec   Loss 7.9214   LearningRate 0.0309   Epoch: 17   Global Step: 89900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:36:24,344-Speed 11252.78 samples/sec   Loss 8.1726   LearningRate 0.0309   Epoch: 17   Global Step: 89910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:36:25,269-Speed 11079.12 samples/sec   Loss 8.1026   LearningRate 0.0309   Epoch: 17   Global Step: 89920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:36:26,198-Speed 11035.60 samples/sec   Loss 7.9942   LearningRate 0.0309   Epoch: 17   Global Step: 89930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:36:27,233-Speed 9994.35 samples/sec   Loss 8.2240   LearningRate 0.0309   Epoch: 17   Global Step: 89940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:36:28,169-Speed 10958.01 samples/sec   Loss 7.9452   LearningRate 0.0308   Epoch: 17   Global Step: 89950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:36:29,111-Speed 10872.14 samples/sec   Loss 8.0812   LearningRate 0.0308   Epoch: 17   Global Step: 89960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:36:30,129-Speed 10068.67 samples/sec   Loss 8.0570   LearningRate 0.0308   Epoch: 17   Global Step: 89970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:36:31,080-Speed 10776.83 samples/sec   Loss 7.9213   LearningRate 0.0308   Epoch: 17   Global Step: 89980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:36:32,027-Speed 10834.04 samples/sec   Loss 7.9559   LearningRate 0.0308   Epoch: 17   Global Step: 89990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:36:32,965-Speed 10921.36 samples/sec   Loss 8.0402   LearningRate 0.0308   Epoch: 17   Global Step: 90000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:36:55,124-[lfw][90000]XNorm: 11.052117
Training: 2022-04-11 02:36:55,125-[lfw][90000]Accuracy-Flip: 0.99567+-0.00410
Training: 2022-04-11 02:36:55,126-[lfw][90000]Accuracy-Highest: 0.99650
Training: 2022-04-11 02:37:20,684-[cfp_fp][90000]XNorm: 9.396634
Training: 2022-04-11 02:37:20,685-[cfp_fp][90000]Accuracy-Flip: 0.95743+-0.01158
Training: 2022-04-11 02:37:20,686-[cfp_fp][90000]Accuracy-Highest: 0.96071
Training: 2022-04-11 02:37:42,945-[agedb_30][90000]XNorm: 10.780513
Training: 2022-04-11 02:37:42,946-[agedb_30][90000]Accuracy-Flip: 0.96317+-0.00867
Training: 2022-04-11 02:37:42,947-[agedb_30][90000]Accuracy-Highest: 0.96517
Training: 2022-04-11 02:37:43,900-Speed 144.36 samples/sec   Loss 8.0255   LearningRate 0.0308   Epoch: 17   Global Step: 90010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:37:44,836-Speed 10953.70 samples/sec   Loss 8.1254   LearningRate 0.0308   Epoch: 17   Global Step: 90020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:37:45,741-Speed 11318.12 samples/sec   Loss 7.9040   LearningRate 0.0308   Epoch: 17   Global Step: 90030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:37:46,682-Speed 10899.39 samples/sec   Loss 8.0532   LearningRate 0.0308   Epoch: 17   Global Step: 90040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:37:47,628-Speed 10825.83 samples/sec   Loss 7.9003   LearningRate 0.0308   Epoch: 17   Global Step: 90050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:37:48,605-Speed 10487.33 samples/sec   Loss 7.9514   LearningRate 0.0308   Epoch: 17   Global Step: 90060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:37:49,593-Speed 10393.43 samples/sec   Loss 7.9967   LearningRate 0.0308   Epoch: 17   Global Step: 90070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:37:50,510-Speed 11166.91 samples/sec   Loss 8.0487   LearningRate 0.0308   Epoch: 17   Global Step: 90080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:37:51,483-Speed 10540.92 samples/sec   Loss 8.1520   LearningRate 0.0308   Epoch: 17   Global Step: 90090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:37:52,406-Speed 11095.92 samples/sec   Loss 8.0162   LearningRate 0.0308   Epoch: 17   Global Step: 90100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:37:53,455-Speed 9778.50 samples/sec   Loss 8.1337   LearningRate 0.0308   Epoch: 17   Global Step: 90110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:37:54,397-Speed 10888.51 samples/sec   Loss 7.8762   LearningRate 0.0308   Epoch: 17   Global Step: 90120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:37:55,354-Speed 10699.70 samples/sec   Loss 8.0939   LearningRate 0.0307   Epoch: 17   Global Step: 90130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:37:56,333-Speed 10470.87 samples/sec   Loss 8.0001   LearningRate 0.0307   Epoch: 17   Global Step: 90140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:37:57,329-Speed 10290.79 samples/sec   Loss 8.1495   LearningRate 0.0307   Epoch: 17   Global Step: 90150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:37:58,288-Speed 10697.68 samples/sec   Loss 7.9241   LearningRate 0.0307   Epoch: 17   Global Step: 90160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:37:59,262-Speed 10526.40 samples/sec   Loss 7.7971   LearningRate 0.0307   Epoch: 17   Global Step: 90170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:00,255-Speed 10323.43 samples/sec   Loss 7.9916   LearningRate 0.0307   Epoch: 17   Global Step: 90180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:01,237-Speed 10437.43 samples/sec   Loss 8.0331   LearningRate 0.0307   Epoch: 17   Global Step: 90190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:02,229-Speed 10332.84 samples/sec   Loss 8.0509   LearningRate 0.0307   Epoch: 17   Global Step: 90200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:03,205-Speed 10503.06 samples/sec   Loss 7.9614   LearningRate 0.0307   Epoch: 17   Global Step: 90210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:04,239-Speed 9913.55 samples/sec   Loss 8.0895   LearningRate 0.0307   Epoch: 17   Global Step: 90220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:05,181-Speed 10884.27 samples/sec   Loss 8.0540   LearningRate 0.0307   Epoch: 17   Global Step: 90230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:06,108-Speed 11056.68 samples/sec   Loss 8.0121   LearningRate 0.0307   Epoch: 17   Global Step: 90240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:07,105-Speed 10278.31 samples/sec   Loss 7.9718   LearningRate 0.0307   Epoch: 17   Global Step: 90250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:08,080-Speed 10521.35 samples/sec   Loss 8.1031   LearningRate 0.0307   Epoch: 17   Global Step: 90260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:38:09,035-Speed 10728.60 samples/sec   Loss 7.9655   LearningRate 0.0307   Epoch: 17   Global Step: 90270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:38:10,043-Speed 10170.88 samples/sec   Loss 8.1011   LearningRate 0.0307   Epoch: 17   Global Step: 90280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:38:11,019-Speed 10513.91 samples/sec   Loss 8.0519   LearningRate 0.0307   Epoch: 17   Global Step: 90290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:38:11,996-Speed 10482.67 samples/sec   Loss 7.8395   LearningRate 0.0307   Epoch: 17   Global Step: 90300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:38:12,996-Speed 10247.95 samples/sec   Loss 8.0811   LearningRate 0.0307   Epoch: 17   Global Step: 90310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:38:13,985-Speed 10364.78 samples/sec   Loss 8.0944   LearningRate 0.0306   Epoch: 17   Global Step: 90320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:38:14,940-Speed 10740.92 samples/sec   Loss 7.9240   LearningRate 0.0306   Epoch: 17   Global Step: 90330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:15,864-Speed 11092.58 samples/sec   Loss 7.9438   LearningRate 0.0306   Epoch: 17   Global Step: 90340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:16,813-Speed 10808.48 samples/sec   Loss 7.9843   LearningRate 0.0306   Epoch: 17   Global Step: 90350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:17,805-Speed 10337.81 samples/sec   Loss 7.8993   LearningRate 0.0306   Epoch: 17   Global Step: 90360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:18,802-Speed 10288.83 samples/sec   Loss 8.0275   LearningRate 0.0306   Epoch: 17   Global Step: 90370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:19,753-Speed 10778.24 samples/sec   Loss 8.0508   LearningRate 0.0306   Epoch: 17   Global Step: 90380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:20,731-Speed 10485.08 samples/sec   Loss 7.9865   LearningRate 0.0306   Epoch: 17   Global Step: 90390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:21,678-Speed 10815.55 samples/sec   Loss 8.0683   LearningRate 0.0306   Epoch: 17   Global Step: 90400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:22,688-Speed 10149.33 samples/sec   Loss 8.0490   LearningRate 0.0306   Epoch: 17   Global Step: 90410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:23,665-Speed 10487.89 samples/sec   Loss 8.1046   LearningRate 0.0306   Epoch: 17   Global Step: 90420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:24,599-Speed 10976.14 samples/sec   Loss 8.1187   LearningRate 0.0306   Epoch: 17   Global Step: 90430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:38:25,551-Speed 10770.06 samples/sec   Loss 7.9402   LearningRate 0.0306   Epoch: 17   Global Step: 90440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:26,513-Speed 10647.42 samples/sec   Loss 8.0007   LearningRate 0.0306   Epoch: 17   Global Step: 90450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:27,496-Speed 10443.46 samples/sec   Loss 8.0477   LearningRate 0.0306   Epoch: 17   Global Step: 90460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:28,440-Speed 10864.85 samples/sec   Loss 8.1880   LearningRate 0.0306   Epoch: 17   Global Step: 90470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:29,398-Speed 10699.47 samples/sec   Loss 8.1026   LearningRate 0.0306   Epoch: 17   Global Step: 90480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:30,384-Speed 10387.65 samples/sec   Loss 8.1310   LearningRate 0.0306   Epoch: 17   Global Step: 90490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:31,380-Speed 10294.91 samples/sec   Loss 8.1955   LearningRate 0.0305   Epoch: 17   Global Step: 90500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:32,332-Speed 10763.36 samples/sec   Loss 7.9130   LearningRate 0.0305   Epoch: 17   Global Step: 90510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:33,262-Speed 11028.75 samples/sec   Loss 8.1642   LearningRate 0.0305   Epoch: 17   Global Step: 90520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:34,242-Speed 10454.07 samples/sec   Loss 8.0324   LearningRate 0.0305   Epoch: 17   Global Step: 90530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:35,212-Speed 10579.21 samples/sec   Loss 8.0410   LearningRate 0.0305   Epoch: 17   Global Step: 90540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:38:36,198-Speed 10394.06 samples/sec   Loss 8.1587   LearningRate 0.0305   Epoch: 17   Global Step: 90550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:38:37,163-Speed 10615.65 samples/sec   Loss 8.0376   LearningRate 0.0305   Epoch: 17   Global Step: 90560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:38,151-Speed 10378.85 samples/sec   Loss 8.0246   LearningRate 0.0305   Epoch: 17   Global Step: 90570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:39,120-Speed 10580.49 samples/sec   Loss 7.9785   LearningRate 0.0305   Epoch: 17   Global Step: 90580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:38:40,110-Speed 10342.75 samples/sec   Loss 8.0679   LearningRate 0.0305   Epoch: 17   Global Step: 90590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:38:41,071-Speed 10663.60 samples/sec   Loss 8.1554   LearningRate 0.0305   Epoch: 17   Global Step: 90600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:38:42,074-Speed 10232.81 samples/sec   Loss 8.0163   LearningRate 0.0305   Epoch: 17   Global Step: 90610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:38:43,016-Speed 10881.39 samples/sec   Loss 8.1084   LearningRate 0.0305   Epoch: 17   Global Step: 90620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:38:43,997-Speed 10439.17 samples/sec   Loss 7.9224   LearningRate 0.0305   Epoch: 17   Global Step: 90630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:38:44,972-Speed 10520.37 samples/sec   Loss 8.0006   LearningRate 0.0305   Epoch: 17   Global Step: 90640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:38:45,925-Speed 10755.98 samples/sec   Loss 7.9746   LearningRate 0.0305   Epoch: 17   Global Step: 90650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:38:46,920-Speed 10299.73 samples/sec   Loss 7.8210   LearningRate 0.0305   Epoch: 17   Global Step: 90660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:38:47,847-Speed 11057.02 samples/sec   Loss 8.0375   LearningRate 0.0305   Epoch: 17   Global Step: 90670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:38:48,815-Speed 10596.44 samples/sec   Loss 8.1935   LearningRate 0.0304   Epoch: 17   Global Step: 90680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:49,780-Speed 10618.26 samples/sec   Loss 8.0686   LearningRate 0.0304   Epoch: 17   Global Step: 90690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:50,782-Speed 10241.89 samples/sec   Loss 7.9141   LearningRate 0.0304   Epoch: 17   Global Step: 90700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:51,709-Speed 11056.93 samples/sec   Loss 7.8892   LearningRate 0.0304   Epoch: 17   Global Step: 90710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:52,671-Speed 10657.99 samples/sec   Loss 7.9956   LearningRate 0.0304   Epoch: 17   Global Step: 90720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:53,679-Speed 10176.89 samples/sec   Loss 8.0117   LearningRate 0.0304   Epoch: 17   Global Step: 90730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:54,650-Speed 10551.79 samples/sec   Loss 7.9781   LearningRate 0.0304   Epoch: 17   Global Step: 90740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:55,623-Speed 10535.36 samples/sec   Loss 8.0011   LearningRate 0.0304   Epoch: 17   Global Step: 90750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:38:56,537-Speed 11221.01 samples/sec   Loss 8.0363   LearningRate 0.0304   Epoch: 17   Global Step: 90760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:38:57,595-Speed 9691.15 samples/sec   Loss 8.0064   LearningRate 0.0304   Epoch: 17   Global Step: 90770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:38:58,536-Speed 10894.21 samples/sec   Loss 7.9114   LearningRate 0.0304   Epoch: 17   Global Step: 90780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:38:59,495-Speed 10683.45 samples/sec   Loss 8.0262   LearningRate 0.0304   Epoch: 17   Global Step: 90790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:39:00,450-Speed 10726.79 samples/sec   Loss 8.0940   LearningRate 0.0304   Epoch: 17   Global Step: 90800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:39:01,419-Speed 10578.14 samples/sec   Loss 8.0006   LearningRate 0.0304   Epoch: 17   Global Step: 90810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:39:02,356-Speed 10936.82 samples/sec   Loss 7.9193   LearningRate 0.0304   Epoch: 17   Global Step: 90820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:39:03,284-Speed 11052.22 samples/sec   Loss 8.0492   LearningRate 0.0304   Epoch: 17   Global Step: 90830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:39:04,264-Speed 10679.89 samples/sec   Loss 7.8533   LearningRate 0.0304   Epoch: 17   Global Step: 90840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:39:05,241-Speed 10486.14 samples/sec   Loss 8.1187   LearningRate 0.0304   Epoch: 17   Global Step: 90850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:39:06,218-Speed 10497.86 samples/sec   Loss 8.1040   LearningRate 0.0304   Epoch: 17   Global Step: 90860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:39:07,165-Speed 10819.05 samples/sec   Loss 8.1366   LearningRate 0.0303   Epoch: 17   Global Step: 90870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:39:08,206-Speed 9850.47 samples/sec   Loss 7.9800   LearningRate 0.0303   Epoch: 17   Global Step: 90880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:39:09,150-Speed 10863.94 samples/sec   Loss 7.9967   LearningRate 0.0303   Epoch: 17   Global Step: 90890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:39:10,080-Speed 11018.73 samples/sec   Loss 7.9683   LearningRate 0.0303   Epoch: 17   Global Step: 90900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:39:11,043-Speed 10649.97 samples/sec   Loss 7.9662   LearningRate 0.0303   Epoch: 17   Global Step: 90910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:39:12,123-Speed 9482.10 samples/sec   Loss 7.8907   LearningRate 0.0303   Epoch: 17   Global Step: 90920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:39:13,117-Speed 10320.49 samples/sec   Loss 8.1116   LearningRate 0.0303   Epoch: 17   Global Step: 90930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:39:14,046-Speed 11033.03 samples/sec   Loss 8.0493   LearningRate 0.0303   Epoch: 17   Global Step: 90940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:39:14,988-Speed 10875.16 samples/sec   Loss 7.9579   LearningRate 0.0303   Epoch: 17   Global Step: 90950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:39:15,928-Speed 10911.52 samples/sec   Loss 8.0042   LearningRate 0.0303   Epoch: 17   Global Step: 90960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:39:16,896-Speed 10580.61 samples/sec   Loss 7.7988   LearningRate 0.0303   Epoch: 17   Global Step: 90970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:39:17,903-Speed 10190.00 samples/sec   Loss 8.0054   LearningRate 0.0303   Epoch: 17   Global Step: 90980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:39:18,843-Speed 10926.06 samples/sec   Loss 7.9546   LearningRate 0.0303   Epoch: 17   Global Step: 90990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:39:19,796-Speed 10755.27 samples/sec   Loss 8.1924   LearningRate 0.0303   Epoch: 17   Global Step: 91000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:39:20,762-Speed 10619.30 samples/sec   Loss 8.1456   LearningRate 0.0303   Epoch: 17   Global Step: 91010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:39:21,739-Speed 10485.84 samples/sec   Loss 8.0694   LearningRate 0.0303   Epoch: 17   Global Step: 91020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:39:22,692-Speed 10755.39 samples/sec   Loss 8.0102   LearningRate 0.0303   Epoch: 17   Global Step: 91030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:39:23,807-Speed 9197.13 samples/sec   Loss 7.8933   LearningRate 0.0303   Epoch: 17   Global Step: 91040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:39:36,017-Speed 838.73 samples/sec   Loss 7.2783   LearningRate 0.0302   Epoch: 18   Global Step: 91050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:39:37,031-Speed 10111.41 samples/sec   Loss 7.0715   LearningRate 0.0302   Epoch: 18   Global Step: 91060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:39:38,064-Speed 9923.10 samples/sec   Loss 7.0204   LearningRate 0.0302   Epoch: 18   Global Step: 91070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:39:39,128-Speed 9631.81 samples/sec   Loss 7.0261   LearningRate 0.0302   Epoch: 18   Global Step: 91080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:39:40,676-Speed 6616.88 samples/sec   Loss 7.2472   LearningRate 0.0302   Epoch: 18   Global Step: 91090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:39:41,724-Speed 9776.54 samples/sec   Loss 7.0472   LearningRate 0.0302   Epoch: 18   Global Step: 91100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:39:42,735-Speed 10135.66 samples/sec   Loss 7.1245   LearningRate 0.0302   Epoch: 18   Global Step: 91110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:39:43,709-Speed 10545.01 samples/sec   Loss 6.9484   LearningRate 0.0302   Epoch: 18   Global Step: 91120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:39:44,713-Speed 10210.65 samples/sec   Loss 7.0526   LearningRate 0.0302   Epoch: 18   Global Step: 91130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:39:45,619-Speed 11318.82 samples/sec   Loss 7.0622   LearningRate 0.0302   Epoch: 18   Global Step: 91140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:39:46,558-Speed 10910.64 samples/sec   Loss 7.1209   LearningRate 0.0302   Epoch: 18   Global Step: 91150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:39:47,575-Speed 10082.42 samples/sec   Loss 7.1796   LearningRate 0.0302   Epoch: 18   Global Step: 91160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:39:48,704-Speed 9071.19 samples/sec   Loss 7.1990   LearningRate 0.0302   Epoch: 18   Global Step: 91170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:39:49,697-Speed 10327.86 samples/sec   Loss 7.4494   LearningRate 0.0302   Epoch: 18   Global Step: 91180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:39:50,657-Speed 10683.04 samples/sec   Loss 7.2446   LearningRate 0.0302   Epoch: 18   Global Step: 91190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:39:51,585-Speed 11049.20 samples/sec   Loss 7.1529   LearningRate 0.0302   Epoch: 18   Global Step: 91200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:39:52,495-Speed 11260.28 samples/sec   Loss 7.2462   LearningRate 0.0302   Epoch: 18   Global Step: 91210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:39:53,506-Speed 10136.85 samples/sec   Loss 6.9863   LearningRate 0.0302   Epoch: 18   Global Step: 91220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:39:54,518-Speed 10124.25 samples/sec   Loss 7.2177   LearningRate 0.0301   Epoch: 18   Global Step: 91230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:39:55,475-Speed 10709.62 samples/sec   Loss 7.4165   LearningRate 0.0301   Epoch: 18   Global Step: 91240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:39:56,575-Speed 9316.82 samples/sec   Loss 7.0975   LearningRate 0.0301   Epoch: 18   Global Step: 91250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:39:57,523-Speed 10813.74 samples/sec   Loss 7.2284   LearningRate 0.0301   Epoch: 18   Global Step: 91260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:39:58,503-Speed 10459.49 samples/sec   Loss 7.3722   LearningRate 0.0301   Epoch: 18   Global Step: 91270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:39:59,458-Speed 10735.48 samples/sec   Loss 7.3274   LearningRate 0.0301   Epoch: 18   Global Step: 91280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:40:00,400-Speed 10877.85 samples/sec   Loss 7.2673   LearningRate 0.0301   Epoch: 18   Global Step: 91290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:40:01,433-Speed 9932.77 samples/sec   Loss 7.2981   LearningRate 0.0301   Epoch: 18   Global Step: 91300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:40:02,395-Speed 10657.10 samples/sec   Loss 7.3049   LearningRate 0.0301   Epoch: 18   Global Step: 91310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:40:03,369-Speed 10526.93 samples/sec   Loss 7.2955   LearningRate 0.0301   Epoch: 18   Global Step: 91320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:40:04,363-Speed 10303.48 samples/sec   Loss 7.3310   LearningRate 0.0301   Epoch: 18   Global Step: 91330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:40:05,297-Speed 10976.71 samples/sec   Loss 7.3377   LearningRate 0.0301   Epoch: 18   Global Step: 91340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:40:06,225-Speed 11050.95 samples/sec   Loss 7.2854   LearningRate 0.0301   Epoch: 18   Global Step: 91350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:40:07,178-Speed 10757.52 samples/sec   Loss 7.4074   LearningRate 0.0301   Epoch: 18   Global Step: 91360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:40:08,186-Speed 10173.73 samples/sec   Loss 7.1904   LearningRate 0.0301   Epoch: 18   Global Step: 91370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:09,146-Speed 10672.42 samples/sec   Loss 7.3427   LearningRate 0.0301   Epoch: 18   Global Step: 91380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:10,107-Speed 10668.21 samples/sec   Loss 7.2984   LearningRate 0.0301   Epoch: 18   Global Step: 91390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:11,067-Speed 10672.07 samples/sec   Loss 7.3564   LearningRate 0.0301   Epoch: 18   Global Step: 91400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:12,071-Speed 10216.60 samples/sec   Loss 7.3871   LearningRate 0.0301   Epoch: 18   Global Step: 91410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:12,988-Speed 11189.08 samples/sec   Loss 7.4211   LearningRate 0.0300   Epoch: 18   Global Step: 91420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:13,916-Speed 11045.68 samples/sec   Loss 7.2171   LearningRate 0.0300   Epoch: 18   Global Step: 91430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:14,854-Speed 10919.98 samples/sec   Loss 7.3664   LearningRate 0.0300   Epoch: 18   Global Step: 91440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:15,786-Speed 11002.24 samples/sec   Loss 7.3748   LearningRate 0.0300   Epoch: 18   Global Step: 91450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:16,795-Speed 10158.18 samples/sec   Loss 7.4215   LearningRate 0.0300   Epoch: 18   Global Step: 91460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:17,764-Speed 10585.47 samples/sec   Loss 7.3756   LearningRate 0.0300   Epoch: 18   Global Step: 91470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:18,668-Speed 11340.83 samples/sec   Loss 7.3685   LearningRate 0.0300   Epoch: 18   Global Step: 91480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:19,602-Speed 10975.69 samples/sec   Loss 7.3130   LearningRate 0.0300   Epoch: 18   Global Step: 91490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:20,571-Speed 10583.11 samples/sec   Loss 7.3117   LearningRate 0.0300   Epoch: 18   Global Step: 91500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:21,524-Speed 10762.03 samples/sec   Loss 7.2500   LearningRate 0.0300   Epoch: 18   Global Step: 91510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:22,515-Speed 10345.02 samples/sec   Loss 7.2199   LearningRate 0.0300   Epoch: 18   Global Step: 91520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:23,521-Speed 10183.56 samples/sec   Loss 7.4483   LearningRate 0.0300   Epoch: 18   Global Step: 91530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:24,456-Speed 10961.95 samples/sec   Loss 7.4246   LearningRate 0.0300   Epoch: 18   Global Step: 91540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:25,391-Speed 10968.04 samples/sec   Loss 7.4594   LearningRate 0.0300   Epoch: 18   Global Step: 91550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:26,343-Speed 10768.44 samples/sec   Loss 7.4159   LearningRate 0.0300   Epoch: 18   Global Step: 91560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:27,293-Speed 10785.99 samples/sec   Loss 7.4556   LearningRate 0.0300   Epoch: 18   Global Step: 91570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:40:28,249-Speed 10728.29 samples/sec   Loss 7.3913   LearningRate 0.0300   Epoch: 18   Global Step: 91580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:40:29,217-Speed 10589.67 samples/sec   Loss 7.3605   LearningRate 0.0300   Epoch: 18   Global Step: 91590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:40:30,172-Speed 10732.71 samples/sec   Loss 7.3581   LearningRate 0.0299   Epoch: 18   Global Step: 91600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:31,229-Speed 9689.83 samples/sec   Loss 7.4470   LearningRate 0.0299   Epoch: 18   Global Step: 91610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:32,178-Speed 10813.65 samples/sec   Loss 7.4837   LearningRate 0.0299   Epoch: 18   Global Step: 91620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:33,134-Speed 10724.31 samples/sec   Loss 7.4729   LearningRate 0.0299   Epoch: 18   Global Step: 91630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:34,064-Speed 11029.63 samples/sec   Loss 7.3777   LearningRate 0.0299   Epoch: 18   Global Step: 91640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:35,107-Speed 9830.19 samples/sec   Loss 7.5832   LearningRate 0.0299   Epoch: 18   Global Step: 91650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:36,103-Speed 10296.65 samples/sec   Loss 7.5563   LearningRate 0.0299   Epoch: 18   Global Step: 91660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:37,158-Speed 9712.47 samples/sec   Loss 7.3761   LearningRate 0.0299   Epoch: 18   Global Step: 91670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:38,228-Speed 9582.82 samples/sec   Loss 7.4111   LearningRate 0.0299   Epoch: 18   Global Step: 91680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:39,201-Speed 10540.97 samples/sec   Loss 7.5411   LearningRate 0.0299   Epoch: 18   Global Step: 91690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:40,178-Speed 10483.80 samples/sec   Loss 7.4219   LearningRate 0.0299   Epoch: 18   Global Step: 91700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:40:41,136-Speed 10696.25 samples/sec   Loss 7.4835   LearningRate 0.0299   Epoch: 18   Global Step: 91710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:40:42,106-Speed 10574.49 samples/sec   Loss 7.3645   LearningRate 0.0299   Epoch: 18   Global Step: 91720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:40:43,078-Speed 10544.81 samples/sec   Loss 7.4564   LearningRate 0.0299   Epoch: 18   Global Step: 91730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:40:44,065-Speed 10385.87 samples/sec   Loss 7.4287   LearningRate 0.0299   Epoch: 18   Global Step: 91740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:40:45,021-Speed 10723.78 samples/sec   Loss 7.3907   LearningRate 0.0299   Epoch: 18   Global Step: 91750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:40:45,950-Speed 11039.16 samples/sec   Loss 7.3256   LearningRate 0.0299   Epoch: 18   Global Step: 91760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:46,896-Speed 10830.15 samples/sec   Loss 7.5733   LearningRate 0.0299   Epoch: 18   Global Step: 91770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:47,854-Speed 10696.17 samples/sec   Loss 7.4548   LearningRate 0.0299   Epoch: 18   Global Step: 91780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:48,771-Speed 11184.29 samples/sec   Loss 7.4422   LearningRate 0.0298   Epoch: 18   Global Step: 91790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:49,777-Speed 10186.84 samples/sec   Loss 7.5203   LearningRate 0.0298   Epoch: 18   Global Step: 91800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:50,719-Speed 10875.55 samples/sec   Loss 7.5405   LearningRate 0.0298   Epoch: 18   Global Step: 91810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:51,706-Speed 10385.92 samples/sec   Loss 7.6213   LearningRate 0.0298   Epoch: 18   Global Step: 91820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:52,671-Speed 10628.39 samples/sec   Loss 7.5439   LearningRate 0.0298   Epoch: 18   Global Step: 91830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:53,665-Speed 10301.75 samples/sec   Loss 7.5031   LearningRate 0.0298   Epoch: 18   Global Step: 91840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:54,677-Speed 10141.70 samples/sec   Loss 7.5259   LearningRate 0.0298   Epoch: 18   Global Step: 91850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:55,634-Speed 10707.84 samples/sec   Loss 7.4592   LearningRate 0.0298   Epoch: 18   Global Step: 91860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:40:56,566-Speed 10995.65 samples/sec   Loss 7.5468   LearningRate 0.0298   Epoch: 18   Global Step: 91870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:57,563-Speed 10280.03 samples/sec   Loss 7.5430   LearningRate 0.0298   Epoch: 18   Global Step: 91880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:58,538-Speed 10523.79 samples/sec   Loss 7.5503   LearningRate 0.0298   Epoch: 18   Global Step: 91890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:40:59,472-Speed 10972.00 samples/sec   Loss 7.4247   LearningRate 0.0298   Epoch: 18   Global Step: 91900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:41:00,411-Speed 10909.02 samples/sec   Loss 7.5102   LearningRate 0.0298   Epoch: 18   Global Step: 91910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:41:01,408-Speed 10277.92 samples/sec   Loss 7.4921   LearningRate 0.0298   Epoch: 18   Global Step: 91920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:41:02,377-Speed 10580.99 samples/sec   Loss 7.4584   LearningRate 0.0298   Epoch: 18   Global Step: 91930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:41:03,312-Speed 10965.94 samples/sec   Loss 7.5535   LearningRate 0.0298   Epoch: 18   Global Step: 91940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:41:04,233-Speed 11128.88 samples/sec   Loss 7.4958   LearningRate 0.0298   Epoch: 18   Global Step: 91950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:41:05,191-Speed 10698.24 samples/sec   Loss 7.6304   LearningRate 0.0298   Epoch: 18   Global Step: 91960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:41:06,131-Speed 10901.87 samples/sec   Loss 7.6452   LearningRate 0.0297   Epoch: 18   Global Step: 91970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:41:07,108-Speed 10497.21 samples/sec   Loss 7.5373   LearningRate 0.0297   Epoch: 18   Global Step: 91980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:41:08,029-Speed 11128.31 samples/sec   Loss 7.5883   LearningRate 0.0297   Epoch: 18   Global Step: 91990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:41:08,996-Speed 10606.37 samples/sec   Loss 7.4502   LearningRate 0.0297   Epoch: 18   Global Step: 92000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:41:31,768-[lfw][92000]XNorm: 11.412780
Training: 2022-04-11 02:41:31,769-[lfw][92000]Accuracy-Flip: 0.99600+-0.00410
Training: 2022-04-11 02:41:31,770-[lfw][92000]Accuracy-Highest: 0.99650
Training: 2022-04-11 02:41:57,358-[cfp_fp][92000]XNorm: 9.715233
Training: 2022-04-11 02:41:57,359-[cfp_fp][92000]Accuracy-Flip: 0.95971+-0.00859
Training: 2022-04-11 02:41:57,360-[cfp_fp][92000]Accuracy-Highest: 0.96071
Training: 2022-04-11 02:42:19,532-[agedb_30][92000]XNorm: 11.078798
Training: 2022-04-11 02:42:19,532-[agedb_30][92000]Accuracy-Flip: 0.96517+-0.01007
Training: 2022-04-11 02:42:19,533-[agedb_30][92000]Accuracy-Highest: 0.96517
Training: 2022-04-11 02:42:20,491-Speed 143.23 samples/sec   Loss 7.6415   LearningRate 0.0297   Epoch: 18   Global Step: 92010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:42:21,418-Speed 11057.08 samples/sec   Loss 7.5461   LearningRate 0.0297   Epoch: 18   Global Step: 92020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:22,416-Speed 10268.78 samples/sec   Loss 7.4841   LearningRate 0.0297   Epoch: 18   Global Step: 92030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:23,392-Speed 10503.33 samples/sec   Loss 7.4723   LearningRate 0.0297   Epoch: 18   Global Step: 92040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:24,357-Speed 10631.45 samples/sec   Loss 7.7121   LearningRate 0.0297   Epoch: 18   Global Step: 92050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:25,299-Speed 10881.01 samples/sec   Loss 7.6219   LearningRate 0.0297   Epoch: 18   Global Step: 92060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:26,207-Speed 11283.47 samples/sec   Loss 7.7231   LearningRate 0.0297   Epoch: 18   Global Step: 92070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:27,142-Speed 10958.73 samples/sec   Loss 7.7044   LearningRate 0.0297   Epoch: 18   Global Step: 92080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:28,168-Speed 10009.97 samples/sec   Loss 7.6131   LearningRate 0.0297   Epoch: 18   Global Step: 92090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:29,135-Speed 10598.46 samples/sec   Loss 7.4674   LearningRate 0.0297   Epoch: 18   Global Step: 92100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:30,106-Speed 10558.11 samples/sec   Loss 7.5637   LearningRate 0.0297   Epoch: 18   Global Step: 92110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:31,117-Speed 10137.13 samples/sec   Loss 7.4598   LearningRate 0.0297   Epoch: 18   Global Step: 92120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:42:32,064-Speed 10820.70 samples/sec   Loss 7.6375   LearningRate 0.0297   Epoch: 18   Global Step: 92130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:33,034-Speed 10565.82 samples/sec   Loss 7.7158   LearningRate 0.0297   Epoch: 18   Global Step: 92140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:34,031-Speed 10280.23 samples/sec   Loss 7.5567   LearningRate 0.0297   Epoch: 18   Global Step: 92150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:34,961-Speed 11028.18 samples/sec   Loss 7.6151   LearningRate 0.0296   Epoch: 18   Global Step: 92160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:35,892-Speed 11021.18 samples/sec   Loss 7.6054   LearningRate 0.0296   Epoch: 18   Global Step: 92170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:36,839-Speed 10818.15 samples/sec   Loss 7.6927   LearningRate 0.0296   Epoch: 18   Global Step: 92180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:37,781-Speed 10875.45 samples/sec   Loss 7.6660   LearningRate 0.0296   Epoch: 18   Global Step: 92190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:38,796-Speed 10102.34 samples/sec   Loss 7.7815   LearningRate 0.0296   Epoch: 18   Global Step: 92200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:39,761-Speed 10618.30 samples/sec   Loss 7.5245   LearningRate 0.0296   Epoch: 18   Global Step: 92210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:40,677-Speed 11191.09 samples/sec   Loss 7.6373   LearningRate 0.0296   Epoch: 18   Global Step: 92220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:41,594-Speed 11164.16 samples/sec   Loss 7.7140   LearningRate 0.0296   Epoch: 18   Global Step: 92230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:42:42,550-Speed 10731.61 samples/sec   Loss 7.6179   LearningRate 0.0296   Epoch: 18   Global Step: 92240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:42:43,491-Speed 10899.50 samples/sec   Loss 7.6432   LearningRate 0.0296   Epoch: 18   Global Step: 92250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:42:44,449-Speed 10687.33 samples/sec   Loss 7.8002   LearningRate 0.0296   Epoch: 18   Global Step: 92260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:42:45,407-Speed 10703.86 samples/sec   Loss 7.7638   LearningRate 0.0296   Epoch: 18   Global Step: 92270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:42:46,394-Speed 10374.26 samples/sec   Loss 7.7040   LearningRate 0.0296   Epoch: 18   Global Step: 92280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:42:47,380-Speed 10398.66 samples/sec   Loss 7.5606   LearningRate 0.0296   Epoch: 18   Global Step: 92290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:42:48,331-Speed 10780.09 samples/sec   Loss 7.6442   LearningRate 0.0296   Epoch: 18   Global Step: 92300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:49,299-Speed 10587.69 samples/sec   Loss 7.5274   LearningRate 0.0296   Epoch: 18   Global Step: 92310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:50,265-Speed 10609.61 samples/sec   Loss 7.5341   LearningRate 0.0296   Epoch: 18   Global Step: 92320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:51,287-Speed 10024.62 samples/sec   Loss 7.7955   LearningRate 0.0296   Epoch: 18   Global Step: 92330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:52,255-Speed 10585.38 samples/sec   Loss 7.7842   LearningRate 0.0295   Epoch: 18   Global Step: 92340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:53,200-Speed 10850.11 samples/sec   Loss 7.6807   LearningRate 0.0295   Epoch: 18   Global Step: 92350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:54,184-Speed 10415.14 samples/sec   Loss 7.6625   LearningRate 0.0295   Epoch: 18   Global Step: 92360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:55,128-Speed 10862.31 samples/sec   Loss 7.7199   LearningRate 0.0295   Epoch: 18   Global Step: 92370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:56,037-Speed 11280.64 samples/sec   Loss 7.7020   LearningRate 0.0295   Epoch: 18   Global Step: 92380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:56,974-Speed 10937.28 samples/sec   Loss 7.5942   LearningRate 0.0295   Epoch: 18   Global Step: 92390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:42:57,955-Speed 10446.32 samples/sec   Loss 7.7504   LearningRate 0.0295   Epoch: 18   Global Step: 92400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:42:58,908-Speed 10757.30 samples/sec   Loss 7.6965   LearningRate 0.0295   Epoch: 18   Global Step: 92410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:42:59,850-Speed 10876.87 samples/sec   Loss 7.8199   LearningRate 0.0295   Epoch: 18   Global Step: 92420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:43:00,837-Speed 10388.84 samples/sec   Loss 7.6737   LearningRate 0.0295   Epoch: 18   Global Step: 92430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:43:01,829-Speed 10335.76 samples/sec   Loss 7.4772   LearningRate 0.0295   Epoch: 18   Global Step: 92440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:43:02,776-Speed 10811.96 samples/sec   Loss 7.8244   LearningRate 0.0295   Epoch: 18   Global Step: 92450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:43:03,716-Speed 10904.97 samples/sec   Loss 7.6048   LearningRate 0.0295   Epoch: 18   Global Step: 92460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:43:04,704-Speed 10376.94 samples/sec   Loss 7.7152   LearningRate 0.0295   Epoch: 18   Global Step: 92470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:43:05,622-Speed 11158.56 samples/sec   Loss 7.5301   LearningRate 0.0295   Epoch: 18   Global Step: 92480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:06,537-Speed 11209.40 samples/sec   Loss 7.7703   LearningRate 0.0295   Epoch: 18   Global Step: 92490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:07,447-Speed 11265.44 samples/sec   Loss 7.8326   LearningRate 0.0295   Epoch: 18   Global Step: 92500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:08,477-Speed 9948.90 samples/sec   Loss 7.7017   LearningRate 0.0295   Epoch: 18   Global Step: 92510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:09,414-Speed 10942.60 samples/sec   Loss 7.8020   LearningRate 0.0295   Epoch: 18   Global Step: 92520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:10,358-Speed 10847.85 samples/sec   Loss 7.7564   LearningRate 0.0294   Epoch: 18   Global Step: 92530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:11,285-Speed 11060.25 samples/sec   Loss 7.7483   LearningRate 0.0294   Epoch: 18   Global Step: 92540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:12,285-Speed 10245.68 samples/sec   Loss 7.6542   LearningRate 0.0294   Epoch: 18   Global Step: 92550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:13,235-Speed 10795.52 samples/sec   Loss 7.6645   LearningRate 0.0294   Epoch: 18   Global Step: 92560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:14,195-Speed 10679.16 samples/sec   Loss 7.6613   LearningRate 0.0294   Epoch: 18   Global Step: 92570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:15,234-Speed 9869.38 samples/sec   Loss 7.7145   LearningRate 0.0294   Epoch: 18   Global Step: 92580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:16,138-Speed 11344.27 samples/sec   Loss 7.8474   LearningRate 0.0294   Epoch: 18   Global Step: 92590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:43:17,093-Speed 10739.84 samples/sec   Loss 7.7200   LearningRate 0.0294   Epoch: 18   Global Step: 92600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:43:18,072-Speed 10460.57 samples/sec   Loss 7.8065   LearningRate 0.0294   Epoch: 18   Global Step: 92610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:43:19,005-Speed 10991.80 samples/sec   Loss 7.6415   LearningRate 0.0294   Epoch: 18   Global Step: 92620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:43:19,895-Speed 11523.05 samples/sec   Loss 7.6574   LearningRate 0.0294   Epoch: 18   Global Step: 92630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:43:20,828-Speed 10988.32 samples/sec   Loss 7.6851   LearningRate 0.0294   Epoch: 18   Global Step: 92640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:43:21,868-Speed 9862.38 samples/sec   Loss 7.6454   LearningRate 0.0294   Epoch: 18   Global Step: 92650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:43:22,789-Speed 11120.30 samples/sec   Loss 7.7691   LearningRate 0.0294   Epoch: 18   Global Step: 92660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:43:23,738-Speed 10806.35 samples/sec   Loss 8.0262   LearningRate 0.0294   Epoch: 18   Global Step: 92670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:43:24,733-Speed 10298.03 samples/sec   Loss 7.7600   LearningRate 0.0294   Epoch: 18   Global Step: 92680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:43:25,706-Speed 10537.58 samples/sec   Loss 7.7330   LearningRate 0.0294   Epoch: 18   Global Step: 92690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:26,687-Speed 10453.84 samples/sec   Loss 7.5919   LearningRate 0.0294   Epoch: 18   Global Step: 92700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:27,606-Speed 11146.14 samples/sec   Loss 7.7596   LearningRate 0.0294   Epoch: 18   Global Step: 92710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:28,512-Speed 11321.10 samples/sec   Loss 7.6797   LearningRate 0.0293   Epoch: 18   Global Step: 92720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:29,446-Speed 10962.33 samples/sec   Loss 7.8829   LearningRate 0.0293   Epoch: 18   Global Step: 92730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:43:30,470-Speed 10013.72 samples/sec   Loss 7.6865   LearningRate 0.0293   Epoch: 18   Global Step: 92740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:43:31,434-Speed 10635.59 samples/sec   Loss 7.7004   LearningRate 0.0293   Epoch: 18   Global Step: 92750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:43:32,359-Speed 11081.11 samples/sec   Loss 7.6872   LearningRate 0.0293   Epoch: 18   Global Step: 92760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:43:33,313-Speed 10740.11 samples/sec   Loss 7.9214   LearningRate 0.0293   Epoch: 18   Global Step: 92770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:43:34,251-Speed 10933.89 samples/sec   Loss 7.7431   LearningRate 0.0293   Epoch: 18   Global Step: 92780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:43:35,179-Speed 11048.78 samples/sec   Loss 7.8717   LearningRate 0.0293   Epoch: 18   Global Step: 92790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:43:36,073-Speed 11469.58 samples/sec   Loss 7.7811   LearningRate 0.0293   Epoch: 18   Global Step: 92800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:43:37,025-Speed 10761.08 samples/sec   Loss 7.8327   LearningRate 0.0293   Epoch: 18   Global Step: 92810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:43:37,997-Speed 10545.22 samples/sec   Loss 7.7626   LearningRate 0.0293   Epoch: 18   Global Step: 92820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:43:38,981-Speed 10416.79 samples/sec   Loss 7.7194   LearningRate 0.0293   Epoch: 18   Global Step: 92830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:39,951-Speed 10571.39 samples/sec   Loss 7.9168   LearningRate 0.0293   Epoch: 18   Global Step: 92840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:40,920-Speed 10573.53 samples/sec   Loss 7.7446   LearningRate 0.0293   Epoch: 18   Global Step: 92850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:41,847-Speed 11048.02 samples/sec   Loss 7.6576   LearningRate 0.0293   Epoch: 18   Global Step: 92860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:42,775-Speed 11051.67 samples/sec   Loss 7.7339   LearningRate 0.0293   Epoch: 18   Global Step: 92870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:43,656-Speed 11644.47 samples/sec   Loss 7.7416   LearningRate 0.0293   Epoch: 18   Global Step: 92880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:44,590-Speed 10965.42 samples/sec   Loss 7.5435   LearningRate 0.0293   Epoch: 18   Global Step: 92890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:45,539-Speed 10806.46 samples/sec   Loss 7.5782   LearningRate 0.0292   Epoch: 18   Global Step: 92900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:46,519-Speed 10452.12 samples/sec   Loss 7.8077   LearningRate 0.0292   Epoch: 18   Global Step: 92910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:47,520-Speed 10237.71 samples/sec   Loss 7.5777   LearningRate 0.0292   Epoch: 18   Global Step: 92920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:48,467-Speed 10837.61 samples/sec   Loss 7.6065   LearningRate 0.0292   Epoch: 18   Global Step: 92930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:43:49,414-Speed 10818.73 samples/sec   Loss 7.7843   LearningRate 0.0292   Epoch: 18   Global Step: 92940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:43:50,356-Speed 10878.79 samples/sec   Loss 7.7351   LearningRate 0.0292   Epoch: 18   Global Step: 92950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:43:51,394-Speed 9875.51 samples/sec   Loss 7.6916   LearningRate 0.0292   Epoch: 18   Global Step: 92960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:43:52,335-Speed 10890.14 samples/sec   Loss 7.7402   LearningRate 0.0292   Epoch: 18   Global Step: 92970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:53,307-Speed 10545.78 samples/sec   Loss 7.8468   LearningRate 0.0292   Epoch: 18   Global Step: 92980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:54,247-Speed 10895.37 samples/sec   Loss 7.7164   LearningRate 0.0292   Epoch: 18   Global Step: 92990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:55,227-Speed 10459.12 samples/sec   Loss 7.7559   LearningRate 0.0292   Epoch: 18   Global Step: 93000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:56,186-Speed 10692.86 samples/sec   Loss 7.7467   LearningRate 0.0292   Epoch: 18   Global Step: 93010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:57,291-Speed 9277.14 samples/sec   Loss 7.7028   LearningRate 0.0292   Epoch: 18   Global Step: 93020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:58,269-Speed 10475.41 samples/sec   Loss 7.7518   LearningRate 0.0292   Epoch: 18   Global Step: 93030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:43:59,261-Speed 10334.65 samples/sec   Loss 7.8651   LearningRate 0.0292   Epoch: 18   Global Step: 93040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:00,235-Speed 10524.04 samples/sec   Loss 7.7668   LearningRate 0.0292   Epoch: 18   Global Step: 93050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:01,203-Speed 10588.43 samples/sec   Loss 7.8541   LearningRate 0.0292   Epoch: 18   Global Step: 93060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:02,195-Speed 10331.80 samples/sec   Loss 7.6981   LearningRate 0.0292   Epoch: 18   Global Step: 93070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:44:03,114-Speed 11155.13 samples/sec   Loss 7.8440   LearningRate 0.0292   Epoch: 18   Global Step: 93080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:04,027-Speed 11226.06 samples/sec   Loss 7.6917   LearningRate 0.0291   Epoch: 18   Global Step: 93090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:05,021-Speed 10301.85 samples/sec   Loss 7.7832   LearningRate 0.0291   Epoch: 18   Global Step: 93100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:05,998-Speed 10496.05 samples/sec   Loss 7.7071   LearningRate 0.0291   Epoch: 18   Global Step: 93110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:06,980-Speed 10795.00 samples/sec   Loss 7.7063   LearningRate 0.0291   Epoch: 18   Global Step: 93120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:07,963-Speed 10429.66 samples/sec   Loss 7.7754   LearningRate 0.0291   Epoch: 18   Global Step: 93130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:08,861-Speed 11421.85 samples/sec   Loss 7.7827   LearningRate 0.0291   Epoch: 18   Global Step: 93140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:09,810-Speed 10799.71 samples/sec   Loss 7.7657   LearningRate 0.0291   Epoch: 18   Global Step: 93150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:10,792-Speed 10429.63 samples/sec   Loss 7.7989   LearningRate 0.0291   Epoch: 18   Global Step: 93160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:11,762-Speed 10568.39 samples/sec   Loss 7.6411   LearningRate 0.0291   Epoch: 18   Global Step: 93170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:12,732-Speed 10566.15 samples/sec   Loss 7.7609   LearningRate 0.0291   Epoch: 18   Global Step: 93180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:44:13,649-Speed 11185.93 samples/sec   Loss 7.8030   LearningRate 0.0291   Epoch: 18   Global Step: 93190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:44:14,634-Speed 10396.18 samples/sec   Loss 7.8257   LearningRate 0.0291   Epoch: 18   Global Step: 93200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:44:15,589-Speed 10726.99 samples/sec   Loss 7.8454   LearningRate 0.0291   Epoch: 18   Global Step: 93210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:44:16,596-Speed 10187.44 samples/sec   Loss 7.8663   LearningRate 0.0291   Epoch: 18   Global Step: 93220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:44:17,545-Speed 10799.11 samples/sec   Loss 7.8835   LearningRate 0.0291   Epoch: 18   Global Step: 93230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:44:18,473-Speed 11045.64 samples/sec   Loss 7.7204   LearningRate 0.0291   Epoch: 18   Global Step: 93240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:44:19,527-Speed 9724.95 samples/sec   Loss 7.6696   LearningRate 0.0291   Epoch: 18   Global Step: 93250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:44:20,474-Speed 10820.74 samples/sec   Loss 7.7548   LearningRate 0.0291   Epoch: 18   Global Step: 93260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:21,379-Speed 11321.37 samples/sec   Loss 7.7722   LearningRate 0.0291   Epoch: 18   Global Step: 93270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:22,341-Speed 10662.81 samples/sec   Loss 7.9012   LearningRate 0.0290   Epoch: 18   Global Step: 93280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:23,272-Speed 11001.70 samples/sec   Loss 7.8039   LearningRate 0.0290   Epoch: 18   Global Step: 93290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:24,282-Speed 10159.16 samples/sec   Loss 7.6611   LearningRate 0.0290   Epoch: 18   Global Step: 93300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:25,257-Speed 10513.94 samples/sec   Loss 7.7770   LearningRate 0.0290   Epoch: 18   Global Step: 93310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:26,224-Speed 10601.05 samples/sec   Loss 7.8194   LearningRate 0.0290   Epoch: 18   Global Step: 93320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:27,163-Speed 10917.80 samples/sec   Loss 7.8017   LearningRate 0.0290   Epoch: 18   Global Step: 93330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:28,102-Speed 10907.55 samples/sec   Loss 7.9001   LearningRate 0.0290   Epoch: 18   Global Step: 93340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:29,044-Speed 10886.86 samples/sec   Loss 7.8045   LearningRate 0.0290   Epoch: 18   Global Step: 93350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:29,974-Speed 11020.49 samples/sec   Loss 7.9444   LearningRate 0.0290   Epoch: 18   Global Step: 93360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:44:30,941-Speed 10606.08 samples/sec   Loss 7.7962   LearningRate 0.0290   Epoch: 18   Global Step: 93370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:31,955-Speed 10099.53 samples/sec   Loss 7.7290   LearningRate 0.0290   Epoch: 18   Global Step: 93380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:32,928-Speed 10533.60 samples/sec   Loss 7.6445   LearningRate 0.0290   Epoch: 18   Global Step: 93390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:33,896-Speed 10591.18 samples/sec   Loss 7.8284   LearningRate 0.0290   Epoch: 18   Global Step: 93400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:34,824-Speed 11042.37 samples/sec   Loss 7.7810   LearningRate 0.0290   Epoch: 18   Global Step: 93410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:35,747-Speed 11107.90 samples/sec   Loss 7.9795   LearningRate 0.0290   Epoch: 18   Global Step: 93420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:36,692-Speed 10847.29 samples/sec   Loss 7.8640   LearningRate 0.0290   Epoch: 18   Global Step: 93430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:37,647-Speed 10730.38 samples/sec   Loss 7.8071   LearningRate 0.0290   Epoch: 18   Global Step: 93440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:38,591-Speed 10872.83 samples/sec   Loss 7.7020   LearningRate 0.0290   Epoch: 18   Global Step: 93450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:39,534-Speed 10864.31 samples/sec   Loss 7.8416   LearningRate 0.0290   Epoch: 18   Global Step: 93460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:40,552-Speed 10066.85 samples/sec   Loss 7.7658   LearningRate 0.0289   Epoch: 18   Global Step: 93470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:44:41,471-Speed 11152.87 samples/sec   Loss 7.7850   LearningRate 0.0289   Epoch: 18   Global Step: 93480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:42,446-Speed 10511.66 samples/sec   Loss 7.7853   LearningRate 0.0289   Epoch: 18   Global Step: 93490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:44:43,457-Speed 10326.87 samples/sec   Loss 7.7364   LearningRate 0.0289   Epoch: 18   Global Step: 93500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:44:44,395-Speed 10935.33 samples/sec   Loss 7.8746   LearningRate 0.0289   Epoch: 18   Global Step: 93510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:44:45,322-Speed 11047.55 samples/sec   Loss 7.9452   LearningRate 0.0289   Epoch: 18   Global Step: 93520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:44:46,251-Speed 11037.41 samples/sec   Loss 7.9292   LearningRate 0.0289   Epoch: 18   Global Step: 93530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:44:47,199-Speed 10814.34 samples/sec   Loss 7.9413   LearningRate 0.0289   Epoch: 18   Global Step: 93540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:44:48,216-Speed 10075.34 samples/sec   Loss 7.7494   LearningRate 0.0289   Epoch: 18   Global Step: 93550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:44:49,162-Speed 10835.20 samples/sec   Loss 7.8583   LearningRate 0.0289   Epoch: 18   Global Step: 93560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:44:50,086-Speed 11095.30 samples/sec   Loss 7.9184   LearningRate 0.0289   Epoch: 18   Global Step: 93570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:44:51,025-Speed 10914.50 samples/sec   Loss 7.7567   LearningRate 0.0289   Epoch: 18   Global Step: 93580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:44:52,012-Speed 10385.26 samples/sec   Loss 7.6909   LearningRate 0.0289   Epoch: 18   Global Step: 93590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:52,953-Speed 10885.84 samples/sec   Loss 7.9672   LearningRate 0.0289   Epoch: 18   Global Step: 93600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:53,920-Speed 10603.71 samples/sec   Loss 7.7733   LearningRate 0.0289   Epoch: 18   Global Step: 93610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:54,855-Speed 10957.82 samples/sec   Loss 7.7642   LearningRate 0.0289   Epoch: 18   Global Step: 93620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:55,813-Speed 10697.79 samples/sec   Loss 7.8229   LearningRate 0.0289   Epoch: 18   Global Step: 93630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:56,788-Speed 10513.29 samples/sec   Loss 7.7933   LearningRate 0.0289   Epoch: 18   Global Step: 93640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:57,740-Speed 10766.68 samples/sec   Loss 7.8094   LearningRate 0.0288   Epoch: 18   Global Step: 93650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:58,679-Speed 10923.76 samples/sec   Loss 7.8771   LearningRate 0.0288   Epoch: 18   Global Step: 93660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:44:59,676-Speed 10285.90 samples/sec   Loss 7.7903   LearningRate 0.0288   Epoch: 18   Global Step: 93670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:00,590-Speed 11225.56 samples/sec   Loss 7.7536   LearningRate 0.0288   Epoch: 18   Global Step: 93680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:01,531-Speed 10885.95 samples/sec   Loss 7.8577   LearningRate 0.0288   Epoch: 18   Global Step: 93690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:45:02,436-Speed 11333.69 samples/sec   Loss 7.6806   LearningRate 0.0288   Epoch: 18   Global Step: 93700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:45:03,386-Speed 10787.19 samples/sec   Loss 8.0227   LearningRate 0.0288   Epoch: 18   Global Step: 93710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:45:04,352-Speed 10605.09 samples/sec   Loss 7.9526   LearningRate 0.0288   Epoch: 18   Global Step: 93720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:05,288-Speed 10956.65 samples/sec   Loss 7.8900   LearningRate 0.0288   Epoch: 18   Global Step: 93730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:06,240-Speed 10767.90 samples/sec   Loss 7.8074   LearningRate 0.0288   Epoch: 18   Global Step: 93740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:07,137-Speed 11432.33 samples/sec   Loss 7.9371   LearningRate 0.0288   Epoch: 18   Global Step: 93750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:08,098-Speed 10658.95 samples/sec   Loss 7.7736   LearningRate 0.0288   Epoch: 18   Global Step: 93760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:09,093-Speed 10302.32 samples/sec   Loss 8.0107   LearningRate 0.0288   Epoch: 18   Global Step: 93770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:10,046-Speed 10752.82 samples/sec   Loss 7.8392   LearningRate 0.0288   Epoch: 18   Global Step: 93780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:11,006-Speed 10680.66 samples/sec   Loss 7.7930   LearningRate 0.0288   Epoch: 18   Global Step: 93790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:11,989-Speed 10424.61 samples/sec   Loss 7.8043   LearningRate 0.0288   Epoch: 18   Global Step: 93800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:12,964-Speed 10525.31 samples/sec   Loss 7.9368   LearningRate 0.0288   Epoch: 18   Global Step: 93810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:13,885-Speed 11125.80 samples/sec   Loss 8.0429   LearningRate 0.0288   Epoch: 18   Global Step: 93820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:45:14,778-Speed 11474.40 samples/sec   Loss 7.7092   LearningRate 0.0288   Epoch: 18   Global Step: 93830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:45:15,753-Speed 10509.19 samples/sec   Loss 7.9230   LearningRate 0.0287   Epoch: 18   Global Step: 93840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:45:16,776-Speed 10020.64 samples/sec   Loss 7.8797   LearningRate 0.0287   Epoch: 18   Global Step: 93850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:45:17,739-Speed 10646.54 samples/sec   Loss 7.7645   LearningRate 0.0287   Epoch: 18   Global Step: 93860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:45:18,769-Speed 9949.43 samples/sec   Loss 7.8307   LearningRate 0.0287   Epoch: 18   Global Step: 93870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:45:19,770-Speed 10245.34 samples/sec   Loss 7.8745   LearningRate 0.0287   Epoch: 18   Global Step: 93880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:20,690-Speed 11148.80 samples/sec   Loss 8.0099   LearningRate 0.0287   Epoch: 18   Global Step: 93890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:21,607-Speed 11167.16 samples/sec   Loss 7.8235   LearningRate 0.0287   Epoch: 18   Global Step: 93900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:22,582-Speed 10514.31 samples/sec   Loss 7.8501   LearningRate 0.0287   Epoch: 18   Global Step: 93910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:23,514-Speed 10998.64 samples/sec   Loss 8.0664   LearningRate 0.0287   Epoch: 18   Global Step: 93920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:24,491-Speed 10495.87 samples/sec   Loss 7.6817   LearningRate 0.0287   Epoch: 18   Global Step: 93930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:25,421-Speed 11027.40 samples/sec   Loss 7.8020   LearningRate 0.0287   Epoch: 18   Global Step: 93940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:26,360-Speed 10917.14 samples/sec   Loss 7.7080   LearningRate 0.0287   Epoch: 18   Global Step: 93950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:27,278-Speed 11170.46 samples/sec   Loss 7.8109   LearningRate 0.0287   Epoch: 18   Global Step: 93960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:28,262-Speed 10423.15 samples/sec   Loss 7.7921   LearningRate 0.0287   Epoch: 18   Global Step: 93970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:45:29,217-Speed 10733.12 samples/sec   Loss 8.0444   LearningRate 0.0287   Epoch: 18   Global Step: 93980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:45:30,172-Speed 10734.80 samples/sec   Loss 7.9098   LearningRate 0.0287   Epoch: 18   Global Step: 93990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:45:31,154-Speed 10436.79 samples/sec   Loss 7.9407   LearningRate 0.0287   Epoch: 18   Global Step: 94000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:45:53,324-[lfw][94000]XNorm: 11.337297
Training: 2022-04-11 02:45:53,325-[lfw][94000]Accuracy-Flip: 0.99500+-0.00373
Training: 2022-04-11 02:45:53,326-[lfw][94000]Accuracy-Highest: 0.99650
Training: 2022-04-11 02:46:18,983-[cfp_fp][94000]XNorm: 9.587214
Training: 2022-04-11 02:46:18,984-[cfp_fp][94000]Accuracy-Flip: 0.96171+-0.01188
Training: 2022-04-11 02:46:18,985-[cfp_fp][94000]Accuracy-Highest: 0.96171
Training: 2022-04-11 02:46:41,284-[agedb_30][94000]XNorm: 11.063094
Training: 2022-04-11 02:46:41,285-[agedb_30][94000]Accuracy-Flip: 0.96400+-0.00688
Training: 2022-04-11 02:46:41,285-[agedb_30][94000]Accuracy-Highest: 0.96517
Training: 2022-04-11 02:46:42,264-Speed 144.00 samples/sec   Loss 7.8169   LearningRate 0.0287   Epoch: 18   Global Step: 94010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:46:43,168-Speed 11341.62 samples/sec   Loss 7.8410   LearningRate 0.0287   Epoch: 18   Global Step: 94020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:46:44,132-Speed 10630.35 samples/sec   Loss 7.9774   LearningRate 0.0286   Epoch: 18   Global Step: 94030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:46:45,099-Speed 10597.21 samples/sec   Loss 7.8476   LearningRate 0.0286   Epoch: 18   Global Step: 94040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:46:46,063-Speed 10642.03 samples/sec   Loss 7.8778   LearningRate 0.0286   Epoch: 18   Global Step: 94050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:46:47,004-Speed 10910.37 samples/sec   Loss 7.7866   LearningRate 0.0286   Epoch: 18   Global Step: 94060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:46:48,075-Speed 9582.32 samples/sec   Loss 7.8264   LearningRate 0.0286   Epoch: 18   Global Step: 94070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:46:49,012-Speed 10942.15 samples/sec   Loss 7.8153   LearningRate 0.0286   Epoch: 18   Global Step: 94080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:46:49,987-Speed 10509.65 samples/sec   Loss 7.8684   LearningRate 0.0286   Epoch: 18   Global Step: 94090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:46:50,990-Speed 10217.00 samples/sec   Loss 7.7636   LearningRate 0.0286   Epoch: 18   Global Step: 94100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:46:51,926-Speed 10960.95 samples/sec   Loss 7.9492   LearningRate 0.0286   Epoch: 18   Global Step: 94110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:46:52,868-Speed 10878.71 samples/sec   Loss 7.8814   LearningRate 0.0286   Epoch: 18   Global Step: 94120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:46:53,828-Speed 10669.42 samples/sec   Loss 7.8868   LearningRate 0.0286   Epoch: 18   Global Step: 94130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:46:54,803-Speed 10520.23 samples/sec   Loss 7.8159   LearningRate 0.0286   Epoch: 18   Global Step: 94140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:46:55,777-Speed 10515.95 samples/sec   Loss 7.8210   LearningRate 0.0286   Epoch: 18   Global Step: 94150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:46:56,714-Speed 10946.89 samples/sec   Loss 7.7904   LearningRate 0.0286   Epoch: 18   Global Step: 94160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:46:57,682-Speed 10580.77 samples/sec   Loss 7.8936   LearningRate 0.0286   Epoch: 18   Global Step: 94170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:46:58,610-Speed 11057.65 samples/sec   Loss 7.9841   LearningRate 0.0286   Epoch: 18   Global Step: 94180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:46:59,563-Speed 10758.19 samples/sec   Loss 7.7975   LearningRate 0.0286   Epoch: 18   Global Step: 94190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:00,534-Speed 10551.52 samples/sec   Loss 7.7581   LearningRate 0.0286   Epoch: 18   Global Step: 94200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:01,499-Speed 10619.05 samples/sec   Loss 7.9083   LearningRate 0.0286   Epoch: 18   Global Step: 94210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:02,558-Speed 9681.25 samples/sec   Loss 7.9492   LearningRate 0.0285   Epoch: 18   Global Step: 94220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:03,481-Speed 11124.06 samples/sec   Loss 7.8159   LearningRate 0.0285   Epoch: 18   Global Step: 94230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:04,418-Speed 10932.33 samples/sec   Loss 7.8402   LearningRate 0.0285   Epoch: 18   Global Step: 94240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:05,368-Speed 10788.32 samples/sec   Loss 7.8738   LearningRate 0.0285   Epoch: 18   Global Step: 94250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:06,271-Speed 11349.56 samples/sec   Loss 7.6627   LearningRate 0.0285   Epoch: 18   Global Step: 94260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:07,275-Speed 10214.95 samples/sec   Loss 7.9046   LearningRate 0.0285   Epoch: 18   Global Step: 94270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:47:08,218-Speed 10871.41 samples/sec   Loss 7.9568   LearningRate 0.0285   Epoch: 18   Global Step: 94280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:47:09,147-Speed 11042.42 samples/sec   Loss 8.0265   LearningRate 0.0285   Epoch: 18   Global Step: 94290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:10,090-Speed 10865.28 samples/sec   Loss 7.7455   LearningRate 0.0285   Epoch: 18   Global Step: 94300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:11,073-Speed 10440.43 samples/sec   Loss 7.8591   LearningRate 0.0285   Epoch: 18   Global Step: 94310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:11,996-Speed 11114.21 samples/sec   Loss 7.8538   LearningRate 0.0285   Epoch: 18   Global Step: 94320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:12,919-Speed 11102.69 samples/sec   Loss 7.7761   LearningRate 0.0285   Epoch: 18   Global Step: 94330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:13,835-Speed 11189.06 samples/sec   Loss 7.7754   LearningRate 0.0285   Epoch: 18   Global Step: 94340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:14,790-Speed 10733.74 samples/sec   Loss 7.9689   LearningRate 0.0285   Epoch: 18   Global Step: 94350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:15,745-Speed 10730.61 samples/sec   Loss 7.9319   LearningRate 0.0285   Epoch: 18   Global Step: 94360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:16,654-Speed 11283.62 samples/sec   Loss 7.9629   LearningRate 0.0285   Epoch: 18   Global Step: 94370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:17,647-Speed 10322.32 samples/sec   Loss 7.8811   LearningRate 0.0285   Epoch: 18   Global Step: 94380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:18,653-Speed 10190.56 samples/sec   Loss 7.9857   LearningRate 0.0285   Epoch: 18   Global Step: 94390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:47:19,573-Speed 11144.04 samples/sec   Loss 7.7123   LearningRate 0.0285   Epoch: 18   Global Step: 94400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:20,524-Speed 10768.94 samples/sec   Loss 7.6539   LearningRate 0.0284   Epoch: 18   Global Step: 94410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:21,486-Speed 10654.29 samples/sec   Loss 7.9282   LearningRate 0.0284   Epoch: 18   Global Step: 94420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:22,479-Speed 10333.83 samples/sec   Loss 7.7557   LearningRate 0.0284   Epoch: 18   Global Step: 94430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:23,424-Speed 10849.42 samples/sec   Loss 7.8902   LearningRate 0.0284   Epoch: 18   Global Step: 94440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:24,360-Speed 10942.30 samples/sec   Loss 7.6352   LearningRate 0.0284   Epoch: 18   Global Step: 94450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:25,334-Speed 10541.19 samples/sec   Loss 7.9226   LearningRate 0.0284   Epoch: 18   Global Step: 94460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:26,313-Speed 10466.99 samples/sec   Loss 7.7419   LearningRate 0.0284   Epoch: 18   Global Step: 94470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:27,309-Speed 10288.91 samples/sec   Loss 7.8462   LearningRate 0.0284   Epoch: 18   Global Step: 94480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:28,241-Speed 11012.52 samples/sec   Loss 7.9408   LearningRate 0.0284   Epoch: 18   Global Step: 94490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:29,181-Speed 10901.23 samples/sec   Loss 7.6552   LearningRate 0.0284   Epoch: 18   Global Step: 94500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:47:30,146-Speed 10619.93 samples/sec   Loss 7.7226   LearningRate 0.0284   Epoch: 18   Global Step: 94510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:47:31,196-Speed 9769.25 samples/sec   Loss 7.9305   LearningRate 0.0284   Epoch: 18   Global Step: 94520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:47:32,120-Speed 11089.61 samples/sec   Loss 7.8766   LearningRate 0.0284   Epoch: 18   Global Step: 94530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:47:33,033-Speed 11220.52 samples/sec   Loss 7.7664   LearningRate 0.0284   Epoch: 18   Global Step: 94540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:47:33,960-Speed 11050.29 samples/sec   Loss 7.9791   LearningRate 0.0284   Epoch: 18   Global Step: 94550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:47:34,966-Speed 10191.91 samples/sec   Loss 7.7409   LearningRate 0.0284   Epoch: 18   Global Step: 94560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:47:35,863-Speed 11424.66 samples/sec   Loss 7.7935   LearningRate 0.0284   Epoch: 18   Global Step: 94570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:36,839-Speed 10506.96 samples/sec   Loss 7.7850   LearningRate 0.0284   Epoch: 18   Global Step: 94580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:37,782-Speed 10870.18 samples/sec   Loss 7.7503   LearningRate 0.0284   Epoch: 18   Global Step: 94590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:38,752-Speed 10567.17 samples/sec   Loss 7.9770   LearningRate 0.0283   Epoch: 18   Global Step: 94600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:39,748-Speed 10286.30 samples/sec   Loss 7.7619   LearningRate 0.0283   Epoch: 18   Global Step: 94610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:40,706-Speed 10707.86 samples/sec   Loss 7.8221   LearningRate 0.0283   Epoch: 18   Global Step: 94620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:41,639-Speed 10990.00 samples/sec   Loss 7.7127   LearningRate 0.0283   Epoch: 18   Global Step: 94630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:42,569-Speed 11014.06 samples/sec   Loss 7.8121   LearningRate 0.0283   Epoch: 18   Global Step: 94640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:43,488-Speed 11152.60 samples/sec   Loss 7.9205   LearningRate 0.0283   Epoch: 18   Global Step: 94650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:44,509-Speed 10047.11 samples/sec   Loss 7.7423   LearningRate 0.0283   Epoch: 18   Global Step: 94660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:45,420-Speed 11251.92 samples/sec   Loss 7.9542   LearningRate 0.0283   Epoch: 18   Global Step: 94670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:47:46,380-Speed 10675.40 samples/sec   Loss 7.8236   LearningRate 0.0283   Epoch: 18   Global Step: 94680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:47:47,291-Speed 11243.47 samples/sec   Loss 7.8723   LearningRate 0.0283   Epoch: 18   Global Step: 94690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:48,269-Speed 10490.10 samples/sec   Loss 7.7724   LearningRate 0.0283   Epoch: 18   Global Step: 94700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:49,208-Speed 10907.83 samples/sec   Loss 7.8557   LearningRate 0.0283   Epoch: 18   Global Step: 94710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:50,147-Speed 10919.05 samples/sec   Loss 7.7607   LearningRate 0.0283   Epoch: 18   Global Step: 94720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:51,080-Speed 10986.65 samples/sec   Loss 7.8558   LearningRate 0.0283   Epoch: 18   Global Step: 94730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:52,132-Speed 9740.27 samples/sec   Loss 7.7407   LearningRate 0.0283   Epoch: 18   Global Step: 94740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:53,081-Speed 10799.83 samples/sec   Loss 7.7791   LearningRate 0.0283   Epoch: 18   Global Step: 94750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:54,038-Speed 10740.31 samples/sec   Loss 7.7750   LearningRate 0.0283   Epoch: 18   Global Step: 94760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:54,987-Speed 10808.42 samples/sec   Loss 7.8870   LearningRate 0.0283   Epoch: 18   Global Step: 94770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:55,906-Speed 11155.11 samples/sec   Loss 7.9004   LearningRate 0.0283   Epoch: 18   Global Step: 94780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:56,869-Speed 10657.62 samples/sec   Loss 7.7793   LearningRate 0.0282   Epoch: 18   Global Step: 94790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:47:57,871-Speed 10234.02 samples/sec   Loss 7.7051   LearningRate 0.0282   Epoch: 18   Global Step: 94800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:58,844-Speed 10523.58 samples/sec   Loss 7.8090   LearningRate 0.0282   Epoch: 18   Global Step: 94810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:47:59,804-Speed 10682.07 samples/sec   Loss 7.9287   LearningRate 0.0282   Epoch: 18   Global Step: 94820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:00,782-Speed 10483.16 samples/sec   Loss 7.7819   LearningRate 0.0282   Epoch: 18   Global Step: 94830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:01,682-Speed 11395.51 samples/sec   Loss 7.8841   LearningRate 0.0282   Epoch: 18   Global Step: 94840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:02,618-Speed 10941.65 samples/sec   Loss 7.8418   LearningRate 0.0282   Epoch: 18   Global Step: 94850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:03,549-Speed 11007.70 samples/sec   Loss 7.6717   LearningRate 0.0282   Epoch: 18   Global Step: 94860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:04,533-Speed 10416.15 samples/sec   Loss 7.9456   LearningRate 0.0282   Epoch: 18   Global Step: 94870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:05,448-Speed 11203.45 samples/sec   Loss 7.8315   LearningRate 0.0282   Epoch: 18   Global Step: 94880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:06,368-Speed 11146.81 samples/sec   Loss 7.7660   LearningRate 0.0282   Epoch: 18   Global Step: 94890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:07,285-Speed 11179.26 samples/sec   Loss 7.9446   LearningRate 0.0282   Epoch: 18   Global Step: 94900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:48:08,320-Speed 9896.92 samples/sec   Loss 7.8271   LearningRate 0.0282   Epoch: 18   Global Step: 94910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:48:09,278-Speed 10711.35 samples/sec   Loss 7.9364   LearningRate 0.0282   Epoch: 18   Global Step: 94920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:48:10,219-Speed 10896.36 samples/sec   Loss 7.7756   LearningRate 0.0282   Epoch: 18   Global Step: 94930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:11,215-Speed 10282.14 samples/sec   Loss 7.6636   LearningRate 0.0282   Epoch: 18   Global Step: 94940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:12,171-Speed 10720.44 samples/sec   Loss 7.7984   LearningRate 0.0282   Epoch: 18   Global Step: 94950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:13,096-Speed 11088.06 samples/sec   Loss 7.8794   LearningRate 0.0282   Epoch: 18   Global Step: 94960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:14,011-Speed 11210.26 samples/sec   Loss 7.8431   LearningRate 0.0282   Epoch: 18   Global Step: 94970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:14,942-Speed 11006.51 samples/sec   Loss 7.9096   LearningRate 0.0281   Epoch: 18   Global Step: 94980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:15,909-Speed 10591.25 samples/sec   Loss 7.9218   LearningRate 0.0281   Epoch: 18   Global Step: 94990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:16,838-Speed 11040.10 samples/sec   Loss 7.7040   LearningRate 0.0281   Epoch: 18   Global Step: 95000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:17,808-Speed 10558.69 samples/sec   Loss 7.8258   LearningRate 0.0281   Epoch: 18   Global Step: 95010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:18,763-Speed 10743.03 samples/sec   Loss 7.8107   LearningRate 0.0281   Epoch: 18   Global Step: 95020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:19,740-Speed 10493.22 samples/sec   Loss 7.9741   LearningRate 0.0281   Epoch: 18   Global Step: 95030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:20,703-Speed 10639.50 samples/sec   Loss 7.7536   LearningRate 0.0281   Epoch: 18   Global Step: 95040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:21,680-Speed 10490.25 samples/sec   Loss 7.8428   LearningRate 0.0281   Epoch: 18   Global Step: 95050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:22,614-Speed 10970.70 samples/sec   Loss 7.8993   LearningRate 0.0281   Epoch: 18   Global Step: 95060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:23,541-Speed 11048.28 samples/sec   Loss 7.8084   LearningRate 0.0281   Epoch: 18   Global Step: 95070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:24,510-Speed 10582.31 samples/sec   Loss 7.9051   LearningRate 0.0281   Epoch: 18   Global Step: 95080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:25,434-Speed 11095.21 samples/sec   Loss 7.8628   LearningRate 0.0281   Epoch: 18   Global Step: 95090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:26,422-Speed 10382.54 samples/sec   Loss 7.6967   LearningRate 0.0281   Epoch: 18   Global Step: 95100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:27,390-Speed 10588.65 samples/sec   Loss 8.0077   LearningRate 0.0281   Epoch: 18   Global Step: 95110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:28,378-Speed 10369.72 samples/sec   Loss 7.9481   LearningRate 0.0281   Epoch: 18   Global Step: 95120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:29,327-Speed 10800.28 samples/sec   Loss 7.9446   LearningRate 0.0281   Epoch: 18   Global Step: 95130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:48:30,306-Speed 10468.97 samples/sec   Loss 7.9075   LearningRate 0.0281   Epoch: 18   Global Step: 95140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:31,319-Speed 10117.45 samples/sec   Loss 7.8762   LearningRate 0.0281   Epoch: 18   Global Step: 95150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:32,277-Speed 10702.26 samples/sec   Loss 7.9773   LearningRate 0.0281   Epoch: 18   Global Step: 95160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:33,215-Speed 10920.79 samples/sec   Loss 8.0043   LearningRate 0.0280   Epoch: 18   Global Step: 95170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:34,283-Speed 9600.98 samples/sec   Loss 7.8105   LearningRate 0.0280   Epoch: 18   Global Step: 95180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:35,234-Speed 10776.08 samples/sec   Loss 7.9920   LearningRate 0.0280   Epoch: 18   Global Step: 95190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:36,156-Speed 11119.78 samples/sec   Loss 7.9281   LearningRate 0.0280   Epoch: 18   Global Step: 95200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:37,077-Speed 11135.23 samples/sec   Loss 7.9881   LearningRate 0.0280   Epoch: 18   Global Step: 95210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:38,036-Speed 10694.91 samples/sec   Loss 7.7579   LearningRate 0.0280   Epoch: 18   Global Step: 95220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:39,088-Speed 9744.21 samples/sec   Loss 7.9617   LearningRate 0.0280   Epoch: 18   Global Step: 95230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:40,042-Speed 10737.69 samples/sec   Loss 7.9572   LearningRate 0.0280   Epoch: 18   Global Step: 95240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:48:40,975-Speed 10987.79 samples/sec   Loss 7.8883   LearningRate 0.0280   Epoch: 18   Global Step: 95250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:48:41,934-Speed 10687.28 samples/sec   Loss 7.8128   LearningRate 0.0280   Epoch: 18   Global Step: 95260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:48:42,889-Speed 10738.73 samples/sec   Loss 8.0029   LearningRate 0.0280   Epoch: 18   Global Step: 95270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:48:43,820-Speed 11004.66 samples/sec   Loss 7.7795   LearningRate 0.0280   Epoch: 18   Global Step: 95280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:48:44,767-Speed 10828.19 samples/sec   Loss 7.8916   LearningRate 0.0280   Epoch: 18   Global Step: 95290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:48:45,700-Speed 10987.82 samples/sec   Loss 7.7912   LearningRate 0.0280   Epoch: 18   Global Step: 95300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:48:46,696-Speed 10291.38 samples/sec   Loss 7.7325   LearningRate 0.0280   Epoch: 18   Global Step: 95310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:48:47,668-Speed 10541.56 samples/sec   Loss 7.7239   LearningRate 0.0280   Epoch: 18   Global Step: 95320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:48:48,654-Speed 10395.41 samples/sec   Loss 7.8231   LearningRate 0.0280   Epoch: 18   Global Step: 95330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:48:49,626-Speed 10551.23 samples/sec   Loss 8.0014   LearningRate 0.0280   Epoch: 18   Global Step: 95340   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 02:48:50,538-Speed 11235.42 samples/sec   Loss 7.8742   LearningRate 0.0280   Epoch: 18   Global Step: 95350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:48:51,535-Speed 10277.33 samples/sec   Loss 7.9128   LearningRate 0.0279   Epoch: 18   Global Step: 95360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:52,514-Speed 10477.99 samples/sec   Loss 7.7487   LearningRate 0.0279   Epoch: 18   Global Step: 95370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:53,469-Speed 10733.22 samples/sec   Loss 7.9319   LearningRate 0.0279   Epoch: 18   Global Step: 95380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:54,467-Speed 10263.33 samples/sec   Loss 7.8723   LearningRate 0.0279   Epoch: 18   Global Step: 95390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:55,451-Speed 10429.22 samples/sec   Loss 7.6995   LearningRate 0.0279   Epoch: 18   Global Step: 95400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:56,392-Speed 10891.14 samples/sec   Loss 7.7780   LearningRate 0.0279   Epoch: 18   Global Step: 95410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:57,370-Speed 10472.46 samples/sec   Loss 7.7120   LearningRate 0.0279   Epoch: 18   Global Step: 95420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:58,374-Speed 10214.71 samples/sec   Loss 7.8197   LearningRate 0.0279   Epoch: 18   Global Step: 95430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:48:59,337-Speed 10639.31 samples/sec   Loss 7.9786   LearningRate 0.0279   Epoch: 18   Global Step: 95440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:00,310-Speed 10531.02 samples/sec   Loss 8.0654   LearningRate 0.0279   Epoch: 18   Global Step: 95450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:01,247-Speed 10945.24 samples/sec   Loss 7.9130   LearningRate 0.0279   Epoch: 18   Global Step: 95460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:02,215-Speed 10590.94 samples/sec   Loss 7.9389   LearningRate 0.0279   Epoch: 18   Global Step: 95470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:03,143-Speed 11041.89 samples/sec   Loss 7.8375   LearningRate 0.0279   Epoch: 18   Global Step: 95480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:04,047-Speed 11335.74 samples/sec   Loss 7.9180   LearningRate 0.0279   Epoch: 18   Global Step: 95490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:05,010-Speed 10637.59 samples/sec   Loss 7.9295   LearningRate 0.0279   Epoch: 18   Global Step: 95500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:05,940-Speed 11023.46 samples/sec   Loss 7.8649   LearningRate 0.0279   Epoch: 18   Global Step: 95510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:06,910-Speed 10565.67 samples/sec   Loss 7.9441   LearningRate 0.0279   Epoch: 18   Global Step: 95520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:07,848-Speed 10932.01 samples/sec   Loss 7.8766   LearningRate 0.0279   Epoch: 18   Global Step: 95530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:08,791-Speed 10867.11 samples/sec   Loss 7.9760   LearningRate 0.0279   Epoch: 18   Global Step: 95540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:09,768-Speed 10492.58 samples/sec   Loss 7.6765   LearningRate 0.0278   Epoch: 18   Global Step: 95550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:10,744-Speed 10502.93 samples/sec   Loss 7.8004   LearningRate 0.0278   Epoch: 18   Global Step: 95560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:11,689-Speed 10844.77 samples/sec   Loss 7.7463   LearningRate 0.0278   Epoch: 18   Global Step: 95570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:12,640-Speed 10772.88 samples/sec   Loss 7.7577   LearningRate 0.0278   Epoch: 18   Global Step: 95580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:13,625-Speed 10410.81 samples/sec   Loss 7.8463   LearningRate 0.0278   Epoch: 18   Global Step: 95590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:14,559-Speed 10977.37 samples/sec   Loss 7.8021   LearningRate 0.0278   Epoch: 18   Global Step: 95600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:15,498-Speed 10916.74 samples/sec   Loss 7.8750   LearningRate 0.0278   Epoch: 18   Global Step: 95610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:16,488-Speed 10349.37 samples/sec   Loss 7.8083   LearningRate 0.0278   Epoch: 18   Global Step: 95620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:17,454-Speed 10611.81 samples/sec   Loss 7.6965   LearningRate 0.0278   Epoch: 18   Global Step: 95630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:18,435-Speed 10451.17 samples/sec   Loss 7.9542   LearningRate 0.0278   Epoch: 18   Global Step: 95640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:19,364-Speed 11033.38 samples/sec   Loss 7.9718   LearningRate 0.0278   Epoch: 18   Global Step: 95650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:20,353-Speed 10365.52 samples/sec   Loss 7.7356   LearningRate 0.0278   Epoch: 18   Global Step: 95660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:21,320-Speed 10596.63 samples/sec   Loss 7.7885   LearningRate 0.0278   Epoch: 18   Global Step: 95670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:22,281-Speed 10676.09 samples/sec   Loss 7.8762   LearningRate 0.0278   Epoch: 18   Global Step: 95680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:23,236-Speed 10729.44 samples/sec   Loss 7.8755   LearningRate 0.0278   Epoch: 18   Global Step: 95690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:24,202-Speed 10612.33 samples/sec   Loss 7.8737   LearningRate 0.0278   Epoch: 18   Global Step: 95700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:25,113-Speed 11241.61 samples/sec   Loss 7.9047   LearningRate 0.0278   Epoch: 18   Global Step: 95710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:26,056-Speed 10879.83 samples/sec   Loss 7.7400   LearningRate 0.0278   Epoch: 18   Global Step: 95720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:27,003-Speed 10819.53 samples/sec   Loss 7.5740   LearningRate 0.0278   Epoch: 18   Global Step: 95730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:27,919-Speed 11195.49 samples/sec   Loss 7.9473   LearningRate 0.0278   Epoch: 18   Global Step: 95740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:28,908-Speed 10355.22 samples/sec   Loss 8.0080   LearningRate 0.0277   Epoch: 18   Global Step: 95750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:29,897-Speed 10368.01 samples/sec   Loss 7.8706   LearningRate 0.0277   Epoch: 18   Global Step: 95760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:30,874-Speed 10493.96 samples/sec   Loss 7.9176   LearningRate 0.0277   Epoch: 18   Global Step: 95770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:31,811-Speed 10942.88 samples/sec   Loss 7.8342   LearningRate 0.0277   Epoch: 18   Global Step: 95780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:32,781-Speed 10558.82 samples/sec   Loss 7.7974   LearningRate 0.0277   Epoch: 18   Global Step: 95790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:33,707-Speed 11071.48 samples/sec   Loss 7.7977   LearningRate 0.0277   Epoch: 18   Global Step: 95800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:34,621-Speed 11213.88 samples/sec   Loss 7.8723   LearningRate 0.0277   Epoch: 18   Global Step: 95810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:35,581-Speed 10674.39 samples/sec   Loss 7.8696   LearningRate 0.0277   Epoch: 18   Global Step: 95820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:36,485-Speed 11345.62 samples/sec   Loss 7.9592   LearningRate 0.0277   Epoch: 18   Global Step: 95830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:37,473-Speed 10370.37 samples/sec   Loss 7.7593   LearningRate 0.0277   Epoch: 18   Global Step: 95840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:38,433-Speed 10685.58 samples/sec   Loss 7.8455   LearningRate 0.0277   Epoch: 18   Global Step: 95850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:39,390-Speed 10712.18 samples/sec   Loss 7.7760   LearningRate 0.0277   Epoch: 18   Global Step: 95860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:40,392-Speed 10228.53 samples/sec   Loss 7.8540   LearningRate 0.0277   Epoch: 18   Global Step: 95870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:41,338-Speed 10832.41 samples/sec   Loss 8.0681   LearningRate 0.0277   Epoch: 18   Global Step: 95880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:42,364-Speed 9985.00 samples/sec   Loss 7.9721   LearningRate 0.0277   Epoch: 18   Global Step: 95890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:43,294-Speed 11028.56 samples/sec   Loss 7.8646   LearningRate 0.0277   Epoch: 18   Global Step: 95900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:44,239-Speed 10846.55 samples/sec   Loss 8.0690   LearningRate 0.0277   Epoch: 18   Global Step: 95910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:45,268-Speed 9965.28 samples/sec   Loss 7.8795   LearningRate 0.0277   Epoch: 18   Global Step: 95920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:46,246-Speed 10514.15 samples/sec   Loss 7.7950   LearningRate 0.0277   Epoch: 18   Global Step: 95930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:47,203-Speed 10703.45 samples/sec   Loss 7.9906   LearningRate 0.0276   Epoch: 18   Global Step: 95940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:48,196-Speed 10329.85 samples/sec   Loss 7.8591   LearningRate 0.0276   Epoch: 18   Global Step: 95950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:49,158-Speed 10644.11 samples/sec   Loss 7.7624   LearningRate 0.0276   Epoch: 18   Global Step: 95960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:50,112-Speed 10748.33 samples/sec   Loss 7.8800   LearningRate 0.0276   Epoch: 18   Global Step: 95970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:51,082-Speed 10561.19 samples/sec   Loss 7.8167   LearningRate 0.0276   Epoch: 18   Global Step: 95980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:49:52,051-Speed 10579.76 samples/sec   Loss 7.8461   LearningRate 0.0276   Epoch: 18   Global Step: 95990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:49:53,016-Speed 10620.99 samples/sec   Loss 7.7680   LearningRate 0.0276   Epoch: 18   Global Step: 96000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:50:15,457-[lfw][96000]XNorm: 10.994974
Training: 2022-04-11 02:50:15,458-[lfw][96000]Accuracy-Flip: 0.99633+-0.00314
Training: 2022-04-11 02:50:15,459-[lfw][96000]Accuracy-Highest: 0.99650
Training: 2022-04-11 02:50:41,225-[cfp_fp][96000]XNorm: 9.499207
Training: 2022-04-11 02:50:41,226-[cfp_fp][96000]Accuracy-Flip: 0.96014+-0.01039
Training: 2022-04-11 02:50:41,227-[cfp_fp][96000]Accuracy-Highest: 0.96171
Training: 2022-04-11 02:51:03,547-[agedb_30][96000]XNorm: 10.803999
Training: 2022-04-11 02:51:03,547-[agedb_30][96000]Accuracy-Flip: 0.96383+-0.00860
Training: 2022-04-11 02:51:03,547-[agedb_30][96000]Accuracy-Highest: 0.96517
Training: 2022-04-11 02:51:04,492-Speed 143.27 samples/sec   Loss 7.7996   LearningRate 0.0276   Epoch: 18   Global Step: 96010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:05,417-Speed 11087.47 samples/sec   Loss 7.6760   LearningRate 0.0276   Epoch: 18   Global Step: 96020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:06,376-Speed 10680.43 samples/sec   Loss 7.8401   LearningRate 0.0276   Epoch: 18   Global Step: 96030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:07,317-Speed 10892.83 samples/sec   Loss 7.8286   LearningRate 0.0276   Epoch: 18   Global Step: 96040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:08,370-Speed 9739.59 samples/sec   Loss 7.9189   LearningRate 0.0276   Epoch: 18   Global Step: 96050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:09,350-Speed 10459.11 samples/sec   Loss 7.8609   LearningRate 0.0276   Epoch: 18   Global Step: 96060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:10,268-Speed 11160.21 samples/sec   Loss 7.7405   LearningRate 0.0276   Epoch: 18   Global Step: 96070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:11,255-Speed 10388.49 samples/sec   Loss 7.8487   LearningRate 0.0276   Epoch: 18   Global Step: 96080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:12,256-Speed 10243.25 samples/sec   Loss 7.9181   LearningRate 0.0276   Epoch: 18   Global Step: 96090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:51:13,226-Speed 10566.13 samples/sec   Loss 7.7492   LearningRate 0.0276   Epoch: 18   Global Step: 96100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:25,119-Speed 861.12 samples/sec   Loss 7.1716   LearningRate 0.0276   Epoch: 19   Global Step: 96110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:26,123-Speed 10212.07 samples/sec   Loss 6.9964   LearningRate 0.0276   Epoch: 19   Global Step: 96120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:27,210-Speed 9431.88 samples/sec   Loss 7.0652   LearningRate 0.0275   Epoch: 19   Global Step: 96130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:28,233-Speed 10025.90 samples/sec   Loss 6.9712   LearningRate 0.0275   Epoch: 19   Global Step: 96140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:29,247-Speed 10107.42 samples/sec   Loss 6.9506   LearningRate 0.0275   Epoch: 19   Global Step: 96150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:30,320-Speed 9555.83 samples/sec   Loss 6.9988   LearningRate 0.0275   Epoch: 19   Global Step: 96160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:31,387-Speed 9608.48 samples/sec   Loss 7.0825   LearningRate 0.0275   Epoch: 19   Global Step: 96170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:32,402-Speed 10094.45 samples/sec   Loss 7.1563   LearningRate 0.0275   Epoch: 19   Global Step: 96180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:33,343-Speed 10903.03 samples/sec   Loss 7.0908   LearningRate 0.0275   Epoch: 19   Global Step: 96190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:34,240-Speed 11425.33 samples/sec   Loss 6.8943   LearningRate 0.0275   Epoch: 19   Global Step: 96200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:51:35,217-Speed 10486.09 samples/sec   Loss 6.7935   LearningRate 0.0275   Epoch: 19   Global Step: 96210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:51:36,153-Speed 10945.95 samples/sec   Loss 6.9746   LearningRate 0.0275   Epoch: 19   Global Step: 96220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:51:37,136-Speed 10439.08 samples/sec   Loss 6.9086   LearningRate 0.0275   Epoch: 19   Global Step: 96230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:51:38,199-Speed 9643.81 samples/sec   Loss 6.9199   LearningRate 0.0275   Epoch: 19   Global Step: 96240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:51:39,106-Speed 11302.64 samples/sec   Loss 7.0515   LearningRate 0.0275   Epoch: 19   Global Step: 96250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:40,109-Speed 10221.28 samples/sec   Loss 7.0918   LearningRate 0.0275   Epoch: 19   Global Step: 96260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:41,134-Speed 10003.01 samples/sec   Loss 7.1254   LearningRate 0.0275   Epoch: 19   Global Step: 96270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:42,157-Speed 10018.22 samples/sec   Loss 6.9277   LearningRate 0.0275   Epoch: 19   Global Step: 96280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:43,176-Speed 10059.99 samples/sec   Loss 6.9787   LearningRate 0.0275   Epoch: 19   Global Step: 96290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:44,180-Speed 10226.41 samples/sec   Loss 6.9875   LearningRate 0.0275   Epoch: 19   Global Step: 96300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:45,132-Speed 10764.43 samples/sec   Loss 7.0701   LearningRate 0.0275   Epoch: 19   Global Step: 96310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:46,059-Speed 11055.56 samples/sec   Loss 6.9637   LearningRate 0.0274   Epoch: 19   Global Step: 96320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:47,016-Speed 10706.38 samples/sec   Loss 7.0172   LearningRate 0.0274   Epoch: 19   Global Step: 96330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:48,004-Speed 10376.10 samples/sec   Loss 7.0050   LearningRate 0.0274   Epoch: 19   Global Step: 96340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:51:48,981-Speed 10492.41 samples/sec   Loss 7.0267   LearningRate 0.0274   Epoch: 19   Global Step: 96350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:51:49,947-Speed 10611.88 samples/sec   Loss 7.0141   LearningRate 0.0274   Epoch: 19   Global Step: 96360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:51:50,887-Speed 10906.57 samples/sec   Loss 6.9542   LearningRate 0.0274   Epoch: 19   Global Step: 96370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:51:51,848-Speed 10672.37 samples/sec   Loss 7.0447   LearningRate 0.0274   Epoch: 19   Global Step: 96380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:51:52,805-Speed 10720.36 samples/sec   Loss 6.9522   LearningRate 0.0274   Epoch: 19   Global Step: 96390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:51:53,779-Speed 10513.48 samples/sec   Loss 7.2098   LearningRate 0.0274   Epoch: 19   Global Step: 96400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:51:54,738-Speed 10686.83 samples/sec   Loss 7.0666   LearningRate 0.0274   Epoch: 19   Global Step: 96410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:51:55,718-Speed 10467.51 samples/sec   Loss 7.1903   LearningRate 0.0274   Epoch: 19   Global Step: 96420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:51:56,676-Speed 10700.14 samples/sec   Loss 7.0516   LearningRate 0.0274   Epoch: 19   Global Step: 96430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:51:57,657-Speed 10443.25 samples/sec   Loss 7.1497   LearningRate 0.0274   Epoch: 19   Global Step: 96440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:51:58,656-Speed 10264.56 samples/sec   Loss 7.0726   LearningRate 0.0274   Epoch: 19   Global Step: 96450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:51:59,594-Speed 10936.75 samples/sec   Loss 7.0769   LearningRate 0.0274   Epoch: 19   Global Step: 96460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:52:00,531-Speed 10935.38 samples/sec   Loss 7.2634   LearningRate 0.0274   Epoch: 19   Global Step: 96470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:52:01,484-Speed 10756.08 samples/sec   Loss 7.2900   LearningRate 0.0274   Epoch: 19   Global Step: 96480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:52:02,502-Speed 10068.12 samples/sec   Loss 7.1321   LearningRate 0.0274   Epoch: 19   Global Step: 96490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:52:03,486-Speed 10422.45 samples/sec   Loss 7.1800   LearningRate 0.0274   Epoch: 19   Global Step: 96500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:52:04,459-Speed 10539.25 samples/sec   Loss 7.1490   LearningRate 0.0274   Epoch: 19   Global Step: 96510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:05,412-Speed 10752.79 samples/sec   Loss 7.1136   LearningRate 0.0273   Epoch: 19   Global Step: 96520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:06,367-Speed 10728.85 samples/sec   Loss 7.4004   LearningRate 0.0273   Epoch: 19   Global Step: 96530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:07,336-Speed 10581.76 samples/sec   Loss 7.1398   LearningRate 0.0273   Epoch: 19   Global Step: 96540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:08,292-Speed 10717.19 samples/sec   Loss 7.2502   LearningRate 0.0273   Epoch: 19   Global Step: 96550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:09,212-Speed 11143.13 samples/sec   Loss 7.4216   LearningRate 0.0273   Epoch: 19   Global Step: 96560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:10,192-Speed 10458.62 samples/sec   Loss 7.2505   LearningRate 0.0273   Epoch: 19   Global Step: 96570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:11,199-Speed 10177.39 samples/sec   Loss 7.2432   LearningRate 0.0273   Epoch: 19   Global Step: 96580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:12,158-Speed 10690.88 samples/sec   Loss 7.3565   LearningRate 0.0273   Epoch: 19   Global Step: 96590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:13,147-Speed 10367.71 samples/sec   Loss 7.0896   LearningRate 0.0273   Epoch: 19   Global Step: 96600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:14,075-Speed 11048.21 samples/sec   Loss 7.3957   LearningRate 0.0273   Epoch: 19   Global Step: 96610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:52:15,087-Speed 10137.72 samples/sec   Loss 7.2813   LearningRate 0.0273   Epoch: 19   Global Step: 96620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:52:16,042-Speed 10734.18 samples/sec   Loss 7.2826   LearningRate 0.0273   Epoch: 19   Global Step: 96630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:52:16,963-Speed 11117.66 samples/sec   Loss 7.3391   LearningRate 0.0273   Epoch: 19   Global Step: 96640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:52:17,973-Speed 10150.72 samples/sec   Loss 7.2183   LearningRate 0.0273   Epoch: 19   Global Step: 96650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:52:18,945-Speed 10548.64 samples/sec   Loss 7.3259   LearningRate 0.0273   Epoch: 19   Global Step: 96660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:52:19,931-Speed 10396.24 samples/sec   Loss 7.3532   LearningRate 0.0273   Epoch: 19   Global Step: 96670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:52:20,905-Speed 10516.23 samples/sec   Loss 7.1528   LearningRate 0.0273   Epoch: 19   Global Step: 96680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:52:21,896-Speed 10345.08 samples/sec   Loss 7.3046   LearningRate 0.0273   Epoch: 19   Global Step: 96690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:52:22,873-Speed 10495.48 samples/sec   Loss 7.1214   LearningRate 0.0273   Epoch: 19   Global Step: 96700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:52:23,863-Speed 10351.88 samples/sec   Loss 7.3427   LearningRate 0.0272   Epoch: 19   Global Step: 96710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:52:24,821-Speed 10691.28 samples/sec   Loss 7.2699   LearningRate 0.0272   Epoch: 19   Global Step: 96720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:25,784-Speed 10649.43 samples/sec   Loss 7.4641   LearningRate 0.0272   Epoch: 19   Global Step: 96730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:26,763-Speed 10468.00 samples/sec   Loss 7.2721   LearningRate 0.0272   Epoch: 19   Global Step: 96740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:27,698-Speed 10966.44 samples/sec   Loss 7.2779   LearningRate 0.0272   Epoch: 19   Global Step: 96750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:28,658-Speed 10671.70 samples/sec   Loss 7.3868   LearningRate 0.0272   Epoch: 19   Global Step: 96760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:29,675-Speed 10083.06 samples/sec   Loss 7.2472   LearningRate 0.0272   Epoch: 19   Global Step: 96770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:30,711-Speed 9891.52 samples/sec   Loss 7.3943   LearningRate 0.0272   Epoch: 19   Global Step: 96780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:31,679-Speed 10586.64 samples/sec   Loss 7.4205   LearningRate 0.0272   Epoch: 19   Global Step: 96790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:32,631-Speed 10769.55 samples/sec   Loss 7.2658   LearningRate 0.0272   Epoch: 19   Global Step: 96800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:33,640-Speed 10154.57 samples/sec   Loss 7.3962   LearningRate 0.0272   Epoch: 19   Global Step: 96810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:34,584-Speed 10861.04 samples/sec   Loss 7.3835   LearningRate 0.0272   Epoch: 19   Global Step: 96820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:52:35,514-Speed 11022.80 samples/sec   Loss 7.1939   LearningRate 0.0272   Epoch: 19   Global Step: 96830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:52:36,457-Speed 10865.63 samples/sec   Loss 7.4091   LearningRate 0.0272   Epoch: 19   Global Step: 96840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:37,421-Speed 10638.77 samples/sec   Loss 7.4062   LearningRate 0.0272   Epoch: 19   Global Step: 96850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:38,494-Speed 9550.77 samples/sec   Loss 7.3884   LearningRate 0.0272   Epoch: 19   Global Step: 96860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:39,453-Speed 10689.88 samples/sec   Loss 7.3985   LearningRate 0.0272   Epoch: 19   Global Step: 96870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:40,410-Speed 10719.52 samples/sec   Loss 7.1900   LearningRate 0.0272   Epoch: 19   Global Step: 96880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:41,364-Speed 10732.26 samples/sec   Loss 7.3373   LearningRate 0.0272   Epoch: 19   Global Step: 96890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:42,361-Speed 10283.52 samples/sec   Loss 7.4690   LearningRate 0.0271   Epoch: 19   Global Step: 96900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:43,298-Speed 10944.84 samples/sec   Loss 7.4583   LearningRate 0.0271   Epoch: 19   Global Step: 96910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:44,194-Speed 11435.51 samples/sec   Loss 7.4409   LearningRate 0.0271   Epoch: 19   Global Step: 96920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:45,168-Speed 10522.42 samples/sec   Loss 7.3234   LearningRate 0.0271   Epoch: 19   Global Step: 96930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:46,129-Speed 10669.44 samples/sec   Loss 7.5035   LearningRate 0.0271   Epoch: 19   Global Step: 96940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:52:47,138-Speed 10154.46 samples/sec   Loss 7.3247   LearningRate 0.0271   Epoch: 19   Global Step: 96950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:52:48,078-Speed 10911.06 samples/sec   Loss 7.3751   LearningRate 0.0271   Epoch: 19   Global Step: 96960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:52:49,020-Speed 10878.42 samples/sec   Loss 7.4258   LearningRate 0.0271   Epoch: 19   Global Step: 96970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:52:49,963-Speed 10870.86 samples/sec   Loss 7.3126   LearningRate 0.0271   Epoch: 19   Global Step: 96980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:52:50,962-Speed 10507.34 samples/sec   Loss 7.4046   LearningRate 0.0271   Epoch: 19   Global Step: 96990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:52:51,942-Speed 10465.29 samples/sec   Loss 7.4647   LearningRate 0.0271   Epoch: 19   Global Step: 97000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:52,900-Speed 10691.13 samples/sec   Loss 7.4318   LearningRate 0.0271   Epoch: 19   Global Step: 97010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:53,848-Speed 10814.93 samples/sec   Loss 7.3083   LearningRate 0.0271   Epoch: 19   Global Step: 97020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:54,771-Speed 11104.34 samples/sec   Loss 7.4271   LearningRate 0.0271   Epoch: 19   Global Step: 97030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:55,711-Speed 10910.81 samples/sec   Loss 7.4845   LearningRate 0.0271   Epoch: 19   Global Step: 97040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:56,663-Speed 10765.93 samples/sec   Loss 7.3486   LearningRate 0.0271   Epoch: 19   Global Step: 97050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:57,669-Speed 10188.21 samples/sec   Loss 7.2332   LearningRate 0.0271   Epoch: 19   Global Step: 97060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:58,614-Speed 10855.38 samples/sec   Loss 7.5051   LearningRate 0.0271   Epoch: 19   Global Step: 97070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:52:59,609-Speed 10297.78 samples/sec   Loss 7.5361   LearningRate 0.0271   Epoch: 19   Global Step: 97080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:00,612-Speed 10215.63 samples/sec   Loss 7.4106   LearningRate 0.0271   Epoch: 19   Global Step: 97090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:01,640-Speed 9970.07 samples/sec   Loss 7.4737   LearningRate 0.0270   Epoch: 19   Global Step: 97100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:53:02,609-Speed 10583.09 samples/sec   Loss 7.5562   LearningRate 0.0270   Epoch: 19   Global Step: 97110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:53:03,570-Speed 10659.54 samples/sec   Loss 7.5177   LearningRate 0.0270   Epoch: 19   Global Step: 97120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:53:04,528-Speed 10701.13 samples/sec   Loss 7.5338   LearningRate 0.0270   Epoch: 19   Global Step: 97130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:53:05,539-Speed 10134.98 samples/sec   Loss 7.5645   LearningRate 0.0270   Epoch: 19   Global Step: 97140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:53:06,454-Speed 11205.80 samples/sec   Loss 7.3939   LearningRate 0.0270   Epoch: 19   Global Step: 97150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:53:07,441-Speed 10381.43 samples/sec   Loss 7.3741   LearningRate 0.0270   Epoch: 19   Global Step: 97160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:53:08,406-Speed 10629.12 samples/sec   Loss 7.4424   LearningRate 0.0270   Epoch: 19   Global Step: 97170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:53:09,370-Speed 10627.96 samples/sec   Loss 7.2573   LearningRate 0.0270   Epoch: 19   Global Step: 97180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:53:10,311-Speed 10891.63 samples/sec   Loss 7.5563   LearningRate 0.0270   Epoch: 19   Global Step: 97190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:53:11,323-Speed 10129.79 samples/sec   Loss 7.4847   LearningRate 0.0270   Epoch: 19   Global Step: 97200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:53:12,260-Speed 10941.41 samples/sec   Loss 7.5449   LearningRate 0.0270   Epoch: 19   Global Step: 97210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:53:13,159-Speed 11404.78 samples/sec   Loss 7.5000   LearningRate 0.0270   Epoch: 19   Global Step: 97220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:53:14,175-Speed 10082.73 samples/sec   Loss 7.4255   LearningRate 0.0270   Epoch: 19   Global Step: 97230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:53:15,140-Speed 10627.16 samples/sec   Loss 7.5405   LearningRate 0.0270   Epoch: 19   Global Step: 97240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:16,086-Speed 10829.47 samples/sec   Loss 7.3639   LearningRate 0.0270   Epoch: 19   Global Step: 97250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:17,084-Speed 10266.54 samples/sec   Loss 7.4775   LearningRate 0.0270   Epoch: 19   Global Step: 97260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:18,103-Speed 10063.39 samples/sec   Loss 7.4247   LearningRate 0.0270   Epoch: 19   Global Step: 97270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:19,043-Speed 10897.38 samples/sec   Loss 7.4065   LearningRate 0.0270   Epoch: 19   Global Step: 97280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:20,046-Speed 10226.89 samples/sec   Loss 7.4757   LearningRate 0.0269   Epoch: 19   Global Step: 97290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:21,053-Speed 10169.09 samples/sec   Loss 7.6247   LearningRate 0.0269   Epoch: 19   Global Step: 97300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:22,028-Speed 10521.97 samples/sec   Loss 7.4996   LearningRate 0.0269   Epoch: 19   Global Step: 97310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:22,986-Speed 10706.49 samples/sec   Loss 7.5265   LearningRate 0.0269   Epoch: 19   Global Step: 97320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:23,906-Speed 11139.88 samples/sec   Loss 7.5621   LearningRate 0.0269   Epoch: 19   Global Step: 97330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:24,917-Speed 10136.92 samples/sec   Loss 7.3945   LearningRate 0.0269   Epoch: 19   Global Step: 97340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:53:25,855-Speed 10924.61 samples/sec   Loss 7.5785   LearningRate 0.0269   Epoch: 19   Global Step: 97350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:53:26,797-Speed 10882.28 samples/sec   Loss 7.4861   LearningRate 0.0269   Epoch: 19   Global Step: 97360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:53:27,763-Speed 10612.31 samples/sec   Loss 7.5397   LearningRate 0.0269   Epoch: 19   Global Step: 97370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:53:28,730-Speed 10604.03 samples/sec   Loss 7.4935   LearningRate 0.0269   Epoch: 19   Global Step: 97380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:53:29,705-Speed 10513.59 samples/sec   Loss 7.5131   LearningRate 0.0269   Epoch: 19   Global Step: 97390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:30,665-Speed 10668.63 samples/sec   Loss 7.4504   LearningRate 0.0269   Epoch: 19   Global Step: 97400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:31,634-Speed 10580.70 samples/sec   Loss 7.3729   LearningRate 0.0269   Epoch: 19   Global Step: 97410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:32,672-Speed 9870.32 samples/sec   Loss 7.4919   LearningRate 0.0269   Epoch: 19   Global Step: 97420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:33,634-Speed 10661.31 samples/sec   Loss 7.4397   LearningRate 0.0269   Epoch: 19   Global Step: 97430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:34,588-Speed 10752.69 samples/sec   Loss 7.6054   LearningRate 0.0269   Epoch: 19   Global Step: 97440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:35,536-Speed 10813.00 samples/sec   Loss 7.4351   LearningRate 0.0269   Epoch: 19   Global Step: 97450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:36,485-Speed 10790.53 samples/sec   Loss 7.4291   LearningRate 0.0269   Epoch: 19   Global Step: 97460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:37,507-Speed 10031.69 samples/sec   Loss 7.5610   LearningRate 0.0269   Epoch: 19   Global Step: 97470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:38,483-Speed 10510.49 samples/sec   Loss 7.4496   LearningRate 0.0269   Epoch: 19   Global Step: 97480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:39,462-Speed 10467.11 samples/sec   Loss 7.4855   LearningRate 0.0268   Epoch: 19   Global Step: 97490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:53:40,489-Speed 9979.31 samples/sec   Loss 7.5267   LearningRate 0.0268   Epoch: 19   Global Step: 97500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:53:41,430-Speed 10898.80 samples/sec   Loss 7.4161   LearningRate 0.0268   Epoch: 19   Global Step: 97510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:42,400-Speed 10571.29 samples/sec   Loss 7.7535   LearningRate 0.0268   Epoch: 19   Global Step: 97520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:43,412-Speed 10137.09 samples/sec   Loss 7.5758   LearningRate 0.0268   Epoch: 19   Global Step: 97530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:44,354-Speed 10878.08 samples/sec   Loss 7.3987   LearningRate 0.0268   Epoch: 19   Global Step: 97540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:45,330-Speed 10500.62 samples/sec   Loss 7.4351   LearningRate 0.0268   Epoch: 19   Global Step: 97550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:46,295-Speed 10628.68 samples/sec   Loss 7.4693   LearningRate 0.0268   Epoch: 19   Global Step: 97560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:47,362-Speed 9625.95 samples/sec   Loss 7.4899   LearningRate 0.0268   Epoch: 19   Global Step: 97570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:48,314-Speed 10768.10 samples/sec   Loss 7.6136   LearningRate 0.0268   Epoch: 19   Global Step: 97580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:49,264-Speed 10780.75 samples/sec   Loss 7.4729   LearningRate 0.0268   Epoch: 19   Global Step: 97590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:50,243-Speed 10479.58 samples/sec   Loss 7.5479   LearningRate 0.0268   Epoch: 19   Global Step: 97600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:53:51,197-Speed 10750.24 samples/sec   Loss 7.4641   LearningRate 0.0268   Epoch: 19   Global Step: 97610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:53:52,187-Speed 10352.37 samples/sec   Loss 7.4063   LearningRate 0.0268   Epoch: 19   Global Step: 97620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:53:53,196-Speed 10157.26 samples/sec   Loss 7.4715   LearningRate 0.0268   Epoch: 19   Global Step: 97630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:53:54,168-Speed 10556.28 samples/sec   Loss 7.6610   LearningRate 0.0268   Epoch: 19   Global Step: 97640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:53:55,072-Speed 11340.23 samples/sec   Loss 7.4696   LearningRate 0.0268   Epoch: 19   Global Step: 97650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:53:56,010-Speed 10930.50 samples/sec   Loss 7.5671   LearningRate 0.0268   Epoch: 19   Global Step: 97660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:53:56,978-Speed 10581.50 samples/sec   Loss 7.4183   LearningRate 0.0268   Epoch: 19   Global Step: 97670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:53:57,991-Speed 10125.28 samples/sec   Loss 7.5141   LearningRate 0.0267   Epoch: 19   Global Step: 97680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:53:58,945-Speed 10745.06 samples/sec   Loss 7.4756   LearningRate 0.0267   Epoch: 19   Global Step: 97690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:53:59,900-Speed 10731.96 samples/sec   Loss 7.4394   LearningRate 0.0267   Epoch: 19   Global Step: 97700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:54:00,855-Speed 10731.29 samples/sec   Loss 7.6404   LearningRate 0.0267   Epoch: 19   Global Step: 97710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:54:01,837-Speed 10445.33 samples/sec   Loss 7.6653   LearningRate 0.0267   Epoch: 19   Global Step: 97720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:54:02,807-Speed 10569.40 samples/sec   Loss 7.4624   LearningRate 0.0267   Epoch: 19   Global Step: 97730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:54:03,759-Speed 10761.93 samples/sec   Loss 7.5826   LearningRate 0.0267   Epoch: 19   Global Step: 97740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:54:04,721-Speed 10659.84 samples/sec   Loss 7.6108   LearningRate 0.0267   Epoch: 19   Global Step: 97750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:54:05,680-Speed 10678.86 samples/sec   Loss 7.5912   LearningRate 0.0267   Epoch: 19   Global Step: 97760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:06,620-Speed 10912.51 samples/sec   Loss 7.6444   LearningRate 0.0267   Epoch: 19   Global Step: 97770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:07,648-Speed 9966.84 samples/sec   Loss 7.5673   LearningRate 0.0267   Epoch: 19   Global Step: 97780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:08,625-Speed 10493.48 samples/sec   Loss 7.4856   LearningRate 0.0267   Epoch: 19   Global Step: 97790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:09,653-Speed 9978.77 samples/sec   Loss 7.3965   LearningRate 0.0267   Epoch: 19   Global Step: 97800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:10,683-Speed 9949.13 samples/sec   Loss 7.4624   LearningRate 0.0267   Epoch: 19   Global Step: 97810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:11,616-Speed 10988.71 samples/sec   Loss 7.5910   LearningRate 0.0267   Epoch: 19   Global Step: 97820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:12,523-Speed 11308.15 samples/sec   Loss 7.6822   LearningRate 0.0267   Epoch: 19   Global Step: 97830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:13,506-Speed 10419.71 samples/sec   Loss 7.7573   LearningRate 0.0267   Epoch: 19   Global Step: 97840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:14,511-Speed 10201.18 samples/sec   Loss 7.6509   LearningRate 0.0267   Epoch: 19   Global Step: 97850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:15,475-Speed 10646.49 samples/sec   Loss 7.6948   LearningRate 0.0267   Epoch: 19   Global Step: 97860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:54:16,420-Speed 10858.01 samples/sec   Loss 7.7138   LearningRate 0.0267   Epoch: 19   Global Step: 97870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:54:17,313-Speed 11484.91 samples/sec   Loss 7.5866   LearningRate 0.0266   Epoch: 19   Global Step: 97880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:18,277-Speed 10626.59 samples/sec   Loss 7.6147   LearningRate 0.0266   Epoch: 19   Global Step: 97890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:19,242-Speed 10616.33 samples/sec   Loss 7.5567   LearningRate 0.0266   Epoch: 19   Global Step: 97900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:20,202-Speed 10686.98 samples/sec   Loss 7.4627   LearningRate 0.0266   Epoch: 19   Global Step: 97910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:21,185-Speed 10424.70 samples/sec   Loss 7.5847   LearningRate 0.0266   Epoch: 19   Global Step: 97920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:22,202-Speed 10081.62 samples/sec   Loss 7.6597   LearningRate 0.0266   Epoch: 19   Global Step: 97930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:23,123-Speed 11132.59 samples/sec   Loss 7.6887   LearningRate 0.0266   Epoch: 19   Global Step: 97940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:24,071-Speed 10805.28 samples/sec   Loss 7.6532   LearningRate 0.0266   Epoch: 19   Global Step: 97950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:25,054-Speed 10430.05 samples/sec   Loss 7.5437   LearningRate 0.0266   Epoch: 19   Global Step: 97960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:26,019-Speed 10621.42 samples/sec   Loss 7.7002   LearningRate 0.0266   Epoch: 19   Global Step: 97970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:27,021-Speed 10224.15 samples/sec   Loss 7.5787   LearningRate 0.0266   Epoch: 19   Global Step: 97980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:54:28,024-Speed 10221.74 samples/sec   Loss 7.6056   LearningRate 0.0266   Epoch: 19   Global Step: 97990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:54:28,978-Speed 10749.29 samples/sec   Loss 7.6327   LearningRate 0.0266   Epoch: 19   Global Step: 98000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:54:51,149-[lfw][98000]XNorm: 10.974050
Training: 2022-04-11 02:54:51,150-[lfw][98000]Accuracy-Flip: 0.99667+-0.00325
Training: 2022-04-11 02:54:51,150-[lfw][98000]Accuracy-Highest: 0.99667
Training: 2022-04-11 02:55:16,573-[cfp_fp][98000]XNorm: 9.334824
Training: 2022-04-11 02:55:16,574-[cfp_fp][98000]Accuracy-Flip: 0.96114+-0.01036
Training: 2022-04-11 02:55:16,575-[cfp_fp][98000]Accuracy-Highest: 0.96171
Training: 2022-04-11 02:55:38,527-[agedb_30][98000]XNorm: 10.768596
Training: 2022-04-11 02:55:38,527-[agedb_30][98000]Accuracy-Flip: 0.96733+-0.00750
Training: 2022-04-11 02:55:38,527-[agedb_30][98000]Accuracy-Highest: 0.96733
Training: 2022-04-11 02:55:39,459-Speed 145.29 samples/sec   Loss 7.6956   LearningRate 0.0266   Epoch: 19   Global Step: 98010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:55:40,417-Speed 10701.48 samples/sec   Loss 7.7188   LearningRate 0.0266   Epoch: 19   Global Step: 98020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:55:41,448-Speed 9940.57 samples/sec   Loss 7.6165   LearningRate 0.0266   Epoch: 19   Global Step: 98030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:55:42,442-Speed 10308.23 samples/sec   Loss 7.5910   LearningRate 0.0266   Epoch: 19   Global Step: 98040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:55:43,378-Speed 10951.45 samples/sec   Loss 7.3694   LearningRate 0.0266   Epoch: 19   Global Step: 98050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:55:44,327-Speed 10800.64 samples/sec   Loss 7.6350   LearningRate 0.0266   Epoch: 19   Global Step: 98060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:55:45,272-Speed 10838.02 samples/sec   Loss 7.4759   LearningRate 0.0266   Epoch: 19   Global Step: 98070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:55:46,166-Speed 11467.36 samples/sec   Loss 7.4510   LearningRate 0.0265   Epoch: 19   Global Step: 98080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:55:47,121-Speed 10742.92 samples/sec   Loss 7.6258   LearningRate 0.0265   Epoch: 19   Global Step: 98090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:55:48,097-Speed 10493.98 samples/sec   Loss 7.6385   LearningRate 0.0265   Epoch: 19   Global Step: 98100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:55:49,117-Speed 10057.11 samples/sec   Loss 7.6150   LearningRate 0.0265   Epoch: 19   Global Step: 98110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:55:50,063-Speed 10831.45 samples/sec   Loss 7.7780   LearningRate 0.0265   Epoch: 19   Global Step: 98120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:55:51,030-Speed 10590.54 samples/sec   Loss 7.5449   LearningRate 0.0265   Epoch: 19   Global Step: 98130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:55:52,032-Speed 10234.63 samples/sec   Loss 7.5539   LearningRate 0.0265   Epoch: 19   Global Step: 98140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:55:52,972-Speed 10902.97 samples/sec   Loss 7.6641   LearningRate 0.0265   Epoch: 19   Global Step: 98150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:55:53,896-Speed 11099.19 samples/sec   Loss 7.6949   LearningRate 0.0265   Epoch: 19   Global Step: 98160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:55:54,838-Speed 10870.85 samples/sec   Loss 7.6400   LearningRate 0.0265   Epoch: 19   Global Step: 98170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:55:55,790-Speed 10771.65 samples/sec   Loss 7.5798   LearningRate 0.0265   Epoch: 19   Global Step: 98180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:55:56,742-Speed 10758.78 samples/sec   Loss 7.5963   LearningRate 0.0265   Epoch: 19   Global Step: 98190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:55:57,696-Speed 10742.84 samples/sec   Loss 7.5597   LearningRate 0.0265   Epoch: 19   Global Step: 98200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:55:58,673-Speed 10494.98 samples/sec   Loss 7.7377   LearningRate 0.0265   Epoch: 19   Global Step: 98210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:55:59,637-Speed 10625.73 samples/sec   Loss 7.7358   LearningRate 0.0265   Epoch: 19   Global Step: 98220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:56:00,652-Speed 10099.02 samples/sec   Loss 7.4503   LearningRate 0.0265   Epoch: 19   Global Step: 98230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:56:01,596-Speed 10859.84 samples/sec   Loss 7.6000   LearningRate 0.0265   Epoch: 19   Global Step: 98240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:02,550-Speed 10749.44 samples/sec   Loss 7.7224   LearningRate 0.0265   Epoch: 19   Global Step: 98250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:03,568-Speed 10061.06 samples/sec   Loss 7.6406   LearningRate 0.0265   Epoch: 19   Global Step: 98260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:04,513-Speed 10854.86 samples/sec   Loss 7.6292   LearningRate 0.0264   Epoch: 19   Global Step: 98270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:05,443-Speed 11016.00 samples/sec   Loss 7.6551   LearningRate 0.0264   Epoch: 19   Global Step: 98280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:06,417-Speed 10527.78 samples/sec   Loss 7.6894   LearningRate 0.0264   Epoch: 19   Global Step: 98290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:07,366-Speed 10792.45 samples/sec   Loss 7.5877   LearningRate 0.0264   Epoch: 19   Global Step: 98300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:08,369-Speed 10229.94 samples/sec   Loss 7.6072   LearningRate 0.0264   Epoch: 19   Global Step: 98310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:09,314-Speed 10850.12 samples/sec   Loss 7.6883   LearningRate 0.0264   Epoch: 19   Global Step: 98320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:10,212-Speed 11412.23 samples/sec   Loss 7.7139   LearningRate 0.0264   Epoch: 19   Global Step: 98330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:11,177-Speed 10621.46 samples/sec   Loss 7.6240   LearningRate 0.0264   Epoch: 19   Global Step: 98340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:56:12,159-Speed 10438.49 samples/sec   Loss 7.6161   LearningRate 0.0264   Epoch: 19   Global Step: 98350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:56:13,114-Speed 10727.68 samples/sec   Loss 7.5721   LearningRate 0.0264   Epoch: 19   Global Step: 98360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:56:14,090-Speed 10506.48 samples/sec   Loss 7.5391   LearningRate 0.0264   Epoch: 19   Global Step: 98370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:56:15,008-Speed 11166.30 samples/sec   Loss 7.8962   LearningRate 0.0264   Epoch: 19   Global Step: 98380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:56:15,900-Speed 11494.66 samples/sec   Loss 7.6472   LearningRate 0.0264   Epoch: 19   Global Step: 98390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:56:16,835-Speed 10958.28 samples/sec   Loss 7.6949   LearningRate 0.0264   Epoch: 19   Global Step: 98400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:56:17,802-Speed 10607.95 samples/sec   Loss 7.6206   LearningRate 0.0264   Epoch: 19   Global Step: 98410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:56:18,776-Speed 10518.17 samples/sec   Loss 7.5474   LearningRate 0.0264   Epoch: 19   Global Step: 98420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:19,742-Speed 10609.44 samples/sec   Loss 7.5838   LearningRate 0.0264   Epoch: 19   Global Step: 98430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:20,695-Speed 10749.60 samples/sec   Loss 7.7593   LearningRate 0.0264   Epoch: 19   Global Step: 98440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:21,646-Speed 10786.57 samples/sec   Loss 7.6477   LearningRate 0.0264   Epoch: 19   Global Step: 98450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:22,619-Speed 10540.44 samples/sec   Loss 7.7562   LearningRate 0.0264   Epoch: 19   Global Step: 98460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:23,605-Speed 10390.22 samples/sec   Loss 7.8393   LearningRate 0.0263   Epoch: 19   Global Step: 98470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:24,582-Speed 10498.10 samples/sec   Loss 7.6083   LearningRate 0.0263   Epoch: 19   Global Step: 98480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:25,509-Speed 11054.53 samples/sec   Loss 7.5924   LearningRate 0.0263   Epoch: 19   Global Step: 98490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:26,468-Speed 10695.34 samples/sec   Loss 7.7037   LearningRate 0.0263   Epoch: 19   Global Step: 98500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:27,458-Speed 10347.98 samples/sec   Loss 7.8030   LearningRate 0.0263   Epoch: 19   Global Step: 98510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:28,454-Speed 10289.69 samples/sec   Loss 7.7289   LearningRate 0.0263   Epoch: 19   Global Step: 98520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:56:29,421-Speed 10606.89 samples/sec   Loss 7.8236   LearningRate 0.0263   Epoch: 19   Global Step: 98530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:56:30,346-Speed 11079.24 samples/sec   Loss 7.6994   LearningRate 0.0263   Epoch: 19   Global Step: 98540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:31,345-Speed 10257.95 samples/sec   Loss 7.6435   LearningRate 0.0263   Epoch: 19   Global Step: 98550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:56:32,349-Speed 10204.22 samples/sec   Loss 7.6278   LearningRate 0.0263   Epoch: 19   Global Step: 98560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:56:33,281-Speed 11001.69 samples/sec   Loss 7.4742   LearningRate 0.0263   Epoch: 19   Global Step: 98570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:56:34,207-Speed 11060.98 samples/sec   Loss 7.8550   LearningRate 0.0263   Epoch: 19   Global Step: 98580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:56:35,205-Speed 10275.60 samples/sec   Loss 7.6576   LearningRate 0.0263   Epoch: 19   Global Step: 98590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:56:36,205-Speed 10248.96 samples/sec   Loss 7.6606   LearningRate 0.0263   Epoch: 19   Global Step: 98600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:56:37,188-Speed 10426.63 samples/sec   Loss 7.6405   LearningRate 0.0263   Epoch: 19   Global Step: 98610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:56:38,126-Speed 10927.73 samples/sec   Loss 7.6320   LearningRate 0.0263   Epoch: 19   Global Step: 98620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:56:39,067-Speed 10890.21 samples/sec   Loss 7.7146   LearningRate 0.0263   Epoch: 19   Global Step: 98630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:56:40,027-Speed 10678.97 samples/sec   Loss 7.6522   LearningRate 0.0263   Epoch: 19   Global Step: 98640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:56:41,000-Speed 10535.54 samples/sec   Loss 7.7238   LearningRate 0.0263   Epoch: 19   Global Step: 98650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:41,926-Speed 11075.83 samples/sec   Loss 7.4429   LearningRate 0.0263   Epoch: 19   Global Step: 98660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:42,969-Speed 9825.69 samples/sec   Loss 7.5847   LearningRate 0.0262   Epoch: 19   Global Step: 98670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:43,930-Speed 10660.65 samples/sec   Loss 7.8215   LearningRate 0.0262   Epoch: 19   Global Step: 98680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:44,860-Speed 11026.66 samples/sec   Loss 7.6308   LearningRate 0.0262   Epoch: 19   Global Step: 98690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:45,754-Speed 11467.14 samples/sec   Loss 7.7283   LearningRate 0.0262   Epoch: 19   Global Step: 98700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:46,746-Speed 10326.46 samples/sec   Loss 7.6223   LearningRate 0.0262   Epoch: 19   Global Step: 98710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:47,667-Speed 11127.98 samples/sec   Loss 7.6924   LearningRate 0.0262   Epoch: 19   Global Step: 98720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:48,662-Speed 10362.53 samples/sec   Loss 7.8032   LearningRate 0.0262   Epoch: 19   Global Step: 98730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:49,599-Speed 10945.66 samples/sec   Loss 7.5421   LearningRate 0.0262   Epoch: 19   Global Step: 98740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:50,569-Speed 10564.08 samples/sec   Loss 7.9001   LearningRate 0.0262   Epoch: 19   Global Step: 98750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:56:51,528-Speed 10697.82 samples/sec   Loss 7.5709   LearningRate 0.0262   Epoch: 19   Global Step: 98760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:56:52,461-Speed 10982.30 samples/sec   Loss 7.7286   LearningRate 0.0262   Epoch: 19   Global Step: 98770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:56:53,440-Speed 10472.28 samples/sec   Loss 7.7556   LearningRate 0.0262   Epoch: 19   Global Step: 98780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:54,433-Speed 10316.35 samples/sec   Loss 7.6611   LearningRate 0.0262   Epoch: 19   Global Step: 98790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:55,367-Speed 10975.39 samples/sec   Loss 7.6593   LearningRate 0.0262   Epoch: 19   Global Step: 98800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:56,304-Speed 10945.27 samples/sec   Loss 7.5786   LearningRate 0.0262   Epoch: 19   Global Step: 98810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:57,268-Speed 10625.96 samples/sec   Loss 7.5455   LearningRate 0.0262   Epoch: 19   Global Step: 98820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:58,254-Speed 10396.96 samples/sec   Loss 7.6803   LearningRate 0.0262   Epoch: 19   Global Step: 98830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:56:59,173-Speed 11156.20 samples/sec   Loss 7.6364   LearningRate 0.0262   Epoch: 19   Global Step: 98840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:00,106-Speed 10989.01 samples/sec   Loss 7.8095   LearningRate 0.0262   Epoch: 19   Global Step: 98850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:01,056-Speed 10781.66 samples/sec   Loss 7.6862   LearningRate 0.0261   Epoch: 19   Global Step: 98860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:02,046-Speed 10351.88 samples/sec   Loss 7.7735   LearningRate 0.0261   Epoch: 19   Global Step: 98870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:02,978-Speed 11037.56 samples/sec   Loss 7.6779   LearningRate 0.0261   Epoch: 19   Global Step: 98880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:57:03,912-Speed 10971.95 samples/sec   Loss 7.6903   LearningRate 0.0261   Epoch: 19   Global Step: 98890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:04,894-Speed 10432.81 samples/sec   Loss 7.6121   LearningRate 0.0261   Epoch: 19   Global Step: 98900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:05,798-Speed 11341.50 samples/sec   Loss 7.6815   LearningRate 0.0261   Epoch: 19   Global Step: 98910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:06,751-Speed 10758.00 samples/sec   Loss 7.7373   LearningRate 0.0261   Epoch: 19   Global Step: 98920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:07,737-Speed 10390.02 samples/sec   Loss 7.5744   LearningRate 0.0261   Epoch: 19   Global Step: 98930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:08,695-Speed 10701.35 samples/sec   Loss 7.7426   LearningRate 0.0261   Epoch: 19   Global Step: 98940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:09,619-Speed 11098.72 samples/sec   Loss 7.6822   LearningRate 0.0261   Epoch: 19   Global Step: 98950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:10,566-Speed 10823.01 samples/sec   Loss 7.6838   LearningRate 0.0261   Epoch: 19   Global Step: 98960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:11,512-Speed 10828.24 samples/sec   Loss 7.6999   LearningRate 0.0261   Epoch: 19   Global Step: 98970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:12,461-Speed 10811.07 samples/sec   Loss 7.6968   LearningRate 0.0261   Epoch: 19   Global Step: 98980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:13,394-Speed 10976.12 samples/sec   Loss 7.7969   LearningRate 0.0261   Epoch: 19   Global Step: 98990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:14,316-Speed 11120.22 samples/sec   Loss 7.8049   LearningRate 0.0261   Epoch: 19   Global Step: 99000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:15,228-Speed 11244.35 samples/sec   Loss 7.7847   LearningRate 0.0261   Epoch: 19   Global Step: 99010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:16,200-Speed 10539.13 samples/sec   Loss 7.7067   LearningRate 0.0261   Epoch: 19   Global Step: 99020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:17,205-Speed 10202.92 samples/sec   Loss 7.6737   LearningRate 0.0261   Epoch: 19   Global Step: 99030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:18,157-Speed 10767.94 samples/sec   Loss 7.7080   LearningRate 0.0261   Epoch: 19   Global Step: 99040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:19,075-Speed 11166.89 samples/sec   Loss 7.5916   LearningRate 0.0261   Epoch: 19   Global Step: 99050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:20,043-Speed 10585.45 samples/sec   Loss 7.7613   LearningRate 0.0260   Epoch: 19   Global Step: 99060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:21,112-Speed 9582.27 samples/sec   Loss 7.6099   LearningRate 0.0260   Epoch: 19   Global Step: 99070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:22,056-Speed 10868.97 samples/sec   Loss 7.5312   LearningRate 0.0260   Epoch: 19   Global Step: 99080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:23,001-Speed 10845.44 samples/sec   Loss 7.7765   LearningRate 0.0260   Epoch: 19   Global Step: 99090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:57:23,955-Speed 10742.86 samples/sec   Loss 7.8276   LearningRate 0.0260   Epoch: 19   Global Step: 99100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:57:24,930-Speed 10516.75 samples/sec   Loss 7.7010   LearningRate 0.0260   Epoch: 19   Global Step: 99110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:57:25,873-Speed 10866.14 samples/sec   Loss 7.5524   LearningRate 0.0260   Epoch: 19   Global Step: 99120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:57:26,811-Speed 10936.13 samples/sec   Loss 7.6278   LearningRate 0.0260   Epoch: 19   Global Step: 99130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:57:27,777-Speed 10603.72 samples/sec   Loss 7.5391   LearningRate 0.0260   Epoch: 19   Global Step: 99140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:57:28,752-Speed 10513.51 samples/sec   Loss 7.5708   LearningRate 0.0260   Epoch: 19   Global Step: 99150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:57:29,731-Speed 10471.13 samples/sec   Loss 7.6320   LearningRate 0.0260   Epoch: 19   Global Step: 99160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:57:30,685-Speed 10749.73 samples/sec   Loss 7.7574   LearningRate 0.0260   Epoch: 19   Global Step: 99170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:57:31,642-Speed 10701.97 samples/sec   Loss 7.8340   LearningRate 0.0260   Epoch: 19   Global Step: 99180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:57:32,607-Speed 10622.48 samples/sec   Loss 7.6370   LearningRate 0.0260   Epoch: 19   Global Step: 99190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:57:33,598-Speed 10341.04 samples/sec   Loss 7.5046   LearningRate 0.0260   Epoch: 19   Global Step: 99200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:57:34,540-Speed 10882.90 samples/sec   Loss 7.6498   LearningRate 0.0260   Epoch: 19   Global Step: 99210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:57:35,496-Speed 10722.60 samples/sec   Loss 7.6627   LearningRate 0.0260   Epoch: 19   Global Step: 99220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:57:36,451-Speed 10734.72 samples/sec   Loss 7.7038   LearningRate 0.0260   Epoch: 19   Global Step: 99230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:57:37,415-Speed 10631.59 samples/sec   Loss 7.5609   LearningRate 0.0260   Epoch: 19   Global Step: 99240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:38,352-Speed 10945.65 samples/sec   Loss 7.6389   LearningRate 0.0260   Epoch: 19   Global Step: 99250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:39,347-Speed 10303.33 samples/sec   Loss 7.6691   LearningRate 0.0259   Epoch: 19   Global Step: 99260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:40,337-Speed 10350.10 samples/sec   Loss 7.6625   LearningRate 0.0259   Epoch: 19   Global Step: 99270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:41,304-Speed 10606.64 samples/sec   Loss 7.5711   LearningRate 0.0259   Epoch: 19   Global Step: 99280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:42,243-Speed 10910.93 samples/sec   Loss 7.5871   LearningRate 0.0259   Epoch: 19   Global Step: 99290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:43,229-Speed 10407.62 samples/sec   Loss 7.7789   LearningRate 0.0259   Epoch: 19   Global Step: 99300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:44,167-Speed 10933.85 samples/sec   Loss 7.5182   LearningRate 0.0259   Epoch: 19   Global Step: 99310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:45,097-Speed 11008.14 samples/sec   Loss 7.8002   LearningRate 0.0259   Epoch: 19   Global Step: 99320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:46,052-Speed 10734.96 samples/sec   Loss 7.6276   LearningRate 0.0259   Epoch: 19   Global Step: 99330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:47,038-Speed 10392.24 samples/sec   Loss 7.7312   LearningRate 0.0259   Epoch: 19   Global Step: 99340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:57:48,011-Speed 10540.55 samples/sec   Loss 7.7885   LearningRate 0.0259   Epoch: 19   Global Step: 99350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:57:48,912-Speed 11369.93 samples/sec   Loss 7.6115   LearningRate 0.0259   Epoch: 19   Global Step: 99360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:57:49,907-Speed 10299.53 samples/sec   Loss 7.7652   LearningRate 0.0259   Epoch: 19   Global Step: 99370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:57:50,921-Speed 10112.95 samples/sec   Loss 7.6677   LearningRate 0.0259   Epoch: 19   Global Step: 99380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:57:51,835-Speed 11206.81 samples/sec   Loss 7.6504   LearningRate 0.0259   Epoch: 19   Global Step: 99390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:57:52,800-Speed 10625.67 samples/sec   Loss 7.6409   LearningRate 0.0259   Epoch: 19   Global Step: 99400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:57:53,746-Speed 10825.71 samples/sec   Loss 7.6635   LearningRate 0.0259   Epoch: 19   Global Step: 99410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:57:54,717-Speed 10565.99 samples/sec   Loss 7.6552   LearningRate 0.0259   Epoch: 19   Global Step: 99420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:55,664-Speed 10821.26 samples/sec   Loss 7.6306   LearningRate 0.0259   Epoch: 19   Global Step: 99430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:56,607-Speed 10869.53 samples/sec   Loss 7.4890   LearningRate 0.0259   Epoch: 19   Global Step: 99440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:57,546-Speed 10915.23 samples/sec   Loss 7.5909   LearningRate 0.0259   Epoch: 19   Global Step: 99450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:58,569-Speed 10021.73 samples/sec   Loss 7.8739   LearningRate 0.0258   Epoch: 19   Global Step: 99460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:57:59,512-Speed 10866.50 samples/sec   Loss 7.8808   LearningRate 0.0258   Epoch: 19   Global Step: 99470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:00,491-Speed 10476.09 samples/sec   Loss 7.7074   LearningRate 0.0258   Epoch: 19   Global Step: 99480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:01,462-Speed 10559.34 samples/sec   Loss 7.6704   LearningRate 0.0258   Epoch: 19   Global Step: 99490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:02,401-Speed 10907.09 samples/sec   Loss 7.6169   LearningRate 0.0258   Epoch: 19   Global Step: 99500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:03,351-Speed 10796.75 samples/sec   Loss 7.6277   LearningRate 0.0258   Epoch: 19   Global Step: 99510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:04,319-Speed 10578.34 samples/sec   Loss 7.8232   LearningRate 0.0258   Epoch: 19   Global Step: 99520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:05,285-Speed 10613.38 samples/sec   Loss 7.7703   LearningRate 0.0258   Epoch: 19   Global Step: 99530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:06,217-Speed 10999.60 samples/sec   Loss 7.6299   LearningRate 0.0258   Epoch: 19   Global Step: 99540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:07,159-Speed 10885.48 samples/sec   Loss 7.6578   LearningRate 0.0258   Epoch: 19   Global Step: 99550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:08,160-Speed 10242.03 samples/sec   Loss 7.8194   LearningRate 0.0258   Epoch: 19   Global Step: 99560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:09,107-Speed 10818.77 samples/sec   Loss 7.9053   LearningRate 0.0258   Epoch: 19   Global Step: 99570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:10,054-Speed 10829.65 samples/sec   Loss 7.6134   LearningRate 0.0258   Epoch: 19   Global Step: 99580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:10,972-Speed 11157.51 samples/sec   Loss 7.7121   LearningRate 0.0258   Epoch: 19   Global Step: 99590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:11,917-Speed 10846.82 samples/sec   Loss 7.7521   LearningRate 0.0258   Epoch: 19   Global Step: 99600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:12,920-Speed 10220.29 samples/sec   Loss 7.6500   LearningRate 0.0258   Epoch: 19   Global Step: 99610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:13,892-Speed 10552.64 samples/sec   Loss 7.7017   LearningRate 0.0258   Epoch: 19   Global Step: 99620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:14,860-Speed 10578.39 samples/sec   Loss 7.7366   LearningRate 0.0258   Epoch: 19   Global Step: 99630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:15,829-Speed 10578.75 samples/sec   Loss 7.7174   LearningRate 0.0258   Epoch: 19   Global Step: 99640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:16,787-Speed 10697.39 samples/sec   Loss 7.6067   LearningRate 0.0258   Epoch: 19   Global Step: 99650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:17,742-Speed 10741.29 samples/sec   Loss 7.6414   LearningRate 0.0257   Epoch: 19   Global Step: 99660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:18,715-Speed 10537.98 samples/sec   Loss 7.7540   LearningRate 0.0257   Epoch: 19   Global Step: 99670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:19,670-Speed 10728.90 samples/sec   Loss 7.7395   LearningRate 0.0257   Epoch: 19   Global Step: 99680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:20,618-Speed 10814.55 samples/sec   Loss 7.6261   LearningRate 0.0257   Epoch: 19   Global Step: 99690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:21,561-Speed 10862.56 samples/sec   Loss 7.7118   LearningRate 0.0257   Epoch: 19   Global Step: 99700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:22,511-Speed 10786.64 samples/sec   Loss 7.7592   LearningRate 0.0257   Epoch: 19   Global Step: 99710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:23,461-Speed 10788.88 samples/sec   Loss 7.7901   LearningRate 0.0257   Epoch: 19   Global Step: 99720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:24,395-Speed 10975.61 samples/sec   Loss 7.9752   LearningRate 0.0257   Epoch: 19   Global Step: 99730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:25,352-Speed 10704.46 samples/sec   Loss 7.6248   LearningRate 0.0257   Epoch: 19   Global Step: 99740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:26,318-Speed 10614.10 samples/sec   Loss 7.6882   LearningRate 0.0257   Epoch: 19   Global Step: 99750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:27,240-Speed 11114.88 samples/sec   Loss 7.5997   LearningRate 0.0257   Epoch: 19   Global Step: 99760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:28,188-Speed 10813.53 samples/sec   Loss 7.6450   LearningRate 0.0257   Epoch: 19   Global Step: 99770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:29,171-Speed 10424.26 samples/sec   Loss 7.6042   LearningRate 0.0257   Epoch: 19   Global Step: 99780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:30,143-Speed 10548.87 samples/sec   Loss 7.7939   LearningRate 0.0257   Epoch: 19   Global Step: 99790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:31,092-Speed 10796.97 samples/sec   Loss 7.6427   LearningRate 0.0257   Epoch: 19   Global Step: 99800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:32,022-Speed 11029.50 samples/sec   Loss 7.7091   LearningRate 0.0257   Epoch: 19   Global Step: 99810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:33,020-Speed 10270.69 samples/sec   Loss 7.7206   LearningRate 0.0257   Epoch: 19   Global Step: 99820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:33,993-Speed 10534.62 samples/sec   Loss 7.6926   LearningRate 0.0257   Epoch: 19   Global Step: 99830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:34,956-Speed 10648.05 samples/sec   Loss 7.7427   LearningRate 0.0257   Epoch: 19   Global Step: 99840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:35,901-Speed 10847.58 samples/sec   Loss 7.6116   LearningRate 0.0257   Epoch: 19   Global Step: 99850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:58:36,865-Speed 10625.90 samples/sec   Loss 7.7300   LearningRate 0.0256   Epoch: 19   Global Step: 99860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:58:37,831-Speed 10609.29 samples/sec   Loss 7.8308   LearningRate 0.0256   Epoch: 19   Global Step: 99870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:58:38,767-Speed 10952.49 samples/sec   Loss 7.5879   LearningRate 0.0256   Epoch: 19   Global Step: 99880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 02:58:39,718-Speed 10784.64 samples/sec   Loss 7.5989   LearningRate 0.0256   Epoch: 19   Global Step: 99890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:40,668-Speed 10787.80 samples/sec   Loss 7.6285   LearningRate 0.0256   Epoch: 19   Global Step: 99900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:41,622-Speed 10740.94 samples/sec   Loss 7.7427   LearningRate 0.0256   Epoch: 19   Global Step: 99910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:42,633-Speed 10140.32 samples/sec   Loss 7.7677   LearningRate 0.0256   Epoch: 19   Global Step: 99920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:43,551-Speed 11178.43 samples/sec   Loss 7.6513   LearningRate 0.0256   Epoch: 19   Global Step: 99930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:44,498-Speed 10820.31 samples/sec   Loss 7.7124   LearningRate 0.0256   Epoch: 19   Global Step: 99940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 02:58:45,429-Speed 11010.09 samples/sec   Loss 7.8220   LearningRate 0.0256   Epoch: 19   Global Step: 99950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:46,361-Speed 10994.91 samples/sec   Loss 7.6283   LearningRate 0.0256   Epoch: 19   Global Step: 99960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:47,356-Speed 10296.92 samples/sec   Loss 7.7794   LearningRate 0.0256   Epoch: 19   Global Step: 99970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:48,301-Speed 10847.50 samples/sec   Loss 7.7411   LearningRate 0.0256   Epoch: 19   Global Step: 99980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:49,255-Speed 10754.26 samples/sec   Loss 7.5496   LearningRate 0.0256   Epoch: 19   Global Step: 99990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:58:50,228-Speed 10533.25 samples/sec   Loss 7.8056   LearningRate 0.0256   Epoch: 19   Global Step: 100000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 02:59:12,709-[lfw][100000]XNorm: 10.631548
Training: 2022-04-11 02:59:12,710-[lfw][100000]Accuracy-Flip: 0.99550+-0.00388
Training: 2022-04-11 02:59:12,710-[lfw][100000]Accuracy-Highest: 0.99667
Training: 2022-04-11 02:59:38,215-[cfp_fp][100000]XNorm: 9.081367
Training: 2022-04-11 02:59:38,216-[cfp_fp][100000]Accuracy-Flip: 0.96243+-0.01250
Training: 2022-04-11 02:59:38,217-[cfp_fp][100000]Accuracy-Highest: 0.96243
Training: 2022-04-11 03:00:00,464-[agedb_30][100000]XNorm: 10.294084
Training: 2022-04-11 03:00:00,466-[agedb_30][100000]Accuracy-Flip: 0.96683+-0.00923
Training: 2022-04-11 03:00:00,466-[agedb_30][100000]Accuracy-Highest: 0.96733
Training: 2022-04-11 03:00:01,418-Speed 143.84 samples/sec   Loss 7.7814   LearningRate 0.0256   Epoch: 19   Global Step: 100010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:02,357-Speed 10916.87 samples/sec   Loss 7.5963   LearningRate 0.0256   Epoch: 19   Global Step: 100020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:03,306-Speed 10805.63 samples/sec   Loss 7.8031   LearningRate 0.0256   Epoch: 19   Global Step: 100030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:04,270-Speed 10627.82 samples/sec   Loss 7.8950   LearningRate 0.0256   Epoch: 19   Global Step: 100040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:05,264-Speed 10311.91 samples/sec   Loss 7.7031   LearningRate 0.0256   Epoch: 19   Global Step: 100050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:06,214-Speed 10797.70 samples/sec   Loss 7.6984   LearningRate 0.0255   Epoch: 19   Global Step: 100060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:07,158-Speed 10866.61 samples/sec   Loss 7.7203   LearningRate 0.0255   Epoch: 19   Global Step: 100070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:08,123-Speed 10617.06 samples/sec   Loss 7.6537   LearningRate 0.0255   Epoch: 19   Global Step: 100080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:09,121-Speed 10270.70 samples/sec   Loss 7.6699   LearningRate 0.0255   Epoch: 19   Global Step: 100090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:10,094-Speed 10541.21 samples/sec   Loss 7.6917   LearningRate 0.0255   Epoch: 19   Global Step: 100100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:11,087-Speed 10314.50 samples/sec   Loss 7.7182   LearningRate 0.0255   Epoch: 19   Global Step: 100110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:12,089-Speed 10225.47 samples/sec   Loss 7.7443   LearningRate 0.0255   Epoch: 19   Global Step: 100120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:13,077-Speed 10387.26 samples/sec   Loss 7.8112   LearningRate 0.0255   Epoch: 19   Global Step: 100130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:13,977-Speed 11395.94 samples/sec   Loss 7.6110   LearningRate 0.0255   Epoch: 19   Global Step: 100140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:14,943-Speed 10608.23 samples/sec   Loss 7.6908   LearningRate 0.0255   Epoch: 19   Global Step: 100150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:15,902-Speed 10690.61 samples/sec   Loss 7.6889   LearningRate 0.0255   Epoch: 19   Global Step: 100160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:16,849-Speed 10822.83 samples/sec   Loss 7.7203   LearningRate 0.0255   Epoch: 19   Global Step: 100170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:17,905-Speed 9702.99 samples/sec   Loss 7.6029   LearningRate 0.0255   Epoch: 19   Global Step: 100180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:18,874-Speed 10578.71 samples/sec   Loss 7.8225   LearningRate 0.0255   Epoch: 19   Global Step: 100190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:19,824-Speed 10796.56 samples/sec   Loss 7.6655   LearningRate 0.0255   Epoch: 19   Global Step: 100200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:20,797-Speed 10531.91 samples/sec   Loss 7.6454   LearningRate 0.0255   Epoch: 19   Global Step: 100210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:21,789-Speed 10331.06 samples/sec   Loss 7.5544   LearningRate 0.0255   Epoch: 19   Global Step: 100220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:22,745-Speed 10731.17 samples/sec   Loss 7.6829   LearningRate 0.0255   Epoch: 19   Global Step: 100230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:23,658-Speed 11218.04 samples/sec   Loss 7.7832   LearningRate 0.0255   Epoch: 19   Global Step: 100240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:24,596-Speed 10933.55 samples/sec   Loss 7.5781   LearningRate 0.0255   Epoch: 19   Global Step: 100250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:25,547-Speed 10780.08 samples/sec   Loss 7.7434   LearningRate 0.0254   Epoch: 19   Global Step: 100260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:26,497-Speed 10783.50 samples/sec   Loss 7.7035   LearningRate 0.0254   Epoch: 19   Global Step: 100270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:27,462-Speed 10629.03 samples/sec   Loss 7.6247   LearningRate 0.0254   Epoch: 19   Global Step: 100280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:00:28,427-Speed 10614.53 samples/sec   Loss 7.6904   LearningRate 0.0254   Epoch: 19   Global Step: 100290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:29,409-Speed 10454.83 samples/sec   Loss 7.7145   LearningRate 0.0254   Epoch: 19   Global Step: 100300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:30,373-Speed 10632.30 samples/sec   Loss 7.7337   LearningRate 0.0254   Epoch: 19   Global Step: 100310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:31,380-Speed 10172.86 samples/sec   Loss 7.7337   LearningRate 0.0254   Epoch: 19   Global Step: 100320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:32,365-Speed 10415.66 samples/sec   Loss 7.7945   LearningRate 0.0254   Epoch: 19   Global Step: 100330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:33,340-Speed 10505.30 samples/sec   Loss 7.7839   LearningRate 0.0254   Epoch: 19   Global Step: 100340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:34,258-Speed 11171.88 samples/sec   Loss 7.5710   LearningRate 0.0254   Epoch: 19   Global Step: 100350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:35,237-Speed 10469.62 samples/sec   Loss 7.7964   LearningRate 0.0254   Epoch: 19   Global Step: 100360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:36,175-Speed 10929.48 samples/sec   Loss 7.6950   LearningRate 0.0254   Epoch: 19   Global Step: 100370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:37,088-Speed 11221.97 samples/sec   Loss 7.7323   LearningRate 0.0254   Epoch: 19   Global Step: 100380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:38,112-Speed 10013.58 samples/sec   Loss 7.7460   LearningRate 0.0254   Epoch: 19   Global Step: 100390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:00:39,084-Speed 10539.71 samples/sec   Loss 7.6497   LearningRate 0.0254   Epoch: 19   Global Step: 100400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:00:40,023-Speed 10923.64 samples/sec   Loss 7.6155   LearningRate 0.0254   Epoch: 19   Global Step: 100410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:41,002-Speed 10469.00 samples/sec   Loss 7.6162   LearningRate 0.0254   Epoch: 19   Global Step: 100420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:41,964-Speed 10652.10 samples/sec   Loss 7.6214   LearningRate 0.0254   Epoch: 19   Global Step: 100430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:42,994-Speed 9959.81 samples/sec   Loss 7.7294   LearningRate 0.0254   Epoch: 19   Global Step: 100440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:43,915-Speed 11129.13 samples/sec   Loss 7.7436   LearningRate 0.0254   Epoch: 19   Global Step: 100450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:44,863-Speed 10803.16 samples/sec   Loss 7.6130   LearningRate 0.0253   Epoch: 19   Global Step: 100460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:45,837-Speed 10527.52 samples/sec   Loss 7.7900   LearningRate 0.0253   Epoch: 19   Global Step: 100470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:46,794-Speed 10706.21 samples/sec   Loss 7.4869   LearningRate 0.0253   Epoch: 19   Global Step: 100480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:47,756-Speed 10657.08 samples/sec   Loss 7.7066   LearningRate 0.0253   Epoch: 19   Global Step: 100490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:48,705-Speed 10797.24 samples/sec   Loss 7.6105   LearningRate 0.0253   Epoch: 19   Global Step: 100500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:49,650-Speed 10851.09 samples/sec   Loss 7.7380   LearningRate 0.0253   Epoch: 19   Global Step: 100510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:00:50,683-Speed 9978.86 samples/sec   Loss 7.6477   LearningRate 0.0253   Epoch: 19   Global Step: 100520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:00:51,644-Speed 10670.50 samples/sec   Loss 7.5645   LearningRate 0.0253   Epoch: 19   Global Step: 100530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:00:52,620-Speed 10498.41 samples/sec   Loss 7.7469   LearningRate 0.0253   Epoch: 19   Global Step: 100540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:53,594-Speed 10519.40 samples/sec   Loss 7.7164   LearningRate 0.0253   Epoch: 19   Global Step: 100550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:54,540-Speed 10841.64 samples/sec   Loss 7.7262   LearningRate 0.0253   Epoch: 19   Global Step: 100560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:55,467-Speed 11051.43 samples/sec   Loss 7.6242   LearningRate 0.0253   Epoch: 19   Global Step: 100570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:56,409-Speed 10880.19 samples/sec   Loss 7.7089   LearningRate 0.0253   Epoch: 19   Global Step: 100580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:57,398-Speed 10364.58 samples/sec   Loss 7.7939   LearningRate 0.0253   Epoch: 19   Global Step: 100590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:58,340-Speed 10881.27 samples/sec   Loss 7.6647   LearningRate 0.0253   Epoch: 19   Global Step: 100600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:00:59,294-Speed 10745.45 samples/sec   Loss 7.7095   LearningRate 0.0253   Epoch: 19   Global Step: 100610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:00,221-Speed 11054.18 samples/sec   Loss 7.8784   LearningRate 0.0253   Epoch: 19   Global Step: 100620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:01,194-Speed 10530.91 samples/sec   Loss 7.7488   LearningRate 0.0253   Epoch: 19   Global Step: 100630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:02,226-Speed 9935.09 samples/sec   Loss 7.5174   LearningRate 0.0253   Epoch: 19   Global Step: 100640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:03,174-Speed 10809.84 samples/sec   Loss 7.7777   LearningRate 0.0253   Epoch: 19   Global Step: 100650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:04,124-Speed 10791.74 samples/sec   Loss 7.5721   LearningRate 0.0252   Epoch: 19   Global Step: 100660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:05,105-Speed 10442.98 samples/sec   Loss 7.7678   LearningRate 0.0252   Epoch: 19   Global Step: 100670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:06,082-Speed 10493.65 samples/sec   Loss 7.7739   LearningRate 0.0252   Epoch: 19   Global Step: 100680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:07,134-Speed 9742.77 samples/sec   Loss 7.6220   LearningRate 0.0252   Epoch: 19   Global Step: 100690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:08,089-Speed 10737.27 samples/sec   Loss 7.8742   LearningRate 0.0252   Epoch: 19   Global Step: 100700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:09,051-Speed 10654.23 samples/sec   Loss 7.7514   LearningRate 0.0252   Epoch: 19   Global Step: 100710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:10,060-Speed 10156.63 samples/sec   Loss 7.7930   LearningRate 0.0252   Epoch: 19   Global Step: 100720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:11,035-Speed 10510.05 samples/sec   Loss 7.6362   LearningRate 0.0252   Epoch: 19   Global Step: 100730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:12,005-Speed 10570.02 samples/sec   Loss 7.5714   LearningRate 0.0252   Epoch: 19   Global Step: 100740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:12,974-Speed 10583.32 samples/sec   Loss 7.6517   LearningRate 0.0252   Epoch: 19   Global Step: 100750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:13,946-Speed 10543.80 samples/sec   Loss 7.6708   LearningRate 0.0252   Epoch: 19   Global Step: 100760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:14,875-Speed 11039.46 samples/sec   Loss 7.6970   LearningRate 0.0252   Epoch: 19   Global Step: 100770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:15,800-Speed 11076.99 samples/sec   Loss 7.7677   LearningRate 0.0252   Epoch: 19   Global Step: 100780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:16,716-Speed 11187.01 samples/sec   Loss 7.6393   LearningRate 0.0252   Epoch: 19   Global Step: 100790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:17,721-Speed 10194.32 samples/sec   Loss 7.6488   LearningRate 0.0252   Epoch: 19   Global Step: 100800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:18,683-Speed 10664.48 samples/sec   Loss 7.7194   LearningRate 0.0252   Epoch: 19   Global Step: 100810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:19,627-Speed 10857.52 samples/sec   Loss 7.6056   LearningRate 0.0252   Epoch: 19   Global Step: 100820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:20,612-Speed 10409.79 samples/sec   Loss 7.7390   LearningRate 0.0252   Epoch: 19   Global Step: 100830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:21,608-Speed 10290.89 samples/sec   Loss 7.7037   LearningRate 0.0252   Epoch: 19   Global Step: 100840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:22,572-Speed 10626.14 samples/sec   Loss 7.6372   LearningRate 0.0252   Epoch: 19   Global Step: 100850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:23,538-Speed 10615.95 samples/sec   Loss 7.6629   LearningRate 0.0251   Epoch: 19   Global Step: 100860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:24,553-Speed 10098.29 samples/sec   Loss 7.5238   LearningRate 0.0251   Epoch: 19   Global Step: 100870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:25,533-Speed 10457.86 samples/sec   Loss 7.6981   LearningRate 0.0251   Epoch: 19   Global Step: 100880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:26,439-Speed 11308.74 samples/sec   Loss 7.7597   LearningRate 0.0251   Epoch: 19   Global Step: 100890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:27,384-Speed 10855.05 samples/sec   Loss 7.7305   LearningRate 0.0251   Epoch: 19   Global Step: 100900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:28,322-Speed 10924.55 samples/sec   Loss 7.5955   LearningRate 0.0251   Epoch: 19   Global Step: 100910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:29,325-Speed 10214.14 samples/sec   Loss 7.6974   LearningRate 0.0251   Epoch: 19   Global Step: 100920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:30,297-Speed 10548.19 samples/sec   Loss 7.6722   LearningRate 0.0251   Epoch: 19   Global Step: 100930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:31,266-Speed 10573.74 samples/sec   Loss 7.5749   LearningRate 0.0251   Epoch: 19   Global Step: 100940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:32,231-Speed 10620.03 samples/sec   Loss 7.7704   LearningRate 0.0251   Epoch: 19   Global Step: 100950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:33,201-Speed 10577.13 samples/sec   Loss 7.6169   LearningRate 0.0251   Epoch: 19   Global Step: 100960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:34,125-Speed 11123.94 samples/sec   Loss 7.7353   LearningRate 0.0251   Epoch: 19   Global Step: 100970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:35,086-Speed 10674.97 samples/sec   Loss 7.6767   LearningRate 0.0251   Epoch: 19   Global Step: 100980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:35,991-Speed 11323.97 samples/sec   Loss 7.6933   LearningRate 0.0251   Epoch: 19   Global Step: 100990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:36,989-Speed 10269.55 samples/sec   Loss 7.7870   LearningRate 0.0251   Epoch: 19   Global Step: 101000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:37,988-Speed 10265.55 samples/sec   Loss 7.6811   LearningRate 0.0251   Epoch: 19   Global Step: 101010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:38,940-Speed 10763.06 samples/sec   Loss 7.6825   LearningRate 0.0251   Epoch: 19   Global Step: 101020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:39,927-Speed 10383.72 samples/sec   Loss 7.7064   LearningRate 0.0251   Epoch: 19   Global Step: 101030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:40,873-Speed 10842.99 samples/sec   Loss 7.7237   LearningRate 0.0251   Epoch: 19   Global Step: 101040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:41,789-Speed 11183.60 samples/sec   Loss 7.5492   LearningRate 0.0251   Epoch: 19   Global Step: 101050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:42,740-Speed 10773.30 samples/sec   Loss 7.6137   LearningRate 0.0250   Epoch: 19   Global Step: 101060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:43,729-Speed 10372.13 samples/sec   Loss 7.6131   LearningRate 0.0250   Epoch: 19   Global Step: 101070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:44,671-Speed 10884.85 samples/sec   Loss 7.8780   LearningRate 0.0250   Epoch: 19   Global Step: 101080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:45,640-Speed 10568.34 samples/sec   Loss 7.7683   LearningRate 0.0250   Epoch: 19   Global Step: 101090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:46,606-Speed 10610.85 samples/sec   Loss 7.7552   LearningRate 0.0250   Epoch: 19   Global Step: 101100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:47,570-Speed 10632.49 samples/sec   Loss 7.8429   LearningRate 0.0250   Epoch: 19   Global Step: 101110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:48,537-Speed 10613.20 samples/sec   Loss 7.7151   LearningRate 0.0250   Epoch: 19   Global Step: 101120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:49,487-Speed 10793.92 samples/sec   Loss 7.7125   LearningRate 0.0250   Epoch: 19   Global Step: 101130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:50,401-Speed 11205.85 samples/sec   Loss 7.7005   LearningRate 0.0250   Epoch: 19   Global Step: 101140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:01:51,505-Speed 9285.57 samples/sec   Loss 7.8580   LearningRate 0.0250   Epoch: 19   Global Step: 101150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:01:52,462-Speed 10717.02 samples/sec   Loss 7.7494   LearningRate 0.0250   Epoch: 19   Global Step: 101160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:02,272-Speed 1043.93 samples/sec   Loss 7.0179   LearningRate 0.0250   Epoch: 20   Global Step: 101170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:03,508-Speed 8293.13 samples/sec   Loss 6.8315   LearningRate 0.0250   Epoch: 20   Global Step: 101180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:04,497-Speed 10364.17 samples/sec   Loss 6.8947   LearningRate 0.0250   Epoch: 20   Global Step: 101190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:05,604-Speed 9273.21 samples/sec   Loss 6.7933   LearningRate 0.0250   Epoch: 20   Global Step: 101200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:06,656-Speed 9743.43 samples/sec   Loss 6.6031   LearningRate 0.0250   Epoch: 20   Global Step: 101210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:07,856-Speed 8536.00 samples/sec   Loss 6.8470   LearningRate 0.0250   Epoch: 20   Global Step: 101220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:08,839-Speed 10425.34 samples/sec   Loss 6.8026   LearningRate 0.0250   Epoch: 20   Global Step: 101230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:09,840-Speed 10251.09 samples/sec   Loss 6.8683   LearningRate 0.0250   Epoch: 20   Global Step: 101240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:10,797-Speed 10714.20 samples/sec   Loss 6.8433   LearningRate 0.0250   Epoch: 20   Global Step: 101250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:02:11,733-Speed 10945.29 samples/sec   Loss 6.6885   LearningRate 0.0250   Epoch: 20   Global Step: 101260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:12,700-Speed 10598.61 samples/sec   Loss 6.9209   LearningRate 0.0249   Epoch: 20   Global Step: 101270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:13,679-Speed 10464.60 samples/sec   Loss 6.8934   LearningRate 0.0249   Epoch: 20   Global Step: 101280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:14,649-Speed 10564.83 samples/sec   Loss 7.0222   LearningRate 0.0249   Epoch: 20   Global Step: 101290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:15,609-Speed 10674.67 samples/sec   Loss 6.9373   LearningRate 0.0249   Epoch: 20   Global Step: 101300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:16,549-Speed 10907.56 samples/sec   Loss 6.9313   LearningRate 0.0249   Epoch: 20   Global Step: 101310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:17,492-Speed 10874.01 samples/sec   Loss 6.8214   LearningRate 0.0249   Epoch: 20   Global Step: 101320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:18,492-Speed 10242.74 samples/sec   Loss 6.9290   LearningRate 0.0249   Epoch: 20   Global Step: 101330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:19,503-Speed 10143.12 samples/sec   Loss 6.7620   LearningRate 0.0249   Epoch: 20   Global Step: 101340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:20,492-Speed 10369.40 samples/sec   Loss 6.7631   LearningRate 0.0249   Epoch: 20   Global Step: 101350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:21,440-Speed 10811.62 samples/sec   Loss 6.7974   LearningRate 0.0249   Epoch: 20   Global Step: 101360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:02:22,423-Speed 10429.41 samples/sec   Loss 6.8712   LearningRate 0.0249   Epoch: 20   Global Step: 101370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:02:23,461-Speed 9876.51 samples/sec   Loss 6.8802   LearningRate 0.0249   Epoch: 20   Global Step: 101380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:02:24,452-Speed 10347.04 samples/sec   Loss 6.9521   LearningRate 0.0249   Epoch: 20   Global Step: 101390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:02:25,626-Speed 8726.24 samples/sec   Loss 6.7915   LearningRate 0.0249   Epoch: 20   Global Step: 101400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:26,555-Speed 11034.35 samples/sec   Loss 6.9516   LearningRate 0.0249   Epoch: 20   Global Step: 101410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:27,485-Speed 11029.31 samples/sec   Loss 6.7591   LearningRate 0.0249   Epoch: 20   Global Step: 101420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:28,461-Speed 10501.53 samples/sec   Loss 6.9212   LearningRate 0.0249   Epoch: 20   Global Step: 101430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:29,446-Speed 10401.07 samples/sec   Loss 6.8547   LearningRate 0.0249   Epoch: 20   Global Step: 101440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:30,402-Speed 10729.83 samples/sec   Loss 6.9464   LearningRate 0.0249   Epoch: 20   Global Step: 101450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:31,427-Speed 10007.97 samples/sec   Loss 7.0455   LearningRate 0.0249   Epoch: 20   Global Step: 101460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:32,401-Speed 10514.41 samples/sec   Loss 7.0625   LearningRate 0.0248   Epoch: 20   Global Step: 101470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:33,385-Speed 10426.51 samples/sec   Loss 7.1701   LearningRate 0.0248   Epoch: 20   Global Step: 101480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:34,361-Speed 10495.57 samples/sec   Loss 7.0010   LearningRate 0.0248   Epoch: 20   Global Step: 101490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:35,344-Speed 10425.45 samples/sec   Loss 7.0061   LearningRate 0.0248   Epoch: 20   Global Step: 101500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:36,305-Speed 10670.45 samples/sec   Loss 7.0413   LearningRate 0.0248   Epoch: 20   Global Step: 101510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:37,290-Speed 10399.25 samples/sec   Loss 7.0221   LearningRate 0.0248   Epoch: 20   Global Step: 101520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:38,266-Speed 10514.77 samples/sec   Loss 7.0344   LearningRate 0.0248   Epoch: 20   Global Step: 101530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:39,230-Speed 10633.05 samples/sec   Loss 7.0946   LearningRate 0.0248   Epoch: 20   Global Step: 101540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:40,223-Speed 10317.22 samples/sec   Loss 6.9278   LearningRate 0.0248   Epoch: 20   Global Step: 101550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:41,194-Speed 10549.53 samples/sec   Loss 6.9037   LearningRate 0.0248   Epoch: 20   Global Step: 101560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:42,162-Speed 10593.95 samples/sec   Loss 7.1051   LearningRate 0.0248   Epoch: 20   Global Step: 101570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:43,084-Speed 11106.59 samples/sec   Loss 7.1190   LearningRate 0.0248   Epoch: 20   Global Step: 101580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:44,091-Speed 10183.36 samples/sec   Loss 7.0180   LearningRate 0.0248   Epoch: 20   Global Step: 101590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:45,134-Speed 9832.20 samples/sec   Loss 7.0196   LearningRate 0.0248   Epoch: 20   Global Step: 101600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:02:46,083-Speed 10801.18 samples/sec   Loss 7.1805   LearningRate 0.0248   Epoch: 20   Global Step: 101610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:02:47,112-Speed 9969.59 samples/sec   Loss 7.0581   LearningRate 0.0248   Epoch: 20   Global Step: 101620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:02:48,077-Speed 10621.00 samples/sec   Loss 7.0486   LearningRate 0.0248   Epoch: 20   Global Step: 101630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:49,146-Speed 9591.08 samples/sec   Loss 7.2031   LearningRate 0.0248   Epoch: 20   Global Step: 101640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:50,092-Speed 10836.94 samples/sec   Loss 7.0598   LearningRate 0.0248   Epoch: 20   Global Step: 101650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:51,122-Speed 9951.29 samples/sec   Loss 7.1530   LearningRate 0.0248   Epoch: 20   Global Step: 101660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:52,138-Speed 10087.40 samples/sec   Loss 7.0681   LearningRate 0.0247   Epoch: 20   Global Step: 101670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:53,108-Speed 10563.60 samples/sec   Loss 7.0324   LearningRate 0.0247   Epoch: 20   Global Step: 101680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:54,162-Speed 9727.26 samples/sec   Loss 6.9730   LearningRate 0.0247   Epoch: 20   Global Step: 101690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:55,141-Speed 10471.11 samples/sec   Loss 7.1747   LearningRate 0.0247   Epoch: 20   Global Step: 101700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:56,069-Speed 11041.44 samples/sec   Loss 7.1152   LearningRate 0.0247   Epoch: 20   Global Step: 101710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:57,005-Speed 10949.09 samples/sec   Loss 7.0828   LearningRate 0.0247   Epoch: 20   Global Step: 101720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:57,991-Speed 10403.07 samples/sec   Loss 7.1618   LearningRate 0.0247   Epoch: 20   Global Step: 101730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:58,918-Speed 11052.96 samples/sec   Loss 7.1623   LearningRate 0.0247   Epoch: 20   Global Step: 101740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:02:59,842-Speed 11097.86 samples/sec   Loss 7.1429   LearningRate 0.0247   Epoch: 20   Global Step: 101750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:03:00,824-Speed 10433.63 samples/sec   Loss 6.9352   LearningRate 0.0247   Epoch: 20   Global Step: 101760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:03:01,783-Speed 10680.26 samples/sec   Loss 7.1851   LearningRate 0.0247   Epoch: 20   Global Step: 101770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:03:02,773-Speed 10358.48 samples/sec   Loss 7.1305   LearningRate 0.0247   Epoch: 20   Global Step: 101780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:03:03,672-Speed 11403.71 samples/sec   Loss 7.1307   LearningRate 0.0247   Epoch: 20   Global Step: 101790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:03:04,621-Speed 10800.66 samples/sec   Loss 7.1239   LearningRate 0.0247   Epoch: 20   Global Step: 101800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:03:05,553-Speed 11000.30 samples/sec   Loss 7.1325   LearningRate 0.0247   Epoch: 20   Global Step: 101810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:03:06,523-Speed 10558.69 samples/sec   Loss 6.9665   LearningRate 0.0247   Epoch: 20   Global Step: 101820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:03:07,462-Speed 10917.84 samples/sec   Loss 7.0581   LearningRate 0.0247   Epoch: 20   Global Step: 101830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:03:08,442-Speed 10475.56 samples/sec   Loss 7.0522   LearningRate 0.0247   Epoch: 20   Global Step: 101840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:03:09,394-Speed 10755.45 samples/sec   Loss 7.1802   LearningRate 0.0247   Epoch: 20   Global Step: 101850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:03:10,355-Speed 10672.08 samples/sec   Loss 7.0278   LearningRate 0.0247   Epoch: 20   Global Step: 101860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:03:11,329-Speed 10517.29 samples/sec   Loss 7.1149   LearningRate 0.0247   Epoch: 20   Global Step: 101870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:03:12,283-Speed 10737.19 samples/sec   Loss 7.1965   LearningRate 0.0246   Epoch: 20   Global Step: 101880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:03:13,243-Speed 10685.10 samples/sec   Loss 7.3162   LearningRate 0.0246   Epoch: 20   Global Step: 101890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:03:14,192-Speed 10795.87 samples/sec   Loss 7.1968   LearningRate 0.0246   Epoch: 20   Global Step: 101900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:03:15,170-Speed 10483.25 samples/sec   Loss 7.1787   LearningRate 0.0246   Epoch: 20   Global Step: 101910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:03:16,065-Speed 11441.98 samples/sec   Loss 7.4005   LearningRate 0.0246   Epoch: 20   Global Step: 101920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:03:17,064-Speed 10265.97 samples/sec   Loss 7.1634   LearningRate 0.0246   Epoch: 20   Global Step: 101930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:03:18,043-Speed 10494.51 samples/sec   Loss 7.1366   LearningRate 0.0246   Epoch: 20   Global Step: 101940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:03:19,027-Speed 10416.14 samples/sec   Loss 7.1034   LearningRate 0.0246   Epoch: 20   Global Step: 101950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:03:19,978-Speed 10776.37 samples/sec   Loss 7.3236   LearningRate 0.0246   Epoch: 20   Global Step: 101960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:03:20,930-Speed 10774.77 samples/sec   Loss 7.1414   LearningRate 0.0246   Epoch: 20   Global Step: 101970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:03:21,886-Speed 10713.75 samples/sec   Loss 7.2432   LearningRate 0.0246   Epoch: 20   Global Step: 101980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:03:22,905-Speed 10067.34 samples/sec   Loss 7.2441   LearningRate 0.0246   Epoch: 20   Global Step: 101990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:03:23,836-Speed 10999.71 samples/sec   Loss 7.1724   LearningRate 0.0246   Epoch: 20   Global Step: 102000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:03:46,243-[lfw][102000]XNorm: 10.618879
Training: 2022-04-11 03:03:46,244-[lfw][102000]Accuracy-Flip: 0.99533+-0.00332
Training: 2022-04-11 03:03:46,245-[lfw][102000]Accuracy-Highest: 0.99667
Training: 2022-04-11 03:04:11,993-[cfp_fp][102000]XNorm: 9.081579
Training: 2022-04-11 03:04:11,994-[cfp_fp][102000]Accuracy-Flip: 0.96329+-0.00919
Training: 2022-04-11 03:04:11,995-[cfp_fp][102000]Accuracy-Highest: 0.96329
Training: 2022-04-11 03:04:34,216-[agedb_30][102000]XNorm: 10.321643
Training: 2022-04-11 03:04:34,217-[agedb_30][102000]Accuracy-Flip: 0.96600+-0.00932
Training: 2022-04-11 03:04:34,218-[agedb_30][102000]Accuracy-Highest: 0.96733
Training: 2022-04-11 03:04:35,145-Speed 143.60 samples/sec   Loss 7.1696   LearningRate 0.0246   Epoch: 20   Global Step: 102010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:04:36,082-Speed 10940.60 samples/sec   Loss 7.0572   LearningRate 0.0246   Epoch: 20   Global Step: 102020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:04:36,986-Speed 11336.68 samples/sec   Loss 7.2570   LearningRate 0.0246   Epoch: 20   Global Step: 102030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:04:37,923-Speed 10941.95 samples/sec   Loss 7.1821   LearningRate 0.0246   Epoch: 20   Global Step: 102040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:04:38,881-Speed 10693.60 samples/sec   Loss 7.2666   LearningRate 0.0246   Epoch: 20   Global Step: 102050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:04:39,868-Speed 10392.76 samples/sec   Loss 7.1649   LearningRate 0.0246   Epoch: 20   Global Step: 102060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:04:40,789-Speed 11132.62 samples/sec   Loss 7.2812   LearningRate 0.0246   Epoch: 20   Global Step: 102070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:04:41,737-Speed 10813.37 samples/sec   Loss 7.2684   LearningRate 0.0245   Epoch: 20   Global Step: 102080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:04:42,706-Speed 10582.36 samples/sec   Loss 7.1641   LearningRate 0.0245   Epoch: 20   Global Step: 102090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:04:43,711-Speed 10195.38 samples/sec   Loss 7.2338   LearningRate 0.0245   Epoch: 20   Global Step: 102100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:04:44,684-Speed 10534.66 samples/sec   Loss 7.1172   LearningRate 0.0245   Epoch: 20   Global Step: 102110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:04:45,602-Speed 11171.63 samples/sec   Loss 7.3408   LearningRate 0.0245   Epoch: 20   Global Step: 102120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:04:46,540-Speed 10920.90 samples/sec   Loss 7.2017   LearningRate 0.0245   Epoch: 20   Global Step: 102130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:04:47,489-Speed 10798.57 samples/sec   Loss 7.3702   LearningRate 0.0245   Epoch: 20   Global Step: 102140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:04:48,492-Speed 10215.60 samples/sec   Loss 7.2971   LearningRate 0.0245   Epoch: 20   Global Step: 102150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:04:49,442-Speed 10794.24 samples/sec   Loss 7.2401   LearningRate 0.0245   Epoch: 20   Global Step: 102160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:04:50,399-Speed 10704.53 samples/sec   Loss 7.1409   LearningRate 0.0245   Epoch: 20   Global Step: 102170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:04:51,347-Speed 10818.51 samples/sec   Loss 7.1819   LearningRate 0.0245   Epoch: 20   Global Step: 102180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:04:52,316-Speed 10580.70 samples/sec   Loss 7.1223   LearningRate 0.0245   Epoch: 20   Global Step: 102190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:04:53,283-Speed 10592.88 samples/sec   Loss 7.3348   LearningRate 0.0245   Epoch: 20   Global Step: 102200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:04:54,278-Speed 10304.53 samples/sec   Loss 7.3603   LearningRate 0.0245   Epoch: 20   Global Step: 102210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:04:55,257-Speed 10470.52 samples/sec   Loss 7.1843   LearningRate 0.0245   Epoch: 20   Global Step: 102220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:04:56,208-Speed 10777.12 samples/sec   Loss 7.2595   LearningRate 0.0245   Epoch: 20   Global Step: 102230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:04:57,158-Speed 10790.98 samples/sec   Loss 7.3987   LearningRate 0.0245   Epoch: 20   Global Step: 102240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:04:58,140-Speed 10428.01 samples/sec   Loss 7.0821   LearningRate 0.0245   Epoch: 20   Global Step: 102250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:04:59,141-Speed 10245.65 samples/sec   Loss 7.3378   LearningRate 0.0245   Epoch: 20   Global Step: 102260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:05:00,125-Speed 10415.91 samples/sec   Loss 7.1900   LearningRate 0.0245   Epoch: 20   Global Step: 102270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:05:01,073-Speed 10811.65 samples/sec   Loss 7.2231   LearningRate 0.0244   Epoch: 20   Global Step: 102280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:05:02,035-Speed 10652.25 samples/sec   Loss 7.3631   LearningRate 0.0244   Epoch: 20   Global Step: 102290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:05:02,997-Speed 10653.75 samples/sec   Loss 7.1993   LearningRate 0.0244   Epoch: 20   Global Step: 102300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:05:03,946-Speed 10810.16 samples/sec   Loss 7.3239   LearningRate 0.0244   Epoch: 20   Global Step: 102310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:05:04,906-Speed 10667.34 samples/sec   Loss 7.4120   LearningRate 0.0244   Epoch: 20   Global Step: 102320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:05:05,869-Speed 10639.89 samples/sec   Loss 7.4505   LearningRate 0.0244   Epoch: 20   Global Step: 102330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:05:06,801-Speed 11001.64 samples/sec   Loss 7.2759   LearningRate 0.0244   Epoch: 20   Global Step: 102340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:05:07,724-Speed 11104.74 samples/sec   Loss 7.4115   LearningRate 0.0244   Epoch: 20   Global Step: 102350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:05:08,659-Speed 10967.57 samples/sec   Loss 7.2092   LearningRate 0.0244   Epoch: 20   Global Step: 102360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:05:09,634-Speed 10507.60 samples/sec   Loss 7.3422   LearningRate 0.0244   Epoch: 20   Global Step: 102370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:05:10,560-Speed 11064.59 samples/sec   Loss 7.1481   LearningRate 0.0244   Epoch: 20   Global Step: 102380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:11,481-Speed 11125.06 samples/sec   Loss 7.2108   LearningRate 0.0244   Epoch: 20   Global Step: 102390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:12,547-Speed 9614.24 samples/sec   Loss 7.4479   LearningRate 0.0244   Epoch: 20   Global Step: 102400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:13,529-Speed 10438.38 samples/sec   Loss 7.2860   LearningRate 0.0244   Epoch: 20   Global Step: 102410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:14,480-Speed 10780.72 samples/sec   Loss 7.3440   LearningRate 0.0244   Epoch: 20   Global Step: 102420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:15,437-Speed 10714.83 samples/sec   Loss 7.3660   LearningRate 0.0244   Epoch: 20   Global Step: 102430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:16,409-Speed 10542.70 samples/sec   Loss 7.5141   LearningRate 0.0244   Epoch: 20   Global Step: 102440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:17,404-Speed 10297.51 samples/sec   Loss 7.3973   LearningRate 0.0244   Epoch: 20   Global Step: 102450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:18,382-Speed 10479.44 samples/sec   Loss 7.4506   LearningRate 0.0244   Epoch: 20   Global Step: 102460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:05:19,317-Speed 10964.76 samples/sec   Loss 7.3391   LearningRate 0.0244   Epoch: 20   Global Step: 102470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:05:20,255-Speed 10916.28 samples/sec   Loss 7.4428   LearningRate 0.0244   Epoch: 20   Global Step: 102480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:05:21,178-Speed 11108.48 samples/sec   Loss 7.3585   LearningRate 0.0243   Epoch: 20   Global Step: 102490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:05:22,154-Speed 10497.97 samples/sec   Loss 7.2292   LearningRate 0.0243   Epoch: 20   Global Step: 102500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:05:23,115-Speed 10666.20 samples/sec   Loss 7.3775   LearningRate 0.0243   Epoch: 20   Global Step: 102510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:05:24,084-Speed 10587.56 samples/sec   Loss 7.3261   LearningRate 0.0243   Epoch: 20   Global Step: 102520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:05:25,055-Speed 10553.13 samples/sec   Loss 7.4522   LearningRate 0.0243   Epoch: 20   Global Step: 102530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:05:25,992-Speed 10935.66 samples/sec   Loss 7.4503   LearningRate 0.0243   Epoch: 20   Global Step: 102540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:05:26,961-Speed 10578.47 samples/sec   Loss 7.4691   LearningRate 0.0243   Epoch: 20   Global Step: 102550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:05:27,950-Speed 10369.58 samples/sec   Loss 7.4115   LearningRate 0.0243   Epoch: 20   Global Step: 102560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:28,939-Speed 10372.59 samples/sec   Loss 7.2352   LearningRate 0.0243   Epoch: 20   Global Step: 102570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:29,869-Speed 11016.96 samples/sec   Loss 7.4176   LearningRate 0.0243   Epoch: 20   Global Step: 102580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:30,803-Speed 10967.23 samples/sec   Loss 7.2653   LearningRate 0.0243   Epoch: 20   Global Step: 102590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:31,768-Speed 10657.76 samples/sec   Loss 7.5566   LearningRate 0.0243   Epoch: 20   Global Step: 102600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:32,712-Speed 10855.53 samples/sec   Loss 7.3808   LearningRate 0.0243   Epoch: 20   Global Step: 102610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:33,683-Speed 10552.68 samples/sec   Loss 7.3534   LearningRate 0.0243   Epoch: 20   Global Step: 102620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:34,691-Speed 10176.74 samples/sec   Loss 7.4065   LearningRate 0.0243   Epoch: 20   Global Step: 102630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:35,669-Speed 10481.07 samples/sec   Loss 7.4086   LearningRate 0.0243   Epoch: 20   Global Step: 102640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:36,637-Speed 10588.37 samples/sec   Loss 7.3314   LearningRate 0.0243   Epoch: 20   Global Step: 102650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:37,605-Speed 10600.65 samples/sec   Loss 7.4549   LearningRate 0.0243   Epoch: 20   Global Step: 102660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:05:38,554-Speed 10803.22 samples/sec   Loss 7.4394   LearningRate 0.0243   Epoch: 20   Global Step: 102670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:05:39,538-Speed 10417.48 samples/sec   Loss 7.3877   LearningRate 0.0243   Epoch: 20   Global Step: 102680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:05:40,464-Speed 11068.15 samples/sec   Loss 7.3897   LearningRate 0.0242   Epoch: 20   Global Step: 102690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:41,372-Speed 11285.84 samples/sec   Loss 7.4526   LearningRate 0.0242   Epoch: 20   Global Step: 102700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:42,316-Speed 10858.55 samples/sec   Loss 7.5306   LearningRate 0.0242   Epoch: 20   Global Step: 102710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:43,266-Speed 10789.53 samples/sec   Loss 7.4113   LearningRate 0.0242   Epoch: 20   Global Step: 102720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:44,259-Speed 10329.12 samples/sec   Loss 7.4176   LearningRate 0.0242   Epoch: 20   Global Step: 102730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:45,209-Speed 10794.59 samples/sec   Loss 7.3822   LearningRate 0.0242   Epoch: 20   Global Step: 102740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:46,144-Speed 10958.76 samples/sec   Loss 7.2839   LearningRate 0.0242   Epoch: 20   Global Step: 102750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:47,066-Speed 11109.35 samples/sec   Loss 7.4627   LearningRate 0.0242   Epoch: 20   Global Step: 102760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:48,081-Speed 10106.79 samples/sec   Loss 7.2760   LearningRate 0.0242   Epoch: 20   Global Step: 102770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:49,056-Speed 10514.37 samples/sec   Loss 7.4670   LearningRate 0.0242   Epoch: 20   Global Step: 102780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:49,987-Speed 11003.43 samples/sec   Loss 7.5037   LearningRate 0.0242   Epoch: 20   Global Step: 102790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:05:50,915-Speed 11035.72 samples/sec   Loss 7.6488   LearningRate 0.0242   Epoch: 20   Global Step: 102800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:05:51,887-Speed 10548.60 samples/sec   Loss 7.3837   LearningRate 0.0242   Epoch: 20   Global Step: 102810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:05:52,860-Speed 10533.38 samples/sec   Loss 7.4997   LearningRate 0.0242   Epoch: 20   Global Step: 102820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:05:53,861-Speed 10235.82 samples/sec   Loss 7.3310   LearningRate 0.0242   Epoch: 20   Global Step: 102830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:05:54,848-Speed 10379.56 samples/sec   Loss 7.4647   LearningRate 0.0242   Epoch: 20   Global Step: 102840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:55,803-Speed 10740.89 samples/sec   Loss 7.5014   LearningRate 0.0242   Epoch: 20   Global Step: 102850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:56,754-Speed 10772.23 samples/sec   Loss 7.4910   LearningRate 0.0242   Epoch: 20   Global Step: 102860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:57,730-Speed 10507.64 samples/sec   Loss 7.3447   LearningRate 0.0242   Epoch: 20   Global Step: 102870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:58,686-Speed 10722.25 samples/sec   Loss 7.5219   LearningRate 0.0242   Epoch: 20   Global Step: 102880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:05:59,718-Speed 9925.41 samples/sec   Loss 7.3005   LearningRate 0.0242   Epoch: 20   Global Step: 102890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:00,673-Speed 10738.86 samples/sec   Loss 7.4212   LearningRate 0.0241   Epoch: 20   Global Step: 102900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:01,618-Speed 10849.06 samples/sec   Loss 7.3583   LearningRate 0.0241   Epoch: 20   Global Step: 102910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:02,604-Speed 10392.06 samples/sec   Loss 7.4116   LearningRate 0.0241   Epoch: 20   Global Step: 102920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:03,585-Speed 10449.17 samples/sec   Loss 7.2323   LearningRate 0.0241   Epoch: 20   Global Step: 102930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:04,545-Speed 10675.63 samples/sec   Loss 7.4001   LearningRate 0.0241   Epoch: 20   Global Step: 102940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:06:05,511-Speed 10613.97 samples/sec   Loss 7.4215   LearningRate 0.0241   Epoch: 20   Global Step: 102950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:06:06,479-Speed 10591.11 samples/sec   Loss 7.4131   LearningRate 0.0241   Epoch: 20   Global Step: 102960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:07,478-Speed 10248.02 samples/sec   Loss 7.4081   LearningRate 0.0241   Epoch: 20   Global Step: 102970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:08,452-Speed 10535.69 samples/sec   Loss 7.4982   LearningRate 0.0241   Epoch: 20   Global Step: 102980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:09,422-Speed 10560.56 samples/sec   Loss 7.5119   LearningRate 0.0241   Epoch: 20   Global Step: 102990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:10,367-Speed 10852.69 samples/sec   Loss 7.4409   LearningRate 0.0241   Epoch: 20   Global Step: 103000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:11,303-Speed 10949.04 samples/sec   Loss 7.4710   LearningRate 0.0241   Epoch: 20   Global Step: 103010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:12,346-Speed 9831.95 samples/sec   Loss 7.4036   LearningRate 0.0241   Epoch: 20   Global Step: 103020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:13,254-Speed 11290.37 samples/sec   Loss 7.3770   LearningRate 0.0241   Epoch: 20   Global Step: 103030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:14,159-Speed 11313.25 samples/sec   Loss 7.4580   LearningRate 0.0241   Epoch: 20   Global Step: 103040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:15,136-Speed 10500.74 samples/sec   Loss 7.3395   LearningRate 0.0241   Epoch: 20   Global Step: 103050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:16,091-Speed 10728.95 samples/sec   Loss 7.4176   LearningRate 0.0241   Epoch: 20   Global Step: 103060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:17,096-Speed 10195.95 samples/sec   Loss 7.5361   LearningRate 0.0241   Epoch: 20   Global Step: 103070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:18,042-Speed 10833.79 samples/sec   Loss 7.4474   LearningRate 0.0241   Epoch: 20   Global Step: 103080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:19,012-Speed 10572.88 samples/sec   Loss 7.3093   LearningRate 0.0241   Epoch: 20   Global Step: 103090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:20,004-Speed 10333.28 samples/sec   Loss 7.5554   LearningRate 0.0241   Epoch: 20   Global Step: 103100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:20,974-Speed 10570.81 samples/sec   Loss 7.5010   LearningRate 0.0240   Epoch: 20   Global Step: 103110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:21,924-Speed 10786.47 samples/sec   Loss 7.3719   LearningRate 0.0240   Epoch: 20   Global Step: 103120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:22,894-Speed 10566.92 samples/sec   Loss 7.4869   LearningRate 0.0240   Epoch: 20   Global Step: 103130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:23,967-Speed 9553.36 samples/sec   Loss 7.5784   LearningRate 0.0240   Epoch: 20   Global Step: 103140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:24,951-Speed 10414.94 samples/sec   Loss 7.2831   LearningRate 0.0240   Epoch: 20   Global Step: 103150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:25,875-Speed 11086.93 samples/sec   Loss 7.4557   LearningRate 0.0240   Epoch: 20   Global Step: 103160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:06:26,821-Speed 10828.29 samples/sec   Loss 7.3884   LearningRate 0.0240   Epoch: 20   Global Step: 103170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:06:27,801-Speed 10462.57 samples/sec   Loss 7.3502   LearningRate 0.0240   Epoch: 20   Global Step: 103180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:06:28,718-Speed 11186.70 samples/sec   Loss 7.3382   LearningRate 0.0240   Epoch: 20   Global Step: 103190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:06:29,649-Speed 11007.86 samples/sec   Loss 7.4527   LearningRate 0.0240   Epoch: 20   Global Step: 103200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:06:30,597-Speed 10813.83 samples/sec   Loss 7.2371   LearningRate 0.0240   Epoch: 20   Global Step: 103210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:06:31,564-Speed 10590.15 samples/sec   Loss 7.2996   LearningRate 0.0240   Epoch: 20   Global Step: 103220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:06:32,610-Speed 9797.11 samples/sec   Loss 7.3207   LearningRate 0.0240   Epoch: 20   Global Step: 103230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:06:33,583-Speed 10541.27 samples/sec   Loss 7.4754   LearningRate 0.0240   Epoch: 20   Global Step: 103240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:06:34,487-Speed 11336.76 samples/sec   Loss 7.4864   LearningRate 0.0240   Epoch: 20   Global Step: 103250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:35,451-Speed 10629.80 samples/sec   Loss 7.5416   LearningRate 0.0240   Epoch: 20   Global Step: 103260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:36,387-Speed 10945.81 samples/sec   Loss 7.4074   LearningRate 0.0240   Epoch: 20   Global Step: 103270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:37,350-Speed 10647.07 samples/sec   Loss 7.4312   LearningRate 0.0240   Epoch: 20   Global Step: 103280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:38,310-Speed 10675.12 samples/sec   Loss 7.5186   LearningRate 0.0240   Epoch: 20   Global Step: 103290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:39,275-Speed 10632.90 samples/sec   Loss 7.7063   LearningRate 0.0240   Epoch: 20   Global Step: 103300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:40,216-Speed 10880.84 samples/sec   Loss 7.6445   LearningRate 0.0239   Epoch: 20   Global Step: 103310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:41,231-Speed 10098.93 samples/sec   Loss 7.5297   LearningRate 0.0239   Epoch: 20   Global Step: 103320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:06:42,257-Speed 9993.22 samples/sec   Loss 7.2777   LearningRate 0.0239   Epoch: 20   Global Step: 103330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:06:43,188-Speed 11015.78 samples/sec   Loss 7.5352   LearningRate 0.0239   Epoch: 20   Global Step: 103340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:06:44,117-Speed 11036.45 samples/sec   Loss 7.5339   LearningRate 0.0239   Epoch: 20   Global Step: 103350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:06:45,038-Speed 11126.24 samples/sec   Loss 7.5502   LearningRate 0.0239   Epoch: 20   Global Step: 103360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:06:46,018-Speed 10455.84 samples/sec   Loss 7.5507   LearningRate 0.0239   Epoch: 20   Global Step: 103370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:06:47,043-Speed 9998.62 samples/sec   Loss 7.4078   LearningRate 0.0239   Epoch: 20   Global Step: 103380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:06:47,975-Speed 11007.12 samples/sec   Loss 7.3749   LearningRate 0.0239   Epoch: 20   Global Step: 103390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:06:48,981-Speed 10183.97 samples/sec   Loss 7.4932   LearningRate 0.0239   Epoch: 20   Global Step: 103400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:06:49,904-Speed 11098.45 samples/sec   Loss 7.4850   LearningRate 0.0239   Epoch: 20   Global Step: 103410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:06:50,932-Speed 9964.11 samples/sec   Loss 7.3983   LearningRate 0.0239   Epoch: 20   Global Step: 103420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:51,896-Speed 10638.13 samples/sec   Loss 7.5173   LearningRate 0.0239   Epoch: 20   Global Step: 103430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:52,854-Speed 10706.20 samples/sec   Loss 7.5944   LearningRate 0.0239   Epoch: 20   Global Step: 103440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:06:53,818-Speed 10625.84 samples/sec   Loss 7.5647   LearningRate 0.0239   Epoch: 20   Global Step: 103450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:06:54,750-Speed 10999.98 samples/sec   Loss 7.2830   LearningRate 0.0239   Epoch: 20   Global Step: 103460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:06:55,735-Speed 10396.56 samples/sec   Loss 7.5679   LearningRate 0.0239   Epoch: 20   Global Step: 103470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:06:56,693-Speed 10707.63 samples/sec   Loss 7.4426   LearningRate 0.0239   Epoch: 20   Global Step: 103480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:06:57,629-Speed 10947.83 samples/sec   Loss 7.4459   LearningRate 0.0239   Epoch: 20   Global Step: 103490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:06:58,559-Speed 11022.44 samples/sec   Loss 7.3711   LearningRate 0.0239   Epoch: 20   Global Step: 103500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:06:59,549-Speed 10356.94 samples/sec   Loss 7.4928   LearningRate 0.0239   Epoch: 20   Global Step: 103510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:07:00,522-Speed 10528.43 samples/sec   Loss 7.5428   LearningRate 0.0238   Epoch: 20   Global Step: 103520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:07:01,463-Speed 10896.42 samples/sec   Loss 7.6987   LearningRate 0.0238   Epoch: 20   Global Step: 103530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:07:02,440-Speed 10486.17 samples/sec   Loss 7.4037   LearningRate 0.0238   Epoch: 20   Global Step: 103540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:07:03,403-Speed 10651.61 samples/sec   Loss 7.3168   LearningRate 0.0238   Epoch: 20   Global Step: 103550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:04,353-Speed 10790.53 samples/sec   Loss 7.4405   LearningRate 0.0238   Epoch: 20   Global Step: 103560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:05,316-Speed 10641.58 samples/sec   Loss 7.4314   LearningRate 0.0238   Epoch: 20   Global Step: 103570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:06,278-Speed 10653.23 samples/sec   Loss 7.4938   LearningRate 0.0238   Epoch: 20   Global Step: 103580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:07,247-Speed 10573.69 samples/sec   Loss 7.4159   LearningRate 0.0238   Epoch: 20   Global Step: 103590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:08,231-Speed 10419.14 samples/sec   Loss 7.4045   LearningRate 0.0238   Epoch: 20   Global Step: 103600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:09,202-Speed 10566.57 samples/sec   Loss 7.4086   LearningRate 0.0238   Epoch: 20   Global Step: 103610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:10,110-Speed 11287.00 samples/sec   Loss 7.4377   LearningRate 0.0238   Epoch: 20   Global Step: 103620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:11,101-Speed 10339.17 samples/sec   Loss 7.5202   LearningRate 0.0238   Epoch: 20   Global Step: 103630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:12,045-Speed 10853.09 samples/sec   Loss 7.5682   LearningRate 0.0238   Epoch: 20   Global Step: 103640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:12,985-Speed 10902.51 samples/sec   Loss 7.4713   LearningRate 0.0238   Epoch: 20   Global Step: 103650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:07:13,922-Speed 10941.60 samples/sec   Loss 7.5401   LearningRate 0.0238   Epoch: 20   Global Step: 103660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:07:14,896-Speed 10520.75 samples/sec   Loss 7.5179   LearningRate 0.0238   Epoch: 20   Global Step: 103670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:07:15,857-Speed 10665.43 samples/sec   Loss 7.5752   LearningRate 0.0238   Epoch: 20   Global Step: 103680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:07:16,831-Speed 10528.10 samples/sec   Loss 7.4311   LearningRate 0.0238   Epoch: 20   Global Step: 103690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:07:17,803-Speed 10547.09 samples/sec   Loss 7.4745   LearningRate 0.0238   Epoch: 20   Global Step: 103700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:07:18,780-Speed 10494.23 samples/sec   Loss 7.4445   LearningRate 0.0238   Epoch: 20   Global Step: 103710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:19,763-Speed 10420.60 samples/sec   Loss 7.4899   LearningRate 0.0238   Epoch: 20   Global Step: 103720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:20,726-Speed 10647.96 samples/sec   Loss 7.4895   LearningRate 0.0237   Epoch: 20   Global Step: 103730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:21,715-Speed 10366.85 samples/sec   Loss 7.3785   LearningRate 0.0237   Epoch: 20   Global Step: 103740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:22,655-Speed 10903.85 samples/sec   Loss 7.6337   LearningRate 0.0237   Epoch: 20   Global Step: 103750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:23,625-Speed 10567.76 samples/sec   Loss 7.5112   LearningRate 0.0237   Epoch: 20   Global Step: 103760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:24,583-Speed 10702.44 samples/sec   Loss 7.3913   LearningRate 0.0237   Epoch: 20   Global Step: 103770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:25,489-Speed 11309.63 samples/sec   Loss 7.5489   LearningRate 0.0237   Epoch: 20   Global Step: 103780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:26,423-Speed 10980.45 samples/sec   Loss 7.5174   LearningRate 0.0237   Epoch: 20   Global Step: 103790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:27,366-Speed 10868.03 samples/sec   Loss 7.5447   LearningRate 0.0237   Epoch: 20   Global Step: 103800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:28,337-Speed 10589.32 samples/sec   Loss 7.4205   LearningRate 0.0237   Epoch: 20   Global Step: 103810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:07:29,317-Speed 10457.46 samples/sec   Loss 7.3453   LearningRate 0.0237   Epoch: 20   Global Step: 103820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:07:30,298-Speed 10449.19 samples/sec   Loss 7.4653   LearningRate 0.0237   Epoch: 20   Global Step: 103830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:07:31,252-Speed 10738.84 samples/sec   Loss 7.5034   LearningRate 0.0237   Epoch: 20   Global Step: 103840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:07:32,202-Speed 10796.17 samples/sec   Loss 7.5010   LearningRate 0.0237   Epoch: 20   Global Step: 103850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:07:33,169-Speed 10601.89 samples/sec   Loss 7.5034   LearningRate 0.0237   Epoch: 20   Global Step: 103860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:07:34,104-Speed 10962.98 samples/sec   Loss 7.5915   LearningRate 0.0237   Epoch: 20   Global Step: 103870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:07:35,041-Speed 10930.51 samples/sec   Loss 7.6247   LearningRate 0.0237   Epoch: 20   Global Step: 103880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:35,977-Speed 10954.58 samples/sec   Loss 7.4611   LearningRate 0.0237   Epoch: 20   Global Step: 103890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:36,926-Speed 10797.82 samples/sec   Loss 7.3542   LearningRate 0.0237   Epoch: 20   Global Step: 103900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:37,865-Speed 10908.98 samples/sec   Loss 7.3961   LearningRate 0.0237   Epoch: 20   Global Step: 103910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:38,882-Speed 10080.41 samples/sec   Loss 7.5433   LearningRate 0.0237   Epoch: 20   Global Step: 103920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:39,813-Speed 11019.39 samples/sec   Loss 7.6436   LearningRate 0.0236   Epoch: 20   Global Step: 103930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:40,779-Speed 10601.13 samples/sec   Loss 7.6162   LearningRate 0.0236   Epoch: 20   Global Step: 103940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:41,753-Speed 10517.55 samples/sec   Loss 7.6218   LearningRate 0.0236   Epoch: 20   Global Step: 103950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:42,746-Speed 10330.55 samples/sec   Loss 7.5490   LearningRate 0.0236   Epoch: 20   Global Step: 103960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:43,684-Speed 10923.77 samples/sec   Loss 7.4879   LearningRate 0.0236   Epoch: 20   Global Step: 103970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:07:44,606-Speed 11127.07 samples/sec   Loss 7.6257   LearningRate 0.0236   Epoch: 20   Global Step: 103980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:07:45,596-Speed 10351.14 samples/sec   Loss 7.5207   LearningRate 0.0236   Epoch: 20   Global Step: 103990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:07:46,633-Speed 9879.67 samples/sec   Loss 7.6225   LearningRate 0.0236   Epoch: 20   Global Step: 104000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:08:08,868-[lfw][104000]XNorm: 10.634067
Training: 2022-04-11 03:08:08,869-[lfw][104000]Accuracy-Flip: 0.99600+-0.00281
Training: 2022-04-11 03:08:08,870-[lfw][104000]Accuracy-Highest: 0.99667
Training: 2022-04-11 03:08:34,311-[cfp_fp][104000]XNorm: 9.036723
Training: 2022-04-11 03:08:34,311-[cfp_fp][104000]Accuracy-Flip: 0.96600+-0.01063
Training: 2022-04-11 03:08:34,313-[cfp_fp][104000]Accuracy-Highest: 0.96600
Training: 2022-04-11 03:08:56,314-[agedb_30][104000]XNorm: 10.365039
Training: 2022-04-11 03:08:56,315-[agedb_30][104000]Accuracy-Flip: 0.97017+-0.00681
Training: 2022-04-11 03:08:56,315-[agedb_30][104000]Accuracy-Highest: 0.97017
Training: 2022-04-11 03:08:57,297-Speed 144.91 samples/sec   Loss 7.5843   LearningRate 0.0236   Epoch: 20   Global Step: 104010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:08:58,294-Speed 10277.53 samples/sec   Loss 7.4505   LearningRate 0.0236   Epoch: 20   Global Step: 104020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:08:59,219-Speed 11090.69 samples/sec   Loss 7.5697   LearningRate 0.0236   Epoch: 20   Global Step: 104030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:00,171-Speed 10765.81 samples/sec   Loss 7.6249   LearningRate 0.0236   Epoch: 20   Global Step: 104040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:01,119-Speed 10805.03 samples/sec   Loss 7.5704   LearningRate 0.0236   Epoch: 20   Global Step: 104050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:02,098-Speed 10471.71 samples/sec   Loss 7.4121   LearningRate 0.0236   Epoch: 20   Global Step: 104060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:03,029-Speed 11015.97 samples/sec   Loss 7.6106   LearningRate 0.0236   Epoch: 20   Global Step: 104070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:04,000-Speed 10564.01 samples/sec   Loss 7.5718   LearningRate 0.0236   Epoch: 20   Global Step: 104080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:04,963-Speed 10633.38 samples/sec   Loss 7.4643   LearningRate 0.0236   Epoch: 20   Global Step: 104090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:05,920-Speed 10705.81 samples/sec   Loss 7.6772   LearningRate 0.0236   Epoch: 20   Global Step: 104100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:09:06,872-Speed 10771.34 samples/sec   Loss 7.5642   LearningRate 0.0236   Epoch: 20   Global Step: 104110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:09:07,828-Speed 10718.33 samples/sec   Loss 7.4581   LearningRate 0.0236   Epoch: 20   Global Step: 104120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:09:08,776-Speed 10813.22 samples/sec   Loss 7.4614   LearningRate 0.0236   Epoch: 20   Global Step: 104130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:09:09,704-Speed 11052.78 samples/sec   Loss 7.4777   LearningRate 0.0235   Epoch: 20   Global Step: 104140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:10,684-Speed 10446.86 samples/sec   Loss 7.5930   LearningRate 0.0235   Epoch: 20   Global Step: 104150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:11,674-Speed 10366.84 samples/sec   Loss 7.5323   LearningRate 0.0235   Epoch: 20   Global Step: 104160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:12,625-Speed 10780.62 samples/sec   Loss 7.5453   LearningRate 0.0235   Epoch: 20   Global Step: 104170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:13,580-Speed 10726.83 samples/sec   Loss 7.4767   LearningRate 0.0235   Epoch: 20   Global Step: 104180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:14,512-Speed 11000.39 samples/sec   Loss 7.7385   LearningRate 0.0235   Epoch: 20   Global Step: 104190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:15,509-Speed 10284.39 samples/sec   Loss 7.4766   LearningRate 0.0235   Epoch: 20   Global Step: 104200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:16,428-Speed 11155.87 samples/sec   Loss 7.6359   LearningRate 0.0235   Epoch: 20   Global Step: 104210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:17,393-Speed 10621.35 samples/sec   Loss 7.3712   LearningRate 0.0235   Epoch: 20   Global Step: 104220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:18,352-Speed 10693.94 samples/sec   Loss 7.4546   LearningRate 0.0235   Epoch: 20   Global Step: 104230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:19,323-Speed 10553.72 samples/sec   Loss 7.7052   LearningRate 0.0235   Epoch: 20   Global Step: 104240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:09:20,283-Speed 10672.22 samples/sec   Loss 7.7042   LearningRate 0.0235   Epoch: 20   Global Step: 104250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:09:21,261-Speed 10486.27 samples/sec   Loss 7.6560   LearningRate 0.0235   Epoch: 20   Global Step: 104260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:09:22,263-Speed 10234.88 samples/sec   Loss 7.5003   LearningRate 0.0235   Epoch: 20   Global Step: 104270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:09:23,228-Speed 10630.13 samples/sec   Loss 7.3952   LearningRate 0.0235   Epoch: 20   Global Step: 104280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:09:24,178-Speed 10790.32 samples/sec   Loss 7.5739   LearningRate 0.0235   Epoch: 20   Global Step: 104290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:09:25,179-Speed 10238.60 samples/sec   Loss 7.5899   LearningRate 0.0235   Epoch: 20   Global Step: 104300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:09:26,150-Speed 10548.64 samples/sec   Loss 7.3817   LearningRate 0.0235   Epoch: 20   Global Step: 104310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:09:27,139-Speed 10375.51 samples/sec   Loss 7.6283   LearningRate 0.0235   Epoch: 20   Global Step: 104320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:09:28,024-Speed 11582.04 samples/sec   Loss 7.5884   LearningRate 0.0235   Epoch: 20   Global Step: 104330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:28,932-Speed 11288.69 samples/sec   Loss 7.3796   LearningRate 0.0235   Epoch: 20   Global Step: 104340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:29,931-Speed 10265.91 samples/sec   Loss 7.5755   LearningRate 0.0234   Epoch: 20   Global Step: 104350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:30,890-Speed 10736.03 samples/sec   Loss 7.5630   LearningRate 0.0234   Epoch: 20   Global Step: 104360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:31,850-Speed 10667.21 samples/sec   Loss 7.6135   LearningRate 0.0234   Epoch: 20   Global Step: 104370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:32,812-Speed 10666.28 samples/sec   Loss 7.3892   LearningRate 0.0234   Epoch: 20   Global Step: 104380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:33,800-Speed 10369.87 samples/sec   Loss 7.5349   LearningRate 0.0234   Epoch: 20   Global Step: 104390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:34,782-Speed 10441.72 samples/sec   Loss 7.6526   LearningRate 0.0234   Epoch: 20   Global Step: 104400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:35,733-Speed 10771.89 samples/sec   Loss 7.6848   LearningRate 0.0234   Epoch: 20   Global Step: 104410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:36,703-Speed 10558.91 samples/sec   Loss 7.5878   LearningRate 0.0234   Epoch: 20   Global Step: 104420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:37,654-Speed 10780.33 samples/sec   Loss 7.4712   LearningRate 0.0234   Epoch: 20   Global Step: 104430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:09:38,601-Speed 10823.57 samples/sec   Loss 7.5680   LearningRate 0.0234   Epoch: 20   Global Step: 104440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:39,572-Speed 10552.03 samples/sec   Loss 7.6881   LearningRate 0.0234   Epoch: 20   Global Step: 104450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:40,538-Speed 10621.25 samples/sec   Loss 7.5688   LearningRate 0.0234   Epoch: 20   Global Step: 104460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:41,468-Speed 11023.95 samples/sec   Loss 7.4548   LearningRate 0.0234   Epoch: 20   Global Step: 104470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:42,401-Speed 11005.98 samples/sec   Loss 7.5619   LearningRate 0.0234   Epoch: 20   Global Step: 104480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:43,375-Speed 10521.69 samples/sec   Loss 7.6404   LearningRate 0.0234   Epoch: 20   Global Step: 104490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:44,353-Speed 10483.20 samples/sec   Loss 7.4900   LearningRate 0.0234   Epoch: 20   Global Step: 104500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:45,315-Speed 10648.46 samples/sec   Loss 7.3342   LearningRate 0.0234   Epoch: 20   Global Step: 104510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:46,259-Speed 10863.52 samples/sec   Loss 7.5888   LearningRate 0.0234   Epoch: 20   Global Step: 104520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:47,221-Speed 10647.75 samples/sec   Loss 7.4823   LearningRate 0.0234   Epoch: 20   Global Step: 104530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:48,213-Speed 10341.96 samples/sec   Loss 7.7439   LearningRate 0.0234   Epoch: 20   Global Step: 104540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:09:49,166-Speed 10746.33 samples/sec   Loss 7.6870   LearningRate 0.0234   Epoch: 20   Global Step: 104550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:50,127-Speed 10681.63 samples/sec   Loss 7.5525   LearningRate 0.0233   Epoch: 20   Global Step: 104560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:51,075-Speed 10814.15 samples/sec   Loss 7.4607   LearningRate 0.0233   Epoch: 20   Global Step: 104570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:52,032-Speed 10716.05 samples/sec   Loss 7.4768   LearningRate 0.0233   Epoch: 20   Global Step: 104580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:52,973-Speed 10894.99 samples/sec   Loss 7.4129   LearningRate 0.0233   Epoch: 20   Global Step: 104590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:53,938-Speed 10611.42 samples/sec   Loss 7.5761   LearningRate 0.0233   Epoch: 20   Global Step: 104600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:54,889-Speed 10777.33 samples/sec   Loss 7.4914   LearningRate 0.0233   Epoch: 20   Global Step: 104610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:55,841-Speed 10764.46 samples/sec   Loss 7.5798   LearningRate 0.0233   Epoch: 20   Global Step: 104620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:56,794-Speed 10764.17 samples/sec   Loss 7.6986   LearningRate 0.0233   Epoch: 20   Global Step: 104630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:57,727-Speed 10984.43 samples/sec   Loss 7.5460   LearningRate 0.0233   Epoch: 20   Global Step: 104640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:09:58,677-Speed 10780.75 samples/sec   Loss 7.6121   LearningRate 0.0233   Epoch: 20   Global Step: 104650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:09:59,595-Speed 11175.46 samples/sec   Loss 7.5611   LearningRate 0.0233   Epoch: 20   Global Step: 104660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:00,573-Speed 10483.42 samples/sec   Loss 7.4989   LearningRate 0.0233   Epoch: 20   Global Step: 104670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:01,520-Speed 10816.21 samples/sec   Loss 7.4975   LearningRate 0.0233   Epoch: 20   Global Step: 104680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:02,488-Speed 10592.17 samples/sec   Loss 7.4480   LearningRate 0.0233   Epoch: 20   Global Step: 104690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:03,431-Speed 10874.81 samples/sec   Loss 7.5087   LearningRate 0.0233   Epoch: 20   Global Step: 104700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:04,399-Speed 10591.68 samples/sec   Loss 7.5573   LearningRate 0.0233   Epoch: 20   Global Step: 104710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:05,367-Speed 10586.87 samples/sec   Loss 7.4335   LearningRate 0.0233   Epoch: 20   Global Step: 104720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:06,323-Speed 10720.59 samples/sec   Loss 7.7078   LearningRate 0.0233   Epoch: 20   Global Step: 104730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:07,284-Speed 10657.35 samples/sec   Loss 7.4490   LearningRate 0.0233   Epoch: 20   Global Step: 104740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:08,305-Speed 10042.31 samples/sec   Loss 7.3131   LearningRate 0.0233   Epoch: 20   Global Step: 104750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:09,256-Speed 10780.42 samples/sec   Loss 7.5264   LearningRate 0.0233   Epoch: 20   Global Step: 104760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:10:10,247-Speed 10335.16 samples/sec   Loss 7.5854   LearningRate 0.0232   Epoch: 20   Global Step: 104770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:10:11,163-Speed 11189.62 samples/sec   Loss 7.6095   LearningRate 0.0232   Epoch: 20   Global Step: 104780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:12,123-Speed 10678.44 samples/sec   Loss 7.6320   LearningRate 0.0232   Epoch: 20   Global Step: 104790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:13,072-Speed 10794.72 samples/sec   Loss 7.6802   LearningRate 0.0232   Epoch: 20   Global Step: 104800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:14,031-Speed 10692.34 samples/sec   Loss 7.3897   LearningRate 0.0232   Epoch: 20   Global Step: 104810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:14,970-Speed 10918.44 samples/sec   Loss 7.6614   LearningRate 0.0232   Epoch: 20   Global Step: 104820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:15,897-Speed 11052.37 samples/sec   Loss 7.5016   LearningRate 0.0232   Epoch: 20   Global Step: 104830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:16,845-Speed 10813.67 samples/sec   Loss 7.5754   LearningRate 0.0232   Epoch: 20   Global Step: 104840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:17,801-Speed 10719.82 samples/sec   Loss 7.6774   LearningRate 0.0232   Epoch: 20   Global Step: 104850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:18,724-Speed 11108.23 samples/sec   Loss 7.5092   LearningRate 0.0232   Epoch: 20   Global Step: 104860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:19,686-Speed 10657.87 samples/sec   Loss 7.5031   LearningRate 0.0232   Epoch: 20   Global Step: 104870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:20,611-Speed 11079.48 samples/sec   Loss 7.5543   LearningRate 0.0232   Epoch: 20   Global Step: 104880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:10:21,531-Speed 11136.55 samples/sec   Loss 7.5809   LearningRate 0.0232   Epoch: 20   Global Step: 104890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:10:22,466-Speed 10971.79 samples/sec   Loss 7.3922   LearningRate 0.0232   Epoch: 20   Global Step: 104900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:10:23,408-Speed 10879.30 samples/sec   Loss 7.6034   LearningRate 0.0232   Epoch: 20   Global Step: 104910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:24,347-Speed 10910.19 samples/sec   Loss 7.5369   LearningRate 0.0232   Epoch: 20   Global Step: 104920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:25,300-Speed 10752.85 samples/sec   Loss 7.4608   LearningRate 0.0232   Epoch: 20   Global Step: 104930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:26,262-Speed 10653.53 samples/sec   Loss 7.5706   LearningRate 0.0232   Epoch: 20   Global Step: 104940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:27,229-Speed 10604.61 samples/sec   Loss 7.5324   LearningRate 0.0232   Epoch: 20   Global Step: 104950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:28,176-Speed 10820.81 samples/sec   Loss 7.5289   LearningRate 0.0232   Epoch: 20   Global Step: 104960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:29,154-Speed 10482.22 samples/sec   Loss 7.7058   LearningRate 0.0232   Epoch: 20   Global Step: 104970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:30,155-Speed 10238.35 samples/sec   Loss 7.6989   LearningRate 0.0231   Epoch: 20   Global Step: 104980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:31,131-Speed 10508.84 samples/sec   Loss 7.4855   LearningRate 0.0231   Epoch: 20   Global Step: 104990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:32,117-Speed 10394.21 samples/sec   Loss 7.7243   LearningRate 0.0231   Epoch: 20   Global Step: 105000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:33,082-Speed 10620.29 samples/sec   Loss 7.5452   LearningRate 0.0231   Epoch: 20   Global Step: 105010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:10:34,047-Speed 10623.42 samples/sec   Loss 7.5089   LearningRate 0.0231   Epoch: 20   Global Step: 105020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:35,030-Speed 10427.85 samples/sec   Loss 7.5444   LearningRate 0.0231   Epoch: 20   Global Step: 105030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:35,965-Speed 10961.16 samples/sec   Loss 7.5800   LearningRate 0.0231   Epoch: 20   Global Step: 105040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:36,907-Speed 10875.89 samples/sec   Loss 7.6204   LearningRate 0.0231   Epoch: 20   Global Step: 105050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:37,888-Speed 10450.46 samples/sec   Loss 7.4704   LearningRate 0.0231   Epoch: 20   Global Step: 105060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:38,869-Speed 10448.30 samples/sec   Loss 7.5771   LearningRate 0.0231   Epoch: 20   Global Step: 105070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:39,879-Speed 10151.40 samples/sec   Loss 7.6478   LearningRate 0.0231   Epoch: 20   Global Step: 105080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:40,814-Speed 10970.72 samples/sec   Loss 7.7025   LearningRate 0.0231   Epoch: 20   Global Step: 105090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:10:41,733-Speed 11144.56 samples/sec   Loss 7.5112   LearningRate 0.0231   Epoch: 20   Global Step: 105100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:10:42,690-Speed 10705.93 samples/sec   Loss 7.3900   LearningRate 0.0231   Epoch: 20   Global Step: 105110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:10:43,669-Speed 10473.45 samples/sec   Loss 7.4761   LearningRate 0.0231   Epoch: 20   Global Step: 105120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:10:44,599-Speed 11020.85 samples/sec   Loss 7.4944   LearningRate 0.0231   Epoch: 20   Global Step: 105130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:10:45,548-Speed 10799.37 samples/sec   Loss 7.4374   LearningRate 0.0231   Epoch: 20   Global Step: 105140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:10:46,499-Speed 10778.04 samples/sec   Loss 7.6044   LearningRate 0.0231   Epoch: 20   Global Step: 105150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:10:47,547-Speed 9778.68 samples/sec   Loss 7.7264   LearningRate 0.0231   Epoch: 20   Global Step: 105160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:10:48,451-Speed 11341.06 samples/sec   Loss 7.6074   LearningRate 0.0231   Epoch: 20   Global Step: 105170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:10:49,372-Speed 11137.32 samples/sec   Loss 7.4729   LearningRate 0.0231   Epoch: 20   Global Step: 105180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:10:50,385-Speed 10109.62 samples/sec   Loss 7.5765   LearningRate 0.0230   Epoch: 20   Global Step: 105190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:51,318-Speed 10995.16 samples/sec   Loss 7.5918   LearningRate 0.0230   Epoch: 20   Global Step: 105200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:52,254-Speed 10950.22 samples/sec   Loss 7.4936   LearningRate 0.0230   Epoch: 20   Global Step: 105210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:53,195-Speed 10889.65 samples/sec   Loss 7.6245   LearningRate 0.0230   Epoch: 20   Global Step: 105220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:54,185-Speed 10348.13 samples/sec   Loss 7.6491   LearningRate 0.0230   Epoch: 20   Global Step: 105230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:55,113-Speed 11052.63 samples/sec   Loss 7.4082   LearningRate 0.0230   Epoch: 20   Global Step: 105240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:56,027-Speed 11220.29 samples/sec   Loss 7.5133   LearningRate 0.0230   Epoch: 20   Global Step: 105250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:56,961-Speed 10965.88 samples/sec   Loss 7.6654   LearningRate 0.0230   Epoch: 20   Global Step: 105260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:57,983-Speed 10029.44 samples/sec   Loss 7.7180   LearningRate 0.0230   Epoch: 20   Global Step: 105270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:58,916-Speed 10990.08 samples/sec   Loss 7.5461   LearningRate 0.0230   Epoch: 20   Global Step: 105280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:10:59,889-Speed 10538.94 samples/sec   Loss 7.6465   LearningRate 0.0230   Epoch: 20   Global Step: 105290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:11:00,853-Speed 10629.07 samples/sec   Loss 7.6806   LearningRate 0.0230   Epoch: 20   Global Step: 105300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:11:01,819-Speed 10611.44 samples/sec   Loss 7.6809   LearningRate 0.0230   Epoch: 20   Global Step: 105310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:11:02,798-Speed 10469.18 samples/sec   Loss 7.4240   LearningRate 0.0230   Epoch: 20   Global Step: 105320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:03,764-Speed 10613.36 samples/sec   Loss 7.6915   LearningRate 0.0230   Epoch: 20   Global Step: 105330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:04,722-Speed 10696.74 samples/sec   Loss 7.5744   LearningRate 0.0230   Epoch: 20   Global Step: 105340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:05,669-Speed 10818.46 samples/sec   Loss 7.5071   LearningRate 0.0230   Epoch: 20   Global Step: 105350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:06,616-Speed 10825.25 samples/sec   Loss 7.4501   LearningRate 0.0230   Epoch: 20   Global Step: 105360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:07,521-Speed 11317.77 samples/sec   Loss 7.6612   LearningRate 0.0230   Epoch: 20   Global Step: 105370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:08,553-Speed 9932.17 samples/sec   Loss 7.5966   LearningRate 0.0230   Epoch: 20   Global Step: 105380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:09,522-Speed 10585.25 samples/sec   Loss 7.6611   LearningRate 0.0230   Epoch: 20   Global Step: 105390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:10,486-Speed 10631.75 samples/sec   Loss 7.6563   LearningRate 0.0229   Epoch: 20   Global Step: 105400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:11,439-Speed 10749.57 samples/sec   Loss 7.4765   LearningRate 0.0229   Epoch: 20   Global Step: 105410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:12,397-Speed 10704.74 samples/sec   Loss 7.5866   LearningRate 0.0229   Epoch: 20   Global Step: 105420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:13,311-Speed 11220.03 samples/sec   Loss 7.6436   LearningRate 0.0229   Epoch: 20   Global Step: 105430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:14,288-Speed 10497.78 samples/sec   Loss 7.5209   LearningRate 0.0229   Epoch: 20   Global Step: 105440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:15,235-Speed 10817.68 samples/sec   Loss 7.4251   LearningRate 0.0229   Epoch: 20   Global Step: 105450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:16,221-Speed 10396.13 samples/sec   Loss 7.4207   LearningRate 0.0229   Epoch: 20   Global Step: 105460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:17,252-Speed 9940.71 samples/sec   Loss 7.6028   LearningRate 0.0229   Epoch: 20   Global Step: 105470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:18,186-Speed 10967.22 samples/sec   Loss 7.4696   LearningRate 0.0229   Epoch: 20   Global Step: 105480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:19,148-Speed 10662.43 samples/sec   Loss 7.5484   LearningRate 0.0229   Epoch: 20   Global Step: 105490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:20,088-Speed 10896.62 samples/sec   Loss 7.5553   LearningRate 0.0229   Epoch: 20   Global Step: 105500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:21,061-Speed 10534.65 samples/sec   Loss 7.6915   LearningRate 0.0229   Epoch: 20   Global Step: 105510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:22,037-Speed 10497.39 samples/sec   Loss 7.4751   LearningRate 0.0229   Epoch: 20   Global Step: 105520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:11:22,983-Speed 10833.25 samples/sec   Loss 7.4384   LearningRate 0.0229   Epoch: 20   Global Step: 105530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:23,953-Speed 10571.88 samples/sec   Loss 7.4862   LearningRate 0.0229   Epoch: 20   Global Step: 105540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:24,905-Speed 10769.14 samples/sec   Loss 7.4579   LearningRate 0.0229   Epoch: 20   Global Step: 105550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:25,837-Speed 11002.65 samples/sec   Loss 7.5693   LearningRate 0.0229   Epoch: 20   Global Step: 105560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:26,801-Speed 10634.03 samples/sec   Loss 7.5046   LearningRate 0.0229   Epoch: 20   Global Step: 105570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:27,753-Speed 10762.34 samples/sec   Loss 7.5566   LearningRate 0.0229   Epoch: 20   Global Step: 105580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:28,718-Speed 10629.50 samples/sec   Loss 7.4945   LearningRate 0.0229   Epoch: 20   Global Step: 105590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:29,673-Speed 10724.95 samples/sec   Loss 7.6378   LearningRate 0.0229   Epoch: 20   Global Step: 105600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:30,666-Speed 10329.62 samples/sec   Loss 7.6468   LearningRate 0.0228   Epoch: 20   Global Step: 105610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:31,609-Speed 10862.44 samples/sec   Loss 7.6292   LearningRate 0.0228   Epoch: 20   Global Step: 105620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:32,582-Speed 10539.78 samples/sec   Loss 7.5693   LearningRate 0.0228   Epoch: 20   Global Step: 105630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:11:33,540-Speed 10692.84 samples/sec   Loss 7.6099   LearningRate 0.0228   Epoch: 20   Global Step: 105640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:34,512-Speed 10551.87 samples/sec   Loss 7.5780   LearningRate 0.0228   Epoch: 20   Global Step: 105650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:11:35,463-Speed 10780.08 samples/sec   Loss 7.5180   LearningRate 0.0228   Epoch: 20   Global Step: 105660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:11:36,417-Speed 10743.97 samples/sec   Loss 7.6031   LearningRate 0.0228   Epoch: 20   Global Step: 105670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:11:37,353-Speed 10954.47 samples/sec   Loss 7.6956   LearningRate 0.0228   Epoch: 20   Global Step: 105680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:11:38,288-Speed 10957.61 samples/sec   Loss 7.6665   LearningRate 0.0228   Epoch: 20   Global Step: 105690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:11:39,243-Speed 10735.30 samples/sec   Loss 7.5810   LearningRate 0.0228   Epoch: 20   Global Step: 105700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:11:40,195-Speed 10767.13 samples/sec   Loss 7.4733   LearningRate 0.0228   Epoch: 20   Global Step: 105710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:11:41,147-Speed 10769.78 samples/sec   Loss 7.4132   LearningRate 0.0228   Epoch: 20   Global Step: 105720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:11:42,115-Speed 10589.34 samples/sec   Loss 7.7220   LearningRate 0.0228   Epoch: 20   Global Step: 105730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:11:43,170-Speed 9718.52 samples/sec   Loss 7.6196   LearningRate 0.0228   Epoch: 20   Global Step: 105740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:11:44,154-Speed 10415.80 samples/sec   Loss 7.5572   LearningRate 0.0228   Epoch: 20   Global Step: 105750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:45,121-Speed 10601.17 samples/sec   Loss 7.7754   LearningRate 0.0228   Epoch: 20   Global Step: 105760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:46,112-Speed 10335.40 samples/sec   Loss 7.5178   LearningRate 0.0228   Epoch: 20   Global Step: 105770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:47,073-Speed 10668.38 samples/sec   Loss 7.7303   LearningRate 0.0228   Epoch: 20   Global Step: 105780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:48,020-Speed 10823.51 samples/sec   Loss 7.6885   LearningRate 0.0228   Epoch: 20   Global Step: 105790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:48,975-Speed 10737.69 samples/sec   Loss 7.7366   LearningRate 0.0228   Epoch: 20   Global Step: 105800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:49,963-Speed 10371.45 samples/sec   Loss 7.5315   LearningRate 0.0228   Epoch: 20   Global Step: 105810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:50,952-Speed 10364.41 samples/sec   Loss 7.5456   LearningRate 0.0227   Epoch: 20   Global Step: 105820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:51,922-Speed 10572.42 samples/sec   Loss 7.6073   LearningRate 0.0227   Epoch: 20   Global Step: 105830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:52,902-Speed 10450.91 samples/sec   Loss 7.4803   LearningRate 0.0227   Epoch: 20   Global Step: 105840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:53,831-Speed 11034.55 samples/sec   Loss 7.6281   LearningRate 0.0227   Epoch: 20   Global Step: 105850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:11:54,797-Speed 10610.87 samples/sec   Loss 7.7077   LearningRate 0.0227   Epoch: 20   Global Step: 105860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:11:55,697-Speed 11391.39 samples/sec   Loss 7.5708   LearningRate 0.0227   Epoch: 20   Global Step: 105870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:56,655-Speed 10693.47 samples/sec   Loss 7.4965   LearningRate 0.0227   Epoch: 20   Global Step: 105880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:57,628-Speed 10544.19 samples/sec   Loss 7.5465   LearningRate 0.0227   Epoch: 20   Global Step: 105890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:58,546-Speed 11172.92 samples/sec   Loss 7.5859   LearningRate 0.0227   Epoch: 20   Global Step: 105900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:11:59,491-Speed 10839.70 samples/sec   Loss 7.5179   LearningRate 0.0227   Epoch: 20   Global Step: 105910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:12:00,421-Speed 11016.76 samples/sec   Loss 7.5820   LearningRate 0.0227   Epoch: 20   Global Step: 105920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:12:01,397-Speed 10500.33 samples/sec   Loss 7.5500   LearningRate 0.0227   Epoch: 20   Global Step: 105930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:12:02,354-Speed 10713.57 samples/sec   Loss 7.4057   LearningRate 0.0227   Epoch: 20   Global Step: 105940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:12:03,310-Speed 10729.67 samples/sec   Loss 7.4854   LearningRate 0.0227   Epoch: 20   Global Step: 105950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:12:04,300-Speed 10348.02 samples/sec   Loss 7.6005   LearningRate 0.0227   Epoch: 20   Global Step: 105960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:12:05,279-Speed 10469.46 samples/sec   Loss 7.5560   LearningRate 0.0227   Epoch: 20   Global Step: 105970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:12:06,219-Speed 10905.00 samples/sec   Loss 7.6163   LearningRate 0.0227   Epoch: 20   Global Step: 105980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:12:07,172-Speed 10756.96 samples/sec   Loss 7.4703   LearningRate 0.0227   Epoch: 20   Global Step: 105990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:12:08,082-Speed 11253.59 samples/sec   Loss 7.5518   LearningRate 0.0227   Epoch: 20   Global Step: 106000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:12:30,651-[lfw][106000]XNorm: 10.345657
Training: 2022-04-11 03:12:30,652-[lfw][106000]Accuracy-Flip: 0.99500+-0.00428
Training: 2022-04-11 03:12:30,652-[lfw][106000]Accuracy-Highest: 0.99667
Training: 2022-04-11 03:12:56,239-[cfp_fp][106000]XNorm: 8.843255
Training: 2022-04-11 03:12:56,240-[cfp_fp][106000]Accuracy-Flip: 0.95843+-0.00916
Training: 2022-04-11 03:12:56,241-[cfp_fp][106000]Accuracy-Highest: 0.96600
Training: 2022-04-11 03:13:18,523-[agedb_30][106000]XNorm: 10.157230
Training: 2022-04-11 03:13:18,523-[agedb_30][106000]Accuracy-Flip: 0.96550+-0.00633
Training: 2022-04-11 03:13:18,524-[agedb_30][106000]Accuracy-Highest: 0.97017
Training: 2022-04-11 03:13:19,446-Speed 143.49 samples/sec   Loss 7.6192   LearningRate 0.0227   Epoch: 20   Global Step: 106010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:13:20,420-Speed 10525.27 samples/sec   Loss 7.6436   LearningRate 0.0227   Epoch: 20   Global Step: 106020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:13:21,429-Speed 10153.87 samples/sec   Loss 7.6015   LearningRate 0.0227   Epoch: 20   Global Step: 106030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:13:22,365-Speed 10956.64 samples/sec   Loss 7.5124   LearningRate 0.0226   Epoch: 20   Global Step: 106040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:13:23,320-Speed 10736.09 samples/sec   Loss 7.5748   LearningRate 0.0226   Epoch: 20   Global Step: 106050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:13:24,276-Speed 10727.05 samples/sec   Loss 7.5651   LearningRate 0.0226   Epoch: 20   Global Step: 106060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:13:25,232-Speed 10720.13 samples/sec   Loss 7.5765   LearningRate 0.0226   Epoch: 20   Global Step: 106070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:13:26,173-Speed 10894.30 samples/sec   Loss 7.5609   LearningRate 0.0226   Epoch: 20   Global Step: 106080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:13:27,110-Speed 10934.89 samples/sec   Loss 7.6585   LearningRate 0.0226   Epoch: 20   Global Step: 106090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:13:28,094-Speed 10413.81 samples/sec   Loss 7.5291   LearningRate 0.0226   Epoch: 20   Global Step: 106100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:13:29,025-Speed 11008.98 samples/sec   Loss 7.6356   LearningRate 0.0226   Epoch: 20   Global Step: 106110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:13:29,980-Speed 10739.02 samples/sec   Loss 7.5617   LearningRate 0.0226   Epoch: 20   Global Step: 106120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:13:30,924-Speed 10857.31 samples/sec   Loss 7.5461   LearningRate 0.0226   Epoch: 20   Global Step: 106130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:13:31,883-Speed 10688.64 samples/sec   Loss 7.4880   LearningRate 0.0226   Epoch: 20   Global Step: 106140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:13:32,849-Speed 10602.58 samples/sec   Loss 7.4252   LearningRate 0.0226   Epoch: 20   Global Step: 106150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:13:33,830-Speed 10455.67 samples/sec   Loss 7.5869   LearningRate 0.0226   Epoch: 20   Global Step: 106160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:13:34,813-Speed 10430.62 samples/sec   Loss 7.5771   LearningRate 0.0226   Epoch: 20   Global Step: 106170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:13:35,773-Speed 10678.87 samples/sec   Loss 7.5355   LearningRate 0.0226   Epoch: 20   Global Step: 106180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:13:36,731-Speed 10697.91 samples/sec   Loss 7.7966   LearningRate 0.0226   Epoch: 20   Global Step: 106190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:13:37,656-Speed 11078.60 samples/sec   Loss 7.4983   LearningRate 0.0226   Epoch: 20   Global Step: 106200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:13:38,724-Speed 9592.91 samples/sec   Loss 7.5581   LearningRate 0.0226   Epoch: 20   Global Step: 106210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:13:48,579-Speed 1039.22 samples/sec   Loss 7.3474   LearningRate 0.0226   Epoch: 21   Global Step: 106220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:13:49,816-Speed 8298.63 samples/sec   Loss 6.6370   LearningRate 0.0226   Epoch: 21   Global Step: 106230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:13:50,903-Speed 9430.87 samples/sec   Loss 6.8526   LearningRate 0.0226   Epoch: 21   Global Step: 106240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:13:51,865-Speed 10646.87 samples/sec   Loss 6.7329   LearningRate 0.0225   Epoch: 21   Global Step: 106250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:13:52,897-Speed 9931.56 samples/sec   Loss 6.7546   LearningRate 0.0225   Epoch: 21   Global Step: 106260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:13:53,864-Speed 10600.20 samples/sec   Loss 6.8013   LearningRate 0.0225   Epoch: 21   Global Step: 106270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:13:54,976-Speed 9212.39 samples/sec   Loss 6.7081   LearningRate 0.0225   Epoch: 21   Global Step: 106280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:13:55,915-Speed 10919.29 samples/sec   Loss 6.6308   LearningRate 0.0225   Epoch: 21   Global Step: 106290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:13:56,883-Speed 10592.68 samples/sec   Loss 6.8602   LearningRate 0.0225   Epoch: 21   Global Step: 106300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:13:57,830-Speed 10820.23 samples/sec   Loss 6.6122   LearningRate 0.0225   Epoch: 21   Global Step: 106310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:13:58,780-Speed 10787.59 samples/sec   Loss 6.6457   LearningRate 0.0225   Epoch: 21   Global Step: 106320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:13:59,778-Speed 10261.03 samples/sec   Loss 6.6880   LearningRate 0.0225   Epoch: 21   Global Step: 106330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:00,781-Speed 10233.33 samples/sec   Loss 6.8302   LearningRate 0.0225   Epoch: 21   Global Step: 106340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:01,730-Speed 10796.45 samples/sec   Loss 6.7851   LearningRate 0.0225   Epoch: 21   Global Step: 106350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:02,698-Speed 10589.44 samples/sec   Loss 6.9627   LearningRate 0.0225   Epoch: 21   Global Step: 106360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:03,650-Speed 10765.35 samples/sec   Loss 6.7533   LearningRate 0.0225   Epoch: 21   Global Step: 106370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:04,599-Speed 10792.64 samples/sec   Loss 6.8796   LearningRate 0.0225   Epoch: 21   Global Step: 106380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:14:05,573-Speed 10521.12 samples/sec   Loss 6.7860   LearningRate 0.0225   Epoch: 21   Global Step: 106390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:14:06,519-Speed 10839.96 samples/sec   Loss 6.7086   LearningRate 0.0225   Epoch: 21   Global Step: 106400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:14:07,443-Speed 11084.73 samples/sec   Loss 6.7651   LearningRate 0.0225   Epoch: 21   Global Step: 106410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:14:08,404-Speed 10673.62 samples/sec   Loss 6.7307   LearningRate 0.0225   Epoch: 21   Global Step: 106420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:14:09,387-Speed 10421.54 samples/sec   Loss 6.8055   LearningRate 0.0225   Epoch: 21   Global Step: 106430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:14:10,337-Speed 10783.17 samples/sec   Loss 6.7651   LearningRate 0.0225   Epoch: 21   Global Step: 106440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:14:11,300-Speed 10643.64 samples/sec   Loss 6.7279   LearningRate 0.0225   Epoch: 21   Global Step: 106450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:14:12,270-Speed 10567.37 samples/sec   Loss 6.9020   LearningRate 0.0224   Epoch: 21   Global Step: 106460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:14:13,222-Speed 10764.35 samples/sec   Loss 6.7154   LearningRate 0.0224   Epoch: 21   Global Step: 106470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:14:14,168-Speed 10839.58 samples/sec   Loss 6.8581   LearningRate 0.0224   Epoch: 21   Global Step: 106480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:15,129-Speed 10657.18 samples/sec   Loss 6.8891   LearningRate 0.0224   Epoch: 21   Global Step: 106490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:16,070-Speed 10893.51 samples/sec   Loss 6.7110   LearningRate 0.0224   Epoch: 21   Global Step: 106500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:17,083-Speed 10119.17 samples/sec   Loss 6.7475   LearningRate 0.0224   Epoch: 21   Global Step: 106510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:18,046-Speed 10646.42 samples/sec   Loss 6.7234   LearningRate 0.0224   Epoch: 21   Global Step: 106520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:19,006-Speed 10673.61 samples/sec   Loss 6.8348   LearningRate 0.0224   Epoch: 21   Global Step: 106530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:19,959-Speed 10753.07 samples/sec   Loss 6.9009   LearningRate 0.0224   Epoch: 21   Global Step: 106540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:20,988-Speed 9963.12 samples/sec   Loss 6.7155   LearningRate 0.0224   Epoch: 21   Global Step: 106550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:21,945-Speed 10703.96 samples/sec   Loss 6.9796   LearningRate 0.0224   Epoch: 21   Global Step: 106560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:22,909-Speed 10629.93 samples/sec   Loss 6.8006   LearningRate 0.0224   Epoch: 21   Global Step: 106570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:23,851-Speed 10885.04 samples/sec   Loss 6.9147   LearningRate 0.0224   Epoch: 21   Global Step: 106580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:24,837-Speed 10394.36 samples/sec   Loss 6.8873   LearningRate 0.0224   Epoch: 21   Global Step: 106590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:25,799-Speed 10651.95 samples/sec   Loss 7.0044   LearningRate 0.0224   Epoch: 21   Global Step: 106600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:26,749-Speed 10788.94 samples/sec   Loss 6.9768   LearningRate 0.0224   Epoch: 21   Global Step: 106610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:27,704-Speed 10741.59 samples/sec   Loss 6.7635   LearningRate 0.0224   Epoch: 21   Global Step: 106620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:28,656-Speed 10756.34 samples/sec   Loss 7.0897   LearningRate 0.0224   Epoch: 21   Global Step: 106630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:29,624-Speed 10592.69 samples/sec   Loss 7.0221   LearningRate 0.0224   Epoch: 21   Global Step: 106640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:30,615-Speed 10343.96 samples/sec   Loss 6.8832   LearningRate 0.0224   Epoch: 21   Global Step: 106650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:31,571-Speed 10723.58 samples/sec   Loss 7.0287   LearningRate 0.0224   Epoch: 21   Global Step: 106660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:32,544-Speed 10555.82 samples/sec   Loss 6.7930   LearningRate 0.0224   Epoch: 21   Global Step: 106670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:33,513-Speed 10572.28 samples/sec   Loss 6.8290   LearningRate 0.0223   Epoch: 21   Global Step: 106680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:14:34,473-Speed 10682.33 samples/sec   Loss 6.9213   LearningRate 0.0223   Epoch: 21   Global Step: 106690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:14:35,443-Speed 10570.66 samples/sec   Loss 6.8453   LearningRate 0.0223   Epoch: 21   Global Step: 106700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:14:36,382-Speed 10911.73 samples/sec   Loss 7.0334   LearningRate 0.0223   Epoch: 21   Global Step: 106710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:37,297-Speed 11200.15 samples/sec   Loss 6.9785   LearningRate 0.0223   Epoch: 21   Global Step: 106720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:38,253-Speed 10720.53 samples/sec   Loss 6.9806   LearningRate 0.0223   Epoch: 21   Global Step: 106730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:39,201-Speed 10811.86 samples/sec   Loss 7.0573   LearningRate 0.0223   Epoch: 21   Global Step: 106740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:40,160-Speed 10688.11 samples/sec   Loss 6.8939   LearningRate 0.0223   Epoch: 21   Global Step: 106750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:41,111-Speed 10780.53 samples/sec   Loss 6.9927   LearningRate 0.0223   Epoch: 21   Global Step: 106760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:42,061-Speed 10787.85 samples/sec   Loss 6.7938   LearningRate 0.0223   Epoch: 21   Global Step: 106770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:43,008-Speed 10817.88 samples/sec   Loss 6.9655   LearningRate 0.0223   Epoch: 21   Global Step: 106780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:43,979-Speed 10556.83 samples/sec   Loss 7.0402   LearningRate 0.0223   Epoch: 21   Global Step: 106790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:44,909-Speed 11027.94 samples/sec   Loss 6.9670   LearningRate 0.0223   Epoch: 21   Global Step: 106800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:45,895-Speed 10395.08 samples/sec   Loss 7.0730   LearningRate 0.0223   Epoch: 21   Global Step: 106810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:14:46,865-Speed 10554.15 samples/sec   Loss 6.9776   LearningRate 0.0223   Epoch: 21   Global Step: 106820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:14:47,841-Speed 10503.84 samples/sec   Loss 6.8635   LearningRate 0.0223   Epoch: 21   Global Step: 106830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:14:48,798-Speed 10721.32 samples/sec   Loss 6.9225   LearningRate 0.0223   Epoch: 21   Global Step: 106840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:14:49,768-Speed 10570.01 samples/sec   Loss 6.9182   LearningRate 0.0223   Epoch: 21   Global Step: 106850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:14:50,732-Speed 10630.75 samples/sec   Loss 6.9526   LearningRate 0.0223   Epoch: 21   Global Step: 106860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:14:51,681-Speed 10804.95 samples/sec   Loss 7.0493   LearningRate 0.0223   Epoch: 21   Global Step: 106870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:14:52,608-Speed 11049.53 samples/sec   Loss 7.1370   LearningRate 0.0223   Epoch: 21   Global Step: 106880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:14:53,588-Speed 10464.66 samples/sec   Loss 6.9579   LearningRate 0.0222   Epoch: 21   Global Step: 106890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:14:54,568-Speed 10463.40 samples/sec   Loss 6.9617   LearningRate 0.0222   Epoch: 21   Global Step: 106900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:14:55,498-Speed 11024.65 samples/sec   Loss 7.0777   LearningRate 0.0222   Epoch: 21   Global Step: 106910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:14:56,536-Speed 9869.62 samples/sec   Loss 6.9956   LearningRate 0.0222   Epoch: 21   Global Step: 106920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:14:57,506-Speed 10561.07 samples/sec   Loss 7.0655   LearningRate 0.0222   Epoch: 21   Global Step: 106930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:14:58,555-Speed 9777.05 samples/sec   Loss 6.9675   LearningRate 0.0222   Epoch: 21   Global Step: 106940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:14:59,504-Speed 10803.62 samples/sec   Loss 7.0873   LearningRate 0.0222   Epoch: 21   Global Step: 106950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:15:00,432-Speed 11044.88 samples/sec   Loss 7.1681   LearningRate 0.0222   Epoch: 21   Global Step: 106960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:15:01,390-Speed 10702.83 samples/sec   Loss 6.9308   LearningRate 0.0222   Epoch: 21   Global Step: 106970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:15:02,432-Speed 9825.88 samples/sec   Loss 7.1038   LearningRate 0.0222   Epoch: 21   Global Step: 106980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:03,409-Speed 10502.71 samples/sec   Loss 7.0422   LearningRate 0.0222   Epoch: 21   Global Step: 106990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:04,390-Speed 10475.00 samples/sec   Loss 7.0228   LearningRate 0.0222   Epoch: 21   Global Step: 107000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:05,338-Speed 10820.51 samples/sec   Loss 7.1785   LearningRate 0.0222   Epoch: 21   Global Step: 107010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:06,318-Speed 10453.88 samples/sec   Loss 7.1357   LearningRate 0.0222   Epoch: 21   Global Step: 107020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:07,254-Speed 10948.24 samples/sec   Loss 6.9599   LearningRate 0.0222   Epoch: 21   Global Step: 107030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:08,193-Speed 10921.07 samples/sec   Loss 7.0081   LearningRate 0.0222   Epoch: 21   Global Step: 107040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:09,171-Speed 10481.23 samples/sec   Loss 6.9695   LearningRate 0.0222   Epoch: 21   Global Step: 107050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:10,112-Speed 10892.77 samples/sec   Loss 7.2941   LearningRate 0.0222   Epoch: 21   Global Step: 107060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:11,096-Speed 10415.26 samples/sec   Loss 7.0683   LearningRate 0.0222   Epoch: 21   Global Step: 107070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:12,030-Speed 10976.49 samples/sec   Loss 7.0491   LearningRate 0.0222   Epoch: 21   Global Step: 107080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:12,957-Speed 11051.18 samples/sec   Loss 7.1283   LearningRate 0.0222   Epoch: 21   Global Step: 107090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:13,930-Speed 10533.88 samples/sec   Loss 7.1284   LearningRate 0.0222   Epoch: 21   Global Step: 107100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:14,917-Speed 10395.82 samples/sec   Loss 7.1717   LearningRate 0.0221   Epoch: 21   Global Step: 107110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:15,841-Speed 11081.32 samples/sec   Loss 6.9586   LearningRate 0.0221   Epoch: 21   Global Step: 107120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:16,814-Speed 10539.85 samples/sec   Loss 7.0501   LearningRate 0.0221   Epoch: 21   Global Step: 107130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:17,777-Speed 10640.49 samples/sec   Loss 7.1131   LearningRate 0.0221   Epoch: 21   Global Step: 107140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:18,726-Speed 10809.04 samples/sec   Loss 7.0000   LearningRate 0.0221   Epoch: 21   Global Step: 107150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:19,697-Speed 10557.18 samples/sec   Loss 7.0520   LearningRate 0.0221   Epoch: 21   Global Step: 107160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:20,651-Speed 10740.31 samples/sec   Loss 7.3062   LearningRate 0.0221   Epoch: 21   Global Step: 107170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:21,602-Speed 10782.01 samples/sec   Loss 6.9780   LearningRate 0.0221   Epoch: 21   Global Step: 107180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:22,524-Speed 11113.86 samples/sec   Loss 7.0508   LearningRate 0.0221   Epoch: 21   Global Step: 107190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:23,471-Speed 10822.97 samples/sec   Loss 6.9362   LearningRate 0.0221   Epoch: 21   Global Step: 107200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:24,448-Speed 10488.91 samples/sec   Loss 7.1650   LearningRate 0.0221   Epoch: 21   Global Step: 107210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:25,421-Speed 10536.86 samples/sec   Loss 7.1892   LearningRate 0.0221   Epoch: 21   Global Step: 107220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:26,383-Speed 10647.20 samples/sec   Loss 7.2704   LearningRate 0.0221   Epoch: 21   Global Step: 107230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:27,312-Speed 11036.81 samples/sec   Loss 7.2062   LearningRate 0.0221   Epoch: 21   Global Step: 107240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:28,297-Speed 10401.31 samples/sec   Loss 7.1402   LearningRate 0.0221   Epoch: 21   Global Step: 107250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:29,310-Speed 10120.47 samples/sec   Loss 7.1373   LearningRate 0.0221   Epoch: 21   Global Step: 107260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:30,281-Speed 10553.97 samples/sec   Loss 7.0947   LearningRate 0.0221   Epoch: 21   Global Step: 107270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:31,221-Speed 10905.27 samples/sec   Loss 7.1890   LearningRate 0.0221   Epoch: 21   Global Step: 107280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:32,169-Speed 10823.30 samples/sec   Loss 7.1239   LearningRate 0.0221   Epoch: 21   Global Step: 107290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:33,129-Speed 10670.29 samples/sec   Loss 7.2340   LearningRate 0.0221   Epoch: 21   Global Step: 107300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:34,022-Speed 11474.45 samples/sec   Loss 7.2211   LearningRate 0.0221   Epoch: 21   Global Step: 107310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:34,964-Speed 10884.25 samples/sec   Loss 7.2575   LearningRate 0.0220   Epoch: 21   Global Step: 107320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:35,919-Speed 10725.37 samples/sec   Loss 7.0374   LearningRate 0.0220   Epoch: 21   Global Step: 107330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:36,939-Speed 10053.95 samples/sec   Loss 7.0615   LearningRate 0.0220   Epoch: 21   Global Step: 107340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:37,887-Speed 10811.58 samples/sec   Loss 7.1498   LearningRate 0.0220   Epoch: 21   Global Step: 107350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:38,871-Speed 10408.31 samples/sec   Loss 7.1990   LearningRate 0.0220   Epoch: 21   Global Step: 107360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:39,855-Speed 10416.11 samples/sec   Loss 7.2891   LearningRate 0.0220   Epoch: 21   Global Step: 107370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:40,818-Speed 10639.22 samples/sec   Loss 7.1597   LearningRate 0.0220   Epoch: 21   Global Step: 107380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:41,767-Speed 10800.98 samples/sec   Loss 7.2392   LearningRate 0.0220   Epoch: 21   Global Step: 107390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:42,749-Speed 10438.38 samples/sec   Loss 7.1256   LearningRate 0.0220   Epoch: 21   Global Step: 107400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:43,698-Speed 10802.40 samples/sec   Loss 7.1952   LearningRate 0.0220   Epoch: 21   Global Step: 107410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:44,629-Speed 10999.47 samples/sec   Loss 7.1380   LearningRate 0.0220   Epoch: 21   Global Step: 107420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:45,632-Speed 10216.42 samples/sec   Loss 7.0653   LearningRate 0.0220   Epoch: 21   Global Step: 107430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:46,567-Speed 10963.21 samples/sec   Loss 7.1023   LearningRate 0.0220   Epoch: 21   Global Step: 107440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:47,515-Speed 10814.57 samples/sec   Loss 7.1372   LearningRate 0.0220   Epoch: 21   Global Step: 107450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:48,444-Speed 11025.29 samples/sec   Loss 7.2849   LearningRate 0.0220   Epoch: 21   Global Step: 107460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:49,405-Speed 10674.19 samples/sec   Loss 7.2173   LearningRate 0.0220   Epoch: 21   Global Step: 107470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:50,398-Speed 10317.52 samples/sec   Loss 7.1057   LearningRate 0.0220   Epoch: 21   Global Step: 107480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:51,328-Speed 11018.68 samples/sec   Loss 7.1945   LearningRate 0.0220   Epoch: 21   Global Step: 107490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:15:52,349-Speed 10039.12 samples/sec   Loss 7.2533   LearningRate 0.0220   Epoch: 21   Global Step: 107500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:53,290-Speed 10886.32 samples/sec   Loss 7.1878   LearningRate 0.0220   Epoch: 21   Global Step: 107510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:54,245-Speed 10739.87 samples/sec   Loss 7.1674   LearningRate 0.0220   Epoch: 21   Global Step: 107520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:55,240-Speed 10298.69 samples/sec   Loss 7.1089   LearningRate 0.0220   Epoch: 21   Global Step: 107530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:56,171-Speed 11015.18 samples/sec   Loss 7.2807   LearningRate 0.0219   Epoch: 21   Global Step: 107540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:57,092-Speed 11124.04 samples/sec   Loss 7.1885   LearningRate 0.0219   Epoch: 21   Global Step: 107550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:58,033-Speed 10893.60 samples/sec   Loss 7.1770   LearningRate 0.0219   Epoch: 21   Global Step: 107560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:59,016-Speed 10427.61 samples/sec   Loss 7.1478   LearningRate 0.0219   Epoch: 21   Global Step: 107570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:15:59,976-Speed 10678.32 samples/sec   Loss 7.2510   LearningRate 0.0219   Epoch: 21   Global Step: 107580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:00,941-Speed 10619.21 samples/sec   Loss 7.2394   LearningRate 0.0219   Epoch: 21   Global Step: 107590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:01,884-Speed 10871.75 samples/sec   Loss 7.1909   LearningRate 0.0219   Epoch: 21   Global Step: 107600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:02,819-Speed 10972.25 samples/sec   Loss 7.2686   LearningRate 0.0219   Epoch: 21   Global Step: 107610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:03,773-Speed 10743.24 samples/sec   Loss 7.1751   LearningRate 0.0219   Epoch: 21   Global Step: 107620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:04,787-Speed 10106.14 samples/sec   Loss 7.2402   LearningRate 0.0219   Epoch: 21   Global Step: 107630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:05,762-Speed 10504.43 samples/sec   Loss 7.2121   LearningRate 0.0219   Epoch: 21   Global Step: 107640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:06,711-Speed 10809.65 samples/sec   Loss 7.0968   LearningRate 0.0219   Epoch: 21   Global Step: 107650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:07,648-Speed 10929.08 samples/sec   Loss 7.3006   LearningRate 0.0219   Epoch: 21   Global Step: 107660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:08,588-Speed 10907.72 samples/sec   Loss 7.2699   LearningRate 0.0219   Epoch: 21   Global Step: 107670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:09,573-Speed 10395.76 samples/sec   Loss 7.2339   LearningRate 0.0219   Epoch: 21   Global Step: 107680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:10,515-Speed 10881.49 samples/sec   Loss 7.2653   LearningRate 0.0219   Epoch: 21   Global Step: 107690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:11,493-Speed 10481.86 samples/sec   Loss 7.1588   LearningRate 0.0219   Epoch: 21   Global Step: 107700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:16:12,432-Speed 10916.05 samples/sec   Loss 7.3315   LearningRate 0.0219   Epoch: 21   Global Step: 107710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:13,397-Speed 10621.13 samples/sec   Loss 7.1957   LearningRate 0.0219   Epoch: 21   Global Step: 107720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:14,344-Speed 10827.88 samples/sec   Loss 7.1380   LearningRate 0.0219   Epoch: 21   Global Step: 107730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:15,282-Speed 10925.68 samples/sec   Loss 7.1639   LearningRate 0.0219   Epoch: 21   Global Step: 107740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:16,245-Speed 10642.32 samples/sec   Loss 7.2503   LearningRate 0.0218   Epoch: 21   Global Step: 107750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:17,203-Speed 10688.54 samples/sec   Loss 7.2658   LearningRate 0.0218   Epoch: 21   Global Step: 107760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:18,129-Speed 11074.15 samples/sec   Loss 7.2764   LearningRate 0.0218   Epoch: 21   Global Step: 107770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:19,104-Speed 10509.39 samples/sec   Loss 7.3826   LearningRate 0.0218   Epoch: 21   Global Step: 107780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:20,058-Speed 10746.26 samples/sec   Loss 7.2562   LearningRate 0.0218   Epoch: 21   Global Step: 107790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:20,996-Speed 10927.28 samples/sec   Loss 7.2213   LearningRate 0.0218   Epoch: 21   Global Step: 107800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:21,919-Speed 11097.92 samples/sec   Loss 7.2705   LearningRate 0.0218   Epoch: 21   Global Step: 107810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:16:22,873-Speed 10744.00 samples/sec   Loss 7.3026   LearningRate 0.0218   Epoch: 21   Global Step: 107820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:16:23,807-Speed 10984.32 samples/sec   Loss 7.1503   LearningRate 0.0218   Epoch: 21   Global Step: 107830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:16:24,766-Speed 10687.00 samples/sec   Loss 7.2106   LearningRate 0.0218   Epoch: 21   Global Step: 107840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:16:25,703-Speed 10936.12 samples/sec   Loss 7.2618   LearningRate 0.0218   Epoch: 21   Global Step: 107850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:16:26,684-Speed 10448.00 samples/sec   Loss 7.1766   LearningRate 0.0218   Epoch: 21   Global Step: 107860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:16:27,633-Speed 10805.50 samples/sec   Loss 7.2719   LearningRate 0.0218   Epoch: 21   Global Step: 107870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:16:28,611-Speed 10479.35 samples/sec   Loss 7.2352   LearningRate 0.0218   Epoch: 21   Global Step: 107880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:29,572-Speed 10666.99 samples/sec   Loss 7.2721   LearningRate 0.0218   Epoch: 21   Global Step: 107890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:30,536-Speed 10625.68 samples/sec   Loss 7.2482   LearningRate 0.0218   Epoch: 21   Global Step: 107900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:31,493-Speed 10720.97 samples/sec   Loss 7.2032   LearningRate 0.0218   Epoch: 21   Global Step: 107910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:32,447-Speed 10747.01 samples/sec   Loss 7.3163   LearningRate 0.0218   Epoch: 21   Global Step: 107920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:33,363-Speed 11187.65 samples/sec   Loss 7.2215   LearningRate 0.0218   Epoch: 21   Global Step: 107930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:34,318-Speed 10726.55 samples/sec   Loss 7.2178   LearningRate 0.0218   Epoch: 21   Global Step: 107940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:35,239-Speed 11133.27 samples/sec   Loss 7.1684   LearningRate 0.0218   Epoch: 21   Global Step: 107950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:36,174-Speed 10958.80 samples/sec   Loss 7.2294   LearningRate 0.0218   Epoch: 21   Global Step: 107960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:37,149-Speed 10507.46 samples/sec   Loss 7.2062   LearningRate 0.0217   Epoch: 21   Global Step: 107970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:16:38,061-Speed 11238.37 samples/sec   Loss 7.1818   LearningRate 0.0217   Epoch: 21   Global Step: 107980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:16:39,003-Speed 10882.66 samples/sec   Loss 7.2320   LearningRate 0.0217   Epoch: 21   Global Step: 107990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:16:39,976-Speed 10534.55 samples/sec   Loss 7.1803   LearningRate 0.0217   Epoch: 21   Global Step: 108000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:17:01,998-[lfw][108000]XNorm: 10.409884
Training: 2022-04-11 03:17:01,998-[lfw][108000]Accuracy-Flip: 0.99550+-0.00380
Training: 2022-04-11 03:17:01,999-[lfw][108000]Accuracy-Highest: 0.99667
Training: 2022-04-11 03:17:27,793-[cfp_fp][108000]XNorm: 8.849245
Training: 2022-04-11 03:17:27,794-[cfp_fp][108000]Accuracy-Flip: 0.96414+-0.00889
Training: 2022-04-11 03:17:27,795-[cfp_fp][108000]Accuracy-Highest: 0.96600
Training: 2022-04-11 03:17:51,508-[agedb_30][108000]XNorm: 10.102646
Training: 2022-04-11 03:17:51,508-[agedb_30][108000]Accuracy-Flip: 0.96817+-0.00867
Training: 2022-04-11 03:17:51,509-[agedb_30][108000]Accuracy-Highest: 0.97017
Training: 2022-04-11 03:17:52,459-Speed 141.28 samples/sec   Loss 7.2066   LearningRate 0.0217   Epoch: 21   Global Step: 108010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:17:53,467-Speed 10168.29 samples/sec   Loss 7.4383   LearningRate 0.0217   Epoch: 21   Global Step: 108020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:17:54,413-Speed 10840.18 samples/sec   Loss 7.2115   LearningRate 0.0217   Epoch: 21   Global Step: 108030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:17:55,361-Speed 10810.31 samples/sec   Loss 7.4624   LearningRate 0.0217   Epoch: 21   Global Step: 108040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:17:56,285-Speed 11102.75 samples/sec   Loss 7.3333   LearningRate 0.0217   Epoch: 21   Global Step: 108050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:17:57,247-Speed 10649.51 samples/sec   Loss 7.3031   LearningRate 0.0217   Epoch: 21   Global Step: 108060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:17:58,213-Speed 10623.04 samples/sec   Loss 7.2560   LearningRate 0.0217   Epoch: 21   Global Step: 108070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:17:59,148-Speed 10969.28 samples/sec   Loss 7.1595   LearningRate 0.0217   Epoch: 21   Global Step: 108080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:00,119-Speed 10548.92 samples/sec   Loss 7.3273   LearningRate 0.0217   Epoch: 21   Global Step: 108090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:01,098-Speed 10464.90 samples/sec   Loss 7.1059   LearningRate 0.0217   Epoch: 21   Global Step: 108100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:02,081-Speed 10428.67 samples/sec   Loss 7.2642   LearningRate 0.0217   Epoch: 21   Global Step: 108110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:18:02,998-Speed 11175.46 samples/sec   Loss 7.1892   LearningRate 0.0217   Epoch: 21   Global Step: 108120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:03,952-Speed 10745.21 samples/sec   Loss 7.3731   LearningRate 0.0217   Epoch: 21   Global Step: 108130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:04,866-Speed 11226.14 samples/sec   Loss 7.4605   LearningRate 0.0217   Epoch: 21   Global Step: 108140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:05,786-Speed 11160.53 samples/sec   Loss 7.2654   LearningRate 0.0217   Epoch: 21   Global Step: 108150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:06,753-Speed 10597.14 samples/sec   Loss 7.1825   LearningRate 0.0217   Epoch: 21   Global Step: 108160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:07,664-Speed 11250.13 samples/sec   Loss 7.2814   LearningRate 0.0217   Epoch: 21   Global Step: 108170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:08,596-Speed 10994.04 samples/sec   Loss 7.2912   LearningRate 0.0217   Epoch: 21   Global Step: 108180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:09,572-Speed 10501.09 samples/sec   Loss 7.1231   LearningRate 0.0216   Epoch: 21   Global Step: 108190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:10,540-Speed 10585.78 samples/sec   Loss 7.2771   LearningRate 0.0216   Epoch: 21   Global Step: 108200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:11,518-Speed 10487.70 samples/sec   Loss 7.2827   LearningRate 0.0216   Epoch: 21   Global Step: 108210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:12,439-Speed 11124.08 samples/sec   Loss 7.4020   LearningRate 0.0216   Epoch: 21   Global Step: 108220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:18:13,395-Speed 10717.49 samples/sec   Loss 7.3945   LearningRate 0.0216   Epoch: 21   Global Step: 108230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:18:14,348-Speed 10758.82 samples/sec   Loss 7.2663   LearningRate 0.0216   Epoch: 21   Global Step: 108240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:18:15,294-Speed 10838.21 samples/sec   Loss 7.3351   LearningRate 0.0216   Epoch: 21   Global Step: 108250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:18:16,237-Speed 10864.97 samples/sec   Loss 7.2569   LearningRate 0.0216   Epoch: 21   Global Step: 108260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:18:17,197-Speed 10671.75 samples/sec   Loss 7.2230   LearningRate 0.0216   Epoch: 21   Global Step: 108270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:18,112-Speed 11208.12 samples/sec   Loss 7.3302   LearningRate 0.0216   Epoch: 21   Global Step: 108280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:19,083-Speed 10560.65 samples/sec   Loss 7.3358   LearningRate 0.0216   Epoch: 21   Global Step: 108290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:20,060-Speed 10489.17 samples/sec   Loss 7.3782   LearningRate 0.0216   Epoch: 21   Global Step: 108300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:20,991-Speed 11006.45 samples/sec   Loss 7.4199   LearningRate 0.0216   Epoch: 21   Global Step: 108310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:21,928-Speed 10931.17 samples/sec   Loss 7.3666   LearningRate 0.0216   Epoch: 21   Global Step: 108320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:22,955-Speed 9986.91 samples/sec   Loss 7.2701   LearningRate 0.0216   Epoch: 21   Global Step: 108330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:23,927-Speed 10536.05 samples/sec   Loss 7.3065   LearningRate 0.0216   Epoch: 21   Global Step: 108340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:24,898-Speed 10560.60 samples/sec   Loss 7.3094   LearningRate 0.0216   Epoch: 21   Global Step: 108350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:18:25,876-Speed 10485.37 samples/sec   Loss 7.3554   LearningRate 0.0216   Epoch: 21   Global Step: 108360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:18:26,820-Speed 10847.55 samples/sec   Loss 7.2319   LearningRate 0.0216   Epoch: 21   Global Step: 108370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:18:27,814-Speed 10317.79 samples/sec   Loss 7.3307   LearningRate 0.0216   Epoch: 21   Global Step: 108380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:18:28,799-Speed 10400.30 samples/sec   Loss 7.3324   LearningRate 0.0216   Epoch: 21   Global Step: 108390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:18:29,749-Speed 10786.85 samples/sec   Loss 7.4237   LearningRate 0.0215   Epoch: 21   Global Step: 108400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:18:30,751-Speed 10224.43 samples/sec   Loss 7.3161   LearningRate 0.0215   Epoch: 21   Global Step: 108410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:18:31,712-Speed 10671.18 samples/sec   Loss 7.3738   LearningRate 0.0215   Epoch: 21   Global Step: 108420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:18:32,685-Speed 10532.55 samples/sec   Loss 7.3324   LearningRate 0.0215   Epoch: 21   Global Step: 108430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:18:33,652-Speed 10594.56 samples/sec   Loss 7.2723   LearningRate 0.0215   Epoch: 21   Global Step: 108440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 03:18:34,588-Speed 10960.31 samples/sec   Loss 7.3020   LearningRate 0.0215   Epoch: 21   Global Step: 108450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:35,535-Speed 10817.06 samples/sec   Loss 7.3903   LearningRate 0.0215   Epoch: 21   Global Step: 108460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:36,495-Speed 10677.82 samples/sec   Loss 7.2707   LearningRate 0.0215   Epoch: 21   Global Step: 108470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:37,455-Speed 10676.72 samples/sec   Loss 7.2449   LearningRate 0.0215   Epoch: 21   Global Step: 108480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:38,438-Speed 10428.38 samples/sec   Loss 7.3329   LearningRate 0.0215   Epoch: 21   Global Step: 108490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:39,395-Speed 10707.50 samples/sec   Loss 7.3019   LearningRate 0.0215   Epoch: 21   Global Step: 108500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:40,359-Speed 10640.74 samples/sec   Loss 7.3128   LearningRate 0.0215   Epoch: 21   Global Step: 108510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:41,328-Speed 10572.63 samples/sec   Loss 7.3400   LearningRate 0.0215   Epoch: 21   Global Step: 108520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:42,282-Speed 10737.65 samples/sec   Loss 7.3101   LearningRate 0.0215   Epoch: 21   Global Step: 108530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:43,239-Speed 10713.77 samples/sec   Loss 7.2271   LearningRate 0.0215   Epoch: 21   Global Step: 108540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:44,256-Speed 10084.63 samples/sec   Loss 7.2267   LearningRate 0.0215   Epoch: 21   Global Step: 108550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:18:45,208-Speed 10759.10 samples/sec   Loss 7.3948   LearningRate 0.0215   Epoch: 21   Global Step: 108560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:46,192-Speed 10435.28 samples/sec   Loss 7.3273   LearningRate 0.0215   Epoch: 21   Global Step: 108570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:47,130-Speed 10925.01 samples/sec   Loss 7.2809   LearningRate 0.0215   Epoch: 21   Global Step: 108580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:48,100-Speed 10561.30 samples/sec   Loss 7.3597   LearningRate 0.0215   Epoch: 21   Global Step: 108590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:49,044-Speed 10862.07 samples/sec   Loss 7.3264   LearningRate 0.0215   Epoch: 21   Global Step: 108600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:50,024-Speed 10461.61 samples/sec   Loss 7.4061   LearningRate 0.0215   Epoch: 21   Global Step: 108610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:50,987-Speed 10645.35 samples/sec   Loss 7.3603   LearningRate 0.0214   Epoch: 21   Global Step: 108620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:51,956-Speed 10574.31 samples/sec   Loss 7.4882   LearningRate 0.0214   Epoch: 21   Global Step: 108630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:52,934-Speed 10483.64 samples/sec   Loss 7.5778   LearningRate 0.0214   Epoch: 21   Global Step: 108640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:53,905-Speed 10563.36 samples/sec   Loss 7.3245   LearningRate 0.0214   Epoch: 21   Global Step: 108650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:54,925-Speed 10041.33 samples/sec   Loss 7.2752   LearningRate 0.0214   Epoch: 21   Global Step: 108660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:18:55,849-Speed 11091.73 samples/sec   Loss 7.2516   LearningRate 0.0214   Epoch: 21   Global Step: 108670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:18:56,802-Speed 10752.54 samples/sec   Loss 7.3151   LearningRate 0.0214   Epoch: 21   Global Step: 108680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:18:57,763-Speed 10675.64 samples/sec   Loss 7.4384   LearningRate 0.0214   Epoch: 21   Global Step: 108690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:18:58,693-Speed 11017.42 samples/sec   Loss 7.2709   LearningRate 0.0214   Epoch: 21   Global Step: 108700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:18:59,663-Speed 10569.45 samples/sec   Loss 7.3171   LearningRate 0.0214   Epoch: 21   Global Step: 108710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:19:00,623-Speed 10670.02 samples/sec   Loss 7.3140   LearningRate 0.0214   Epoch: 21   Global Step: 108720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:19:01,622-Speed 10263.42 samples/sec   Loss 7.3348   LearningRate 0.0214   Epoch: 21   Global Step: 108730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:19:02,544-Speed 11117.10 samples/sec   Loss 7.3559   LearningRate 0.0214   Epoch: 21   Global Step: 108740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:19:03,505-Speed 10669.14 samples/sec   Loss 7.2533   LearningRate 0.0214   Epoch: 21   Global Step: 108750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:19:04,450-Speed 10843.07 samples/sec   Loss 7.2798   LearningRate 0.0214   Epoch: 21   Global Step: 108760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:19:05,401-Speed 10790.56 samples/sec   Loss 7.2564   LearningRate 0.0214   Epoch: 21   Global Step: 108770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:19:06,365-Speed 10630.09 samples/sec   Loss 7.3681   LearningRate 0.0214   Epoch: 21   Global Step: 108780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:19:07,293-Speed 11042.70 samples/sec   Loss 7.3535   LearningRate 0.0214   Epoch: 21   Global Step: 108790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:19:08,234-Speed 10894.08 samples/sec   Loss 7.2812   LearningRate 0.0214   Epoch: 21   Global Step: 108800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 03:19:09,179-Speed 10864.09 samples/sec   Loss 7.3111   LearningRate 0.0214   Epoch: 21   Global Step: 108810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 03:19:10,120-Speed 10896.88 samples/sec   Loss 7.2381   LearningRate 0.0214   Epoch: 21   Global Step: 108820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:11,121-Speed 10234.49 samples/sec   Loss 7.4306   LearningRate 0.0214   Epoch: 21   Global Step: 108830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:12,101-Speed 10456.20 samples/sec   Loss 7.4010   LearningRate 0.0213   Epoch: 21   Global Step: 108840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:13,066-Speed 10622.00 samples/sec   Loss 7.4194   LearningRate 0.0213   Epoch: 21   Global Step: 108850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:14,010-Speed 10860.59 samples/sec   Loss 7.4844   LearningRate 0.0213   Epoch: 21   Global Step: 108860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:14,989-Speed 10465.43 samples/sec   Loss 7.3680   LearningRate 0.0213   Epoch: 21   Global Step: 108870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:15,957-Speed 10590.34 samples/sec   Loss 7.2990   LearningRate 0.0213   Epoch: 21   Global Step: 108880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:16,929-Speed 10554.00 samples/sec   Loss 7.3212   LearningRate 0.0213   Epoch: 21   Global Step: 108890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:17,892-Speed 10644.61 samples/sec   Loss 7.2744   LearningRate 0.0213   Epoch: 21   Global Step: 108900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:18,850-Speed 10697.66 samples/sec   Loss 7.3279   LearningRate 0.0213   Epoch: 21   Global Step: 108910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:19:19,819-Speed 10575.09 samples/sec   Loss 7.3690   LearningRate 0.0213   Epoch: 21   Global Step: 108920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:19:20,781-Speed 10651.67 samples/sec   Loss 7.3927   LearningRate 0.0213   Epoch: 21   Global Step: 108930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:19:21,773-Speed 10330.55 samples/sec   Loss 7.3792   LearningRate 0.0213   Epoch: 21   Global Step: 108940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:19:22,709-Speed 10944.98 samples/sec   Loss 7.3459   LearningRate 0.0213   Epoch: 21   Global Step: 108950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:19:23,677-Speed 10607.70 samples/sec   Loss 7.3965   LearningRate 0.0213   Epoch: 21   Global Step: 108960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:24,657-Speed 10460.33 samples/sec   Loss 7.3360   LearningRate 0.0213   Epoch: 21   Global Step: 108970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:25,642-Speed 10403.44 samples/sec   Loss 7.4611   LearningRate 0.0213   Epoch: 21   Global Step: 108980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:26,627-Speed 10409.09 samples/sec   Loss 7.2786   LearningRate 0.0213   Epoch: 21   Global Step: 108990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:19:27,593-Speed 10614.52 samples/sec   Loss 7.2701   LearningRate 0.0213   Epoch: 21   Global Step: 109000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:19:28,561-Speed 10583.02 samples/sec   Loss 7.3373   LearningRate 0.0213   Epoch: 21   Global Step: 109010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:19:29,551-Speed 10350.66 samples/sec   Loss 7.4869   LearningRate 0.0213   Epoch: 21   Global Step: 109020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:19:30,477-Speed 11067.94 samples/sec   Loss 7.2244   LearningRate 0.0213   Epoch: 21   Global Step: 109030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:19:31,440-Speed 10648.04 samples/sec   Loss 7.4316   LearningRate 0.0213   Epoch: 21   Global Step: 109040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:19:32,431-Speed 10338.06 samples/sec   Loss 7.4299   LearningRate 0.0213   Epoch: 21   Global Step: 109050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:19:33,403-Speed 10542.37 samples/sec   Loss 7.3112   LearningRate 0.0212   Epoch: 21   Global Step: 109060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:19:34,339-Speed 10960.51 samples/sec   Loss 7.5415   LearningRate 0.0212   Epoch: 21   Global Step: 109070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:19:35,300-Speed 10658.58 samples/sec   Loss 7.3376   LearningRate 0.0212   Epoch: 21   Global Step: 109080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:19:36,229-Speed 11026.17 samples/sec   Loss 7.5098   LearningRate 0.0212   Epoch: 21   Global Step: 109090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:37,185-Speed 10721.13 samples/sec   Loss 7.5409   LearningRate 0.0212   Epoch: 21   Global Step: 109100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:38,181-Speed 10290.34 samples/sec   Loss 7.4028   LearningRate 0.0212   Epoch: 21   Global Step: 109110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:39,137-Speed 10724.88 samples/sec   Loss 7.3574   LearningRate 0.0212   Epoch: 21   Global Step: 109120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:40,100-Speed 10652.45 samples/sec   Loss 7.4642   LearningRate 0.0212   Epoch: 21   Global Step: 109130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:41,075-Speed 10504.10 samples/sec   Loss 7.2422   LearningRate 0.0212   Epoch: 21   Global Step: 109140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:42,063-Speed 10367.55 samples/sec   Loss 7.2625   LearningRate 0.0212   Epoch: 21   Global Step: 109150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:42,997-Speed 10987.67 samples/sec   Loss 7.4731   LearningRate 0.0212   Epoch: 21   Global Step: 109160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:43,926-Speed 11024.71 samples/sec   Loss 7.2937   LearningRate 0.0212   Epoch: 21   Global Step: 109170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:44,881-Speed 10735.49 samples/sec   Loss 7.3897   LearningRate 0.0212   Epoch: 21   Global Step: 109180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:45,840-Speed 10694.29 samples/sec   Loss 7.2243   LearningRate 0.0212   Epoch: 21   Global Step: 109190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:19:46,830-Speed 10347.51 samples/sec   Loss 7.4270   LearningRate 0.0212   Epoch: 21   Global Step: 109200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:19:47,762-Speed 11004.33 samples/sec   Loss 7.4051   LearningRate 0.0212   Epoch: 21   Global Step: 109210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:19:48,721-Speed 10702.61 samples/sec   Loss 7.3038   LearningRate 0.0212   Epoch: 21   Global Step: 109220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:19:49,670-Speed 10800.58 samples/sec   Loss 7.2555   LearningRate 0.0212   Epoch: 21   Global Step: 109230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:19:50,633-Speed 10636.05 samples/sec   Loss 7.3329   LearningRate 0.0212   Epoch: 21   Global Step: 109240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:19:51,565-Speed 11001.49 samples/sec   Loss 7.5526   LearningRate 0.0212   Epoch: 21   Global Step: 109250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:19:52,475-Speed 11270.99 samples/sec   Loss 7.2274   LearningRate 0.0212   Epoch: 21   Global Step: 109260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:19:53,390-Speed 11189.44 samples/sec   Loss 7.3342   LearningRate 0.0212   Epoch: 21   Global Step: 109270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:19:54,354-Speed 10637.32 samples/sec   Loss 7.2549   LearningRate 0.0211   Epoch: 21   Global Step: 109280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:19:55,315-Speed 10662.65 samples/sec   Loss 7.4554   LearningRate 0.0211   Epoch: 21   Global Step: 109290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:56,247-Speed 10998.68 samples/sec   Loss 7.4162   LearningRate 0.0211   Epoch: 21   Global Step: 109300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:57,182-Speed 10953.16 samples/sec   Loss 7.3449   LearningRate 0.0211   Epoch: 21   Global Step: 109310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:58,145-Speed 10651.28 samples/sec   Loss 7.2355   LearningRate 0.0211   Epoch: 21   Global Step: 109320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:19:59,084-Speed 10913.33 samples/sec   Loss 7.3032   LearningRate 0.0211   Epoch: 21   Global Step: 109330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:00,011-Speed 11051.65 samples/sec   Loss 7.3999   LearningRate 0.0211   Epoch: 21   Global Step: 109340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:00,923-Speed 11245.06 samples/sec   Loss 7.3417   LearningRate 0.0211   Epoch: 21   Global Step: 109350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:01,880-Speed 10698.77 samples/sec   Loss 7.4092   LearningRate 0.0211   Epoch: 21   Global Step: 109360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:02,832-Speed 10772.80 samples/sec   Loss 7.4708   LearningRate 0.0211   Epoch: 21   Global Step: 109370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:03,761-Speed 11031.29 samples/sec   Loss 7.2558   LearningRate 0.0211   Epoch: 21   Global Step: 109380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:04,722-Speed 10663.49 samples/sec   Loss 7.4651   LearningRate 0.0211   Epoch: 21   Global Step: 109390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:20:05,660-Speed 10925.62 samples/sec   Loss 7.3924   LearningRate 0.0211   Epoch: 21   Global Step: 109400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:06,645-Speed 10406.17 samples/sec   Loss 7.3480   LearningRate 0.0211   Epoch: 21   Global Step: 109410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:07,597-Speed 10764.88 samples/sec   Loss 7.4428   LearningRate 0.0211   Epoch: 21   Global Step: 109420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:08,547-Speed 10789.62 samples/sec   Loss 7.3771   LearningRate 0.0211   Epoch: 21   Global Step: 109430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:09,508-Speed 10668.66 samples/sec   Loss 7.4231   LearningRate 0.0211   Epoch: 21   Global Step: 109440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:10,465-Speed 10702.87 samples/sec   Loss 7.4821   LearningRate 0.0211   Epoch: 21   Global Step: 109450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:20:11,429-Speed 10633.09 samples/sec   Loss 7.4984   LearningRate 0.0211   Epoch: 21   Global Step: 109460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:20:12,405-Speed 10501.19 samples/sec   Loss 7.4449   LearningRate 0.0211   Epoch: 21   Global Step: 109470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:20:13,390-Speed 10406.36 samples/sec   Loss 7.2968   LearningRate 0.0211   Epoch: 21   Global Step: 109480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:20:14,330-Speed 10911.09 samples/sec   Loss 7.3701   LearningRate 0.0211   Epoch: 21   Global Step: 109490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:20:15,288-Speed 10699.59 samples/sec   Loss 7.3303   LearningRate 0.0210   Epoch: 21   Global Step: 109500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:20:16,221-Speed 10985.89 samples/sec   Loss 7.3380   LearningRate 0.0210   Epoch: 21   Global Step: 109510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:20:17,212-Speed 10332.28 samples/sec   Loss 7.3830   LearningRate 0.0210   Epoch: 21   Global Step: 109520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:20:18,178-Speed 10614.14 samples/sec   Loss 7.2701   LearningRate 0.0210   Epoch: 21   Global Step: 109530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:20:19,116-Speed 10932.69 samples/sec   Loss 7.4062   LearningRate 0.0210   Epoch: 21   Global Step: 109540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:20:20,128-Speed 10129.14 samples/sec   Loss 7.5566   LearningRate 0.0210   Epoch: 21   Global Step: 109550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:21,096-Speed 10591.70 samples/sec   Loss 7.3673   LearningRate 0.0210   Epoch: 21   Global Step: 109560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:22,065-Speed 10570.18 samples/sec   Loss 7.4226   LearningRate 0.0210   Epoch: 21   Global Step: 109570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:23,057-Speed 10338.18 samples/sec   Loss 7.3733   LearningRate 0.0210   Epoch: 21   Global Step: 109580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:24,057-Speed 10241.84 samples/sec   Loss 7.4917   LearningRate 0.0210   Epoch: 21   Global Step: 109590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:25,006-Speed 10796.51 samples/sec   Loss 7.4122   LearningRate 0.0210   Epoch: 21   Global Step: 109600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:25,940-Speed 10981.93 samples/sec   Loss 7.6072   LearningRate 0.0210   Epoch: 21   Global Step: 109610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:26,905-Speed 10620.31 samples/sec   Loss 7.3787   LearningRate 0.0210   Epoch: 21   Global Step: 109620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:27,852-Speed 10823.16 samples/sec   Loss 7.4864   LearningRate 0.0210   Epoch: 21   Global Step: 109630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:28,822-Speed 10562.62 samples/sec   Loss 7.3589   LearningRate 0.0210   Epoch: 21   Global Step: 109640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:29,841-Speed 10063.03 samples/sec   Loss 7.3370   LearningRate 0.0210   Epoch: 21   Global Step: 109650   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:20:30,811-Speed 10560.05 samples/sec   Loss 7.5690   LearningRate 0.0210   Epoch: 21   Global Step: 109660   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:20:31,766-Speed 10734.62 samples/sec   Loss 7.1835   LearningRate 0.0210   Epoch: 21   Global Step: 109670   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:20:32,761-Speed 10315.03 samples/sec   Loss 7.4646   LearningRate 0.0210   Epoch: 21   Global Step: 109680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:20:33,704-Speed 10867.50 samples/sec   Loss 7.2627   LearningRate 0.0210   Epoch: 21   Global Step: 109690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:20:34,695-Speed 10340.12 samples/sec   Loss 7.5591   LearningRate 0.0210   Epoch: 21   Global Step: 109700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:20:35,593-Speed 11414.68 samples/sec   Loss 7.2866   LearningRate 0.0210   Epoch: 21   Global Step: 109710   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:20:36,564-Speed 10578.97 samples/sec   Loss 7.4626   LearningRate 0.0209   Epoch: 21   Global Step: 109720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:37,533-Speed 10574.22 samples/sec   Loss 7.3825   LearningRate 0.0209   Epoch: 21   Global Step: 109730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:38,494-Speed 10666.15 samples/sec   Loss 7.1793   LearningRate 0.0209   Epoch: 21   Global Step: 109740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:39,459-Speed 10614.80 samples/sec   Loss 7.4121   LearningRate 0.0209   Epoch: 21   Global Step: 109750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:40,415-Speed 10725.41 samples/sec   Loss 7.4014   LearningRate 0.0209   Epoch: 21   Global Step: 109760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:41,399-Speed 10410.43 samples/sec   Loss 7.4346   LearningRate 0.0209   Epoch: 21   Global Step: 109770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:42,393-Speed 10319.10 samples/sec   Loss 7.3141   LearningRate 0.0209   Epoch: 21   Global Step: 109780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:43,313-Speed 11160.61 samples/sec   Loss 7.4264   LearningRate 0.0209   Epoch: 21   Global Step: 109790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:44,262-Speed 10801.76 samples/sec   Loss 7.5224   LearningRate 0.0209   Epoch: 21   Global Step: 109800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:45,245-Speed 10421.70 samples/sec   Loss 7.4610   LearningRate 0.0209   Epoch: 21   Global Step: 109810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:46,213-Speed 10592.18 samples/sec   Loss 7.5106   LearningRate 0.0209   Epoch: 21   Global Step: 109820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:47,145-Speed 10998.47 samples/sec   Loss 7.2740   LearningRate 0.0209   Epoch: 21   Global Step: 109830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:48,095-Speed 10786.35 samples/sec   Loss 7.4857   LearningRate 0.0209   Epoch: 21   Global Step: 109840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:49,078-Speed 10423.37 samples/sec   Loss 7.3463   LearningRate 0.0209   Epoch: 21   Global Step: 109850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:50,124-Speed 9803.74 samples/sec   Loss 7.3807   LearningRate 0.0209   Epoch: 21   Global Step: 109860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:51,053-Speed 11030.15 samples/sec   Loss 7.4279   LearningRate 0.0209   Epoch: 21   Global Step: 109870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:51,962-Speed 11279.16 samples/sec   Loss 7.3931   LearningRate 0.0209   Epoch: 21   Global Step: 109880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:52,920-Speed 10690.09 samples/sec   Loss 7.3776   LearningRate 0.0209   Epoch: 21   Global Step: 109890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:53,877-Speed 10710.51 samples/sec   Loss 7.3670   LearningRate 0.0209   Epoch: 21   Global Step: 109900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:54,825-Speed 10808.64 samples/sec   Loss 7.3907   LearningRate 0.0209   Epoch: 21   Global Step: 109910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:55,765-Speed 10901.18 samples/sec   Loss 7.3520   LearningRate 0.0209   Epoch: 21   Global Step: 109920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:56,700-Speed 10992.06 samples/sec   Loss 7.3526   LearningRate 0.0209   Epoch: 21   Global Step: 109930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:57,637-Speed 10939.95 samples/sec   Loss 7.4177   LearningRate 0.0208   Epoch: 21   Global Step: 109940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:58,619-Speed 10440.59 samples/sec   Loss 7.3705   LearningRate 0.0208   Epoch: 21   Global Step: 109950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:20:59,542-Speed 11102.79 samples/sec   Loss 7.3230   LearningRate 0.0208   Epoch: 21   Global Step: 109960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:21:00,526-Speed 10414.14 samples/sec   Loss 7.4523   LearningRate 0.0208   Epoch: 21   Global Step: 109970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:21:01,441-Speed 11202.25 samples/sec   Loss 7.3015   LearningRate 0.0208   Epoch: 21   Global Step: 109980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:21:02,390-Speed 10797.55 samples/sec   Loss 7.4985   LearningRate 0.0208   Epoch: 21   Global Step: 109990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:21:03,301-Speed 11247.59 samples/sec   Loss 7.2882   LearningRate 0.0208   Epoch: 21   Global Step: 110000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:21:25,591-[lfw][110000]XNorm: 10.528549
Training: 2022-04-11 03:21:25,592-[lfw][110000]Accuracy-Flip: 0.99517+-0.00353
Training: 2022-04-11 03:21:25,593-[lfw][110000]Accuracy-Highest: 0.99667
Training: 2022-04-11 03:21:51,588-[cfp_fp][110000]XNorm: 8.999908
Training: 2022-04-11 03:21:51,589-[cfp_fp][110000]Accuracy-Flip: 0.96529+-0.00977
Training: 2022-04-11 03:21:51,590-[cfp_fp][110000]Accuracy-Highest: 0.96600
Training: 2022-04-11 03:22:14,150-[agedb_30][110000]XNorm: 10.277697
Training: 2022-04-11 03:22:14,151-[agedb_30][110000]Accuracy-Flip: 0.96733+-0.00731
Training: 2022-04-11 03:22:14,152-[agedb_30][110000]Accuracy-Highest: 0.97017
Training: 2022-04-11 03:22:15,104-Speed 142.61 samples/sec   Loss 7.4132   LearningRate 0.0208   Epoch: 21   Global Step: 110010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:16,040-Speed 10953.34 samples/sec   Loss 7.3807   LearningRate 0.0208   Epoch: 21   Global Step: 110020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:22:17,008-Speed 10585.35 samples/sec   Loss 7.2863   LearningRate 0.0208   Epoch: 21   Global Step: 110030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:22:17,963-Speed 10732.34 samples/sec   Loss 7.4365   LearningRate 0.0208   Epoch: 21   Global Step: 110040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:22:18,902-Speed 10922.44 samples/sec   Loss 7.4801   LearningRate 0.0208   Epoch: 21   Global Step: 110050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:19,900-Speed 10277.71 samples/sec   Loss 7.3466   LearningRate 0.0208   Epoch: 21   Global Step: 110060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:20,874-Speed 10520.06 samples/sec   Loss 7.4084   LearningRate 0.0208   Epoch: 21   Global Step: 110070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:21,812-Speed 10928.33 samples/sec   Loss 7.4233   LearningRate 0.0208   Epoch: 21   Global Step: 110080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:22,859-Speed 9785.15 samples/sec   Loss 7.3526   LearningRate 0.0208   Epoch: 21   Global Step: 110090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:23,780-Speed 11137.60 samples/sec   Loss 7.4359   LearningRate 0.0208   Epoch: 21   Global Step: 110100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:24,725-Speed 10841.91 samples/sec   Loss 7.3461   LearningRate 0.0208   Epoch: 21   Global Step: 110110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:25,721-Speed 10288.47 samples/sec   Loss 7.3567   LearningRate 0.0208   Epoch: 21   Global Step: 110120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:26,668-Speed 10821.56 samples/sec   Loss 7.5209   LearningRate 0.0208   Epoch: 21   Global Step: 110130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:27,604-Speed 10947.26 samples/sec   Loss 7.5126   LearningRate 0.0208   Epoch: 21   Global Step: 110140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:28,596-Speed 10339.32 samples/sec   Loss 7.3964   LearningRate 0.0208   Epoch: 21   Global Step: 110150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:22:29,531-Speed 10956.05 samples/sec   Loss 7.2582   LearningRate 0.0207   Epoch: 21   Global Step: 110160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:22:30,470-Speed 10916.42 samples/sec   Loss 7.4108   LearningRate 0.0207   Epoch: 21   Global Step: 110170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:22:31,425-Speed 10731.42 samples/sec   Loss 7.4928   LearningRate 0.0207   Epoch: 21   Global Step: 110180   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:22:32,367-Speed 10883.04 samples/sec   Loss 7.4057   LearningRate 0.0207   Epoch: 21   Global Step: 110190   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:22:33,316-Speed 10797.77 samples/sec   Loss 7.3856   LearningRate 0.0207   Epoch: 21   Global Step: 110200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:34,283-Speed 10601.10 samples/sec   Loss 7.3843   LearningRate 0.0207   Epoch: 21   Global Step: 110210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:35,261-Speed 10481.38 samples/sec   Loss 7.4678   LearningRate 0.0207   Epoch: 21   Global Step: 110220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:36,212-Speed 10771.19 samples/sec   Loss 7.3227   LearningRate 0.0207   Epoch: 21   Global Step: 110230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:37,186-Speed 10530.49 samples/sec   Loss 7.4277   LearningRate 0.0207   Epoch: 21   Global Step: 110240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:38,124-Speed 10921.42 samples/sec   Loss 7.3984   LearningRate 0.0207   Epoch: 21   Global Step: 110250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:39,059-Speed 10963.82 samples/sec   Loss 7.4898   LearningRate 0.0207   Epoch: 21   Global Step: 110260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:40,018-Speed 10699.04 samples/sec   Loss 7.1806   LearningRate 0.0207   Epoch: 21   Global Step: 110270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:40,926-Speed 11280.79 samples/sec   Loss 7.4466   LearningRate 0.0207   Epoch: 21   Global Step: 110280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:41,865-Speed 10918.37 samples/sec   Loss 7.3940   LearningRate 0.0207   Epoch: 21   Global Step: 110290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:42,794-Speed 11027.27 samples/sec   Loss 7.2216   LearningRate 0.0207   Epoch: 21   Global Step: 110300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:43,771-Speed 10487.73 samples/sec   Loss 7.4569   LearningRate 0.0207   Epoch: 21   Global Step: 110310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:44,718-Speed 10832.12 samples/sec   Loss 7.4114   LearningRate 0.0207   Epoch: 21   Global Step: 110320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:45,678-Speed 10670.04 samples/sec   Loss 7.5274   LearningRate 0.0207   Epoch: 21   Global Step: 110330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:46,627-Speed 10802.04 samples/sec   Loss 7.2977   LearningRate 0.0207   Epoch: 21   Global Step: 110340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:47,586-Speed 10690.56 samples/sec   Loss 7.3092   LearningRate 0.0207   Epoch: 21   Global Step: 110350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:48,582-Speed 10288.29 samples/sec   Loss 7.4494   LearningRate 0.0207   Epoch: 21   Global Step: 110360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:49,527-Speed 10842.30 samples/sec   Loss 7.5563   LearningRate 0.0207   Epoch: 21   Global Step: 110370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:50,457-Speed 11019.56 samples/sec   Loss 7.5076   LearningRate 0.0207   Epoch: 21   Global Step: 110380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:51,421-Speed 10633.99 samples/sec   Loss 7.3078   LearningRate 0.0206   Epoch: 21   Global Step: 110390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:52,409-Speed 10378.03 samples/sec   Loss 7.4362   LearningRate 0.0206   Epoch: 21   Global Step: 110400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:53,345-Speed 10945.53 samples/sec   Loss 7.4216   LearningRate 0.0206   Epoch: 21   Global Step: 110410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:22:54,269-Speed 11095.16 samples/sec   Loss 7.5283   LearningRate 0.0206   Epoch: 21   Global Step: 110420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:22:55,213-Speed 10856.94 samples/sec   Loss 7.4332   LearningRate 0.0206   Epoch: 21   Global Step: 110430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:22:56,194-Speed 10445.56 samples/sec   Loss 7.3494   LearningRate 0.0206   Epoch: 21   Global Step: 110440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:22:57,180-Speed 10397.30 samples/sec   Loss 7.5796   LearningRate 0.0206   Epoch: 21   Global Step: 110450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:22:58,138-Speed 10704.21 samples/sec   Loss 7.5065   LearningRate 0.0206   Epoch: 21   Global Step: 110460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:22:59,078-Speed 10904.10 samples/sec   Loss 7.5371   LearningRate 0.0206   Epoch: 21   Global Step: 110470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:00,041-Speed 10645.53 samples/sec   Loss 7.3393   LearningRate 0.0206   Epoch: 21   Global Step: 110480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:00,971-Speed 11013.27 samples/sec   Loss 7.2998   LearningRate 0.0206   Epoch: 21   Global Step: 110490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:01,917-Speed 10840.74 samples/sec   Loss 7.4778   LearningRate 0.0206   Epoch: 21   Global Step: 110500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:02,855-Speed 10922.30 samples/sec   Loss 7.4186   LearningRate 0.0206   Epoch: 21   Global Step: 110510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:03,819-Speed 10643.54 samples/sec   Loss 7.3789   LearningRate 0.0206   Epoch: 21   Global Step: 110520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:04,759-Speed 10902.81 samples/sec   Loss 7.3997   LearningRate 0.0206   Epoch: 21   Global Step: 110530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:05,707-Speed 10805.05 samples/sec   Loss 7.5226   LearningRate 0.0206   Epoch: 21   Global Step: 110540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:06,650-Speed 10877.11 samples/sec   Loss 7.4516   LearningRate 0.0206   Epoch: 21   Global Step: 110550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:07,653-Speed 10220.53 samples/sec   Loss 7.3083   LearningRate 0.0206   Epoch: 21   Global Step: 110560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:08,583-Speed 11023.50 samples/sec   Loss 7.2965   LearningRate 0.0206   Epoch: 21   Global Step: 110570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:09,541-Speed 10699.32 samples/sec   Loss 7.4810   LearningRate 0.0206   Epoch: 21   Global Step: 110580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:10,464-Speed 11103.23 samples/sec   Loss 7.2636   LearningRate 0.0206   Epoch: 21   Global Step: 110590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:11,416-Speed 10760.39 samples/sec   Loss 7.5196   LearningRate 0.0206   Epoch: 21   Global Step: 110600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:12,363-Speed 10831.38 samples/sec   Loss 7.4632   LearningRate 0.0205   Epoch: 21   Global Step: 110610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:13,325-Speed 10653.49 samples/sec   Loss 7.3029   LearningRate 0.0205   Epoch: 21   Global Step: 110620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:14,273-Speed 10821.03 samples/sec   Loss 7.4964   LearningRate 0.0205   Epoch: 21   Global Step: 110630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:15,226-Speed 10751.06 samples/sec   Loss 7.4744   LearningRate 0.0205   Epoch: 21   Global Step: 110640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:16,178-Speed 10766.84 samples/sec   Loss 7.5472   LearningRate 0.0205   Epoch: 21   Global Step: 110650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:17,165-Speed 10390.40 samples/sec   Loss 7.3544   LearningRate 0.0205   Epoch: 21   Global Step: 110660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:18,057-Speed 11491.59 samples/sec   Loss 7.3716   LearningRate 0.0205   Epoch: 21   Global Step: 110670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:19,004-Speed 10821.07 samples/sec   Loss 7.3298   LearningRate 0.0205   Epoch: 21   Global Step: 110680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:19,944-Speed 10902.21 samples/sec   Loss 7.5168   LearningRate 0.0205   Epoch: 21   Global Step: 110690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:20,872-Speed 11036.30 samples/sec   Loss 7.3580   LearningRate 0.0205   Epoch: 21   Global Step: 110700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:21,885-Speed 10121.71 samples/sec   Loss 7.4294   LearningRate 0.0205   Epoch: 21   Global Step: 110710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:22,821-Speed 10958.53 samples/sec   Loss 7.4670   LearningRate 0.0205   Epoch: 21   Global Step: 110720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:23,766-Speed 10844.56 samples/sec   Loss 7.5171   LearningRate 0.0205   Epoch: 21   Global Step: 110730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:24,727-Speed 10655.35 samples/sec   Loss 7.4877   LearningRate 0.0205   Epoch: 21   Global Step: 110740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:25,679-Speed 10766.52 samples/sec   Loss 7.4566   LearningRate 0.0205   Epoch: 21   Global Step: 110750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:26,624-Speed 10852.89 samples/sec   Loss 7.3417   LearningRate 0.0205   Epoch: 21   Global Step: 110760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:27,538-Speed 11215.98 samples/sec   Loss 7.2918   LearningRate 0.0205   Epoch: 21   Global Step: 110770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:28,478-Speed 10894.20 samples/sec   Loss 7.4013   LearningRate 0.0205   Epoch: 21   Global Step: 110780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:29,415-Speed 10941.68 samples/sec   Loss 7.4442   LearningRate 0.0205   Epoch: 21   Global Step: 110790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:30,353-Speed 10919.96 samples/sec   Loss 7.4494   LearningRate 0.0205   Epoch: 21   Global Step: 110800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:23:31,337-Speed 10420.17 samples/sec   Loss 7.3051   LearningRate 0.0205   Epoch: 21   Global Step: 110810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:23:32,316-Speed 10463.89 samples/sec   Loss 7.4893   LearningRate 0.0205   Epoch: 21   Global Step: 110820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:23:33,291-Speed 10504.92 samples/sec   Loss 7.4131   LearningRate 0.0204   Epoch: 21   Global Step: 110830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:23:34,255-Speed 10638.64 samples/sec   Loss 7.3226   LearningRate 0.0204   Epoch: 21   Global Step: 110840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:23:35,206-Speed 10783.91 samples/sec   Loss 7.5034   LearningRate 0.0204   Epoch: 21   Global Step: 110850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:23:36,181-Speed 10501.43 samples/sec   Loss 7.5706   LearningRate 0.0204   Epoch: 21   Global Step: 110860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:23:37,124-Speed 10871.13 samples/sec   Loss 7.4586   LearningRate 0.0204   Epoch: 21   Global Step: 110870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:23:38,127-Speed 10220.58 samples/sec   Loss 7.4599   LearningRate 0.0204   Epoch: 21   Global Step: 110880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:39,086-Speed 10689.62 samples/sec   Loss 7.4139   LearningRate 0.0204   Epoch: 21   Global Step: 110890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:40,013-Speed 11055.67 samples/sec   Loss 7.3693   LearningRate 0.0204   Epoch: 21   Global Step: 110900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:40,973-Speed 10677.53 samples/sec   Loss 7.2542   LearningRate 0.0204   Epoch: 21   Global Step: 110910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:41,926-Speed 10750.91 samples/sec   Loss 7.5456   LearningRate 0.0204   Epoch: 21   Global Step: 110920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:42,895-Speed 10571.25 samples/sec   Loss 7.4231   LearningRate 0.0204   Epoch: 21   Global Step: 110930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:43,857-Speed 10655.34 samples/sec   Loss 7.2740   LearningRate 0.0204   Epoch: 21   Global Step: 110940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:44,818-Speed 10672.62 samples/sec   Loss 7.4949   LearningRate 0.0204   Epoch: 21   Global Step: 110950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:45,736-Speed 11156.10 samples/sec   Loss 7.4954   LearningRate 0.0204   Epoch: 21   Global Step: 110960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:46,681-Speed 10850.19 samples/sec   Loss 7.3076   LearningRate 0.0204   Epoch: 21   Global Step: 110970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:47,627-Speed 10831.20 samples/sec   Loss 7.4146   LearningRate 0.0204   Epoch: 21   Global Step: 110980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:48,589-Speed 10660.19 samples/sec   Loss 7.3177   LearningRate 0.0204   Epoch: 21   Global Step: 110990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:49,539-Speed 10782.49 samples/sec   Loss 7.3623   LearningRate 0.0204   Epoch: 21   Global Step: 111000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:50,516-Speed 10487.73 samples/sec   Loss 7.4236   LearningRate 0.0204   Epoch: 21   Global Step: 111010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:51,439-Speed 11101.22 samples/sec   Loss 7.2896   LearningRate 0.0204   Epoch: 21   Global Step: 111020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:23:52,373-Speed 10969.81 samples/sec   Loss 7.3866   LearningRate 0.0204   Epoch: 21   Global Step: 111030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:53,304-Speed 11008.49 samples/sec   Loss 7.4428   LearningRate 0.0204   Epoch: 21   Global Step: 111040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:54,245-Speed 10885.21 samples/sec   Loss 7.4834   LearningRate 0.0204   Epoch: 21   Global Step: 111050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:55,208-Speed 10644.76 samples/sec   Loss 7.3416   LearningRate 0.0203   Epoch: 21   Global Step: 111060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:56,136-Speed 11043.70 samples/sec   Loss 7.3352   LearningRate 0.0203   Epoch: 21   Global Step: 111070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:57,114-Speed 10488.64 samples/sec   Loss 7.3872   LearningRate 0.0203   Epoch: 21   Global Step: 111080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:58,046-Speed 10994.71 samples/sec   Loss 7.3529   LearningRate 0.0203   Epoch: 21   Global Step: 111090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:59,002-Speed 10719.66 samples/sec   Loss 7.3349   LearningRate 0.0203   Epoch: 21   Global Step: 111100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:23:59,958-Speed 10714.12 samples/sec   Loss 7.3894   LearningRate 0.0203   Epoch: 21   Global Step: 111110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:00,900-Speed 10884.02 samples/sec   Loss 7.4425   LearningRate 0.0203   Epoch: 21   Global Step: 111120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:01,835-Speed 10955.72 samples/sec   Loss 7.5397   LearningRate 0.0203   Epoch: 21   Global Step: 111130   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:24:02,770-Speed 10972.15 samples/sec   Loss 7.3546   LearningRate 0.0203   Epoch: 21   Global Step: 111140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:03,709-Speed 10908.36 samples/sec   Loss 7.4840   LearningRate 0.0203   Epoch: 21   Global Step: 111150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:04,682-Speed 10537.14 samples/sec   Loss 7.4736   LearningRate 0.0203   Epoch: 21   Global Step: 111160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:05,653-Speed 10549.88 samples/sec   Loss 7.4304   LearningRate 0.0203   Epoch: 21   Global Step: 111170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:06,630-Speed 10495.94 samples/sec   Loss 7.4126   LearningRate 0.0203   Epoch: 21   Global Step: 111180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:07,586-Speed 10714.26 samples/sec   Loss 7.3977   LearningRate 0.0203   Epoch: 21   Global Step: 111190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:08,595-Speed 10161.98 samples/sec   Loss 7.4045   LearningRate 0.0203   Epoch: 21   Global Step: 111200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:09,538-Speed 10863.74 samples/sec   Loss 7.2243   LearningRate 0.0203   Epoch: 21   Global Step: 111210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:24:10,500-Speed 10657.35 samples/sec   Loss 7.3290   LearningRate 0.0203   Epoch: 21   Global Step: 111220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:24:11,468-Speed 10595.84 samples/sec   Loss 7.4318   LearningRate 0.0203   Epoch: 21   Global Step: 111230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:24:12,413-Speed 10849.77 samples/sec   Loss 7.4629   LearningRate 0.0203   Epoch: 21   Global Step: 111240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:24:13,375-Speed 10648.22 samples/sec   Loss 7.4225   LearningRate 0.0203   Epoch: 21   Global Step: 111250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:24:14,318-Speed 10872.30 samples/sec   Loss 7.4496   LearningRate 0.0203   Epoch: 21   Global Step: 111260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:24:15,328-Speed 10142.59 samples/sec   Loss 7.3304   LearningRate 0.0203   Epoch: 21   Global Step: 111270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:24:25,282-Speed 1028.86 samples/sec   Loss 7.1372   LearningRate 0.0202   Epoch: 22   Global Step: 111280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:24:26,326-Speed 9821.25 samples/sec   Loss 6.5314   LearningRate 0.0202   Epoch: 22   Global Step: 111290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:24:27,440-Speed 9207.63 samples/sec   Loss 6.4855   LearningRate 0.0202   Epoch: 22   Global Step: 111300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:24:28,648-Speed 8484.86 samples/sec   Loss 6.5142   LearningRate 0.0202   Epoch: 22   Global Step: 111310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:29,919-Speed 8063.51 samples/sec   Loss 6.5221   LearningRate 0.0202   Epoch: 22   Global Step: 111320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:31,082-Speed 8817.61 samples/sec   Loss 6.5127   LearningRate 0.0202   Epoch: 22   Global Step: 111330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:32,103-Speed 10031.04 samples/sec   Loss 6.6921   LearningRate 0.0202   Epoch: 22   Global Step: 111340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:33,076-Speed 10541.05 samples/sec   Loss 6.4829   LearningRate 0.0202   Epoch: 22   Global Step: 111350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:34,068-Speed 10333.57 samples/sec   Loss 6.5098   LearningRate 0.0202   Epoch: 22   Global Step: 111360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:35,027-Speed 10689.02 samples/sec   Loss 6.5893   LearningRate 0.0202   Epoch: 22   Global Step: 111370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:35,977-Speed 10785.63 samples/sec   Loss 6.5185   LearningRate 0.0202   Epoch: 22   Global Step: 111380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:36,953-Speed 10504.68 samples/sec   Loss 6.5160   LearningRate 0.0202   Epoch: 22   Global Step: 111390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:37,974-Speed 10038.26 samples/sec   Loss 6.5365   LearningRate 0.0202   Epoch: 22   Global Step: 111400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:38,922-Speed 10812.26 samples/sec   Loss 6.7107   LearningRate 0.0202   Epoch: 22   Global Step: 111410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:24:39,919-Speed 10278.50 samples/sec   Loss 6.5690   LearningRate 0.0202   Epoch: 22   Global Step: 111420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:24:40,872-Speed 10757.17 samples/sec   Loss 6.6201   LearningRate 0.0202   Epoch: 22   Global Step: 111430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:24:41,834-Speed 10656.70 samples/sec   Loss 6.6781   LearningRate 0.0202   Epoch: 22   Global Step: 111440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:24:42,760-Speed 11063.95 samples/sec   Loss 6.5297   LearningRate 0.0202   Epoch: 22   Global Step: 111450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:24:43,726-Speed 10606.18 samples/sec   Loss 6.7547   LearningRate 0.0202   Epoch: 22   Global Step: 111460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:44,652-Speed 11082.63 samples/sec   Loss 6.6244   LearningRate 0.0202   Epoch: 22   Global Step: 111470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:45,587-Speed 10961.17 samples/sec   Loss 6.6616   LearningRate 0.0202   Epoch: 22   Global Step: 111480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:46,530-Speed 10869.11 samples/sec   Loss 6.5418   LearningRate 0.0202   Epoch: 22   Global Step: 111490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:47,460-Speed 11010.84 samples/sec   Loss 6.8349   LearningRate 0.0202   Epoch: 22   Global Step: 111500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:48,402-Speed 10888.30 samples/sec   Loss 6.6875   LearningRate 0.0201   Epoch: 22   Global Step: 111510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:49,407-Speed 10200.86 samples/sec   Loss 6.7236   LearningRate 0.0201   Epoch: 22   Global Step: 111520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:50,384-Speed 10486.47 samples/sec   Loss 6.5940   LearningRate 0.0201   Epoch: 22   Global Step: 111530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:51,379-Speed 10304.95 samples/sec   Loss 6.5803   LearningRate 0.0201   Epoch: 22   Global Step: 111540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:52,346-Speed 10598.66 samples/sec   Loss 6.6387   LearningRate 0.0201   Epoch: 22   Global Step: 111550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:53,357-Speed 10139.70 samples/sec   Loss 6.8046   LearningRate 0.0201   Epoch: 22   Global Step: 111560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:24:54,381-Speed 10006.47 samples/sec   Loss 6.6599   LearningRate 0.0201   Epoch: 22   Global Step: 111570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:55,417-Speed 9896.95 samples/sec   Loss 6.6428   LearningRate 0.0201   Epoch: 22   Global Step: 111580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:56,367-Speed 10793.54 samples/sec   Loss 6.7517   LearningRate 0.0201   Epoch: 22   Global Step: 111590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:57,317-Speed 10785.89 samples/sec   Loss 6.7156   LearningRate 0.0201   Epoch: 22   Global Step: 111600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:58,329-Speed 10124.74 samples/sec   Loss 6.7062   LearningRate 0.0201   Epoch: 22   Global Step: 111610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:24:59,279-Speed 10797.52 samples/sec   Loss 6.8927   LearningRate 0.0201   Epoch: 22   Global Step: 111620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:25:00,219-Speed 10902.69 samples/sec   Loss 6.5474   LearningRate 0.0201   Epoch: 22   Global Step: 111630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:25:01,191-Speed 10543.53 samples/sec   Loss 6.7681   LearningRate 0.0201   Epoch: 22   Global Step: 111640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:25:02,135-Speed 10864.41 samples/sec   Loss 6.7982   LearningRate 0.0201   Epoch: 22   Global Step: 111650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:25:03,062-Speed 11044.31 samples/sec   Loss 6.7986   LearningRate 0.0201   Epoch: 22   Global Step: 111660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:25:04,011-Speed 10798.68 samples/sec   Loss 6.7058   LearningRate 0.0201   Epoch: 22   Global Step: 111670   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:25:05,072-Speed 9675.07 samples/sec   Loss 6.8038   LearningRate 0.0201   Epoch: 22   Global Step: 111680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:25:06,003-Speed 11011.17 samples/sec   Loss 6.7405   LearningRate 0.0201   Epoch: 22   Global Step: 111690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:25:06,946-Speed 10863.96 samples/sec   Loss 6.7711   LearningRate 0.0201   Epoch: 22   Global Step: 111700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:25:07,909-Speed 10640.27 samples/sec   Loss 6.8549   LearningRate 0.0201   Epoch: 22   Global Step: 111710   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:25:08,869-Speed 10681.98 samples/sec   Loss 6.7132   LearningRate 0.0201   Epoch: 22   Global Step: 111720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:25:09,825-Speed 10721.72 samples/sec   Loss 6.8615   LearningRate 0.0200   Epoch: 22   Global Step: 111730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:25:10,772-Speed 10820.44 samples/sec   Loss 6.8903   LearningRate 0.0200   Epoch: 22   Global Step: 111740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:25:11,691-Speed 11139.05 samples/sec   Loss 6.8259   LearningRate 0.0200   Epoch: 22   Global Step: 111750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:25:12,619-Speed 11049.22 samples/sec   Loss 6.8451   LearningRate 0.0200   Epoch: 22   Global Step: 111760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:25:13,551-Speed 10998.47 samples/sec   Loss 6.7260   LearningRate 0.0200   Epoch: 22   Global Step: 111770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:25:14,531-Speed 10461.02 samples/sec   Loss 6.7203   LearningRate 0.0200   Epoch: 22   Global Step: 111780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:25:15,443-Speed 11234.88 samples/sec   Loss 6.8555   LearningRate 0.0200   Epoch: 22   Global Step: 111790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:25:16,402-Speed 10687.49 samples/sec   Loss 6.7390   LearningRate 0.0200   Epoch: 22   Global Step: 111800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:25:17,378-Speed 10505.34 samples/sec   Loss 6.6709   LearningRate 0.0200   Epoch: 22   Global Step: 111810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:25:18,336-Speed 10691.72 samples/sec   Loss 6.8274   LearningRate 0.0200   Epoch: 22   Global Step: 111820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:25:19,281-Speed 10848.94 samples/sec   Loss 6.5842   LearningRate 0.0200   Epoch: 22   Global Step: 111830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:25:20,255-Speed 10521.65 samples/sec   Loss 6.8924   LearningRate 0.0200   Epoch: 22   Global Step: 111840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:25:21,219-Speed 10637.76 samples/sec   Loss 6.8689   LearningRate 0.0200   Epoch: 22   Global Step: 111850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:25:22,170-Speed 10772.15 samples/sec   Loss 6.7288   LearningRate 0.0200   Epoch: 22   Global Step: 111860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:25:23,089-Speed 11155.00 samples/sec   Loss 6.7880   LearningRate 0.0200   Epoch: 22   Global Step: 111870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:25:24,043-Speed 10744.75 samples/sec   Loss 6.7890   LearningRate 0.0200   Epoch: 22   Global Step: 111880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:25:24,994-Speed 10772.85 samples/sec   Loss 6.8679   LearningRate 0.0200   Epoch: 22   Global Step: 111890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:25:25,962-Speed 10596.44 samples/sec   Loss 6.7470   LearningRate 0.0200   Epoch: 22   Global Step: 111900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:25:26,917-Speed 10724.14 samples/sec   Loss 6.9890   LearningRate 0.0200   Epoch: 22   Global Step: 111910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:25:27,898-Speed 10450.88 samples/sec   Loss 6.9730   LearningRate 0.0200   Epoch: 22   Global Step: 111920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:25:28,858-Speed 10670.64 samples/sec   Loss 6.7585   LearningRate 0.0200   Epoch: 22   Global Step: 111930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:25:29,811-Speed 10761.83 samples/sec   Loss 6.9650   LearningRate 0.0200   Epoch: 22   Global Step: 111940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:25:30,753-Speed 10877.73 samples/sec   Loss 6.9434   LearningRate 0.0200   Epoch: 22   Global Step: 111950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:25:31,706-Speed 10758.65 samples/sec   Loss 7.0174   LearningRate 0.0199   Epoch: 22   Global Step: 111960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:25:32,692-Speed 10386.09 samples/sec   Loss 7.0433   LearningRate 0.0199   Epoch: 22   Global Step: 111970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:25:33,676-Speed 10424.67 samples/sec   Loss 6.8991   LearningRate 0.0199   Epoch: 22   Global Step: 111980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:25:34,632-Speed 10719.51 samples/sec   Loss 6.7636   LearningRate 0.0199   Epoch: 22   Global Step: 111990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:25:35,538-Speed 11317.45 samples/sec   Loss 6.7623   LearningRate 0.0199   Epoch: 22   Global Step: 112000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:25:57,777-[lfw][112000]XNorm: 10.386530
Training: 2022-04-11 03:25:57,778-[lfw][112000]Accuracy-Flip: 0.99583+-0.00359
Training: 2022-04-11 03:25:57,779-[lfw][112000]Accuracy-Highest: 0.99667
Training: 2022-04-11 03:26:23,374-[cfp_fp][112000]XNorm: 8.842858
Training: 2022-04-11 03:26:23,375-[cfp_fp][112000]Accuracy-Flip: 0.96071+-0.00992
Training: 2022-04-11 03:26:23,376-[cfp_fp][112000]Accuracy-Highest: 0.96600
Training: 2022-04-11 03:26:45,559-[agedb_30][112000]XNorm: 10.037620
Training: 2022-04-11 03:26:45,560-[agedb_30][112000]Accuracy-Flip: 0.96600+-0.00917
Training: 2022-04-11 03:26:45,560-[agedb_30][112000]Accuracy-Highest: 0.97017
Training: 2022-04-11 03:26:46,529-Speed 144.25 samples/sec   Loss 6.9432   LearningRate 0.0199   Epoch: 22   Global Step: 112010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:26:47,439-Speed 11257.27 samples/sec   Loss 6.9060   LearningRate 0.0199   Epoch: 22   Global Step: 112020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:26:48,404-Speed 10623.26 samples/sec   Loss 6.8741   LearningRate 0.0199   Epoch: 22   Global Step: 112030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:26:49,365-Speed 10665.16 samples/sec   Loss 6.9207   LearningRate 0.0199   Epoch: 22   Global Step: 112040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:26:50,290-Speed 11083.40 samples/sec   Loss 6.7077   LearningRate 0.0199   Epoch: 22   Global Step: 112050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:26:51,231-Speed 10900.67 samples/sec   Loss 6.6401   LearningRate 0.0199   Epoch: 22   Global Step: 112060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:26:52,203-Speed 10541.62 samples/sec   Loss 6.8120   LearningRate 0.0199   Epoch: 22   Global Step: 112070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:26:53,146-Speed 10861.15 samples/sec   Loss 6.8305   LearningRate 0.0199   Epoch: 22   Global Step: 112080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:26:54,120-Speed 10531.94 samples/sec   Loss 6.9811   LearningRate 0.0199   Epoch: 22   Global Step: 112090   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:26:55,060-Speed 10923.63 samples/sec   Loss 6.9030   LearningRate 0.0199   Epoch: 22   Global Step: 112100   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:26:56,039-Speed 10466.37 samples/sec   Loss 6.9118   LearningRate 0.0199   Epoch: 22   Global Step: 112110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:26:56,993-Speed 10750.24 samples/sec   Loss 7.0499   LearningRate 0.0199   Epoch: 22   Global Step: 112120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:26:57,925-Speed 10996.20 samples/sec   Loss 6.9899   LearningRate 0.0199   Epoch: 22   Global Step: 112130   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:26:58,876-Speed 10773.77 samples/sec   Loss 7.1910   LearningRate 0.0199   Epoch: 22   Global Step: 112140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:26:59,804-Speed 11044.00 samples/sec   Loss 6.9608   LearningRate 0.0199   Epoch: 22   Global Step: 112150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:27:00,728-Speed 11094.73 samples/sec   Loss 6.8949   LearningRate 0.0199   Epoch: 22   Global Step: 112160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:27:01,688-Speed 10670.82 samples/sec   Loss 6.9398   LearningRate 0.0199   Epoch: 22   Global Step: 112170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:27:02,603-Speed 11210.83 samples/sec   Loss 6.8719   LearningRate 0.0198   Epoch: 22   Global Step: 112180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:03,565-Speed 10650.55 samples/sec   Loss 6.8909   LearningRate 0.0198   Epoch: 22   Global Step: 112190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:04,523-Speed 10701.57 samples/sec   Loss 7.0248   LearningRate 0.0198   Epoch: 22   Global Step: 112200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:05,479-Speed 10707.77 samples/sec   Loss 6.8808   LearningRate 0.0198   Epoch: 22   Global Step: 112210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:06,455-Speed 10503.90 samples/sec   Loss 7.0016   LearningRate 0.0198   Epoch: 22   Global Step: 112220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:27:07,410-Speed 10740.98 samples/sec   Loss 6.8925   LearningRate 0.0198   Epoch: 22   Global Step: 112230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:27:08,347-Speed 10930.09 samples/sec   Loss 7.0291   LearningRate 0.0198   Epoch: 22   Global Step: 112240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:27:09,312-Speed 10620.03 samples/sec   Loss 7.0174   LearningRate 0.0198   Epoch: 22   Global Step: 112250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:27:10,295-Speed 10433.76 samples/sec   Loss 6.9089   LearningRate 0.0198   Epoch: 22   Global Step: 112260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:27:11,277-Speed 10435.19 samples/sec   Loss 7.0406   LearningRate 0.0198   Epoch: 22   Global Step: 112270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:27:12,213-Speed 10942.36 samples/sec   Loss 6.9255   LearningRate 0.0198   Epoch: 22   Global Step: 112280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:27:13,156-Speed 10876.64 samples/sec   Loss 6.8483   LearningRate 0.0198   Epoch: 22   Global Step: 112290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:27:14,108-Speed 10765.14 samples/sec   Loss 7.0175   LearningRate 0.0198   Epoch: 22   Global Step: 112300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:27:15,132-Speed 10005.17 samples/sec   Loss 6.9334   LearningRate 0.0198   Epoch: 22   Global Step: 112310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:27:16,065-Speed 10985.00 samples/sec   Loss 7.0267   LearningRate 0.0198   Epoch: 22   Global Step: 112320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:17,020-Speed 10732.12 samples/sec   Loss 7.0984   LearningRate 0.0198   Epoch: 22   Global Step: 112330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:17,970-Speed 10790.27 samples/sec   Loss 6.9477   LearningRate 0.0198   Epoch: 22   Global Step: 112340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:18,919-Speed 10804.64 samples/sec   Loss 6.9438   LearningRate 0.0198   Epoch: 22   Global Step: 112350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:19,886-Speed 10714.79 samples/sec   Loss 7.1492   LearningRate 0.0198   Epoch: 22   Global Step: 112360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:20,799-Speed 11221.81 samples/sec   Loss 7.0394   LearningRate 0.0198   Epoch: 22   Global Step: 112370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:21,756-Speed 10718.68 samples/sec   Loss 6.9903   LearningRate 0.0198   Epoch: 22   Global Step: 112380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:22,721-Speed 10620.10 samples/sec   Loss 7.0498   LearningRate 0.0198   Epoch: 22   Global Step: 112390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:23,705-Speed 10413.54 samples/sec   Loss 7.0065   LearningRate 0.0198   Epoch: 22   Global Step: 112400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:24,657-Speed 10770.86 samples/sec   Loss 6.9523   LearningRate 0.0197   Epoch: 22   Global Step: 112410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:25,583-Speed 11061.59 samples/sec   Loss 7.0427   LearningRate 0.0197   Epoch: 22   Global Step: 112420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:27:26,556-Speed 10530.16 samples/sec   Loss 6.9747   LearningRate 0.0197   Epoch: 22   Global Step: 112430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:27:27,529-Speed 10543.96 samples/sec   Loss 6.9961   LearningRate 0.0197   Epoch: 22   Global Step: 112440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:27:28,475-Speed 10843.37 samples/sec   Loss 6.8639   LearningRate 0.0197   Epoch: 22   Global Step: 112450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:29,426-Speed 10773.51 samples/sec   Loss 7.0677   LearningRate 0.0197   Epoch: 22   Global Step: 112460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:30,399-Speed 10535.94 samples/sec   Loss 7.0630   LearningRate 0.0197   Epoch: 22   Global Step: 112470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:31,346-Speed 10820.99 samples/sec   Loss 6.9654   LearningRate 0.0197   Epoch: 22   Global Step: 112480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:32,313-Speed 10615.98 samples/sec   Loss 7.0324   LearningRate 0.0197   Epoch: 22   Global Step: 112490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:33,250-Speed 10958.15 samples/sec   Loss 7.1184   LearningRate 0.0197   Epoch: 22   Global Step: 112500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:34,167-Speed 11170.93 samples/sec   Loss 7.1156   LearningRate 0.0197   Epoch: 22   Global Step: 112510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:35,075-Speed 11294.35 samples/sec   Loss 7.0221   LearningRate 0.0197   Epoch: 22   Global Step: 112520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:36,019-Speed 10851.52 samples/sec   Loss 7.2701   LearningRate 0.0197   Epoch: 22   Global Step: 112530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:36,999-Speed 10465.69 samples/sec   Loss 7.1379   LearningRate 0.0197   Epoch: 22   Global Step: 112540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:37,920-Speed 11128.65 samples/sec   Loss 7.0648   LearningRate 0.0197   Epoch: 22   Global Step: 112550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:27:38,865-Speed 10848.66 samples/sec   Loss 7.0181   LearningRate 0.0197   Epoch: 22   Global Step: 112560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:27:39,810-Speed 10837.45 samples/sec   Loss 6.9545   LearningRate 0.0197   Epoch: 22   Global Step: 112570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:27:40,749-Speed 10916.04 samples/sec   Loss 6.9840   LearningRate 0.0197   Epoch: 22   Global Step: 112580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:27:41,658-Speed 11285.75 samples/sec   Loss 6.9968   LearningRate 0.0197   Epoch: 22   Global Step: 112590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:42,628-Speed 10566.13 samples/sec   Loss 6.9742   LearningRate 0.0197   Epoch: 22   Global Step: 112600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:43,591-Speed 10641.54 samples/sec   Loss 6.8800   LearningRate 0.0197   Epoch: 22   Global Step: 112610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:44,541-Speed 10783.12 samples/sec   Loss 7.1725   LearningRate 0.0197   Epoch: 22   Global Step: 112620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:45,492-Speed 10777.82 samples/sec   Loss 7.2056   LearningRate 0.0197   Epoch: 22   Global Step: 112630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:46,435-Speed 10876.95 samples/sec   Loss 7.1134   LearningRate 0.0196   Epoch: 22   Global Step: 112640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:47,346-Speed 11240.64 samples/sec   Loss 7.0538   LearningRate 0.0196   Epoch: 22   Global Step: 112650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:48,300-Speed 10747.59 samples/sec   Loss 6.9151   LearningRate 0.0196   Epoch: 22   Global Step: 112660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:49,279-Speed 10467.56 samples/sec   Loss 7.1422   LearningRate 0.0196   Epoch: 22   Global Step: 112670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:50,253-Speed 10520.17 samples/sec   Loss 7.1028   LearningRate 0.0196   Epoch: 22   Global Step: 112680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:51,167-Speed 11227.59 samples/sec   Loss 7.0526   LearningRate 0.0196   Epoch: 22   Global Step: 112690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:27:52,126-Speed 10677.92 samples/sec   Loss 7.0230   LearningRate 0.0196   Epoch: 22   Global Step: 112700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:27:53,077-Speed 10770.93 samples/sec   Loss 6.9678   LearningRate 0.0196   Epoch: 22   Global Step: 112710   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:27:54,032-Speed 10737.83 samples/sec   Loss 7.1433   LearningRate 0.0196   Epoch: 22   Global Step: 112720   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:27:54,974-Speed 10885.70 samples/sec   Loss 7.0356   LearningRate 0.0196   Epoch: 22   Global Step: 112730   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:27:55,908-Speed 10964.93 samples/sec   Loss 7.0136   LearningRate 0.0196   Epoch: 22   Global Step: 112740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:56,874-Speed 10612.86 samples/sec   Loss 7.0624   LearningRate 0.0196   Epoch: 22   Global Step: 112750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:57,808-Speed 10964.03 samples/sec   Loss 6.9746   LearningRate 0.0196   Epoch: 22   Global Step: 112760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:58,750-Speed 10891.90 samples/sec   Loss 7.1601   LearningRate 0.0196   Epoch: 22   Global Step: 112770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:27:59,697-Speed 10811.82 samples/sec   Loss 7.0027   LearningRate 0.0196   Epoch: 22   Global Step: 112780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:00,653-Speed 10720.07 samples/sec   Loss 7.1138   LearningRate 0.0196   Epoch: 22   Global Step: 112790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:01,617-Speed 10633.25 samples/sec   Loss 6.9931   LearningRate 0.0196   Epoch: 22   Global Step: 112800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:02,579-Speed 10663.69 samples/sec   Loss 6.9988   LearningRate 0.0196   Epoch: 22   Global Step: 112810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:03,526-Speed 10818.80 samples/sec   Loss 7.0603   LearningRate 0.0196   Epoch: 22   Global Step: 112820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:04,483-Speed 10704.00 samples/sec   Loss 7.1670   LearningRate 0.0196   Epoch: 22   Global Step: 112830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:05,446-Speed 10649.17 samples/sec   Loss 7.0857   LearningRate 0.0196   Epoch: 22   Global Step: 112840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:06,417-Speed 10549.46 samples/sec   Loss 7.0497   LearningRate 0.0196   Epoch: 22   Global Step: 112850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:07,379-Speed 10649.89 samples/sec   Loss 7.1546   LearningRate 0.0196   Epoch: 22   Global Step: 112860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:08,365-Speed 10399.95 samples/sec   Loss 7.0271   LearningRate 0.0195   Epoch: 22   Global Step: 112870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:09,371-Speed 10190.52 samples/sec   Loss 7.0185   LearningRate 0.0195   Epoch: 22   Global Step: 112880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:10,294-Speed 11106.83 samples/sec   Loss 7.0128   LearningRate 0.0195   Epoch: 22   Global Step: 112890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:11,264-Speed 10569.13 samples/sec   Loss 7.1566   LearningRate 0.0195   Epoch: 22   Global Step: 112900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:12,229-Speed 10614.60 samples/sec   Loss 7.1356   LearningRate 0.0195   Epoch: 22   Global Step: 112910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:13,161-Speed 10991.87 samples/sec   Loss 7.0895   LearningRate 0.0195   Epoch: 22   Global Step: 112920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:14,149-Speed 10382.02 samples/sec   Loss 7.0895   LearningRate 0.0195   Epoch: 22   Global Step: 112930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:15,093-Speed 10861.42 samples/sec   Loss 7.0082   LearningRate 0.0195   Epoch: 22   Global Step: 112940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:28:16,070-Speed 10479.03 samples/sec   Loss 6.9119   LearningRate 0.0195   Epoch: 22   Global Step: 112950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:28:17,019-Speed 10803.16 samples/sec   Loss 7.0101   LearningRate 0.0195   Epoch: 22   Global Step: 112960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:28:17,973-Speed 10747.98 samples/sec   Loss 7.1230   LearningRate 0.0195   Epoch: 22   Global Step: 112970   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:28:18,931-Speed 10686.36 samples/sec   Loss 7.1112   LearningRate 0.0195   Epoch: 22   Global Step: 112980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:28:19,886-Speed 10730.63 samples/sec   Loss 7.0288   LearningRate 0.0195   Epoch: 22   Global Step: 112990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:28:20,788-Speed 11366.35 samples/sec   Loss 6.9754   LearningRate 0.0195   Epoch: 22   Global Step: 113000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:21,691-Speed 11347.23 samples/sec   Loss 7.2354   LearningRate 0.0195   Epoch: 22   Global Step: 113010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:22,655-Speed 10634.93 samples/sec   Loss 7.0581   LearningRate 0.0195   Epoch: 22   Global Step: 113020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:23,655-Speed 10255.91 samples/sec   Loss 7.0541   LearningRate 0.0195   Epoch: 22   Global Step: 113030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:24,616-Speed 10668.30 samples/sec   Loss 7.1322   LearningRate 0.0195   Epoch: 22   Global Step: 113040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:25,527-Speed 11237.77 samples/sec   Loss 7.0890   LearningRate 0.0195   Epoch: 22   Global Step: 113050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:26,524-Speed 10280.17 samples/sec   Loss 7.1099   LearningRate 0.0195   Epoch: 22   Global Step: 113060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:27,487-Speed 10650.51 samples/sec   Loss 7.2559   LearningRate 0.0195   Epoch: 22   Global Step: 113070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:28,431-Speed 10857.66 samples/sec   Loss 7.1565   LearningRate 0.0195   Epoch: 22   Global Step: 113080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:29,399-Speed 10594.96 samples/sec   Loss 7.2401   LearningRate 0.0195   Epoch: 22   Global Step: 113090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:30,323-Speed 11094.43 samples/sec   Loss 7.3340   LearningRate 0.0194   Epoch: 22   Global Step: 113100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:28:31,252-Speed 11027.24 samples/sec   Loss 7.2863   LearningRate 0.0194   Epoch: 22   Global Step: 113110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:28:32,179-Speed 11067.64 samples/sec   Loss 6.9456   LearningRate 0.0194   Epoch: 22   Global Step: 113120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:28:33,131-Speed 10757.49 samples/sec   Loss 7.1409   LearningRate 0.0194   Epoch: 22   Global Step: 113130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:28:34,088-Speed 10705.26 samples/sec   Loss 7.1292   LearningRate 0.0194   Epoch: 22   Global Step: 113140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:28:35,049-Speed 10675.22 samples/sec   Loss 7.0754   LearningRate 0.0194   Epoch: 22   Global Step: 113150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:28:35,968-Speed 11148.99 samples/sec   Loss 7.1842   LearningRate 0.0194   Epoch: 22   Global Step: 113160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:28:36,975-Speed 10174.58 samples/sec   Loss 6.9222   LearningRate 0.0194   Epoch: 22   Global Step: 113170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:28:37,920-Speed 10851.48 samples/sec   Loss 7.0346   LearningRate 0.0194   Epoch: 22   Global Step: 113180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:28:38,863-Speed 10872.67 samples/sec   Loss 7.2757   LearningRate 0.0194   Epoch: 22   Global Step: 113190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:28:39,795-Speed 10998.73 samples/sec   Loss 7.1757   LearningRate 0.0194   Epoch: 22   Global Step: 113200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:28:40,729-Speed 10968.89 samples/sec   Loss 7.2084   LearningRate 0.0194   Epoch: 22   Global Step: 113210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:28:41,674-Speed 10849.93 samples/sec   Loss 7.0680   LearningRate 0.0194   Epoch: 22   Global Step: 113220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:28:42,679-Speed 10191.61 samples/sec   Loss 7.1429   LearningRate 0.0194   Epoch: 22   Global Step: 113230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:28:43,635-Speed 10727.39 samples/sec   Loss 7.1034   LearningRate 0.0194   Epoch: 22   Global Step: 113240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:28:44,572-Speed 10941.43 samples/sec   Loss 7.2499   LearningRate 0.0194   Epoch: 22   Global Step: 113250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:28:45,505-Speed 10980.07 samples/sec   Loss 7.2853   LearningRate 0.0194   Epoch: 22   Global Step: 113260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:28:46,462-Speed 10713.10 samples/sec   Loss 7.0806   LearningRate 0.0194   Epoch: 22   Global Step: 113270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:28:47,399-Speed 10940.24 samples/sec   Loss 7.0473   LearningRate 0.0194   Epoch: 22   Global Step: 113280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:28:48,354-Speed 10729.80 samples/sec   Loss 7.2077   LearningRate 0.0194   Epoch: 22   Global Step: 113290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:28:49,271-Speed 11177.25 samples/sec   Loss 7.0798   LearningRate 0.0194   Epoch: 22   Global Step: 113300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:50,230-Speed 10687.71 samples/sec   Loss 7.2051   LearningRate 0.0194   Epoch: 22   Global Step: 113310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:51,214-Speed 10422.74 samples/sec   Loss 7.0173   LearningRate 0.0194   Epoch: 22   Global Step: 113320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:52,161-Speed 10824.14 samples/sec   Loss 7.1243   LearningRate 0.0193   Epoch: 22   Global Step: 113330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:53,150-Speed 10365.90 samples/sec   Loss 7.1723   LearningRate 0.0193   Epoch: 22   Global Step: 113340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:54,110-Speed 10681.47 samples/sec   Loss 7.1890   LearningRate 0.0193   Epoch: 22   Global Step: 113350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:55,064-Speed 10736.86 samples/sec   Loss 7.2474   LearningRate 0.0193   Epoch: 22   Global Step: 113360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:55,965-Speed 11385.53 samples/sec   Loss 7.0151   LearningRate 0.0193   Epoch: 22   Global Step: 113370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:56,903-Speed 10918.66 samples/sec   Loss 7.1972   LearningRate 0.0193   Epoch: 22   Global Step: 113380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:57,909-Speed 10187.92 samples/sec   Loss 7.2082   LearningRate 0.0193   Epoch: 22   Global Step: 113390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:28:58,816-Speed 11311.04 samples/sec   Loss 6.9642   LearningRate 0.0193   Epoch: 22   Global Step: 113400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:28:59,790-Speed 10524.04 samples/sec   Loss 7.0012   LearningRate 0.0193   Epoch: 22   Global Step: 113410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:29:00,772-Speed 10435.74 samples/sec   Loss 7.0477   LearningRate 0.0193   Epoch: 22   Global Step: 113420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:29:01,725-Speed 10754.84 samples/sec   Loss 7.1386   LearningRate 0.0193   Epoch: 22   Global Step: 113430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:29:02,662-Speed 10934.30 samples/sec   Loss 7.3278   LearningRate 0.0193   Epoch: 22   Global Step: 113440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:29:03,596-Speed 10972.56 samples/sec   Loss 7.2136   LearningRate 0.0193   Epoch: 22   Global Step: 113450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:04,564-Speed 10587.84 samples/sec   Loss 7.2407   LearningRate 0.0193   Epoch: 22   Global Step: 113460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:05,516-Speed 10759.12 samples/sec   Loss 7.1527   LearningRate 0.0193   Epoch: 22   Global Step: 113470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:06,448-Speed 10999.88 samples/sec   Loss 7.2367   LearningRate 0.0193   Epoch: 22   Global Step: 113480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:07,381-Speed 10979.84 samples/sec   Loss 7.3112   LearningRate 0.0193   Epoch: 22   Global Step: 113490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:08,351-Speed 10572.05 samples/sec   Loss 7.1143   LearningRate 0.0193   Epoch: 22   Global Step: 113500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:09,325-Speed 10517.37 samples/sec   Loss 7.1548   LearningRate 0.0193   Epoch: 22   Global Step: 113510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:10,288-Speed 10645.10 samples/sec   Loss 7.0253   LearningRate 0.0193   Epoch: 22   Global Step: 113520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:29:11,266-Speed 10485.10 samples/sec   Loss 7.1240   LearningRate 0.0193   Epoch: 22   Global Step: 113530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:29:12,205-Speed 10909.50 samples/sec   Loss 7.1155   LearningRate 0.0193   Epoch: 22   Global Step: 113540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:29:13,170-Speed 10624.07 samples/sec   Loss 7.2791   LearningRate 0.0193   Epoch: 22   Global Step: 113550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:29:14,124-Speed 10738.46 samples/sec   Loss 7.1734   LearningRate 0.0192   Epoch: 22   Global Step: 113560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:29:15,089-Speed 10619.26 samples/sec   Loss 7.0511   LearningRate 0.0192   Epoch: 22   Global Step: 113570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:29:16,012-Speed 11106.08 samples/sec   Loss 7.2380   LearningRate 0.0192   Epoch: 22   Global Step: 113580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:29:16,934-Speed 11119.18 samples/sec   Loss 7.1249   LearningRate 0.0192   Epoch: 22   Global Step: 113590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:29:17,872-Speed 10927.74 samples/sec   Loss 7.2338   LearningRate 0.0192   Epoch: 22   Global Step: 113600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:29:18,769-Speed 11426.34 samples/sec   Loss 7.2083   LearningRate 0.0192   Epoch: 22   Global Step: 113610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:29:19,700-Speed 10999.63 samples/sec   Loss 7.2405   LearningRate 0.0192   Epoch: 22   Global Step: 113620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:20,653-Speed 10760.39 samples/sec   Loss 7.1530   LearningRate 0.0192   Epoch: 22   Global Step: 113630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:21,603-Speed 10781.17 samples/sec   Loss 7.1461   LearningRate 0.0192   Epoch: 22   Global Step: 113640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:22,521-Speed 11166.02 samples/sec   Loss 7.2734   LearningRate 0.0192   Epoch: 22   Global Step: 113650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:23,447-Speed 11074.20 samples/sec   Loss 7.1732   LearningRate 0.0192   Epoch: 22   Global Step: 113660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:29:24,380-Speed 10979.35 samples/sec   Loss 7.1840   LearningRate 0.0192   Epoch: 22   Global Step: 113670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:29:25,279-Speed 11399.69 samples/sec   Loss 7.2291   LearningRate 0.0192   Epoch: 22   Global Step: 113680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:29:26,246-Speed 10593.48 samples/sec   Loss 7.2551   LearningRate 0.0192   Epoch: 22   Global Step: 113690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:29:27,228-Speed 10441.07 samples/sec   Loss 7.2040   LearningRate 0.0192   Epoch: 22   Global Step: 113700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:29:28,162-Speed 10970.83 samples/sec   Loss 7.2263   LearningRate 0.0192   Epoch: 22   Global Step: 113710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:29:29,115-Speed 10750.55 samples/sec   Loss 7.2256   LearningRate 0.0192   Epoch: 22   Global Step: 113720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:29:30,020-Speed 11328.50 samples/sec   Loss 7.0958   LearningRate 0.0192   Epoch: 22   Global Step: 113730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:29:30,963-Speed 10867.73 samples/sec   Loss 6.9735   LearningRate 0.0192   Epoch: 22   Global Step: 113740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:29:31,898-Speed 10961.74 samples/sec   Loss 7.1125   LearningRate 0.0192   Epoch: 22   Global Step: 113750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:29:32,845-Speed 10824.89 samples/sec   Loss 7.2196   LearningRate 0.0192   Epoch: 22   Global Step: 113760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:33,815-Speed 10568.16 samples/sec   Loss 7.3084   LearningRate 0.0192   Epoch: 22   Global Step: 113770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:34,817-Speed 10229.72 samples/sec   Loss 7.1004   LearningRate 0.0192   Epoch: 22   Global Step: 113780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:35,818-Speed 10237.14 samples/sec   Loss 7.1246   LearningRate 0.0191   Epoch: 22   Global Step: 113790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:36,777-Speed 10682.40 samples/sec   Loss 7.1645   LearningRate 0.0191   Epoch: 22   Global Step: 113800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:37,768-Speed 10344.70 samples/sec   Loss 7.1746   LearningRate 0.0191   Epoch: 22   Global Step: 113810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:38,717-Speed 10801.49 samples/sec   Loss 7.1176   LearningRate 0.0191   Epoch: 22   Global Step: 113820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:39,627-Speed 11260.53 samples/sec   Loss 7.3217   LearningRate 0.0191   Epoch: 22   Global Step: 113830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:40,568-Speed 10894.46 samples/sec   Loss 6.9914   LearningRate 0.0191   Epoch: 22   Global Step: 113840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:41,530-Speed 10651.64 samples/sec   Loss 7.1085   LearningRate 0.0191   Epoch: 22   Global Step: 113850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:42,463-Speed 10976.80 samples/sec   Loss 7.3586   LearningRate 0.0191   Epoch: 22   Global Step: 113860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:29:43,455-Speed 10337.89 samples/sec   Loss 7.1717   LearningRate 0.0191   Epoch: 22   Global Step: 113870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:29:44,429-Speed 10521.36 samples/sec   Loss 7.2413   LearningRate 0.0191   Epoch: 22   Global Step: 113880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:29:45,388-Speed 10692.00 samples/sec   Loss 7.1586   LearningRate 0.0191   Epoch: 22   Global Step: 113890   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:29:46,356-Speed 10576.42 samples/sec   Loss 7.1819   LearningRate 0.0191   Epoch: 22   Global Step: 113900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:47,260-Speed 11340.11 samples/sec   Loss 7.2614   LearningRate 0.0191   Epoch: 22   Global Step: 113910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:48,200-Speed 10897.76 samples/sec   Loss 7.0159   LearningRate 0.0191   Epoch: 22   Global Step: 113920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:49,139-Speed 10917.19 samples/sec   Loss 7.1765   LearningRate 0.0191   Epoch: 22   Global Step: 113930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:50,101-Speed 10660.62 samples/sec   Loss 7.1655   LearningRate 0.0191   Epoch: 22   Global Step: 113940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:51,031-Speed 11017.18 samples/sec   Loss 7.2468   LearningRate 0.0191   Epoch: 22   Global Step: 113950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:52,026-Speed 10294.87 samples/sec   Loss 7.3373   LearningRate 0.0191   Epoch: 22   Global Step: 113960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:52,978-Speed 10765.10 samples/sec   Loss 7.2763   LearningRate 0.0191   Epoch: 22   Global Step: 113970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:53,941-Speed 10643.74 samples/sec   Loss 7.1958   LearningRate 0.0191   Epoch: 22   Global Step: 113980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:54,873-Speed 11005.60 samples/sec   Loss 7.3646   LearningRate 0.0191   Epoch: 22   Global Step: 113990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:29:55,793-Speed 11129.72 samples/sec   Loss 7.1628   LearningRate 0.0191   Epoch: 22   Global Step: 114000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:30:17,941-[lfw][114000]XNorm: 10.236233
Training: 2022-04-11 03:30:17,942-[lfw][114000]Accuracy-Flip: 0.99600+-0.00335
Training: 2022-04-11 03:30:17,942-[lfw][114000]Accuracy-Highest: 0.99667
Training: 2022-04-11 03:30:43,391-[cfp_fp][114000]XNorm: 8.686158
Training: 2022-04-11 03:30:43,392-[cfp_fp][114000]Accuracy-Flip: 0.96071+-0.01025
Training: 2022-04-11 03:30:43,393-[cfp_fp][114000]Accuracy-Highest: 0.96600
Training: 2022-04-11 03:31:05,465-[agedb_30][114000]XNorm: 9.924356
Training: 2022-04-11 03:31:05,466-[agedb_30][114000]Accuracy-Flip: 0.96917+-0.00757
Training: 2022-04-11 03:31:05,466-[agedb_30][114000]Accuracy-Highest: 0.97017
Training: 2022-04-11 03:31:06,402-Speed 145.03 samples/sec   Loss 7.0419   LearningRate 0.0191   Epoch: 22   Global Step: 114010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:31:07,369-Speed 10600.03 samples/sec   Loss 7.2374   LearningRate 0.0190   Epoch: 22   Global Step: 114020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:31:08,294-Speed 11098.81 samples/sec   Loss 7.1256   LearningRate 0.0190   Epoch: 22   Global Step: 114030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:09,263-Speed 10571.98 samples/sec   Loss 7.2822   LearningRate 0.0190   Epoch: 22   Global Step: 114040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:10,206-Speed 10860.91 samples/sec   Loss 7.1289   LearningRate 0.0190   Epoch: 22   Global Step: 114050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:11,184-Speed 10485.18 samples/sec   Loss 7.0369   LearningRate 0.0190   Epoch: 22   Global Step: 114060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:12,120-Speed 10948.19 samples/sec   Loss 7.1077   LearningRate 0.0190   Epoch: 22   Global Step: 114070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:13,075-Speed 10735.14 samples/sec   Loss 7.1941   LearningRate 0.0190   Epoch: 22   Global Step: 114080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:14,051-Speed 10499.17 samples/sec   Loss 7.2493   LearningRate 0.0190   Epoch: 22   Global Step: 114090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:14,992-Speed 10885.42 samples/sec   Loss 7.2050   LearningRate 0.0190   Epoch: 22   Global Step: 114100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:15,949-Speed 10710.05 samples/sec   Loss 7.2345   LearningRate 0.0190   Epoch: 22   Global Step: 114110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:16,871-Speed 11116.70 samples/sec   Loss 7.2056   LearningRate 0.0190   Epoch: 22   Global Step: 114120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:17,859-Speed 10374.73 samples/sec   Loss 7.2903   LearningRate 0.0190   Epoch: 22   Global Step: 114130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:18,800-Speed 10889.26 samples/sec   Loss 7.3424   LearningRate 0.0190   Epoch: 22   Global Step: 114140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:19,776-Speed 10508.22 samples/sec   Loss 7.1756   LearningRate 0.0190   Epoch: 22   Global Step: 114150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:20,709-Speed 10978.40 samples/sec   Loss 7.1870   LearningRate 0.0190   Epoch: 22   Global Step: 114160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:21,690-Speed 10444.98 samples/sec   Loss 7.3003   LearningRate 0.0190   Epoch: 22   Global Step: 114170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:22,662-Speed 10544.46 samples/sec   Loss 7.1240   LearningRate 0.0190   Epoch: 22   Global Step: 114180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:23,640-Speed 10482.21 samples/sec   Loss 7.1622   LearningRate 0.0190   Epoch: 22   Global Step: 114190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:24,596-Speed 10719.41 samples/sec   Loss 7.1595   LearningRate 0.0190   Epoch: 22   Global Step: 114200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:25,530-Speed 10967.29 samples/sec   Loss 7.1335   LearningRate 0.0190   Epoch: 22   Global Step: 114210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:26,497-Speed 10596.21 samples/sec   Loss 7.3442   LearningRate 0.0190   Epoch: 22   Global Step: 114220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:27,440-Speed 10874.31 samples/sec   Loss 7.1916   LearningRate 0.0190   Epoch: 22   Global Step: 114230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:31:28,375-Speed 10955.60 samples/sec   Loss 7.2464   LearningRate 0.0190   Epoch: 22   Global Step: 114240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:31:29,324-Speed 10799.11 samples/sec   Loss 7.2916   LearningRate 0.0189   Epoch: 22   Global Step: 114250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:30,281-Speed 10706.57 samples/sec   Loss 7.3326   LearningRate 0.0189   Epoch: 22   Global Step: 114260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:31,245-Speed 10636.30 samples/sec   Loss 7.3959   LearningRate 0.0189   Epoch: 22   Global Step: 114270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:32,192-Speed 10816.40 samples/sec   Loss 7.1081   LearningRate 0.0189   Epoch: 22   Global Step: 114280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:33,150-Speed 10701.76 samples/sec   Loss 7.2736   LearningRate 0.0189   Epoch: 22   Global Step: 114290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:34,099-Speed 10801.01 samples/sec   Loss 7.2806   LearningRate 0.0189   Epoch: 22   Global Step: 114300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:35,041-Speed 10867.89 samples/sec   Loss 7.2599   LearningRate 0.0189   Epoch: 22   Global Step: 114310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:36,012-Speed 10556.94 samples/sec   Loss 7.3585   LearningRate 0.0189   Epoch: 22   Global Step: 114320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:36,939-Speed 11059.56 samples/sec   Loss 7.3610   LearningRate 0.0189   Epoch: 22   Global Step: 114330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:37,869-Speed 11021.96 samples/sec   Loss 7.1744   LearningRate 0.0189   Epoch: 22   Global Step: 114340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:38,796-Speed 11047.38 samples/sec   Loss 7.1115   LearningRate 0.0189   Epoch: 22   Global Step: 114350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:31:39,710-Speed 11211.99 samples/sec   Loss 7.2652   LearningRate 0.0189   Epoch: 22   Global Step: 114360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:31:40,680-Speed 10584.86 samples/sec   Loss 7.2324   LearningRate 0.0189   Epoch: 22   Global Step: 114370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:31:41,621-Speed 10888.32 samples/sec   Loss 7.1924   LearningRate 0.0189   Epoch: 22   Global Step: 114380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:31:42,575-Speed 10739.65 samples/sec   Loss 7.2769   LearningRate 0.0189   Epoch: 22   Global Step: 114390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:31:43,530-Speed 10736.97 samples/sec   Loss 7.2909   LearningRate 0.0189   Epoch: 22   Global Step: 114400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:31:44,491-Speed 10655.81 samples/sec   Loss 7.0772   LearningRate 0.0189   Epoch: 22   Global Step: 114410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:31:45,414-Speed 11116.31 samples/sec   Loss 7.1744   LearningRate 0.0189   Epoch: 22   Global Step: 114420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:31:46,380-Speed 10600.94 samples/sec   Loss 7.2787   LearningRate 0.0189   Epoch: 22   Global Step: 114430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:31:47,323-Speed 10873.99 samples/sec   Loss 7.2748   LearningRate 0.0189   Epoch: 22   Global Step: 114440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:31:48,271-Speed 10815.69 samples/sec   Loss 7.1752   LearningRate 0.0189   Epoch: 22   Global Step: 114450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:31:49,241-Speed 10563.85 samples/sec   Loss 7.2229   LearningRate 0.0189   Epoch: 22   Global Step: 114460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:31:50,228-Speed 10383.95 samples/sec   Loss 7.1945   LearningRate 0.0189   Epoch: 22   Global Step: 114470   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:31:51,166-Speed 10919.66 samples/sec   Loss 7.1560   LearningRate 0.0188   Epoch: 22   Global Step: 114480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:52,121-Speed 10729.81 samples/sec   Loss 7.1173   LearningRate 0.0188   Epoch: 22   Global Step: 114490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:53,069-Speed 10810.88 samples/sec   Loss 7.1041   LearningRate 0.0188   Epoch: 22   Global Step: 114500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:54,008-Speed 10921.52 samples/sec   Loss 7.4707   LearningRate 0.0188   Epoch: 22   Global Step: 114510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:54,961-Speed 10756.59 samples/sec   Loss 7.2399   LearningRate 0.0188   Epoch: 22   Global Step: 114520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:55,899-Speed 10924.23 samples/sec   Loss 7.3772   LearningRate 0.0188   Epoch: 22   Global Step: 114530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:56,830-Speed 10997.62 samples/sec   Loss 7.3732   LearningRate 0.0188   Epoch: 22   Global Step: 114540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:57,749-Speed 11159.78 samples/sec   Loss 7.3136   LearningRate 0.0188   Epoch: 22   Global Step: 114550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:58,683-Speed 10961.79 samples/sec   Loss 7.2492   LearningRate 0.0188   Epoch: 22   Global Step: 114560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:31:59,611-Speed 11055.96 samples/sec   Loss 7.1886   LearningRate 0.0188   Epoch: 22   Global Step: 114570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:00,565-Speed 10731.82 samples/sec   Loss 7.2268   LearningRate 0.0188   Epoch: 22   Global Step: 114580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:32:01,512-Speed 10827.86 samples/sec   Loss 7.3711   LearningRate 0.0188   Epoch: 22   Global Step: 114590   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:32:02,438-Speed 11071.40 samples/sec   Loss 7.1014   LearningRate 0.0188   Epoch: 22   Global Step: 114600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:32:03,387-Speed 10798.64 samples/sec   Loss 7.4459   LearningRate 0.0188   Epoch: 22   Global Step: 114610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:04,347-Speed 10668.36 samples/sec   Loss 7.2929   LearningRate 0.0188   Epoch: 22   Global Step: 114620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:05,241-Speed 11467.66 samples/sec   Loss 7.2723   LearningRate 0.0188   Epoch: 22   Global Step: 114630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:06,197-Speed 10721.20 samples/sec   Loss 7.3999   LearningRate 0.0188   Epoch: 22   Global Step: 114640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:07,131-Speed 10967.07 samples/sec   Loss 7.1943   LearningRate 0.0188   Epoch: 22   Global Step: 114650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:08,121-Speed 10356.29 samples/sec   Loss 7.3199   LearningRate 0.0188   Epoch: 22   Global Step: 114660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:09,161-Speed 9854.70 samples/sec   Loss 7.2643   LearningRate 0.0188   Epoch: 22   Global Step: 114670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:10,140-Speed 10469.18 samples/sec   Loss 7.3221   LearningRate 0.0188   Epoch: 22   Global Step: 114680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:11,104-Speed 10628.84 samples/sec   Loss 7.3250   LearningRate 0.0188   Epoch: 22   Global Step: 114690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:12,064-Speed 10681.59 samples/sec   Loss 7.2308   LearningRate 0.0188   Epoch: 22   Global Step: 114700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:13,060-Speed 10283.49 samples/sec   Loss 7.1924   LearningRate 0.0188   Epoch: 22   Global Step: 114710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:14,003-Speed 10880.32 samples/sec   Loss 7.0132   LearningRate 0.0187   Epoch: 22   Global Step: 114720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:14,958-Speed 10726.33 samples/sec   Loss 7.2323   LearningRate 0.0187   Epoch: 22   Global Step: 114730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:15,877-Speed 11149.22 samples/sec   Loss 7.1435   LearningRate 0.0187   Epoch: 22   Global Step: 114740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:16,825-Speed 10815.78 samples/sec   Loss 7.1722   LearningRate 0.0187   Epoch: 22   Global Step: 114750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:17,825-Speed 10248.26 samples/sec   Loss 7.1610   LearningRate 0.0187   Epoch: 22   Global Step: 114760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:18,788-Speed 10646.16 samples/sec   Loss 7.1106   LearningRate 0.0187   Epoch: 22   Global Step: 114770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:19,755-Speed 10596.23 samples/sec   Loss 7.1502   LearningRate 0.0187   Epoch: 22   Global Step: 114780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:20,707-Speed 10761.70 samples/sec   Loss 7.2725   LearningRate 0.0187   Epoch: 22   Global Step: 114790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:21,620-Speed 11229.88 samples/sec   Loss 7.1840   LearningRate 0.0187   Epoch: 22   Global Step: 114800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:22,569-Speed 10795.35 samples/sec   Loss 7.4263   LearningRate 0.0187   Epoch: 22   Global Step: 114810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:32:23,541-Speed 10541.05 samples/sec   Loss 7.2215   LearningRate 0.0187   Epoch: 22   Global Step: 114820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:24,504-Speed 10649.37 samples/sec   Loss 7.2226   LearningRate 0.0187   Epoch: 22   Global Step: 114830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:25,450-Speed 10841.37 samples/sec   Loss 7.2044   LearningRate 0.0187   Epoch: 22   Global Step: 114840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:26,379-Speed 11032.37 samples/sec   Loss 7.1456   LearningRate 0.0187   Epoch: 22   Global Step: 114850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:27,325-Speed 10830.38 samples/sec   Loss 7.3874   LearningRate 0.0187   Epoch: 22   Global Step: 114860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:28,278-Speed 10749.69 samples/sec   Loss 7.1691   LearningRate 0.0187   Epoch: 22   Global Step: 114870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:29,230-Speed 10762.85 samples/sec   Loss 7.3038   LearningRate 0.0187   Epoch: 22   Global Step: 114880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:30,249-Speed 10064.50 samples/sec   Loss 7.2628   LearningRate 0.0187   Epoch: 22   Global Step: 114890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:31,192-Speed 10862.78 samples/sec   Loss 7.0868   LearningRate 0.0187   Epoch: 22   Global Step: 114900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:32,101-Speed 11270.90 samples/sec   Loss 6.9641   LearningRate 0.0187   Epoch: 22   Global Step: 114910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:33,047-Speed 10837.72 samples/sec   Loss 7.2395   LearningRate 0.0187   Epoch: 22   Global Step: 114920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:32:33,999-Speed 10765.87 samples/sec   Loss 7.3464   LearningRate 0.0187   Epoch: 22   Global Step: 114930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:32:34,995-Speed 10287.68 samples/sec   Loss 7.2520   LearningRate 0.0187   Epoch: 22   Global Step: 114940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:35,948-Speed 10753.87 samples/sec   Loss 7.1398   LearningRate 0.0186   Epoch: 22   Global Step: 114950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:36,920-Speed 10553.82 samples/sec   Loss 7.2392   LearningRate 0.0186   Epoch: 22   Global Step: 114960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:37,866-Speed 10832.53 samples/sec   Loss 7.1658   LearningRate 0.0186   Epoch: 22   Global Step: 114970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:38,831-Speed 10619.08 samples/sec   Loss 7.1257   LearningRate 0.0186   Epoch: 22   Global Step: 114980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:39,808-Speed 10499.38 samples/sec   Loss 7.1957   LearningRate 0.0186   Epoch: 22   Global Step: 114990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:40,708-Speed 11382.72 samples/sec   Loss 7.2518   LearningRate 0.0186   Epoch: 22   Global Step: 115000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:41,633-Speed 11080.03 samples/sec   Loss 7.2144   LearningRate 0.0186   Epoch: 22   Global Step: 115010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:42,621-Speed 10377.23 samples/sec   Loss 7.3543   LearningRate 0.0186   Epoch: 22   Global Step: 115020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:43,560-Speed 10907.99 samples/sec   Loss 7.2873   LearningRate 0.0186   Epoch: 22   Global Step: 115030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:44,534-Speed 10531.86 samples/sec   Loss 7.1484   LearningRate 0.0186   Epoch: 22   Global Step: 115040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:45,477-Speed 10868.03 samples/sec   Loss 7.2829   LearningRate 0.0186   Epoch: 22   Global Step: 115050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:46,409-Speed 10998.34 samples/sec   Loss 7.3642   LearningRate 0.0186   Epoch: 22   Global Step: 115060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:47,402-Speed 10312.19 samples/sec   Loss 7.2590   LearningRate 0.0186   Epoch: 22   Global Step: 115070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:48,366-Speed 10641.87 samples/sec   Loss 7.3147   LearningRate 0.0186   Epoch: 22   Global Step: 115080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:49,325-Speed 10683.62 samples/sec   Loss 7.1907   LearningRate 0.0186   Epoch: 22   Global Step: 115090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:50,279-Speed 10742.99 samples/sec   Loss 7.2269   LearningRate 0.0186   Epoch: 22   Global Step: 115100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:51,207-Speed 11040.79 samples/sec   Loss 7.0922   LearningRate 0.0186   Epoch: 22   Global Step: 115110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:52,166-Speed 10692.31 samples/sec   Loss 7.4311   LearningRate 0.0186   Epoch: 22   Global Step: 115120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:53,144-Speed 10473.47 samples/sec   Loss 7.3403   LearningRate 0.0186   Epoch: 22   Global Step: 115130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:54,137-Speed 10324.16 samples/sec   Loss 7.3021   LearningRate 0.0186   Epoch: 22   Global Step: 115140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:32:55,109-Speed 10546.74 samples/sec   Loss 7.3893   LearningRate 0.0186   Epoch: 22   Global Step: 115150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:32:56,064-Speed 10723.83 samples/sec   Loss 7.2158   LearningRate 0.0186   Epoch: 22   Global Step: 115160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:32:57,023-Speed 10692.63 samples/sec   Loss 7.1488   LearningRate 0.0186   Epoch: 22   Global Step: 115170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:32:57,944-Speed 11133.72 samples/sec   Loss 7.1527   LearningRate 0.0186   Epoch: 22   Global Step: 115180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:58,869-Speed 11084.16 samples/sec   Loss 7.2468   LearningRate 0.0185   Epoch: 22   Global Step: 115190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:32:59,807-Speed 10926.88 samples/sec   Loss 7.2841   LearningRate 0.0185   Epoch: 22   Global Step: 115200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:00,726-Speed 11146.45 samples/sec   Loss 7.1422   LearningRate 0.0185   Epoch: 22   Global Step: 115210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:01,665-Speed 10913.67 samples/sec   Loss 7.3029   LearningRate 0.0185   Epoch: 22   Global Step: 115220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:02,635-Speed 10576.05 samples/sec   Loss 7.3815   LearningRate 0.0185   Epoch: 22   Global Step: 115230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:03,578-Speed 10862.71 samples/sec   Loss 7.1912   LearningRate 0.0185   Epoch: 22   Global Step: 115240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:04,527-Speed 10806.92 samples/sec   Loss 7.3602   LearningRate 0.0185   Epoch: 22   Global Step: 115250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:05,456-Speed 11025.26 samples/sec   Loss 7.2632   LearningRate 0.0185   Epoch: 22   Global Step: 115260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:06,377-Speed 11125.81 samples/sec   Loss 7.2578   LearningRate 0.0185   Epoch: 22   Global Step: 115270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:07,353-Speed 10501.56 samples/sec   Loss 7.1659   LearningRate 0.0185   Epoch: 22   Global Step: 115280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:33:08,314-Speed 10674.79 samples/sec   Loss 7.2718   LearningRate 0.0185   Epoch: 22   Global Step: 115290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:33:09,266-Speed 10770.48 samples/sec   Loss 7.2978   LearningRate 0.0185   Epoch: 22   Global Step: 115300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:33:10,221-Speed 10729.25 samples/sec   Loss 7.3153   LearningRate 0.0185   Epoch: 22   Global Step: 115310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:33:11,151-Speed 11012.31 samples/sec   Loss 7.2603   LearningRate 0.0185   Epoch: 22   Global Step: 115320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:33:12,079-Speed 11045.75 samples/sec   Loss 7.3416   LearningRate 0.0185   Epoch: 22   Global Step: 115330   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:33:13,081-Speed 10229.63 samples/sec   Loss 7.3887   LearningRate 0.0185   Epoch: 22   Global Step: 115340   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:33:14,024-Speed 10866.87 samples/sec   Loss 7.3814   LearningRate 0.0185   Epoch: 22   Global Step: 115350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:33:14,988-Speed 10633.73 samples/sec   Loss 7.3072   LearningRate 0.0185   Epoch: 22   Global Step: 115360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:33:15,946-Speed 10701.25 samples/sec   Loss 7.2391   LearningRate 0.0185   Epoch: 22   Global Step: 115370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:33:16,852-Speed 11304.75 samples/sec   Loss 7.3566   LearningRate 0.0185   Epoch: 22   Global Step: 115380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:33:17,814-Speed 10673.79 samples/sec   Loss 7.2234   LearningRate 0.0185   Epoch: 22   Global Step: 115390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:33:18,774-Speed 10675.23 samples/sec   Loss 7.2027   LearningRate 0.0185   Epoch: 22   Global Step: 115400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:19,725-Speed 10774.13 samples/sec   Loss 7.3387   LearningRate 0.0185   Epoch: 22   Global Step: 115410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:20,680-Speed 10729.11 samples/sec   Loss 7.2462   LearningRate 0.0184   Epoch: 22   Global Step: 115420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:21,634-Speed 10739.77 samples/sec   Loss 7.1938   LearningRate 0.0184   Epoch: 22   Global Step: 115430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:22,589-Speed 10730.10 samples/sec   Loss 7.2738   LearningRate 0.0184   Epoch: 22   Global Step: 115440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:23,500-Speed 11249.49 samples/sec   Loss 7.1463   LearningRate 0.0184   Epoch: 22   Global Step: 115450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:24,484-Speed 10422.35 samples/sec   Loss 7.3352   LearningRate 0.0184   Epoch: 22   Global Step: 115460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:25,462-Speed 10473.68 samples/sec   Loss 7.2688   LearningRate 0.0184   Epoch: 22   Global Step: 115470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:26,412-Speed 10783.07 samples/sec   Loss 7.3245   LearningRate 0.0184   Epoch: 22   Global Step: 115480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:27,364-Speed 10771.16 samples/sec   Loss 7.2468   LearningRate 0.0184   Epoch: 22   Global Step: 115490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:28,317-Speed 10758.72 samples/sec   Loss 7.2679   LearningRate 0.0184   Epoch: 22   Global Step: 115500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:33:29,273-Speed 10724.12 samples/sec   Loss 7.2379   LearningRate 0.0184   Epoch: 22   Global Step: 115510   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:33:30,220-Speed 10814.46 samples/sec   Loss 7.3003   LearningRate 0.0184   Epoch: 22   Global Step: 115520   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:33:31,116-Speed 11436.70 samples/sec   Loss 7.2464   LearningRate 0.0184   Epoch: 22   Global Step: 115530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:32,065-Speed 10800.49 samples/sec   Loss 7.2418   LearningRate 0.0184   Epoch: 22   Global Step: 115540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:33,032-Speed 10597.48 samples/sec   Loss 7.3173   LearningRate 0.0184   Epoch: 22   Global Step: 115550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:34,004-Speed 10545.28 samples/sec   Loss 7.3153   LearningRate 0.0184   Epoch: 22   Global Step: 115560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:34,959-Speed 10736.35 samples/sec   Loss 7.3340   LearningRate 0.0184   Epoch: 22   Global Step: 115570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:35,891-Speed 10998.84 samples/sec   Loss 7.3192   LearningRate 0.0184   Epoch: 22   Global Step: 115580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:36,844-Speed 10762.84 samples/sec   Loss 7.2193   LearningRate 0.0184   Epoch: 22   Global Step: 115590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:37,802-Speed 10694.32 samples/sec   Loss 7.2471   LearningRate 0.0184   Epoch: 22   Global Step: 115600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:38,733-Speed 11023.05 samples/sec   Loss 7.2343   LearningRate 0.0184   Epoch: 22   Global Step: 115610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:39,657-Speed 11093.00 samples/sec   Loss 7.2858   LearningRate 0.0184   Epoch: 22   Global Step: 115620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:40,602-Speed 10840.57 samples/sec   Loss 7.2297   LearningRate 0.0184   Epoch: 22   Global Step: 115630   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:33:41,574-Speed 10538.27 samples/sec   Loss 7.1095   LearningRate 0.0184   Epoch: 22   Global Step: 115640   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:33:42,526-Speed 10769.61 samples/sec   Loss 7.1976   LearningRate 0.0184   Epoch: 22   Global Step: 115650   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:33:43,422-Speed 11437.70 samples/sec   Loss 7.3689   LearningRate 0.0183   Epoch: 22   Global Step: 115660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:33:44,376-Speed 10743.78 samples/sec   Loss 7.2752   LearningRate 0.0183   Epoch: 22   Global Step: 115670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:33:45,346-Speed 10565.76 samples/sec   Loss 7.2976   LearningRate 0.0183   Epoch: 22   Global Step: 115680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:33:46,279-Speed 10985.52 samples/sec   Loss 7.4173   LearningRate 0.0183   Epoch: 22   Global Step: 115690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:33:47,241-Speed 10647.30 samples/sec   Loss 7.3086   LearningRate 0.0183   Epoch: 22   Global Step: 115700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:33:48,228-Speed 10420.03 samples/sec   Loss 7.2675   LearningRate 0.0183   Epoch: 22   Global Step: 115710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:33:49,147-Speed 11158.04 samples/sec   Loss 7.2993   LearningRate 0.0183   Epoch: 22   Global Step: 115720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:33:50,097-Speed 10789.01 samples/sec   Loss 7.3145   LearningRate 0.0183   Epoch: 22   Global Step: 115730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:33:51,000-Speed 11342.84 samples/sec   Loss 7.2793   LearningRate 0.0183   Epoch: 22   Global Step: 115740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:33:51,965-Speed 10618.06 samples/sec   Loss 7.2395   LearningRate 0.0183   Epoch: 22   Global Step: 115750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:33:52,909-Speed 10861.31 samples/sec   Loss 7.2250   LearningRate 0.0183   Epoch: 22   Global Step: 115760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:53,837-Speed 11051.69 samples/sec   Loss 7.2525   LearningRate 0.0183   Epoch: 22   Global Step: 115770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:54,769-Speed 11001.23 samples/sec   Loss 7.2631   LearningRate 0.0183   Epoch: 22   Global Step: 115780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:55,706-Speed 10935.83 samples/sec   Loss 7.3401   LearningRate 0.0183   Epoch: 22   Global Step: 115790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:56,653-Speed 10818.55 samples/sec   Loss 7.3009   LearningRate 0.0183   Epoch: 22   Global Step: 115800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:57,604-Speed 10778.41 samples/sec   Loss 7.3070   LearningRate 0.0183   Epoch: 22   Global Step: 115810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:58,505-Speed 11377.35 samples/sec   Loss 7.2702   LearningRate 0.0183   Epoch: 22   Global Step: 115820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:33:59,400-Speed 11452.76 samples/sec   Loss 7.2763   LearningRate 0.0183   Epoch: 22   Global Step: 115830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:34:00,370-Speed 10566.06 samples/sec   Loss 7.2684   LearningRate 0.0183   Epoch: 22   Global Step: 115840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:34:01,303-Speed 10974.54 samples/sec   Loss 7.2346   LearningRate 0.0183   Epoch: 22   Global Step: 115850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:34:02,235-Speed 11012.24 samples/sec   Loss 7.2356   LearningRate 0.0183   Epoch: 22   Global Step: 115860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:34:03,178-Speed 10863.01 samples/sec   Loss 7.3823   LearningRate 0.0183   Epoch: 22   Global Step: 115870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:34:04,132-Speed 10734.51 samples/sec   Loss 7.4020   LearningRate 0.0183   Epoch: 22   Global Step: 115880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:34:05,088-Speed 10722.23 samples/sec   Loss 7.2974   LearningRate 0.0182   Epoch: 22   Global Step: 115890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:34:06,042-Speed 10747.09 samples/sec   Loss 7.3074   LearningRate 0.0182   Epoch: 22   Global Step: 115900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:34:07,002-Speed 10675.11 samples/sec   Loss 7.2544   LearningRate 0.0182   Epoch: 22   Global Step: 115910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:34:07,949-Speed 10821.38 samples/sec   Loss 7.2942   LearningRate 0.0182   Epoch: 22   Global Step: 115920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:34:08,919-Speed 10566.77 samples/sec   Loss 7.2655   LearningRate 0.0182   Epoch: 22   Global Step: 115930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:34:09,851-Speed 10993.19 samples/sec   Loss 7.3072   LearningRate 0.0182   Epoch: 22   Global Step: 115940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:34:10,809-Speed 10698.54 samples/sec   Loss 7.3017   LearningRate 0.0182   Epoch: 22   Global Step: 115950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:34:11,764-Speed 10729.25 samples/sec   Loss 7.0063   LearningRate 0.0182   Epoch: 22   Global Step: 115960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:34:12,718-Speed 10751.33 samples/sec   Loss 7.2936   LearningRate 0.0182   Epoch: 22   Global Step: 115970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:34:13,668-Speed 10783.48 samples/sec   Loss 7.3984   LearningRate 0.0182   Epoch: 22   Global Step: 115980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:34:14,587-Speed 11148.48 samples/sec   Loss 7.3453   LearningRate 0.0182   Epoch: 22   Global Step: 115990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:34:15,520-Speed 10985.18 samples/sec   Loss 7.3305   LearningRate 0.0182   Epoch: 22   Global Step: 116000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:34:37,584-[lfw][116000]XNorm: 10.107439
Training: 2022-04-11 03:34:37,585-[lfw][116000]Accuracy-Flip: 0.99617+-0.00334
Training: 2022-04-11 03:34:37,585-[lfw][116000]Accuracy-Highest: 0.99667
Training: 2022-04-11 03:35:02,911-[cfp_fp][116000]XNorm: 8.626950
Training: 2022-04-11 03:35:02,912-[cfp_fp][116000]Accuracy-Flip: 0.96529+-0.01006
Training: 2022-04-11 03:35:02,913-[cfp_fp][116000]Accuracy-Highest: 0.96600
Training: 2022-04-11 03:35:25,173-[agedb_30][116000]XNorm: 9.825822
Training: 2022-04-11 03:35:25,173-[agedb_30][116000]Accuracy-Flip: 0.96917+-0.00668
Training: 2022-04-11 03:35:25,175-[agedb_30][116000]Accuracy-Highest: 0.97017
Training: 2022-04-11 03:35:26,120-Speed 145.04 samples/sec   Loss 7.3728   LearningRate 0.0182   Epoch: 22   Global Step: 116010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:35:27,079-Speed 10681.67 samples/sec   Loss 7.2197   LearningRate 0.0182   Epoch: 22   Global Step: 116020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:35:28,014-Speed 10968.47 samples/sec   Loss 7.3156   LearningRate 0.0182   Epoch: 22   Global Step: 116030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:35:28,985-Speed 10552.79 samples/sec   Loss 7.1993   LearningRate 0.0182   Epoch: 22   Global Step: 116040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:35:29,928-Speed 10864.32 samples/sec   Loss 7.2049   LearningRate 0.0182   Epoch: 22   Global Step: 116050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:35:30,913-Speed 10404.66 samples/sec   Loss 7.3467   LearningRate 0.0182   Epoch: 22   Global Step: 116060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:35:31,845-Speed 11003.09 samples/sec   Loss 7.2382   LearningRate 0.0182   Epoch: 22   Global Step: 116070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:35:32,778-Speed 10985.46 samples/sec   Loss 7.2822   LearningRate 0.0182   Epoch: 22   Global Step: 116080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:35:33,693-Speed 11203.93 samples/sec   Loss 7.2165   LearningRate 0.0182   Epoch: 22   Global Step: 116090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:35:34,614-Speed 11117.37 samples/sec   Loss 7.2419   LearningRate 0.0182   Epoch: 22   Global Step: 116100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:35:35,538-Speed 11090.68 samples/sec   Loss 7.2330   LearningRate 0.0182   Epoch: 22   Global Step: 116110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:35:36,469-Speed 11012.95 samples/sec   Loss 7.0557   LearningRate 0.0182   Epoch: 22   Global Step: 116120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:35:37,423-Speed 10734.34 samples/sec   Loss 7.2220   LearningRate 0.0181   Epoch: 22   Global Step: 116130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:35:38,354-Speed 11010.29 samples/sec   Loss 7.2978   LearningRate 0.0181   Epoch: 22   Global Step: 116140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:35:39,281-Speed 11057.32 samples/sec   Loss 7.2516   LearningRate 0.0181   Epoch: 22   Global Step: 116150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:35:40,245-Speed 10628.38 samples/sec   Loss 7.3010   LearningRate 0.0181   Epoch: 22   Global Step: 116160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:35:41,213-Speed 10593.49 samples/sec   Loss 7.0586   LearningRate 0.0181   Epoch: 22   Global Step: 116170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:35:42,124-Speed 11250.90 samples/sec   Loss 7.2935   LearningRate 0.0181   Epoch: 22   Global Step: 116180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:35:43,094-Speed 10569.70 samples/sec   Loss 7.3604   LearningRate 0.0181   Epoch: 22   Global Step: 116190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:35:44,041-Speed 10818.16 samples/sec   Loss 7.2256   LearningRate 0.0181   Epoch: 22   Global Step: 116200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:35:44,970-Speed 11031.67 samples/sec   Loss 7.2818   LearningRate 0.0181   Epoch: 22   Global Step: 116210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:35:45,897-Speed 11053.85 samples/sec   Loss 7.2900   LearningRate 0.0181   Epoch: 22   Global Step: 116220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:35:46,858-Speed 10669.82 samples/sec   Loss 7.4321   LearningRate 0.0181   Epoch: 22   Global Step: 116230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:35:47,821-Speed 10639.56 samples/sec   Loss 7.2401   LearningRate 0.0181   Epoch: 22   Global Step: 116240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:35:48,777-Speed 10721.71 samples/sec   Loss 7.3282   LearningRate 0.0181   Epoch: 22   Global Step: 116250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:35:49,742-Speed 10618.50 samples/sec   Loss 7.1542   LearningRate 0.0181   Epoch: 22   Global Step: 116260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:35:50,709-Speed 10596.87 samples/sec   Loss 7.2639   LearningRate 0.0181   Epoch: 22   Global Step: 116270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:35:51,646-Speed 10945.04 samples/sec   Loss 7.2498   LearningRate 0.0181   Epoch: 22   Global Step: 116280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:35:52,602-Speed 10715.87 samples/sec   Loss 7.3417   LearningRate 0.0181   Epoch: 22   Global Step: 116290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:35:53,568-Speed 10610.80 samples/sec   Loss 7.2741   LearningRate 0.0181   Epoch: 22   Global Step: 116300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:35:54,502-Speed 10989.70 samples/sec   Loss 7.2128   LearningRate 0.0181   Epoch: 22   Global Step: 116310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:35:55,460-Speed 10697.15 samples/sec   Loss 7.2979   LearningRate 0.0181   Epoch: 22   Global Step: 116320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:35:56,399-Speed 10907.72 samples/sec   Loss 7.2663   LearningRate 0.0181   Epoch: 22   Global Step: 116330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:05,676-Speed 1103.92 samples/sec   Loss 6.6155   LearningRate 0.0181   Epoch: 23   Global Step: 116340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:06,686-Speed 10148.01 samples/sec   Loss 6.3759   LearningRate 0.0181   Epoch: 23   Global Step: 116350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:07,899-Speed 8451.62 samples/sec   Loss 6.3610   LearningRate 0.0181   Epoch: 23   Global Step: 116360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:08,874-Speed 10506.72 samples/sec   Loss 6.5445   LearningRate 0.0180   Epoch: 23   Global Step: 116370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:09,870-Speed 10287.22 samples/sec   Loss 6.5825   LearningRate 0.0180   Epoch: 23   Global Step: 116380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:10,904-Speed 9908.97 samples/sec   Loss 6.4315   LearningRate 0.0180   Epoch: 23   Global Step: 116390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:11,924-Speed 10048.93 samples/sec   Loss 6.4410   LearningRate 0.0180   Epoch: 23   Global Step: 116400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:36:12,916-Speed 10335.99 samples/sec   Loss 6.4311   LearningRate 0.0180   Epoch: 23   Global Step: 116410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:36:13,880-Speed 10633.10 samples/sec   Loss 6.4372   LearningRate 0.0180   Epoch: 23   Global Step: 116420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:36:14,847-Speed 10594.43 samples/sec   Loss 6.5264   LearningRate 0.0180   Epoch: 23   Global Step: 116430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:36:15,794-Speed 10825.20 samples/sec   Loss 6.4942   LearningRate 0.0180   Epoch: 23   Global Step: 116440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:36:16,740-Speed 10830.77 samples/sec   Loss 6.3751   LearningRate 0.0180   Epoch: 23   Global Step: 116450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:36:17,687-Speed 10817.48 samples/sec   Loss 6.6399   LearningRate 0.0180   Epoch: 23   Global Step: 116460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:18,661-Speed 10525.27 samples/sec   Loss 6.4523   LearningRate 0.0180   Epoch: 23   Global Step: 116470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:19,620-Speed 10690.88 samples/sec   Loss 6.4174   LearningRate 0.0180   Epoch: 23   Global Step: 116480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:20,566-Speed 10829.55 samples/sec   Loss 6.4741   LearningRate 0.0180   Epoch: 23   Global Step: 116490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:21,520-Speed 10746.50 samples/sec   Loss 6.5479   LearningRate 0.0180   Epoch: 23   Global Step: 116500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:22,459-Speed 10914.01 samples/sec   Loss 6.6264   LearningRate 0.0180   Epoch: 23   Global Step: 116510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:23,482-Speed 10015.21 samples/sec   Loss 6.4769   LearningRate 0.0180   Epoch: 23   Global Step: 116520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:24,415-Speed 10988.44 samples/sec   Loss 6.6314   LearningRate 0.0180   Epoch: 23   Global Step: 116530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:25,373-Speed 10700.15 samples/sec   Loss 6.4685   LearningRate 0.0180   Epoch: 23   Global Step: 116540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:26,344-Speed 10556.51 samples/sec   Loss 6.5219   LearningRate 0.0180   Epoch: 23   Global Step: 116550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:27,288-Speed 10853.12 samples/sec   Loss 6.6209   LearningRate 0.0180   Epoch: 23   Global Step: 116560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:36:28,228-Speed 10913.46 samples/sec   Loss 6.5133   LearningRate 0.0180   Epoch: 23   Global Step: 116570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:36:29,131-Speed 11348.79 samples/sec   Loss 6.4220   LearningRate 0.0180   Epoch: 23   Global Step: 116580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:36:30,077-Speed 10834.13 samples/sec   Loss 6.6300   LearningRate 0.0180   Epoch: 23   Global Step: 116590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:31,055-Speed 10472.80 samples/sec   Loss 6.3930   LearningRate 0.0180   Epoch: 23   Global Step: 116600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:31,994-Speed 10923.67 samples/sec   Loss 6.5099   LearningRate 0.0179   Epoch: 23   Global Step: 116610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:32,987-Speed 10319.03 samples/sec   Loss 6.6865   LearningRate 0.0179   Epoch: 23   Global Step: 116620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:33,958-Speed 10563.00 samples/sec   Loss 6.5916   LearningRate 0.0179   Epoch: 23   Global Step: 116630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:34,925-Speed 10595.77 samples/sec   Loss 6.6358   LearningRate 0.0179   Epoch: 23   Global Step: 116640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:35,894-Speed 10583.60 samples/sec   Loss 6.4628   LearningRate 0.0179   Epoch: 23   Global Step: 116650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:36,834-Speed 10896.71 samples/sec   Loss 6.4521   LearningRate 0.0179   Epoch: 23   Global Step: 116660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:37,842-Speed 10174.40 samples/sec   Loss 6.5152   LearningRate 0.0179   Epoch: 23   Global Step: 116670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:38,809-Speed 10602.89 samples/sec   Loss 6.5356   LearningRate 0.0179   Epoch: 23   Global Step: 116680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:39,729-Speed 11140.36 samples/sec   Loss 6.4236   LearningRate 0.0179   Epoch: 23   Global Step: 116690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:36:40,692-Speed 10644.83 samples/sec   Loss 6.5541   LearningRate 0.0179   Epoch: 23   Global Step: 116700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:36:41,635-Speed 10866.12 samples/sec   Loss 6.6865   LearningRate 0.0179   Epoch: 23   Global Step: 116710   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:36:42,561-Speed 11063.04 samples/sec   Loss 6.8419   LearningRate 0.0179   Epoch: 23   Global Step: 116720   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:36:43,542-Speed 10447.76 samples/sec   Loss 6.5795   LearningRate 0.0179   Epoch: 23   Global Step: 116730   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:36:44,473-Speed 11005.86 samples/sec   Loss 6.4993   LearningRate 0.0179   Epoch: 23   Global Step: 116740   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:36:45,456-Speed 10430.74 samples/sec   Loss 6.6156   LearningRate 0.0179   Epoch: 23   Global Step: 116750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:36:46,398-Speed 10874.21 samples/sec   Loss 6.6039   LearningRate 0.0179   Epoch: 23   Global Step: 116760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:36:47,329-Speed 11014.00 samples/sec   Loss 6.7632   LearningRate 0.0179   Epoch: 23   Global Step: 116770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:48,290-Speed 10660.51 samples/sec   Loss 6.5915   LearningRate 0.0179   Epoch: 23   Global Step: 116780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:49,236-Speed 10832.54 samples/sec   Loss 6.7676   LearningRate 0.0179   Epoch: 23   Global Step: 116790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:50,174-Speed 10928.99 samples/sec   Loss 6.5463   LearningRate 0.0179   Epoch: 23   Global Step: 116800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:51,117-Speed 10867.87 samples/sec   Loss 6.6360   LearningRate 0.0179   Epoch: 23   Global Step: 116810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:52,031-Speed 11212.73 samples/sec   Loss 6.6523   LearningRate 0.0179   Epoch: 23   Global Step: 116820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:52,970-Speed 10918.20 samples/sec   Loss 6.6293   LearningRate 0.0179   Epoch: 23   Global Step: 116830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:53,902-Speed 10999.07 samples/sec   Loss 6.6517   LearningRate 0.0179   Epoch: 23   Global Step: 116840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:54,864-Speed 10652.07 samples/sec   Loss 6.6356   LearningRate 0.0178   Epoch: 23   Global Step: 116850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:55,852-Speed 10377.66 samples/sec   Loss 6.7251   LearningRate 0.0178   Epoch: 23   Global Step: 116860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:56,812-Speed 10673.80 samples/sec   Loss 6.5867   LearningRate 0.0178   Epoch: 23   Global Step: 116870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:36:57,750-Speed 10930.66 samples/sec   Loss 6.5191   LearningRate 0.0178   Epoch: 23   Global Step: 116880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:58,708-Speed 10697.14 samples/sec   Loss 6.7695   LearningRate 0.0178   Epoch: 23   Global Step: 116890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:36:59,699-Speed 10340.54 samples/sec   Loss 6.7187   LearningRate 0.0178   Epoch: 23   Global Step: 116900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:00,770-Speed 9566.98 samples/sec   Loss 6.5129   LearningRate 0.0178   Epoch: 23   Global Step: 116910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:01,789-Speed 10058.37 samples/sec   Loss 6.7289   LearningRate 0.0178   Epoch: 23   Global Step: 116920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:02,826-Speed 9881.49 samples/sec   Loss 6.6047   LearningRate 0.0178   Epoch: 23   Global Step: 116930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:03,882-Speed 9701.54 samples/sec   Loss 6.5696   LearningRate 0.0178   Epoch: 23   Global Step: 116940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:04,848-Speed 10611.67 samples/sec   Loss 6.5909   LearningRate 0.0178   Epoch: 23   Global Step: 116950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:05,871-Speed 10020.28 samples/sec   Loss 6.6348   LearningRate 0.0178   Epoch: 23   Global Step: 116960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:06,823-Speed 10764.06 samples/sec   Loss 6.7919   LearningRate 0.0178   Epoch: 23   Global Step: 116970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:07,765-Speed 10889.27 samples/sec   Loss 6.8112   LearningRate 0.0178   Epoch: 23   Global Step: 116980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:08,723-Speed 10702.52 samples/sec   Loss 6.8587   LearningRate 0.0178   Epoch: 23   Global Step: 116990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:09,609-Speed 11564.26 samples/sec   Loss 6.8054   LearningRate 0.0178   Epoch: 23   Global Step: 117000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:10,527-Speed 11162.69 samples/sec   Loss 6.6211   LearningRate 0.0178   Epoch: 23   Global Step: 117010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:11,542-Speed 10103.29 samples/sec   Loss 6.8100   LearningRate 0.0178   Epoch: 23   Global Step: 117020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:12,501-Speed 10686.11 samples/sec   Loss 6.6843   LearningRate 0.0178   Epoch: 23   Global Step: 117030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:13,430-Speed 11032.01 samples/sec   Loss 6.7395   LearningRate 0.0178   Epoch: 23   Global Step: 117040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:14,364-Speed 10970.65 samples/sec   Loss 6.6302   LearningRate 0.0178   Epoch: 23   Global Step: 117050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:15,323-Speed 10682.05 samples/sec   Loss 6.6081   LearningRate 0.0178   Epoch: 23   Global Step: 117060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:16,278-Speed 10732.06 samples/sec   Loss 6.6473   LearningRate 0.0178   Epoch: 23   Global Step: 117070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:17,244-Speed 10614.87 samples/sec   Loss 6.8599   LearningRate 0.0178   Epoch: 23   Global Step: 117080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:18,170-Speed 11070.02 samples/sec   Loss 6.8715   LearningRate 0.0177   Epoch: 23   Global Step: 117090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:19,128-Speed 10699.33 samples/sec   Loss 6.6714   LearningRate 0.0177   Epoch: 23   Global Step: 117100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:20,081-Speed 10751.93 samples/sec   Loss 6.8770   LearningRate 0.0177   Epoch: 23   Global Step: 117110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:21,072-Speed 10343.09 samples/sec   Loss 6.6619   LearningRate 0.0177   Epoch: 23   Global Step: 117120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:22,026-Speed 10743.67 samples/sec   Loss 6.5527   LearningRate 0.0177   Epoch: 23   Global Step: 117130   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:22,981-Speed 10739.04 samples/sec   Loss 6.6651   LearningRate 0.0177   Epoch: 23   Global Step: 117140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:23,917-Speed 10938.84 samples/sec   Loss 6.7449   LearningRate 0.0177   Epoch: 23   Global Step: 117150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:24,803-Speed 11574.69 samples/sec   Loss 6.8219   LearningRate 0.0177   Epoch: 23   Global Step: 117160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:25,792-Speed 10374.68 samples/sec   Loss 6.6610   LearningRate 0.0177   Epoch: 23   Global Step: 117170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:26,796-Speed 10204.73 samples/sec   Loss 6.7001   LearningRate 0.0177   Epoch: 23   Global Step: 117180   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:27,721-Speed 11076.06 samples/sec   Loss 6.8037   LearningRate 0.0177   Epoch: 23   Global Step: 117190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:28,678-Speed 10718.73 samples/sec   Loss 6.8696   LearningRate 0.0177   Epoch: 23   Global Step: 117200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:29,663-Speed 10396.75 samples/sec   Loss 6.6721   LearningRate 0.0177   Epoch: 23   Global Step: 117210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:30,621-Speed 10698.64 samples/sec   Loss 6.9041   LearningRate 0.0177   Epoch: 23   Global Step: 117220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:31,565-Speed 10856.02 samples/sec   Loss 6.6876   LearningRate 0.0177   Epoch: 23   Global Step: 117230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:32,472-Speed 11302.79 samples/sec   Loss 6.8148   LearningRate 0.0177   Epoch: 23   Global Step: 117240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:33,456-Speed 10422.08 samples/sec   Loss 6.7826   LearningRate 0.0177   Epoch: 23   Global Step: 117250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:34,436-Speed 10454.11 samples/sec   Loss 6.7934   LearningRate 0.0177   Epoch: 23   Global Step: 117260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:35,391-Speed 10738.60 samples/sec   Loss 6.6725   LearningRate 0.0177   Epoch: 23   Global Step: 117270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:36,324-Speed 10986.59 samples/sec   Loss 6.6689   LearningRate 0.0177   Epoch: 23   Global Step: 117280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:37,219-Speed 11445.09 samples/sec   Loss 6.7314   LearningRate 0.0177   Epoch: 23   Global Step: 117290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:38,170-Speed 10796.62 samples/sec   Loss 6.6932   LearningRate 0.0177   Epoch: 23   Global Step: 117300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:39,086-Speed 11189.32 samples/sec   Loss 6.8613   LearningRate 0.0177   Epoch: 23   Global Step: 117310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:40,035-Speed 10805.72 samples/sec   Loss 6.8437   LearningRate 0.0177   Epoch: 23   Global Step: 117320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:40,945-Speed 11259.94 samples/sec   Loss 6.7692   LearningRate 0.0176   Epoch: 23   Global Step: 117330   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:41,877-Speed 10996.07 samples/sec   Loss 6.7808   LearningRate 0.0176   Epoch: 23   Global Step: 117340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:42,836-Speed 10689.44 samples/sec   Loss 6.7735   LearningRate 0.0176   Epoch: 23   Global Step: 117350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:43,794-Speed 10698.73 samples/sec   Loss 6.8095   LearningRate 0.0176   Epoch: 23   Global Step: 117360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:44,791-Speed 10275.13 samples/sec   Loss 6.7641   LearningRate 0.0176   Epoch: 23   Global Step: 117370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:45,756-Speed 10619.75 samples/sec   Loss 6.8350   LearningRate 0.0176   Epoch: 23   Global Step: 117380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:46,686-Speed 11016.61 samples/sec   Loss 6.9584   LearningRate 0.0176   Epoch: 23   Global Step: 117390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:47,645-Speed 10693.56 samples/sec   Loss 6.7813   LearningRate 0.0176   Epoch: 23   Global Step: 117400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:48,612-Speed 10603.18 samples/sec   Loss 6.7901   LearningRate 0.0176   Epoch: 23   Global Step: 117410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:49,571-Speed 10680.63 samples/sec   Loss 6.7270   LearningRate 0.0176   Epoch: 23   Global Step: 117420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:50,509-Speed 10930.29 samples/sec   Loss 6.7864   LearningRate 0.0176   Epoch: 23   Global Step: 117430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:51,447-Speed 10927.78 samples/sec   Loss 6.9070   LearningRate 0.0176   Epoch: 23   Global Step: 117440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:52,375-Speed 11036.31 samples/sec   Loss 6.9083   LearningRate 0.0176   Epoch: 23   Global Step: 117450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:53,281-Speed 11329.75 samples/sec   Loss 6.7770   LearningRate 0.0176   Epoch: 23   Global Step: 117460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:54,277-Speed 10286.57 samples/sec   Loss 6.7925   LearningRate 0.0176   Epoch: 23   Global Step: 117470   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:55,186-Speed 11285.65 samples/sec   Loss 6.7640   LearningRate 0.0176   Epoch: 23   Global Step: 117480   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:56,097-Speed 11256.87 samples/sec   Loss 6.8564   LearningRate 0.0176   Epoch: 23   Global Step: 117490   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:57,033-Speed 10938.52 samples/sec   Loss 6.7368   LearningRate 0.0176   Epoch: 23   Global Step: 117500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:57,988-Speed 10883.57 samples/sec   Loss 6.8052   LearningRate 0.0176   Epoch: 23   Global Step: 117510   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:37:58,952-Speed 10633.08 samples/sec   Loss 6.9537   LearningRate 0.0176   Epoch: 23   Global Step: 117520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:37:59,950-Speed 10273.54 samples/sec   Loss 6.9252   LearningRate 0.0176   Epoch: 23   Global Step: 117530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:00,914-Speed 10640.69 samples/sec   Loss 6.8111   LearningRate 0.0176   Epoch: 23   Global Step: 117540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:01,838-Speed 11087.10 samples/sec   Loss 6.7495   LearningRate 0.0176   Epoch: 23   Global Step: 117550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:02,752-Speed 11210.49 samples/sec   Loss 6.9739   LearningRate 0.0176   Epoch: 23   Global Step: 117560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:03,672-Speed 11143.79 samples/sec   Loss 6.8520   LearningRate 0.0175   Epoch: 23   Global Step: 117570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:04,563-Speed 11501.75 samples/sec   Loss 6.8236   LearningRate 0.0175   Epoch: 23   Global Step: 117580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:05,448-Speed 11571.28 samples/sec   Loss 6.7163   LearningRate 0.0175   Epoch: 23   Global Step: 117590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:06,377-Speed 11039.98 samples/sec   Loss 6.8927   LearningRate 0.0175   Epoch: 23   Global Step: 117600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:07,334-Speed 10728.88 samples/sec   Loss 6.8856   LearningRate 0.0175   Epoch: 23   Global Step: 117610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:08,326-Speed 10343.67 samples/sec   Loss 6.8799   LearningRate 0.0175   Epoch: 23   Global Step: 117620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:38:09,273-Speed 10822.30 samples/sec   Loss 6.7678   LearningRate 0.0175   Epoch: 23   Global Step: 117630   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:38:10,214-Speed 10893.33 samples/sec   Loss 6.7772   LearningRate 0.0175   Epoch: 23   Global Step: 117640   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:38:11,165-Speed 10767.27 samples/sec   Loss 6.9668   LearningRate 0.0175   Epoch: 23   Global Step: 117650   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:38:12,139-Speed 10520.65 samples/sec   Loss 6.9784   LearningRate 0.0175   Epoch: 23   Global Step: 117660   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:38:13,070-Speed 11009.64 samples/sec   Loss 6.8871   LearningRate 0.0175   Epoch: 23   Global Step: 117670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:13,997-Speed 11059.81 samples/sec   Loss 7.0396   LearningRate 0.0175   Epoch: 23   Global Step: 117680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:14,953-Speed 10719.13 samples/sec   Loss 6.8877   LearningRate 0.0175   Epoch: 23   Global Step: 117690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:15,867-Speed 11209.22 samples/sec   Loss 7.0456   LearningRate 0.0175   Epoch: 23   Global Step: 117700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:16,792-Speed 11088.91 samples/sec   Loss 6.9154   LearningRate 0.0175   Epoch: 23   Global Step: 117710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:17,752-Speed 10674.38 samples/sec   Loss 6.8475   LearningRate 0.0175   Epoch: 23   Global Step: 117720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:18,752-Speed 10243.65 samples/sec   Loss 6.8017   LearningRate 0.0175   Epoch: 23   Global Step: 117730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:19,747-Speed 10303.47 samples/sec   Loss 6.8864   LearningRate 0.0175   Epoch: 23   Global Step: 117740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:20,673-Speed 11067.83 samples/sec   Loss 6.9250   LearningRate 0.0175   Epoch: 23   Global Step: 117750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:21,574-Speed 11372.83 samples/sec   Loss 6.8498   LearningRate 0.0175   Epoch: 23   Global Step: 117760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:22,518-Speed 10857.57 samples/sec   Loss 6.9041   LearningRate 0.0175   Epoch: 23   Global Step: 117770   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:38:23,470-Speed 10763.24 samples/sec   Loss 6.9790   LearningRate 0.0175   Epoch: 23   Global Step: 117780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:24,430-Speed 10677.14 samples/sec   Loss 6.7936   LearningRate 0.0175   Epoch: 23   Global Step: 117790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:25,377-Speed 10824.94 samples/sec   Loss 6.8982   LearningRate 0.0175   Epoch: 23   Global Step: 117800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:26,304-Speed 11058.69 samples/sec   Loss 6.9401   LearningRate 0.0174   Epoch: 23   Global Step: 117810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:27,244-Speed 10898.29 samples/sec   Loss 6.9430   LearningRate 0.0174   Epoch: 23   Global Step: 117820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:28,180-Speed 10954.63 samples/sec   Loss 6.9928   LearningRate 0.0174   Epoch: 23   Global Step: 117830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:29,109-Speed 11035.01 samples/sec   Loss 6.8366   LearningRate 0.0174   Epoch: 23   Global Step: 117840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:30,093-Speed 10408.36 samples/sec   Loss 6.8604   LearningRate 0.0174   Epoch: 23   Global Step: 117850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:31,083-Speed 10356.55 samples/sec   Loss 6.9752   LearningRate 0.0174   Epoch: 23   Global Step: 117860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:32,035-Speed 10764.53 samples/sec   Loss 7.0078   LearningRate 0.0174   Epoch: 23   Global Step: 117870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:32,985-Speed 10785.87 samples/sec   Loss 6.9813   LearningRate 0.0174   Epoch: 23   Global Step: 117880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:38:33,944-Speed 10694.78 samples/sec   Loss 6.9687   LearningRate 0.0174   Epoch: 23   Global Step: 117890   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:38:34,858-Speed 11204.67 samples/sec   Loss 6.8157   LearningRate 0.0174   Epoch: 23   Global Step: 117900   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:38:35,799-Speed 10885.95 samples/sec   Loss 6.9693   LearningRate 0.0174   Epoch: 23   Global Step: 117910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:36,734-Speed 10963.77 samples/sec   Loss 6.9063   LearningRate 0.0174   Epoch: 23   Global Step: 117920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:37,683-Speed 10808.27 samples/sec   Loss 7.1007   LearningRate 0.0174   Epoch: 23   Global Step: 117930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:38,662-Speed 10477.26 samples/sec   Loss 6.8694   LearningRate 0.0174   Epoch: 23   Global Step: 117940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:39,600-Speed 10917.59 samples/sec   Loss 7.0036   LearningRate 0.0174   Epoch: 23   Global Step: 117950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:40,532-Speed 10996.98 samples/sec   Loss 6.8555   LearningRate 0.0174   Epoch: 23   Global Step: 117960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:41,471-Speed 10917.75 samples/sec   Loss 7.0850   LearningRate 0.0174   Epoch: 23   Global Step: 117970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:42,427-Speed 10728.54 samples/sec   Loss 6.9185   LearningRate 0.0174   Epoch: 23   Global Step: 117980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:43,391-Speed 10628.50 samples/sec   Loss 6.9142   LearningRate 0.0174   Epoch: 23   Global Step: 117990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:38:44,323-Speed 10998.22 samples/sec   Loss 7.0073   LearningRate 0.0174   Epoch: 23   Global Step: 118000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:39:06,521-[lfw][118000]XNorm: 9.950758
Training: 2022-04-11 03:39:06,522-[lfw][118000]Accuracy-Flip: 0.99533+-0.00371
Training: 2022-04-11 03:39:06,522-[lfw][118000]Accuracy-Highest: 0.99667
Training: 2022-04-11 03:39:31,939-[cfp_fp][118000]XNorm: 8.498469
Training: 2022-04-11 03:39:31,939-[cfp_fp][118000]Accuracy-Flip: 0.96429+-0.00919
Training: 2022-04-11 03:39:31,941-[cfp_fp][118000]Accuracy-Highest: 0.96600
Training: 2022-04-11 03:39:54,012-[agedb_30][118000]XNorm: 9.680325
Training: 2022-04-11 03:39:54,012-[agedb_30][118000]Accuracy-Flip: 0.96767+-0.00958
Training: 2022-04-11 03:39:54,013-[agedb_30][118000]Accuracy-Highest: 0.97017
Training: 2022-04-11 03:39:54,943-Speed 145.00 samples/sec   Loss 6.8922   LearningRate 0.0174   Epoch: 23   Global Step: 118010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:39:55,893-Speed 10788.77 samples/sec   Loss 6.9908   LearningRate 0.0174   Epoch: 23   Global Step: 118020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:39:56,875-Speed 10432.16 samples/sec   Loss 6.9062   LearningRate 0.0174   Epoch: 23   Global Step: 118030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:39:57,803-Speed 11046.79 samples/sec   Loss 6.9885   LearningRate 0.0174   Epoch: 23   Global Step: 118040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:39:58,733-Speed 11020.65 samples/sec   Loss 6.8017   LearningRate 0.0173   Epoch: 23   Global Step: 118050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:39:59,625-Speed 11508.80 samples/sec   Loss 6.9298   LearningRate 0.0173   Epoch: 23   Global Step: 118060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:00,542-Speed 11178.73 samples/sec   Loss 7.0546   LearningRate 0.0173   Epoch: 23   Global Step: 118070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:01,516-Speed 10524.58 samples/sec   Loss 6.7526   LearningRate 0.0173   Epoch: 23   Global Step: 118080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:02,459-Speed 10879.78 samples/sec   Loss 7.1202   LearningRate 0.0173   Epoch: 23   Global Step: 118090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:03,370-Speed 11252.24 samples/sec   Loss 6.9678   LearningRate 0.0173   Epoch: 23   Global Step: 118100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:04,335-Speed 10631.60 samples/sec   Loss 7.0111   LearningRate 0.0173   Epoch: 23   Global Step: 118110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:05,255-Speed 11133.78 samples/sec   Loss 6.9347   LearningRate 0.0173   Epoch: 23   Global Step: 118120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:06,165-Speed 11254.61 samples/sec   Loss 7.0211   LearningRate 0.0173   Epoch: 23   Global Step: 118130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:07,126-Speed 10666.92 samples/sec   Loss 6.8730   LearningRate 0.0173   Epoch: 23   Global Step: 118140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:08,088-Speed 10662.95 samples/sec   Loss 6.9421   LearningRate 0.0173   Epoch: 23   Global Step: 118150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:09,040-Speed 10761.39 samples/sec   Loss 6.9759   LearningRate 0.0173   Epoch: 23   Global Step: 118160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:09,986-Speed 10837.61 samples/sec   Loss 7.0972   LearningRate 0.0173   Epoch: 23   Global Step: 118170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:10,952-Speed 10615.06 samples/sec   Loss 7.0032   LearningRate 0.0173   Epoch: 23   Global Step: 118180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:11,910-Speed 10695.73 samples/sec   Loss 6.9131   LearningRate 0.0173   Epoch: 23   Global Step: 118190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:12,816-Speed 11311.15 samples/sec   Loss 6.9612   LearningRate 0.0173   Epoch: 23   Global Step: 118200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:13,765-Speed 10805.42 samples/sec   Loss 7.0363   LearningRate 0.0173   Epoch: 23   Global Step: 118210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:14,725-Speed 10666.64 samples/sec   Loss 6.9610   LearningRate 0.0173   Epoch: 23   Global Step: 118220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:15,665-Speed 10913.70 samples/sec   Loss 6.9371   LearningRate 0.0173   Epoch: 23   Global Step: 118230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:16,593-Speed 11049.56 samples/sec   Loss 6.9116   LearningRate 0.0173   Epoch: 23   Global Step: 118240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:17,555-Speed 10643.53 samples/sec   Loss 7.0400   LearningRate 0.0173   Epoch: 23   Global Step: 118250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:18,502-Speed 10831.03 samples/sec   Loss 6.9685   LearningRate 0.0173   Epoch: 23   Global Step: 118260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:19,447-Speed 10852.45 samples/sec   Loss 6.8314   LearningRate 0.0173   Epoch: 23   Global Step: 118270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:40:20,389-Speed 10872.93 samples/sec   Loss 7.0558   LearningRate 0.0173   Epoch: 23   Global Step: 118280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:40:21,322-Speed 10988.88 samples/sec   Loss 6.9455   LearningRate 0.0173   Epoch: 23   Global Step: 118290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:40:22,240-Speed 11169.84 samples/sec   Loss 6.8357   LearningRate 0.0172   Epoch: 23   Global Step: 118300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:40:23,230-Speed 10352.29 samples/sec   Loss 6.8576   LearningRate 0.0172   Epoch: 23   Global Step: 118310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:24,142-Speed 11230.01 samples/sec   Loss 6.9663   LearningRate 0.0172   Epoch: 23   Global Step: 118320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:25,113-Speed 10563.41 samples/sec   Loss 7.0475   LearningRate 0.0172   Epoch: 23   Global Step: 118330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:26,007-Speed 11457.71 samples/sec   Loss 6.9895   LearningRate 0.0172   Epoch: 23   Global Step: 118340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:26,956-Speed 10804.10 samples/sec   Loss 7.0768   LearningRate 0.0172   Epoch: 23   Global Step: 118350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:27,944-Speed 10368.14 samples/sec   Loss 6.9972   LearningRate 0.0172   Epoch: 23   Global Step: 118360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:28,901-Speed 10717.57 samples/sec   Loss 6.9000   LearningRate 0.0172   Epoch: 23   Global Step: 118370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:29,835-Speed 10968.80 samples/sec   Loss 6.9296   LearningRate 0.0172   Epoch: 23   Global Step: 118380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:30,823-Speed 10376.18 samples/sec   Loss 7.0711   LearningRate 0.0172   Epoch: 23   Global Step: 118390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:31,776-Speed 10756.14 samples/sec   Loss 6.9933   LearningRate 0.0172   Epoch: 23   Global Step: 118400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:32,680-Speed 11344.49 samples/sec   Loss 7.1017   LearningRate 0.0172   Epoch: 23   Global Step: 118410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:33,620-Speed 10901.15 samples/sec   Loss 6.9559   LearningRate 0.0172   Epoch: 23   Global Step: 118420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:34,550-Speed 11014.89 samples/sec   Loss 7.0344   LearningRate 0.0172   Epoch: 23   Global Step: 118430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:35,523-Speed 10536.20 samples/sec   Loss 6.8864   LearningRate 0.0172   Epoch: 23   Global Step: 118440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:36,480-Speed 10702.72 samples/sec   Loss 6.9122   LearningRate 0.0172   Epoch: 23   Global Step: 118450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:37,429-Speed 10801.76 samples/sec   Loss 6.9946   LearningRate 0.0172   Epoch: 23   Global Step: 118460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:38,343-Speed 11215.45 samples/sec   Loss 7.0689   LearningRate 0.0172   Epoch: 23   Global Step: 118470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:39,253-Speed 11256.85 samples/sec   Loss 7.0708   LearningRate 0.0172   Epoch: 23   Global Step: 118480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:40,232-Speed 10478.51 samples/sec   Loss 7.1043   LearningRate 0.0172   Epoch: 23   Global Step: 118490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:41,198-Speed 10608.60 samples/sec   Loss 6.8930   LearningRate 0.0172   Epoch: 23   Global Step: 118500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:42,171-Speed 10541.14 samples/sec   Loss 6.9463   LearningRate 0.0172   Epoch: 23   Global Step: 118510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:43,112-Speed 10886.13 samples/sec   Loss 7.0073   LearningRate 0.0172   Epoch: 23   Global Step: 118520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:44,082-Speed 10570.99 samples/sec   Loss 7.0005   LearningRate 0.0172   Epoch: 23   Global Step: 118530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:45,072-Speed 10348.55 samples/sec   Loss 7.1067   LearningRate 0.0171   Epoch: 23   Global Step: 118540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:46,054-Speed 10444.38 samples/sec   Loss 6.9142   LearningRate 0.0171   Epoch: 23   Global Step: 118550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:46,991-Speed 10931.10 samples/sec   Loss 6.9390   LearningRate 0.0171   Epoch: 23   Global Step: 118560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:47,920-Speed 11034.19 samples/sec   Loss 7.0496   LearningRate 0.0171   Epoch: 23   Global Step: 118570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:48,945-Speed 10004.17 samples/sec   Loss 6.9335   LearningRate 0.0171   Epoch: 23   Global Step: 118580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:49,860-Speed 11204.48 samples/sec   Loss 7.0074   LearningRate 0.0171   Epoch: 23   Global Step: 118590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:50,795-Speed 10958.60 samples/sec   Loss 7.0444   LearningRate 0.0171   Epoch: 23   Global Step: 118600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:51,730-Speed 10966.11 samples/sec   Loss 7.1344   LearningRate 0.0171   Epoch: 23   Global Step: 118610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:52,679-Speed 10797.02 samples/sec   Loss 7.0853   LearningRate 0.0171   Epoch: 23   Global Step: 118620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:40:53,647-Speed 10581.72 samples/sec   Loss 7.0520   LearningRate 0.0171   Epoch: 23   Global Step: 118630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:54,617-Speed 10572.03 samples/sec   Loss 6.8758   LearningRate 0.0171   Epoch: 23   Global Step: 118640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:55,565-Speed 10808.08 samples/sec   Loss 7.1326   LearningRate 0.0171   Epoch: 23   Global Step: 118650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:56,518-Speed 10758.43 samples/sec   Loss 7.0147   LearningRate 0.0171   Epoch: 23   Global Step: 118660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:57,452-Speed 10971.39 samples/sec   Loss 7.0961   LearningRate 0.0171   Epoch: 23   Global Step: 118670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:58,419-Speed 10611.23 samples/sec   Loss 6.9692   LearningRate 0.0171   Epoch: 23   Global Step: 118680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:40:59,371-Speed 10771.83 samples/sec   Loss 7.0695   LearningRate 0.0171   Epoch: 23   Global Step: 118690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:41:00,317-Speed 10825.37 samples/sec   Loss 7.3133   LearningRate 0.0171   Epoch: 23   Global Step: 118700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:41:01,243-Speed 11075.79 samples/sec   Loss 7.0808   LearningRate 0.0171   Epoch: 23   Global Step: 118710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:41:02,208-Speed 10609.70 samples/sec   Loss 6.9898   LearningRate 0.0171   Epoch: 23   Global Step: 118720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:41:03,170-Speed 10660.93 samples/sec   Loss 7.1939   LearningRate 0.0171   Epoch: 23   Global Step: 118730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:04,063-Speed 11474.37 samples/sec   Loss 7.1087   LearningRate 0.0171   Epoch: 23   Global Step: 118740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:05,015-Speed 10773.65 samples/sec   Loss 6.9739   LearningRate 0.0171   Epoch: 23   Global Step: 118750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:05,943-Speed 11042.17 samples/sec   Loss 6.9677   LearningRate 0.0171   Epoch: 23   Global Step: 118760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:06,885-Speed 10878.07 samples/sec   Loss 7.0392   LearningRate 0.0171   Epoch: 23   Global Step: 118770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:07,880-Speed 10300.48 samples/sec   Loss 7.0558   LearningRate 0.0170   Epoch: 23   Global Step: 118780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:08,806-Speed 11065.10 samples/sec   Loss 6.9605   LearningRate 0.0170   Epoch: 23   Global Step: 118790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:09,775-Speed 10575.75 samples/sec   Loss 6.9463   LearningRate 0.0170   Epoch: 23   Global Step: 118800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:10,686-Speed 11241.41 samples/sec   Loss 7.0437   LearningRate 0.0170   Epoch: 23   Global Step: 118810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:11,636-Speed 10795.61 samples/sec   Loss 7.0475   LearningRate 0.0170   Epoch: 23   Global Step: 118820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:12,570-Speed 10967.56 samples/sec   Loss 7.1944   LearningRate 0.0170   Epoch: 23   Global Step: 118830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:41:13,511-Speed 10890.04 samples/sec   Loss 7.0083   LearningRate 0.0170   Epoch: 23   Global Step: 118840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:41:14,443-Speed 10993.84 samples/sec   Loss 7.1016   LearningRate 0.0170   Epoch: 23   Global Step: 118850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:41:15,374-Speed 11014.56 samples/sec   Loss 7.0343   LearningRate 0.0170   Epoch: 23   Global Step: 118860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:41:16,294-Speed 11141.70 samples/sec   Loss 6.9748   LearningRate 0.0170   Epoch: 23   Global Step: 118870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:41:17,211-Speed 11173.31 samples/sec   Loss 7.1756   LearningRate 0.0170   Epoch: 23   Global Step: 118880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:18,168-Speed 10715.97 samples/sec   Loss 6.9233   LearningRate 0.0170   Epoch: 23   Global Step: 118890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:19,147-Speed 10464.45 samples/sec   Loss 6.8587   LearningRate 0.0170   Epoch: 23   Global Step: 118900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:20,118-Speed 10550.26 samples/sec   Loss 7.0638   LearningRate 0.0170   Epoch: 23   Global Step: 118910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:21,032-Speed 11213.20 samples/sec   Loss 6.9623   LearningRate 0.0170   Epoch: 23   Global Step: 118920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:21,995-Speed 10643.22 samples/sec   Loss 7.1003   LearningRate 0.0170   Epoch: 23   Global Step: 118930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:22,945-Speed 10782.32 samples/sec   Loss 7.0506   LearningRate 0.0170   Epoch: 23   Global Step: 118940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:23,877-Speed 11007.57 samples/sec   Loss 7.0134   LearningRate 0.0170   Epoch: 23   Global Step: 118950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:24,787-Speed 11261.48 samples/sec   Loss 7.0896   LearningRate 0.0170   Epoch: 23   Global Step: 118960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:25,692-Speed 11317.32 samples/sec   Loss 7.0944   LearningRate 0.0170   Epoch: 23   Global Step: 118970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:26,636-Speed 10858.25 samples/sec   Loss 7.1514   LearningRate 0.0170   Epoch: 23   Global Step: 118980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:41:27,609-Speed 10535.09 samples/sec   Loss 7.1519   LearningRate 0.0170   Epoch: 23   Global Step: 118990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:41:28,535-Speed 11070.89 samples/sec   Loss 6.9722   LearningRate 0.0170   Epoch: 23   Global Step: 119000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:41:29,466-Speed 11014.54 samples/sec   Loss 7.1643   LearningRate 0.0170   Epoch: 23   Global Step: 119010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:30,410-Speed 10856.73 samples/sec   Loss 6.9995   LearningRate 0.0170   Epoch: 23   Global Step: 119020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:31,343-Speed 10984.65 samples/sec   Loss 7.0374   LearningRate 0.0169   Epoch: 23   Global Step: 119030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:32,331-Speed 10372.91 samples/sec   Loss 7.1151   LearningRate 0.0169   Epoch: 23   Global Step: 119040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:33,290-Speed 10685.88 samples/sec   Loss 7.0442   LearningRate 0.0169   Epoch: 23   Global Step: 119050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:34,219-Speed 11034.23 samples/sec   Loss 7.0836   LearningRate 0.0169   Epoch: 23   Global Step: 119060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:35,189-Speed 10569.19 samples/sec   Loss 7.0844   LearningRate 0.0169   Epoch: 23   Global Step: 119070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:36,142-Speed 10744.50 samples/sec   Loss 7.0409   LearningRate 0.0169   Epoch: 23   Global Step: 119080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:37,088-Speed 10833.36 samples/sec   Loss 7.0768   LearningRate 0.0169   Epoch: 23   Global Step: 119090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:38,055-Speed 10603.41 samples/sec   Loss 6.8613   LearningRate 0.0169   Epoch: 23   Global Step: 119100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:38,983-Speed 11047.12 samples/sec   Loss 7.2120   LearningRate 0.0169   Epoch: 23   Global Step: 119110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:41:39,918-Speed 10962.88 samples/sec   Loss 6.9654   LearningRate 0.0169   Epoch: 23   Global Step: 119120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:41:40,865-Speed 10814.71 samples/sec   Loss 7.1440   LearningRate 0.0169   Epoch: 23   Global Step: 119130   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:41:41,830-Speed 10627.13 samples/sec   Loss 7.0907   LearningRate 0.0169   Epoch: 23   Global Step: 119140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:41:42,824-Speed 10304.69 samples/sec   Loss 7.1486   LearningRate 0.0169   Epoch: 23   Global Step: 119150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:41:43,767-Speed 10870.25 samples/sec   Loss 7.0397   LearningRate 0.0169   Epoch: 23   Global Step: 119160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:41:44,707-Speed 10898.20 samples/sec   Loss 7.1076   LearningRate 0.0169   Epoch: 23   Global Step: 119170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:45,645-Speed 10928.42 samples/sec   Loss 7.1777   LearningRate 0.0169   Epoch: 23   Global Step: 119180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:41:46,586-Speed 10886.33 samples/sec   Loss 7.0004   LearningRate 0.0169   Epoch: 23   Global Step: 119190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:41:47,509-Speed 11106.74 samples/sec   Loss 6.8878   LearningRate 0.0169   Epoch: 23   Global Step: 119200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:41:48,455-Speed 10846.06 samples/sec   Loss 7.0707   LearningRate 0.0169   Epoch: 23   Global Step: 119210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:41:49,451-Speed 10284.43 samples/sec   Loss 7.0680   LearningRate 0.0169   Epoch: 23   Global Step: 119220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:41:50,408-Speed 10707.18 samples/sec   Loss 7.0230   LearningRate 0.0169   Epoch: 23   Global Step: 119230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:41:51,336-Speed 11047.37 samples/sec   Loss 7.0492   LearningRate 0.0169   Epoch: 23   Global Step: 119240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:41:52,302-Speed 10620.23 samples/sec   Loss 7.1080   LearningRate 0.0169   Epoch: 23   Global Step: 119250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:41:53,213-Speed 11259.38 samples/sec   Loss 7.0353   LearningRate 0.0169   Epoch: 23   Global Step: 119260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:41:54,195-Speed 10437.88 samples/sec   Loss 7.0571   LearningRate 0.0169   Epoch: 23   Global Step: 119270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:41:55,154-Speed 10687.01 samples/sec   Loss 7.0644   LearningRate 0.0168   Epoch: 23   Global Step: 119280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:56,083-Speed 11024.53 samples/sec   Loss 7.1695   LearningRate 0.0168   Epoch: 23   Global Step: 119290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:56,998-Speed 11194.87 samples/sec   Loss 6.9837   LearningRate 0.0168   Epoch: 23   Global Step: 119300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:57,913-Speed 11205.31 samples/sec   Loss 7.1703   LearningRate 0.0168   Epoch: 23   Global Step: 119310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:58,877-Speed 10636.46 samples/sec   Loss 7.2099   LearningRate 0.0168   Epoch: 23   Global Step: 119320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:41:59,860-Speed 10423.05 samples/sec   Loss 6.9936   LearningRate 0.0168   Epoch: 23   Global Step: 119330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:00,827-Speed 10598.69 samples/sec   Loss 6.9686   LearningRate 0.0168   Epoch: 23   Global Step: 119340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:01,768-Speed 10889.96 samples/sec   Loss 7.1160   LearningRate 0.0168   Epoch: 23   Global Step: 119350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:02,719-Speed 10785.12 samples/sec   Loss 6.9744   LearningRate 0.0168   Epoch: 23   Global Step: 119360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:03,663-Speed 10858.34 samples/sec   Loss 7.2531   LearningRate 0.0168   Epoch: 23   Global Step: 119370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:04,618-Speed 10734.90 samples/sec   Loss 7.0912   LearningRate 0.0168   Epoch: 23   Global Step: 119380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:05,556-Speed 10926.88 samples/sec   Loss 7.2259   LearningRate 0.0168   Epoch: 23   Global Step: 119390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:06,478-Speed 11116.73 samples/sec   Loss 7.0632   LearningRate 0.0168   Epoch: 23   Global Step: 119400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:07,409-Speed 11005.53 samples/sec   Loss 7.0951   LearningRate 0.0168   Epoch: 23   Global Step: 119410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:08,342-Speed 10989.91 samples/sec   Loss 7.0407   LearningRate 0.0168   Epoch: 23   Global Step: 119420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:09,301-Speed 10681.29 samples/sec   Loss 7.0290   LearningRate 0.0168   Epoch: 23   Global Step: 119430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:10,261-Speed 10683.06 samples/sec   Loss 7.0365   LearningRate 0.0168   Epoch: 23   Global Step: 119440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:11,229-Speed 10586.01 samples/sec   Loss 7.0099   LearningRate 0.0168   Epoch: 23   Global Step: 119450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:12,215-Speed 10396.00 samples/sec   Loss 6.9440   LearningRate 0.0168   Epoch: 23   Global Step: 119460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:13,182-Speed 10589.58 samples/sec   Loss 7.0310   LearningRate 0.0168   Epoch: 23   Global Step: 119470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:14,157-Speed 10514.97 samples/sec   Loss 7.0496   LearningRate 0.0168   Epoch: 23   Global Step: 119480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:15,077-Speed 11135.26 samples/sec   Loss 6.9590   LearningRate 0.0168   Epoch: 23   Global Step: 119490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:16,010-Speed 10984.31 samples/sec   Loss 7.0350   LearningRate 0.0168   Epoch: 23   Global Step: 119500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:16,957-Speed 10824.46 samples/sec   Loss 7.1510   LearningRate 0.0168   Epoch: 23   Global Step: 119510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:17,891-Speed 10973.81 samples/sec   Loss 6.9522   LearningRate 0.0167   Epoch: 23   Global Step: 119520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:18,840-Speed 10795.08 samples/sec   Loss 7.0190   LearningRate 0.0167   Epoch: 23   Global Step: 119530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:19,825-Speed 10411.82 samples/sec   Loss 7.0802   LearningRate 0.0167   Epoch: 23   Global Step: 119540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:20,768-Speed 10867.22 samples/sec   Loss 7.0025   LearningRate 0.0167   Epoch: 23   Global Step: 119550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:21,710-Speed 10867.99 samples/sec   Loss 7.1958   LearningRate 0.0167   Epoch: 23   Global Step: 119560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:22,685-Speed 10518.51 samples/sec   Loss 7.1543   LearningRate 0.0167   Epoch: 23   Global Step: 119570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:23,666-Speed 10443.67 samples/sec   Loss 7.0667   LearningRate 0.0167   Epoch: 23   Global Step: 119580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:24,622-Speed 10719.17 samples/sec   Loss 7.1035   LearningRate 0.0167   Epoch: 23   Global Step: 119590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:25,579-Speed 10718.35 samples/sec   Loss 6.9989   LearningRate 0.0167   Epoch: 23   Global Step: 119600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:26,517-Speed 10920.93 samples/sec   Loss 7.0062   LearningRate 0.0167   Epoch: 23   Global Step: 119610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:27,481-Speed 10634.67 samples/sec   Loss 6.9422   LearningRate 0.0167   Epoch: 23   Global Step: 119620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:28,444-Speed 10641.13 samples/sec   Loss 7.1476   LearningRate 0.0167   Epoch: 23   Global Step: 119630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:29,378-Speed 10979.44 samples/sec   Loss 7.2095   LearningRate 0.0167   Epoch: 23   Global Step: 119640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:30,358-Speed 10460.83 samples/sec   Loss 7.0217   LearningRate 0.0167   Epoch: 23   Global Step: 119650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:31,289-Speed 11008.83 samples/sec   Loss 7.1105   LearningRate 0.0167   Epoch: 23   Global Step: 119660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:32,282-Speed 10323.80 samples/sec   Loss 7.1019   LearningRate 0.0167   Epoch: 23   Global Step: 119670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:33,210-Speed 11037.66 samples/sec   Loss 7.0693   LearningRate 0.0167   Epoch: 23   Global Step: 119680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:34,167-Speed 10705.73 samples/sec   Loss 7.1467   LearningRate 0.0167   Epoch: 23   Global Step: 119690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:35,150-Speed 10431.32 samples/sec   Loss 6.9103   LearningRate 0.0167   Epoch: 23   Global Step: 119700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:36,045-Speed 11461.56 samples/sec   Loss 7.1405   LearningRate 0.0167   Epoch: 23   Global Step: 119710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:36,991-Speed 10834.82 samples/sec   Loss 6.9766   LearningRate 0.0167   Epoch: 23   Global Step: 119720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:37,961-Speed 10564.55 samples/sec   Loss 7.0560   LearningRate 0.0167   Epoch: 23   Global Step: 119730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:38,916-Speed 10737.66 samples/sec   Loss 7.2138   LearningRate 0.0167   Epoch: 23   Global Step: 119740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:39,836-Speed 11132.63 samples/sec   Loss 7.2393   LearningRate 0.0167   Epoch: 23   Global Step: 119750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:40,751-Speed 11196.48 samples/sec   Loss 7.0916   LearningRate 0.0167   Epoch: 23   Global Step: 119760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:41,720-Speed 10578.61 samples/sec   Loss 6.9859   LearningRate 0.0166   Epoch: 23   Global Step: 119770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:42,696-Speed 10505.48 samples/sec   Loss 6.9728   LearningRate 0.0166   Epoch: 23   Global Step: 119780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:43,656-Speed 10670.29 samples/sec   Loss 7.0582   LearningRate 0.0166   Epoch: 23   Global Step: 119790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:44,567-Speed 11250.87 samples/sec   Loss 7.0721   LearningRate 0.0166   Epoch: 23   Global Step: 119800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:45,486-Speed 11157.98 samples/sec   Loss 7.0483   LearningRate 0.0166   Epoch: 23   Global Step: 119810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:46,399-Speed 11231.09 samples/sec   Loss 7.0877   LearningRate 0.0166   Epoch: 23   Global Step: 119820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:47,361-Speed 10647.77 samples/sec   Loss 7.0687   LearningRate 0.0166   Epoch: 23   Global Step: 119830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:42:48,331-Speed 10570.48 samples/sec   Loss 6.9744   LearningRate 0.0166   Epoch: 23   Global Step: 119840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:49,303-Speed 10546.96 samples/sec   Loss 7.1929   LearningRate 0.0166   Epoch: 23   Global Step: 119850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:50,281-Speed 10469.35 samples/sec   Loss 6.9427   LearningRate 0.0166   Epoch: 23   Global Step: 119860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:51,248-Speed 10598.46 samples/sec   Loss 7.0005   LearningRate 0.0166   Epoch: 23   Global Step: 119870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:52,197-Speed 10802.49 samples/sec   Loss 7.2208   LearningRate 0.0166   Epoch: 23   Global Step: 119880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:53,139-Speed 10878.83 samples/sec   Loss 7.1237   LearningRate 0.0166   Epoch: 23   Global Step: 119890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:54,165-Speed 9992.15 samples/sec   Loss 7.1483   LearningRate 0.0166   Epoch: 23   Global Step: 119900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:55,066-Speed 11374.56 samples/sec   Loss 7.1034   LearningRate 0.0166   Epoch: 23   Global Step: 119910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:56,051-Speed 10406.52 samples/sec   Loss 7.2163   LearningRate 0.0166   Epoch: 23   Global Step: 119920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:57,012-Speed 10655.74 samples/sec   Loss 7.1648   LearningRate 0.0166   Epoch: 23   Global Step: 119930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:42:58,001-Speed 10367.68 samples/sec   Loss 7.2214   LearningRate 0.0166   Epoch: 23   Global Step: 119940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:42:58,965-Speed 10635.50 samples/sec   Loss 7.1072   LearningRate 0.0166   Epoch: 23   Global Step: 119950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:42:59,932-Speed 10595.14 samples/sec   Loss 7.2243   LearningRate 0.0166   Epoch: 23   Global Step: 119960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:43:00,839-Speed 11298.89 samples/sec   Loss 7.2091   LearningRate 0.0166   Epoch: 23   Global Step: 119970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:43:01,801-Speed 10663.53 samples/sec   Loss 7.1415   LearningRate 0.0166   Epoch: 23   Global Step: 119980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:43:02,753-Speed 10764.17 samples/sec   Loss 7.0964   LearningRate 0.0166   Epoch: 23   Global Step: 119990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:43:03,669-Speed 11191.89 samples/sec   Loss 7.0382   LearningRate 0.0166   Epoch: 23   Global Step: 120000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:43:25,708-[lfw][120000]XNorm: 9.910217
Training: 2022-04-11 03:43:25,709-[lfw][120000]Accuracy-Flip: 0.99683+-0.00283
Training: 2022-04-11 03:43:25,709-[lfw][120000]Accuracy-Highest: 0.99683
Training: 2022-04-11 03:43:51,036-[cfp_fp][120000]XNorm: 8.408316
Training: 2022-04-11 03:43:51,037-[cfp_fp][120000]Accuracy-Flip: 0.96471+-0.00930
Training: 2022-04-11 03:43:51,037-[cfp_fp][120000]Accuracy-Highest: 0.96600
Training: 2022-04-11 03:44:13,468-[agedb_30][120000]XNorm: 9.644138
Training: 2022-04-11 03:44:13,469-[agedb_30][120000]Accuracy-Flip: 0.96783+-0.00850
Training: 2022-04-11 03:44:13,470-[agedb_30][120000]Accuracy-Highest: 0.97017
Training: 2022-04-11 03:44:14,413-Speed 144.75 samples/sec   Loss 7.1287   LearningRate 0.0166   Epoch: 23   Global Step: 120010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:15,339-Speed 11069.54 samples/sec   Loss 7.2026   LearningRate 0.0165   Epoch: 23   Global Step: 120020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:16,272-Speed 10989.96 samples/sec   Loss 7.1949   LearningRate 0.0165   Epoch: 23   Global Step: 120030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:17,188-Speed 11189.91 samples/sec   Loss 6.9814   LearningRate 0.0165   Epoch: 23   Global Step: 120040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:18,139-Speed 10776.30 samples/sec   Loss 7.0864   LearningRate 0.0165   Epoch: 23   Global Step: 120050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:19,087-Speed 10811.27 samples/sec   Loss 7.1485   LearningRate 0.0165   Epoch: 23   Global Step: 120060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:20,022-Speed 10959.17 samples/sec   Loss 7.0597   LearningRate 0.0165   Epoch: 23   Global Step: 120070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:21,009-Speed 10384.30 samples/sec   Loss 7.1224   LearningRate 0.0165   Epoch: 23   Global Step: 120080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:21,972-Speed 10658.29 samples/sec   Loss 7.0245   LearningRate 0.0165   Epoch: 23   Global Step: 120090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:22,936-Speed 10631.48 samples/sec   Loss 7.2614   LearningRate 0.0165   Epoch: 23   Global Step: 120100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:23,899-Speed 10648.56 samples/sec   Loss 7.1825   LearningRate 0.0165   Epoch: 23   Global Step: 120110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:24,825-Speed 11069.65 samples/sec   Loss 7.0435   LearningRate 0.0165   Epoch: 23   Global Step: 120120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:25,804-Speed 10464.39 samples/sec   Loss 7.1383   LearningRate 0.0165   Epoch: 23   Global Step: 120130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:26,750-Speed 10835.44 samples/sec   Loss 7.0346   LearningRate 0.0165   Epoch: 23   Global Step: 120140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:27,718-Speed 10579.07 samples/sec   Loss 7.0752   LearningRate 0.0165   Epoch: 23   Global Step: 120150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:28,697-Speed 10480.21 samples/sec   Loss 7.1736   LearningRate 0.0165   Epoch: 23   Global Step: 120160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:44:29,659-Speed 10648.81 samples/sec   Loss 7.1731   LearningRate 0.0165   Epoch: 23   Global Step: 120170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:44:30,597-Speed 10930.13 samples/sec   Loss 7.1990   LearningRate 0.0165   Epoch: 23   Global Step: 120180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:31,505-Speed 11277.57 samples/sec   Loss 7.2127   LearningRate 0.0165   Epoch: 23   Global Step: 120190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:32,500-Speed 10307.39 samples/sec   Loss 7.2129   LearningRate 0.0165   Epoch: 23   Global Step: 120200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:33,454-Speed 10748.17 samples/sec   Loss 7.0264   LearningRate 0.0165   Epoch: 23   Global Step: 120210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:34,396-Speed 10869.97 samples/sec   Loss 7.2175   LearningRate 0.0165   Epoch: 23   Global Step: 120220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:35,381-Speed 10415.68 samples/sec   Loss 7.1052   LearningRate 0.0165   Epoch: 23   Global Step: 120230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:36,298-Speed 11169.69 samples/sec   Loss 6.9277   LearningRate 0.0165   Epoch: 23   Global Step: 120240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:37,258-Speed 10673.53 samples/sec   Loss 7.0876   LearningRate 0.0165   Epoch: 23   Global Step: 120250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:38,184-Speed 11074.37 samples/sec   Loss 7.0982   LearningRate 0.0165   Epoch: 23   Global Step: 120260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:39,148-Speed 10631.02 samples/sec   Loss 7.0876   LearningRate 0.0164   Epoch: 23   Global Step: 120270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:40,078-Speed 11018.06 samples/sec   Loss 6.9888   LearningRate 0.0164   Epoch: 23   Global Step: 120280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:41,023-Speed 10845.06 samples/sec   Loss 6.8939   LearningRate 0.0164   Epoch: 23   Global Step: 120290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:41,990-Speed 10592.05 samples/sec   Loss 7.0263   LearningRate 0.0164   Epoch: 23   Global Step: 120300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:42,942-Speed 10770.17 samples/sec   Loss 6.9571   LearningRate 0.0164   Epoch: 23   Global Step: 120310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:43,938-Speed 10289.40 samples/sec   Loss 7.1953   LearningRate 0.0164   Epoch: 23   Global Step: 120320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:44,929-Speed 10334.61 samples/sec   Loss 6.9982   LearningRate 0.0164   Epoch: 23   Global Step: 120330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:45,899-Speed 10570.94 samples/sec   Loss 6.9513   LearningRate 0.0164   Epoch: 23   Global Step: 120340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:46,864-Speed 10614.72 samples/sec   Loss 6.9710   LearningRate 0.0164   Epoch: 23   Global Step: 120350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:47,804-Speed 10912.34 samples/sec   Loss 7.2227   LearningRate 0.0164   Epoch: 23   Global Step: 120360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:48,769-Speed 10616.61 samples/sec   Loss 7.0636   LearningRate 0.0164   Epoch: 23   Global Step: 120370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:49,724-Speed 10729.05 samples/sec   Loss 7.1267   LearningRate 0.0164   Epoch: 23   Global Step: 120380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:50,699-Speed 10513.90 samples/sec   Loss 7.1226   LearningRate 0.0164   Epoch: 23   Global Step: 120390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:51,665-Speed 10601.05 samples/sec   Loss 7.1582   LearningRate 0.0164   Epoch: 23   Global Step: 120400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:44:52,588-Speed 11114.89 samples/sec   Loss 7.1637   LearningRate 0.0164   Epoch: 23   Global Step: 120410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:44:53,542-Speed 10742.24 samples/sec   Loss 7.1076   LearningRate 0.0164   Epoch: 23   Global Step: 120420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:44:54,518-Speed 10499.97 samples/sec   Loss 7.1510   LearningRate 0.0164   Epoch: 23   Global Step: 120430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:44:55,475-Speed 10712.01 samples/sec   Loss 7.1886   LearningRate 0.0164   Epoch: 23   Global Step: 120440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:44:56,435-Speed 10675.54 samples/sec   Loss 6.9553   LearningRate 0.0164   Epoch: 23   Global Step: 120450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:44:57,401-Speed 10610.09 samples/sec   Loss 7.2236   LearningRate 0.0164   Epoch: 23   Global Step: 120460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:44:58,337-Speed 10954.15 samples/sec   Loss 7.1059   LearningRate 0.0164   Epoch: 23   Global Step: 120470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:44:59,298-Speed 10668.33 samples/sec   Loss 7.1726   LearningRate 0.0164   Epoch: 23   Global Step: 120480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:00,266-Speed 10575.44 samples/sec   Loss 7.0674   LearningRate 0.0164   Epoch: 23   Global Step: 120490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:01,238-Speed 10556.34 samples/sec   Loss 7.1203   LearningRate 0.0164   Epoch: 23   Global Step: 120500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:02,179-Speed 10886.72 samples/sec   Loss 7.0863   LearningRate 0.0164   Epoch: 23   Global Step: 120510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:03,152-Speed 10529.99 samples/sec   Loss 7.0967   LearningRate 0.0163   Epoch: 23   Global Step: 120520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:04,108-Speed 10727.58 samples/sec   Loss 7.2694   LearningRate 0.0163   Epoch: 23   Global Step: 120530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:05,066-Speed 10702.34 samples/sec   Loss 7.0977   LearningRate 0.0163   Epoch: 23   Global Step: 120540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:05,995-Speed 11043.30 samples/sec   Loss 7.1069   LearningRate 0.0163   Epoch: 23   Global Step: 120550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:06,969-Speed 10528.18 samples/sec   Loss 7.0726   LearningRate 0.0163   Epoch: 23   Global Step: 120560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:07,925-Speed 10715.51 samples/sec   Loss 7.0849   LearningRate 0.0163   Epoch: 23   Global Step: 120570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:08,860-Speed 10963.07 samples/sec   Loss 7.0587   LearningRate 0.0163   Epoch: 23   Global Step: 120580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:09,828-Speed 10593.40 samples/sec   Loss 7.0964   LearningRate 0.0163   Epoch: 23   Global Step: 120590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:10,809-Speed 10441.33 samples/sec   Loss 7.1793   LearningRate 0.0163   Epoch: 23   Global Step: 120600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:11,770-Speed 10663.04 samples/sec   Loss 7.1619   LearningRate 0.0163   Epoch: 23   Global Step: 120610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:12,734-Speed 10631.81 samples/sec   Loss 7.0519   LearningRate 0.0163   Epoch: 23   Global Step: 120620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:13,674-Speed 10905.94 samples/sec   Loss 7.1215   LearningRate 0.0163   Epoch: 23   Global Step: 120630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:14,606-Speed 10995.05 samples/sec   Loss 7.1870   LearningRate 0.0163   Epoch: 23   Global Step: 120640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:15,541-Speed 10950.03 samples/sec   Loss 7.0369   LearningRate 0.0163   Epoch: 23   Global Step: 120650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:16,506-Speed 10630.59 samples/sec   Loss 7.0286   LearningRate 0.0163   Epoch: 23   Global Step: 120660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:17,480-Speed 10525.80 samples/sec   Loss 7.0526   LearningRate 0.0163   Epoch: 23   Global Step: 120670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:18,430-Speed 10787.82 samples/sec   Loss 7.1177   LearningRate 0.0163   Epoch: 23   Global Step: 120680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:19,395-Speed 10618.40 samples/sec   Loss 7.1077   LearningRate 0.0163   Epoch: 23   Global Step: 120690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:20,354-Speed 10692.52 samples/sec   Loss 7.1238   LearningRate 0.0163   Epoch: 23   Global Step: 120700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:21,305-Speed 10768.39 samples/sec   Loss 7.2535   LearningRate 0.0163   Epoch: 23   Global Step: 120710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:22,254-Speed 10803.14 samples/sec   Loss 6.9492   LearningRate 0.0163   Epoch: 23   Global Step: 120720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:23,282-Speed 9966.57 samples/sec   Loss 7.1708   LearningRate 0.0163   Epoch: 23   Global Step: 120730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:24,226-Speed 10856.24 samples/sec   Loss 7.0982   LearningRate 0.0163   Epoch: 23   Global Step: 120740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:25,170-Speed 10852.26 samples/sec   Loss 7.1254   LearningRate 0.0163   Epoch: 23   Global Step: 120750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:26,141-Speed 10562.68 samples/sec   Loss 7.0434   LearningRate 0.0163   Epoch: 23   Global Step: 120760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:27,127-Speed 10396.52 samples/sec   Loss 7.1947   LearningRate 0.0162   Epoch: 23   Global Step: 120770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:28,068-Speed 10885.36 samples/sec   Loss 7.1995   LearningRate 0.0162   Epoch: 23   Global Step: 120780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:29,019-Speed 10775.02 samples/sec   Loss 7.1588   LearningRate 0.0162   Epoch: 23   Global Step: 120790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:29,957-Speed 10921.93 samples/sec   Loss 7.3674   LearningRate 0.0162   Epoch: 23   Global Step: 120800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:30,902-Speed 10851.87 samples/sec   Loss 6.9821   LearningRate 0.0162   Epoch: 23   Global Step: 120810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:31,856-Speed 10738.34 samples/sec   Loss 7.0693   LearningRate 0.0162   Epoch: 23   Global Step: 120820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:32,753-Speed 11434.11 samples/sec   Loss 7.1003   LearningRate 0.0162   Epoch: 23   Global Step: 120830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:33,680-Speed 11047.38 samples/sec   Loss 6.8936   LearningRate 0.0162   Epoch: 23   Global Step: 120840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:34,655-Speed 10506.60 samples/sec   Loss 7.1879   LearningRate 0.0162   Epoch: 23   Global Step: 120850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:35,602-Speed 10827.51 samples/sec   Loss 7.0697   LearningRate 0.0162   Epoch: 23   Global Step: 120860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:36,540-Speed 10929.78 samples/sec   Loss 7.0090   LearningRate 0.0162   Epoch: 23   Global Step: 120870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:37,517-Speed 10490.34 samples/sec   Loss 7.2352   LearningRate 0.0162   Epoch: 23   Global Step: 120880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:38,490-Speed 10529.43 samples/sec   Loss 7.0505   LearningRate 0.0162   Epoch: 23   Global Step: 120890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:39,422-Speed 11001.67 samples/sec   Loss 7.2042   LearningRate 0.0162   Epoch: 23   Global Step: 120900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:40,351-Speed 11027.60 samples/sec   Loss 7.0572   LearningRate 0.0162   Epoch: 23   Global Step: 120910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:41,264-Speed 11231.61 samples/sec   Loss 7.2027   LearningRate 0.0162   Epoch: 23   Global Step: 120920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:42,238-Speed 10521.29 samples/sec   Loss 7.1427   LearningRate 0.0162   Epoch: 23   Global Step: 120930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:43,154-Speed 11190.54 samples/sec   Loss 6.9795   LearningRate 0.0162   Epoch: 23   Global Step: 120940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:44,093-Speed 10912.19 samples/sec   Loss 7.2741   LearningRate 0.0162   Epoch: 23   Global Step: 120950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:45,077-Speed 10416.14 samples/sec   Loss 7.0168   LearningRate 0.0162   Epoch: 23   Global Step: 120960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:46,009-Speed 10993.87 samples/sec   Loss 7.1406   LearningRate 0.0162   Epoch: 23   Global Step: 120970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:46,955-Speed 10834.95 samples/sec   Loss 7.1400   LearningRate 0.0162   Epoch: 23   Global Step: 120980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:47,915-Speed 10704.29 samples/sec   Loss 7.0798   LearningRate 0.0162   Epoch: 23   Global Step: 120990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:45:48,898-Speed 10423.60 samples/sec   Loss 7.1602   LearningRate 0.0162   Epoch: 23   Global Step: 121000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:49,850-Speed 10767.69 samples/sec   Loss 7.1520   LearningRate 0.0162   Epoch: 23   Global Step: 121010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:50,769-Speed 11156.52 samples/sec   Loss 7.1174   LearningRate 0.0161   Epoch: 23   Global Step: 121020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:51,718-Speed 10787.79 samples/sec   Loss 7.2471   LearningRate 0.0161   Epoch: 23   Global Step: 121030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:52,652-Speed 10975.82 samples/sec   Loss 7.1550   LearningRate 0.0161   Epoch: 23   Global Step: 121040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:53,634-Speed 10437.95 samples/sec   Loss 7.0697   LearningRate 0.0161   Epoch: 23   Global Step: 121050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:54,560-Speed 11059.50 samples/sec   Loss 7.2804   LearningRate 0.0161   Epoch: 23   Global Step: 121060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:55,480-Speed 11150.17 samples/sec   Loss 7.1486   LearningRate 0.0161   Epoch: 23   Global Step: 121070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:56,445-Speed 10612.91 samples/sec   Loss 7.0583   LearningRate 0.0161   Epoch: 23   Global Step: 121080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:57,401-Speed 10723.54 samples/sec   Loss 7.2014   LearningRate 0.0161   Epoch: 23   Global Step: 121090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:45:58,359-Speed 10693.14 samples/sec   Loss 7.0045   LearningRate 0.0161   Epoch: 23   Global Step: 121100   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:45:59,325-Speed 10617.27 samples/sec   Loss 7.0566   LearningRate 0.0161   Epoch: 23   Global Step: 121110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:46:00,307-Speed 10432.90 samples/sec   Loss 7.1416   LearningRate 0.0161   Epoch: 23   Global Step: 121120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:46:01,303-Speed 10285.76 samples/sec   Loss 7.0331   LearningRate 0.0161   Epoch: 23   Global Step: 121130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:46:02,272-Speed 10575.50 samples/sec   Loss 7.2120   LearningRate 0.0161   Epoch: 23   Global Step: 121140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:46:03,165-Speed 11481.44 samples/sec   Loss 7.2882   LearningRate 0.0161   Epoch: 23   Global Step: 121150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:46:04,087-Speed 11119.91 samples/sec   Loss 7.1991   LearningRate 0.0161   Epoch: 23   Global Step: 121160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:46:05,049-Speed 10657.80 samples/sec   Loss 7.0973   LearningRate 0.0161   Epoch: 23   Global Step: 121170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:46:05,992-Speed 10863.23 samples/sec   Loss 7.1005   LearningRate 0.0161   Epoch: 23   Global Step: 121180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:46:06,970-Speed 10483.98 samples/sec   Loss 7.2164   LearningRate 0.0161   Epoch: 23   Global Step: 121190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:46:07,936-Speed 10606.16 samples/sec   Loss 7.0841   LearningRate 0.0161   Epoch: 23   Global Step: 121200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:46:08,928-Speed 10329.52 samples/sec   Loss 7.1806   LearningRate 0.0161   Epoch: 23   Global Step: 121210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:09,887-Speed 10697.14 samples/sec   Loss 6.9921   LearningRate 0.0161   Epoch: 23   Global Step: 121220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:10,849-Speed 10657.06 samples/sec   Loss 7.1995   LearningRate 0.0161   Epoch: 23   Global Step: 121230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:11,849-Speed 10253.64 samples/sec   Loss 7.0586   LearningRate 0.0161   Epoch: 23   Global Step: 121240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:12,760-Speed 11239.25 samples/sec   Loss 7.4764   LearningRate 0.0161   Epoch: 23   Global Step: 121250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:13,738-Speed 10487.48 samples/sec   Loss 7.1538   LearningRate 0.0161   Epoch: 23   Global Step: 121260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:14,678-Speed 10894.16 samples/sec   Loss 7.1854   LearningRate 0.0160   Epoch: 23   Global Step: 121270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:15,615-Speed 10943.88 samples/sec   Loss 7.1296   LearningRate 0.0160   Epoch: 23   Global Step: 121280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:16,574-Speed 10680.03 samples/sec   Loss 7.1855   LearningRate 0.0160   Epoch: 23   Global Step: 121290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:17,504-Speed 11025.73 samples/sec   Loss 7.1147   LearningRate 0.0160   Epoch: 23   Global Step: 121300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:18,449-Speed 10845.54 samples/sec   Loss 7.1375   LearningRate 0.0160   Epoch: 23   Global Step: 121310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:46:19,425-Speed 10499.47 samples/sec   Loss 6.9481   LearningRate 0.0160   Epoch: 23   Global Step: 121320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:20,342-Speed 11176.69 samples/sec   Loss 7.1806   LearningRate 0.0160   Epoch: 23   Global Step: 121330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:21,277-Speed 10955.91 samples/sec   Loss 7.0545   LearningRate 0.0160   Epoch: 23   Global Step: 121340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:22,164-Speed 11561.28 samples/sec   Loss 7.1709   LearningRate 0.0160   Epoch: 23   Global Step: 121350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:23,139-Speed 10522.14 samples/sec   Loss 7.0282   LearningRate 0.0160   Epoch: 23   Global Step: 121360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:24,095-Speed 10720.83 samples/sec   Loss 7.1787   LearningRate 0.0160   Epoch: 23   Global Step: 121370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:25,028-Speed 10993.16 samples/sec   Loss 6.9905   LearningRate 0.0160   Epoch: 23   Global Step: 121380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:26,058-Speed 9946.11 samples/sec   Loss 7.1993   LearningRate 0.0160   Epoch: 23   Global Step: 121390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:36,342-Speed 995.89 samples/sec   Loss 6.5173   LearningRate 0.0160   Epoch: 24   Global Step: 121400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:37,287-Speed 10845.84 samples/sec   Loss 6.2983   LearningRate 0.0160   Epoch: 24   Global Step: 121410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:38,337-Speed 9763.63 samples/sec   Loss 6.2898   LearningRate 0.0160   Epoch: 24   Global Step: 121420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:39,301-Speed 10624.25 samples/sec   Loss 6.3743   LearningRate 0.0160   Epoch: 24   Global Step: 121430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:40,451-Speed 8912.57 samples/sec   Loss 6.4478   LearningRate 0.0160   Epoch: 24   Global Step: 121440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:41,471-Speed 10053.97 samples/sec   Loss 6.4345   LearningRate 0.0160   Epoch: 24   Global Step: 121450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:42,417-Speed 10846.13 samples/sec   Loss 6.4042   LearningRate 0.0160   Epoch: 24   Global Step: 121460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:43,367-Speed 10791.94 samples/sec   Loss 6.2232   LearningRate 0.0160   Epoch: 24   Global Step: 121470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:44,365-Speed 10266.44 samples/sec   Loss 6.2666   LearningRate 0.0160   Epoch: 24   Global Step: 121480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:45,334-Speed 10575.48 samples/sec   Loss 6.3972   LearningRate 0.0160   Epoch: 24   Global Step: 121490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:46,292-Speed 10704.21 samples/sec   Loss 6.2795   LearningRate 0.0160   Epoch: 24   Global Step: 121500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:47,279-Speed 10390.01 samples/sec   Loss 6.4228   LearningRate 0.0160   Epoch: 24   Global Step: 121510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:48,384-Speed 9275.03 samples/sec   Loss 6.3856   LearningRate 0.0159   Epoch: 24   Global Step: 121520   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:46:49,490-Speed 9264.02 samples/sec   Loss 6.3136   LearningRate 0.0159   Epoch: 24   Global Step: 121530   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:46:50,470-Speed 10463.40 samples/sec   Loss 6.4129   LearningRate 0.0159   Epoch: 24   Global Step: 121540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:46:51,414-Speed 10864.13 samples/sec   Loss 6.4641   LearningRate 0.0159   Epoch: 24   Global Step: 121550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:46:52,376-Speed 10648.28 samples/sec   Loss 6.4400   LearningRate 0.0159   Epoch: 24   Global Step: 121560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:46:53,355-Speed 10471.27 samples/sec   Loss 6.3424   LearningRate 0.0159   Epoch: 24   Global Step: 121570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:54,296-Speed 10910.00 samples/sec   Loss 6.3409   LearningRate 0.0159   Epoch: 24   Global Step: 121580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:55,234-Speed 10919.33 samples/sec   Loss 6.4318   LearningRate 0.0159   Epoch: 24   Global Step: 121590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:56,195-Speed 10682.62 samples/sec   Loss 6.3487   LearningRate 0.0159   Epoch: 24   Global Step: 121600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:46:57,132-Speed 10929.50 samples/sec   Loss 6.5355   LearningRate 0.0159   Epoch: 24   Global Step: 121610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:46:58,109-Speed 10498.99 samples/sec   Loss 6.4342   LearningRate 0.0159   Epoch: 24   Global Step: 121620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:46:59,082-Speed 10527.46 samples/sec   Loss 6.3897   LearningRate 0.0159   Epoch: 24   Global Step: 121630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:47:00,062-Speed 10457.64 samples/sec   Loss 6.4361   LearningRate 0.0159   Epoch: 24   Global Step: 121640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:47:00,998-Speed 10949.85 samples/sec   Loss 6.3330   LearningRate 0.0159   Epoch: 24   Global Step: 121650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:47:01,961-Speed 10641.17 samples/sec   Loss 6.4003   LearningRate 0.0159   Epoch: 24   Global Step: 121660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:47:02,904-Speed 10872.61 samples/sec   Loss 6.4771   LearningRate 0.0159   Epoch: 24   Global Step: 121670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:47:03,879-Speed 10509.45 samples/sec   Loss 6.4706   LearningRate 0.0159   Epoch: 24   Global Step: 121680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:47:04,840-Speed 10664.90 samples/sec   Loss 6.4368   LearningRate 0.0159   Epoch: 24   Global Step: 121690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:47:05,781-Speed 10893.80 samples/sec   Loss 6.5679   LearningRate 0.0159   Epoch: 24   Global Step: 121700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:47:06,755-Speed 10517.68 samples/sec   Loss 6.4079   LearningRate 0.0159   Epoch: 24   Global Step: 121710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:47:07,687-Speed 11005.81 samples/sec   Loss 6.5098   LearningRate 0.0159   Epoch: 24   Global Step: 121720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:47:08,638-Speed 10772.85 samples/sec   Loss 6.5136   LearningRate 0.0159   Epoch: 24   Global Step: 121730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:47:09,583-Speed 10849.60 samples/sec   Loss 6.4664   LearningRate 0.0159   Epoch: 24   Global Step: 121740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:47:10,564-Speed 10450.53 samples/sec   Loss 6.3997   LearningRate 0.0159   Epoch: 24   Global Step: 121750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:47:11,558-Speed 10310.55 samples/sec   Loss 6.5073   LearningRate 0.0159   Epoch: 24   Global Step: 121760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:47:12,503-Speed 10849.85 samples/sec   Loss 6.5238   LearningRate 0.0159   Epoch: 24   Global Step: 121770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:47:13,452-Speed 10795.27 samples/sec   Loss 6.5629   LearningRate 0.0158   Epoch: 24   Global Step: 121780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:47:14,419-Speed 10602.78 samples/sec   Loss 6.5653   LearningRate 0.0158   Epoch: 24   Global Step: 121790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:47:15,385-Speed 10602.09 samples/sec   Loss 6.4826   LearningRate 0.0158   Epoch: 24   Global Step: 121800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:47:16,314-Speed 11033.81 samples/sec   Loss 6.3898   LearningRate 0.0158   Epoch: 24   Global Step: 121810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:47:17,299-Speed 10404.02 samples/sec   Loss 6.3607   LearningRate 0.0158   Epoch: 24   Global Step: 121820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:47:18,217-Speed 11164.46 samples/sec   Loss 6.4353   LearningRate 0.0158   Epoch: 24   Global Step: 121830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:47:19,166-Speed 10808.75 samples/sec   Loss 6.5672   LearningRate 0.0158   Epoch: 24   Global Step: 121840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:47:20,157-Speed 10335.94 samples/sec   Loss 6.4955   LearningRate 0.0158   Epoch: 24   Global Step: 121850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:47:21,100-Speed 10871.30 samples/sec   Loss 6.5249   LearningRate 0.0158   Epoch: 24   Global Step: 121860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:47:22,075-Speed 10513.36 samples/sec   Loss 6.4928   LearningRate 0.0158   Epoch: 24   Global Step: 121870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:47:23,044-Speed 10581.22 samples/sec   Loss 6.4243   LearningRate 0.0158   Epoch: 24   Global Step: 121880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:47:24,024-Speed 10459.26 samples/sec   Loss 6.3957   LearningRate 0.0158   Epoch: 24   Global Step: 121890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:47:25,005-Speed 10448.84 samples/sec   Loss 6.5699   LearningRate 0.0158   Epoch: 24   Global Step: 121900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:47:25,971-Speed 10605.11 samples/sec   Loss 6.3475   LearningRate 0.0158   Epoch: 24   Global Step: 121910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:47:26,933-Speed 10654.59 samples/sec   Loss 6.7081   LearningRate 0.0158   Epoch: 24   Global Step: 121920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:47:27,891-Speed 10696.83 samples/sec   Loss 6.3499   LearningRate 0.0158   Epoch: 24   Global Step: 121930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:47:28,896-Speed 10205.95 samples/sec   Loss 6.4486   LearningRate 0.0158   Epoch: 24   Global Step: 121940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:47:29,868-Speed 10545.05 samples/sec   Loss 6.5144   LearningRate 0.0158   Epoch: 24   Global Step: 121950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:47:30,820-Speed 10768.73 samples/sec   Loss 6.4775   LearningRate 0.0158   Epoch: 24   Global Step: 121960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:47:31,746-Speed 11057.96 samples/sec   Loss 6.5163   LearningRate 0.0158   Epoch: 24   Global Step: 121970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:47:32,675-Speed 11030.40 samples/sec   Loss 6.5267   LearningRate 0.0158   Epoch: 24   Global Step: 121980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:47:33,613-Speed 10931.53 samples/sec   Loss 6.5968   LearningRate 0.0158   Epoch: 24   Global Step: 121990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:47:34,544-Speed 11004.76 samples/sec   Loss 6.4206   LearningRate 0.0158   Epoch: 24   Global Step: 122000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:47:56,509-[lfw][122000]XNorm: 9.825811
Training: 2022-04-11 03:47:56,510-[lfw][122000]Accuracy-Flip: 0.99583+-0.00318
Training: 2022-04-11 03:47:56,510-[lfw][122000]Accuracy-Highest: 0.99683
Training: 2022-04-11 03:48:21,790-[cfp_fp][122000]XNorm: 8.450441
Training: 2022-04-11 03:48:21,790-[cfp_fp][122000]Accuracy-Flip: 0.96529+-0.01008
Training: 2022-04-11 03:48:21,791-[cfp_fp][122000]Accuracy-Highest: 0.96600
Training: 2022-04-11 03:48:43,588-[agedb_30][122000]XNorm: 9.623812
Training: 2022-04-11 03:48:43,588-[agedb_30][122000]Accuracy-Flip: 0.96517+-0.00893
Training: 2022-04-11 03:48:43,588-[agedb_30][122000]Accuracy-Highest: 0.97017
Training: 2022-04-11 03:48:44,542-Speed 146.29 samples/sec   Loss 6.7422   LearningRate 0.0158   Epoch: 24   Global Step: 122010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:48:45,489-Speed 10820.83 samples/sec   Loss 6.5478   LearningRate 0.0158   Epoch: 24   Global Step: 122020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:48:46,408-Speed 11155.61 samples/sec   Loss 6.5450   LearningRate 0.0157   Epoch: 24   Global Step: 122030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:48:47,351-Speed 10863.02 samples/sec   Loss 6.4484   LearningRate 0.0157   Epoch: 24   Global Step: 122040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:48:48,292-Speed 10895.94 samples/sec   Loss 6.5672   LearningRate 0.0157   Epoch: 24   Global Step: 122050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:48:49,274-Speed 10433.62 samples/sec   Loss 6.6288   LearningRate 0.0157   Epoch: 24   Global Step: 122060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:48:50,247-Speed 10534.31 samples/sec   Loss 6.6093   LearningRate 0.0157   Epoch: 24   Global Step: 122070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:48:51,164-Speed 11181.53 samples/sec   Loss 6.5608   LearningRate 0.0157   Epoch: 24   Global Step: 122080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:48:52,120-Speed 10716.84 samples/sec   Loss 6.4777   LearningRate 0.0157   Epoch: 24   Global Step: 122090   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:48:53,079-Speed 10683.69 samples/sec   Loss 6.6838   LearningRate 0.0157   Epoch: 24   Global Step: 122100   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:48:54,019-Speed 10902.04 samples/sec   Loss 6.6272   LearningRate 0.0157   Epoch: 24   Global Step: 122110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:48:54,955-Speed 10961.13 samples/sec   Loss 6.5329   LearningRate 0.0157   Epoch: 24   Global Step: 122120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:48:55,908-Speed 10747.06 samples/sec   Loss 6.5935   LearningRate 0.0157   Epoch: 24   Global Step: 122130   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:48:56,812-Speed 11350.51 samples/sec   Loss 6.6318   LearningRate 0.0157   Epoch: 24   Global Step: 122140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:48:57,749-Speed 10933.88 samples/sec   Loss 6.6148   LearningRate 0.0157   Epoch: 24   Global Step: 122150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:48:58,716-Speed 10590.24 samples/sec   Loss 6.5034   LearningRate 0.0157   Epoch: 24   Global Step: 122160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:48:59,660-Speed 10860.52 samples/sec   Loss 6.5733   LearningRate 0.0157   Epoch: 24   Global Step: 122170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:00,613-Speed 10758.02 samples/sec   Loss 6.5868   LearningRate 0.0157   Epoch: 24   Global Step: 122180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:01,562-Speed 10791.65 samples/sec   Loss 6.6009   LearningRate 0.0157   Epoch: 24   Global Step: 122190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:02,496-Speed 10977.91 samples/sec   Loss 6.6103   LearningRate 0.0157   Epoch: 24   Global Step: 122200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:03,464-Speed 10586.58 samples/sec   Loss 6.6533   LearningRate 0.0157   Epoch: 24   Global Step: 122210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:04,422-Speed 10707.35 samples/sec   Loss 6.5216   LearningRate 0.0157   Epoch: 24   Global Step: 122220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:05,376-Speed 10742.77 samples/sec   Loss 6.6159   LearningRate 0.0157   Epoch: 24   Global Step: 122230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:06,335-Speed 10686.29 samples/sec   Loss 6.4895   LearningRate 0.0157   Epoch: 24   Global Step: 122240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:49:07,285-Speed 10784.70 samples/sec   Loss 6.6463   LearningRate 0.0157   Epoch: 24   Global Step: 122250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:49:08,283-Speed 10268.85 samples/sec   Loss 6.5417   LearningRate 0.0157   Epoch: 24   Global Step: 122260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:49:09,268-Speed 10407.46 samples/sec   Loss 6.6539   LearningRate 0.0157   Epoch: 24   Global Step: 122270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:49:10,225-Speed 10708.82 samples/sec   Loss 6.6156   LearningRate 0.0157   Epoch: 24   Global Step: 122280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:49:11,146-Speed 11130.53 samples/sec   Loss 6.5571   LearningRate 0.0156   Epoch: 24   Global Step: 122290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:49:12,125-Speed 10469.20 samples/sec   Loss 6.5944   LearningRate 0.0156   Epoch: 24   Global Step: 122300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:49:13,089-Speed 10638.95 samples/sec   Loss 6.5229   LearningRate 0.0156   Epoch: 24   Global Step: 122310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:49:14,075-Speed 10391.85 samples/sec   Loss 6.5162   LearningRate 0.0156   Epoch: 24   Global Step: 122320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:49:15,012-Speed 10932.80 samples/sec   Loss 6.6422   LearningRate 0.0156   Epoch: 24   Global Step: 122330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:15,963-Speed 10772.72 samples/sec   Loss 6.6068   LearningRate 0.0156   Epoch: 24   Global Step: 122340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:16,910-Speed 10823.97 samples/sec   Loss 6.5761   LearningRate 0.0156   Epoch: 24   Global Step: 122350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:17,868-Speed 10692.86 samples/sec   Loss 6.6180   LearningRate 0.0156   Epoch: 24   Global Step: 122360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:18,821-Speed 10752.16 samples/sec   Loss 6.4939   LearningRate 0.0156   Epoch: 24   Global Step: 122370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:19,777-Speed 10729.04 samples/sec   Loss 6.6596   LearningRate 0.0156   Epoch: 24   Global Step: 122380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:20,733-Speed 10717.16 samples/sec   Loss 6.6626   LearningRate 0.0156   Epoch: 24   Global Step: 122390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:21,676-Speed 10863.87 samples/sec   Loss 6.5799   LearningRate 0.0156   Epoch: 24   Global Step: 122400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:22,665-Speed 10367.56 samples/sec   Loss 6.5673   LearningRate 0.0156   Epoch: 24   Global Step: 122410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:23,611-Speed 10843.85 samples/sec   Loss 6.6965   LearningRate 0.0156   Epoch: 24   Global Step: 122420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:24,556-Speed 10837.60 samples/sec   Loss 6.7502   LearningRate 0.0156   Epoch: 24   Global Step: 122430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:49:25,476-Speed 11145.82 samples/sec   Loss 6.7502   LearningRate 0.0156   Epoch: 24   Global Step: 122440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:26,374-Speed 11404.04 samples/sec   Loss 6.6588   LearningRate 0.0156   Epoch: 24   Global Step: 122450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:49:27,352-Speed 10477.63 samples/sec   Loss 6.4271   LearningRate 0.0156   Epoch: 24   Global Step: 122460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:49:28,290-Speed 10926.21 samples/sec   Loss 6.6862   LearningRate 0.0156   Epoch: 24   Global Step: 122470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:49:29,239-Speed 10809.08 samples/sec   Loss 6.7747   LearningRate 0.0156   Epoch: 24   Global Step: 122480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:49:30,175-Speed 10938.27 samples/sec   Loss 6.6666   LearningRate 0.0156   Epoch: 24   Global Step: 122490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:49:31,088-Speed 11226.45 samples/sec   Loss 6.6562   LearningRate 0.0156   Epoch: 24   Global Step: 122500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:49:32,024-Speed 10952.45 samples/sec   Loss 6.6242   LearningRate 0.0156   Epoch: 24   Global Step: 122510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:49:32,983-Speed 10685.97 samples/sec   Loss 6.5646   LearningRate 0.0156   Epoch: 24   Global Step: 122520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:49:33,942-Speed 10697.85 samples/sec   Loss 6.7126   LearningRate 0.0156   Epoch: 24   Global Step: 122530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:49:34,906-Speed 10619.91 samples/sec   Loss 6.8682   LearningRate 0.0155   Epoch: 24   Global Step: 122540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:49:35,847-Speed 10889.65 samples/sec   Loss 6.7313   LearningRate 0.0155   Epoch: 24   Global Step: 122550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:36,812-Speed 10626.16 samples/sec   Loss 6.6930   LearningRate 0.0155   Epoch: 24   Global Step: 122560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:37,756-Speed 10860.64 samples/sec   Loss 6.7690   LearningRate 0.0155   Epoch: 24   Global Step: 122570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:38,699-Speed 10860.97 samples/sec   Loss 6.8190   LearningRate 0.0155   Epoch: 24   Global Step: 122580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:39,621-Speed 11125.11 samples/sec   Loss 6.6834   LearningRate 0.0155   Epoch: 24   Global Step: 122590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:40,591-Speed 10558.74 samples/sec   Loss 6.6673   LearningRate 0.0155   Epoch: 24   Global Step: 122600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:41,520-Speed 11038.04 samples/sec   Loss 6.8266   LearningRate 0.0155   Epoch: 24   Global Step: 122610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:42,480-Speed 10669.25 samples/sec   Loss 6.6877   LearningRate 0.0155   Epoch: 24   Global Step: 122620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:43,455-Speed 10513.98 samples/sec   Loss 6.5946   LearningRate 0.0155   Epoch: 24   Global Step: 122630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:44,395-Speed 10909.05 samples/sec   Loss 6.5718   LearningRate 0.0155   Epoch: 24   Global Step: 122640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:45,353-Speed 10689.79 samples/sec   Loss 6.7046   LearningRate 0.0155   Epoch: 24   Global Step: 122650   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:49:46,335-Speed 10440.04 samples/sec   Loss 6.7557   LearningRate 0.0155   Epoch: 24   Global Step: 122660   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:49:47,255-Speed 11129.23 samples/sec   Loss 6.6734   LearningRate 0.0155   Epoch: 24   Global Step: 122670   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:49:48,232-Speed 10498.53 samples/sec   Loss 6.8632   LearningRate 0.0155   Epoch: 24   Global Step: 122680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:49:49,187-Speed 10733.52 samples/sec   Loss 6.5411   LearningRate 0.0155   Epoch: 24   Global Step: 122690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:50,145-Speed 10690.07 samples/sec   Loss 6.7668   LearningRate 0.0155   Epoch: 24   Global Step: 122700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:51,098-Speed 10754.27 samples/sec   Loss 6.7866   LearningRate 0.0155   Epoch: 24   Global Step: 122710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:52,048-Speed 10790.63 samples/sec   Loss 6.7234   LearningRate 0.0155   Epoch: 24   Global Step: 122720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:53,013-Speed 10618.19 samples/sec   Loss 6.8124   LearningRate 0.0155   Epoch: 24   Global Step: 122730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:53,998-Speed 10408.15 samples/sec   Loss 6.8613   LearningRate 0.0155   Epoch: 24   Global Step: 122740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:54,955-Speed 10710.33 samples/sec   Loss 6.8177   LearningRate 0.0155   Epoch: 24   Global Step: 122750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:55,890-Speed 10957.44 samples/sec   Loss 6.7102   LearningRate 0.0155   Epoch: 24   Global Step: 122760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:56,830-Speed 10905.81 samples/sec   Loss 6.7332   LearningRate 0.0155   Epoch: 24   Global Step: 122770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:57,770-Speed 10894.97 samples/sec   Loss 6.6679   LearningRate 0.0155   Epoch: 24   Global Step: 122780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:49:58,739-Speed 10583.67 samples/sec   Loss 6.7290   LearningRate 0.0155   Epoch: 24   Global Step: 122790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:49:59,680-Speed 10895.27 samples/sec   Loss 6.6759   LearningRate 0.0154   Epoch: 24   Global Step: 122800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:50:00,637-Speed 10705.62 samples/sec   Loss 6.6371   LearningRate 0.0154   Epoch: 24   Global Step: 122810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:50:01,580-Speed 10863.89 samples/sec   Loss 6.7284   LearningRate 0.0154   Epoch: 24   Global Step: 122820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:50:02,496-Speed 11194.91 samples/sec   Loss 6.8305   LearningRate 0.0154   Epoch: 24   Global Step: 122830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:50:03,455-Speed 10676.40 samples/sec   Loss 6.7684   LearningRate 0.0154   Epoch: 24   Global Step: 122840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:50:04,417-Speed 10656.75 samples/sec   Loss 6.7986   LearningRate 0.0154   Epoch: 24   Global Step: 122850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:50:05,371-Speed 10741.88 samples/sec   Loss 6.8025   LearningRate 0.0154   Epoch: 24   Global Step: 122860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:50:06,311-Speed 10907.46 samples/sec   Loss 6.6993   LearningRate 0.0154   Epoch: 24   Global Step: 122870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:50:07,250-Speed 10910.02 samples/sec   Loss 6.8455   LearningRate 0.0154   Epoch: 24   Global Step: 122880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:50:08,164-Speed 11218.96 samples/sec   Loss 6.6544   LearningRate 0.0154   Epoch: 24   Global Step: 122890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:09,149-Speed 10412.60 samples/sec   Loss 6.8193   LearningRate 0.0154   Epoch: 24   Global Step: 122900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:10,145-Speed 10286.25 samples/sec   Loss 6.6944   LearningRate 0.0154   Epoch: 24   Global Step: 122910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:11,107-Speed 10649.41 samples/sec   Loss 6.7113   LearningRate 0.0154   Epoch: 24   Global Step: 122920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:12,069-Speed 10656.26 samples/sec   Loss 6.7814   LearningRate 0.0154   Epoch: 24   Global Step: 122930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:12,990-Speed 11126.33 samples/sec   Loss 6.8580   LearningRate 0.0154   Epoch: 24   Global Step: 122940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:13,969-Speed 10469.05 samples/sec   Loss 6.6224   LearningRate 0.0154   Epoch: 24   Global Step: 122950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:14,986-Speed 10082.00 samples/sec   Loss 6.7141   LearningRate 0.0154   Epoch: 24   Global Step: 122960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:15,933-Speed 10814.75 samples/sec   Loss 6.8671   LearningRate 0.0154   Epoch: 24   Global Step: 122970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:16,868-Speed 10969.37 samples/sec   Loss 6.7780   LearningRate 0.0154   Epoch: 24   Global Step: 122980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:17,824-Speed 10717.38 samples/sec   Loss 6.7504   LearningRate 0.0154   Epoch: 24   Global Step: 122990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:50:18,746-Speed 11113.95 samples/sec   Loss 6.7966   LearningRate 0.0154   Epoch: 24   Global Step: 123000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:19,746-Speed 10253.38 samples/sec   Loss 6.7515   LearningRate 0.0154   Epoch: 24   Global Step: 123010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:20,727-Speed 10439.71 samples/sec   Loss 6.7515   LearningRate 0.0154   Epoch: 24   Global Step: 123020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:21,668-Speed 10893.66 samples/sec   Loss 6.6902   LearningRate 0.0154   Epoch: 24   Global Step: 123030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:22,598-Speed 11022.02 samples/sec   Loss 6.8802   LearningRate 0.0154   Epoch: 24   Global Step: 123040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:23,551-Speed 10758.15 samples/sec   Loss 6.9840   LearningRate 0.0154   Epoch: 24   Global Step: 123050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:24,506-Speed 10729.47 samples/sec   Loss 6.6362   LearningRate 0.0153   Epoch: 24   Global Step: 123060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:25,499-Speed 10318.75 samples/sec   Loss 6.7966   LearningRate 0.0153   Epoch: 24   Global Step: 123070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:26,440-Speed 10895.16 samples/sec   Loss 6.7546   LearningRate 0.0153   Epoch: 24   Global Step: 123080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:27,396-Speed 10717.45 samples/sec   Loss 6.8976   LearningRate 0.0153   Epoch: 24   Global Step: 123090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:28,394-Speed 10275.43 samples/sec   Loss 6.8426   LearningRate 0.0153   Epoch: 24   Global Step: 123100   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:50:29,372-Speed 10482.84 samples/sec   Loss 6.8267   LearningRate 0.0153   Epoch: 24   Global Step: 123110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:50:30,352-Speed 10458.63 samples/sec   Loss 6.7956   LearningRate 0.0153   Epoch: 24   Global Step: 123120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:50:31,301-Speed 10797.51 samples/sec   Loss 6.8383   LearningRate 0.0153   Epoch: 24   Global Step: 123130   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:50:32,305-Speed 10205.34 samples/sec   Loss 6.7535   LearningRate 0.0153   Epoch: 24   Global Step: 123140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:50:33,248-Speed 10861.38 samples/sec   Loss 6.8420   LearningRate 0.0153   Epoch: 24   Global Step: 123150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:50:34,204-Speed 10726.35 samples/sec   Loss 6.8212   LearningRate 0.0153   Epoch: 24   Global Step: 123160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:50:35,152-Speed 10812.63 samples/sec   Loss 6.7208   LearningRate 0.0153   Epoch: 24   Global Step: 123170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:36,125-Speed 10536.35 samples/sec   Loss 6.7336   LearningRate 0.0153   Epoch: 24   Global Step: 123180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:37,103-Speed 10474.60 samples/sec   Loss 6.8702   LearningRate 0.0153   Epoch: 24   Global Step: 123190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:38,063-Speed 10675.85 samples/sec   Loss 6.7624   LearningRate 0.0153   Epoch: 24   Global Step: 123200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:39,035-Speed 10546.59 samples/sec   Loss 6.8109   LearningRate 0.0153   Epoch: 24   Global Step: 123210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:39,967-Speed 10999.67 samples/sec   Loss 6.7339   LearningRate 0.0153   Epoch: 24   Global Step: 123220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:40,955-Speed 10376.37 samples/sec   Loss 6.7901   LearningRate 0.0153   Epoch: 24   Global Step: 123230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:41,908-Speed 10754.48 samples/sec   Loss 6.8280   LearningRate 0.0153   Epoch: 24   Global Step: 123240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:42,843-Speed 10955.08 samples/sec   Loss 6.8666   LearningRate 0.0153   Epoch: 24   Global Step: 123250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:43,789-Speed 10834.61 samples/sec   Loss 6.9043   LearningRate 0.0153   Epoch: 24   Global Step: 123260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:44,734-Speed 10842.85 samples/sec   Loss 6.8112   LearningRate 0.0153   Epoch: 24   Global Step: 123270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:50:45,690-Speed 10717.74 samples/sec   Loss 6.8182   LearningRate 0.0153   Epoch: 24   Global Step: 123280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:50:46,652-Speed 10654.34 samples/sec   Loss 6.8444   LearningRate 0.0153   Epoch: 24   Global Step: 123290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:50:47,627-Speed 10512.17 samples/sec   Loss 6.6872   LearningRate 0.0153   Epoch: 24   Global Step: 123300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:50:48,548-Speed 11128.76 samples/sec   Loss 6.7789   LearningRate 0.0153   Epoch: 24   Global Step: 123310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:50:49,463-Speed 11200.93 samples/sec   Loss 6.8006   LearningRate 0.0152   Epoch: 24   Global Step: 123320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:50,409-Speed 10837.26 samples/sec   Loss 6.8168   LearningRate 0.0152   Epoch: 24   Global Step: 123330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:51,362-Speed 10751.17 samples/sec   Loss 6.9042   LearningRate 0.0152   Epoch: 24   Global Step: 123340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:52,351-Speed 10365.24 samples/sec   Loss 6.7431   LearningRate 0.0152   Epoch: 24   Global Step: 123350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:53,313-Speed 10649.39 samples/sec   Loss 7.0083   LearningRate 0.0152   Epoch: 24   Global Step: 123360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:54,281-Speed 10595.41 samples/sec   Loss 6.7421   LearningRate 0.0152   Epoch: 24   Global Step: 123370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:55,230-Speed 10800.95 samples/sec   Loss 6.8407   LearningRate 0.0152   Epoch: 24   Global Step: 123380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:56,166-Speed 10946.76 samples/sec   Loss 6.7669   LearningRate 0.0152   Epoch: 24   Global Step: 123390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:57,075-Speed 11267.50 samples/sec   Loss 6.8655   LearningRate 0.0152   Epoch: 24   Global Step: 123400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:58,067-Speed 10335.74 samples/sec   Loss 6.8590   LearningRate 0.0152   Epoch: 24   Global Step: 123410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:59,023-Speed 10729.31 samples/sec   Loss 6.7997   LearningRate 0.0152   Epoch: 24   Global Step: 123420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:50:59,974-Speed 10776.13 samples/sec   Loss 6.8420   LearningRate 0.0152   Epoch: 24   Global Step: 123430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:00,970-Speed 10287.97 samples/sec   Loss 6.8723   LearningRate 0.0152   Epoch: 24   Global Step: 123440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:01,942-Speed 10539.79 samples/sec   Loss 6.9001   LearningRate 0.0152   Epoch: 24   Global Step: 123450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:02,908-Speed 10610.78 samples/sec   Loss 6.8614   LearningRate 0.0152   Epoch: 24   Global Step: 123460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:03,867-Speed 10689.58 samples/sec   Loss 6.8007   LearningRate 0.0152   Epoch: 24   Global Step: 123470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:04,828-Speed 10666.17 samples/sec   Loss 6.7776   LearningRate 0.0152   Epoch: 24   Global Step: 123480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:05,790-Speed 10650.92 samples/sec   Loss 6.8122   LearningRate 0.0152   Epoch: 24   Global Step: 123490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:06,743-Speed 10761.69 samples/sec   Loss 6.9497   LearningRate 0.0152   Epoch: 24   Global Step: 123500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:07,708-Speed 10623.15 samples/sec   Loss 6.8070   LearningRate 0.0152   Epoch: 24   Global Step: 123510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:08,657-Speed 10788.94 samples/sec   Loss 6.7751   LearningRate 0.0152   Epoch: 24   Global Step: 123520   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:51:09,590-Speed 10982.60 samples/sec   Loss 6.8527   LearningRate 0.0152   Epoch: 24   Global Step: 123530   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:51:10,511-Speed 11129.70 samples/sec   Loss 6.8301   LearningRate 0.0152   Epoch: 24   Global Step: 123540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:51:11,509-Speed 10267.20 samples/sec   Loss 6.6824   LearningRate 0.0152   Epoch: 24   Global Step: 123550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:51:12,422-Speed 11224.89 samples/sec   Loss 6.9296   LearningRate 0.0152   Epoch: 24   Global Step: 123560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:13,430-Speed 10172.07 samples/sec   Loss 6.8576   LearningRate 0.0152   Epoch: 24   Global Step: 123570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:14,436-Speed 10191.48 samples/sec   Loss 6.8530   LearningRate 0.0151   Epoch: 24   Global Step: 123580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:15,408-Speed 10539.27 samples/sec   Loss 6.9029   LearningRate 0.0151   Epoch: 24   Global Step: 123590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:16,332-Speed 11086.22 samples/sec   Loss 6.8445   LearningRate 0.0151   Epoch: 24   Global Step: 123600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:17,254-Speed 11118.10 samples/sec   Loss 6.8872   LearningRate 0.0151   Epoch: 24   Global Step: 123610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:18,167-Speed 11218.95 samples/sec   Loss 6.9082   LearningRate 0.0151   Epoch: 24   Global Step: 123620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:19,150-Speed 10424.49 samples/sec   Loss 6.8527   LearningRate 0.0151   Epoch: 24   Global Step: 123630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:20,099-Speed 10809.47 samples/sec   Loss 6.7995   LearningRate 0.0151   Epoch: 24   Global Step: 123640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:21,052-Speed 10744.53 samples/sec   Loss 6.8152   LearningRate 0.0151   Epoch: 24   Global Step: 123650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:22,026-Speed 10523.37 samples/sec   Loss 6.8814   LearningRate 0.0151   Epoch: 24   Global Step: 123660   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:51:23,000-Speed 10521.20 samples/sec   Loss 6.9755   LearningRate 0.0151   Epoch: 24   Global Step: 123670   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:51:23,993-Speed 10331.64 samples/sec   Loss 6.8664   LearningRate 0.0151   Epoch: 24   Global Step: 123680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:51:24,999-Speed 10191.78 samples/sec   Loss 6.9148   LearningRate 0.0151   Epoch: 24   Global Step: 123690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:51:25,961-Speed 10656.00 samples/sec   Loss 6.8921   LearningRate 0.0151   Epoch: 24   Global Step: 123700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:51:26,899-Speed 10925.51 samples/sec   Loss 6.8798   LearningRate 0.0151   Epoch: 24   Global Step: 123710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:27,838-Speed 10917.24 samples/sec   Loss 6.8974   LearningRate 0.0151   Epoch: 24   Global Step: 123720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:28,811-Speed 10536.96 samples/sec   Loss 6.8072   LearningRate 0.0151   Epoch: 24   Global Step: 123730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:29,747-Speed 10948.48 samples/sec   Loss 6.8349   LearningRate 0.0151   Epoch: 24   Global Step: 123740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:30,691-Speed 10855.81 samples/sec   Loss 6.7436   LearningRate 0.0151   Epoch: 24   Global Step: 123750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:31,642-Speed 10772.52 samples/sec   Loss 6.8494   LearningRate 0.0151   Epoch: 24   Global Step: 123760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:32,611-Speed 10586.20 samples/sec   Loss 6.7924   LearningRate 0.0151   Epoch: 24   Global Step: 123770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:33,601-Speed 10346.65 samples/sec   Loss 6.8983   LearningRate 0.0151   Epoch: 24   Global Step: 123780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:34,531-Speed 11024.97 samples/sec   Loss 6.9489   LearningRate 0.0151   Epoch: 24   Global Step: 123790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:35,494-Speed 10640.29 samples/sec   Loss 6.8247   LearningRate 0.0151   Epoch: 24   Global Step: 123800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:36,448-Speed 10748.31 samples/sec   Loss 6.9265   LearningRate 0.0151   Epoch: 24   Global Step: 123810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:51:37,420-Speed 10535.22 samples/sec   Loss 6.8798   LearningRate 0.0151   Epoch: 24   Global Step: 123820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:51:38,334-Speed 11214.51 samples/sec   Loss 6.6686   LearningRate 0.0151   Epoch: 24   Global Step: 123830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:51:39,301-Speed 10606.99 samples/sec   Loss 6.8587   LearningRate 0.0150   Epoch: 24   Global Step: 123840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:51:40,282-Speed 10440.35 samples/sec   Loss 6.8715   LearningRate 0.0150   Epoch: 24   Global Step: 123850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:51:41,244-Speed 10660.82 samples/sec   Loss 6.6359   LearningRate 0.0150   Epoch: 24   Global Step: 123860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:42,191-Speed 10824.52 samples/sec   Loss 6.8653   LearningRate 0.0150   Epoch: 24   Global Step: 123870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:43,152-Speed 10665.51 samples/sec   Loss 6.8875   LearningRate 0.0150   Epoch: 24   Global Step: 123880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:44,113-Speed 10657.95 samples/sec   Loss 6.9038   LearningRate 0.0150   Epoch: 24   Global Step: 123890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:45,093-Speed 10456.58 samples/sec   Loss 6.8838   LearningRate 0.0150   Epoch: 24   Global Step: 123900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:46,061-Speed 10590.03 samples/sec   Loss 7.0971   LearningRate 0.0150   Epoch: 24   Global Step: 123910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:47,012-Speed 10782.18 samples/sec   Loss 6.6931   LearningRate 0.0150   Epoch: 24   Global Step: 123920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:47,953-Speed 10882.09 samples/sec   Loss 6.8295   LearningRate 0.0150   Epoch: 24   Global Step: 123930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:48,874-Speed 11132.42 samples/sec   Loss 6.8791   LearningRate 0.0150   Epoch: 24   Global Step: 123940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:49,808-Speed 10972.40 samples/sec   Loss 6.7760   LearningRate 0.0150   Epoch: 24   Global Step: 123950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:50,734-Speed 11071.64 samples/sec   Loss 6.8945   LearningRate 0.0150   Epoch: 24   Global Step: 123960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:51,668-Speed 10968.36 samples/sec   Loss 6.9716   LearningRate 0.0150   Epoch: 24   Global Step: 123970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:52,613-Speed 10846.57 samples/sec   Loss 6.9827   LearningRate 0.0150   Epoch: 24   Global Step: 123980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:53,583-Speed 10572.85 samples/sec   Loss 6.9977   LearningRate 0.0150   Epoch: 24   Global Step: 123990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:51:54,512-Speed 11022.94 samples/sec   Loss 6.8109   LearningRate 0.0150   Epoch: 24   Global Step: 124000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:52:16,683-[lfw][124000]XNorm: 9.637199
Training: 2022-04-11 03:52:16,684-[lfw][124000]Accuracy-Flip: 0.99700+-0.00267
Training: 2022-04-11 03:52:16,684-[lfw][124000]Accuracy-Highest: 0.99700
Training: 2022-04-11 03:52:42,013-[cfp_fp][124000]XNorm: 8.229580
Training: 2022-04-11 03:52:42,013-[cfp_fp][124000]Accuracy-Flip: 0.96614+-0.00908
Training: 2022-04-11 03:52:42,014-[cfp_fp][124000]Accuracy-Highest: 0.96614
Training: 2022-04-11 03:53:04,077-[agedb_30][124000]XNorm: 9.394519
Training: 2022-04-11 03:53:04,077-[agedb_30][124000]Accuracy-Flip: 0.96717+-0.00753
Training: 2022-04-11 03:53:04,077-[agedb_30][124000]Accuracy-Highest: 0.97017
Training: 2022-04-11 03:53:04,997-Speed 145.28 samples/sec   Loss 6.8227   LearningRate 0.0150   Epoch: 24   Global Step: 124010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:05,928-Speed 11012.32 samples/sec   Loss 6.8109   LearningRate 0.0150   Epoch: 24   Global Step: 124020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:06,895-Speed 10593.39 samples/sec   Loss 7.0869   LearningRate 0.0150   Epoch: 24   Global Step: 124030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:53:07,851-Speed 10730.77 samples/sec   Loss 7.0374   LearningRate 0.0150   Epoch: 24   Global Step: 124040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:53:08,787-Speed 10951.14 samples/sec   Loss 6.8624   LearningRate 0.0150   Epoch: 24   Global Step: 124050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:53:09,765-Speed 10474.39 samples/sec   Loss 6.9642   LearningRate 0.0150   Epoch: 24   Global Step: 124060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:53:10,716-Speed 10787.27 samples/sec   Loss 6.9341   LearningRate 0.0150   Epoch: 24   Global Step: 124070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:53:11,671-Speed 10724.55 samples/sec   Loss 6.8439   LearningRate 0.0150   Epoch: 24   Global Step: 124080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:53:12,636-Speed 10617.84 samples/sec   Loss 6.9350   LearningRate 0.0150   Epoch: 24   Global Step: 124090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:53:13,595-Speed 10685.79 samples/sec   Loss 6.7908   LearningRate 0.0149   Epoch: 24   Global Step: 124100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:53:14,557-Speed 10661.89 samples/sec   Loss 6.9342   LearningRate 0.0149   Epoch: 24   Global Step: 124110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:53:15,502-Speed 10838.94 samples/sec   Loss 6.7525   LearningRate 0.0149   Epoch: 24   Global Step: 124120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:53:16,461-Speed 10685.61 samples/sec   Loss 6.9977   LearningRate 0.0149   Epoch: 24   Global Step: 124130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:17,423-Speed 10648.50 samples/sec   Loss 6.7357   LearningRate 0.0149   Epoch: 24   Global Step: 124140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:18,412-Speed 10369.78 samples/sec   Loss 6.8094   LearningRate 0.0149   Epoch: 24   Global Step: 124150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:19,420-Speed 10161.45 samples/sec   Loss 6.9347   LearningRate 0.0149   Epoch: 24   Global Step: 124160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:20,356-Speed 10950.26 samples/sec   Loss 6.9956   LearningRate 0.0149   Epoch: 24   Global Step: 124170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:21,317-Speed 10665.08 samples/sec   Loss 7.0984   LearningRate 0.0149   Epoch: 24   Global Step: 124180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:22,272-Speed 10737.44 samples/sec   Loss 6.9678   LearningRate 0.0149   Epoch: 24   Global Step: 124190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:23,236-Speed 10628.74 samples/sec   Loss 6.9618   LearningRate 0.0149   Epoch: 24   Global Step: 124200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:24,188-Speed 10766.72 samples/sec   Loss 6.9037   LearningRate 0.0149   Epoch: 24   Global Step: 124210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:25,192-Speed 10206.65 samples/sec   Loss 6.9375   LearningRate 0.0149   Epoch: 24   Global Step: 124220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:26,164-Speed 10540.28 samples/sec   Loss 6.9553   LearningRate 0.0149   Epoch: 24   Global Step: 124230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:53:27,124-Speed 10674.45 samples/sec   Loss 6.9503   LearningRate 0.0149   Epoch: 24   Global Step: 124240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:53:28,080-Speed 10723.67 samples/sec   Loss 6.8077   LearningRate 0.0149   Epoch: 24   Global Step: 124250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:28,991-Speed 11247.76 samples/sec   Loss 6.9983   LearningRate 0.0149   Epoch: 24   Global Step: 124260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:29,920-Speed 11035.48 samples/sec   Loss 6.9870   LearningRate 0.0149   Epoch: 24   Global Step: 124270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:30,866-Speed 10826.65 samples/sec   Loss 6.8617   LearningRate 0.0149   Epoch: 24   Global Step: 124280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:31,834-Speed 10585.67 samples/sec   Loss 6.8298   LearningRate 0.0149   Epoch: 24   Global Step: 124290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:32,774-Speed 10898.68 samples/sec   Loss 6.7586   LearningRate 0.0149   Epoch: 24   Global Step: 124300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:33,754-Speed 10456.78 samples/sec   Loss 6.9390   LearningRate 0.0149   Epoch: 24   Global Step: 124310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:34,694-Speed 10901.64 samples/sec   Loss 7.0020   LearningRate 0.0149   Epoch: 24   Global Step: 124320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:35,629-Speed 10965.31 samples/sec   Loss 6.8747   LearningRate 0.0149   Epoch: 24   Global Step: 124330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:36,589-Speed 10667.50 samples/sec   Loss 6.9161   LearningRate 0.0149   Epoch: 24   Global Step: 124340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:37,566-Speed 10491.27 samples/sec   Loss 6.8364   LearningRate 0.0149   Epoch: 24   Global Step: 124350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:38,548-Speed 10439.36 samples/sec   Loss 6.9359   LearningRate 0.0148   Epoch: 24   Global Step: 124360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:39,500-Speed 10765.84 samples/sec   Loss 6.9311   LearningRate 0.0148   Epoch: 24   Global Step: 124370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:40,456-Speed 10723.71 samples/sec   Loss 6.8371   LearningRate 0.0148   Epoch: 24   Global Step: 124380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:41,410-Speed 10744.61 samples/sec   Loss 7.1974   LearningRate 0.0148   Epoch: 24   Global Step: 124390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:42,390-Speed 10450.87 samples/sec   Loss 6.9956   LearningRate 0.0148   Epoch: 24   Global Step: 124400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:43,382-Speed 10326.79 samples/sec   Loss 6.9781   LearningRate 0.0148   Epoch: 24   Global Step: 124410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:44,330-Speed 10818.73 samples/sec   Loss 7.0010   LearningRate 0.0148   Epoch: 24   Global Step: 124420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:45,278-Speed 10812.58 samples/sec   Loss 6.9160   LearningRate 0.0148   Epoch: 24   Global Step: 124430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:53:46,247-Speed 10578.91 samples/sec   Loss 6.8162   LearningRate 0.0148   Epoch: 24   Global Step: 124440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:53:47,204-Speed 10709.35 samples/sec   Loss 6.9835   LearningRate 0.0148   Epoch: 24   Global Step: 124450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:53:48,161-Speed 10717.65 samples/sec   Loss 6.9098   LearningRate 0.0148   Epoch: 24   Global Step: 124460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:53:49,085-Speed 11085.69 samples/sec   Loss 6.9518   LearningRate 0.0148   Epoch: 24   Global Step: 124470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:53:50,064-Speed 10474.40 samples/sec   Loss 7.0428   LearningRate 0.0148   Epoch: 24   Global Step: 124480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:53:51,003-Speed 10915.82 samples/sec   Loss 7.0271   LearningRate 0.0148   Epoch: 24   Global Step: 124490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:53:51,968-Speed 10619.38 samples/sec   Loss 6.8973   LearningRate 0.0148   Epoch: 24   Global Step: 124500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:53:52,922-Speed 10744.34 samples/sec   Loss 6.7761   LearningRate 0.0148   Epoch: 24   Global Step: 124510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:53:53,851-Speed 11023.87 samples/sec   Loss 6.8869   LearningRate 0.0148   Epoch: 24   Global Step: 124520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:53:54,813-Speed 10658.93 samples/sec   Loss 6.8252   LearningRate 0.0148   Epoch: 24   Global Step: 124530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:55,756-Speed 10868.61 samples/sec   Loss 6.8562   LearningRate 0.0148   Epoch: 24   Global Step: 124540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:56,693-Speed 10941.69 samples/sec   Loss 6.9313   LearningRate 0.0148   Epoch: 24   Global Step: 124550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:57,640-Speed 10817.76 samples/sec   Loss 6.8693   LearningRate 0.0148   Epoch: 24   Global Step: 124560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:58,569-Speed 11033.39 samples/sec   Loss 7.0688   LearningRate 0.0148   Epoch: 24   Global Step: 124570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:53:59,523-Speed 10737.71 samples/sec   Loss 6.8860   LearningRate 0.0148   Epoch: 24   Global Step: 124580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:00,446-Speed 11104.36 samples/sec   Loss 6.9771   LearningRate 0.0148   Epoch: 24   Global Step: 124590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:01,383-Speed 10935.98 samples/sec   Loss 6.8235   LearningRate 0.0148   Epoch: 24   Global Step: 124600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:02,324-Speed 10887.42 samples/sec   Loss 6.9728   LearningRate 0.0148   Epoch: 24   Global Step: 124610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:03,325-Speed 10247.78 samples/sec   Loss 6.9063   LearningRate 0.0147   Epoch: 24   Global Step: 124620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:04,262-Speed 10937.34 samples/sec   Loss 6.8738   LearningRate 0.0147   Epoch: 24   Global Step: 124630   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:54:05,192-Speed 11017.52 samples/sec   Loss 7.0165   LearningRate 0.0147   Epoch: 24   Global Step: 124640   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:54:06,146-Speed 10740.44 samples/sec   Loss 6.8907   LearningRate 0.0147   Epoch: 24   Global Step: 124650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:07,123-Speed 10499.76 samples/sec   Loss 6.8895   LearningRate 0.0147   Epoch: 24   Global Step: 124660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:08,015-Speed 11478.55 samples/sec   Loss 6.9985   LearningRate 0.0147   Epoch: 24   Global Step: 124670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:08,993-Speed 10477.38 samples/sec   Loss 6.9051   LearningRate 0.0147   Epoch: 24   Global Step: 124680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:09,979-Speed 10400.88 samples/sec   Loss 6.8229   LearningRate 0.0147   Epoch: 24   Global Step: 124690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:10,908-Speed 11037.13 samples/sec   Loss 7.1157   LearningRate 0.0147   Epoch: 24   Global Step: 124700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:11,853-Speed 10839.38 samples/sec   Loss 7.0406   LearningRate 0.0147   Epoch: 24   Global Step: 124710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:12,795-Speed 10879.64 samples/sec   Loss 7.0848   LearningRate 0.0147   Epoch: 24   Global Step: 124720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:13,778-Speed 10423.29 samples/sec   Loss 6.8712   LearningRate 0.0147   Epoch: 24   Global Step: 124730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:54:14,733-Speed 10734.47 samples/sec   Loss 6.9752   LearningRate 0.0147   Epoch: 24   Global Step: 124740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:54:15,685-Speed 10763.36 samples/sec   Loss 6.9948   LearningRate 0.0147   Epoch: 24   Global Step: 124750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:54:16,617-Speed 11003.62 samples/sec   Loss 7.1320   LearningRate 0.0147   Epoch: 24   Global Step: 124760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:54:17,556-Speed 10911.99 samples/sec   Loss 6.8646   LearningRate 0.0147   Epoch: 24   Global Step: 124770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:54:18,517-Speed 10674.14 samples/sec   Loss 7.0066   LearningRate 0.0147   Epoch: 24   Global Step: 124780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:54:19,499-Speed 10432.67 samples/sec   Loss 6.8326   LearningRate 0.0147   Epoch: 24   Global Step: 124790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:54:20,443-Speed 10853.33 samples/sec   Loss 6.9769   LearningRate 0.0147   Epoch: 24   Global Step: 124800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:54:21,425-Speed 10443.36 samples/sec   Loss 6.8681   LearningRate 0.0147   Epoch: 24   Global Step: 124810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:54:22,376-Speed 10775.45 samples/sec   Loss 6.9523   LearningRate 0.0147   Epoch: 24   Global Step: 124820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:54:23,337-Speed 10670.21 samples/sec   Loss 7.0507   LearningRate 0.0147   Epoch: 24   Global Step: 124830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:24,306-Speed 10565.75 samples/sec   Loss 6.9553   LearningRate 0.0147   Epoch: 24   Global Step: 124840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:25,285-Speed 10475.68 samples/sec   Loss 6.9274   LearningRate 0.0147   Epoch: 24   Global Step: 124850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:26,263-Speed 10469.87 samples/sec   Loss 6.8714   LearningRate 0.0147   Epoch: 24   Global Step: 124860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:27,237-Speed 10528.69 samples/sec   Loss 7.0847   LearningRate 0.0147   Epoch: 24   Global Step: 124870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:28,192-Speed 10723.83 samples/sec   Loss 6.8430   LearningRate 0.0147   Epoch: 24   Global Step: 124880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:29,146-Speed 10739.85 samples/sec   Loss 6.8425   LearningRate 0.0146   Epoch: 24   Global Step: 124890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:30,083-Speed 10942.51 samples/sec   Loss 6.9636   LearningRate 0.0146   Epoch: 24   Global Step: 124900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:31,046-Speed 10641.96 samples/sec   Loss 6.8894   LearningRate 0.0146   Epoch: 24   Global Step: 124910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:54:31,983-Speed 10935.19 samples/sec   Loss 6.8425   LearningRate 0.0146   Epoch: 24   Global Step: 124920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:54:32,943-Speed 10679.11 samples/sec   Loss 6.9259   LearningRate 0.0146   Epoch: 24   Global Step: 124930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:54:33,869-Speed 11073.35 samples/sec   Loss 6.9470   LearningRate 0.0146   Epoch: 24   Global Step: 124940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:54:34,839-Speed 10561.97 samples/sec   Loss 6.9225   LearningRate 0.0146   Epoch: 24   Global Step: 124950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:54:35,764-Speed 11082.45 samples/sec   Loss 6.9779   LearningRate 0.0146   Epoch: 24   Global Step: 124960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:54:36,738-Speed 10527.54 samples/sec   Loss 6.8807   LearningRate 0.0146   Epoch: 24   Global Step: 124970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:54:37,670-Speed 10993.81 samples/sec   Loss 7.0032   LearningRate 0.0146   Epoch: 24   Global Step: 124980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:54:38,624-Speed 10744.23 samples/sec   Loss 7.0101   LearningRate 0.0146   Epoch: 24   Global Step: 124990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:54:39,575-Speed 10767.79 samples/sec   Loss 7.0869   LearningRate 0.0146   Epoch: 24   Global Step: 125000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:54:40,518-Speed 10873.47 samples/sec   Loss 6.9219   LearningRate 0.0146   Epoch: 24   Global Step: 125010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:41,471-Speed 10746.04 samples/sec   Loss 6.9577   LearningRate 0.0146   Epoch: 24   Global Step: 125020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:42,397-Speed 11068.96 samples/sec   Loss 6.9097   LearningRate 0.0146   Epoch: 24   Global Step: 125030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:43,343-Speed 10832.46 samples/sec   Loss 6.8711   LearningRate 0.0146   Epoch: 24   Global Step: 125040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:44,306-Speed 10644.30 samples/sec   Loss 6.8855   LearningRate 0.0146   Epoch: 24   Global Step: 125050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:45,268-Speed 10653.20 samples/sec   Loss 6.9360   LearningRate 0.0146   Epoch: 24   Global Step: 125060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:46,210-Speed 10881.41 samples/sec   Loss 6.9196   LearningRate 0.0146   Epoch: 24   Global Step: 125070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:47,148-Speed 10924.44 samples/sec   Loss 6.8639   LearningRate 0.0146   Epoch: 24   Global Step: 125080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:48,091-Speed 10874.84 samples/sec   Loss 6.9036   LearningRate 0.0146   Epoch: 24   Global Step: 125090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:49,058-Speed 10590.60 samples/sec   Loss 6.8880   LearningRate 0.0146   Epoch: 24   Global Step: 125100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:50,039-Speed 10448.75 samples/sec   Loss 6.9854   LearningRate 0.0146   Epoch: 24   Global Step: 125110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:54:50,977-Speed 10924.35 samples/sec   Loss 6.8751   LearningRate 0.0146   Epoch: 24   Global Step: 125120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:51,939-Speed 10651.58 samples/sec   Loss 6.9162   LearningRate 0.0146   Epoch: 24   Global Step: 125130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:52,927-Speed 10383.15 samples/sec   Loss 6.9624   LearningRate 0.0146   Epoch: 24   Global Step: 125140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:53,862-Speed 10962.34 samples/sec   Loss 7.0288   LearningRate 0.0145   Epoch: 24   Global Step: 125150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:54,792-Speed 11016.81 samples/sec   Loss 6.8704   LearningRate 0.0145   Epoch: 24   Global Step: 125160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:55,724-Speed 10993.25 samples/sec   Loss 6.9433   LearningRate 0.0145   Epoch: 24   Global Step: 125170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:56,659-Speed 10959.95 samples/sec   Loss 6.8778   LearningRate 0.0145   Epoch: 24   Global Step: 125180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:57,678-Speed 10056.00 samples/sec   Loss 7.0479   LearningRate 0.0145   Epoch: 24   Global Step: 125190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:58,650-Speed 10541.59 samples/sec   Loss 6.9514   LearningRate 0.0145   Epoch: 24   Global Step: 125200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:54:59,619-Speed 10576.13 samples/sec   Loss 6.9706   LearningRate 0.0145   Epoch: 24   Global Step: 125210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:00,580-Speed 10666.20 samples/sec   Loss 7.0544   LearningRate 0.0145   Epoch: 24   Global Step: 125220   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:55:01,517-Speed 10932.37 samples/sec   Loss 7.0202   LearningRate 0.0145   Epoch: 24   Global Step: 125230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:55:02,473-Speed 10720.99 samples/sec   Loss 7.0933   LearningRate 0.0145   Epoch: 24   Global Step: 125240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:03,436-Speed 10638.92 samples/sec   Loss 6.8996   LearningRate 0.0145   Epoch: 24   Global Step: 125250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:04,355-Speed 11152.12 samples/sec   Loss 6.9700   LearningRate 0.0145   Epoch: 24   Global Step: 125260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:05,322-Speed 10602.71 samples/sec   Loss 6.7421   LearningRate 0.0145   Epoch: 24   Global Step: 125270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:06,296-Speed 10519.48 samples/sec   Loss 6.8875   LearningRate 0.0145   Epoch: 24   Global Step: 125280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:07,285-Speed 10359.43 samples/sec   Loss 6.7537   LearningRate 0.0145   Epoch: 24   Global Step: 125290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:08,245-Speed 10679.47 samples/sec   Loss 6.9868   LearningRate 0.0145   Epoch: 24   Global Step: 125300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:09,228-Speed 10425.92 samples/sec   Loss 6.8157   LearningRate 0.0145   Epoch: 24   Global Step: 125310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:10,157-Speed 11034.14 samples/sec   Loss 6.9428   LearningRate 0.0145   Epoch: 24   Global Step: 125320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 03:55:11,096-Speed 10914.75 samples/sec   Loss 6.8222   LearningRate 0.0145   Epoch: 24   Global Step: 125330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 03:55:12,047-Speed 10769.44 samples/sec   Loss 6.8598   LearningRate 0.0145   Epoch: 24   Global Step: 125340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 03:55:13,005-Speed 10697.78 samples/sec   Loss 6.8857   LearningRate 0.0145   Epoch: 24   Global Step: 125350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 03:55:13,954-Speed 10804.89 samples/sec   Loss 6.7371   LearningRate 0.0145   Epoch: 24   Global Step: 125360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 03:55:14,900-Speed 10835.41 samples/sec   Loss 6.9178   LearningRate 0.0145   Epoch: 24   Global Step: 125370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 03:55:15,858-Speed 10699.50 samples/sec   Loss 6.8560   LearningRate 0.0145   Epoch: 24   Global Step: 125380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 03:55:16,782-Speed 11093.99 samples/sec   Loss 7.0060   LearningRate 0.0145   Epoch: 24   Global Step: 125390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 03:55:17,721-Speed 10914.50 samples/sec   Loss 7.0017   LearningRate 0.0145   Epoch: 24   Global Step: 125400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 03:55:18,683-Speed 10651.31 samples/sec   Loss 6.9944   LearningRate 0.0145   Epoch: 24   Global Step: 125410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 03:55:19,640-Speed 10708.72 samples/sec   Loss 6.9354   LearningRate 0.0144   Epoch: 24   Global Step: 125420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:20,608-Speed 10581.88 samples/sec   Loss 6.8744   LearningRate 0.0144   Epoch: 24   Global Step: 125430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:21,589-Speed 10478.97 samples/sec   Loss 7.0093   LearningRate 0.0144   Epoch: 24   Global Step: 125440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:22,575-Speed 10392.74 samples/sec   Loss 6.8901   LearningRate 0.0144   Epoch: 24   Global Step: 125450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:23,538-Speed 10643.67 samples/sec   Loss 6.9583   LearningRate 0.0144   Epoch: 24   Global Step: 125460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:29,664-Speed 10879.10 samples/sec   Loss 6.9125   LearningRate 0.0144   Epoch: 24   Global Step: 125470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:30,587-Speed 11105.53 samples/sec   Loss 7.0128   LearningRate 0.0144   Epoch: 24   Global Step: 125480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:31,559-Speed 10546.97 samples/sec   Loss 6.9631   LearningRate 0.0144   Epoch: 24   Global Step: 125490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:32,527-Speed 10582.09 samples/sec   Loss 6.9749   LearningRate 0.0144   Epoch: 24   Global Step: 125500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:33,483-Speed 10727.14 samples/sec   Loss 6.9541   LearningRate 0.0144   Epoch: 24   Global Step: 125510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:34,428-Speed 10842.47 samples/sec   Loss 6.7541   LearningRate 0.0144   Epoch: 24   Global Step: 125520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:35,381-Speed 10753.06 samples/sec   Loss 6.9483   LearningRate 0.0144   Epoch: 24   Global Step: 125530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:36,340-Speed 10692.17 samples/sec   Loss 6.8065   LearningRate 0.0144   Epoch: 24   Global Step: 125540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:37,277-Speed 10938.23 samples/sec   Loss 6.7234   LearningRate 0.0144   Epoch: 24   Global Step: 125550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:38,231-Speed 10740.96 samples/sec   Loss 6.9826   LearningRate 0.0144   Epoch: 24   Global Step: 125560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:39,167-Speed 10942.23 samples/sec   Loss 6.9853   LearningRate 0.0144   Epoch: 24   Global Step: 125570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:40,199-Speed 9931.61 samples/sec   Loss 6.8689   LearningRate 0.0144   Epoch: 24   Global Step: 125580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:41,159-Speed 10674.84 samples/sec   Loss 7.0594   LearningRate 0.0144   Epoch: 24   Global Step: 125590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:42,095-Speed 10952.76 samples/sec   Loss 6.9861   LearningRate 0.0144   Epoch: 24   Global Step: 125600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:43,026-Speed 11008.16 samples/sec   Loss 7.0201   LearningRate 0.0144   Epoch: 24   Global Step: 125610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:43,996-Speed 10569.22 samples/sec   Loss 7.0700   LearningRate 0.0144   Epoch: 24   Global Step: 125620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:55:44,932-Speed 10953.58 samples/sec   Loss 7.0203   LearningRate 0.0144   Epoch: 24   Global Step: 125630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:45,896-Speed 10624.15 samples/sec   Loss 6.9785   LearningRate 0.0144   Epoch: 24   Global Step: 125640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:46,875-Speed 10471.66 samples/sec   Loss 6.9145   LearningRate 0.0144   Epoch: 24   Global Step: 125650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:47,823-Speed 10813.81 samples/sec   Loss 7.0284   LearningRate 0.0144   Epoch: 24   Global Step: 125660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:48,759-Speed 10943.93 samples/sec   Loss 6.9895   LearningRate 0.0144   Epoch: 24   Global Step: 125670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:49,730-Speed 10561.47 samples/sec   Loss 6.9529   LearningRate 0.0143   Epoch: 24   Global Step: 125680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:50,636-Speed 11304.24 samples/sec   Loss 6.9868   LearningRate 0.0143   Epoch: 24   Global Step: 125690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:51,621-Speed 10406.39 samples/sec   Loss 7.0152   LearningRate 0.0143   Epoch: 24   Global Step: 125700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:52,592-Speed 10568.56 samples/sec   Loss 6.9347   LearningRate 0.0143   Epoch: 24   Global Step: 125710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:53,562-Speed 10565.52 samples/sec   Loss 7.0029   LearningRate 0.0143   Epoch: 24   Global Step: 125720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:54,523-Speed 10659.78 samples/sec   Loss 6.8585   LearningRate 0.0143   Epoch: 24   Global Step: 125730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:55,488-Speed 10619.13 samples/sec   Loss 6.9379   LearningRate 0.0143   Epoch: 24   Global Step: 125740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:55:56,442-Speed 10746.35 samples/sec   Loss 6.8999   LearningRate 0.0143   Epoch: 24   Global Step: 125750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:57,421-Speed 10460.60 samples/sec   Loss 7.1166   LearningRate 0.0143   Epoch: 24   Global Step: 125760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:58,369-Speed 10818.64 samples/sec   Loss 6.8143   LearningRate 0.0143   Epoch: 24   Global Step: 125770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:55:59,334-Speed 10619.29 samples/sec   Loss 7.0572   LearningRate 0.0143   Epoch: 24   Global Step: 125780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:56:00,281-Speed 10813.42 samples/sec   Loss 7.1019   LearningRate 0.0143   Epoch: 24   Global Step: 125790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:56:01,221-Speed 10902.09 samples/sec   Loss 6.8990   LearningRate 0.0143   Epoch: 24   Global Step: 125800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:56:02,153-Speed 10997.79 samples/sec   Loss 7.0179   LearningRate 0.0143   Epoch: 24   Global Step: 125810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:56:03,096-Speed 10871.06 samples/sec   Loss 6.8859   LearningRate 0.0143   Epoch: 24   Global Step: 125820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:56:04,035-Speed 10917.75 samples/sec   Loss 6.8941   LearningRate 0.0143   Epoch: 24   Global Step: 125830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:56:04,975-Speed 10904.23 samples/sec   Loss 6.9942   LearningRate 0.0143   Epoch: 24   Global Step: 125840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:56:05,934-Speed 10683.47 samples/sec   Loss 7.0289   LearningRate 0.0143   Epoch: 24   Global Step: 125850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:56:06,926-Speed 10326.29 samples/sec   Loss 6.8926   LearningRate 0.0143   Epoch: 24   Global Step: 125860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:56:07,908-Speed 10435.88 samples/sec   Loss 6.9790   LearningRate 0.0143   Epoch: 24   Global Step: 125870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:56:08,847-Speed 10912.77 samples/sec   Loss 7.0516   LearningRate 0.0143   Epoch: 24   Global Step: 125880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:56:09,853-Speed 10190.27 samples/sec   Loss 6.8674   LearningRate 0.0143   Epoch: 24   Global Step: 125890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:56:10,788-Speed 10958.23 samples/sec   Loss 6.9261   LearningRate 0.0143   Epoch: 24   Global Step: 125900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:56:11,724-Speed 10952.80 samples/sec   Loss 7.0586   LearningRate 0.0143   Epoch: 24   Global Step: 125910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:56:12,681-Speed 10706.03 samples/sec   Loss 6.8870   LearningRate 0.0143   Epoch: 24   Global Step: 125920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:56:13,655-Speed 10519.19 samples/sec   Loss 7.0393   LearningRate 0.0143   Epoch: 24   Global Step: 125930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:56:14,634-Speed 10470.03 samples/sec   Loss 6.9167   LearningRate 0.0143   Epoch: 24   Global Step: 125940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:56:15,553-Speed 11158.16 samples/sec   Loss 6.9677   LearningRate 0.0142   Epoch: 24   Global Step: 125950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:56:16,504-Speed 10778.70 samples/sec   Loss 7.0322   LearningRate 0.0142   Epoch: 24   Global Step: 125960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:56:17,426-Speed 11121.76 samples/sec   Loss 7.0467   LearningRate 0.0142   Epoch: 24   Global Step: 125970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:56:18,449-Speed 10016.99 samples/sec   Loss 7.1150   LearningRate 0.0142   Epoch: 24   Global Step: 125980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:56:19,411-Speed 10647.15 samples/sec   Loss 6.9593   LearningRate 0.0142   Epoch: 24   Global Step: 125990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:56:20,377-Speed 10607.27 samples/sec   Loss 7.1112   LearningRate 0.0142   Epoch: 24   Global Step: 126000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:56:42,791-[lfw][126000]XNorm: 9.549155
Training: 2022-04-11 03:56:42,791-[lfw][126000]Accuracy-Flip: 0.99500+-0.00350
Training: 2022-04-11 03:56:42,792-[lfw][126000]Accuracy-Highest: 0.99700
Training: 2022-04-11 03:57:08,254-[cfp_fp][126000]XNorm: 8.209213
Training: 2022-04-11 03:57:08,255-[cfp_fp][126000]Accuracy-Flip: 0.96471+-0.01071
Training: 2022-04-11 03:57:08,255-[cfp_fp][126000]Accuracy-Highest: 0.96614
Training: 2022-04-11 03:57:30,352-[agedb_30][126000]XNorm: 9.290670
Training: 2022-04-11 03:57:30,353-[agedb_30][126000]Accuracy-Flip: 0.96683+-0.00914
Training: 2022-04-11 03:57:30,354-[agedb_30][126000]Accuracy-Highest: 0.97017
Training: 2022-04-11 03:57:31,302-Speed 144.38 samples/sec   Loss 6.9396   LearningRate 0.0142   Epoch: 24   Global Step: 126010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:57:32,284-Speed 10427.64 samples/sec   Loss 6.9135   LearningRate 0.0142   Epoch: 24   Global Step: 126020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:57:33,249-Speed 10618.36 samples/sec   Loss 6.9552   LearningRate 0.0142   Epoch: 24   Global Step: 126030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:57:34,209-Speed 10683.04 samples/sec   Loss 7.1009   LearningRate 0.0142   Epoch: 24   Global Step: 126040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:57:35,124-Speed 11198.91 samples/sec   Loss 7.0787   LearningRate 0.0142   Epoch: 24   Global Step: 126050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:57:36,047-Speed 11099.86 samples/sec   Loss 7.0781   LearningRate 0.0142   Epoch: 24   Global Step: 126060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:57:36,969-Speed 11115.46 samples/sec   Loss 6.9922   LearningRate 0.0142   Epoch: 24   Global Step: 126070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:57:37,866-Speed 11424.31 samples/sec   Loss 7.0389   LearningRate 0.0142   Epoch: 24   Global Step: 126080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:57:38,821-Speed 10732.67 samples/sec   Loss 6.8730   LearningRate 0.0142   Epoch: 24   Global Step: 126090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:57:39,764-Speed 10865.93 samples/sec   Loss 6.9482   LearningRate 0.0142   Epoch: 24   Global Step: 126100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:57:40,731-Speed 10598.23 samples/sec   Loss 6.9660   LearningRate 0.0142   Epoch: 24   Global Step: 126110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:57:41,658-Speed 11053.92 samples/sec   Loss 7.0553   LearningRate 0.0142   Epoch: 24   Global Step: 126120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:57:42,571-Speed 11221.75 samples/sec   Loss 6.8797   LearningRate 0.0142   Epoch: 24   Global Step: 126130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:57:43,516-Speed 10847.16 samples/sec   Loss 6.9940   LearningRate 0.0142   Epoch: 24   Global Step: 126140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:57:44,511-Speed 10300.75 samples/sec   Loss 7.0797   LearningRate 0.0142   Epoch: 24   Global Step: 126150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:57:45,440-Speed 11041.47 samples/sec   Loss 6.9345   LearningRate 0.0142   Epoch: 24   Global Step: 126160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:57:46,407-Speed 10594.00 samples/sec   Loss 7.1166   LearningRate 0.0142   Epoch: 24   Global Step: 126170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:57:47,340-Speed 10984.57 samples/sec   Loss 6.9866   LearningRate 0.0142   Epoch: 24   Global Step: 126180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:57:48,309-Speed 10578.32 samples/sec   Loss 6.8843   LearningRate 0.0142   Epoch: 24   Global Step: 126190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:57:49,244-Speed 10967.43 samples/sec   Loss 6.9785   LearningRate 0.0142   Epoch: 24   Global Step: 126200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:57:50,187-Speed 10868.68 samples/sec   Loss 7.0448   LearningRate 0.0142   Epoch: 24   Global Step: 126210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:57:51,108-Speed 11125.85 samples/sec   Loss 7.1399   LearningRate 0.0141   Epoch: 24   Global Step: 126220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:57:52,067-Speed 10689.68 samples/sec   Loss 7.1536   LearningRate 0.0141   Epoch: 24   Global Step: 126230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:57:53,011-Speed 10861.73 samples/sec   Loss 7.0373   LearningRate 0.0141   Epoch: 24   Global Step: 126240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:57:53,965-Speed 10742.05 samples/sec   Loss 7.0016   LearningRate 0.0141   Epoch: 24   Global Step: 126250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:57:54,930-Speed 10624.75 samples/sec   Loss 6.9761   LearningRate 0.0141   Epoch: 24   Global Step: 126260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:57:55,885-Speed 10733.99 samples/sec   Loss 6.9424   LearningRate 0.0141   Epoch: 24   Global Step: 126270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:57:56,875-Speed 10345.14 samples/sec   Loss 6.9748   LearningRate 0.0141   Epoch: 24   Global Step: 126280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:57:57,822-Speed 10824.24 samples/sec   Loss 6.9080   LearningRate 0.0141   Epoch: 24   Global Step: 126290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:57:58,736-Speed 11212.23 samples/sec   Loss 6.9919   LearningRate 0.0141   Epoch: 24   Global Step: 126300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:57:59,695-Speed 10684.45 samples/sec   Loss 7.0351   LearningRate 0.0141   Epoch: 24   Global Step: 126310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:00,619-Speed 11089.17 samples/sec   Loss 6.8876   LearningRate 0.0141   Epoch: 24   Global Step: 126320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:01,521-Speed 11362.23 samples/sec   Loss 7.0584   LearningRate 0.0141   Epoch: 24   Global Step: 126330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:02,512-Speed 10344.37 samples/sec   Loss 7.1149   LearningRate 0.0141   Epoch: 24   Global Step: 126340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:03,506-Speed 10310.97 samples/sec   Loss 7.1446   LearningRate 0.0141   Epoch: 24   Global Step: 126350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:04,455-Speed 10799.45 samples/sec   Loss 6.8049   LearningRate 0.0141   Epoch: 24   Global Step: 126360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:05,418-Speed 10652.87 samples/sec   Loss 7.0605   LearningRate 0.0141   Epoch: 24   Global Step: 126370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:06,334-Speed 11187.33 samples/sec   Loss 6.8368   LearningRate 0.0141   Epoch: 24   Global Step: 126380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:07,282-Speed 10799.05 samples/sec   Loss 6.9569   LearningRate 0.0141   Epoch: 24   Global Step: 126390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:08,209-Speed 11062.94 samples/sec   Loss 6.9441   LearningRate 0.0141   Epoch: 24   Global Step: 126400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:09,179-Speed 10573.78 samples/sec   Loss 7.0031   LearningRate 0.0141   Epoch: 24   Global Step: 126410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:58:10,137-Speed 10696.92 samples/sec   Loss 7.1422   LearningRate 0.0141   Epoch: 24   Global Step: 126420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:11,089-Speed 10766.53 samples/sec   Loss 7.2252   LearningRate 0.0141   Epoch: 24   Global Step: 126430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:12,112-Speed 10011.72 samples/sec   Loss 6.9835   LearningRate 0.0141   Epoch: 24   Global Step: 126440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:12,978-Speed 11840.63 samples/sec   Loss 6.9006   LearningRate 0.0141   Epoch: 24   Global Step: 126450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:23,709-Speed 954.40 samples/sec   Loss 6.1569   LearningRate 0.0141   Epoch: 25   Global Step: 126460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:25,107-Speed 7330.96 samples/sec   Loss 6.3287   LearningRate 0.0141   Epoch: 25   Global Step: 126470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:26,061-Speed 10739.85 samples/sec   Loss 6.2556   LearningRate 0.0141   Epoch: 25   Global Step: 126480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:27,227-Speed 8790.87 samples/sec   Loss 6.2607   LearningRate 0.0140   Epoch: 25   Global Step: 126490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:28,232-Speed 10201.22 samples/sec   Loss 6.0042   LearningRate 0.0140   Epoch: 25   Global Step: 126500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:29,229-Speed 10280.07 samples/sec   Loss 6.1475   LearningRate 0.0140   Epoch: 25   Global Step: 126510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:30,169-Speed 10894.97 samples/sec   Loss 6.2820   LearningRate 0.0140   Epoch: 25   Global Step: 126520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:31,114-Speed 10851.97 samples/sec   Loss 6.2155   LearningRate 0.0140   Epoch: 25   Global Step: 126530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:32,044-Speed 11022.93 samples/sec   Loss 6.1610   LearningRate 0.0140   Epoch: 25   Global Step: 126540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:33,004-Speed 10684.42 samples/sec   Loss 6.1511   LearningRate 0.0140   Epoch: 25   Global Step: 126550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:33,994-Speed 10350.73 samples/sec   Loss 6.2383   LearningRate 0.0140   Epoch: 25   Global Step: 126560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:34,933-Speed 10927.60 samples/sec   Loss 6.2594   LearningRate 0.0140   Epoch: 25   Global Step: 126570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:35,858-Speed 11079.07 samples/sec   Loss 6.2645   LearningRate 0.0140   Epoch: 25   Global Step: 126580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:36,827-Speed 10580.27 samples/sec   Loss 6.2527   LearningRate 0.0140   Epoch: 25   Global Step: 126590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:37,804-Speed 10486.46 samples/sec   Loss 6.2992   LearningRate 0.0140   Epoch: 25   Global Step: 126600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:38,773-Speed 10580.45 samples/sec   Loss 6.2083   LearningRate 0.0140   Epoch: 25   Global Step: 126610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:39,740-Speed 10599.60 samples/sec   Loss 6.2200   LearningRate 0.0140   Epoch: 25   Global Step: 126620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:40,645-Speed 11327.99 samples/sec   Loss 6.1801   LearningRate 0.0140   Epoch: 25   Global Step: 126630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:41,626-Speed 10441.43 samples/sec   Loss 6.1229   LearningRate 0.0140   Epoch: 25   Global Step: 126640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:42,608-Speed 10432.56 samples/sec   Loss 6.2260   LearningRate 0.0140   Epoch: 25   Global Step: 126650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:43,536-Speed 11054.80 samples/sec   Loss 6.1586   LearningRate 0.0140   Epoch: 25   Global Step: 126660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:44,494-Speed 10698.84 samples/sec   Loss 6.2424   LearningRate 0.0140   Epoch: 25   Global Step: 126670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:45,451-Speed 10721.79 samples/sec   Loss 6.2660   LearningRate 0.0140   Epoch: 25   Global Step: 126680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:46,400-Speed 10787.10 samples/sec   Loss 6.3174   LearningRate 0.0140   Epoch: 25   Global Step: 126690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:47,403-Speed 10222.42 samples/sec   Loss 6.2289   LearningRate 0.0140   Epoch: 25   Global Step: 126700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:48,351-Speed 10819.85 samples/sec   Loss 6.3433   LearningRate 0.0140   Epoch: 25   Global Step: 126710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:49,322-Speed 10554.34 samples/sec   Loss 6.3268   LearningRate 0.0140   Epoch: 25   Global Step: 126720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:50,308-Speed 10393.86 samples/sec   Loss 6.4187   LearningRate 0.0140   Epoch: 25   Global Step: 126730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:51,259-Speed 10781.82 samples/sec   Loss 6.2292   LearningRate 0.0140   Epoch: 25   Global Step: 126740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:52,221-Speed 10654.00 samples/sec   Loss 6.1991   LearningRate 0.0140   Epoch: 25   Global Step: 126750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:53,206-Speed 10405.65 samples/sec   Loss 6.2434   LearningRate 0.0139   Epoch: 25   Global Step: 126760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:54,201-Speed 10302.17 samples/sec   Loss 6.3201   LearningRate 0.0139   Epoch: 25   Global Step: 126770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:55,143-Speed 10889.57 samples/sec   Loss 6.2322   LearningRate 0.0139   Epoch: 25   Global Step: 126780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:56,091-Speed 10809.87 samples/sec   Loss 6.2742   LearningRate 0.0139   Epoch: 25   Global Step: 126790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:57,051-Speed 10671.08 samples/sec   Loss 6.4098   LearningRate 0.0139   Epoch: 25   Global Step: 126800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:58:58,018-Speed 10607.56 samples/sec   Loss 6.2863   LearningRate 0.0139   Epoch: 25   Global Step: 126810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:58,992-Speed 10519.38 samples/sec   Loss 6.3399   LearningRate 0.0139   Epoch: 25   Global Step: 126820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:58:59,953-Speed 10669.15 samples/sec   Loss 6.2827   LearningRate 0.0139   Epoch: 25   Global Step: 126830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:00,945-Speed 10329.69 samples/sec   Loss 6.1870   LearningRate 0.0139   Epoch: 25   Global Step: 126840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:01,924-Speed 10467.29 samples/sec   Loss 6.2962   LearningRate 0.0139   Epoch: 25   Global Step: 126850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:02,911-Speed 10379.88 samples/sec   Loss 6.3617   LearningRate 0.0139   Epoch: 25   Global Step: 126860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:03,978-Speed 9607.27 samples/sec   Loss 6.3020   LearningRate 0.0139   Epoch: 25   Global Step: 126870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:04,993-Speed 10107.16 samples/sec   Loss 6.3449   LearningRate 0.0139   Epoch: 25   Global Step: 126880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:05,956-Speed 10640.37 samples/sec   Loss 6.3269   LearningRate 0.0139   Epoch: 25   Global Step: 126890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:06,894-Speed 10933.03 samples/sec   Loss 6.3076   LearningRate 0.0139   Epoch: 25   Global Step: 126900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:07,857-Speed 10644.08 samples/sec   Loss 6.4204   LearningRate 0.0139   Epoch: 25   Global Step: 126910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:59:08,817-Speed 10678.34 samples/sec   Loss 6.3660   LearningRate 0.0139   Epoch: 25   Global Step: 126920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:09,797-Speed 10459.06 samples/sec   Loss 6.2462   LearningRate 0.0139   Epoch: 25   Global Step: 126930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:10,785-Speed 10373.01 samples/sec   Loss 6.3860   LearningRate 0.0139   Epoch: 25   Global Step: 126940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:11,752-Speed 10603.25 samples/sec   Loss 6.4478   LearningRate 0.0139   Epoch: 25   Global Step: 126950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:12,694-Speed 10871.74 samples/sec   Loss 6.3467   LearningRate 0.0139   Epoch: 25   Global Step: 126960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:13,659-Speed 10632.69 samples/sec   Loss 6.3951   LearningRate 0.0139   Epoch: 25   Global Step: 126970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:14,611-Speed 10763.57 samples/sec   Loss 6.2081   LearningRate 0.0139   Epoch: 25   Global Step: 126980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:15,564-Speed 10761.10 samples/sec   Loss 6.3478   LearningRate 0.0139   Epoch: 25   Global Step: 126990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:16,525-Speed 10682.84 samples/sec   Loss 6.2475   LearningRate 0.0139   Epoch: 25   Global Step: 127000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:17,450-Speed 11073.95 samples/sec   Loss 6.5247   LearningRate 0.0139   Epoch: 25   Global Step: 127010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:18,403-Speed 10752.54 samples/sec   Loss 6.4229   LearningRate 0.0139   Epoch: 25   Global Step: 127020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:19,366-Speed 10641.33 samples/sec   Loss 6.4544   LearningRate 0.0138   Epoch: 25   Global Step: 127030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:20,312-Speed 10835.15 samples/sec   Loss 6.1947   LearningRate 0.0138   Epoch: 25   Global Step: 127040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:21,273-Speed 10666.39 samples/sec   Loss 6.3367   LearningRate 0.0138   Epoch: 25   Global Step: 127050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:22,255-Speed 10435.47 samples/sec   Loss 6.3251   LearningRate 0.0138   Epoch: 25   Global Step: 127060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:23,237-Speed 10437.27 samples/sec   Loss 6.3994   LearningRate 0.0138   Epoch: 25   Global Step: 127070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:24,219-Speed 10439.80 samples/sec   Loss 6.1806   LearningRate 0.0138   Epoch: 25   Global Step: 127080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:25,117-Speed 11418.55 samples/sec   Loss 6.4414   LearningRate 0.0138   Epoch: 25   Global Step: 127090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:59:26,065-Speed 10811.08 samples/sec   Loss 6.3316   LearningRate 0.0138   Epoch: 25   Global Step: 127100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:59:27,026-Speed 10655.41 samples/sec   Loss 6.4432   LearningRate 0.0138   Epoch: 25   Global Step: 127110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:59:27,970-Speed 10857.82 samples/sec   Loss 6.5638   LearningRate 0.0138   Epoch: 25   Global Step: 127120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:59:28,936-Speed 10609.34 samples/sec   Loss 6.2506   LearningRate 0.0138   Epoch: 25   Global Step: 127130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:59:29,885-Speed 10799.80 samples/sec   Loss 6.4238   LearningRate 0.0138   Epoch: 25   Global Step: 127140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:59:30,905-Speed 10051.17 samples/sec   Loss 6.2836   LearningRate 0.0138   Epoch: 25   Global Step: 127150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:59:31,875-Speed 10568.38 samples/sec   Loss 6.5761   LearningRate 0.0138   Epoch: 25   Global Step: 127160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:59:32,852-Speed 10483.53 samples/sec   Loss 6.4173   LearningRate 0.0138   Epoch: 25   Global Step: 127170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:59:33,784-Speed 10997.79 samples/sec   Loss 6.5182   LearningRate 0.0138   Epoch: 25   Global Step: 127180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 03:59:34,784-Speed 10251.68 samples/sec   Loss 6.4126   LearningRate 0.0138   Epoch: 25   Global Step: 127190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:35,731-Speed 10829.54 samples/sec   Loss 6.4157   LearningRate 0.0138   Epoch: 25   Global Step: 127200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:36,675-Speed 10860.97 samples/sec   Loss 6.3185   LearningRate 0.0138   Epoch: 25   Global Step: 127210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:37,653-Speed 10479.89 samples/sec   Loss 6.4032   LearningRate 0.0138   Epoch: 25   Global Step: 127220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:38,649-Speed 10292.15 samples/sec   Loss 6.4364   LearningRate 0.0138   Epoch: 25   Global Step: 127230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:39,601-Speed 10760.29 samples/sec   Loss 6.6651   LearningRate 0.0138   Epoch: 25   Global Step: 127240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:40,545-Speed 10853.40 samples/sec   Loss 6.4368   LearningRate 0.0138   Epoch: 25   Global Step: 127250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:41,491-Speed 10829.71 samples/sec   Loss 6.5149   LearningRate 0.0138   Epoch: 25   Global Step: 127260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:42,427-Speed 10950.72 samples/sec   Loss 6.5298   LearningRate 0.0138   Epoch: 25   Global Step: 127270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:43,403-Speed 10508.52 samples/sec   Loss 6.5518   LearningRate 0.0138   Epoch: 25   Global Step: 127280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:44,339-Speed 10942.77 samples/sec   Loss 6.6085   LearningRate 0.0138   Epoch: 25   Global Step: 127290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:59:45,280-Speed 10898.04 samples/sec   Loss 6.4062   LearningRate 0.0137   Epoch: 25   Global Step: 127300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:46,211-Speed 11012.23 samples/sec   Loss 6.4700   LearningRate 0.0137   Epoch: 25   Global Step: 127310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:47,180-Speed 10572.80 samples/sec   Loss 6.3360   LearningRate 0.0137   Epoch: 25   Global Step: 127320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:48,179-Speed 10260.97 samples/sec   Loss 6.5092   LearningRate 0.0137   Epoch: 25   Global Step: 127330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:49,137-Speed 10705.21 samples/sec   Loss 6.5409   LearningRate 0.0137   Epoch: 25   Global Step: 127340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:50,083-Speed 10832.85 samples/sec   Loss 6.5183   LearningRate 0.0137   Epoch: 25   Global Step: 127350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:51,004-Speed 11132.32 samples/sec   Loss 6.5060   LearningRate 0.0137   Epoch: 25   Global Step: 127360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:51,936-Speed 10996.45 samples/sec   Loss 6.4790   LearningRate 0.0137   Epoch: 25   Global Step: 127370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:52,884-Speed 10805.82 samples/sec   Loss 6.3597   LearningRate 0.0137   Epoch: 25   Global Step: 127380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:53,857-Speed 10530.59 samples/sec   Loss 6.4829   LearningRate 0.0137   Epoch: 25   Global Step: 127390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:54,804-Speed 10822.06 samples/sec   Loss 6.5564   LearningRate 0.0137   Epoch: 25   Global Step: 127400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:59:55,768-Speed 10636.80 samples/sec   Loss 6.5146   LearningRate 0.0137   Epoch: 25   Global Step: 127410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:59:56,697-Speed 11027.28 samples/sec   Loss 6.6164   LearningRate 0.0137   Epoch: 25   Global Step: 127420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 03:59:57,676-Speed 10478.71 samples/sec   Loss 6.4925   LearningRate 0.0137   Epoch: 25   Global Step: 127430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:58,623-Speed 10815.88 samples/sec   Loss 6.5687   LearningRate 0.0137   Epoch: 25   Global Step: 127440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 03:59:59,620-Speed 10277.40 samples/sec   Loss 6.4655   LearningRate 0.0137   Epoch: 25   Global Step: 127450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:00,601-Speed 10441.15 samples/sec   Loss 6.5062   LearningRate 0.0137   Epoch: 25   Global Step: 127460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:00:01,591-Speed 10360.97 samples/sec   Loss 6.6015   LearningRate 0.0137   Epoch: 25   Global Step: 127470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:00:02,522-Speed 11010.07 samples/sec   Loss 6.3967   LearningRate 0.0137   Epoch: 25   Global Step: 127480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:00:03,461-Speed 10911.77 samples/sec   Loss 6.6179   LearningRate 0.0137   Epoch: 25   Global Step: 127490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:00:04,456-Speed 10298.40 samples/sec   Loss 6.4384   LearningRate 0.0137   Epoch: 25   Global Step: 127500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:00:05,410-Speed 10741.04 samples/sec   Loss 6.4049   LearningRate 0.0137   Epoch: 25   Global Step: 127510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:00:06,357-Speed 10821.94 samples/sec   Loss 6.5130   LearningRate 0.0137   Epoch: 25   Global Step: 127520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:00:07,280-Speed 11108.18 samples/sec   Loss 6.5170   LearningRate 0.0137   Epoch: 25   Global Step: 127530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:00:08,256-Speed 10503.99 samples/sec   Loss 6.5630   LearningRate 0.0137   Epoch: 25   Global Step: 127540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:00:09,200-Speed 10853.54 samples/sec   Loss 6.4797   LearningRate 0.0137   Epoch: 25   Global Step: 127550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:00:10,213-Speed 10113.33 samples/sec   Loss 6.5933   LearningRate 0.0137   Epoch: 25   Global Step: 127560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:11,180-Speed 10604.30 samples/sec   Loss 6.6038   LearningRate 0.0137   Epoch: 25   Global Step: 127570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:12,162-Speed 10436.69 samples/sec   Loss 6.5015   LearningRate 0.0136   Epoch: 25   Global Step: 127580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:13,119-Speed 10711.61 samples/sec   Loss 6.4998   LearningRate 0.0136   Epoch: 25   Global Step: 127590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:14,070-Speed 10773.09 samples/sec   Loss 6.4880   LearningRate 0.0136   Epoch: 25   Global Step: 127600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:15,054-Speed 10417.12 samples/sec   Loss 6.5031   LearningRate 0.0136   Epoch: 25   Global Step: 127610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:15,987-Speed 10982.61 samples/sec   Loss 6.5253   LearningRate 0.0136   Epoch: 25   Global Step: 127620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:16,978-Speed 10345.56 samples/sec   Loss 6.4042   LearningRate 0.0136   Epoch: 25   Global Step: 127630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:17,931-Speed 10757.69 samples/sec   Loss 6.5061   LearningRate 0.0136   Epoch: 25   Global Step: 127640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:18,898-Speed 10607.44 samples/sec   Loss 6.5344   LearningRate 0.0136   Epoch: 25   Global Step: 127650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:19,900-Speed 10229.41 samples/sec   Loss 6.4804   LearningRate 0.0136   Epoch: 25   Global Step: 127660   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:00:20,902-Speed 10229.87 samples/sec   Loss 6.5173   LearningRate 0.0136   Epoch: 25   Global Step: 127670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:21,854-Speed 10764.47 samples/sec   Loss 6.4874   LearningRate 0.0136   Epoch: 25   Global Step: 127680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:22,803-Speed 10801.28 samples/sec   Loss 6.5833   LearningRate 0.0136   Epoch: 25   Global Step: 127690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:23,765-Speed 10653.44 samples/sec   Loss 6.6976   LearningRate 0.0136   Epoch: 25   Global Step: 127700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:24,721-Speed 10713.55 samples/sec   Loss 6.5075   LearningRate 0.0136   Epoch: 25   Global Step: 127710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:25,661-Speed 10907.24 samples/sec   Loss 6.4919   LearningRate 0.0136   Epoch: 25   Global Step: 127720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:26,603-Speed 10882.58 samples/sec   Loss 6.6567   LearningRate 0.0136   Epoch: 25   Global Step: 127730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:27,584-Speed 10475.19 samples/sec   Loss 6.6798   LearningRate 0.0136   Epoch: 25   Global Step: 127740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:28,539-Speed 10723.98 samples/sec   Loss 6.4591   LearningRate 0.0136   Epoch: 25   Global Step: 127750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:29,500-Speed 10674.96 samples/sec   Loss 6.5685   LearningRate 0.0136   Epoch: 25   Global Step: 127760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:30,478-Speed 10470.65 samples/sec   Loss 6.6648   LearningRate 0.0136   Epoch: 25   Global Step: 127770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:31,441-Speed 10642.34 samples/sec   Loss 6.4970   LearningRate 0.0136   Epoch: 25   Global Step: 127780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:32,416-Speed 10517.81 samples/sec   Loss 6.5995   LearningRate 0.0136   Epoch: 25   Global Step: 127790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:33,372-Speed 10719.41 samples/sec   Loss 6.6260   LearningRate 0.0136   Epoch: 25   Global Step: 127800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:34,334-Speed 10652.44 samples/sec   Loss 6.6045   LearningRate 0.0136   Epoch: 25   Global Step: 127810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:35,259-Speed 11070.64 samples/sec   Loss 6.6934   LearningRate 0.0136   Epoch: 25   Global Step: 127820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:36,178-Speed 11157.74 samples/sec   Loss 6.5235   LearningRate 0.0136   Epoch: 25   Global Step: 127830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:37,124-Speed 10828.38 samples/sec   Loss 6.6110   LearningRate 0.0136   Epoch: 25   Global Step: 127840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:38,079-Speed 10725.53 samples/sec   Loss 6.6521   LearningRate 0.0135   Epoch: 25   Global Step: 127850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:39,078-Speed 10265.21 samples/sec   Loss 6.5540   LearningRate 0.0135   Epoch: 25   Global Step: 127860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:40,025-Speed 10825.73 samples/sec   Loss 6.6059   LearningRate 0.0135   Epoch: 25   Global Step: 127870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:00:40,944-Speed 11158.56 samples/sec   Loss 6.6516   LearningRate 0.0135   Epoch: 25   Global Step: 127880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:41,920-Speed 10495.30 samples/sec   Loss 6.6012   LearningRate 0.0135   Epoch: 25   Global Step: 127890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:42,883-Speed 10642.16 samples/sec   Loss 6.5164   LearningRate 0.0135   Epoch: 25   Global Step: 127900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:43,808-Speed 11089.18 samples/sec   Loss 6.4920   LearningRate 0.0135   Epoch: 25   Global Step: 127910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:44,783-Speed 10515.98 samples/sec   Loss 6.7502   LearningRate 0.0135   Epoch: 25   Global Step: 127920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:45,729-Speed 10834.56 samples/sec   Loss 6.5505   LearningRate 0.0135   Epoch: 25   Global Step: 127930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:46,664-Speed 10956.04 samples/sec   Loss 6.6015   LearningRate 0.0135   Epoch: 25   Global Step: 127940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:47,586-Speed 11117.96 samples/sec   Loss 6.6391   LearningRate 0.0135   Epoch: 25   Global Step: 127950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:48,525-Speed 10920.96 samples/sec   Loss 6.6916   LearningRate 0.0135   Epoch: 25   Global Step: 127960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:49,467-Speed 10871.61 samples/sec   Loss 6.6388   LearningRate 0.0135   Epoch: 25   Global Step: 127970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:50,476-Speed 10165.48 samples/sec   Loss 6.5641   LearningRate 0.0135   Epoch: 25   Global Step: 127980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:00:51,440-Speed 10626.29 samples/sec   Loss 6.5241   LearningRate 0.0135   Epoch: 25   Global Step: 127990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:00:52,402-Speed 10654.20 samples/sec   Loss 6.5508   LearningRate 0.0135   Epoch: 25   Global Step: 128000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:01:14,461-[lfw][128000]XNorm: 9.461514
Training: 2022-04-11 04:01:14,462-[lfw][128000]Accuracy-Flip: 0.99583+-0.00344
Training: 2022-04-11 04:01:14,463-[lfw][128000]Accuracy-Highest: 0.99700
Training: 2022-04-11 04:01:39,884-[cfp_fp][128000]XNorm: 8.056576
Training: 2022-04-11 04:01:39,885-[cfp_fp][128000]Accuracy-Flip: 0.96800+-0.01156
Training: 2022-04-11 04:01:39,886-[cfp_fp][128000]Accuracy-Highest: 0.96800
Training: 2022-04-11 04:02:02,173-[agedb_30][128000]XNorm: 9.249785
Training: 2022-04-11 04:02:02,174-[agedb_30][128000]Accuracy-Flip: 0.96650+-0.00970
Training: 2022-04-11 04:02:02,175-[agedb_30][128000]Accuracy-Highest: 0.97017
Training: 2022-04-11 04:02:03,131-Speed 144.78 samples/sec   Loss 6.5958   LearningRate 0.0135   Epoch: 25   Global Step: 128010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:04,088-Speed 10713.35 samples/sec   Loss 6.5324   LearningRate 0.0135   Epoch: 25   Global Step: 128020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:05,048-Speed 10675.20 samples/sec   Loss 6.7123   LearningRate 0.0135   Epoch: 25   Global Step: 128030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:05,978-Speed 11016.85 samples/sec   Loss 6.5670   LearningRate 0.0135   Epoch: 25   Global Step: 128040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:06,920-Speed 10877.91 samples/sec   Loss 6.5685   LearningRate 0.0135   Epoch: 25   Global Step: 128050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:07,854-Speed 10980.35 samples/sec   Loss 6.5876   LearningRate 0.0135   Epoch: 25   Global Step: 128060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:08,816-Speed 10656.55 samples/sec   Loss 6.7356   LearningRate 0.0135   Epoch: 25   Global Step: 128070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:09,747-Speed 11010.31 samples/sec   Loss 6.7463   LearningRate 0.0135   Epoch: 25   Global Step: 128080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:10,720-Speed 10542.05 samples/sec   Loss 6.6903   LearningRate 0.0135   Epoch: 25   Global Step: 128090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:11,709-Speed 10356.04 samples/sec   Loss 6.6269   LearningRate 0.0135   Epoch: 25   Global Step: 128100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:12,687-Speed 10474.40 samples/sec   Loss 6.5095   LearningRate 0.0135   Epoch: 25   Global Step: 128110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:13,686-Speed 10258.23 samples/sec   Loss 6.6344   LearningRate 0.0135   Epoch: 25   Global Step: 128120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:14,603-Speed 11184.68 samples/sec   Loss 6.7111   LearningRate 0.0134   Epoch: 25   Global Step: 128130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:15,542-Speed 10910.77 samples/sec   Loss 6.5453   LearningRate 0.0134   Epoch: 25   Global Step: 128140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:16,462-Speed 11147.35 samples/sec   Loss 6.6105   LearningRate 0.0134   Epoch: 25   Global Step: 128150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:17,379-Speed 11162.04 samples/sec   Loss 6.7046   LearningRate 0.0134   Epoch: 25   Global Step: 128160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:18,335-Speed 10728.90 samples/sec   Loss 6.5289   LearningRate 0.0134   Epoch: 25   Global Step: 128170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:19,358-Speed 10019.32 samples/sec   Loss 6.7219   LearningRate 0.0134   Epoch: 25   Global Step: 128180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:20,327-Speed 10576.80 samples/sec   Loss 6.5577   LearningRate 0.0134   Epoch: 25   Global Step: 128190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:21,267-Speed 10901.18 samples/sec   Loss 6.6401   LearningRate 0.0134   Epoch: 25   Global Step: 128200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:22,208-Speed 10879.56 samples/sec   Loss 6.4954   LearningRate 0.0134   Epoch: 25   Global Step: 128210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:23,182-Speed 10534.79 samples/sec   Loss 6.7449   LearningRate 0.0134   Epoch: 25   Global Step: 128220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:24,116-Speed 10964.60 samples/sec   Loss 6.8119   LearningRate 0.0134   Epoch: 25   Global Step: 128230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:25,062-Speed 10833.24 samples/sec   Loss 6.5883   LearningRate 0.0134   Epoch: 25   Global Step: 128240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:26,012-Speed 10791.06 samples/sec   Loss 6.7630   LearningRate 0.0134   Epoch: 25   Global Step: 128250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:26,944-Speed 10997.95 samples/sec   Loss 6.5232   LearningRate 0.0134   Epoch: 25   Global Step: 128260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:27,914-Speed 10567.39 samples/sec   Loss 6.5296   LearningRate 0.0134   Epoch: 25   Global Step: 128270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:28,885-Speed 10551.51 samples/sec   Loss 6.6164   LearningRate 0.0134   Epoch: 25   Global Step: 128280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:29,889-Speed 10212.07 samples/sec   Loss 6.6844   LearningRate 0.0134   Epoch: 25   Global Step: 128290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:30,853-Speed 10627.45 samples/sec   Loss 6.5788   LearningRate 0.0134   Epoch: 25   Global Step: 128300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:31,839-Speed 10395.43 samples/sec   Loss 6.5409   LearningRate 0.0134   Epoch: 25   Global Step: 128310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:32,767-Speed 11046.68 samples/sec   Loss 6.6783   LearningRate 0.0134   Epoch: 25   Global Step: 128320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:33,734-Speed 10596.43 samples/sec   Loss 6.6925   LearningRate 0.0134   Epoch: 25   Global Step: 128330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:34,666-Speed 10997.23 samples/sec   Loss 6.7726   LearningRate 0.0134   Epoch: 25   Global Step: 128340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:35,647-Speed 10450.59 samples/sec   Loss 6.7012   LearningRate 0.0134   Epoch: 25   Global Step: 128350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:36,633-Speed 10396.70 samples/sec   Loss 6.6379   LearningRate 0.0134   Epoch: 25   Global Step: 128360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:02:37,561-Speed 11034.67 samples/sec   Loss 6.6473   LearningRate 0.0134   Epoch: 25   Global Step: 128370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:02:38,516-Speed 10739.76 samples/sec   Loss 6.7301   LearningRate 0.0134   Epoch: 25   Global Step: 128380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:02:39,482-Speed 10605.87 samples/sec   Loss 6.6103   LearningRate 0.0134   Epoch: 25   Global Step: 128390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:40,449-Speed 10600.77 samples/sec   Loss 6.7154   LearningRate 0.0133   Epoch: 25   Global Step: 128400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:41,416-Speed 10598.17 samples/sec   Loss 6.6771   LearningRate 0.0133   Epoch: 25   Global Step: 128410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:42,385-Speed 10579.19 samples/sec   Loss 6.6635   LearningRate 0.0133   Epoch: 25   Global Step: 128420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:43,336-Speed 10769.21 samples/sec   Loss 6.6072   LearningRate 0.0133   Epoch: 25   Global Step: 128430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:44,355-Speed 10062.17 samples/sec   Loss 6.8695   LearningRate 0.0133   Epoch: 25   Global Step: 128440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:45,314-Speed 10683.00 samples/sec   Loss 6.5790   LearningRate 0.0133   Epoch: 25   Global Step: 128450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:46,232-Speed 11167.66 samples/sec   Loss 6.6571   LearningRate 0.0133   Epoch: 25   Global Step: 128460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:47,213-Speed 10448.05 samples/sec   Loss 6.7014   LearningRate 0.0133   Epoch: 25   Global Step: 128470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:48,166-Speed 10755.86 samples/sec   Loss 6.6346   LearningRate 0.0133   Epoch: 25   Global Step: 128480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:49,155-Speed 10370.47 samples/sec   Loss 6.6979   LearningRate 0.0133   Epoch: 25   Global Step: 128490   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:02:50,080-Speed 11082.71 samples/sec   Loss 6.6317   LearningRate 0.0133   Epoch: 25   Global Step: 128500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:02:50,993-Speed 11226.11 samples/sec   Loss 6.7991   LearningRate 0.0133   Epoch: 25   Global Step: 128510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:51,957-Speed 10630.48 samples/sec   Loss 6.7249   LearningRate 0.0133   Epoch: 25   Global Step: 128520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:52,945-Speed 10380.45 samples/sec   Loss 6.6841   LearningRate 0.0133   Epoch: 25   Global Step: 128530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:53,916-Speed 10555.49 samples/sec   Loss 6.6456   LearningRate 0.0133   Epoch: 25   Global Step: 128540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:02:54,874-Speed 10693.42 samples/sec   Loss 6.6837   LearningRate 0.0133   Epoch: 25   Global Step: 128550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:55,795-Speed 11125.75 samples/sec   Loss 6.8960   LearningRate 0.0133   Epoch: 25   Global Step: 128560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:56,748-Speed 10754.46 samples/sec   Loss 6.7872   LearningRate 0.0133   Epoch: 25   Global Step: 128570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:57,740-Speed 10329.29 samples/sec   Loss 6.7244   LearningRate 0.0133   Epoch: 25   Global Step: 128580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:58,716-Speed 10502.70 samples/sec   Loss 6.6260   LearningRate 0.0133   Epoch: 25   Global Step: 128590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:02:59,669-Speed 10759.39 samples/sec   Loss 6.6778   LearningRate 0.0133   Epoch: 25   Global Step: 128600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:03:00,618-Speed 10791.77 samples/sec   Loss 6.7612   LearningRate 0.0133   Epoch: 25   Global Step: 128610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:03:01,578-Speed 10680.25 samples/sec   Loss 6.7972   LearningRate 0.0133   Epoch: 25   Global Step: 128620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:03:02,552-Speed 10527.13 samples/sec   Loss 6.7098   LearningRate 0.0133   Epoch: 25   Global Step: 128630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:03:03,495-Speed 10862.70 samples/sec   Loss 6.5306   LearningRate 0.0133   Epoch: 25   Global Step: 128640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:03:04,480-Speed 10402.51 samples/sec   Loss 6.7267   LearningRate 0.0133   Epoch: 25   Global Step: 128650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:05,437-Speed 10711.18 samples/sec   Loss 6.6523   LearningRate 0.0133   Epoch: 25   Global Step: 128660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:06,405-Speed 10587.90 samples/sec   Loss 6.7564   LearningRate 0.0133   Epoch: 25   Global Step: 128670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:07,354-Speed 10792.28 samples/sec   Loss 6.7361   LearningRate 0.0132   Epoch: 25   Global Step: 128680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:08,311-Speed 10713.84 samples/sec   Loss 6.8746   LearningRate 0.0132   Epoch: 25   Global Step: 128690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:09,265-Speed 10743.69 samples/sec   Loss 6.7161   LearningRate 0.0132   Epoch: 25   Global Step: 128700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:10,244-Speed 10469.48 samples/sec   Loss 6.5693   LearningRate 0.0132   Epoch: 25   Global Step: 128710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:11,217-Speed 10533.14 samples/sec   Loss 6.7912   LearningRate 0.0132   Epoch: 25   Global Step: 128720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:12,207-Speed 10356.60 samples/sec   Loss 6.6728   LearningRate 0.0132   Epoch: 25   Global Step: 128730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:13,160-Speed 10750.52 samples/sec   Loss 6.7062   LearningRate 0.0132   Epoch: 25   Global Step: 128740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:14,105-Speed 10848.52 samples/sec   Loss 6.8207   LearningRate 0.0132   Epoch: 25   Global Step: 128750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:03:15,093-Speed 10381.64 samples/sec   Loss 6.6053   LearningRate 0.0132   Epoch: 25   Global Step: 128760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:03:16,047-Speed 10742.74 samples/sec   Loss 6.7205   LearningRate 0.0132   Epoch: 25   Global Step: 128770   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:03:17,005-Speed 10702.81 samples/sec   Loss 6.7924   LearningRate 0.0132   Epoch: 25   Global Step: 128780   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:03:17,957-Speed 10764.69 samples/sec   Loss 6.6901   LearningRate 0.0132   Epoch: 25   Global Step: 128790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:18,921-Speed 10646.32 samples/sec   Loss 6.8041   LearningRate 0.0132   Epoch: 25   Global Step: 128800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:19,940-Speed 10059.01 samples/sec   Loss 6.7224   LearningRate 0.0132   Epoch: 25   Global Step: 128810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:20,898-Speed 10700.52 samples/sec   Loss 6.6819   LearningRate 0.0132   Epoch: 25   Global Step: 128820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:21,848-Speed 10786.85 samples/sec   Loss 6.8174   LearningRate 0.0132   Epoch: 25   Global Step: 128830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:22,796-Speed 10805.94 samples/sec   Loss 6.8803   LearningRate 0.0132   Epoch: 25   Global Step: 128840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:23,729-Speed 10980.86 samples/sec   Loss 6.8600   LearningRate 0.0132   Epoch: 25   Global Step: 128850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:24,743-Speed 10109.02 samples/sec   Loss 6.6430   LearningRate 0.0132   Epoch: 25   Global Step: 128860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:25,706-Speed 10649.32 samples/sec   Loss 6.6649   LearningRate 0.0132   Epoch: 25   Global Step: 128870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:26,638-Speed 11000.97 samples/sec   Loss 6.7699   LearningRate 0.0132   Epoch: 25   Global Step: 128880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:27,599-Speed 10662.03 samples/sec   Loss 6.6583   LearningRate 0.0132   Epoch: 25   Global Step: 128890   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:03:28,567-Speed 10589.91 samples/sec   Loss 6.7500   LearningRate 0.0132   Epoch: 25   Global Step: 128900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:29,533-Speed 10611.26 samples/sec   Loss 6.8619   LearningRate 0.0132   Epoch: 25   Global Step: 128910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:03:30,495-Speed 10652.22 samples/sec   Loss 6.6376   LearningRate 0.0132   Epoch: 25   Global Step: 128920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:03:31,451-Speed 10725.53 samples/sec   Loss 6.7324   LearningRate 0.0132   Epoch: 25   Global Step: 128930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:03:32,426-Speed 10512.27 samples/sec   Loss 6.7752   LearningRate 0.0132   Epoch: 25   Global Step: 128940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:03:33,371-Speed 10843.12 samples/sec   Loss 6.7518   LearningRate 0.0132   Epoch: 25   Global Step: 128950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:03:34,345-Speed 10530.67 samples/sec   Loss 6.6255   LearningRate 0.0131   Epoch: 25   Global Step: 128960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:03:35,312-Speed 10594.92 samples/sec   Loss 6.7432   LearningRate 0.0131   Epoch: 25   Global Step: 128970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:03:36,283-Speed 10546.99 samples/sec   Loss 6.7798   LearningRate 0.0131   Epoch: 25   Global Step: 128980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:03:37,224-Speed 10895.76 samples/sec   Loss 6.7208   LearningRate 0.0131   Epoch: 25   Global Step: 128990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:03:38,205-Speed 10451.34 samples/sec   Loss 6.7442   LearningRate 0.0131   Epoch: 25   Global Step: 129000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:03:39,148-Speed 10875.91 samples/sec   Loss 6.7267   LearningRate 0.0131   Epoch: 25   Global Step: 129010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:40,106-Speed 10692.38 samples/sec   Loss 6.7182   LearningRate 0.0131   Epoch: 25   Global Step: 129020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:41,047-Speed 10904.75 samples/sec   Loss 6.7096   LearningRate 0.0131   Epoch: 25   Global Step: 129030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:42,003-Speed 10709.22 samples/sec   Loss 6.8256   LearningRate 0.0131   Epoch: 25   Global Step: 129040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:42,979-Speed 10500.82 samples/sec   Loss 6.7153   LearningRate 0.0131   Epoch: 25   Global Step: 129050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:43,918-Speed 10923.67 samples/sec   Loss 6.5990   LearningRate 0.0131   Epoch: 25   Global Step: 129060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:44,881-Speed 10644.13 samples/sec   Loss 6.7870   LearningRate 0.0131   Epoch: 25   Global Step: 129070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:45,820-Speed 10908.78 samples/sec   Loss 6.6738   LearningRate 0.0131   Epoch: 25   Global Step: 129080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:46,759-Speed 10915.09 samples/sec   Loss 6.5527   LearningRate 0.0131   Epoch: 25   Global Step: 129090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:47,723-Speed 10640.83 samples/sec   Loss 6.7224   LearningRate 0.0131   Epoch: 25   Global Step: 129100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:48,692-Speed 10578.41 samples/sec   Loss 6.8376   LearningRate 0.0131   Epoch: 25   Global Step: 129110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:03:49,643-Speed 10779.89 samples/sec   Loss 6.5115   LearningRate 0.0131   Epoch: 25   Global Step: 129120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:03:50,620-Speed 10491.84 samples/sec   Loss 6.7443   LearningRate 0.0131   Epoch: 25   Global Step: 129130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:51,594-Speed 10526.39 samples/sec   Loss 6.8311   LearningRate 0.0131   Epoch: 25   Global Step: 129140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:52,518-Speed 11086.24 samples/sec   Loss 6.6895   LearningRate 0.0131   Epoch: 25   Global Step: 129150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:53,520-Speed 10220.49 samples/sec   Loss 6.6701   LearningRate 0.0131   Epoch: 25   Global Step: 129160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:54,503-Speed 10433.55 samples/sec   Loss 6.7481   LearningRate 0.0131   Epoch: 25   Global Step: 129170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:55,461-Speed 10711.02 samples/sec   Loss 6.6158   LearningRate 0.0131   Epoch: 25   Global Step: 129180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:56,396-Speed 10954.12 samples/sec   Loss 6.7872   LearningRate 0.0131   Epoch: 25   Global Step: 129190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:57,324-Speed 11051.39 samples/sec   Loss 6.7161   LearningRate 0.0131   Epoch: 25   Global Step: 129200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:58,305-Speed 10445.70 samples/sec   Loss 6.9812   LearningRate 0.0131   Epoch: 25   Global Step: 129210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:03:59,319-Speed 10105.44 samples/sec   Loss 6.6929   LearningRate 0.0131   Epoch: 25   Global Step: 129220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:00,282-Speed 10640.49 samples/sec   Loss 6.7567   LearningRate 0.0131   Epoch: 25   Global Step: 129230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:04:01,212-Speed 11016.61 samples/sec   Loss 6.7241   LearningRate 0.0130   Epoch: 25   Global Step: 129240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:04:02,158-Speed 10833.50 samples/sec   Loss 6.8657   LearningRate 0.0130   Epoch: 25   Global Step: 129250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:04:03,145-Speed 10396.08 samples/sec   Loss 6.7389   LearningRate 0.0130   Epoch: 25   Global Step: 129260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:04:04,083-Speed 10924.75 samples/sec   Loss 6.6175   LearningRate 0.0130   Epoch: 25   Global Step: 129270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:04:05,026-Speed 10868.11 samples/sec   Loss 6.7864   LearningRate 0.0130   Epoch: 25   Global Step: 129280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:04:05,942-Speed 11187.69 samples/sec   Loss 6.8415   LearningRate 0.0130   Epoch: 25   Global Step: 129290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:04:06,891-Speed 10795.36 samples/sec   Loss 6.8350   LearningRate 0.0130   Epoch: 25   Global Step: 129300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:07,886-Speed 10307.56 samples/sec   Loss 6.7674   LearningRate 0.0130   Epoch: 25   Global Step: 129310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:08,833-Speed 10817.20 samples/sec   Loss 6.8325   LearningRate 0.0130   Epoch: 25   Global Step: 129320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:09,786-Speed 10755.95 samples/sec   Loss 6.8528   LearningRate 0.0130   Epoch: 25   Global Step: 129330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:04:10,794-Speed 10166.27 samples/sec   Loss 6.9253   LearningRate 0.0130   Epoch: 25   Global Step: 129340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:04:11,790-Speed 10289.71 samples/sec   Loss 6.6860   LearningRate 0.0130   Epoch: 25   Global Step: 129350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:04:12,741-Speed 10781.27 samples/sec   Loss 6.7693   LearningRate 0.0130   Epoch: 25   Global Step: 129360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:04:13,687-Speed 10833.29 samples/sec   Loss 6.7228   LearningRate 0.0130   Epoch: 25   Global Step: 129370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:04:14,664-Speed 10483.28 samples/sec   Loss 6.9091   LearningRate 0.0130   Epoch: 25   Global Step: 129380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:04:15,640-Speed 10501.86 samples/sec   Loss 6.8621   LearningRate 0.0130   Epoch: 25   Global Step: 129390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:04:16,569-Speed 11032.37 samples/sec   Loss 6.6956   LearningRate 0.0130   Epoch: 25   Global Step: 129400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:04:17,511-Speed 10886.26 samples/sec   Loss 6.7716   LearningRate 0.0130   Epoch: 25   Global Step: 129410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:04:18,500-Speed 10359.52 samples/sec   Loss 6.8087   LearningRate 0.0130   Epoch: 25   Global Step: 129420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:04:19,475-Speed 10510.43 samples/sec   Loss 6.6993   LearningRate 0.0130   Epoch: 25   Global Step: 129430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:20,411-Speed 10956.92 samples/sec   Loss 6.7374   LearningRate 0.0130   Epoch: 25   Global Step: 129440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:21,369-Speed 10712.23 samples/sec   Loss 6.6959   LearningRate 0.0130   Epoch: 25   Global Step: 129450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:22,327-Speed 10706.77 samples/sec   Loss 6.7392   LearningRate 0.0130   Epoch: 25   Global Step: 129460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:23,289-Speed 10657.89 samples/sec   Loss 6.8088   LearningRate 0.0130   Epoch: 25   Global Step: 129470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:24,236-Speed 10814.37 samples/sec   Loss 6.7458   LearningRate 0.0130   Epoch: 25   Global Step: 129480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:25,228-Speed 10334.09 samples/sec   Loss 6.6461   LearningRate 0.0130   Epoch: 25   Global Step: 129490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:26,154-Speed 11073.26 samples/sec   Loss 6.8181   LearningRate 0.0130   Epoch: 25   Global Step: 129500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:27,107-Speed 10750.75 samples/sec   Loss 6.7093   LearningRate 0.0130   Epoch: 25   Global Step: 129510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:28,077-Speed 10571.00 samples/sec   Loss 6.8449   LearningRate 0.0129   Epoch: 25   Global Step: 129520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:29,078-Speed 10242.95 samples/sec   Loss 6.8265   LearningRate 0.0129   Epoch: 25   Global Step: 129530   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:04:30,025-Speed 10825.78 samples/sec   Loss 6.8737   LearningRate 0.0129   Epoch: 25   Global Step: 129540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:04:30,984-Speed 10680.84 samples/sec   Loss 6.7129   LearningRate 0.0129   Epoch: 25   Global Step: 129550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:04:31,961-Speed 10497.43 samples/sec   Loss 6.7550   LearningRate 0.0129   Epoch: 25   Global Step: 129560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:32,918-Speed 10714.92 samples/sec   Loss 6.8055   LearningRate 0.0129   Epoch: 25   Global Step: 129570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:33,901-Speed 10415.32 samples/sec   Loss 6.6249   LearningRate 0.0129   Epoch: 25   Global Step: 129580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:34,873-Speed 10547.32 samples/sec   Loss 6.7186   LearningRate 0.0129   Epoch: 25   Global Step: 129590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:35,826-Speed 10756.20 samples/sec   Loss 6.6777   LearningRate 0.0129   Epoch: 25   Global Step: 129600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:36,783-Speed 10710.56 samples/sec   Loss 6.6907   LearningRate 0.0129   Epoch: 25   Global Step: 129610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:04:37,759-Speed 10492.72 samples/sec   Loss 6.7128   LearningRate 0.0129   Epoch: 25   Global Step: 129620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:04:38,724-Speed 10621.25 samples/sec   Loss 6.9588   LearningRate 0.0129   Epoch: 25   Global Step: 129630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:04:39,688-Speed 10634.75 samples/sec   Loss 6.8283   LearningRate 0.0129   Epoch: 25   Global Step: 129640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:04:40,669-Speed 10450.28 samples/sec   Loss 6.8171   LearningRate 0.0129   Epoch: 25   Global Step: 129650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:04:41,609-Speed 10899.20 samples/sec   Loss 6.7784   LearningRate 0.0129   Epoch: 25   Global Step: 129660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:04:42,541-Speed 11004.20 samples/sec   Loss 6.8603   LearningRate 0.0129   Epoch: 25   Global Step: 129670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:04:43,507-Speed 10602.06 samples/sec   Loss 6.9546   LearningRate 0.0129   Epoch: 25   Global Step: 129680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:04:44,446-Speed 10910.26 samples/sec   Loss 6.7984   LearningRate 0.0129   Epoch: 25   Global Step: 129690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:04:45,422-Speed 10510.34 samples/sec   Loss 6.8690   LearningRate 0.0129   Epoch: 25   Global Step: 129700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:04:46,379-Speed 10703.05 samples/sec   Loss 6.9594   LearningRate 0.0129   Epoch: 25   Global Step: 129710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:47,317-Speed 10925.29 samples/sec   Loss 6.7742   LearningRate 0.0129   Epoch: 25   Global Step: 129720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:48,246-Speed 11038.67 samples/sec   Loss 6.7225   LearningRate 0.0129   Epoch: 25   Global Step: 129730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:49,208-Speed 10650.51 samples/sec   Loss 6.9136   LearningRate 0.0129   Epoch: 25   Global Step: 129740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:50,174-Speed 10603.95 samples/sec   Loss 6.6950   LearningRate 0.0129   Epoch: 25   Global Step: 129750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:51,130-Speed 10726.47 samples/sec   Loss 6.7366   LearningRate 0.0129   Epoch: 25   Global Step: 129760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:52,079-Speed 10798.10 samples/sec   Loss 6.8984   LearningRate 0.0129   Epoch: 25   Global Step: 129770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:53,034-Speed 10745.37 samples/sec   Loss 6.7171   LearningRate 0.0129   Epoch: 25   Global Step: 129780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:53,968-Speed 10968.96 samples/sec   Loss 6.8417   LearningRate 0.0129   Epoch: 25   Global Step: 129790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:54,921-Speed 10761.59 samples/sec   Loss 6.7965   LearningRate 0.0128   Epoch: 25   Global Step: 129800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:04:55,892-Speed 10557.49 samples/sec   Loss 6.8344   LearningRate 0.0128   Epoch: 25   Global Step: 129810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:04:56,860-Speed 10584.06 samples/sec   Loss 6.7714   LearningRate 0.0128   Epoch: 25   Global Step: 129820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:04:57,821-Speed 10669.71 samples/sec   Loss 6.7407   LearningRate 0.0128   Epoch: 25   Global Step: 129830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:04:58,771-Speed 10783.24 samples/sec   Loss 6.6771   LearningRate 0.0128   Epoch: 25   Global Step: 129840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:04:59,680-Speed 11283.42 samples/sec   Loss 6.7245   LearningRate 0.0128   Epoch: 25   Global Step: 129850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:05:00,634-Speed 10733.63 samples/sec   Loss 6.7199   LearningRate 0.0128   Epoch: 25   Global Step: 129860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:05:01,621-Speed 10389.05 samples/sec   Loss 6.8220   LearningRate 0.0128   Epoch: 25   Global Step: 129870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:05:02,572-Speed 10777.84 samples/sec   Loss 6.8296   LearningRate 0.0128   Epoch: 25   Global Step: 129880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:05:03,502-Speed 11026.01 samples/sec   Loss 6.7471   LearningRate 0.0128   Epoch: 25   Global Step: 129890   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:05:04,409-Speed 11302.88 samples/sec   Loss 6.6892   LearningRate 0.0128   Epoch: 25   Global Step: 129900   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:05:05,366-Speed 10704.90 samples/sec   Loss 6.8348   LearningRate 0.0128   Epoch: 25   Global Step: 129910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:05:06,315-Speed 10798.33 samples/sec   Loss 6.7904   LearningRate 0.0128   Epoch: 25   Global Step: 129920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:05:07,238-Speed 11105.43 samples/sec   Loss 6.7319   LearningRate 0.0128   Epoch: 25   Global Step: 129930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:05:08,154-Speed 11186.92 samples/sec   Loss 6.8156   LearningRate 0.0128   Epoch: 25   Global Step: 129940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:05:09,138-Speed 10414.34 samples/sec   Loss 6.7749   LearningRate 0.0128   Epoch: 25   Global Step: 129950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:05:10,072-Speed 10972.83 samples/sec   Loss 6.6417   LearningRate 0.0128   Epoch: 25   Global Step: 129960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:05:11,072-Speed 10253.41 samples/sec   Loss 6.8791   LearningRate 0.0128   Epoch: 25   Global Step: 129970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:05:12,024-Speed 10763.37 samples/sec   Loss 6.8186   LearningRate 0.0128   Epoch: 25   Global Step: 129980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:05:12,986-Speed 10657.15 samples/sec   Loss 6.8882   LearningRate 0.0128   Epoch: 25   Global Step: 129990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:05:13,965-Speed 10464.19 samples/sec   Loss 6.7004   LearningRate 0.0128   Epoch: 25   Global Step: 130000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:05:36,519-[lfw][130000]XNorm: 9.467653
Training: 2022-04-11 04:05:36,520-[lfw][130000]Accuracy-Flip: 0.99633+-0.00323
Training: 2022-04-11 04:05:36,521-[lfw][130000]Accuracy-Highest: 0.99700
Training: 2022-04-11 04:06:02,165-[cfp_fp][130000]XNorm: 8.067559
Training: 2022-04-11 04:06:02,166-[cfp_fp][130000]Accuracy-Flip: 0.96671+-0.00864
Training: 2022-04-11 04:06:02,167-[cfp_fp][130000]Accuracy-Highest: 0.96800
Training: 2022-04-11 04:06:24,549-[agedb_30][130000]XNorm: 9.222839
Training: 2022-04-11 04:06:24,550-[agedb_30][130000]Accuracy-Flip: 0.96850+-0.00965
Training: 2022-04-11 04:06:24,550-[agedb_30][130000]Accuracy-Highest: 0.97017
Training: 2022-04-11 04:06:25,459-Speed 143.23 samples/sec   Loss 6.7530   LearningRate 0.0128   Epoch: 25   Global Step: 130010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:06:26,399-Speed 10910.34 samples/sec   Loss 6.6821   LearningRate 0.0128   Epoch: 25   Global Step: 130020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:06:27,372-Speed 10535.12 samples/sec   Loss 6.6856   LearningRate 0.0128   Epoch: 25   Global Step: 130030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:06:28,317-Speed 10839.16 samples/sec   Loss 6.7534   LearningRate 0.0128   Epoch: 25   Global Step: 130040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:06:29,258-Speed 10903.54 samples/sec   Loss 6.8683   LearningRate 0.0128   Epoch: 25   Global Step: 130050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:06:30,211-Speed 10755.15 samples/sec   Loss 6.7604   LearningRate 0.0128   Epoch: 25   Global Step: 130060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:06:31,161-Speed 10796.91 samples/sec   Loss 6.8183   LearningRate 0.0128   Epoch: 25   Global Step: 130070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:06:32,114-Speed 10753.06 samples/sec   Loss 6.7464   LearningRate 0.0127   Epoch: 25   Global Step: 130080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:33,086-Speed 10541.43 samples/sec   Loss 6.7227   LearningRate 0.0127   Epoch: 25   Global Step: 130090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:34,079-Speed 10320.18 samples/sec   Loss 6.7511   LearningRate 0.0127   Epoch: 25   Global Step: 130100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:35,050-Speed 10562.33 samples/sec   Loss 6.7909   LearningRate 0.0127   Epoch: 25   Global Step: 130110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:36,018-Speed 10587.20 samples/sec   Loss 6.7381   LearningRate 0.0127   Epoch: 25   Global Step: 130120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:36,958-Speed 10906.43 samples/sec   Loss 6.7501   LearningRate 0.0127   Epoch: 25   Global Step: 130130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:37,874-Speed 11186.88 samples/sec   Loss 6.8079   LearningRate 0.0127   Epoch: 25   Global Step: 130140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:38,829-Speed 10731.72 samples/sec   Loss 6.8385   LearningRate 0.0127   Epoch: 25   Global Step: 130150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:39,781-Speed 10761.15 samples/sec   Loss 6.8097   LearningRate 0.0127   Epoch: 25   Global Step: 130160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:40,704-Speed 11107.99 samples/sec   Loss 6.8906   LearningRate 0.0127   Epoch: 25   Global Step: 130170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:41,640-Speed 10944.67 samples/sec   Loss 6.8289   LearningRate 0.0127   Epoch: 25   Global Step: 130180   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:06:42,589-Speed 10809.73 samples/sec   Loss 6.8928   LearningRate 0.0127   Epoch: 25   Global Step: 130190   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:06:43,539-Speed 10783.67 samples/sec   Loss 6.7375   LearningRate 0.0127   Epoch: 25   Global Step: 130200   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:06:44,492-Speed 10759.37 samples/sec   Loss 6.7805   LearningRate 0.0127   Epoch: 25   Global Step: 130210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:45,400-Speed 11292.04 samples/sec   Loss 6.5831   LearningRate 0.0127   Epoch: 25   Global Step: 130220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:46,348-Speed 10810.54 samples/sec   Loss 6.8737   LearningRate 0.0127   Epoch: 25   Global Step: 130230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:47,285-Speed 10941.99 samples/sec   Loss 6.7774   LearningRate 0.0127   Epoch: 25   Global Step: 130240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:48,274-Speed 10362.17 samples/sec   Loss 6.7737   LearningRate 0.0127   Epoch: 25   Global Step: 130250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:49,234-Speed 10680.22 samples/sec   Loss 6.8693   LearningRate 0.0127   Epoch: 25   Global Step: 130260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:50,234-Speed 10253.40 samples/sec   Loss 6.8195   LearningRate 0.0127   Epoch: 25   Global Step: 130270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:51,162-Speed 11048.04 samples/sec   Loss 6.8532   LearningRate 0.0127   Epoch: 25   Global Step: 130280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:52,118-Speed 10721.88 samples/sec   Loss 6.7271   LearningRate 0.0127   Epoch: 25   Global Step: 130290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:53,041-Speed 11106.25 samples/sec   Loss 6.8814   LearningRate 0.0127   Epoch: 25   Global Step: 130300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:53,997-Speed 10719.63 samples/sec   Loss 6.7985   LearningRate 0.0127   Epoch: 25   Global Step: 130310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:06:55,012-Speed 10093.80 samples/sec   Loss 6.7488   LearningRate 0.0127   Epoch: 25   Global Step: 130320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:55,970-Speed 10702.44 samples/sec   Loss 6.6721   LearningRate 0.0127   Epoch: 25   Global Step: 130330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:06:56,902-Speed 10996.19 samples/sec   Loss 6.7355   LearningRate 0.0127   Epoch: 25   Global Step: 130340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:06:57,885-Speed 10431.74 samples/sec   Loss 6.9330   LearningRate 0.0127   Epoch: 25   Global Step: 130350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:06:58,860-Speed 10515.41 samples/sec   Loss 6.9371   LearningRate 0.0127   Epoch: 25   Global Step: 130360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:06:59,812-Speed 10756.64 samples/sec   Loss 6.7661   LearningRate 0.0126   Epoch: 25   Global Step: 130370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:00,759-Speed 10829.52 samples/sec   Loss 6.7652   LearningRate 0.0126   Epoch: 25   Global Step: 130380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:01,704-Speed 10845.49 samples/sec   Loss 6.8228   LearningRate 0.0126   Epoch: 25   Global Step: 130390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:02,673-Speed 10576.85 samples/sec   Loss 6.8831   LearningRate 0.0126   Epoch: 25   Global Step: 130400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:03,598-Speed 11075.48 samples/sec   Loss 6.7089   LearningRate 0.0126   Epoch: 25   Global Step: 130410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:04,574-Speed 10502.64 samples/sec   Loss 6.7540   LearningRate 0.0126   Epoch: 25   Global Step: 130420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:05,539-Speed 10619.26 samples/sec   Loss 6.9653   LearningRate 0.0126   Epoch: 25   Global Step: 130430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:06,500-Speed 10671.29 samples/sec   Loss 6.8032   LearningRate 0.0126   Epoch: 25   Global Step: 130440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:07,468-Speed 10589.53 samples/sec   Loss 6.8207   LearningRate 0.0126   Epoch: 25   Global Step: 130450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:08,449-Speed 10445.20 samples/sec   Loss 6.8595   LearningRate 0.0126   Epoch: 25   Global Step: 130460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:09,442-Speed 10321.08 samples/sec   Loss 6.8684   LearningRate 0.0126   Epoch: 25   Global Step: 130470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:10,435-Speed 10317.33 samples/sec   Loss 6.6724   LearningRate 0.0126   Epoch: 25   Global Step: 130480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:11,375-Speed 10905.51 samples/sec   Loss 6.9491   LearningRate 0.0126   Epoch: 25   Global Step: 130490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:12,373-Speed 10272.60 samples/sec   Loss 6.8866   LearningRate 0.0126   Epoch: 25   Global Step: 130500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:13,309-Speed 10955.78 samples/sec   Loss 6.8689   LearningRate 0.0126   Epoch: 25   Global Step: 130510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:14,244-Speed 10951.84 samples/sec   Loss 6.9271   LearningRate 0.0126   Epoch: 25   Global Step: 130520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:15,207-Speed 10642.38 samples/sec   Loss 6.8588   LearningRate 0.0126   Epoch: 25   Global Step: 130530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:16,174-Speed 10601.16 samples/sec   Loss 6.6822   LearningRate 0.0126   Epoch: 25   Global Step: 130540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:17,137-Speed 10640.62 samples/sec   Loss 6.9053   LearningRate 0.0126   Epoch: 25   Global Step: 130550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:18,118-Speed 10447.64 samples/sec   Loss 6.8781   LearningRate 0.0126   Epoch: 25   Global Step: 130560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:19,077-Speed 10681.12 samples/sec   Loss 6.9298   LearningRate 0.0126   Epoch: 25   Global Step: 130570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:20,025-Speed 10810.77 samples/sec   Loss 6.7577   LearningRate 0.0126   Epoch: 25   Global Step: 130580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:20,991-Speed 10609.27 samples/sec   Loss 6.8194   LearningRate 0.0126   Epoch: 25   Global Step: 130590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:21,976-Speed 10409.99 samples/sec   Loss 6.7737   LearningRate 0.0126   Epoch: 25   Global Step: 130600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:22,911-Speed 10961.49 samples/sec   Loss 6.8225   LearningRate 0.0126   Epoch: 25   Global Step: 130610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:23,914-Speed 10220.56 samples/sec   Loss 6.8042   LearningRate 0.0126   Epoch: 25   Global Step: 130620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:24,872-Speed 10697.50 samples/sec   Loss 6.7233   LearningRate 0.0126   Epoch: 25   Global Step: 130630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:25,801-Speed 11028.56 samples/sec   Loss 6.9875   LearningRate 0.0126   Epoch: 25   Global Step: 130640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:26,755-Speed 10740.90 samples/sec   Loss 6.8049   LearningRate 0.0125   Epoch: 25   Global Step: 130650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:07:27,692-Speed 10933.98 samples/sec   Loss 6.6986   LearningRate 0.0125   Epoch: 25   Global Step: 130660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:28,661-Speed 10575.45 samples/sec   Loss 6.9617   LearningRate 0.0125   Epoch: 25   Global Step: 130670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:29,617-Speed 10730.01 samples/sec   Loss 6.7315   LearningRate 0.0125   Epoch: 25   Global Step: 130680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:30,594-Speed 10487.75 samples/sec   Loss 6.7215   LearningRate 0.0125   Epoch: 25   Global Step: 130690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:31,533-Speed 10917.18 samples/sec   Loss 6.6477   LearningRate 0.0125   Epoch: 25   Global Step: 130700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:32,483-Speed 10778.01 samples/sec   Loss 6.8049   LearningRate 0.0125   Epoch: 25   Global Step: 130710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:33,443-Speed 10682.20 samples/sec   Loss 6.6720   LearningRate 0.0125   Epoch: 25   Global Step: 130720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:34,404-Speed 10670.03 samples/sec   Loss 6.8726   LearningRate 0.0125   Epoch: 25   Global Step: 130730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:35,346-Speed 10867.69 samples/sec   Loss 6.8620   LearningRate 0.0125   Epoch: 25   Global Step: 130740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:36,307-Speed 10674.60 samples/sec   Loss 6.8228   LearningRate 0.0125   Epoch: 25   Global Step: 130750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:37,251-Speed 10850.91 samples/sec   Loss 6.8937   LearningRate 0.0125   Epoch: 25   Global Step: 130760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:07:38,213-Speed 10664.12 samples/sec   Loss 6.8214   LearningRate 0.0125   Epoch: 25   Global Step: 130770   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:07:39,150-Speed 10941.07 samples/sec   Loss 6.7442   LearningRate 0.0125   Epoch: 25   Global Step: 130780   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:07:40,082-Speed 10991.07 samples/sec   Loss 6.7915   LearningRate 0.0125   Epoch: 25   Global Step: 130790   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:07:41,044-Speed 10653.81 samples/sec   Loss 6.8467   LearningRate 0.0125   Epoch: 25   Global Step: 130800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:07:42,011-Speed 10606.16 samples/sec   Loss 6.7795   LearningRate 0.0125   Epoch: 25   Global Step: 130810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:07:42,970-Speed 10682.34 samples/sec   Loss 6.8889   LearningRate 0.0125   Epoch: 25   Global Step: 130820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:07:43,910-Speed 10906.11 samples/sec   Loss 6.9696   LearningRate 0.0125   Epoch: 25   Global Step: 130830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:07:44,886-Speed 10499.80 samples/sec   Loss 6.8178   LearningRate 0.0125   Epoch: 25   Global Step: 130840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:07:45,802-Speed 11178.01 samples/sec   Loss 6.9542   LearningRate 0.0125   Epoch: 25   Global Step: 130850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:46,762-Speed 10679.57 samples/sec   Loss 6.8784   LearningRate 0.0125   Epoch: 25   Global Step: 130860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:47,731-Speed 10578.11 samples/sec   Loss 6.9275   LearningRate 0.0125   Epoch: 25   Global Step: 130870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:48,689-Speed 10700.13 samples/sec   Loss 6.8100   LearningRate 0.0125   Epoch: 25   Global Step: 130880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:49,640-Speed 10777.66 samples/sec   Loss 6.9082   LearningRate 0.0125   Epoch: 25   Global Step: 130890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:50,627-Speed 10376.35 samples/sec   Loss 6.8425   LearningRate 0.0125   Epoch: 25   Global Step: 130900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:51,600-Speed 10535.03 samples/sec   Loss 6.7871   LearningRate 0.0125   Epoch: 25   Global Step: 130910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:52,568-Speed 10589.68 samples/sec   Loss 6.7996   LearningRate 0.0125   Epoch: 25   Global Step: 130920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:53,531-Speed 10641.00 samples/sec   Loss 6.8014   LearningRate 0.0125   Epoch: 25   Global Step: 130930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:54,462-Speed 11008.11 samples/sec   Loss 6.8859   LearningRate 0.0124   Epoch: 25   Global Step: 130940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:55,422-Speed 10674.24 samples/sec   Loss 6.8211   LearningRate 0.0124   Epoch: 25   Global Step: 130950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:07:56,389-Speed 10598.62 samples/sec   Loss 6.7067   LearningRate 0.0124   Epoch: 25   Global Step: 130960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:07:57,310-Speed 11129.01 samples/sec   Loss 6.8726   LearningRate 0.0124   Epoch: 25   Global Step: 130970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:58,238-Speed 11054.94 samples/sec   Loss 7.0504   LearningRate 0.0124   Epoch: 25   Global Step: 130980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:07:59,176-Speed 10922.56 samples/sec   Loss 6.6859   LearningRate 0.0124   Epoch: 25   Global Step: 130990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:00,169-Speed 10319.17 samples/sec   Loss 6.6070   LearningRate 0.0124   Epoch: 25   Global Step: 131000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:01,114-Speed 10841.40 samples/sec   Loss 6.7415   LearningRate 0.0124   Epoch: 25   Global Step: 131010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:02,068-Speed 10743.46 samples/sec   Loss 6.8595   LearningRate 0.0124   Epoch: 25   Global Step: 131020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:03,014-Speed 10836.15 samples/sec   Loss 6.8114   LearningRate 0.0124   Epoch: 25   Global Step: 131030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:03,985-Speed 10550.88 samples/sec   Loss 6.8437   LearningRate 0.0124   Epoch: 25   Global Step: 131040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:04,935-Speed 10792.92 samples/sec   Loss 6.8742   LearningRate 0.0124   Epoch: 25   Global Step: 131050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:05,868-Speed 10984.34 samples/sec   Loss 6.8440   LearningRate 0.0124   Epoch: 25   Global Step: 131060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:06,825-Speed 10709.61 samples/sec   Loss 6.6972   LearningRate 0.0124   Epoch: 25   Global Step: 131070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:08:07,778-Speed 10756.59 samples/sec   Loss 6.8707   LearningRate 0.0124   Epoch: 25   Global Step: 131080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:08,730-Speed 10765.89 samples/sec   Loss 6.7733   LearningRate 0.0124   Epoch: 25   Global Step: 131090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:09,705-Speed 10508.04 samples/sec   Loss 6.7977   LearningRate 0.0124   Epoch: 25   Global Step: 131100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:10,685-Speed 10458.62 samples/sec   Loss 6.8990   LearningRate 0.0124   Epoch: 25   Global Step: 131110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:11,645-Speed 10677.07 samples/sec   Loss 6.9551   LearningRate 0.0124   Epoch: 25   Global Step: 131120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:12,593-Speed 10809.86 samples/sec   Loss 6.9001   LearningRate 0.0124   Epoch: 25   Global Step: 131130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:13,526-Speed 10978.77 samples/sec   Loss 6.7328   LearningRate 0.0124   Epoch: 25   Global Step: 131140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:14,499-Speed 10541.37 samples/sec   Loss 6.7524   LearningRate 0.0124   Epoch: 25   Global Step: 131150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:15,424-Speed 11082.15 samples/sec   Loss 6.8358   LearningRate 0.0124   Epoch: 25   Global Step: 131160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:16,384-Speed 10668.97 samples/sec   Loss 7.0648   LearningRate 0.0124   Epoch: 25   Global Step: 131170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:17,333-Speed 10798.24 samples/sec   Loss 6.7134   LearningRate 0.0124   Epoch: 25   Global Step: 131180   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:08:18,371-Speed 9874.57 samples/sec   Loss 6.8489   LearningRate 0.0124   Epoch: 25   Global Step: 131190   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:08:19,359-Speed 10381.74 samples/sec   Loss 6.8467   LearningRate 0.0124   Epoch: 25   Global Step: 131200   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:08:20,317-Speed 10695.64 samples/sec   Loss 6.8095   LearningRate 0.0124   Epoch: 25   Global Step: 131210   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:08:21,272-Speed 10728.82 samples/sec   Loss 6.5743   LearningRate 0.0123   Epoch: 25   Global Step: 131220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:22,217-Speed 10838.27 samples/sec   Loss 6.9422   LearningRate 0.0123   Epoch: 25   Global Step: 131230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:23,174-Speed 10707.18 samples/sec   Loss 6.9058   LearningRate 0.0123   Epoch: 25   Global Step: 131240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:24,205-Speed 9939.85 samples/sec   Loss 6.9652   LearningRate 0.0123   Epoch: 25   Global Step: 131250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:25,175-Speed 10571.22 samples/sec   Loss 6.7976   LearningRate 0.0123   Epoch: 25   Global Step: 131260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:26,150-Speed 10512.28 samples/sec   Loss 6.8668   LearningRate 0.0123   Epoch: 25   Global Step: 131270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:27,119-Speed 10582.59 samples/sec   Loss 6.8713   LearningRate 0.0123   Epoch: 25   Global Step: 131280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:28,079-Speed 10670.71 samples/sec   Loss 6.6797   LearningRate 0.0123   Epoch: 25   Global Step: 131290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:29,083-Speed 10214.31 samples/sec   Loss 6.7005   LearningRate 0.0123   Epoch: 25   Global Step: 131300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:30,019-Speed 10945.03 samples/sec   Loss 6.7691   LearningRate 0.0123   Epoch: 25   Global Step: 131310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:30,953-Speed 10975.05 samples/sec   Loss 6.8480   LearningRate 0.0123   Epoch: 25   Global Step: 131320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:08:31,911-Speed 10690.17 samples/sec   Loss 6.7716   LearningRate 0.0123   Epoch: 25   Global Step: 131330   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:08:32,846-Speed 10967.64 samples/sec   Loss 6.8905   LearningRate 0.0123   Epoch: 25   Global Step: 131340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:33,776-Speed 11020.41 samples/sec   Loss 6.7013   LearningRate 0.0123   Epoch: 25   Global Step: 131350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:08:34,740-Speed 10627.61 samples/sec   Loss 7.0572   LearningRate 0.0123   Epoch: 25   Global Step: 131360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:08:35,696-Speed 10715.29 samples/sec   Loss 6.8340   LearningRate 0.0123   Epoch: 25   Global Step: 131370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:08:36,672-Speed 10503.92 samples/sec   Loss 6.9323   LearningRate 0.0123   Epoch: 25   Global Step: 131380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:08:37,613-Speed 10896.80 samples/sec   Loss 6.9522   LearningRate 0.0123   Epoch: 25   Global Step: 131390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:08:38,548-Speed 10958.67 samples/sec   Loss 6.8466   LearningRate 0.0123   Epoch: 25   Global Step: 131400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:08:39,531-Speed 10424.15 samples/sec   Loss 6.8489   LearningRate 0.0123   Epoch: 25   Global Step: 131410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:08:40,487-Speed 10721.95 samples/sec   Loss 6.7863   LearningRate 0.0123   Epoch: 25   Global Step: 131420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:08:41,449-Speed 10659.74 samples/sec   Loss 6.7843   LearningRate 0.0123   Epoch: 25   Global Step: 131430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:08:42,397-Speed 10806.32 samples/sec   Loss 6.9466   LearningRate 0.0123   Epoch: 25   Global Step: 131440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:08:43,300-Speed 11358.41 samples/sec   Loss 6.7064   LearningRate 0.0123   Epoch: 25   Global Step: 131450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:44,232-Speed 10990.52 samples/sec   Loss 6.8177   LearningRate 0.0123   Epoch: 25   Global Step: 131460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:45,199-Speed 10598.84 samples/sec   Loss 6.8106   LearningRate 0.0123   Epoch: 25   Global Step: 131470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:46,130-Speed 11011.25 samples/sec   Loss 6.9089   LearningRate 0.0123   Epoch: 25   Global Step: 131480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:47,082-Speed 10761.51 samples/sec   Loss 6.6810   LearningRate 0.0123   Epoch: 25   Global Step: 131490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:48,170-Speed 9422.18 samples/sec   Loss 6.7472   LearningRate 0.0123   Epoch: 25   Global Step: 131500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:57,999-Speed 1042.01 samples/sec   Loss 6.7210   LearningRate 0.0122   Epoch: 26   Global Step: 131510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:08:59,423-Speed 7201.42 samples/sec   Loss 6.0679   LearningRate 0.0122   Epoch: 26   Global Step: 131520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:00,508-Speed 9447.18 samples/sec   Loss 5.9564   LearningRate 0.0122   Epoch: 26   Global Step: 131530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:01,442-Speed 10969.42 samples/sec   Loss 6.0534   LearningRate 0.0122   Epoch: 26   Global Step: 131540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:02,393-Speed 10777.36 samples/sec   Loss 5.9549   LearningRate 0.0122   Epoch: 26   Global Step: 131550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:09:03,367-Speed 10521.26 samples/sec   Loss 5.9914   LearningRate 0.0122   Epoch: 26   Global Step: 131560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:04,380-Speed 10112.48 samples/sec   Loss 6.0671   LearningRate 0.0122   Epoch: 26   Global Step: 131570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:05,313-Speed 10981.92 samples/sec   Loss 6.0789   LearningRate 0.0122   Epoch: 26   Global Step: 131580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:06,252-Speed 10927.14 samples/sec   Loss 6.0850   LearningRate 0.0122   Epoch: 26   Global Step: 131590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:07,232-Speed 10454.02 samples/sec   Loss 6.1764   LearningRate 0.0122   Epoch: 26   Global Step: 131600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:08,173-Speed 10884.68 samples/sec   Loss 6.1515   LearningRate 0.0122   Epoch: 26   Global Step: 131610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:09,156-Speed 10426.29 samples/sec   Loss 6.0854   LearningRate 0.0122   Epoch: 26   Global Step: 131620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:10,204-Speed 9780.42 samples/sec   Loss 6.1247   LearningRate 0.0122   Epoch: 26   Global Step: 131630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:11,202-Speed 10277.47 samples/sec   Loss 6.1596   LearningRate 0.0122   Epoch: 26   Global Step: 131640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:12,193-Speed 10344.03 samples/sec   Loss 6.1100   LearningRate 0.0122   Epoch: 26   Global Step: 131650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:13,156-Speed 10638.43 samples/sec   Loss 6.0332   LearningRate 0.0122   Epoch: 26   Global Step: 131660   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:09:14,127-Speed 10563.69 samples/sec   Loss 6.0226   LearningRate 0.0122   Epoch: 26   Global Step: 131670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:15,100-Speed 10528.66 samples/sec   Loss 6.1220   LearningRate 0.0122   Epoch: 26   Global Step: 131680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:16,069-Speed 10570.19 samples/sec   Loss 6.0575   LearningRate 0.0122   Epoch: 26   Global Step: 131690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:17,050-Speed 10445.32 samples/sec   Loss 6.0372   LearningRate 0.0122   Epoch: 26   Global Step: 131700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:18,023-Speed 10535.03 samples/sec   Loss 6.0873   LearningRate 0.0122   Epoch: 26   Global Step: 131710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:18,979-Speed 10724.25 samples/sec   Loss 6.0666   LearningRate 0.0122   Epoch: 26   Global Step: 131720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:19,973-Speed 10304.37 samples/sec   Loss 6.1434   LearningRate 0.0122   Epoch: 26   Global Step: 131730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:20,936-Speed 10641.22 samples/sec   Loss 6.1921   LearningRate 0.0122   Epoch: 26   Global Step: 131740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:21,853-Speed 11191.26 samples/sec   Loss 6.3112   LearningRate 0.0122   Epoch: 26   Global Step: 131750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:22,783-Speed 11012.49 samples/sec   Loss 6.2660   LearningRate 0.0122   Epoch: 26   Global Step: 131760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:23,749-Speed 10610.34 samples/sec   Loss 6.3005   LearningRate 0.0122   Epoch: 26   Global Step: 131770   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:09:24,691-Speed 10884.13 samples/sec   Loss 6.2903   LearningRate 0.0122   Epoch: 26   Global Step: 131780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:25,683-Speed 10332.11 samples/sec   Loss 6.2262   LearningRate 0.0122   Epoch: 26   Global Step: 131790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:26,610-Speed 11056.86 samples/sec   Loss 6.0962   LearningRate 0.0121   Epoch: 26   Global Step: 131800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:27,555-Speed 10845.39 samples/sec   Loss 6.1677   LearningRate 0.0121   Epoch: 26   Global Step: 131810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:28,530-Speed 10512.60 samples/sec   Loss 6.1507   LearningRate 0.0121   Epoch: 26   Global Step: 131820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:29,500-Speed 10564.17 samples/sec   Loss 6.0265   LearningRate 0.0121   Epoch: 26   Global Step: 131830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:30,513-Speed 10122.70 samples/sec   Loss 6.2401   LearningRate 0.0121   Epoch: 26   Global Step: 131840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:31,467-Speed 10741.27 samples/sec   Loss 6.1113   LearningRate 0.0121   Epoch: 26   Global Step: 131850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:32,418-Speed 10774.51 samples/sec   Loss 6.2555   LearningRate 0.0121   Epoch: 26   Global Step: 131860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:33,370-Speed 10775.99 samples/sec   Loss 6.1565   LearningRate 0.0121   Epoch: 26   Global Step: 131870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:34,364-Speed 10307.60 samples/sec   Loss 6.1806   LearningRate 0.0121   Epoch: 26   Global Step: 131880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:09:35,308-Speed 10851.35 samples/sec   Loss 6.1422   LearningRate 0.0121   Epoch: 26   Global Step: 131890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:36,241-Speed 10995.77 samples/sec   Loss 6.1338   LearningRate 0.0121   Epoch: 26   Global Step: 131900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:37,195-Speed 10739.22 samples/sec   Loss 6.3097   LearningRate 0.0121   Epoch: 26   Global Step: 131910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:38,140-Speed 10842.16 samples/sec   Loss 6.1299   LearningRate 0.0121   Epoch: 26   Global Step: 131920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:39,087-Speed 10819.40 samples/sec   Loss 6.2690   LearningRate 0.0121   Epoch: 26   Global Step: 131930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:40,032-Speed 10841.91 samples/sec   Loss 6.2797   LearningRate 0.0121   Epoch: 26   Global Step: 131940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:40,999-Speed 10596.89 samples/sec   Loss 6.2983   LearningRate 0.0121   Epoch: 26   Global Step: 131950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:41,959-Speed 10685.99 samples/sec   Loss 6.3063   LearningRate 0.0121   Epoch: 26   Global Step: 131960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:09:42,959-Speed 10246.79 samples/sec   Loss 6.2311   LearningRate 0.0121   Epoch: 26   Global Step: 131970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:09:43,923-Speed 10630.86 samples/sec   Loss 6.3554   LearningRate 0.0121   Epoch: 26   Global Step: 131980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:09:44,897-Speed 10520.44 samples/sec   Loss 6.1481   LearningRate 0.0121   Epoch: 26   Global Step: 131990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:09:45,847-Speed 10792.44 samples/sec   Loss 6.3343   LearningRate 0.0121   Epoch: 26   Global Step: 132000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:10:08,498-[lfw][132000]XNorm: 9.400377
Training: 2022-04-11 04:10:08,499-[lfw][132000]Accuracy-Flip: 0.99617+-0.00334
Training: 2022-04-11 04:10:08,500-[lfw][132000]Accuracy-Highest: 0.99700
Training: 2022-04-11 04:10:34,242-[cfp_fp][132000]XNorm: 8.042096
Training: 2022-04-11 04:10:34,242-[cfp_fp][132000]Accuracy-Flip: 0.96643+-0.00895
Training: 2022-04-11 04:10:34,244-[cfp_fp][132000]Accuracy-Highest: 0.96800
Training: 2022-04-11 04:10:56,519-[agedb_30][132000]XNorm: 9.176513
Training: 2022-04-11 04:10:56,519-[agedb_30][132000]Accuracy-Flip: 0.96667+-0.00775
Training: 2022-04-11 04:10:56,520-[agedb_30][132000]Accuracy-Highest: 0.97017
Training: 2022-04-11 04:10:57,449-Speed 143.01 samples/sec   Loss 6.2048   LearningRate 0.0121   Epoch: 26   Global Step: 132010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:10:58,408-Speed 10685.77 samples/sec   Loss 6.0817   LearningRate 0.0121   Epoch: 26   Global Step: 132020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:10:59,384-Speed 10499.03 samples/sec   Loss 6.2568   LearningRate 0.0121   Epoch: 26   Global Step: 132030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:00,355-Speed 10552.61 samples/sec   Loss 6.0982   LearningRate 0.0121   Epoch: 26   Global Step: 132040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:01,317-Speed 10655.83 samples/sec   Loss 6.1800   LearningRate 0.0121   Epoch: 26   Global Step: 132050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:02,249-Speed 10999.91 samples/sec   Loss 6.1732   LearningRate 0.0121   Epoch: 26   Global Step: 132060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:03,219-Speed 10562.49 samples/sec   Loss 6.0880   LearningRate 0.0121   Epoch: 26   Global Step: 132070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:04,170-Speed 10784.09 samples/sec   Loss 6.2878   LearningRate 0.0121   Epoch: 26   Global Step: 132080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:05,137-Speed 10604.35 samples/sec   Loss 6.3859   LearningRate 0.0120   Epoch: 26   Global Step: 132090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:06,099-Speed 10661.14 samples/sec   Loss 6.2468   LearningRate 0.0120   Epoch: 26   Global Step: 132100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:07,064-Speed 10613.46 samples/sec   Loss 6.2439   LearningRate 0.0120   Epoch: 26   Global Step: 132110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:08,027-Speed 10641.44 samples/sec   Loss 6.2148   LearningRate 0.0120   Epoch: 26   Global Step: 132120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:08,985-Speed 10703.54 samples/sec   Loss 6.1450   LearningRate 0.0120   Epoch: 26   Global Step: 132130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:10,010-Speed 10002.67 samples/sec   Loss 6.2350   LearningRate 0.0120   Epoch: 26   Global Step: 132140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:10,969-Speed 10696.39 samples/sec   Loss 6.3355   LearningRate 0.0120   Epoch: 26   Global Step: 132150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:11,917-Speed 10809.01 samples/sec   Loss 6.2472   LearningRate 0.0120   Epoch: 26   Global Step: 132160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:12,864-Speed 10819.20 samples/sec   Loss 6.2127   LearningRate 0.0120   Epoch: 26   Global Step: 132170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:13,826-Speed 10659.90 samples/sec   Loss 6.2779   LearningRate 0.0120   Epoch: 26   Global Step: 132180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:14,803-Speed 10487.34 samples/sec   Loss 6.1631   LearningRate 0.0120   Epoch: 26   Global Step: 132190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:15,709-Speed 11312.98 samples/sec   Loss 6.2410   LearningRate 0.0120   Epoch: 26   Global Step: 132200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:16,728-Speed 10050.65 samples/sec   Loss 6.2720   LearningRate 0.0120   Epoch: 26   Global Step: 132210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:17,786-Speed 9692.12 samples/sec   Loss 6.2573   LearningRate 0.0120   Epoch: 26   Global Step: 132220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:18,846-Speed 9666.83 samples/sec   Loss 6.2692   LearningRate 0.0120   Epoch: 26   Global Step: 132230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:19,806-Speed 10681.42 samples/sec   Loss 6.3652   LearningRate 0.0120   Epoch: 26   Global Step: 132240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:20,789-Speed 10427.65 samples/sec   Loss 6.2449   LearningRate 0.0120   Epoch: 26   Global Step: 132250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:21,762-Speed 10533.66 samples/sec   Loss 6.2527   LearningRate 0.0120   Epoch: 26   Global Step: 132260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:22,709-Speed 10824.41 samples/sec   Loss 6.2174   LearningRate 0.0120   Epoch: 26   Global Step: 132270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:23,675-Speed 10610.98 samples/sec   Loss 6.4023   LearningRate 0.0120   Epoch: 26   Global Step: 132280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:24,619-Speed 10853.66 samples/sec   Loss 6.1462   LearningRate 0.0120   Epoch: 26   Global Step: 132290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:25,588-Speed 10576.21 samples/sec   Loss 6.4502   LearningRate 0.0120   Epoch: 26   Global Step: 132300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:26,560-Speed 10539.31 samples/sec   Loss 6.2871   LearningRate 0.0120   Epoch: 26   Global Step: 132310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:27,518-Speed 10699.81 samples/sec   Loss 6.3186   LearningRate 0.0120   Epoch: 26   Global Step: 132320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:28,456-Speed 10930.04 samples/sec   Loss 6.2414   LearningRate 0.0120   Epoch: 26   Global Step: 132330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:29,412-Speed 10719.22 samples/sec   Loss 6.3469   LearningRate 0.0120   Epoch: 26   Global Step: 132340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:30,418-Speed 10179.33 samples/sec   Loss 6.2961   LearningRate 0.0120   Epoch: 26   Global Step: 132350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:31,425-Speed 10180.16 samples/sec   Loss 6.2633   LearningRate 0.0120   Epoch: 26   Global Step: 132360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:32,369-Speed 10865.55 samples/sec   Loss 6.3720   LearningRate 0.0120   Epoch: 26   Global Step: 132370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:33,306-Speed 10931.21 samples/sec   Loss 6.3151   LearningRate 0.0120   Epoch: 26   Global Step: 132380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:34,277-Speed 10549.47 samples/sec   Loss 6.2693   LearningRate 0.0119   Epoch: 26   Global Step: 132390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:35,223-Speed 10834.88 samples/sec   Loss 6.3667   LearningRate 0.0119   Epoch: 26   Global Step: 132400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:36,177-Speed 10737.59 samples/sec   Loss 6.4559   LearningRate 0.0119   Epoch: 26   Global Step: 132410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:37,140-Speed 10644.06 samples/sec   Loss 6.4690   LearningRate 0.0119   Epoch: 26   Global Step: 132420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:38,112-Speed 10547.83 samples/sec   Loss 6.3067   LearningRate 0.0119   Epoch: 26   Global Step: 132430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:39,073-Speed 10667.46 samples/sec   Loss 6.2298   LearningRate 0.0119   Epoch: 26   Global Step: 132440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:40,064-Speed 10334.83 samples/sec   Loss 6.4105   LearningRate 0.0119   Epoch: 26   Global Step: 132450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:11:41,037-Speed 10535.31 samples/sec   Loss 6.4588   LearningRate 0.0119   Epoch: 26   Global Step: 132460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:11:41,985-Speed 10804.89 samples/sec   Loss 6.3897   LearningRate 0.0119   Epoch: 26   Global Step: 132470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:42,920-Speed 10964.22 samples/sec   Loss 6.3282   LearningRate 0.0119   Epoch: 26   Global Step: 132480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:43,913-Speed 10323.90 samples/sec   Loss 6.3172   LearningRate 0.0119   Epoch: 26   Global Step: 132490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:44,866-Speed 10750.39 samples/sec   Loss 6.2488   LearningRate 0.0119   Epoch: 26   Global Step: 132500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:45,773-Speed 11299.39 samples/sec   Loss 6.3954   LearningRate 0.0119   Epoch: 26   Global Step: 132510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:46,719-Speed 10836.56 samples/sec   Loss 6.3334   LearningRate 0.0119   Epoch: 26   Global Step: 132520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:47,684-Speed 10617.80 samples/sec   Loss 6.4150   LearningRate 0.0119   Epoch: 26   Global Step: 132530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:48,634-Speed 10792.84 samples/sec   Loss 6.4571   LearningRate 0.0119   Epoch: 26   Global Step: 132540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:49,599-Speed 10628.17 samples/sec   Loss 6.3302   LearningRate 0.0119   Epoch: 26   Global Step: 132550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:50,580-Speed 10446.99 samples/sec   Loss 6.4173   LearningRate 0.0119   Epoch: 26   Global Step: 132560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:51,533-Speed 10745.10 samples/sec   Loss 6.2864   LearningRate 0.0119   Epoch: 26   Global Step: 132570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:11:52,507-Speed 10533.48 samples/sec   Loss 6.3188   LearningRate 0.0119   Epoch: 26   Global Step: 132580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:53,473-Speed 10641.77 samples/sec   Loss 6.3840   LearningRate 0.0119   Epoch: 26   Global Step: 132590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:54,416-Speed 10859.12 samples/sec   Loss 6.4716   LearningRate 0.0119   Epoch: 26   Global Step: 132600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:55,361-Speed 10854.41 samples/sec   Loss 6.2913   LearningRate 0.0119   Epoch: 26   Global Step: 132610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:56,328-Speed 10590.07 samples/sec   Loss 6.3537   LearningRate 0.0119   Epoch: 26   Global Step: 132620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:57,339-Speed 10145.59 samples/sec   Loss 6.4109   LearningRate 0.0119   Epoch: 26   Global Step: 132630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:58,271-Speed 10989.18 samples/sec   Loss 6.2377   LearningRate 0.0119   Epoch: 26   Global Step: 132640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:11:59,245-Speed 10520.19 samples/sec   Loss 6.3897   LearningRate 0.0119   Epoch: 26   Global Step: 132650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:00,219-Speed 10529.48 samples/sec   Loss 6.2505   LearningRate 0.0119   Epoch: 26   Global Step: 132660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:01,168-Speed 10796.00 samples/sec   Loss 6.3487   LearningRate 0.0119   Epoch: 26   Global Step: 132670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:02,104-Speed 10953.72 samples/sec   Loss 6.3503   LearningRate 0.0118   Epoch: 26   Global Step: 132680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:03,022-Speed 11165.52 samples/sec   Loss 6.3981   LearningRate 0.0118   Epoch: 26   Global Step: 132690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:04,033-Speed 10139.34 samples/sec   Loss 6.5183   LearningRate 0.0118   Epoch: 26   Global Step: 132700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:04,972-Speed 10906.21 samples/sec   Loss 6.4368   LearningRate 0.0118   Epoch: 26   Global Step: 132710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:05,949-Speed 10498.08 samples/sec   Loss 6.3315   LearningRate 0.0118   Epoch: 26   Global Step: 132720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:06,947-Speed 10266.88 samples/sec   Loss 6.5101   LearningRate 0.0118   Epoch: 26   Global Step: 132730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:07,887-Speed 10900.21 samples/sec   Loss 6.3407   LearningRate 0.0118   Epoch: 26   Global Step: 132740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:08,836-Speed 10797.35 samples/sec   Loss 6.3561   LearningRate 0.0118   Epoch: 26   Global Step: 132750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:09,805-Speed 10589.09 samples/sec   Loss 6.4914   LearningRate 0.0118   Epoch: 26   Global Step: 132760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:10,795-Speed 10348.97 samples/sec   Loss 6.3316   LearningRate 0.0118   Epoch: 26   Global Step: 132770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:11,765-Speed 10566.03 samples/sec   Loss 6.4822   LearningRate 0.0118   Epoch: 26   Global Step: 132780   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:12:12,745-Speed 10468.57 samples/sec   Loss 6.4377   LearningRate 0.0118   Epoch: 26   Global Step: 132790   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:12:13,699-Speed 10736.87 samples/sec   Loss 6.3602   LearningRate 0.0118   Epoch: 26   Global Step: 132800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:12:14,638-Speed 10910.75 samples/sec   Loss 6.3200   LearningRate 0.0118   Epoch: 26   Global Step: 132810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:12:15,564-Speed 11070.11 samples/sec   Loss 6.2431   LearningRate 0.0118   Epoch: 26   Global Step: 132820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:12:16,493-Speed 11030.71 samples/sec   Loss 6.2704   LearningRate 0.0118   Epoch: 26   Global Step: 132830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:17,422-Speed 11031.54 samples/sec   Loss 6.3909   LearningRate 0.0118   Epoch: 26   Global Step: 132840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:18,424-Speed 10229.42 samples/sec   Loss 6.5855   LearningRate 0.0118   Epoch: 26   Global Step: 132850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:19,387-Speed 10648.59 samples/sec   Loss 6.4100   LearningRate 0.0118   Epoch: 26   Global Step: 132860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:20,344-Speed 10712.52 samples/sec   Loss 6.3719   LearningRate 0.0118   Epoch: 26   Global Step: 132870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:21,281-Speed 10933.95 samples/sec   Loss 6.4322   LearningRate 0.0118   Epoch: 26   Global Step: 132880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:22,237-Speed 10725.96 samples/sec   Loss 6.3767   LearningRate 0.0118   Epoch: 26   Global Step: 132890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:23,149-Speed 11241.38 samples/sec   Loss 6.3889   LearningRate 0.0118   Epoch: 26   Global Step: 132900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:12:24,123-Speed 10514.98 samples/sec   Loss 6.4313   LearningRate 0.0118   Epoch: 26   Global Step: 132910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:12:25,097-Speed 10525.55 samples/sec   Loss 6.3622   LearningRate 0.0118   Epoch: 26   Global Step: 132920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:12:26,048-Speed 10770.83 samples/sec   Loss 6.3704   LearningRate 0.0118   Epoch: 26   Global Step: 132930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:12:27,007-Speed 10696.63 samples/sec   Loss 6.3919   LearningRate 0.0118   Epoch: 26   Global Step: 132940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:12:27,956-Speed 10789.77 samples/sec   Loss 6.4233   LearningRate 0.0118   Epoch: 26   Global Step: 132950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:12:28,921-Speed 10628.23 samples/sec   Loss 6.3310   LearningRate 0.0118   Epoch: 26   Global Step: 132960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:12:29,888-Speed 10589.81 samples/sec   Loss 6.4449   LearningRate 0.0117   Epoch: 26   Global Step: 132970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:12:30,850-Speed 10654.45 samples/sec   Loss 6.5064   LearningRate 0.0117   Epoch: 26   Global Step: 132980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:12:31,829-Speed 10784.48 samples/sec   Loss 6.5453   LearningRate 0.0117   Epoch: 26   Global Step: 132990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:12:32,774-Speed 10847.38 samples/sec   Loss 6.4088   LearningRate 0.0117   Epoch: 26   Global Step: 133000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:33,756-Speed 10432.74 samples/sec   Loss 6.3321   LearningRate 0.0117   Epoch: 26   Global Step: 133010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:34,716-Speed 10677.84 samples/sec   Loss 6.5449   LearningRate 0.0117   Epoch: 26   Global Step: 133020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:35,636-Speed 11138.16 samples/sec   Loss 6.6179   LearningRate 0.0117   Epoch: 26   Global Step: 133030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:36,627-Speed 10341.55 samples/sec   Loss 6.4744   LearningRate 0.0117   Epoch: 26   Global Step: 133040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:37,570-Speed 10868.14 samples/sec   Loss 6.4392   LearningRate 0.0117   Epoch: 26   Global Step: 133050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:38,484-Speed 11212.38 samples/sec   Loss 6.3071   LearningRate 0.0117   Epoch: 26   Global Step: 133060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:39,458-Speed 10524.97 samples/sec   Loss 6.4819   LearningRate 0.0117   Epoch: 26   Global Step: 133070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:40,418-Speed 10676.76 samples/sec   Loss 6.3807   LearningRate 0.0117   Epoch: 26   Global Step: 133080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:41,381-Speed 10641.66 samples/sec   Loss 6.5202   LearningRate 0.0117   Epoch: 26   Global Step: 133090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:42,363-Speed 10431.63 samples/sec   Loss 6.4670   LearningRate 0.0117   Epoch: 26   Global Step: 133100   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:12:43,309-Speed 10843.98 samples/sec   Loss 6.4106   LearningRate 0.0117   Epoch: 26   Global Step: 133110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:44,245-Speed 10947.56 samples/sec   Loss 6.5500   LearningRate 0.0117   Epoch: 26   Global Step: 133120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:45,179-Speed 10975.01 samples/sec   Loss 6.3787   LearningRate 0.0117   Epoch: 26   Global Step: 133130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:46,126-Speed 10823.13 samples/sec   Loss 6.4090   LearningRate 0.0117   Epoch: 26   Global Step: 133140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:47,070-Speed 10850.14 samples/sec   Loss 6.5078   LearningRate 0.0117   Epoch: 26   Global Step: 133150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:48,018-Speed 10816.73 samples/sec   Loss 6.5735   LearningRate 0.0117   Epoch: 26   Global Step: 133160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:48,996-Speed 10479.29 samples/sec   Loss 6.5599   LearningRate 0.0117   Epoch: 26   Global Step: 133170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:49,992-Speed 10300.10 samples/sec   Loss 6.6123   LearningRate 0.0117   Epoch: 26   Global Step: 133180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:50,955-Speed 10642.96 samples/sec   Loss 6.4540   LearningRate 0.0117   Epoch: 26   Global Step: 133190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:51,927-Speed 10540.13 samples/sec   Loss 6.4758   LearningRate 0.0117   Epoch: 26   Global Step: 133200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:52,851-Speed 11088.81 samples/sec   Loss 6.4133   LearningRate 0.0117   Epoch: 26   Global Step: 133210   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:12:53,810-Speed 10694.34 samples/sec   Loss 6.4635   LearningRate 0.0117   Epoch: 26   Global Step: 133220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:54,795-Speed 10398.59 samples/sec   Loss 6.4551   LearningRate 0.0117   Epoch: 26   Global Step: 133230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:55,707-Speed 11242.68 samples/sec   Loss 6.4931   LearningRate 0.0117   Epoch: 26   Global Step: 133240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:56,652-Speed 10844.45 samples/sec   Loss 6.5826   LearningRate 0.0117   Epoch: 26   Global Step: 133250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:57,636-Speed 10407.43 samples/sec   Loss 6.5437   LearningRate 0.0117   Epoch: 26   Global Step: 133260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:58,572-Speed 10953.01 samples/sec   Loss 6.4796   LearningRate 0.0116   Epoch: 26   Global Step: 133270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:12:59,538-Speed 10605.76 samples/sec   Loss 6.6534   LearningRate 0.0116   Epoch: 26   Global Step: 133280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:00,541-Speed 10221.58 samples/sec   Loss 6.5758   LearningRate 0.0116   Epoch: 26   Global Step: 133290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:01,513-Speed 10545.05 samples/sec   Loss 6.4281   LearningRate 0.0116   Epoch: 26   Global Step: 133300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:02,461-Speed 10818.00 samples/sec   Loss 6.3651   LearningRate 0.0116   Epoch: 26   Global Step: 133310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:03,463-Speed 10226.32 samples/sec   Loss 6.6119   LearningRate 0.0116   Epoch: 26   Global Step: 133320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:13:04,426-Speed 10638.74 samples/sec   Loss 6.4422   LearningRate 0.0116   Epoch: 26   Global Step: 133330   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:13:05,380-Speed 10740.46 samples/sec   Loss 6.5840   LearningRate 0.0116   Epoch: 26   Global Step: 133340   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:13:06,344-Speed 10632.79 samples/sec   Loss 6.3889   LearningRate 0.0116   Epoch: 26   Global Step: 133350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:13:07,349-Speed 10201.67 samples/sec   Loss 6.4913   LearningRate 0.0116   Epoch: 26   Global Step: 133360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:13:08,292-Speed 10863.00 samples/sec   Loss 6.4779   LearningRate 0.0116   Epoch: 26   Global Step: 133370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:13:09,256-Speed 10634.54 samples/sec   Loss 6.4007   LearningRate 0.0116   Epoch: 26   Global Step: 133380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:13:10,227-Speed 10557.48 samples/sec   Loss 6.4127   LearningRate 0.0116   Epoch: 26   Global Step: 133390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:13:11,179-Speed 10755.78 samples/sec   Loss 6.4924   LearningRate 0.0116   Epoch: 26   Global Step: 133400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:12,153-Speed 10525.48 samples/sec   Loss 6.5668   LearningRate 0.0116   Epoch: 26   Global Step: 133410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:13,105-Speed 10762.95 samples/sec   Loss 6.5091   LearningRate 0.0116   Epoch: 26   Global Step: 133420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:14,071-Speed 10619.63 samples/sec   Loss 6.5975   LearningRate 0.0116   Epoch: 26   Global Step: 133430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:15,018-Speed 10815.40 samples/sec   Loss 6.5012   LearningRate 0.0116   Epoch: 26   Global Step: 133440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:15,966-Speed 10811.27 samples/sec   Loss 6.6256   LearningRate 0.0116   Epoch: 26   Global Step: 133450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:16,919-Speed 10755.99 samples/sec   Loss 6.2940   LearningRate 0.0116   Epoch: 26   Global Step: 133460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:17,895-Speed 10493.89 samples/sec   Loss 6.5391   LearningRate 0.0116   Epoch: 26   Global Step: 133470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:18,842-Speed 10829.29 samples/sec   Loss 6.5630   LearningRate 0.0116   Epoch: 26   Global Step: 133480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:19,818-Speed 10497.13 samples/sec   Loss 6.4899   LearningRate 0.0116   Epoch: 26   Global Step: 133490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:20,764-Speed 10841.80 samples/sec   Loss 6.5797   LearningRate 0.0116   Epoch: 26   Global Step: 133500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:21,729-Speed 10623.87 samples/sec   Loss 6.4269   LearningRate 0.0116   Epoch: 26   Global Step: 133510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:22,651-Speed 11124.23 samples/sec   Loss 6.5258   LearningRate 0.0116   Epoch: 26   Global Step: 133520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:23,599-Speed 10813.11 samples/sec   Loss 6.3491   LearningRate 0.0116   Epoch: 26   Global Step: 133530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:24,619-Speed 10044.49 samples/sec   Loss 6.5177   LearningRate 0.0116   Epoch: 26   Global Step: 133540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:25,533-Speed 11216.17 samples/sec   Loss 6.6188   LearningRate 0.0116   Epoch: 26   Global Step: 133550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:26,480-Speed 10825.48 samples/sec   Loss 6.4748   LearningRate 0.0116   Epoch: 26   Global Step: 133560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:27,431-Speed 10772.44 samples/sec   Loss 6.5454   LearningRate 0.0115   Epoch: 26   Global Step: 133570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:28,414-Speed 10430.04 samples/sec   Loss 6.5373   LearningRate 0.0115   Epoch: 26   Global Step: 133580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:29,395-Speed 10447.38 samples/sec   Loss 6.4837   LearningRate 0.0115   Epoch: 26   Global Step: 133590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:30,352-Speed 10711.72 samples/sec   Loss 6.6199   LearningRate 0.0115   Epoch: 26   Global Step: 133600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:13:31,260-Speed 11292.03 samples/sec   Loss 6.5151   LearningRate 0.0115   Epoch: 26   Global Step: 133610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:13:32,242-Speed 10444.81 samples/sec   Loss 6.5059   LearningRate 0.0115   Epoch: 26   Global Step: 133620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:33,205-Speed 10635.17 samples/sec   Loss 6.5983   LearningRate 0.0115   Epoch: 26   Global Step: 133630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:34,147-Speed 10890.56 samples/sec   Loss 6.4236   LearningRate 0.0115   Epoch: 26   Global Step: 133640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:35,087-Speed 10900.56 samples/sec   Loss 6.5872   LearningRate 0.0115   Epoch: 26   Global Step: 133650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:36,028-Speed 10889.17 samples/sec   Loss 6.5840   LearningRate 0.0115   Epoch: 26   Global Step: 133660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:36,990-Speed 10648.45 samples/sec   Loss 6.5914   LearningRate 0.0115   Epoch: 26   Global Step: 133670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:37,952-Speed 10658.31 samples/sec   Loss 6.5459   LearningRate 0.0115   Epoch: 26   Global Step: 133680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:38,923-Speed 10561.61 samples/sec   Loss 6.4679   LearningRate 0.0115   Epoch: 26   Global Step: 133690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:39,896-Speed 10532.11 samples/sec   Loss 6.6305   LearningRate 0.0115   Epoch: 26   Global Step: 133700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:13:40,850-Speed 10743.27 samples/sec   Loss 6.4992   LearningRate 0.0115   Epoch: 26   Global Step: 133710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:13:41,783-Speed 10985.52 samples/sec   Loss 6.5513   LearningRate 0.0115   Epoch: 26   Global Step: 133720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:13:42,742-Speed 10680.65 samples/sec   Loss 6.4604   LearningRate 0.0115   Epoch: 26   Global Step: 133730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:13:43,690-Speed 10815.59 samples/sec   Loss 6.5428   LearningRate 0.0115   Epoch: 26   Global Step: 133740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:13:44,620-Speed 11044.05 samples/sec   Loss 6.5071   LearningRate 0.0115   Epoch: 26   Global Step: 133750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:13:45,547-Speed 11054.43 samples/sec   Loss 6.6254   LearningRate 0.0115   Epoch: 26   Global Step: 133760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:13:46,506-Speed 10683.61 samples/sec   Loss 6.5303   LearningRate 0.0115   Epoch: 26   Global Step: 133770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:13:47,516-Speed 10149.42 samples/sec   Loss 6.4710   LearningRate 0.0115   Epoch: 26   Global Step: 133780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:13:48,471-Speed 10732.51 samples/sec   Loss 6.5018   LearningRate 0.0115   Epoch: 26   Global Step: 133790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:13:49,438-Speed 10595.76 samples/sec   Loss 6.4075   LearningRate 0.0115   Epoch: 26   Global Step: 133800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:50,429-Speed 10340.55 samples/sec   Loss 6.6294   LearningRate 0.0115   Epoch: 26   Global Step: 133810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:51,408-Speed 10469.96 samples/sec   Loss 6.7060   LearningRate 0.0115   Epoch: 26   Global Step: 133820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:52,424-Speed 10090.18 samples/sec   Loss 6.5082   LearningRate 0.0115   Epoch: 26   Global Step: 133830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:53,356-Speed 10999.38 samples/sec   Loss 6.5255   LearningRate 0.0115   Epoch: 26   Global Step: 133840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:54,313-Speed 10710.04 samples/sec   Loss 6.5961   LearningRate 0.0115   Epoch: 26   Global Step: 133850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:55,302-Speed 10364.35 samples/sec   Loss 6.6707   LearningRate 0.0114   Epoch: 26   Global Step: 133860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:56,265-Speed 10641.93 samples/sec   Loss 6.6436   LearningRate 0.0114   Epoch: 26   Global Step: 133870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:57,198-Speed 10978.02 samples/sec   Loss 6.6476   LearningRate 0.0114   Epoch: 26   Global Step: 133880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:58,151-Speed 10761.41 samples/sec   Loss 6.6239   LearningRate 0.0114   Epoch: 26   Global Step: 133890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:13:59,099-Speed 10806.69 samples/sec   Loss 6.4379   LearningRate 0.0114   Epoch: 26   Global Step: 133900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:14:00,078-Speed 10468.66 samples/sec   Loss 6.5203   LearningRate 0.0114   Epoch: 26   Global Step: 133910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:14:01,027-Speed 10799.94 samples/sec   Loss 6.5737   LearningRate 0.0114   Epoch: 26   Global Step: 133920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:14:01,952-Speed 11082.00 samples/sec   Loss 6.6785   LearningRate 0.0114   Epoch: 26   Global Step: 133930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:14:02,928-Speed 10500.26 samples/sec   Loss 6.6380   LearningRate 0.0114   Epoch: 26   Global Step: 133940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:14:03,894-Speed 10613.71 samples/sec   Loss 6.5110   LearningRate 0.0114   Epoch: 26   Global Step: 133950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:14:04,865-Speed 10554.78 samples/sec   Loss 6.5584   LearningRate 0.0114   Epoch: 26   Global Step: 133960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:14:05,810-Speed 10845.95 samples/sec   Loss 6.6134   LearningRate 0.0114   Epoch: 26   Global Step: 133970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:14:06,806-Speed 10286.29 samples/sec   Loss 6.6701   LearningRate 0.0114   Epoch: 26   Global Step: 133980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:14:07,722-Speed 11188.51 samples/sec   Loss 6.5432   LearningRate 0.0114   Epoch: 26   Global Step: 133990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:14:08,692-Speed 10575.27 samples/sec   Loss 6.5332   LearningRate 0.0114   Epoch: 26   Global Step: 134000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:14:30,940-[lfw][134000]XNorm: 9.251054
Training: 2022-04-11 04:14:30,941-[lfw][134000]Accuracy-Flip: 0.99617+-0.00373
Training: 2022-04-11 04:14:30,941-[lfw][134000]Accuracy-Highest: 0.99700
Training: 2022-04-11 04:14:56,537-[cfp_fp][134000]XNorm: 7.910650
Training: 2022-04-11 04:14:56,537-[cfp_fp][134000]Accuracy-Flip: 0.96614+-0.00881
Training: 2022-04-11 04:14:56,538-[cfp_fp][134000]Accuracy-Highest: 0.96800
Training: 2022-04-11 04:15:18,641-[agedb_30][134000]XNorm: 9.036329
Training: 2022-04-11 04:15:18,642-[agedb_30][134000]Accuracy-Flip: 0.97050+-0.00730
Training: 2022-04-11 04:15:18,642-[agedb_30][134000]Accuracy-Highest: 0.97050
Training: 2022-04-11 04:15:19,570-Speed 144.47 samples/sec   Loss 6.5587   LearningRate 0.0114   Epoch: 26   Global Step: 134010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:20,537-Speed 10600.54 samples/sec   Loss 6.6802   LearningRate 0.0114   Epoch: 26   Global Step: 134020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:21,449-Speed 11239.35 samples/sec   Loss 6.5878   LearningRate 0.0114   Epoch: 26   Global Step: 134030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:22,411-Speed 10645.97 samples/sec   Loss 6.6539   LearningRate 0.0114   Epoch: 26   Global Step: 134040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:23,393-Speed 10448.78 samples/sec   Loss 6.6576   LearningRate 0.0114   Epoch: 26   Global Step: 134050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:24,357-Speed 10630.62 samples/sec   Loss 6.7130   LearningRate 0.0114   Epoch: 26   Global Step: 134060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:25,293-Speed 10949.30 samples/sec   Loss 6.6833   LearningRate 0.0114   Epoch: 26   Global Step: 134070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:26,253-Speed 10680.47 samples/sec   Loss 6.7260   LearningRate 0.0114   Epoch: 26   Global Step: 134080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:27,207-Speed 10741.33 samples/sec   Loss 6.5987   LearningRate 0.0114   Epoch: 26   Global Step: 134090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:28,153-Speed 10852.16 samples/sec   Loss 6.6098   LearningRate 0.0114   Epoch: 26   Global Step: 134100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:29,098-Speed 10837.26 samples/sec   Loss 6.5656   LearningRate 0.0114   Epoch: 26   Global Step: 134110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:15:30,037-Speed 10913.28 samples/sec   Loss 6.5281   LearningRate 0.0114   Epoch: 26   Global Step: 134120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:15:30,990-Speed 10753.80 samples/sec   Loss 6.5997   LearningRate 0.0114   Epoch: 26   Global Step: 134130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:31,967-Speed 10492.70 samples/sec   Loss 6.6132   LearningRate 0.0114   Epoch: 26   Global Step: 134140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:32,953-Speed 10399.43 samples/sec   Loss 6.5138   LearningRate 0.0114   Epoch: 26   Global Step: 134150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:33,910-Speed 10716.35 samples/sec   Loss 6.6952   LearningRate 0.0113   Epoch: 26   Global Step: 134160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:15:34,865-Speed 10724.66 samples/sec   Loss 6.5888   LearningRate 0.0113   Epoch: 26   Global Step: 134170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:15:35,788-Speed 11099.17 samples/sec   Loss 6.7047   LearningRate 0.0113   Epoch: 26   Global Step: 134180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:15:36,711-Speed 11111.33 samples/sec   Loss 6.5041   LearningRate 0.0113   Epoch: 26   Global Step: 134190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:15:37,676-Speed 10621.47 samples/sec   Loss 6.5608   LearningRate 0.0113   Epoch: 26   Global Step: 134200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:15:38,631-Speed 10728.44 samples/sec   Loss 6.5507   LearningRate 0.0113   Epoch: 26   Global Step: 134210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:15:39,604-Speed 10528.57 samples/sec   Loss 6.6179   LearningRate 0.0113   Epoch: 26   Global Step: 134220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:15:40,530-Speed 11066.79 samples/sec   Loss 6.5922   LearningRate 0.0113   Epoch: 26   Global Step: 134230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:15:41,500-Speed 10564.48 samples/sec   Loss 6.5015   LearningRate 0.0113   Epoch: 26   Global Step: 134240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:15:42,449-Speed 10805.77 samples/sec   Loss 6.6502   LearningRate 0.0113   Epoch: 26   Global Step: 134250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:15:43,356-Speed 11297.40 samples/sec   Loss 6.4509   LearningRate 0.0113   Epoch: 26   Global Step: 134260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:44,314-Speed 10706.56 samples/sec   Loss 6.6819   LearningRate 0.0113   Epoch: 26   Global Step: 134270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:45,275-Speed 10655.05 samples/sec   Loss 6.5592   LearningRate 0.0113   Epoch: 26   Global Step: 134280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:46,221-Speed 10830.66 samples/sec   Loss 6.5128   LearningRate 0.0113   Epoch: 26   Global Step: 134290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:47,204-Speed 10431.93 samples/sec   Loss 6.6042   LearningRate 0.0113   Epoch: 26   Global Step: 134300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:48,167-Speed 10640.62 samples/sec   Loss 6.6787   LearningRate 0.0113   Epoch: 26   Global Step: 134310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:49,095-Speed 11043.24 samples/sec   Loss 6.6250   LearningRate 0.0113   Epoch: 26   Global Step: 134320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:50,105-Speed 10148.78 samples/sec   Loss 6.6202   LearningRate 0.0113   Epoch: 26   Global Step: 134330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:51,068-Speed 10642.07 samples/sec   Loss 6.5719   LearningRate 0.0113   Epoch: 26   Global Step: 134340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:52,040-Speed 10547.90 samples/sec   Loss 6.6104   LearningRate 0.0113   Epoch: 26   Global Step: 134350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:53,002-Speed 10661.96 samples/sec   Loss 6.5110   LearningRate 0.0113   Epoch: 26   Global Step: 134360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:15:53,976-Speed 10514.60 samples/sec   Loss 6.6650   LearningRate 0.0113   Epoch: 26   Global Step: 134370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:15:54,967-Speed 10338.75 samples/sec   Loss 6.7382   LearningRate 0.0113   Epoch: 26   Global Step: 134380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:15:55,918-Speed 10779.75 samples/sec   Loss 6.6208   LearningRate 0.0113   Epoch: 26   Global Step: 134390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:56,850-Speed 10991.06 samples/sec   Loss 6.6762   LearningRate 0.0113   Epoch: 26   Global Step: 134400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:57,848-Speed 10271.49 samples/sec   Loss 6.5684   LearningRate 0.0113   Epoch: 26   Global Step: 134410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:58,827-Speed 10469.28 samples/sec   Loss 6.6322   LearningRate 0.0113   Epoch: 26   Global Step: 134420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:15:59,814-Speed 10387.86 samples/sec   Loss 6.5997   LearningRate 0.0113   Epoch: 26   Global Step: 134430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:00,782-Speed 10582.86 samples/sec   Loss 6.7362   LearningRate 0.0113   Epoch: 26   Global Step: 134440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:01,727-Speed 10841.06 samples/sec   Loss 6.6943   LearningRate 0.0113   Epoch: 26   Global Step: 134450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:02,716-Speed 10368.82 samples/sec   Loss 6.6800   LearningRate 0.0112   Epoch: 26   Global Step: 134460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:03,675-Speed 10695.60 samples/sec   Loss 6.6158   LearningRate 0.0112   Epoch: 26   Global Step: 134470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:04,629-Speed 10743.12 samples/sec   Loss 6.6141   LearningRate 0.0112   Epoch: 26   Global Step: 134480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:05,582-Speed 10747.91 samples/sec   Loss 6.5359   LearningRate 0.0112   Epoch: 26   Global Step: 134490   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:16:06,536-Speed 10743.82 samples/sec   Loss 6.6978   LearningRate 0.0112   Epoch: 26   Global Step: 134500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:07,519-Speed 10427.96 samples/sec   Loss 6.6777   LearningRate 0.0112   Epoch: 26   Global Step: 134510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:08,479-Speed 10679.55 samples/sec   Loss 6.6785   LearningRate 0.0112   Epoch: 26   Global Step: 134520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:09,482-Speed 10220.66 samples/sec   Loss 6.5063   LearningRate 0.0112   Epoch: 26   Global Step: 134530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:10,451-Speed 10575.57 samples/sec   Loss 6.4960   LearningRate 0.0112   Epoch: 26   Global Step: 134540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:11,400-Speed 10796.26 samples/sec   Loss 6.7043   LearningRate 0.0112   Epoch: 26   Global Step: 134550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:12,339-Speed 10915.89 samples/sec   Loss 6.5306   LearningRate 0.0112   Epoch: 26   Global Step: 134560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:13,281-Speed 10881.03 samples/sec   Loss 6.6289   LearningRate 0.0112   Epoch: 26   Global Step: 134570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:14,245-Speed 10640.49 samples/sec   Loss 6.6460   LearningRate 0.0112   Epoch: 26   Global Step: 134580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:15,214-Speed 10576.19 samples/sec   Loss 6.6837   LearningRate 0.0112   Epoch: 26   Global Step: 134590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:16,184-Speed 10559.90 samples/sec   Loss 6.7047   LearningRate 0.0112   Epoch: 26   Global Step: 134600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:16:17,113-Speed 11027.35 samples/sec   Loss 6.6335   LearningRate 0.0112   Epoch: 26   Global Step: 134610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:18,060-Speed 10826.00 samples/sec   Loss 6.5433   LearningRate 0.0112   Epoch: 26   Global Step: 134620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:19,052-Speed 10328.80 samples/sec   Loss 6.5049   LearningRate 0.0112   Epoch: 26   Global Step: 134630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:20,034-Speed 10436.71 samples/sec   Loss 6.5359   LearningRate 0.0112   Epoch: 26   Global Step: 134640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:20,996-Speed 10657.73 samples/sec   Loss 6.5396   LearningRate 0.0112   Epoch: 26   Global Step: 134650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:21,944-Speed 10809.35 samples/sec   Loss 6.6042   LearningRate 0.0112   Epoch: 26   Global Step: 134660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:22,921-Speed 10489.70 samples/sec   Loss 6.7436   LearningRate 0.0112   Epoch: 26   Global Step: 134670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:23,878-Speed 10709.68 samples/sec   Loss 6.7179   LearningRate 0.0112   Epoch: 26   Global Step: 134680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:24,848-Speed 10559.07 samples/sec   Loss 6.7919   LearningRate 0.0112   Epoch: 26   Global Step: 134690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:25,829-Speed 10442.76 samples/sec   Loss 6.4440   LearningRate 0.0112   Epoch: 26   Global Step: 134700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:26,784-Speed 10735.97 samples/sec   Loss 6.6433   LearningRate 0.0112   Epoch: 26   Global Step: 134710   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:16:27,742-Speed 10693.08 samples/sec   Loss 6.6895   LearningRate 0.0112   Epoch: 26   Global Step: 134720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:28,708-Speed 10606.97 samples/sec   Loss 6.6761   LearningRate 0.0112   Epoch: 26   Global Step: 134730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:29,649-Speed 10901.44 samples/sec   Loss 6.6145   LearningRate 0.0112   Epoch: 26   Global Step: 134740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:30,635-Speed 10394.27 samples/sec   Loss 6.5609   LearningRate 0.0112   Epoch: 26   Global Step: 134750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:31,591-Speed 10719.93 samples/sec   Loss 6.5598   LearningRate 0.0112   Epoch: 26   Global Step: 134760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:32,540-Speed 10800.19 samples/sec   Loss 6.6574   LearningRate 0.0111   Epoch: 26   Global Step: 134770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:33,469-Speed 11031.31 samples/sec   Loss 6.6848   LearningRate 0.0111   Epoch: 26   Global Step: 134780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:34,486-Speed 10080.56 samples/sec   Loss 6.5613   LearningRate 0.0111   Epoch: 26   Global Step: 134790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:35,417-Speed 11016.43 samples/sec   Loss 6.5434   LearningRate 0.0111   Epoch: 26   Global Step: 134800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:36,399-Speed 10427.53 samples/sec   Loss 6.5515   LearningRate 0.0111   Epoch: 26   Global Step: 134810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:37,375-Speed 10505.10 samples/sec   Loss 6.7461   LearningRate 0.0111   Epoch: 26   Global Step: 134820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:38,334-Speed 10689.60 samples/sec   Loss 6.6993   LearningRate 0.0111   Epoch: 26   Global Step: 134830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:39,320-Speed 10392.42 samples/sec   Loss 6.6799   LearningRate 0.0111   Epoch: 26   Global Step: 134840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:40,269-Speed 10796.14 samples/sec   Loss 6.6007   LearningRate 0.0111   Epoch: 26   Global Step: 134850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:41,246-Speed 10495.21 samples/sec   Loss 6.4963   LearningRate 0.0111   Epoch: 26   Global Step: 134860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:42,198-Speed 10754.61 samples/sec   Loss 6.5123   LearningRate 0.0111   Epoch: 26   Global Step: 134870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:43,147-Speed 10807.30 samples/sec   Loss 6.6651   LearningRate 0.0111   Epoch: 26   Global Step: 134880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:44,153-Speed 10320.03 samples/sec   Loss 6.5365   LearningRate 0.0111   Epoch: 26   Global Step: 134890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:45,074-Speed 11137.94 samples/sec   Loss 6.6339   LearningRate 0.0111   Epoch: 26   Global Step: 134900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:46,002-Speed 11037.38 samples/sec   Loss 6.5274   LearningRate 0.0111   Epoch: 26   Global Step: 134910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:46,960-Speed 10702.88 samples/sec   Loss 6.6028   LearningRate 0.0111   Epoch: 26   Global Step: 134920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 04:16:47,955-Speed 10300.52 samples/sec   Loss 6.5785   LearningRate 0.0111   Epoch: 26   Global Step: 134930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:48,885-Speed 11020.30 samples/sec   Loss 6.7575   LearningRate 0.0111   Epoch: 26   Global Step: 134940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:16:49,838-Speed 10756.47 samples/sec   Loss 6.7213   LearningRate 0.0111   Epoch: 26   Global Step: 134950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:16:50,830-Speed 10330.81 samples/sec   Loss 6.8028   LearningRate 0.0111   Epoch: 26   Global Step: 134960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:16:51,787-Speed 10714.87 samples/sec   Loss 6.5647   LearningRate 0.0111   Epoch: 26   Global Step: 134970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:16:52,741-Speed 10742.81 samples/sec   Loss 6.6053   LearningRate 0.0111   Epoch: 26   Global Step: 134980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:16:53,708-Speed 10596.81 samples/sec   Loss 6.5999   LearningRate 0.0111   Epoch: 26   Global Step: 134990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:16:54,663-Speed 10733.70 samples/sec   Loss 6.6668   LearningRate 0.0111   Epoch: 26   Global Step: 135000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:16:55,617-Speed 10732.46 samples/sec   Loss 6.6811   LearningRate 0.0111   Epoch: 26   Global Step: 135010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:16:56,576-Speed 10692.24 samples/sec   Loss 6.3748   LearningRate 0.0111   Epoch: 26   Global Step: 135020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:16:57,549-Speed 10528.29 samples/sec   Loss 6.6754   LearningRate 0.0111   Epoch: 26   Global Step: 135030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:16:58,504-Speed 10740.55 samples/sec   Loss 6.4983   LearningRate 0.0111   Epoch: 26   Global Step: 135040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:16:59,459-Speed 10732.04 samples/sec   Loss 6.6608   LearningRate 0.0111   Epoch: 26   Global Step: 135050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:17:00,443-Speed 10414.87 samples/sec   Loss 6.6308   LearningRate 0.0111   Epoch: 26   Global Step: 135060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:17:01,408-Speed 10618.16 samples/sec   Loss 6.6159   LearningRate 0.0110   Epoch: 26   Global Step: 135070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:17:02,380-Speed 10539.57 samples/sec   Loss 6.6243   LearningRate 0.0110   Epoch: 26   Global Step: 135080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:17:03,352-Speed 10544.32 samples/sec   Loss 6.5881   LearningRate 0.0110   Epoch: 26   Global Step: 135090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:17:04,312-Speed 10673.65 samples/sec   Loss 6.5065   LearningRate 0.0110   Epoch: 26   Global Step: 135100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:17:05,290-Speed 10481.42 samples/sec   Loss 6.6321   LearningRate 0.0110   Epoch: 26   Global Step: 135110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:17:06,246-Speed 10723.46 samples/sec   Loss 6.7267   LearningRate 0.0110   Epoch: 26   Global Step: 135120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:17:07,206-Speed 10670.16 samples/sec   Loss 6.6004   LearningRate 0.0110   Epoch: 26   Global Step: 135130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:17:08,218-Speed 10129.83 samples/sec   Loss 6.7067   LearningRate 0.0110   Epoch: 26   Global Step: 135140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:17:09,177-Speed 10697.03 samples/sec   Loss 6.5700   LearningRate 0.0110   Epoch: 26   Global Step: 135150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:17:10,124-Speed 10818.42 samples/sec   Loss 6.6712   LearningRate 0.0110   Epoch: 26   Global Step: 135160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:17:11,080-Speed 10722.58 samples/sec   Loss 6.6828   LearningRate 0.0110   Epoch: 26   Global Step: 135170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:17:12,042-Speed 10657.61 samples/sec   Loss 6.6306   LearningRate 0.0110   Epoch: 26   Global Step: 135180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:17:12,998-Speed 10717.60 samples/sec   Loss 6.6679   LearningRate 0.0110   Epoch: 26   Global Step: 135190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:17:13,941-Speed 10871.80 samples/sec   Loss 6.5616   LearningRate 0.0110   Epoch: 26   Global Step: 135200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:17:14,909-Speed 10585.19 samples/sec   Loss 6.6103   LearningRate 0.0110   Epoch: 26   Global Step: 135210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 04:17:15,861-Speed 10764.59 samples/sec   Loss 6.6295   LearningRate 0.0110   Epoch: 26   Global Step: 135220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:17:16,794-Speed 10986.26 samples/sec   Loss 6.5914   LearningRate 0.0110   Epoch: 26   Global Step: 135230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:17:17,752-Speed 10703.25 samples/sec   Loss 6.8868   LearningRate 0.0110   Epoch: 26   Global Step: 135240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:17:18,704-Speed 10762.52 samples/sec   Loss 6.6623   LearningRate 0.0110   Epoch: 26   Global Step: 135250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 04:17:19,639-Speed 10962.41 samples/sec   Loss 6.6573   LearningRate 0.0110   Epoch: 26   Global Step: 135260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:20,576-Speed 10940.35 samples/sec   Loss 6.6288   LearningRate 0.0110   Epoch: 26   Global Step: 135270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:21,532-Speed 10720.61 samples/sec   Loss 6.6618   LearningRate 0.0110   Epoch: 26   Global Step: 135280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:22,483-Speed 10780.44 samples/sec   Loss 6.6083   LearningRate 0.0110   Epoch: 26   Global Step: 135290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:23,501-Speed 10065.90 samples/sec   Loss 6.7328   LearningRate 0.0110   Epoch: 26   Global Step: 135300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:24,449-Speed 10833.26 samples/sec   Loss 6.7481   LearningRate 0.0110   Epoch: 26   Global Step: 135310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:25,411-Speed 10650.19 samples/sec   Loss 6.7269   LearningRate 0.0110   Epoch: 26   Global Step: 135320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:26,353-Speed 10877.92 samples/sec   Loss 6.5837   LearningRate 0.0110   Epoch: 26   Global Step: 135330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:27,328-Speed 10516.45 samples/sec   Loss 6.6013   LearningRate 0.0110   Epoch: 26   Global Step: 135340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:28,301-Speed 10534.71 samples/sec   Loss 6.7603   LearningRate 0.0110   Epoch: 26   Global Step: 135350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:29,276-Speed 10513.59 samples/sec   Loss 6.6142   LearningRate 0.0110   Epoch: 26   Global Step: 135360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:17:30,257-Speed 10453.79 samples/sec   Loss 6.6624   LearningRate 0.0110   Epoch: 26   Global Step: 135370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:17:31,234-Speed 10493.59 samples/sec   Loss 6.6600   LearningRate 0.0109   Epoch: 26   Global Step: 135380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:32,193-Speed 10680.11 samples/sec   Loss 6.8162   LearningRate 0.0109   Epoch: 26   Global Step: 135390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:33,150-Speed 10720.49 samples/sec   Loss 6.6217   LearningRate 0.0109   Epoch: 26   Global Step: 135400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:34,097-Speed 10822.91 samples/sec   Loss 6.6081   LearningRate 0.0109   Epoch: 26   Global Step: 135410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:35,040-Speed 10861.16 samples/sec   Loss 6.6062   LearningRate 0.0109   Epoch: 26   Global Step: 135420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:35,992-Speed 10765.63 samples/sec   Loss 6.5864   LearningRate 0.0109   Epoch: 26   Global Step: 135430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:36,936-Speed 10852.91 samples/sec   Loss 6.7212   LearningRate 0.0109   Epoch: 26   Global Step: 135440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:37,949-Speed 10118.23 samples/sec   Loss 6.6955   LearningRate 0.0109   Epoch: 26   Global Step: 135450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:38,920-Speed 10564.96 samples/sec   Loss 6.7479   LearningRate 0.0109   Epoch: 26   Global Step: 135460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:39,870-Speed 10782.59 samples/sec   Loss 6.5052   LearningRate 0.0109   Epoch: 26   Global Step: 135470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:40,813-Speed 10866.25 samples/sec   Loss 6.6323   LearningRate 0.0109   Epoch: 26   Global Step: 135480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:17:41,770-Speed 10711.04 samples/sec   Loss 6.5833   LearningRate 0.0109   Epoch: 26   Global Step: 135490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:17:42,740-Speed 10562.98 samples/sec   Loss 6.6835   LearningRate 0.0109   Epoch: 26   Global Step: 135500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:17:43,713-Speed 10570.83 samples/sec   Loss 6.6958   LearningRate 0.0109   Epoch: 26   Global Step: 135510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:17:44,658-Speed 10846.16 samples/sec   Loss 6.5461   LearningRate 0.0109   Epoch: 26   Global Step: 135520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:17:45,606-Speed 10811.29 samples/sec   Loss 6.6545   LearningRate 0.0109   Epoch: 26   Global Step: 135530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:17:46,561-Speed 10728.41 samples/sec   Loss 6.7700   LearningRate 0.0109   Epoch: 26   Global Step: 135540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:47,586-Speed 10006.26 samples/sec   Loss 6.6368   LearningRate 0.0109   Epoch: 26   Global Step: 135550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:48,558-Speed 10545.22 samples/sec   Loss 6.7553   LearningRate 0.0109   Epoch: 26   Global Step: 135560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:49,521-Speed 10639.01 samples/sec   Loss 6.7520   LearningRate 0.0109   Epoch: 26   Global Step: 135570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:50,475-Speed 10745.47 samples/sec   Loss 6.7083   LearningRate 0.0109   Epoch: 26   Global Step: 135580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:51,416-Speed 10895.19 samples/sec   Loss 6.6728   LearningRate 0.0109   Epoch: 26   Global Step: 135590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:52,398-Speed 10437.13 samples/sec   Loss 6.6538   LearningRate 0.0109   Epoch: 26   Global Step: 135600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:53,321-Speed 11105.88 samples/sec   Loss 6.6933   LearningRate 0.0109   Epoch: 26   Global Step: 135610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:54,275-Speed 10735.29 samples/sec   Loss 6.5815   LearningRate 0.0109   Epoch: 26   Global Step: 135620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:55,230-Speed 10735.09 samples/sec   Loss 6.5712   LearningRate 0.0109   Epoch: 26   Global Step: 135630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:17:56,212-Speed 10437.69 samples/sec   Loss 6.7138   LearningRate 0.0109   Epoch: 26   Global Step: 135640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:17:57,166-Speed 10736.34 samples/sec   Loss 6.6261   LearningRate 0.0109   Epoch: 26   Global Step: 135650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:17:58,133-Speed 10605.26 samples/sec   Loss 6.5924   LearningRate 0.0109   Epoch: 26   Global Step: 135660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:17:59,101-Speed 10592.06 samples/sec   Loss 6.6627   LearningRate 0.0109   Epoch: 26   Global Step: 135670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:00,107-Speed 10188.64 samples/sec   Loss 6.7597   LearningRate 0.0108   Epoch: 26   Global Step: 135680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:01,037-Speed 11036.20 samples/sec   Loss 6.7458   LearningRate 0.0108   Epoch: 26   Global Step: 135690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:02,034-Speed 10272.83 samples/sec   Loss 6.6268   LearningRate 0.0108   Epoch: 26   Global Step: 135700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:02,995-Speed 10679.70 samples/sec   Loss 6.6258   LearningRate 0.0108   Epoch: 26   Global Step: 135710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:03,955-Speed 10680.33 samples/sec   Loss 6.6646   LearningRate 0.0108   Epoch: 26   Global Step: 135720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:04,921-Speed 10608.04 samples/sec   Loss 6.6901   LearningRate 0.0108   Epoch: 26   Global Step: 135730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:05,868-Speed 10817.20 samples/sec   Loss 6.6705   LearningRate 0.0108   Epoch: 26   Global Step: 135740   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:18:06,812-Speed 10852.35 samples/sec   Loss 6.7005   LearningRate 0.0108   Epoch: 26   Global Step: 135750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:07,753-Speed 10887.46 samples/sec   Loss 6.8015   LearningRate 0.0108   Epoch: 26   Global Step: 135760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:08,721-Speed 10587.73 samples/sec   Loss 6.7095   LearningRate 0.0108   Epoch: 26   Global Step: 135770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:09,710-Speed 10359.69 samples/sec   Loss 6.5810   LearningRate 0.0108   Epoch: 26   Global Step: 135780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:10,675-Speed 10632.85 samples/sec   Loss 6.5817   LearningRate 0.0108   Epoch: 26   Global Step: 135790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:11,654-Speed 10463.58 samples/sec   Loss 6.7599   LearningRate 0.0108   Epoch: 26   Global Step: 135800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:12,560-Speed 11306.74 samples/sec   Loss 6.7945   LearningRate 0.0108   Epoch: 26   Global Step: 135810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:13,501-Speed 10898.42 samples/sec   Loss 6.6848   LearningRate 0.0108   Epoch: 26   Global Step: 135820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:14,487-Speed 10390.04 samples/sec   Loss 6.7680   LearningRate 0.0108   Epoch: 26   Global Step: 135830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:15,485-Speed 10277.88 samples/sec   Loss 6.5637   LearningRate 0.0108   Epoch: 26   Global Step: 135840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:16,443-Speed 10696.79 samples/sec   Loss 6.7264   LearningRate 0.0108   Epoch: 26   Global Step: 135850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:17,400-Speed 10711.14 samples/sec   Loss 6.7266   LearningRate 0.0108   Epoch: 26   Global Step: 135860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:18,346-Speed 10827.81 samples/sec   Loss 6.6043   LearningRate 0.0108   Epoch: 26   Global Step: 135870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:18:19,304-Speed 10707.68 samples/sec   Loss 6.5801   LearningRate 0.0108   Epoch: 26   Global Step: 135880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:18:20,280-Speed 10497.31 samples/sec   Loss 6.5601   LearningRate 0.0108   Epoch: 26   Global Step: 135890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:18:21,227-Speed 10820.74 samples/sec   Loss 6.7225   LearningRate 0.0108   Epoch: 26   Global Step: 135900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:18:22,216-Speed 10367.61 samples/sec   Loss 6.6595   LearningRate 0.0108   Epoch: 26   Global Step: 135910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:18:23,244-Speed 9969.94 samples/sec   Loss 6.6539   LearningRate 0.0108   Epoch: 26   Global Step: 135920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:18:24,219-Speed 10507.79 samples/sec   Loss 6.6903   LearningRate 0.0108   Epoch: 26   Global Step: 135930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:18:25,140-Speed 11125.01 samples/sec   Loss 6.7104   LearningRate 0.0108   Epoch: 26   Global Step: 135940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:18:26,082-Speed 10883.08 samples/sec   Loss 6.7545   LearningRate 0.0108   Epoch: 26   Global Step: 135950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:18:27,012-Speed 11025.11 samples/sec   Loss 6.4117   LearningRate 0.0108   Epoch: 26   Global Step: 135960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:18:28,011-Speed 10255.59 samples/sec   Loss 6.6174   LearningRate 0.0108   Epoch: 26   Global Step: 135970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:28,979-Speed 10616.54 samples/sec   Loss 6.7193   LearningRate 0.0108   Epoch: 26   Global Step: 135980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:29,931-Speed 10768.61 samples/sec   Loss 6.4801   LearningRate 0.0107   Epoch: 26   Global Step: 135990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:30,930-Speed 10261.42 samples/sec   Loss 6.7333   LearningRate 0.0107   Epoch: 26   Global Step: 136000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:18:53,198-[lfw][136000]XNorm: 9.181935
Training: 2022-04-11 04:18:53,199-[lfw][136000]Accuracy-Flip: 0.99650+-0.00302
Training: 2022-04-11 04:18:53,200-[lfw][136000]Accuracy-Highest: 0.99700
Training: 2022-04-11 04:19:18,709-[cfp_fp][136000]XNorm: 7.882385
Training: 2022-04-11 04:19:18,710-[cfp_fp][136000]Accuracy-Flip: 0.96643+-0.00955
Training: 2022-04-11 04:19:18,711-[cfp_fp][136000]Accuracy-Highest: 0.96800
Training: 2022-04-11 04:19:40,838-[agedb_30][136000]XNorm: 8.956538
Training: 2022-04-11 04:19:40,838-[agedb_30][136000]Accuracy-Flip: 0.97017+-0.00689
Training: 2022-04-11 04:19:40,839-[agedb_30][136000]Accuracy-Highest: 0.97050
Training: 2022-04-11 04:19:41,836-Speed 144.42 samples/sec   Loss 6.5851   LearningRate 0.0107   Epoch: 26   Global Step: 136010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:19:42,811-Speed 10514.42 samples/sec   Loss 6.7765   LearningRate 0.0107   Epoch: 26   Global Step: 136020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:19:43,778-Speed 10591.63 samples/sec   Loss 6.6777   LearningRate 0.0107   Epoch: 26   Global Step: 136030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:19:44,741-Speed 10650.80 samples/sec   Loss 6.5821   LearningRate 0.0107   Epoch: 26   Global Step: 136040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:19:45,696-Speed 10721.81 samples/sec   Loss 6.7219   LearningRate 0.0107   Epoch: 26   Global Step: 136050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:19:46,652-Speed 10718.40 samples/sec   Loss 6.7157   LearningRate 0.0107   Epoch: 26   Global Step: 136060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:19:47,607-Speed 10738.16 samples/sec   Loss 6.7581   LearningRate 0.0107   Epoch: 26   Global Step: 136070   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:19:48,555-Speed 10806.76 samples/sec   Loss 6.6362   LearningRate 0.0107   Epoch: 26   Global Step: 136080   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:19:49,508-Speed 10753.41 samples/sec   Loss 6.7170   LearningRate 0.0107   Epoch: 26   Global Step: 136090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:19:50,500-Speed 10342.97 samples/sec   Loss 6.6584   LearningRate 0.0107   Epoch: 26   Global Step: 136100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:19:51,447-Speed 10820.32 samples/sec   Loss 6.7316   LearningRate 0.0107   Epoch: 26   Global Step: 136110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:19:52,414-Speed 10593.37 samples/sec   Loss 6.8241   LearningRate 0.0107   Epoch: 26   Global Step: 136120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:19:53,373-Speed 10682.10 samples/sec   Loss 6.6964   LearningRate 0.0107   Epoch: 26   Global Step: 136130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:19:54,357-Speed 10423.44 samples/sec   Loss 6.6519   LearningRate 0.0107   Epoch: 26   Global Step: 136140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:19:55,284-Speed 11052.08 samples/sec   Loss 6.6001   LearningRate 0.0107   Epoch: 26   Global Step: 136150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:19:56,240-Speed 10725.16 samples/sec   Loss 6.5527   LearningRate 0.0107   Epoch: 26   Global Step: 136160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:19:57,215-Speed 10511.46 samples/sec   Loss 6.6537   LearningRate 0.0107   Epoch: 26   Global Step: 136170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:19:58,196-Speed 10449.19 samples/sec   Loss 6.5549   LearningRate 0.0107   Epoch: 26   Global Step: 136180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:19:59,155-Speed 10690.17 samples/sec   Loss 6.6502   LearningRate 0.0107   Epoch: 26   Global Step: 136190   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:20:00,106-Speed 10779.26 samples/sec   Loss 6.7119   LearningRate 0.0107   Epoch: 26   Global Step: 136200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:01,053-Speed 10820.74 samples/sec   Loss 6.7093   LearningRate 0.0107   Epoch: 26   Global Step: 136210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:02,003-Speed 10784.00 samples/sec   Loss 6.7075   LearningRate 0.0107   Epoch: 26   Global Step: 136220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:02,978-Speed 10516.94 samples/sec   Loss 6.7143   LearningRate 0.0107   Epoch: 26   Global Step: 136230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:03,916-Speed 10929.10 samples/sec   Loss 6.6357   LearningRate 0.0107   Epoch: 26   Global Step: 136240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:04,881-Speed 10613.63 samples/sec   Loss 6.7076   LearningRate 0.0107   Epoch: 26   Global Step: 136250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:05,816-Speed 10960.12 samples/sec   Loss 6.6451   LearningRate 0.0107   Epoch: 26   Global Step: 136260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:06,774-Speed 10700.78 samples/sec   Loss 6.7393   LearningRate 0.0107   Epoch: 26   Global Step: 136270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:07,705-Speed 11015.94 samples/sec   Loss 6.5925   LearningRate 0.0107   Epoch: 26   Global Step: 136280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:08,675-Speed 10569.94 samples/sec   Loss 6.7585   LearningRate 0.0107   Epoch: 26   Global Step: 136290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:09,595-Speed 11138.40 samples/sec   Loss 6.6986   LearningRate 0.0106   Epoch: 26   Global Step: 136300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:10,580-Speed 10407.11 samples/sec   Loss 6.7384   LearningRate 0.0106   Epoch: 26   Global Step: 136310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:11,558-Speed 10481.24 samples/sec   Loss 6.8828   LearningRate 0.0106   Epoch: 26   Global Step: 136320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:12,520-Speed 10653.59 samples/sec   Loss 6.7295   LearningRate 0.0106   Epoch: 26   Global Step: 136330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:13,483-Speed 10646.18 samples/sec   Loss 6.5670   LearningRate 0.0106   Epoch: 26   Global Step: 136340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:14,435-Speed 10781.83 samples/sec   Loss 6.6958   LearningRate 0.0106   Epoch: 26   Global Step: 136350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:15,381-Speed 10836.23 samples/sec   Loss 6.7631   LearningRate 0.0106   Epoch: 26   Global Step: 136360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:16,338-Speed 10702.74 samples/sec   Loss 6.5835   LearningRate 0.0106   Epoch: 26   Global Step: 136370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:17,298-Speed 10693.35 samples/sec   Loss 6.7243   LearningRate 0.0106   Epoch: 26   Global Step: 136380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:18,251-Speed 10749.89 samples/sec   Loss 6.7667   LearningRate 0.0106   Epoch: 26   Global Step: 136390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:19,215-Speed 10639.94 samples/sec   Loss 6.6443   LearningRate 0.0106   Epoch: 26   Global Step: 136400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:20:20,188-Speed 10533.23 samples/sec   Loss 6.5671   LearningRate 0.0106   Epoch: 26   Global Step: 136410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:20:21,141-Speed 10750.56 samples/sec   Loss 6.5935   LearningRate 0.0106   Epoch: 26   Global Step: 136420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:20:22,138-Speed 10275.60 samples/sec   Loss 6.7082   LearningRate 0.0106   Epoch: 26   Global Step: 136430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:20:23,091-Speed 10762.07 samples/sec   Loss 6.6381   LearningRate 0.0106   Epoch: 26   Global Step: 136440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:20:24,037-Speed 10823.51 samples/sec   Loss 6.5657   LearningRate 0.0106   Epoch: 26   Global Step: 136450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:20:24,987-Speed 10786.67 samples/sec   Loss 6.6520   LearningRate 0.0106   Epoch: 26   Global Step: 136460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:20:25,923-Speed 10954.49 samples/sec   Loss 6.6470   LearningRate 0.0106   Epoch: 26   Global Step: 136470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:20:26,905-Speed 10432.74 samples/sec   Loss 6.7136   LearningRate 0.0106   Epoch: 26   Global Step: 136480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:20:27,887-Speed 10438.65 samples/sec   Loss 6.5776   LearningRate 0.0106   Epoch: 26   Global Step: 136490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:20:28,871-Speed 10411.66 samples/sec   Loss 6.5898   LearningRate 0.0106   Epoch: 26   Global Step: 136500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:29,879-Speed 10169.56 samples/sec   Loss 6.6595   LearningRate 0.0106   Epoch: 26   Global Step: 136510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:30,851-Speed 10540.38 samples/sec   Loss 6.6296   LearningRate 0.0106   Epoch: 26   Global Step: 136520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:31,821-Speed 10559.65 samples/sec   Loss 6.4474   LearningRate 0.0106   Epoch: 26   Global Step: 136530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:32,812-Speed 10351.06 samples/sec   Loss 6.6326   LearningRate 0.0106   Epoch: 26   Global Step: 136540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:33,812-Speed 10251.98 samples/sec   Loss 6.7660   LearningRate 0.0106   Epoch: 26   Global Step: 136550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:34,869-Speed 9696.44 samples/sec   Loss 6.6903   LearningRate 0.0106   Epoch: 26   Global Step: 136560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:46,714-Speed 864.62 samples/sec   Loss 6.4170   LearningRate 0.0106   Epoch: 27   Global Step: 136570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:47,790-Speed 9525.92 samples/sec   Loss 5.8597   LearningRate 0.0106   Epoch: 27   Global Step: 136580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:48,801-Speed 10132.15 samples/sec   Loss 5.9902   LearningRate 0.0106   Epoch: 27   Global Step: 136590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:49,802-Speed 10239.90 samples/sec   Loss 6.0324   LearningRate 0.0106   Epoch: 27   Global Step: 136600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:50,841-Speed 9861.52 samples/sec   Loss 5.9300   LearningRate 0.0105   Epoch: 27   Global Step: 136610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:51,975-Speed 9035.37 samples/sec   Loss 5.9089   LearningRate 0.0105   Epoch: 27   Global Step: 136620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:52,959-Speed 10414.97 samples/sec   Loss 5.8435   LearningRate 0.0105   Epoch: 27   Global Step: 136630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:53,944-Speed 10406.36 samples/sec   Loss 6.0204   LearningRate 0.0105   Epoch: 27   Global Step: 136640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:54,915-Speed 10557.78 samples/sec   Loss 5.8631   LearningRate 0.0105   Epoch: 27   Global Step: 136650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:55,885-Speed 10575.81 samples/sec   Loss 5.9577   LearningRate 0.0105   Epoch: 27   Global Step: 136660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:56,842-Speed 10707.42 samples/sec   Loss 5.8513   LearningRate 0.0105   Epoch: 27   Global Step: 136670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:20:57,764-Speed 11105.15 samples/sec   Loss 5.9300   LearningRate 0.0105   Epoch: 27   Global Step: 136680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:20:58,738-Speed 10528.40 samples/sec   Loss 5.9471   LearningRate 0.0105   Epoch: 27   Global Step: 136690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:20:59,679-Speed 10884.53 samples/sec   Loss 6.0342   LearningRate 0.0105   Epoch: 27   Global Step: 136700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:21:00,711-Speed 9927.94 samples/sec   Loss 5.7872   LearningRate 0.0105   Epoch: 27   Global Step: 136710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:21:01,683-Speed 10547.32 samples/sec   Loss 6.0697   LearningRate 0.0105   Epoch: 27   Global Step: 136720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:21:02,625-Speed 10888.42 samples/sec   Loss 6.0425   LearningRate 0.0105   Epoch: 27   Global Step: 136730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:21:03,592-Speed 10597.13 samples/sec   Loss 5.9879   LearningRate 0.0105   Epoch: 27   Global Step: 136740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:21:04,550-Speed 10694.49 samples/sec   Loss 5.9868   LearningRate 0.0105   Epoch: 27   Global Step: 136750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:21:05,503-Speed 10763.57 samples/sec   Loss 6.0951   LearningRate 0.0105   Epoch: 27   Global Step: 136760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:21:06,434-Speed 11017.02 samples/sec   Loss 6.0823   LearningRate 0.0105   Epoch: 27   Global Step: 136770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:21:07,377-Speed 10873.74 samples/sec   Loss 5.8654   LearningRate 0.0105   Epoch: 27   Global Step: 136780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:08,338-Speed 10664.41 samples/sec   Loss 6.1277   LearningRate 0.0105   Epoch: 27   Global Step: 136790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:09,324-Speed 10390.37 samples/sec   Loss 6.0382   LearningRate 0.0105   Epoch: 27   Global Step: 136800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:10,450-Speed 9099.97 samples/sec   Loss 5.8792   LearningRate 0.0105   Epoch: 27   Global Step: 136810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:11,413-Speed 10647.20 samples/sec   Loss 5.9396   LearningRate 0.0105   Epoch: 27   Global Step: 136820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:12,376-Speed 10649.96 samples/sec   Loss 6.0184   LearningRate 0.0105   Epoch: 27   Global Step: 136830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:13,377-Speed 10228.57 samples/sec   Loss 6.1381   LearningRate 0.0105   Epoch: 27   Global Step: 136840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:14,353-Speed 10500.47 samples/sec   Loss 5.9839   LearningRate 0.0105   Epoch: 27   Global Step: 136850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:15,301-Speed 10818.61 samples/sec   Loss 6.0458   LearningRate 0.0105   Epoch: 27   Global Step: 136860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:16,263-Speed 10644.41 samples/sec   Loss 5.9442   LearningRate 0.0105   Epoch: 27   Global Step: 136870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:17,230-Speed 10602.70 samples/sec   Loss 6.0253   LearningRate 0.0105   Epoch: 27   Global Step: 136880   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:21:18,209-Speed 10472.41 samples/sec   Loss 6.1284   LearningRate 0.0105   Epoch: 27   Global Step: 136890   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:21:19,154-Speed 10846.85 samples/sec   Loss 6.0606   LearningRate 0.0105   Epoch: 27   Global Step: 136900   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:21:20,106-Speed 10770.24 samples/sec   Loss 5.9713   LearningRate 0.0105   Epoch: 27   Global Step: 136910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:21,018-Speed 11235.05 samples/sec   Loss 5.9934   LearningRate 0.0104   Epoch: 27   Global Step: 136920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:22,018-Speed 10244.65 samples/sec   Loss 6.0094   LearningRate 0.0104   Epoch: 27   Global Step: 136930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:22,987-Speed 10580.65 samples/sec   Loss 6.0867   LearningRate 0.0104   Epoch: 27   Global Step: 136940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:23,939-Speed 10763.67 samples/sec   Loss 5.9533   LearningRate 0.0104   Epoch: 27   Global Step: 136950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:24,915-Speed 10506.89 samples/sec   Loss 6.1500   LearningRate 0.0104   Epoch: 27   Global Step: 136960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:25,859-Speed 10852.57 samples/sec   Loss 6.0484   LearningRate 0.0104   Epoch: 27   Global Step: 136970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:26,823-Speed 10625.29 samples/sec   Loss 6.1176   LearningRate 0.0104   Epoch: 27   Global Step: 136980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:21:27,753-Speed 11024.36 samples/sec   Loss 5.9992   LearningRate 0.0104   Epoch: 27   Global Step: 136990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:21:28,684-Speed 11007.10 samples/sec   Loss 6.1686   LearningRate 0.0104   Epoch: 27   Global Step: 137000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:21:29,685-Speed 10243.76 samples/sec   Loss 6.0609   LearningRate 0.0104   Epoch: 27   Global Step: 137010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:21:30,634-Speed 10796.99 samples/sec   Loss 6.0228   LearningRate 0.0104   Epoch: 27   Global Step: 137020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:21:31,599-Speed 10617.14 samples/sec   Loss 6.1198   LearningRate 0.0104   Epoch: 27   Global Step: 137030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:21:32,544-Speed 10844.33 samples/sec   Loss 6.1478   LearningRate 0.0104   Epoch: 27   Global Step: 137040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:21:33,535-Speed 10361.59 samples/sec   Loss 5.8932   LearningRate 0.0104   Epoch: 27   Global Step: 137050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:21:34,482-Speed 10821.28 samples/sec   Loss 6.1847   LearningRate 0.0104   Epoch: 27   Global Step: 137060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:21:35,423-Speed 10892.99 samples/sec   Loss 6.1314   LearningRate 0.0104   Epoch: 27   Global Step: 137070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:21:36,372-Speed 10793.60 samples/sec   Loss 6.0344   LearningRate 0.0104   Epoch: 27   Global Step: 137080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:37,316-Speed 10853.60 samples/sec   Loss 6.0430   LearningRate 0.0104   Epoch: 27   Global Step: 137090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:38,321-Speed 10201.75 samples/sec   Loss 6.1545   LearningRate 0.0104   Epoch: 27   Global Step: 137100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:39,275-Speed 10742.17 samples/sec   Loss 6.0803   LearningRate 0.0104   Epoch: 27   Global Step: 137110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:40,249-Speed 10527.75 samples/sec   Loss 6.2068   LearningRate 0.0104   Epoch: 27   Global Step: 137120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:41,227-Speed 10476.99 samples/sec   Loss 6.1128   LearningRate 0.0104   Epoch: 27   Global Step: 137130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:42,186-Speed 10694.02 samples/sec   Loss 6.0853   LearningRate 0.0104   Epoch: 27   Global Step: 137140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:43,137-Speed 10780.59 samples/sec   Loss 6.1815   LearningRate 0.0104   Epoch: 27   Global Step: 137150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:44,088-Speed 10769.41 samples/sec   Loss 6.1517   LearningRate 0.0104   Epoch: 27   Global Step: 137160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:45,064-Speed 10504.23 samples/sec   Loss 6.0600   LearningRate 0.0104   Epoch: 27   Global Step: 137170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:46,048-Speed 10414.43 samples/sec   Loss 6.2256   LearningRate 0.0104   Epoch: 27   Global Step: 137180   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:21:47,004-Speed 10721.02 samples/sec   Loss 6.1185   LearningRate 0.0104   Epoch: 27   Global Step: 137190   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:21:47,969-Speed 10627.38 samples/sec   Loss 6.1568   LearningRate 0.0104   Epoch: 27   Global Step: 137200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:48,922-Speed 10748.98 samples/sec   Loss 6.0380   LearningRate 0.0104   Epoch: 27   Global Step: 137210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:49,893-Speed 10557.09 samples/sec   Loss 6.0384   LearningRate 0.0104   Epoch: 27   Global Step: 137220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:50,846-Speed 10751.14 samples/sec   Loss 6.0901   LearningRate 0.0104   Epoch: 27   Global Step: 137230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:51,838-Speed 10332.74 samples/sec   Loss 6.2654   LearningRate 0.0103   Epoch: 27   Global Step: 137240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:52,812-Speed 10521.31 samples/sec   Loss 6.1068   LearningRate 0.0103   Epoch: 27   Global Step: 137250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:53,757-Speed 10849.08 samples/sec   Loss 6.1921   LearningRate 0.0103   Epoch: 27   Global Step: 137260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:54,703-Speed 10832.86 samples/sec   Loss 6.0355   LearningRate 0.0103   Epoch: 27   Global Step: 137270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:55,693-Speed 10360.07 samples/sec   Loss 6.2404   LearningRate 0.0103   Epoch: 27   Global Step: 137280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:56,659-Speed 10606.26 samples/sec   Loss 6.0218   LearningRate 0.0103   Epoch: 27   Global Step: 137290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:57,636-Speed 10488.32 samples/sec   Loss 6.1549   LearningRate 0.0103   Epoch: 27   Global Step: 137300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:58,608-Speed 10555.29 samples/sec   Loss 6.0478   LearningRate 0.0103   Epoch: 27   Global Step: 137310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:21:59,553-Speed 10839.05 samples/sec   Loss 6.2543   LearningRate 0.0103   Epoch: 27   Global Step: 137320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:00,510-Speed 10718.13 samples/sec   Loss 6.2151   LearningRate 0.0103   Epoch: 27   Global Step: 137330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:01,483-Speed 10533.06 samples/sec   Loss 6.0984   LearningRate 0.0103   Epoch: 27   Global Step: 137340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:02,454-Speed 10551.61 samples/sec   Loss 6.1018   LearningRate 0.0103   Epoch: 27   Global Step: 137350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:03,404-Speed 10794.81 samples/sec   Loss 6.0561   LearningRate 0.0103   Epoch: 27   Global Step: 137360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:04,365-Speed 10656.07 samples/sec   Loss 6.0643   LearningRate 0.0103   Epoch: 27   Global Step: 137370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:05,362-Speed 10278.53 samples/sec   Loss 6.1597   LearningRate 0.0103   Epoch: 27   Global Step: 137380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:06,318-Speed 10736.48 samples/sec   Loss 6.1572   LearningRate 0.0103   Epoch: 27   Global Step: 137390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:07,264-Speed 10836.20 samples/sec   Loss 6.2766   LearningRate 0.0103   Epoch: 27   Global Step: 137400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:08,208-Speed 10856.76 samples/sec   Loss 6.1494   LearningRate 0.0103   Epoch: 27   Global Step: 137410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:09,177-Speed 10573.04 samples/sec   Loss 6.2386   LearningRate 0.0103   Epoch: 27   Global Step: 137420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:10,126-Speed 10793.54 samples/sec   Loss 6.2377   LearningRate 0.0103   Epoch: 27   Global Step: 137430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:11,102-Speed 10497.18 samples/sec   Loss 6.2424   LearningRate 0.0103   Epoch: 27   Global Step: 137440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:12,018-Speed 11197.65 samples/sec   Loss 6.0946   LearningRate 0.0103   Epoch: 27   Global Step: 137450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:12,982-Speed 10634.53 samples/sec   Loss 6.1972   LearningRate 0.0103   Epoch: 27   Global Step: 137460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:13,904-Speed 11115.02 samples/sec   Loss 5.9982   LearningRate 0.0103   Epoch: 27   Global Step: 137470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:14,847-Speed 10856.39 samples/sec   Loss 6.2496   LearningRate 0.0103   Epoch: 27   Global Step: 137480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:15,824-Speed 10491.49 samples/sec   Loss 6.1000   LearningRate 0.0103   Epoch: 27   Global Step: 137490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:16,784-Speed 10678.99 samples/sec   Loss 6.1039   LearningRate 0.0103   Epoch: 27   Global Step: 137500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:17,757-Speed 10528.54 samples/sec   Loss 6.1800   LearningRate 0.0103   Epoch: 27   Global Step: 137510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:18,702-Speed 10850.49 samples/sec   Loss 6.1677   LearningRate 0.0103   Epoch: 27   Global Step: 137520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:19,678-Speed 10501.75 samples/sec   Loss 6.2449   LearningRate 0.0103   Epoch: 27   Global Step: 137530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:20,618-Speed 10900.32 samples/sec   Loss 6.2120   LearningRate 0.0103   Epoch: 27   Global Step: 137540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:21,598-Speed 10452.85 samples/sec   Loss 6.0352   LearningRate 0.0102   Epoch: 27   Global Step: 137550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:22,525-Speed 11083.76 samples/sec   Loss 6.2432   LearningRate 0.0102   Epoch: 27   Global Step: 137560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:23,481-Speed 10716.55 samples/sec   Loss 6.2312   LearningRate 0.0102   Epoch: 27   Global Step: 137570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:24,443-Speed 10659.53 samples/sec   Loss 6.1616   LearningRate 0.0102   Epoch: 27   Global Step: 137580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:25,385-Speed 10890.06 samples/sec   Loss 6.2491   LearningRate 0.0102   Epoch: 27   Global Step: 137590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:26,352-Speed 10598.99 samples/sec   Loss 6.3027   LearningRate 0.0102   Epoch: 27   Global Step: 137600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:27,315-Speed 10647.39 samples/sec   Loss 6.1802   LearningRate 0.0102   Epoch: 27   Global Step: 137610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:28,305-Speed 10346.60 samples/sec   Loss 6.1581   LearningRate 0.0102   Epoch: 27   Global Step: 137620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:29,259-Speed 10759.52 samples/sec   Loss 6.4219   LearningRate 0.0102   Epoch: 27   Global Step: 137630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:30,211-Speed 10759.08 samples/sec   Loss 6.2908   LearningRate 0.0102   Epoch: 27   Global Step: 137640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:31,191-Speed 10469.84 samples/sec   Loss 6.2459   LearningRate 0.0102   Epoch: 27   Global Step: 137650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:32,153-Speed 10646.61 samples/sec   Loss 6.3393   LearningRate 0.0102   Epoch: 27   Global Step: 137660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:33,137-Speed 10414.35 samples/sec   Loss 6.2012   LearningRate 0.0102   Epoch: 27   Global Step: 137670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:34,127-Speed 10349.66 samples/sec   Loss 6.2602   LearningRate 0.0102   Epoch: 27   Global Step: 137680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:35,078-Speed 10783.33 samples/sec   Loss 6.2948   LearningRate 0.0102   Epoch: 27   Global Step: 137690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:36,019-Speed 10895.04 samples/sec   Loss 6.1531   LearningRate 0.0102   Epoch: 27   Global Step: 137700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:36,983-Speed 10631.32 samples/sec   Loss 6.1728   LearningRate 0.0102   Epoch: 27   Global Step: 137710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:37,962-Speed 10461.34 samples/sec   Loss 6.2442   LearningRate 0.0102   Epoch: 27   Global Step: 137720   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:22:38,946-Speed 10426.91 samples/sec   Loss 6.2656   LearningRate 0.0102   Epoch: 27   Global Step: 137730   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:22:39,923-Speed 10490.70 samples/sec   Loss 6.2620   LearningRate 0.0102   Epoch: 27   Global Step: 137740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:40,869-Speed 10834.58 samples/sec   Loss 6.2380   LearningRate 0.0102   Epoch: 27   Global Step: 137750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:41,825-Speed 10709.87 samples/sec   Loss 6.2370   LearningRate 0.0102   Epoch: 27   Global Step: 137760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:42,771-Speed 10840.74 samples/sec   Loss 6.2096   LearningRate 0.0102   Epoch: 27   Global Step: 137770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:43,745-Speed 10521.20 samples/sec   Loss 6.3269   LearningRate 0.0102   Epoch: 27   Global Step: 137780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:44,706-Speed 10666.08 samples/sec   Loss 6.1303   LearningRate 0.0102   Epoch: 27   Global Step: 137790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:45,633-Speed 11054.58 samples/sec   Loss 6.1042   LearningRate 0.0102   Epoch: 27   Global Step: 137800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:46,573-Speed 10903.19 samples/sec   Loss 6.2284   LearningRate 0.0102   Epoch: 27   Global Step: 137810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:47,534-Speed 10659.34 samples/sec   Loss 6.2649   LearningRate 0.0102   Epoch: 27   Global Step: 137820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:48,569-Speed 9901.50 samples/sec   Loss 6.3186   LearningRate 0.0102   Epoch: 27   Global Step: 137830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:49,513-Speed 10862.48 samples/sec   Loss 6.3156   LearningRate 0.0102   Epoch: 27   Global Step: 137840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:50,474-Speed 10661.70 samples/sec   Loss 6.1813   LearningRate 0.0102   Epoch: 27   Global Step: 137850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:51,419-Speed 10845.66 samples/sec   Loss 6.2843   LearningRate 0.0102   Epoch: 27   Global Step: 137860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:52,387-Speed 10591.06 samples/sec   Loss 6.3317   LearningRate 0.0101   Epoch: 27   Global Step: 137870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:53,362-Speed 10518.15 samples/sec   Loss 6.1592   LearningRate 0.0101   Epoch: 27   Global Step: 137880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:54,319-Speed 10709.73 samples/sec   Loss 6.3151   LearningRate 0.0101   Epoch: 27   Global Step: 137890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:22:55,264-Speed 10847.75 samples/sec   Loss 6.1732   LearningRate 0.0101   Epoch: 27   Global Step: 137900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:56,223-Speed 10681.29 samples/sec   Loss 6.1219   LearningRate 0.0101   Epoch: 27   Global Step: 137910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:57,198-Speed 10513.27 samples/sec   Loss 6.3174   LearningRate 0.0101   Epoch: 27   Global Step: 137920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:58,128-Speed 11025.87 samples/sec   Loss 6.2104   LearningRate 0.0101   Epoch: 27   Global Step: 137930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:22:59,091-Speed 10639.92 samples/sec   Loss 6.3066   LearningRate 0.0101   Epoch: 27   Global Step: 137940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:23:00,067-Speed 10510.81 samples/sec   Loss 6.2198   LearningRate 0.0101   Epoch: 27   Global Step: 137950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:23:01,009-Speed 10896.00 samples/sec   Loss 6.3219   LearningRate 0.0101   Epoch: 27   Global Step: 137960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:23:01,985-Speed 10491.84 samples/sec   Loss 6.1940   LearningRate 0.0101   Epoch: 27   Global Step: 137970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:23:02,931-Speed 10835.56 samples/sec   Loss 6.3773   LearningRate 0.0101   Epoch: 27   Global Step: 137980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:23:03,934-Speed 10218.84 samples/sec   Loss 6.2052   LearningRate 0.0101   Epoch: 27   Global Step: 137990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:23:04,902-Speed 10596.30 samples/sec   Loss 6.3749   LearningRate 0.0101   Epoch: 27   Global Step: 138000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:23:27,250-[lfw][138000]XNorm: 9.161348
Training: 2022-04-11 04:23:27,251-[lfw][138000]Accuracy-Flip: 0.99617+-0.00350
Training: 2022-04-11 04:23:27,252-[lfw][138000]Accuracy-Highest: 0.99700
Training: 2022-04-11 04:23:53,036-[cfp_fp][138000]XNorm: 7.844413
Training: 2022-04-11 04:23:53,037-[cfp_fp][138000]Accuracy-Flip: 0.96657+-0.01033
Training: 2022-04-11 04:23:53,038-[cfp_fp][138000]Accuracy-Highest: 0.96800
Training: 2022-04-11 04:24:15,305-[agedb_30][138000]XNorm: 8.927129
Training: 2022-04-11 04:24:15,306-[agedb_30][138000]Accuracy-Flip: 0.96583+-0.00834
Training: 2022-04-11 04:24:15,307-[agedb_30][138000]Accuracy-Highest: 0.97050
Training: 2022-04-11 04:24:16,260-Speed 143.50 samples/sec   Loss 6.2118   LearningRate 0.0101   Epoch: 27   Global Step: 138010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:24:17,203-Speed 10868.00 samples/sec   Loss 6.2592   LearningRate 0.0101   Epoch: 27   Global Step: 138020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:18,163-Speed 10684.52 samples/sec   Loss 6.3039   LearningRate 0.0101   Epoch: 27   Global Step: 138030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:19,110-Speed 10812.94 samples/sec   Loss 6.3041   LearningRate 0.0101   Epoch: 27   Global Step: 138040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:20,076-Speed 10612.45 samples/sec   Loss 6.2670   LearningRate 0.0101   Epoch: 27   Global Step: 138050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:21,040-Speed 10628.11 samples/sec   Loss 6.3749   LearningRate 0.0101   Epoch: 27   Global Step: 138060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:22,003-Speed 10646.99 samples/sec   Loss 6.2038   LearningRate 0.0101   Epoch: 27   Global Step: 138070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:22,955-Speed 10766.33 samples/sec   Loss 6.2130   LearningRate 0.0101   Epoch: 27   Global Step: 138080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:23,929-Speed 10528.53 samples/sec   Loss 6.2575   LearningRate 0.0101   Epoch: 27   Global Step: 138090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:24,878-Speed 10806.62 samples/sec   Loss 6.3265   LearningRate 0.0101   Epoch: 27   Global Step: 138100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:25,857-Speed 10466.83 samples/sec   Loss 6.2698   LearningRate 0.0101   Epoch: 27   Global Step: 138110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:26,813-Speed 10714.53 samples/sec   Loss 6.4360   LearningRate 0.0101   Epoch: 27   Global Step: 138120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:24:27,772-Speed 10682.54 samples/sec   Loss 6.3894   LearningRate 0.0101   Epoch: 27   Global Step: 138130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:24:28,729-Speed 10710.41 samples/sec   Loss 6.1753   LearningRate 0.0101   Epoch: 27   Global Step: 138140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:24:29,717-Speed 10371.17 samples/sec   Loss 6.3233   LearningRate 0.0101   Epoch: 27   Global Step: 138150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:24:30,703-Speed 10393.52 samples/sec   Loss 6.2744   LearningRate 0.0101   Epoch: 27   Global Step: 138160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:24:31,647-Speed 10872.81 samples/sec   Loss 6.3594   LearningRate 0.0101   Epoch: 27   Global Step: 138170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:24:32,592-Speed 10837.23 samples/sec   Loss 6.1535   LearningRate 0.0101   Epoch: 27   Global Step: 138180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:24:33,571-Speed 10476.89 samples/sec   Loss 6.1750   LearningRate 0.0100   Epoch: 27   Global Step: 138190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:24:34,505-Speed 10968.57 samples/sec   Loss 6.5242   LearningRate 0.0100   Epoch: 27   Global Step: 138200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:35,444-Speed 10918.87 samples/sec   Loss 6.2874   LearningRate 0.0100   Epoch: 27   Global Step: 138210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:36,392-Speed 10808.57 samples/sec   Loss 6.4032   LearningRate 0.0100   Epoch: 27   Global Step: 138220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:37,354-Speed 10649.47 samples/sec   Loss 6.2699   LearningRate 0.0100   Epoch: 27   Global Step: 138230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:38,327-Speed 10537.13 samples/sec   Loss 6.3051   LearningRate 0.0100   Epoch: 27   Global Step: 138240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:39,304-Speed 10488.23 samples/sec   Loss 6.4153   LearningRate 0.0100   Epoch: 27   Global Step: 138250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:40,262-Speed 10698.68 samples/sec   Loss 6.2151   LearningRate 0.0100   Epoch: 27   Global Step: 138260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:41,225-Speed 10638.95 samples/sec   Loss 6.4991   LearningRate 0.0100   Epoch: 27   Global Step: 138270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:42,213-Speed 10380.72 samples/sec   Loss 6.3473   LearningRate 0.0100   Epoch: 27   Global Step: 138280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:43,171-Speed 10692.51 samples/sec   Loss 6.3325   LearningRate 0.0100   Epoch: 27   Global Step: 138290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:44,108-Speed 10939.04 samples/sec   Loss 6.4634   LearningRate 0.0100   Epoch: 27   Global Step: 138300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:24:45,127-Speed 10054.27 samples/sec   Loss 6.3916   LearningRate 0.0100   Epoch: 27   Global Step: 138310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:24:46,072-Speed 10846.08 samples/sec   Loss 6.5137   LearningRate 0.0100   Epoch: 27   Global Step: 138320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:24:47,070-Speed 10272.51 samples/sec   Loss 6.2416   LearningRate 0.0100   Epoch: 27   Global Step: 138330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:24:47,992-Speed 11112.52 samples/sec   Loss 6.3731   LearningRate 0.0100   Epoch: 27   Global Step: 138340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:48,958-Speed 10608.23 samples/sec   Loss 6.3279   LearningRate 0.0100   Epoch: 27   Global Step: 138350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:49,945-Speed 10391.77 samples/sec   Loss 6.2806   LearningRate 0.0100   Epoch: 27   Global Step: 138360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:50,879-Speed 10971.42 samples/sec   Loss 6.4806   LearningRate 0.0100   Epoch: 27   Global Step: 138370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:51,874-Speed 10297.27 samples/sec   Loss 6.2550   LearningRate 0.0100   Epoch: 27   Global Step: 138380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:52,855-Speed 10443.59 samples/sec   Loss 6.3832   LearningRate 0.0100   Epoch: 27   Global Step: 138390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:53,855-Speed 10259.68 samples/sec   Loss 6.2564   LearningRate 0.0100   Epoch: 27   Global Step: 138400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:54,811-Speed 10722.29 samples/sec   Loss 6.2720   LearningRate 0.0100   Epoch: 27   Global Step: 138410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:55,745-Speed 10970.47 samples/sec   Loss 6.4256   LearningRate 0.0100   Epoch: 27   Global Step: 138420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:56,706-Speed 10653.89 samples/sec   Loss 6.4011   LearningRate 0.0100   Epoch: 27   Global Step: 138430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:24:57,710-Speed 10217.28 samples/sec   Loss 6.4251   LearningRate 0.0100   Epoch: 27   Global Step: 138440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:24:58,676-Speed 10610.44 samples/sec   Loss 6.3973   LearningRate 0.0100   Epoch: 27   Global Step: 138450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:24:59,649-Speed 10526.53 samples/sec   Loss 6.3061   LearningRate 0.0100   Epoch: 27   Global Step: 138460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:00,578-Speed 11033.81 samples/sec   Loss 6.2953   LearningRate 0.0100   Epoch: 27   Global Step: 138470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:01,537-Speed 10689.89 samples/sec   Loss 6.3231   LearningRate 0.0100   Epoch: 27   Global Step: 138480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:02,563-Speed 9984.48 samples/sec   Loss 6.3479   LearningRate 0.0100   Epoch: 27   Global Step: 138490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:03,513-Speed 10799.54 samples/sec   Loss 6.4961   LearningRate 0.0100   Epoch: 27   Global Step: 138500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:04,422-Speed 11263.69 samples/sec   Loss 6.3216   LearningRate 0.0099   Epoch: 27   Global Step: 138510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:05,385-Speed 10646.95 samples/sec   Loss 6.4378   LearningRate 0.0099   Epoch: 27   Global Step: 138520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:06,350-Speed 10615.59 samples/sec   Loss 6.3395   LearningRate 0.0099   Epoch: 27   Global Step: 138530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:07,316-Speed 10601.91 samples/sec   Loss 6.4158   LearningRate 0.0099   Epoch: 27   Global Step: 138540   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:25:08,292-Speed 10509.17 samples/sec   Loss 6.4195   LearningRate 0.0099   Epoch: 27   Global Step: 138550   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:25:09,239-Speed 10824.43 samples/sec   Loss 6.4679   LearningRate 0.0099   Epoch: 27   Global Step: 138560   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:25:10,242-Speed 10218.00 samples/sec   Loss 6.4368   LearningRate 0.0099   Epoch: 27   Global Step: 138570   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:25:11,169-Speed 11051.94 samples/sec   Loss 6.2834   LearningRate 0.0099   Epoch: 27   Global Step: 138580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:12,125-Speed 10719.78 samples/sec   Loss 6.3929   LearningRate 0.0099   Epoch: 27   Global Step: 138590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:13,074-Speed 10802.44 samples/sec   Loss 6.5073   LearningRate 0.0099   Epoch: 27   Global Step: 138600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:14,027-Speed 10742.64 samples/sec   Loss 6.3832   LearningRate 0.0099   Epoch: 27   Global Step: 138610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:14,967-Speed 10905.28 samples/sec   Loss 6.4162   LearningRate 0.0099   Epoch: 27   Global Step: 138620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:15,949-Speed 10435.05 samples/sec   Loss 6.3744   LearningRate 0.0099   Epoch: 27   Global Step: 138630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:16,940-Speed 10343.90 samples/sec   Loss 6.2860   LearningRate 0.0099   Epoch: 27   Global Step: 138640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:17,867-Speed 11057.95 samples/sec   Loss 6.3464   LearningRate 0.0099   Epoch: 27   Global Step: 138650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:25:18,838-Speed 10558.47 samples/sec   Loss 6.2638   LearningRate 0.0099   Epoch: 27   Global Step: 138660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:25:19,842-Speed 10211.60 samples/sec   Loss 6.2956   LearningRate 0.0099   Epoch: 27   Global Step: 138670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:25:20,785-Speed 10871.06 samples/sec   Loss 6.4201   LearningRate 0.0099   Epoch: 27   Global Step: 138680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:25:21,748-Speed 10641.97 samples/sec   Loss 6.3492   LearningRate 0.0099   Epoch: 27   Global Step: 138690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:25:22,686-Speed 10928.08 samples/sec   Loss 6.3696   LearningRate 0.0099   Epoch: 27   Global Step: 138700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:25:23,640-Speed 10741.41 samples/sec   Loss 6.4292   LearningRate 0.0099   Epoch: 27   Global Step: 138710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:25:24,609-Speed 10576.42 samples/sec   Loss 6.3417   LearningRate 0.0099   Epoch: 27   Global Step: 138720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:25:25,575-Speed 10605.83 samples/sec   Loss 6.4843   LearningRate 0.0099   Epoch: 27   Global Step: 138730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:25:26,526-Speed 10776.29 samples/sec   Loss 6.4096   LearningRate 0.0099   Epoch: 27   Global Step: 138740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:25:27,473-Speed 10824.33 samples/sec   Loss 6.2990   LearningRate 0.0099   Epoch: 27   Global Step: 138750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:28,399-Speed 11069.58 samples/sec   Loss 6.5481   LearningRate 0.0099   Epoch: 27   Global Step: 138760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:29,388-Speed 10360.06 samples/sec   Loss 6.3569   LearningRate 0.0099   Epoch: 27   Global Step: 138770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:30,366-Speed 10476.96 samples/sec   Loss 6.2761   LearningRate 0.0099   Epoch: 27   Global Step: 138780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:31,313-Speed 10818.18 samples/sec   Loss 6.5674   LearningRate 0.0099   Epoch: 27   Global Step: 138790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:32,282-Speed 10581.23 samples/sec   Loss 6.1769   LearningRate 0.0099   Epoch: 27   Global Step: 138800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:33,217-Speed 10969.47 samples/sec   Loss 6.2886   LearningRate 0.0099   Epoch: 27   Global Step: 138810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:34,157-Speed 10902.81 samples/sec   Loss 6.5121   LearningRate 0.0099   Epoch: 27   Global Step: 138820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:35,093-Speed 10949.16 samples/sec   Loss 6.4689   LearningRate 0.0098   Epoch: 27   Global Step: 138830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:36,065-Speed 10534.97 samples/sec   Loss 6.4690   LearningRate 0.0098   Epoch: 27   Global Step: 138840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:37,061-Speed 10295.07 samples/sec   Loss 6.2356   LearningRate 0.0098   Epoch: 27   Global Step: 138850   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:25:38,034-Speed 10534.26 samples/sec   Loss 6.5299   LearningRate 0.0098   Epoch: 27   Global Step: 138860   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:25:38,966-Speed 11001.08 samples/sec   Loss 6.3608   LearningRate 0.0098   Epoch: 27   Global Step: 138870   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:25:39,911-Speed 10849.93 samples/sec   Loss 6.2859   LearningRate 0.0098   Epoch: 27   Global Step: 138880   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:25:40,890-Speed 10457.08 samples/sec   Loss 6.4272   LearningRate 0.0098   Epoch: 27   Global Step: 138890   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:25:41,850-Speed 10676.56 samples/sec   Loss 6.3648   LearningRate 0.0098   Epoch: 27   Global Step: 138900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:42,867-Speed 10084.27 samples/sec   Loss 6.3490   LearningRate 0.0098   Epoch: 27   Global Step: 138910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:43,840-Speed 10535.30 samples/sec   Loss 6.3851   LearningRate 0.0098   Epoch: 27   Global Step: 138920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:44,822-Speed 10433.12 samples/sec   Loss 6.4199   LearningRate 0.0098   Epoch: 27   Global Step: 138930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:45,791-Speed 10576.01 samples/sec   Loss 6.2312   LearningRate 0.0098   Epoch: 27   Global Step: 138940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:46,752-Speed 10668.47 samples/sec   Loss 6.5184   LearningRate 0.0098   Epoch: 27   Global Step: 138950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:47,739-Speed 10383.88 samples/sec   Loss 6.3817   LearningRate 0.0098   Epoch: 27   Global Step: 138960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:48,734-Speed 10295.76 samples/sec   Loss 6.3789   LearningRate 0.0098   Epoch: 27   Global Step: 138970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:49,692-Speed 10701.09 samples/sec   Loss 6.1671   LearningRate 0.0098   Epoch: 27   Global Step: 138980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:50,671-Speed 10463.23 samples/sec   Loss 6.3915   LearningRate 0.0098   Epoch: 27   Global Step: 138990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:51,650-Speed 10473.46 samples/sec   Loss 6.5201   LearningRate 0.0098   Epoch: 27   Global Step: 139000   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:25:52,585-Speed 10959.01 samples/sec   Loss 6.2490   LearningRate 0.0098   Epoch: 27   Global Step: 139010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:53,576-Speed 10341.96 samples/sec   Loss 6.3778   LearningRate 0.0098   Epoch: 27   Global Step: 139020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:54,529-Speed 10748.04 samples/sec   Loss 6.3102   LearningRate 0.0098   Epoch: 27   Global Step: 139030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:55,489-Speed 10685.83 samples/sec   Loss 6.3967   LearningRate 0.0098   Epoch: 27   Global Step: 139040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:56,459-Speed 10554.11 samples/sec   Loss 6.2697   LearningRate 0.0098   Epoch: 27   Global Step: 139050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:57,414-Speed 10739.19 samples/sec   Loss 6.4894   LearningRate 0.0098   Epoch: 27   Global Step: 139060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:58,356-Speed 10877.59 samples/sec   Loss 6.4058   LearningRate 0.0098   Epoch: 27   Global Step: 139070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:25:59,317-Speed 10673.23 samples/sec   Loss 6.5184   LearningRate 0.0098   Epoch: 27   Global Step: 139080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:00,289-Speed 10534.64 samples/sec   Loss 6.4315   LearningRate 0.0098   Epoch: 27   Global Step: 139090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:01,239-Speed 10792.39 samples/sec   Loss 6.2390   LearningRate 0.0098   Epoch: 27   Global Step: 139100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:02,183-Speed 10861.42 samples/sec   Loss 6.3916   LearningRate 0.0098   Epoch: 27   Global Step: 139110   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:26:03,145-Speed 10658.74 samples/sec   Loss 6.4423   LearningRate 0.0098   Epoch: 27   Global Step: 139120   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:26:04,101-Speed 10712.85 samples/sec   Loss 6.3379   LearningRate 0.0098   Epoch: 27   Global Step: 139130   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:26:05,078-Speed 10489.10 samples/sec   Loss 6.4874   LearningRate 0.0098   Epoch: 27   Global Step: 139140   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:26:06,000-Speed 11113.52 samples/sec   Loss 6.4116   LearningRate 0.0097   Epoch: 27   Global Step: 139150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:06,973-Speed 10542.80 samples/sec   Loss 6.4687   LearningRate 0.0097   Epoch: 27   Global Step: 139160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:07,934-Speed 10658.29 samples/sec   Loss 6.4550   LearningRate 0.0097   Epoch: 27   Global Step: 139170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:08,893-Speed 10682.20 samples/sec   Loss 6.4503   LearningRate 0.0097   Epoch: 27   Global Step: 139180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:09,854-Speed 10667.64 samples/sec   Loss 6.5224   LearningRate 0.0097   Epoch: 27   Global Step: 139190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:10,828-Speed 10527.17 samples/sec   Loss 6.3620   LearningRate 0.0097   Epoch: 27   Global Step: 139200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:11,846-Speed 10088.32 samples/sec   Loss 6.3938   LearningRate 0.0097   Epoch: 27   Global Step: 139210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:12,818-Speed 10542.38 samples/sec   Loss 6.4409   LearningRate 0.0097   Epoch: 27   Global Step: 139220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:13,767-Speed 10804.33 samples/sec   Loss 6.2462   LearningRate 0.0097   Epoch: 27   Global Step: 139230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:14,737-Speed 10559.14 samples/sec   Loss 6.4207   LearningRate 0.0097   Epoch: 27   Global Step: 139240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:15,696-Speed 10694.05 samples/sec   Loss 6.4007   LearningRate 0.0097   Epoch: 27   Global Step: 139250   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:26:16,671-Speed 10508.66 samples/sec   Loss 6.4407   LearningRate 0.0097   Epoch: 27   Global Step: 139260   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:26:17,632-Speed 10667.28 samples/sec   Loss 6.6714   LearningRate 0.0097   Epoch: 27   Global Step: 139270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:18,598-Speed 10607.02 samples/sec   Loss 6.3513   LearningRate 0.0097   Epoch: 27   Global Step: 139280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:19,560-Speed 10660.22 samples/sec   Loss 6.4274   LearningRate 0.0097   Epoch: 27   Global Step: 139290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:20,545-Speed 10401.54 samples/sec   Loss 6.4064   LearningRate 0.0097   Epoch: 27   Global Step: 139300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:21,489-Speed 10860.74 samples/sec   Loss 6.3900   LearningRate 0.0097   Epoch: 27   Global Step: 139310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:22,431-Speed 10878.44 samples/sec   Loss 6.3933   LearningRate 0.0097   Epoch: 27   Global Step: 139320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:23,392-Speed 10672.00 samples/sec   Loss 6.3912   LearningRate 0.0097   Epoch: 27   Global Step: 139330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:24,342-Speed 10784.16 samples/sec   Loss 6.5692   LearningRate 0.0097   Epoch: 27   Global Step: 139340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:25,330-Speed 10377.43 samples/sec   Loss 6.4366   LearningRate 0.0097   Epoch: 27   Global Step: 139350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:26,295-Speed 10622.84 samples/sec   Loss 6.3991   LearningRate 0.0097   Epoch: 27   Global Step: 139360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:27,258-Speed 10643.48 samples/sec   Loss 6.4758   LearningRate 0.0097   Epoch: 27   Global Step: 139370   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:26:28,212-Speed 10744.40 samples/sec   Loss 6.4481   LearningRate 0.0097   Epoch: 27   Global Step: 139380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:29,176-Speed 10629.02 samples/sec   Loss 6.4008   LearningRate 0.0097   Epoch: 27   Global Step: 139390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:30,167-Speed 10350.54 samples/sec   Loss 6.5067   LearningRate 0.0097   Epoch: 27   Global Step: 139400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:31,130-Speed 10639.50 samples/sec   Loss 6.5214   LearningRate 0.0097   Epoch: 27   Global Step: 139410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:32,066-Speed 10953.87 samples/sec   Loss 6.5466   LearningRate 0.0097   Epoch: 27   Global Step: 139420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:33,037-Speed 10548.20 samples/sec   Loss 6.4660   LearningRate 0.0097   Epoch: 27   Global Step: 139430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:34,045-Speed 10167.79 samples/sec   Loss 6.4329   LearningRate 0.0097   Epoch: 27   Global Step: 139440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:34,973-Speed 11050.52 samples/sec   Loss 6.4990   LearningRate 0.0097   Epoch: 27   Global Step: 139450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:26:35,908-Speed 10960.96 samples/sec   Loss 6.5692   LearningRate 0.0097   Epoch: 27   Global Step: 139460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:26:36,901-Speed 10325.29 samples/sec   Loss 6.3872   LearningRate 0.0097   Epoch: 27   Global Step: 139470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:26:37,814-Speed 11220.56 samples/sec   Loss 6.4516   LearningRate 0.0096   Epoch: 27   Global Step: 139480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:26:38,774-Speed 10671.30 samples/sec   Loss 6.4631   LearningRate 0.0096   Epoch: 27   Global Step: 139490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:26:39,760-Speed 10393.68 samples/sec   Loss 6.4490   LearningRate 0.0096   Epoch: 27   Global Step: 139500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:26:40,745-Speed 10415.56 samples/sec   Loss 6.5760   LearningRate 0.0096   Epoch: 27   Global Step: 139510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:26:41,684-Speed 10912.09 samples/sec   Loss 6.4391   LearningRate 0.0096   Epoch: 27   Global Step: 139520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:26:42,624-Speed 10896.73 samples/sec   Loss 6.4626   LearningRate 0.0096   Epoch: 27   Global Step: 139530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:26:43,601-Speed 10498.56 samples/sec   Loss 6.3730   LearningRate 0.0096   Epoch: 27   Global Step: 139540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:26:44,559-Speed 10700.39 samples/sec   Loss 6.5426   LearningRate 0.0096   Epoch: 27   Global Step: 139550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:45,511-Speed 10758.18 samples/sec   Loss 6.3785   LearningRate 0.0096   Epoch: 27   Global Step: 139560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:46,435-Speed 11093.58 samples/sec   Loss 6.4581   LearningRate 0.0096   Epoch: 27   Global Step: 139570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:47,423-Speed 10375.60 samples/sec   Loss 6.3789   LearningRate 0.0096   Epoch: 27   Global Step: 139580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:48,391-Speed 10704.55 samples/sec   Loss 6.4218   LearningRate 0.0096   Epoch: 27   Global Step: 139590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:49,349-Speed 10694.64 samples/sec   Loss 6.4016   LearningRate 0.0096   Epoch: 27   Global Step: 139600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:50,331-Speed 10436.67 samples/sec   Loss 6.5617   LearningRate 0.0096   Epoch: 27   Global Step: 139610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:51,273-Speed 10881.90 samples/sec   Loss 6.3607   LearningRate 0.0096   Epoch: 27   Global Step: 139620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:52,264-Speed 10342.99 samples/sec   Loss 6.4176   LearningRate 0.0096   Epoch: 27   Global Step: 139630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:53,252-Speed 10371.07 samples/sec   Loss 6.3621   LearningRate 0.0096   Epoch: 27   Global Step: 139640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:54,226-Speed 10523.52 samples/sec   Loss 6.5688   LearningRate 0.0096   Epoch: 27   Global Step: 139650   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:26:55,165-Speed 10920.81 samples/sec   Loss 6.3635   LearningRate 0.0096   Epoch: 27   Global Step: 139660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:26:56,129-Speed 10625.60 samples/sec   Loss 6.4400   LearningRate 0.0096   Epoch: 27   Global Step: 139670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:26:57,121-Speed 10330.70 samples/sec   Loss 6.5486   LearningRate 0.0096   Epoch: 27   Global Step: 139680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:26:58,091-Speed 10569.46 samples/sec   Loss 6.3686   LearningRate 0.0096   Epoch: 27   Global Step: 139690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:26:59,081-Speed 10344.88 samples/sec   Loss 6.5475   LearningRate 0.0096   Epoch: 27   Global Step: 139700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:27:00,022-Speed 10890.47 samples/sec   Loss 6.4189   LearningRate 0.0096   Epoch: 27   Global Step: 139710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:27:00,974-Speed 10764.12 samples/sec   Loss 6.3972   LearningRate 0.0096   Epoch: 27   Global Step: 139720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:27:01,948-Speed 10528.37 samples/sec   Loss 6.4086   LearningRate 0.0096   Epoch: 27   Global Step: 139730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:27:02,914-Speed 10611.09 samples/sec   Loss 6.4568   LearningRate 0.0096   Epoch: 27   Global Step: 139740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:27:03,897-Speed 10424.00 samples/sec   Loss 6.3741   LearningRate 0.0096   Epoch: 27   Global Step: 139750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:27:04,872-Speed 10513.62 samples/sec   Loss 6.4320   LearningRate 0.0096   Epoch: 27   Global Step: 139760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:27:05,807-Speed 10960.95 samples/sec   Loss 6.4191   LearningRate 0.0096   Epoch: 27   Global Step: 139770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:06,745-Speed 10925.30 samples/sec   Loss 6.6104   LearningRate 0.0096   Epoch: 27   Global Step: 139780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:07,709-Speed 10630.22 samples/sec   Loss 6.5385   LearningRate 0.0096   Epoch: 27   Global Step: 139790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:08,675-Speed 10615.09 samples/sec   Loss 6.4301   LearningRate 0.0095   Epoch: 27   Global Step: 139800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:09,670-Speed 10296.22 samples/sec   Loss 6.5266   LearningRate 0.0095   Epoch: 27   Global Step: 139810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:10,619-Speed 10806.76 samples/sec   Loss 6.6236   LearningRate 0.0095   Epoch: 27   Global Step: 139820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:11,597-Speed 10476.55 samples/sec   Loss 6.5474   LearningRate 0.0095   Epoch: 27   Global Step: 139830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:12,582-Speed 10397.53 samples/sec   Loss 6.3448   LearningRate 0.0095   Epoch: 27   Global Step: 139840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:13,556-Speed 10529.96 samples/sec   Loss 6.4278   LearningRate 0.0095   Epoch: 27   Global Step: 139850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:14,525-Speed 10576.75 samples/sec   Loss 6.3622   LearningRate 0.0095   Epoch: 27   Global Step: 139860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:15,486-Speed 10655.88 samples/sec   Loss 6.5988   LearningRate 0.0095   Epoch: 27   Global Step: 139870   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:27:16,420-Speed 10977.42 samples/sec   Loss 6.4378   LearningRate 0.0095   Epoch: 27   Global Step: 139880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:17,392-Speed 10556.34 samples/sec   Loss 6.4754   LearningRate 0.0095   Epoch: 27   Global Step: 139890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:18,335-Speed 10870.88 samples/sec   Loss 6.4035   LearningRate 0.0095   Epoch: 27   Global Step: 139900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:19,291-Speed 10719.94 samples/sec   Loss 6.5502   LearningRate 0.0095   Epoch: 27   Global Step: 139910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:20,277-Speed 10408.75 samples/sec   Loss 6.4168   LearningRate 0.0095   Epoch: 27   Global Step: 139920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:21,202-Speed 11077.34 samples/sec   Loss 6.3425   LearningRate 0.0095   Epoch: 27   Global Step: 139930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:22,164-Speed 10662.87 samples/sec   Loss 6.5679   LearningRate 0.0095   Epoch: 27   Global Step: 139940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:23,134-Speed 10570.78 samples/sec   Loss 6.3940   LearningRate 0.0095   Epoch: 27   Global Step: 139950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:24,059-Speed 11072.66 samples/sec   Loss 6.5767   LearningRate 0.0095   Epoch: 27   Global Step: 139960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:25,010-Speed 10775.87 samples/sec   Loss 6.5172   LearningRate 0.0095   Epoch: 27   Global Step: 139970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:25,969-Speed 10687.99 samples/sec   Loss 6.4422   LearningRate 0.0095   Epoch: 27   Global Step: 139980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:27:26,916-Speed 10827.49 samples/sec   Loss 6.6818   LearningRate 0.0095   Epoch: 27   Global Step: 139990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:27:27,881-Speed 10622.18 samples/sec   Loss 6.5828   LearningRate 0.0095   Epoch: 27   Global Step: 140000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:27:50,017-[lfw][140000]XNorm: 9.179028
Training: 2022-04-11 04:27:50,018-[lfw][140000]Accuracy-Flip: 0.99650+-0.00353
Training: 2022-04-11 04:27:50,018-[lfw][140000]Accuracy-Highest: 0.99700
Training: 2022-04-11 04:28:15,474-[cfp_fp][140000]XNorm: 7.896848
Training: 2022-04-11 04:28:15,475-[cfp_fp][140000]Accuracy-Flip: 0.96643+-0.00848
Training: 2022-04-11 04:28:15,476-[cfp_fp][140000]Accuracy-Highest: 0.96800
Training: 2022-04-11 04:28:37,610-[agedb_30][140000]XNorm: 8.953506
Training: 2022-04-11 04:28:37,611-[agedb_30][140000]Accuracy-Flip: 0.97100+-0.00646
Training: 2022-04-11 04:28:37,611-[agedb_30][140000]Accuracy-Highest: 0.97100
Training: 2022-04-11 04:28:38,595-Speed 144.81 samples/sec   Loss 6.5401   LearningRate 0.0095   Epoch: 27   Global Step: 140010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:28:39,540-Speed 10852.16 samples/sec   Loss 6.5797   LearningRate 0.0095   Epoch: 27   Global Step: 140020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:28:40,501-Speed 10657.21 samples/sec   Loss 6.5616   LearningRate 0.0095   Epoch: 27   Global Step: 140030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:28:41,464-Speed 10645.85 samples/sec   Loss 6.2929   LearningRate 0.0095   Epoch: 27   Global Step: 140040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:28:42,443-Speed 10468.20 samples/sec   Loss 6.3326   LearningRate 0.0095   Epoch: 27   Global Step: 140050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:28:43,393-Speed 10785.86 samples/sec   Loss 6.5985   LearningRate 0.0095   Epoch: 27   Global Step: 140060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:28:44,318-Speed 11078.78 samples/sec   Loss 6.3657   LearningRate 0.0095   Epoch: 27   Global Step: 140070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:28:45,246-Speed 11042.19 samples/sec   Loss 6.4397   LearningRate 0.0095   Epoch: 27   Global Step: 140080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:28:46,221-Speed 10514.09 samples/sec   Loss 6.4777   LearningRate 0.0095   Epoch: 27   Global Step: 140090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:28:47,164-Speed 10870.12 samples/sec   Loss 6.4184   LearningRate 0.0095   Epoch: 27   Global Step: 140100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:28:48,146-Speed 10441.97 samples/sec   Loss 6.4898   LearningRate 0.0095   Epoch: 27   Global Step: 140110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:28:49,107-Speed 10656.80 samples/sec   Loss 6.3940   LearningRate 0.0095   Epoch: 27   Global Step: 140120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:28:50,087-Speed 10457.64 samples/sec   Loss 6.5329   LearningRate 0.0094   Epoch: 27   Global Step: 140130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:28:51,046-Speed 10690.31 samples/sec   Loss 6.4393   LearningRate 0.0094   Epoch: 27   Global Step: 140140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:28:52,005-Speed 10675.98 samples/sec   Loss 6.5178   LearningRate 0.0094   Epoch: 27   Global Step: 140150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:28:53,005-Speed 10254.41 samples/sec   Loss 6.3338   LearningRate 0.0094   Epoch: 27   Global Step: 140160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:28:53,984-Speed 10474.87 samples/sec   Loss 6.3968   LearningRate 0.0094   Epoch: 27   Global Step: 140170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:28:54,950-Speed 10610.78 samples/sec   Loss 6.4778   LearningRate 0.0094   Epoch: 27   Global Step: 140180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:28:55,913-Speed 10639.50 samples/sec   Loss 6.5335   LearningRate 0.0094   Epoch: 27   Global Step: 140190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:28:56,920-Speed 10175.05 samples/sec   Loss 6.4625   LearningRate 0.0094   Epoch: 27   Global Step: 140200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:28:57,904-Speed 10422.18 samples/sec   Loss 6.5873   LearningRate 0.0094   Epoch: 27   Global Step: 140210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:28:58,878-Speed 10521.94 samples/sec   Loss 6.3154   LearningRate 0.0094   Epoch: 27   Global Step: 140220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:28:59,869-Speed 10335.93 samples/sec   Loss 6.5718   LearningRate 0.0094   Epoch: 27   Global Step: 140230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:00,887-Speed 10075.57 samples/sec   Loss 6.5411   LearningRate 0.0094   Epoch: 27   Global Step: 140240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:01,860-Speed 10545.96 samples/sec   Loss 6.5124   LearningRate 0.0094   Epoch: 27   Global Step: 140250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:02,830-Speed 10567.46 samples/sec   Loss 6.3754   LearningRate 0.0094   Epoch: 27   Global Step: 140260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:03,775-Speed 10836.17 samples/sec   Loss 6.6018   LearningRate 0.0094   Epoch: 27   Global Step: 140270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:04,804-Speed 9967.98 samples/sec   Loss 6.6042   LearningRate 0.0094   Epoch: 27   Global Step: 140280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:05,726-Speed 11115.54 samples/sec   Loss 6.5051   LearningRate 0.0094   Epoch: 27   Global Step: 140290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:06,696-Speed 10566.10 samples/sec   Loss 6.5108   LearningRate 0.0094   Epoch: 27   Global Step: 140300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:07,676-Speed 10467.35 samples/sec   Loss 6.4664   LearningRate 0.0094   Epoch: 27   Global Step: 140310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:08,662-Speed 10399.59 samples/sec   Loss 6.6481   LearningRate 0.0094   Epoch: 27   Global Step: 140320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:09,645-Speed 10425.58 samples/sec   Loss 6.3350   LearningRate 0.0094   Epoch: 27   Global Step: 140330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:10,642-Speed 10284.17 samples/sec   Loss 6.6440   LearningRate 0.0094   Epoch: 27   Global Step: 140340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:11,621-Speed 10467.99 samples/sec   Loss 6.4543   LearningRate 0.0094   Epoch: 27   Global Step: 140350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:12,609-Speed 10368.13 samples/sec   Loss 6.4317   LearningRate 0.0094   Epoch: 27   Global Step: 140360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:13,552-Speed 10873.52 samples/sec   Loss 6.5218   LearningRate 0.0094   Epoch: 27   Global Step: 140370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:14,463-Speed 11253.47 samples/sec   Loss 6.4562   LearningRate 0.0094   Epoch: 27   Global Step: 140380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:15,440-Speed 10482.65 samples/sec   Loss 6.5254   LearningRate 0.0094   Epoch: 27   Global Step: 140390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:16,441-Speed 10245.28 samples/sec   Loss 6.5712   LearningRate 0.0094   Epoch: 27   Global Step: 140400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:17,389-Speed 10804.27 samples/sec   Loss 6.5474   LearningRate 0.0094   Epoch: 27   Global Step: 140410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:18,334-Speed 10860.50 samples/sec   Loss 6.5027   LearningRate 0.0094   Epoch: 27   Global Step: 140420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:19,303-Speed 10574.23 samples/sec   Loss 6.5500   LearningRate 0.0094   Epoch: 27   Global Step: 140430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:20,297-Speed 10313.50 samples/sec   Loss 6.6202   LearningRate 0.0094   Epoch: 27   Global Step: 140440   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:29:21,261-Speed 10627.90 samples/sec   Loss 6.6399   LearningRate 0.0094   Epoch: 27   Global Step: 140450   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:29:22,256-Speed 10295.37 samples/sec   Loss 6.6661   LearningRate 0.0093   Epoch: 27   Global Step: 140460   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:29:23,173-Speed 11178.46 samples/sec   Loss 6.4487   LearningRate 0.0093   Epoch: 27   Global Step: 140470   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:29:24,116-Speed 10872.18 samples/sec   Loss 6.3206   LearningRate 0.0093   Epoch: 27   Global Step: 140480   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:29:25,092-Speed 10495.52 samples/sec   Loss 6.4460   LearningRate 0.0093   Epoch: 27   Global Step: 140490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:26,022-Speed 11015.78 samples/sec   Loss 6.4616   LearningRate 0.0093   Epoch: 27   Global Step: 140500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:27,005-Speed 10433.60 samples/sec   Loss 6.5343   LearningRate 0.0093   Epoch: 27   Global Step: 140510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:27,938-Speed 10980.45 samples/sec   Loss 6.4062   LearningRate 0.0093   Epoch: 27   Global Step: 140520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:28,917-Speed 10471.23 samples/sec   Loss 6.4492   LearningRate 0.0093   Epoch: 27   Global Step: 140530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:29,894-Speed 10485.99 samples/sec   Loss 6.5284   LearningRate 0.0093   Epoch: 27   Global Step: 140540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:30,854-Speed 10671.21 samples/sec   Loss 6.3934   LearningRate 0.0093   Epoch: 27   Global Step: 140550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:31,793-Speed 10916.01 samples/sec   Loss 6.6728   LearningRate 0.0093   Epoch: 27   Global Step: 140560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:32,694-Speed 11376.86 samples/sec   Loss 6.5348   LearningRate 0.0093   Epoch: 27   Global Step: 140570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:33,665-Speed 10559.34 samples/sec   Loss 6.6002   LearningRate 0.0093   Epoch: 27   Global Step: 140580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:34,630-Speed 10622.31 samples/sec   Loss 6.4122   LearningRate 0.0093   Epoch: 27   Global Step: 140590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:35,580-Speed 10789.99 samples/sec   Loss 6.5544   LearningRate 0.0093   Epoch: 27   Global Step: 140600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:36,538-Speed 10689.26 samples/sec   Loss 6.2983   LearningRate 0.0093   Epoch: 27   Global Step: 140610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:37,478-Speed 10905.10 samples/sec   Loss 6.3735   LearningRate 0.0093   Epoch: 27   Global Step: 140620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:38,436-Speed 10697.69 samples/sec   Loss 6.5719   LearningRate 0.0093   Epoch: 27   Global Step: 140630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:39,396-Speed 10686.86 samples/sec   Loss 6.5693   LearningRate 0.0093   Epoch: 27   Global Step: 140640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:40,374-Speed 10475.76 samples/sec   Loss 6.2651   LearningRate 0.0093   Epoch: 27   Global Step: 140650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:41,376-Speed 10224.07 samples/sec   Loss 6.4388   LearningRate 0.0093   Epoch: 27   Global Step: 140660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:42,360-Speed 10417.29 samples/sec   Loss 6.4590   LearningRate 0.0093   Epoch: 27   Global Step: 140670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:43,316-Speed 10727.60 samples/sec   Loss 6.6774   LearningRate 0.0093   Epoch: 27   Global Step: 140680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:44,296-Speed 10456.70 samples/sec   Loss 6.5388   LearningRate 0.0093   Epoch: 27   Global Step: 140690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:45,254-Speed 10707.02 samples/sec   Loss 6.2520   LearningRate 0.0093   Epoch: 27   Global Step: 140700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:46,171-Speed 11169.32 samples/sec   Loss 6.3912   LearningRate 0.0093   Epoch: 27   Global Step: 140710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:47,139-Speed 10584.84 samples/sec   Loss 6.5049   LearningRate 0.0093   Epoch: 27   Global Step: 140720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:48,061-Speed 11118.08 samples/sec   Loss 6.6441   LearningRate 0.0093   Epoch: 27   Global Step: 140730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:49,021-Speed 10675.81 samples/sec   Loss 6.5412   LearningRate 0.0093   Epoch: 27   Global Step: 140740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:49,957-Speed 10948.54 samples/sec   Loss 6.6478   LearningRate 0.0093   Epoch: 27   Global Step: 140750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:50,940-Speed 10435.05 samples/sec   Loss 6.6452   LearningRate 0.0093   Epoch: 27   Global Step: 140760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:52,190-Speed 8194.96 samples/sec   Loss 6.3294   LearningRate 0.0093   Epoch: 27   Global Step: 140770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:53,149-Speed 10679.42 samples/sec   Loss 6.5330   LearningRate 0.0093   Epoch: 27   Global Step: 140780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:54,098-Speed 10804.23 samples/sec   Loss 6.4410   LearningRate 0.0092   Epoch: 27   Global Step: 140790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:55,053-Speed 10725.69 samples/sec   Loss 6.4916   LearningRate 0.0092   Epoch: 27   Global Step: 140800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:56,079-Speed 9987.34 samples/sec   Loss 6.6545   LearningRate 0.0092   Epoch: 27   Global Step: 140810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:57,022-Speed 10874.18 samples/sec   Loss 6.5390   LearningRate 0.0092   Epoch: 27   Global Step: 140820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:29:57,992-Speed 10563.84 samples/sec   Loss 6.6519   LearningRate 0.0092   Epoch: 27   Global Step: 140830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:58,970-Speed 10480.43 samples/sec   Loss 6.4679   LearningRate 0.0092   Epoch: 27   Global Step: 140840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:29:59,947-Speed 10494.12 samples/sec   Loss 6.5338   LearningRate 0.0092   Epoch: 27   Global Step: 140850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:30:00,902-Speed 10731.75 samples/sec   Loss 6.5150   LearningRate 0.0092   Epoch: 27   Global Step: 140860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:30:01,953-Speed 9749.47 samples/sec   Loss 6.5974   LearningRate 0.0092   Epoch: 27   Global Step: 140870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:30:02,885-Speed 10997.97 samples/sec   Loss 6.5263   LearningRate 0.0092   Epoch: 27   Global Step: 140880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:30:03,854-Speed 10576.37 samples/sec   Loss 6.6122   LearningRate 0.0092   Epoch: 27   Global Step: 140890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:30:04,810-Speed 10725.72 samples/sec   Loss 6.4953   LearningRate 0.0092   Epoch: 27   Global Step: 140900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:30:05,761-Speed 10770.18 samples/sec   Loss 6.5180   LearningRate 0.0092   Epoch: 27   Global Step: 140910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:30:06,744-Speed 10430.98 samples/sec   Loss 6.4269   LearningRate 0.0092   Epoch: 27   Global Step: 140920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:30:07,710-Speed 10610.90 samples/sec   Loss 6.4657   LearningRate 0.0092   Epoch: 27   Global Step: 140930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:30:08,677-Speed 10601.25 samples/sec   Loss 6.4816   LearningRate 0.0092   Epoch: 27   Global Step: 140940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:30:09,652-Speed 10515.05 samples/sec   Loss 6.6073   LearningRate 0.0092   Epoch: 27   Global Step: 140950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:10,647-Speed 10291.87 samples/sec   Loss 6.7057   LearningRate 0.0092   Epoch: 27   Global Step: 140960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:11,637-Speed 10361.44 samples/sec   Loss 6.5306   LearningRate 0.0092   Epoch: 27   Global Step: 140970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:12,586-Speed 10800.60 samples/sec   Loss 6.6973   LearningRate 0.0092   Epoch: 27   Global Step: 140980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:13,544-Speed 10697.37 samples/sec   Loss 6.5391   LearningRate 0.0092   Epoch: 27   Global Step: 140990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:14,541-Speed 10281.34 samples/sec   Loss 6.5002   LearningRate 0.0092   Epoch: 27   Global Step: 141000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:15,514-Speed 10529.81 samples/sec   Loss 6.6575   LearningRate 0.0092   Epoch: 27   Global Step: 141010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:16,470-Speed 10720.02 samples/sec   Loss 6.5778   LearningRate 0.0092   Epoch: 27   Global Step: 141020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:17,436-Speed 10610.47 samples/sec   Loss 6.6710   LearningRate 0.0092   Epoch: 27   Global Step: 141030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:18,409-Speed 10527.88 samples/sec   Loss 6.5048   LearningRate 0.0092   Epoch: 27   Global Step: 141040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:19,404-Speed 10300.09 samples/sec   Loss 6.4904   LearningRate 0.0092   Epoch: 27   Global Step: 141050   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:30:20,354-Speed 10788.18 samples/sec   Loss 6.5340   LearningRate 0.0092   Epoch: 27   Global Step: 141060   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:30:21,324-Speed 10562.89 samples/sec   Loss 6.4848   LearningRate 0.0092   Epoch: 27   Global Step: 141070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:22,313-Speed 10363.47 samples/sec   Loss 6.5756   LearningRate 0.0092   Epoch: 27   Global Step: 141080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:23,260-Speed 10821.26 samples/sec   Loss 6.7474   LearningRate 0.0092   Epoch: 27   Global Step: 141090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:24,176-Speed 11186.91 samples/sec   Loss 6.5435   LearningRate 0.0092   Epoch: 27   Global Step: 141100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:25,167-Speed 10342.31 samples/sec   Loss 6.5657   LearningRate 0.0092   Epoch: 27   Global Step: 141110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:26,122-Speed 10730.05 samples/sec   Loss 6.4712   LearningRate 0.0092   Epoch: 27   Global Step: 141120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:27,116-Speed 10313.05 samples/sec   Loss 6.5459   LearningRate 0.0091   Epoch: 27   Global Step: 141130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:28,133-Speed 10071.46 samples/sec   Loss 6.5437   LearningRate 0.0091   Epoch: 27   Global Step: 141140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:29,128-Speed 10308.19 samples/sec   Loss 6.5985   LearningRate 0.0091   Epoch: 27   Global Step: 141150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:30,133-Speed 10195.90 samples/sec   Loss 6.4107   LearningRate 0.0091   Epoch: 27   Global Step: 141160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:31,093-Speed 10666.06 samples/sec   Loss 6.4769   LearningRate 0.0091   Epoch: 27   Global Step: 141170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:32,015-Speed 11122.90 samples/sec   Loss 6.4249   LearningRate 0.0091   Epoch: 27   Global Step: 141180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:32,980-Speed 10622.27 samples/sec   Loss 6.4510   LearningRate 0.0091   Epoch: 27   Global Step: 141190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:33,942-Speed 10648.61 samples/sec   Loss 6.6277   LearningRate 0.0091   Epoch: 27   Global Step: 141200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:34,889-Speed 10822.22 samples/sec   Loss 6.6464   LearningRate 0.0091   Epoch: 27   Global Step: 141210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:35,851-Speed 10646.78 samples/sec   Loss 6.4832   LearningRate 0.0091   Epoch: 27   Global Step: 141220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:36,835-Speed 10422.89 samples/sec   Loss 6.5645   LearningRate 0.0091   Epoch: 27   Global Step: 141230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:37,805-Speed 10560.48 samples/sec   Loss 6.5035   LearningRate 0.0091   Epoch: 27   Global Step: 141240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:38,767-Speed 10653.03 samples/sec   Loss 6.5324   LearningRate 0.0091   Epoch: 27   Global Step: 141250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:39,753-Speed 10397.46 samples/sec   Loss 6.5462   LearningRate 0.0091   Epoch: 27   Global Step: 141260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:40,685-Speed 10997.46 samples/sec   Loss 6.5112   LearningRate 0.0091   Epoch: 27   Global Step: 141270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:41,636-Speed 10774.93 samples/sec   Loss 6.5613   LearningRate 0.0091   Epoch: 27   Global Step: 141280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:42,587-Speed 10774.32 samples/sec   Loss 6.3354   LearningRate 0.0091   Epoch: 27   Global Step: 141290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:43,560-Speed 10540.09 samples/sec   Loss 6.4232   LearningRate 0.0091   Epoch: 27   Global Step: 141300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:44,550-Speed 10353.67 samples/sec   Loss 6.5591   LearningRate 0.0091   Epoch: 27   Global Step: 141310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:45,476-Speed 11083.54 samples/sec   Loss 6.4057   LearningRate 0.0091   Epoch: 27   Global Step: 141320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:46,421-Speed 10839.38 samples/sec   Loss 6.5875   LearningRate 0.0091   Epoch: 27   Global Step: 141330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:47,371-Speed 10803.19 samples/sec   Loss 6.5415   LearningRate 0.0091   Epoch: 27   Global Step: 141340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:48,315-Speed 10857.79 samples/sec   Loss 6.4076   LearningRate 0.0091   Epoch: 27   Global Step: 141350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:49,275-Speed 10669.21 samples/sec   Loss 6.5122   LearningRate 0.0091   Epoch: 27   Global Step: 141360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:50,234-Speed 10686.09 samples/sec   Loss 6.5677   LearningRate 0.0091   Epoch: 27   Global Step: 141370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:51,211-Speed 10488.17 samples/sec   Loss 6.6686   LearningRate 0.0091   Epoch: 27   Global Step: 141380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:52,204-Speed 10325.75 samples/sec   Loss 6.3795   LearningRate 0.0091   Epoch: 27   Global Step: 141390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:53,157-Speed 10746.99 samples/sec   Loss 6.6473   LearningRate 0.0091   Epoch: 27   Global Step: 141400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:54,124-Speed 10600.39 samples/sec   Loss 6.6395   LearningRate 0.0091   Epoch: 27   Global Step: 141410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:55,107-Speed 10428.96 samples/sec   Loss 6.6203   LearningRate 0.0091   Epoch: 27   Global Step: 141420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:56,056-Speed 10793.41 samples/sec   Loss 6.6720   LearningRate 0.0091   Epoch: 27   Global Step: 141430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:56,985-Speed 11043.49 samples/sec   Loss 6.4599   LearningRate 0.0091   Epoch: 27   Global Step: 141440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:57,948-Speed 10640.90 samples/sec   Loss 6.5569   LearningRate 0.0091   Epoch: 27   Global Step: 141450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:58,914-Speed 10612.71 samples/sec   Loss 6.5151   LearningRate 0.0090   Epoch: 27   Global Step: 141460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:30:59,892-Speed 10478.31 samples/sec   Loss 6.6061   LearningRate 0.0090   Epoch: 27   Global Step: 141470   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:31:00,875-Speed 10418.21 samples/sec   Loss 6.4038   LearningRate 0.0090   Epoch: 27   Global Step: 141480   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:31:01,818-Speed 10866.90 samples/sec   Loss 6.4820   LearningRate 0.0090   Epoch: 27   Global Step: 141490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:02,754-Speed 10948.99 samples/sec   Loss 6.6676   LearningRate 0.0090   Epoch: 27   Global Step: 141500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:03,771-Speed 10084.98 samples/sec   Loss 6.4576   LearningRate 0.0090   Epoch: 27   Global Step: 141510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:04,733-Speed 10652.44 samples/sec   Loss 6.5728   LearningRate 0.0090   Epoch: 27   Global Step: 141520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:05,674-Speed 10896.13 samples/sec   Loss 6.4368   LearningRate 0.0090   Epoch: 27   Global Step: 141530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:06,644-Speed 10560.34 samples/sec   Loss 6.5891   LearningRate 0.0090   Epoch: 27   Global Step: 141540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:07,636-Speed 10342.10 samples/sec   Loss 6.5696   LearningRate 0.0090   Epoch: 27   Global Step: 141550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:08,586-Speed 10784.44 samples/sec   Loss 6.5353   LearningRate 0.0090   Epoch: 27   Global Step: 141560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:09,542-Speed 10726.56 samples/sec   Loss 6.5279   LearningRate 0.0090   Epoch: 27   Global Step: 141570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:10,522-Speed 10449.04 samples/sec   Loss 6.5992   LearningRate 0.0090   Epoch: 27   Global Step: 141580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:11,484-Speed 10648.83 samples/sec   Loss 6.5420   LearningRate 0.0090   Epoch: 27   Global Step: 141590   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:31:12,425-Speed 10895.61 samples/sec   Loss 6.4323   LearningRate 0.0090   Epoch: 27   Global Step: 141600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:13,395-Speed 10558.73 samples/sec   Loss 6.4642   LearningRate 0.0090   Epoch: 27   Global Step: 141610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:31:14,412-Speed 10081.87 samples/sec   Loss 6.4773   LearningRate 0.0090   Epoch: 27   Global Step: 141620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:31:24,406-Speed 1024.86 samples/sec   Loss 6.1353   LearningRate 0.0090   Epoch: 28   Global Step: 141630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:31:25,441-Speed 9904.58 samples/sec   Loss 5.7512   LearningRate 0.0090   Epoch: 28   Global Step: 141640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:31:26,409-Speed 10583.36 samples/sec   Loss 5.7974   LearningRate 0.0090   Epoch: 28   Global Step: 141650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:31:27,822-Speed 7253.93 samples/sec   Loss 5.7105   LearningRate 0.0090   Epoch: 28   Global Step: 141660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:31:28,789-Speed 10598.24 samples/sec   Loss 5.8172   LearningRate 0.0090   Epoch: 28   Global Step: 141670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:31:29,784-Speed 10303.96 samples/sec   Loss 5.8549   LearningRate 0.0090   Epoch: 28   Global Step: 141680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:31:30,772-Speed 10366.72 samples/sec   Loss 5.8728   LearningRate 0.0090   Epoch: 28   Global Step: 141690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:31:31,722-Speed 10790.96 samples/sec   Loss 5.8877   LearningRate 0.0090   Epoch: 28   Global Step: 141700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:31:32,723-Speed 10233.38 samples/sec   Loss 5.8554   LearningRate 0.0090   Epoch: 28   Global Step: 141710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:33,723-Speed 10252.13 samples/sec   Loss 5.8196   LearningRate 0.0090   Epoch: 28   Global Step: 141720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:34,640-Speed 11183.78 samples/sec   Loss 5.7942   LearningRate 0.0090   Epoch: 28   Global Step: 141730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:35,584-Speed 10856.56 samples/sec   Loss 5.8106   LearningRate 0.0090   Epoch: 28   Global Step: 141740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:36,504-Speed 11130.66 samples/sec   Loss 5.9068   LearningRate 0.0090   Epoch: 28   Global Step: 141750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:37,440-Speed 10960.09 samples/sec   Loss 5.7088   LearningRate 0.0090   Epoch: 28   Global Step: 141760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:38,388-Speed 10810.31 samples/sec   Loss 5.9568   LearningRate 0.0090   Epoch: 28   Global Step: 141770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:39,326-Speed 10923.67 samples/sec   Loss 5.8930   LearningRate 0.0090   Epoch: 28   Global Step: 141780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:40,298-Speed 10554.29 samples/sec   Loss 5.8916   LearningRate 0.0090   Epoch: 28   Global Step: 141790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:41,292-Speed 10312.11 samples/sec   Loss 5.9428   LearningRate 0.0089   Epoch: 28   Global Step: 141800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:42,241-Speed 10806.20 samples/sec   Loss 5.9401   LearningRate 0.0089   Epoch: 28   Global Step: 141810   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:31:43,211-Speed 10569.38 samples/sec   Loss 5.9819   LearningRate 0.0089   Epoch: 28   Global Step: 141820   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:31:44,139-Speed 11052.42 samples/sec   Loss 5.8615   LearningRate 0.0089   Epoch: 28   Global Step: 141830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:45,094-Speed 10736.01 samples/sec   Loss 5.8747   LearningRate 0.0089   Epoch: 28   Global Step: 141840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:46,028-Speed 10973.96 samples/sec   Loss 5.9700   LearningRate 0.0089   Epoch: 28   Global Step: 141850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:46,972-Speed 10855.30 samples/sec   Loss 5.9104   LearningRate 0.0089   Epoch: 28   Global Step: 141860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:47,957-Speed 10415.85 samples/sec   Loss 5.8371   LearningRate 0.0089   Epoch: 28   Global Step: 141870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:48,928-Speed 10552.02 samples/sec   Loss 5.8834   LearningRate 0.0089   Epoch: 28   Global Step: 141880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:49,900-Speed 10544.50 samples/sec   Loss 5.9483   LearningRate 0.0089   Epoch: 28   Global Step: 141890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:50,830-Speed 11024.69 samples/sec   Loss 6.0076   LearningRate 0.0089   Epoch: 28   Global Step: 141900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:51,796-Speed 10613.84 samples/sec   Loss 5.8541   LearningRate 0.0089   Epoch: 28   Global Step: 141910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:52,782-Speed 10386.19 samples/sec   Loss 5.9325   LearningRate 0.0089   Epoch: 28   Global Step: 141920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:53,794-Speed 10128.36 samples/sec   Loss 5.9309   LearningRate 0.0089   Epoch: 28   Global Step: 141930   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:31:54,769-Speed 10514.00 samples/sec   Loss 5.8938   LearningRate 0.0089   Epoch: 28   Global Step: 141940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:55,736-Speed 10619.68 samples/sec   Loss 6.0110   LearningRate 0.0089   Epoch: 28   Global Step: 141950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:56,688-Speed 10763.99 samples/sec   Loss 5.9584   LearningRate 0.0089   Epoch: 28   Global Step: 141960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:57,606-Speed 11158.70 samples/sec   Loss 5.9353   LearningRate 0.0089   Epoch: 28   Global Step: 141970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:58,559-Speed 10761.11 samples/sec   Loss 5.8887   LearningRate 0.0089   Epoch: 28   Global Step: 141980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:31:59,545-Speed 10388.66 samples/sec   Loss 5.8256   LearningRate 0.0089   Epoch: 28   Global Step: 141990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:32:00,534-Speed 10362.21 samples/sec   Loss 6.0225   LearningRate 0.0089   Epoch: 28   Global Step: 142000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:32:23,029-[lfw][142000]XNorm: 8.981314
Training: 2022-04-11 04:32:23,030-[lfw][142000]Accuracy-Flip: 0.99633+-0.00348
Training: 2022-04-11 04:32:23,030-[lfw][142000]Accuracy-Highest: 0.99700
Training: 2022-04-11 04:32:48,595-[cfp_fp][142000]XNorm: 7.696322
Training: 2022-04-11 04:32:48,596-[cfp_fp][142000]Accuracy-Flip: 0.96900+-0.00915
Training: 2022-04-11 04:32:48,597-[cfp_fp][142000]Accuracy-Highest: 0.96900
Training: 2022-04-11 04:33:10,732-[agedb_30][142000]XNorm: 8.772362
Training: 2022-04-11 04:33:10,733-[agedb_30][142000]Accuracy-Flip: 0.97250+-0.00757
Training: 2022-04-11 04:33:10,733-[agedb_30][142000]Accuracy-Highest: 0.97250
Training: 2022-04-11 04:33:11,663-Speed 143.97 samples/sec   Loss 5.9854   LearningRate 0.0089   Epoch: 28   Global Step: 142010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:12,602-Speed 10910.06 samples/sec   Loss 5.9848   LearningRate 0.0089   Epoch: 28   Global Step: 142020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:13,578-Speed 10503.97 samples/sec   Loss 5.9615   LearningRate 0.0089   Epoch: 28   Global Step: 142030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:14,583-Speed 10192.30 samples/sec   Loss 6.0599   LearningRate 0.0089   Epoch: 28   Global Step: 142040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:15,564-Speed 10454.02 samples/sec   Loss 6.1103   LearningRate 0.0089   Epoch: 28   Global Step: 142050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:16,518-Speed 10747.17 samples/sec   Loss 5.9486   LearningRate 0.0089   Epoch: 28   Global Step: 142060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:17,444-Speed 11068.57 samples/sec   Loss 5.9179   LearningRate 0.0089   Epoch: 28   Global Step: 142070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:18,379-Speed 10963.58 samples/sec   Loss 6.0860   LearningRate 0.0089   Epoch: 28   Global Step: 142080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:19,330-Speed 10776.14 samples/sec   Loss 5.9995   LearningRate 0.0089   Epoch: 28   Global Step: 142090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:20,306-Speed 10505.81 samples/sec   Loss 6.0611   LearningRate 0.0089   Epoch: 28   Global Step: 142100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:21,294-Speed 10373.15 samples/sec   Loss 5.9023   LearningRate 0.0089   Epoch: 28   Global Step: 142110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:22,274-Speed 10449.22 samples/sec   Loss 5.9903   LearningRate 0.0089   Epoch: 28   Global Step: 142120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:23,264-Speed 10369.26 samples/sec   Loss 5.8683   LearningRate 0.0089   Epoch: 28   Global Step: 142130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:24,224-Speed 10669.59 samples/sec   Loss 5.9223   LearningRate 0.0088   Epoch: 28   Global Step: 142140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:25,230-Speed 10182.83 samples/sec   Loss 6.0809   LearningRate 0.0088   Epoch: 28   Global Step: 142150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:26,158-Speed 11042.72 samples/sec   Loss 5.9158   LearningRate 0.0088   Epoch: 28   Global Step: 142160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:27,276-Speed 9165.20 samples/sec   Loss 6.0044   LearningRate 0.0088   Epoch: 28   Global Step: 142170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:28,270-Speed 10317.26 samples/sec   Loss 5.9433   LearningRate 0.0088   Epoch: 28   Global Step: 142180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:29,233-Speed 10646.06 samples/sec   Loss 6.0086   LearningRate 0.0088   Epoch: 28   Global Step: 142190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:30,211-Speed 10478.10 samples/sec   Loss 5.9840   LearningRate 0.0088   Epoch: 28   Global Step: 142200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:31,191-Speed 10458.29 samples/sec   Loss 6.0791   LearningRate 0.0088   Epoch: 28   Global Step: 142210   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:33:32,128-Speed 10938.64 samples/sec   Loss 5.8921   LearningRate 0.0088   Epoch: 28   Global Step: 142220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:33,084-Speed 10720.41 samples/sec   Loss 5.9442   LearningRate 0.0088   Epoch: 28   Global Step: 142230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:34,041-Speed 10704.58 samples/sec   Loss 5.9569   LearningRate 0.0088   Epoch: 28   Global Step: 142240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:35,016-Speed 10515.62 samples/sec   Loss 5.9720   LearningRate 0.0088   Epoch: 28   Global Step: 142250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:35,953-Speed 10937.08 samples/sec   Loss 5.8969   LearningRate 0.0088   Epoch: 28   Global Step: 142260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:36,912-Speed 10680.79 samples/sec   Loss 5.9767   LearningRate 0.0088   Epoch: 28   Global Step: 142270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:37,891-Speed 10476.62 samples/sec   Loss 5.9457   LearningRate 0.0088   Epoch: 28   Global Step: 142280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:38,861-Speed 10561.29 samples/sec   Loss 5.9573   LearningRate 0.0088   Epoch: 28   Global Step: 142290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:39,764-Speed 11349.03 samples/sec   Loss 6.0357   LearningRate 0.0088   Epoch: 28   Global Step: 142300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:40,706-Speed 10872.59 samples/sec   Loss 6.0814   LearningRate 0.0088   Epoch: 28   Global Step: 142310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:41,680-Speed 10535.74 samples/sec   Loss 5.8982   LearningRate 0.0088   Epoch: 28   Global Step: 142320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:42,610-Speed 11025.59 samples/sec   Loss 6.1019   LearningRate 0.0088   Epoch: 28   Global Step: 142330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:43,536-Speed 11067.90 samples/sec   Loss 6.0567   LearningRate 0.0088   Epoch: 28   Global Step: 142340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:44,500-Speed 10630.84 samples/sec   Loss 5.9558   LearningRate 0.0088   Epoch: 28   Global Step: 142350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:45,440-Speed 10906.95 samples/sec   Loss 6.1475   LearningRate 0.0088   Epoch: 28   Global Step: 142360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:46,412-Speed 10538.46 samples/sec   Loss 6.0348   LearningRate 0.0088   Epoch: 28   Global Step: 142370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:47,412-Speed 10257.92 samples/sec   Loss 6.0378   LearningRate 0.0088   Epoch: 28   Global Step: 142380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:48,351-Speed 10907.99 samples/sec   Loss 5.9944   LearningRate 0.0088   Epoch: 28   Global Step: 142390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:49,336-Speed 10406.40 samples/sec   Loss 6.0788   LearningRate 0.0088   Epoch: 28   Global Step: 142400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:50,320-Speed 10417.15 samples/sec   Loss 5.8774   LearningRate 0.0088   Epoch: 28   Global Step: 142410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:51,284-Speed 10624.98 samples/sec   Loss 5.9951   LearningRate 0.0088   Epoch: 28   Global Step: 142420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:52,233-Speed 10811.17 samples/sec   Loss 5.9986   LearningRate 0.0088   Epoch: 28   Global Step: 142430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:53,216-Speed 10423.77 samples/sec   Loss 6.0561   LearningRate 0.0088   Epoch: 28   Global Step: 142440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:54,168-Speed 10768.70 samples/sec   Loss 6.1409   LearningRate 0.0088   Epoch: 28   Global Step: 142450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:33:55,126-Speed 10701.74 samples/sec   Loss 6.0082   LearningRate 0.0088   Epoch: 28   Global Step: 142460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:56,070-Speed 10858.36 samples/sec   Loss 6.0205   LearningRate 0.0088   Epoch: 28   Global Step: 142470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:57,062-Speed 10324.53 samples/sec   Loss 6.1429   LearningRate 0.0087   Epoch: 28   Global Step: 142480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:58,055-Speed 10317.14 samples/sec   Loss 5.9687   LearningRate 0.0087   Epoch: 28   Global Step: 142490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:58,994-Speed 10918.83 samples/sec   Loss 6.1295   LearningRate 0.0087   Epoch: 28   Global Step: 142500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:33:59,959-Speed 10625.92 samples/sec   Loss 6.1118   LearningRate 0.0087   Epoch: 28   Global Step: 142510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:34:00,939-Speed 10453.53 samples/sec   Loss 6.0443   LearningRate 0.0087   Epoch: 28   Global Step: 142520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:34:01,877-Speed 10935.74 samples/sec   Loss 6.0809   LearningRate 0.0087   Epoch: 28   Global Step: 142530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:34:02,827-Speed 10783.37 samples/sec   Loss 6.0245   LearningRate 0.0087   Epoch: 28   Global Step: 142540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:34:03,813-Speed 10389.22 samples/sec   Loss 6.1279   LearningRate 0.0087   Epoch: 28   Global Step: 142550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:34:04,833-Speed 10056.11 samples/sec   Loss 6.1897   LearningRate 0.0087   Epoch: 28   Global Step: 142560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:05,784-Speed 10770.24 samples/sec   Loss 6.1082   LearningRate 0.0087   Epoch: 28   Global Step: 142570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:06,724-Speed 10911.29 samples/sec   Loss 6.1791   LearningRate 0.0087   Epoch: 28   Global Step: 142580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:07,690-Speed 10611.45 samples/sec   Loss 5.8284   LearningRate 0.0087   Epoch: 28   Global Step: 142590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:08,672-Speed 10434.04 samples/sec   Loss 6.1013   LearningRate 0.0087   Epoch: 28   Global Step: 142600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:09,591-Speed 11146.54 samples/sec   Loss 6.1887   LearningRate 0.0087   Epoch: 28   Global Step: 142610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:10,508-Speed 11181.86 samples/sec   Loss 6.1185   LearningRate 0.0087   Epoch: 28   Global Step: 142620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:11,430-Speed 11112.88 samples/sec   Loss 6.0656   LearningRate 0.0087   Epoch: 28   Global Step: 142630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:12,419-Speed 10361.89 samples/sec   Loss 5.9312   LearningRate 0.0087   Epoch: 28   Global Step: 142640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:13,364-Speed 10840.41 samples/sec   Loss 6.0782   LearningRate 0.0087   Epoch: 28   Global Step: 142650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:14,320-Speed 10726.47 samples/sec   Loss 6.1017   LearningRate 0.0087   Epoch: 28   Global Step: 142660   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:34:15,264-Speed 10853.74 samples/sec   Loss 6.0023   LearningRate 0.0087   Epoch: 28   Global Step: 142670   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:34:16,180-Speed 11185.75 samples/sec   Loss 6.0253   LearningRate 0.0087   Epoch: 28   Global Step: 142680   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:34:17,133-Speed 10752.09 samples/sec   Loss 6.0555   LearningRate 0.0087   Epoch: 28   Global Step: 142690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:18,073-Speed 10902.35 samples/sec   Loss 6.0580   LearningRate 0.0087   Epoch: 28   Global Step: 142700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:19,058-Speed 10412.61 samples/sec   Loss 6.0504   LearningRate 0.0087   Epoch: 28   Global Step: 142710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:20,003-Speed 10835.22 samples/sec   Loss 6.0244   LearningRate 0.0087   Epoch: 28   Global Step: 142720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:20,935-Speed 10994.91 samples/sec   Loss 6.1373   LearningRate 0.0087   Epoch: 28   Global Step: 142730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:21,897-Speed 10655.91 samples/sec   Loss 5.9171   LearningRate 0.0087   Epoch: 28   Global Step: 142740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:22,937-Speed 9865.20 samples/sec   Loss 6.2164   LearningRate 0.0087   Epoch: 28   Global Step: 142750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:23,868-Speed 11004.32 samples/sec   Loss 6.1675   LearningRate 0.0087   Epoch: 28   Global Step: 142760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:24,851-Speed 10424.54 samples/sec   Loss 5.9849   LearningRate 0.0087   Epoch: 28   Global Step: 142770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:25,808-Speed 10707.35 samples/sec   Loss 6.0196   LearningRate 0.0087   Epoch: 28   Global Step: 142780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:26,783-Speed 10514.66 samples/sec   Loss 6.0731   LearningRate 0.0087   Epoch: 28   Global Step: 142790   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:34:27,739-Speed 10727.22 samples/sec   Loss 5.9507   LearningRate 0.0087   Epoch: 28   Global Step: 142800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:28,659-Speed 11140.91 samples/sec   Loss 6.0681   LearningRate 0.0087   Epoch: 28   Global Step: 142810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:29,634-Speed 10508.86 samples/sec   Loss 6.1736   LearningRate 0.0086   Epoch: 28   Global Step: 142820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:30,619-Speed 10414.99 samples/sec   Loss 5.9976   LearningRate 0.0086   Epoch: 28   Global Step: 142830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:31,571-Speed 10762.96 samples/sec   Loss 6.1643   LearningRate 0.0086   Epoch: 28   Global Step: 142840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:32,510-Speed 10915.26 samples/sec   Loss 6.0407   LearningRate 0.0086   Epoch: 28   Global Step: 142850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:33,498-Speed 10372.09 samples/sec   Loss 5.9742   LearningRate 0.0086   Epoch: 28   Global Step: 142860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:34,454-Speed 10718.01 samples/sec   Loss 6.2155   LearningRate 0.0086   Epoch: 28   Global Step: 142870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:35,423-Speed 10581.39 samples/sec   Loss 6.2240   LearningRate 0.0086   Epoch: 28   Global Step: 142880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:36,373-Speed 10795.13 samples/sec   Loss 6.2533   LearningRate 0.0086   Epoch: 28   Global Step: 142890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:37,315-Speed 10876.27 samples/sec   Loss 6.1250   LearningRate 0.0086   Epoch: 28   Global Step: 142900   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:34:38,269-Speed 10746.43 samples/sec   Loss 6.0995   LearningRate 0.0086   Epoch: 28   Global Step: 142910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:39,248-Speed 10469.49 samples/sec   Loss 6.0528   LearningRate 0.0086   Epoch: 28   Global Step: 142920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:34:40,223-Speed 10510.43 samples/sec   Loss 6.0363   LearningRate 0.0086   Epoch: 28   Global Step: 142930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:34:41,212-Speed 10361.31 samples/sec   Loss 6.0344   LearningRate 0.0086   Epoch: 28   Global Step: 142940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:34:42,173-Speed 10669.90 samples/sec   Loss 5.9945   LearningRate 0.0086   Epoch: 28   Global Step: 142950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:34:43,112-Speed 10917.14 samples/sec   Loss 6.1195   LearningRate 0.0086   Epoch: 28   Global Step: 142960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:34:44,060-Speed 10798.90 samples/sec   Loss 6.1259   LearningRate 0.0086   Epoch: 28   Global Step: 142970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:34:45,007-Speed 10841.10 samples/sec   Loss 6.0674   LearningRate 0.0086   Epoch: 28   Global Step: 142980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:34:45,930-Speed 11097.77 samples/sec   Loss 6.1850   LearningRate 0.0086   Epoch: 28   Global Step: 142990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:34:46,895-Speed 10622.98 samples/sec   Loss 6.0995   LearningRate 0.0086   Epoch: 28   Global Step: 143000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:34:47,860-Speed 10624.46 samples/sec   Loss 6.1190   LearningRate 0.0086   Epoch: 28   Global Step: 143010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:34:48,805-Speed 10840.43 samples/sec   Loss 6.2110   LearningRate 0.0086   Epoch: 28   Global Step: 143020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:49,749-Speed 10855.04 samples/sec   Loss 6.1001   LearningRate 0.0086   Epoch: 28   Global Step: 143030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:50,719-Speed 10565.74 samples/sec   Loss 6.1917   LearningRate 0.0086   Epoch: 28   Global Step: 143040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:51,680-Speed 10662.12 samples/sec   Loss 6.1485   LearningRate 0.0086   Epoch: 28   Global Step: 143050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:52,619-Speed 10923.32 samples/sec   Loss 6.2246   LearningRate 0.0086   Epoch: 28   Global Step: 143060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:53,564-Speed 10840.02 samples/sec   Loss 6.0886   LearningRate 0.0086   Epoch: 28   Global Step: 143070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:54,540-Speed 10501.77 samples/sec   Loss 6.2087   LearningRate 0.0086   Epoch: 28   Global Step: 143080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:55,502-Speed 10648.28 samples/sec   Loss 5.9710   LearningRate 0.0086   Epoch: 28   Global Step: 143090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:56,435-Speed 10986.89 samples/sec   Loss 6.1425   LearningRate 0.0086   Epoch: 28   Global Step: 143100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:57,431-Speed 10288.96 samples/sec   Loss 6.2109   LearningRate 0.0086   Epoch: 28   Global Step: 143110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:34:58,411-Speed 10455.12 samples/sec   Loss 6.1657   LearningRate 0.0086   Epoch: 28   Global Step: 143120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:34:59,376-Speed 10625.66 samples/sec   Loss 6.2509   LearningRate 0.0086   Epoch: 28   Global Step: 143130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:35:00,368-Speed 10327.39 samples/sec   Loss 6.1474   LearningRate 0.0086   Epoch: 28   Global Step: 143140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:35:01,344-Speed 10501.53 samples/sec   Loss 6.2208   LearningRate 0.0086   Epoch: 28   Global Step: 143150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:35:02,300-Speed 10715.76 samples/sec   Loss 6.2267   LearningRate 0.0086   Epoch: 28   Global Step: 143160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:35:03,283-Speed 10428.07 samples/sec   Loss 6.1584   LearningRate 0.0085   Epoch: 28   Global Step: 143170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:35:04,235-Speed 10774.21 samples/sec   Loss 6.1718   LearningRate 0.0085   Epoch: 28   Global Step: 143180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:35:05,199-Speed 10632.84 samples/sec   Loss 6.1563   LearningRate 0.0085   Epoch: 28   Global Step: 143190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:35:06,182-Speed 10425.41 samples/sec   Loss 6.0659   LearningRate 0.0085   Epoch: 28   Global Step: 143200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:35:07,136-Speed 10744.86 samples/sec   Loss 6.2551   LearningRate 0.0085   Epoch: 28   Global Step: 143210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:35:08,118-Speed 10438.99 samples/sec   Loss 6.2377   LearningRate 0.0085   Epoch: 28   Global Step: 143220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:09,108-Speed 10342.11 samples/sec   Loss 6.1966   LearningRate 0.0085   Epoch: 28   Global Step: 143230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:10,074-Speed 10611.46 samples/sec   Loss 6.1519   LearningRate 0.0085   Epoch: 28   Global Step: 143240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:11,033-Speed 10693.97 samples/sec   Loss 6.1780   LearningRate 0.0085   Epoch: 28   Global Step: 143250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:11,980-Speed 10817.94 samples/sec   Loss 6.2414   LearningRate 0.0085   Epoch: 28   Global Step: 143260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:12,946-Speed 10608.53 samples/sec   Loss 6.1095   LearningRate 0.0085   Epoch: 28   Global Step: 143270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:13,915-Speed 10580.90 samples/sec   Loss 6.1156   LearningRate 0.0085   Epoch: 28   Global Step: 143280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:14,894-Speed 10463.12 samples/sec   Loss 6.0324   LearningRate 0.0085   Epoch: 28   Global Step: 143290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:15,831-Speed 10935.65 samples/sec   Loss 6.1919   LearningRate 0.0085   Epoch: 28   Global Step: 143300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:16,786-Speed 10739.77 samples/sec   Loss 6.2050   LearningRate 0.0085   Epoch: 28   Global Step: 143310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:17,767-Speed 10446.51 samples/sec   Loss 6.1719   LearningRate 0.0085   Epoch: 28   Global Step: 143320   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:35:18,754-Speed 10391.95 samples/sec   Loss 6.3226   LearningRate 0.0085   Epoch: 28   Global Step: 143330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:19,737-Speed 10420.10 samples/sec   Loss 6.1853   LearningRate 0.0085   Epoch: 28   Global Step: 143340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:20,691-Speed 10742.84 samples/sec   Loss 6.1633   LearningRate 0.0085   Epoch: 28   Global Step: 143350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:21,638-Speed 10827.49 samples/sec   Loss 6.2130   LearningRate 0.0085   Epoch: 28   Global Step: 143360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:22,604-Speed 10619.66 samples/sec   Loss 6.1857   LearningRate 0.0085   Epoch: 28   Global Step: 143370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:23,573-Speed 10566.55 samples/sec   Loss 6.3137   LearningRate 0.0085   Epoch: 28   Global Step: 143380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:24,581-Speed 10165.38 samples/sec   Loss 6.2633   LearningRate 0.0085   Epoch: 28   Global Step: 143390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:25,534-Speed 10766.70 samples/sec   Loss 6.2480   LearningRate 0.0085   Epoch: 28   Global Step: 143400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:26,512-Speed 10472.07 samples/sec   Loss 6.0562   LearningRate 0.0085   Epoch: 28   Global Step: 143410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:27,475-Speed 10642.80 samples/sec   Loss 6.2315   LearningRate 0.0085   Epoch: 28   Global Step: 143420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:28,452-Speed 10491.99 samples/sec   Loss 6.3178   LearningRate 0.0085   Epoch: 28   Global Step: 143430   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:35:29,403-Speed 10784.11 samples/sec   Loss 6.2324   LearningRate 0.0085   Epoch: 28   Global Step: 143440   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:35:30,357-Speed 10743.21 samples/sec   Loss 6.3732   LearningRate 0.0085   Epoch: 28   Global Step: 143450   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:35:31,321-Speed 10624.88 samples/sec   Loss 6.1810   LearningRate 0.0085   Epoch: 28   Global Step: 143460   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:35:32,308-Speed 10385.03 samples/sec   Loss 6.2399   LearningRate 0.0085   Epoch: 28   Global Step: 143470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:33,278-Speed 10571.23 samples/sec   Loss 6.1478   LearningRate 0.0085   Epoch: 28   Global Step: 143480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:34,215-Speed 10941.37 samples/sec   Loss 6.2447   LearningRate 0.0085   Epoch: 28   Global Step: 143490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:35,148-Speed 10988.67 samples/sec   Loss 6.1828   LearningRate 0.0085   Epoch: 28   Global Step: 143500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:36,084-Speed 10948.84 samples/sec   Loss 6.1779   LearningRate 0.0084   Epoch: 28   Global Step: 143510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:37,022-Speed 10927.49 samples/sec   Loss 6.1444   LearningRate 0.0084   Epoch: 28   Global Step: 143520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:37,980-Speed 10695.86 samples/sec   Loss 6.4646   LearningRate 0.0084   Epoch: 28   Global Step: 143530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:38,986-Speed 10181.66 samples/sec   Loss 6.1191   LearningRate 0.0084   Epoch: 28   Global Step: 143540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:39,963-Speed 10488.48 samples/sec   Loss 5.9878   LearningRate 0.0084   Epoch: 28   Global Step: 143550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:40,915-Speed 10765.92 samples/sec   Loss 6.2762   LearningRate 0.0084   Epoch: 28   Global Step: 143560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:41,874-Speed 10692.96 samples/sec   Loss 6.2338   LearningRate 0.0084   Epoch: 28   Global Step: 143570   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:35:42,857-Speed 10426.03 samples/sec   Loss 6.0160   LearningRate 0.0084   Epoch: 28   Global Step: 143580   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:35:43,798-Speed 10886.86 samples/sec   Loss 6.2054   LearningRate 0.0084   Epoch: 28   Global Step: 143590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:44,798-Speed 10248.30 samples/sec   Loss 6.3421   LearningRate 0.0084   Epoch: 28   Global Step: 143600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:45,743-Speed 10851.73 samples/sec   Loss 6.3959   LearningRate 0.0084   Epoch: 28   Global Step: 143610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:46,702-Speed 10685.10 samples/sec   Loss 6.1352   LearningRate 0.0084   Epoch: 28   Global Step: 143620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:47,724-Speed 10025.11 samples/sec   Loss 6.2149   LearningRate 0.0084   Epoch: 28   Global Step: 143630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:48,709-Speed 10409.79 samples/sec   Loss 6.2592   LearningRate 0.0084   Epoch: 28   Global Step: 143640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:49,652-Speed 10870.73 samples/sec   Loss 6.1630   LearningRate 0.0084   Epoch: 28   Global Step: 143650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:50,613-Speed 10661.47 samples/sec   Loss 6.0659   LearningRate 0.0084   Epoch: 28   Global Step: 143660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:51,563-Speed 10789.88 samples/sec   Loss 6.2338   LearningRate 0.0084   Epoch: 28   Global Step: 143670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:52,585-Speed 10027.02 samples/sec   Loss 6.1328   LearningRate 0.0084   Epoch: 28   Global Step: 143680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:53,550-Speed 10619.98 samples/sec   Loss 6.2459   LearningRate 0.0084   Epoch: 28   Global Step: 143690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:54,506-Speed 10722.28 samples/sec   Loss 6.2840   LearningRate 0.0084   Epoch: 28   Global Step: 143700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:55,505-Speed 10265.92 samples/sec   Loss 6.2408   LearningRate 0.0084   Epoch: 28   Global Step: 143710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:56,513-Speed 10157.77 samples/sec   Loss 6.3023   LearningRate 0.0084   Epoch: 28   Global Step: 143720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:57,475-Speed 10654.46 samples/sec   Loss 6.2788   LearningRate 0.0084   Epoch: 28   Global Step: 143730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:58,451-Speed 10507.79 samples/sec   Loss 6.2406   LearningRate 0.0084   Epoch: 28   Global Step: 143740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:35:59,419-Speed 10581.18 samples/sec   Loss 6.2090   LearningRate 0.0084   Epoch: 28   Global Step: 143750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:36:00,397-Speed 10500.78 samples/sec   Loss 6.1758   LearningRate 0.0084   Epoch: 28   Global Step: 143760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:36:01,383-Speed 10391.14 samples/sec   Loss 6.0731   LearningRate 0.0084   Epoch: 28   Global Step: 143770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:36:02,357-Speed 10527.98 samples/sec   Loss 6.3178   LearningRate 0.0084   Epoch: 28   Global Step: 143780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:36:03,301-Speed 10857.78 samples/sec   Loss 6.2079   LearningRate 0.0084   Epoch: 28   Global Step: 143790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:36:04,235-Speed 10963.59 samples/sec   Loss 6.2491   LearningRate 0.0084   Epoch: 28   Global Step: 143800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:36:05,219-Speed 10416.49 samples/sec   Loss 6.2413   LearningRate 0.0084   Epoch: 28   Global Step: 143810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:36:06,170-Speed 10784.71 samples/sec   Loss 6.2566   LearningRate 0.0084   Epoch: 28   Global Step: 143820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:36:07,189-Speed 10051.55 samples/sec   Loss 6.2010   LearningRate 0.0084   Epoch: 28   Global Step: 143830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:36:08,174-Speed 10409.64 samples/sec   Loss 6.3033   LearningRate 0.0084   Epoch: 28   Global Step: 143840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:36:09,139-Speed 10614.59 samples/sec   Loss 6.3080   LearningRate 0.0084   Epoch: 28   Global Step: 143850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:36:10,082-Speed 10868.23 samples/sec   Loss 6.2777   LearningRate 0.0083   Epoch: 28   Global Step: 143860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:36:11,031-Speed 10797.12 samples/sec   Loss 6.2393   LearningRate 0.0083   Epoch: 28   Global Step: 143870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:36:11,985-Speed 10750.79 samples/sec   Loss 6.1658   LearningRate 0.0083   Epoch: 28   Global Step: 143880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:36:12,957-Speed 10538.00 samples/sec   Loss 6.2755   LearningRate 0.0083   Epoch: 28   Global Step: 143890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:36:13,943-Speed 10401.17 samples/sec   Loss 6.2672   LearningRate 0.0083   Epoch: 28   Global Step: 143900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:36:14,910-Speed 10596.97 samples/sec   Loss 6.3309   LearningRate 0.0083   Epoch: 28   Global Step: 143910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:36:15,891-Speed 10449.48 samples/sec   Loss 6.3157   LearningRate 0.0083   Epoch: 28   Global Step: 143920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:36:16,853-Speed 10654.19 samples/sec   Loss 6.2840   LearningRate 0.0083   Epoch: 28   Global Step: 143930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:36:17,797-Speed 10858.88 samples/sec   Loss 6.2440   LearningRate 0.0083   Epoch: 28   Global Step: 143940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:36:18,797-Speed 10242.83 samples/sec   Loss 6.2763   LearningRate 0.0083   Epoch: 28   Global Step: 143950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:36:19,755-Speed 10705.05 samples/sec   Loss 6.1226   LearningRate 0.0083   Epoch: 28   Global Step: 143960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:36:20,673-Speed 11170.64 samples/sec   Loss 6.1419   LearningRate 0.0083   Epoch: 28   Global Step: 143970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:36:21,647-Speed 10541.74 samples/sec   Loss 6.1945   LearningRate 0.0083   Epoch: 28   Global Step: 143980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:36:22,556-Speed 11271.74 samples/sec   Loss 6.1582   LearningRate 0.0083   Epoch: 28   Global Step: 143990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:36:23,491-Speed 10962.12 samples/sec   Loss 6.1668   LearningRate 0.0083   Epoch: 28   Global Step: 144000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:36:45,773-[lfw][144000]XNorm: 8.827336
Training: 2022-04-11 04:36:45,774-[lfw][144000]Accuracy-Flip: 0.99583+-0.00335
Training: 2022-04-11 04:36:45,775-[lfw][144000]Accuracy-Highest: 0.99700
Training: 2022-04-11 04:37:11,557-[cfp_fp][144000]XNorm: 7.590035
Training: 2022-04-11 04:37:11,558-[cfp_fp][144000]Accuracy-Flip: 0.96914+-0.00986
Training: 2022-04-11 04:37:11,559-[cfp_fp][144000]Accuracy-Highest: 0.96914
Training: 2022-04-11 04:37:33,932-[agedb_30][144000]XNorm: 8.639676
Training: 2022-04-11 04:37:33,933-[agedb_30][144000]Accuracy-Flip: 0.97233+-0.00797
Training: 2022-04-11 04:37:33,933-[agedb_30][144000]Accuracy-Highest: 0.97250
Training: 2022-04-11 04:37:34,881-Speed 143.44 samples/sec   Loss 6.4279   LearningRate 0.0083   Epoch: 28   Global Step: 144010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:37:35,852-Speed 10564.69 samples/sec   Loss 6.1806   LearningRate 0.0083   Epoch: 28   Global Step: 144020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:37:36,750-Speed 11412.03 samples/sec   Loss 6.2449   LearningRate 0.0083   Epoch: 28   Global Step: 144030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:37:37,739-Speed 10362.81 samples/sec   Loss 6.3660   LearningRate 0.0083   Epoch: 28   Global Step: 144040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:37:38,719-Speed 10460.82 samples/sec   Loss 6.2175   LearningRate 0.0083   Epoch: 28   Global Step: 144050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:37:39,681-Speed 10660.70 samples/sec   Loss 6.2497   LearningRate 0.0083   Epoch: 28   Global Step: 144060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:37:40,666-Speed 10397.84 samples/sec   Loss 6.3081   LearningRate 0.0083   Epoch: 28   Global Step: 144070   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:37:41,637-Speed 10560.50 samples/sec   Loss 6.1781   LearningRate 0.0083   Epoch: 28   Global Step: 144080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:37:42,627-Speed 10355.37 samples/sec   Loss 6.2893   LearningRate 0.0083   Epoch: 28   Global Step: 144090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:37:43,609-Speed 10443.17 samples/sec   Loss 6.2454   LearningRate 0.0083   Epoch: 28   Global Step: 144100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:37:44,568-Speed 10682.93 samples/sec   Loss 6.3372   LearningRate 0.0083   Epoch: 28   Global Step: 144110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:37:45,532-Speed 10631.10 samples/sec   Loss 6.1798   LearningRate 0.0083   Epoch: 28   Global Step: 144120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:37:46,557-Speed 10004.28 samples/sec   Loss 6.3448   LearningRate 0.0083   Epoch: 28   Global Step: 144130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:37:47,509-Speed 10766.70 samples/sec   Loss 6.2689   LearningRate 0.0083   Epoch: 28   Global Step: 144140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:37:48,478-Speed 10575.37 samples/sec   Loss 6.1582   LearningRate 0.0083   Epoch: 28   Global Step: 144150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:37:49,474-Speed 10291.22 samples/sec   Loss 6.3258   LearningRate 0.0083   Epoch: 28   Global Step: 144160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:37:50,423-Speed 10811.20 samples/sec   Loss 6.3111   LearningRate 0.0083   Epoch: 28   Global Step: 144170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:37:51,395-Speed 10546.79 samples/sec   Loss 6.3736   LearningRate 0.0083   Epoch: 28   Global Step: 144180   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:37:52,367-Speed 10536.75 samples/sec   Loss 6.3152   LearningRate 0.0083   Epoch: 28   Global Step: 144190   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:37:53,315-Speed 10807.34 samples/sec   Loss 6.2224   LearningRate 0.0083   Epoch: 28   Global Step: 144200   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:37:54,290-Speed 10516.43 samples/sec   Loss 6.2507   LearningRate 0.0082   Epoch: 28   Global Step: 144210   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:37:55,223-Speed 10979.77 samples/sec   Loss 6.2924   LearningRate 0.0082   Epoch: 28   Global Step: 144220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:37:56,187-Speed 10625.76 samples/sec   Loss 6.1384   LearningRate 0.0082   Epoch: 28   Global Step: 144230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:37:57,158-Speed 10562.71 samples/sec   Loss 6.2487   LearningRate 0.0082   Epoch: 28   Global Step: 144240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:37:58,136-Speed 10481.95 samples/sec   Loss 6.1886   LearningRate 0.0082   Epoch: 28   Global Step: 144250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:37:59,141-Speed 10199.24 samples/sec   Loss 6.2871   LearningRate 0.0082   Epoch: 28   Global Step: 144260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:00,120-Speed 10466.84 samples/sec   Loss 6.3055   LearningRate 0.0082   Epoch: 28   Global Step: 144270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:01,121-Speed 10236.15 samples/sec   Loss 6.3353   LearningRate 0.0082   Epoch: 28   Global Step: 144280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:02,130-Speed 10165.63 samples/sec   Loss 6.4292   LearningRate 0.0082   Epoch: 28   Global Step: 144290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:03,057-Speed 11058.89 samples/sec   Loss 6.0799   LearningRate 0.0082   Epoch: 28   Global Step: 144300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:03,992-Speed 10951.30 samples/sec   Loss 6.2939   LearningRate 0.0082   Epoch: 28   Global Step: 144310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:04,950-Speed 10701.78 samples/sec   Loss 6.2258   LearningRate 0.0082   Epoch: 28   Global Step: 144320   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:38:05,918-Speed 10579.19 samples/sec   Loss 6.3686   LearningRate 0.0082   Epoch: 28   Global Step: 144330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:06,898-Speed 10483.53 samples/sec   Loss 6.1920   LearningRate 0.0082   Epoch: 28   Global Step: 144340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:07,900-Speed 10239.05 samples/sec   Loss 6.1961   LearningRate 0.0082   Epoch: 28   Global Step: 144350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:08,864-Speed 10630.76 samples/sec   Loss 6.1793   LearningRate 0.0082   Epoch: 28   Global Step: 144360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:09,876-Speed 10135.68 samples/sec   Loss 6.2168   LearningRate 0.0082   Epoch: 28   Global Step: 144370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:10,838-Speed 10653.01 samples/sec   Loss 6.2586   LearningRate 0.0082   Epoch: 28   Global Step: 144380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:11,774-Speed 10950.16 samples/sec   Loss 6.3690   LearningRate 0.0082   Epoch: 28   Global Step: 144390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:12,756-Speed 10434.04 samples/sec   Loss 6.2831   LearningRate 0.0082   Epoch: 28   Global Step: 144400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:13,720-Speed 10638.15 samples/sec   Loss 6.2704   LearningRate 0.0082   Epoch: 28   Global Step: 144410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:14,677-Speed 10698.31 samples/sec   Loss 6.2487   LearningRate 0.0082   Epoch: 28   Global Step: 144420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:15,679-Speed 10234.94 samples/sec   Loss 6.2799   LearningRate 0.0082   Epoch: 28   Global Step: 144430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:16,652-Speed 10536.76 samples/sec   Loss 6.2936   LearningRate 0.0082   Epoch: 28   Global Step: 144440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:17,635-Speed 10425.49 samples/sec   Loss 6.2439   LearningRate 0.0082   Epoch: 28   Global Step: 144450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:18,646-Speed 10134.77 samples/sec   Loss 6.2197   LearningRate 0.0082   Epoch: 28   Global Step: 144460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:19,622-Speed 10499.71 samples/sec   Loss 6.2470   LearningRate 0.0082   Epoch: 28   Global Step: 144470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:20,595-Speed 10543.41 samples/sec   Loss 6.1900   LearningRate 0.0082   Epoch: 28   Global Step: 144480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:21,556-Speed 10661.12 samples/sec   Loss 6.2579   LearningRate 0.0082   Epoch: 28   Global Step: 144490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:22,549-Speed 10326.61 samples/sec   Loss 6.2771   LearningRate 0.0082   Epoch: 28   Global Step: 144500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:23,513-Speed 10632.24 samples/sec   Loss 6.3586   LearningRate 0.0082   Epoch: 28   Global Step: 144510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:24,479-Speed 10607.09 samples/sec   Loss 6.3732   LearningRate 0.0082   Epoch: 28   Global Step: 144520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:25,481-Speed 10228.74 samples/sec   Loss 6.2571   LearningRate 0.0082   Epoch: 28   Global Step: 144530   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:38:26,476-Speed 10290.14 samples/sec   Loss 6.1902   LearningRate 0.0082   Epoch: 28   Global Step: 144540   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:38:27,529-Speed 9738.82 samples/sec   Loss 6.3802   LearningRate 0.0082   Epoch: 28   Global Step: 144550   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:38:28,465-Speed 10956.12 samples/sec   Loss 6.3013   LearningRate 0.0082   Epoch: 28   Global Step: 144560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:29,453-Speed 10372.27 samples/sec   Loss 6.1212   LearningRate 0.0081   Epoch: 28   Global Step: 144570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:30,437-Speed 10411.78 samples/sec   Loss 6.1924   LearningRate 0.0081   Epoch: 28   Global Step: 144580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:31,371-Speed 10973.99 samples/sec   Loss 6.1366   LearningRate 0.0081   Epoch: 28   Global Step: 144590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:32,367-Speed 10288.60 samples/sec   Loss 6.2783   LearningRate 0.0081   Epoch: 28   Global Step: 144600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:33,298-Speed 11019.12 samples/sec   Loss 6.2868   LearningRate 0.0081   Epoch: 28   Global Step: 144610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:34,278-Speed 10454.71 samples/sec   Loss 6.3115   LearningRate 0.0081   Epoch: 28   Global Step: 144620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:35,270-Speed 10325.28 samples/sec   Loss 6.3243   LearningRate 0.0081   Epoch: 28   Global Step: 144630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:36,239-Speed 10579.76 samples/sec   Loss 6.2949   LearningRate 0.0081   Epoch: 28   Global Step: 144640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:37,247-Speed 10169.87 samples/sec   Loss 6.1847   LearningRate 0.0081   Epoch: 28   Global Step: 144650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:38,248-Speed 10236.82 samples/sec   Loss 6.2490   LearningRate 0.0081   Epoch: 28   Global Step: 144660   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:38:39,209-Speed 10663.82 samples/sec   Loss 6.2664   LearningRate 0.0081   Epoch: 28   Global Step: 144670   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:38:40,180-Speed 10561.06 samples/sec   Loss 6.1908   LearningRate 0.0081   Epoch: 28   Global Step: 144680   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:38:41,137-Speed 10715.68 samples/sec   Loss 6.1893   LearningRate 0.0081   Epoch: 28   Global Step: 144690   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:38:42,067-Speed 11014.22 samples/sec   Loss 6.3011   LearningRate 0.0081   Epoch: 28   Global Step: 144700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:43,013-Speed 10836.22 samples/sec   Loss 6.3721   LearningRate 0.0081   Epoch: 28   Global Step: 144710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:43,987-Speed 10521.03 samples/sec   Loss 6.3642   LearningRate 0.0081   Epoch: 28   Global Step: 144720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:44,937-Speed 10808.43 samples/sec   Loss 6.4550   LearningRate 0.0081   Epoch: 28   Global Step: 144730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:45,889-Speed 10771.25 samples/sec   Loss 6.2391   LearningRate 0.0081   Epoch: 28   Global Step: 144740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:46,847-Speed 10702.58 samples/sec   Loss 6.2975   LearningRate 0.0081   Epoch: 28   Global Step: 144750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:47,779-Speed 10994.89 samples/sec   Loss 6.4224   LearningRate 0.0081   Epoch: 28   Global Step: 144760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:38:48,730-Speed 10780.97 samples/sec   Loss 6.2577   LearningRate 0.0081   Epoch: 28   Global Step: 144770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:38:49,725-Speed 10301.41 samples/sec   Loss 6.2871   LearningRate 0.0081   Epoch: 28   Global Step: 144780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:38:50,702-Speed 10490.91 samples/sec   Loss 6.2904   LearningRate 0.0081   Epoch: 28   Global Step: 144790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:38:51,707-Speed 10206.58 samples/sec   Loss 6.2266   LearningRate 0.0081   Epoch: 28   Global Step: 144800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:38:52,685-Speed 10477.43 samples/sec   Loss 6.4289   LearningRate 0.0081   Epoch: 28   Global Step: 144810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:38:53,657-Speed 10547.47 samples/sec   Loss 6.2409   LearningRate 0.0081   Epoch: 28   Global Step: 144820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:38:54,641-Speed 10428.96 samples/sec   Loss 6.2630   LearningRate 0.0081   Epoch: 28   Global Step: 144830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:38:55,579-Speed 10915.56 samples/sec   Loss 6.2975   LearningRate 0.0081   Epoch: 28   Global Step: 144840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:38:56,509-Speed 11026.75 samples/sec   Loss 6.4127   LearningRate 0.0081   Epoch: 28   Global Step: 144850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:38:57,490-Speed 10444.67 samples/sec   Loss 6.2134   LearningRate 0.0081   Epoch: 28   Global Step: 144860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:58,474-Speed 10421.65 samples/sec   Loss 6.2910   LearningRate 0.0081   Epoch: 28   Global Step: 144870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:38:59,473-Speed 10254.96 samples/sec   Loss 6.2470   LearningRate 0.0081   Epoch: 28   Global Step: 144880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:00,452-Speed 10473.06 samples/sec   Loss 6.3151   LearningRate 0.0081   Epoch: 28   Global Step: 144890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:01,457-Speed 10192.04 samples/sec   Loss 6.2592   LearningRate 0.0081   Epoch: 28   Global Step: 144900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:02,465-Speed 10174.63 samples/sec   Loss 6.3609   LearningRate 0.0081   Epoch: 28   Global Step: 144910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:03,466-Speed 10236.19 samples/sec   Loss 6.1566   LearningRate 0.0080   Epoch: 28   Global Step: 144920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:04,425-Speed 10688.10 samples/sec   Loss 6.3719   LearningRate 0.0080   Epoch: 28   Global Step: 144930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:05,392-Speed 10591.33 samples/sec   Loss 6.2291   LearningRate 0.0080   Epoch: 28   Global Step: 144940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:06,390-Speed 10270.40 samples/sec   Loss 6.2626   LearningRate 0.0080   Epoch: 28   Global Step: 144950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:07,348-Speed 10704.42 samples/sec   Loss 6.3579   LearningRate 0.0080   Epoch: 28   Global Step: 144960   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:39:08,295-Speed 10821.03 samples/sec   Loss 6.1886   LearningRate 0.0080   Epoch: 28   Global Step: 144970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:09,301-Speed 10193.17 samples/sec   Loss 6.2083   LearningRate 0.0080   Epoch: 28   Global Step: 144980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:10,326-Speed 9995.40 samples/sec   Loss 6.2748   LearningRate 0.0080   Epoch: 28   Global Step: 144990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:11,319-Speed 10325.70 samples/sec   Loss 6.3492   LearningRate 0.0080   Epoch: 28   Global Step: 145000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:12,324-Speed 10197.52 samples/sec   Loss 6.4218   LearningRate 0.0080   Epoch: 28   Global Step: 145010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:13,296-Speed 10542.44 samples/sec   Loss 6.2678   LearningRate 0.0080   Epoch: 28   Global Step: 145020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:14,211-Speed 11203.04 samples/sec   Loss 6.2922   LearningRate 0.0080   Epoch: 28   Global Step: 145030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:15,172-Speed 10654.31 samples/sec   Loss 6.4375   LearningRate 0.0080   Epoch: 28   Global Step: 145040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:16,118-Speed 10842.17 samples/sec   Loss 6.2698   LearningRate 0.0080   Epoch: 28   Global Step: 145050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:17,067-Speed 10792.06 samples/sec   Loss 6.2645   LearningRate 0.0080   Epoch: 28   Global Step: 145060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:18,031-Speed 10640.53 samples/sec   Loss 6.4214   LearningRate 0.0080   Epoch: 28   Global Step: 145070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:19,010-Speed 10464.25 samples/sec   Loss 6.4239   LearningRate 0.0080   Epoch: 28   Global Step: 145080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:19,979-Speed 10577.52 samples/sec   Loss 6.1933   LearningRate 0.0080   Epoch: 28   Global Step: 145090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:39:20,930-Speed 10780.89 samples/sec   Loss 6.3407   LearningRate 0.0080   Epoch: 28   Global Step: 145100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:39:21,921-Speed 10339.12 samples/sec   Loss 6.3497   LearningRate 0.0080   Epoch: 28   Global Step: 145110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:39:22,926-Speed 10200.77 samples/sec   Loss 6.5918   LearningRate 0.0080   Epoch: 28   Global Step: 145120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:39:23,885-Speed 10683.67 samples/sec   Loss 6.2705   LearningRate 0.0080   Epoch: 28   Global Step: 145130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:39:24,831-Speed 10835.58 samples/sec   Loss 6.2223   LearningRate 0.0080   Epoch: 28   Global Step: 145140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:39:25,787-Speed 10724.38 samples/sec   Loss 6.2006   LearningRate 0.0080   Epoch: 28   Global Step: 145150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:39:26,797-Speed 10146.36 samples/sec   Loss 6.3419   LearningRate 0.0080   Epoch: 28   Global Step: 145160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:39:27,765-Speed 10588.12 samples/sec   Loss 6.1305   LearningRate 0.0080   Epoch: 28   Global Step: 145170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:39:28,736-Speed 10559.98 samples/sec   Loss 6.4566   LearningRate 0.0080   Epoch: 28   Global Step: 145180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:39:29,752-Speed 10080.61 samples/sec   Loss 6.4069   LearningRate 0.0080   Epoch: 28   Global Step: 145190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:30,791-Speed 9864.59 samples/sec   Loss 6.3937   LearningRate 0.0080   Epoch: 28   Global Step: 145200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:31,739-Speed 10811.89 samples/sec   Loss 6.3847   LearningRate 0.0080   Epoch: 28   Global Step: 145210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:32,725-Speed 10397.44 samples/sec   Loss 6.4714   LearningRate 0.0080   Epoch: 28   Global Step: 145220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:33,710-Speed 10400.48 samples/sec   Loss 6.3201   LearningRate 0.0080   Epoch: 28   Global Step: 145230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:34,689-Speed 10474.88 samples/sec   Loss 6.2679   LearningRate 0.0080   Epoch: 28   Global Step: 145240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:35,678-Speed 10360.96 samples/sec   Loss 6.2681   LearningRate 0.0080   Epoch: 28   Global Step: 145250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:36,659-Speed 10457.17 samples/sec   Loss 6.2097   LearningRate 0.0080   Epoch: 28   Global Step: 145260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:37,603-Speed 10856.25 samples/sec   Loss 6.2417   LearningRate 0.0080   Epoch: 28   Global Step: 145270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:38,559-Speed 10716.78 samples/sec   Loss 6.3106   LearningRate 0.0079   Epoch: 28   Global Step: 145280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:39,563-Speed 10207.29 samples/sec   Loss 6.2706   LearningRate 0.0079   Epoch: 28   Global Step: 145290   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:39:40,506-Speed 10862.74 samples/sec   Loss 6.3407   LearningRate 0.0079   Epoch: 28   Global Step: 145300   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:39:41,478-Speed 10545.95 samples/sec   Loss 6.1510   LearningRate 0.0079   Epoch: 28   Global Step: 145310   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:39:42,461-Speed 10428.81 samples/sec   Loss 6.2624   LearningRate 0.0079   Epoch: 28   Global Step: 145320   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:39:43,411-Speed 10784.81 samples/sec   Loss 6.2680   LearningRate 0.0079   Epoch: 28   Global Step: 145330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:44,361-Speed 10790.11 samples/sec   Loss 6.4032   LearningRate 0.0079   Epoch: 28   Global Step: 145340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:45,309-Speed 10810.11 samples/sec   Loss 6.2221   LearningRate 0.0079   Epoch: 28   Global Step: 145350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:46,287-Speed 10483.51 samples/sec   Loss 6.3710   LearningRate 0.0079   Epoch: 28   Global Step: 145360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:47,263-Speed 10492.63 samples/sec   Loss 6.2603   LearningRate 0.0079   Epoch: 28   Global Step: 145370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:48,250-Speed 10385.00 samples/sec   Loss 6.3428   LearningRate 0.0079   Epoch: 28   Global Step: 145380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:49,216-Speed 10615.31 samples/sec   Loss 6.3792   LearningRate 0.0079   Epoch: 28   Global Step: 145390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:50,240-Speed 10007.52 samples/sec   Loss 6.2573   LearningRate 0.0079   Epoch: 28   Global Step: 145400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:51,182-Speed 10872.72 samples/sec   Loss 6.2466   LearningRate 0.0079   Epoch: 28   Global Step: 145410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:52,144-Speed 10655.17 samples/sec   Loss 6.3925   LearningRate 0.0079   Epoch: 28   Global Step: 145420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:53,086-Speed 10881.73 samples/sec   Loss 6.4262   LearningRate 0.0079   Epoch: 28   Global Step: 145430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:54,047-Speed 10670.29 samples/sec   Loss 6.2801   LearningRate 0.0079   Epoch: 28   Global Step: 145440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:55,033-Speed 10397.38 samples/sec   Loss 6.3197   LearningRate 0.0079   Epoch: 28   Global Step: 145450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:55,965-Speed 10993.61 samples/sec   Loss 6.3292   LearningRate 0.0079   Epoch: 28   Global Step: 145460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:56,941-Speed 10503.62 samples/sec   Loss 6.3052   LearningRate 0.0079   Epoch: 28   Global Step: 145470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:57,902-Speed 10669.75 samples/sec   Loss 6.3174   LearningRate 0.0079   Epoch: 28   Global Step: 145480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:58,851-Speed 10791.63 samples/sec   Loss 6.4986   LearningRate 0.0079   Epoch: 28   Global Step: 145490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:39:59,842-Speed 10347.70 samples/sec   Loss 6.2127   LearningRate 0.0079   Epoch: 28   Global Step: 145500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:00,812-Speed 10569.16 samples/sec   Loss 6.2829   LearningRate 0.0079   Epoch: 28   Global Step: 145510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:01,790-Speed 10473.75 samples/sec   Loss 6.4228   LearningRate 0.0079   Epoch: 28   Global Step: 145520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:02,705-Speed 11207.66 samples/sec   Loss 6.3081   LearningRate 0.0079   Epoch: 28   Global Step: 145530   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:40:03,646-Speed 10887.12 samples/sec   Loss 6.2097   LearningRate 0.0079   Epoch: 28   Global Step: 145540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:04,583-Speed 10942.56 samples/sec   Loss 6.2666   LearningRate 0.0079   Epoch: 28   Global Step: 145550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:05,549-Speed 10616.89 samples/sec   Loss 6.3633   LearningRate 0.0079   Epoch: 28   Global Step: 145560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:06,508-Speed 10681.23 samples/sec   Loss 6.4386   LearningRate 0.0079   Epoch: 28   Global Step: 145570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:07,477-Speed 10581.01 samples/sec   Loss 6.3804   LearningRate 0.0079   Epoch: 28   Global Step: 145580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:08,433-Speed 10726.40 samples/sec   Loss 6.1671   LearningRate 0.0079   Epoch: 28   Global Step: 145590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:09,401-Speed 10582.24 samples/sec   Loss 6.2585   LearningRate 0.0079   Epoch: 28   Global Step: 145600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:10,362-Speed 10665.73 samples/sec   Loss 6.3046   LearningRate 0.0079   Epoch: 28   Global Step: 145610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:11,345-Speed 10425.61 samples/sec   Loss 6.1831   LearningRate 0.0079   Epoch: 28   Global Step: 145620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:12,340-Speed 10299.36 samples/sec   Loss 6.2344   LearningRate 0.0079   Epoch: 28   Global Step: 145630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:13,321-Speed 10451.84 samples/sec   Loss 6.4008   LearningRate 0.0078   Epoch: 28   Global Step: 145640   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:40:14,320-Speed 10263.19 samples/sec   Loss 6.2799   LearningRate 0.0078   Epoch: 28   Global Step: 145650   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:40:15,302-Speed 10428.14 samples/sec   Loss 6.3329   LearningRate 0.0078   Epoch: 28   Global Step: 145660   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:40:16,295-Speed 10328.21 samples/sec   Loss 6.2808   LearningRate 0.0078   Epoch: 28   Global Step: 145670   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:40:17,266-Speed 10550.76 samples/sec   Loss 6.3517   LearningRate 0.0078   Epoch: 28   Global Step: 145680   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:40:18,246-Speed 10454.92 samples/sec   Loss 6.2223   LearningRate 0.0078   Epoch: 28   Global Step: 145690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:19,214-Speed 10588.02 samples/sec   Loss 6.4585   LearningRate 0.0078   Epoch: 28   Global Step: 145700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:20,203-Speed 10361.21 samples/sec   Loss 6.3534   LearningRate 0.0078   Epoch: 28   Global Step: 145710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:21,178-Speed 10515.57 samples/sec   Loss 6.4415   LearningRate 0.0078   Epoch: 28   Global Step: 145720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:22,168-Speed 10358.75 samples/sec   Loss 6.3014   LearningRate 0.0078   Epoch: 28   Global Step: 145730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:23,126-Speed 10686.96 samples/sec   Loss 6.3386   LearningRate 0.0078   Epoch: 28   Global Step: 145740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:24,088-Speed 10655.63 samples/sec   Loss 6.5283   LearningRate 0.0078   Epoch: 28   Global Step: 145750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:25,061-Speed 10534.86 samples/sec   Loss 6.4141   LearningRate 0.0078   Epoch: 28   Global Step: 145760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:26,054-Speed 10317.26 samples/sec   Loss 6.3984   LearningRate 0.0078   Epoch: 28   Global Step: 145770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:26,967-Speed 11221.81 samples/sec   Loss 6.4299   LearningRate 0.0078   Epoch: 28   Global Step: 145780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:28,009-Speed 9840.38 samples/sec   Loss 6.4746   LearningRate 0.0078   Epoch: 28   Global Step: 145790   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:40:28,978-Speed 10578.74 samples/sec   Loss 6.4632   LearningRate 0.0078   Epoch: 28   Global Step: 145800   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:40:29,946-Speed 10584.08 samples/sec   Loss 6.1207   LearningRate 0.0078   Epoch: 28   Global Step: 145810   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:40:30,875-Speed 11039.64 samples/sec   Loss 6.3741   LearningRate 0.0078   Epoch: 28   Global Step: 145820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:31,855-Speed 10453.33 samples/sec   Loss 6.2663   LearningRate 0.0078   Epoch: 28   Global Step: 145830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:32,825-Speed 10567.45 samples/sec   Loss 6.2699   LearningRate 0.0078   Epoch: 28   Global Step: 145840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:33,788-Speed 10645.06 samples/sec   Loss 6.2912   LearningRate 0.0078   Epoch: 28   Global Step: 145850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:34,787-Speed 10258.31 samples/sec   Loss 6.1927   LearningRate 0.0078   Epoch: 28   Global Step: 145860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:35,785-Speed 10263.31 samples/sec   Loss 6.3818   LearningRate 0.0078   Epoch: 28   Global Step: 145870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:36,733-Speed 10815.73 samples/sec   Loss 6.3762   LearningRate 0.0078   Epoch: 28   Global Step: 145880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:37,682-Speed 10804.06 samples/sec   Loss 6.5048   LearningRate 0.0078   Epoch: 28   Global Step: 145890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:38,632-Speed 10811.44 samples/sec   Loss 6.4559   LearningRate 0.0078   Epoch: 28   Global Step: 145900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:39,631-Speed 10260.47 samples/sec   Loss 6.4185   LearningRate 0.0078   Epoch: 28   Global Step: 145910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:40,653-Speed 10035.32 samples/sec   Loss 6.3571   LearningRate 0.0078   Epoch: 28   Global Step: 145920   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:40:41,620-Speed 10592.37 samples/sec   Loss 6.4471   LearningRate 0.0078   Epoch: 28   Global Step: 145930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:42,601-Speed 10455.62 samples/sec   Loss 6.3967   LearningRate 0.0078   Epoch: 28   Global Step: 145940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:43,553-Speed 10758.35 samples/sec   Loss 6.3006   LearningRate 0.0078   Epoch: 28   Global Step: 145950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:44,586-Speed 9925.90 samples/sec   Loss 6.5201   LearningRate 0.0078   Epoch: 28   Global Step: 145960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:45,519-Speed 10988.10 samples/sec   Loss 6.3277   LearningRate 0.0078   Epoch: 28   Global Step: 145970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:46,521-Speed 10234.35 samples/sec   Loss 6.5330   LearningRate 0.0078   Epoch: 28   Global Step: 145980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:47,514-Speed 10321.26 samples/sec   Loss 6.3966   LearningRate 0.0078   Epoch: 28   Global Step: 145990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:40:48,489-Speed 10514.98 samples/sec   Loss 6.4904   LearningRate 0.0077   Epoch: 28   Global Step: 146000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:41:10,606-[lfw][146000]XNorm: 8.869128
Training: 2022-04-11 04:41:10,606-[lfw][146000]Accuracy-Flip: 0.99600+-0.00309
Training: 2022-04-11 04:41:10,607-[lfw][146000]Accuracy-Highest: 0.99700
Training: 2022-04-11 04:41:36,139-[cfp_fp][146000]XNorm: 7.598924
Training: 2022-04-11 04:41:36,139-[cfp_fp][146000]Accuracy-Flip: 0.96657+-0.00860
Training: 2022-04-11 04:41:36,140-[cfp_fp][146000]Accuracy-Highest: 0.96914
Training: 2022-04-11 04:41:58,174-[agedb_30][146000]XNorm: 8.638296
Training: 2022-04-11 04:41:58,174-[agedb_30][146000]Accuracy-Flip: 0.96917+-0.00602
Training: 2022-04-11 04:41:58,175-[agedb_30][146000]Accuracy-Highest: 0.97250
Training: 2022-04-11 04:41:59,173-Speed 144.87 samples/sec   Loss 6.5055   LearningRate 0.0077   Epoch: 28   Global Step: 146010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:00,160-Speed 10383.37 samples/sec   Loss 6.5252   LearningRate 0.0077   Epoch: 28   Global Step: 146020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:01,165-Speed 10203.57 samples/sec   Loss 6.2838   LearningRate 0.0077   Epoch: 28   Global Step: 146030   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:42:02,131-Speed 10606.10 samples/sec   Loss 6.4704   LearningRate 0.0077   Epoch: 28   Global Step: 146040   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:42:03,091-Speed 10688.94 samples/sec   Loss 6.3728   LearningRate 0.0077   Epoch: 28   Global Step: 146050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:04,030-Speed 10915.15 samples/sec   Loss 6.2611   LearningRate 0.0077   Epoch: 28   Global Step: 146060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:42:04,969-Speed 10918.28 samples/sec   Loss 6.3893   LearningRate 0.0077   Epoch: 28   Global Step: 146070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:42:05,927-Speed 10703.11 samples/sec   Loss 6.3294   LearningRate 0.0077   Epoch: 28   Global Step: 146080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:42:06,844-Speed 11169.90 samples/sec   Loss 6.3340   LearningRate 0.0077   Epoch: 28   Global Step: 146090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:42:07,841-Speed 10277.81 samples/sec   Loss 6.2829   LearningRate 0.0077   Epoch: 28   Global Step: 146100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:42:08,845-Speed 10213.39 samples/sec   Loss 6.3788   LearningRate 0.0077   Epoch: 28   Global Step: 146110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:42:09,833-Speed 10379.74 samples/sec   Loss 6.3275   LearningRate 0.0077   Epoch: 28   Global Step: 146120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:42:10,800-Speed 10605.89 samples/sec   Loss 6.4355   LearningRate 0.0077   Epoch: 28   Global Step: 146130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:42:11,750-Speed 10782.29 samples/sec   Loss 6.1618   LearningRate 0.0077   Epoch: 28   Global Step: 146140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:42:12,728-Speed 10479.35 samples/sec   Loss 6.5705   LearningRate 0.0077   Epoch: 28   Global Step: 146150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:42:13,714-Speed 10399.29 samples/sec   Loss 6.3573   LearningRate 0.0077   Epoch: 28   Global Step: 146160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:14,658-Speed 10849.48 samples/sec   Loss 6.3855   LearningRate 0.0077   Epoch: 28   Global Step: 146170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:15,614-Speed 10718.73 samples/sec   Loss 6.4712   LearningRate 0.0077   Epoch: 28   Global Step: 146180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:16,569-Speed 10736.41 samples/sec   Loss 6.5273   LearningRate 0.0077   Epoch: 28   Global Step: 146190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:17,578-Speed 10151.21 samples/sec   Loss 6.4571   LearningRate 0.0077   Epoch: 28   Global Step: 146200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:18,557-Speed 10479.88 samples/sec   Loss 6.3856   LearningRate 0.0077   Epoch: 28   Global Step: 146210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:19,533-Speed 10501.84 samples/sec   Loss 6.4329   LearningRate 0.0077   Epoch: 28   Global Step: 146220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:20,523-Speed 10345.30 samples/sec   Loss 6.2792   LearningRate 0.0077   Epoch: 28   Global Step: 146230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:21,468-Speed 10845.44 samples/sec   Loss 6.3522   LearningRate 0.0077   Epoch: 28   Global Step: 146240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:22,422-Speed 10740.16 samples/sec   Loss 6.2030   LearningRate 0.0077   Epoch: 28   Global Step: 146250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:23,355-Speed 10993.28 samples/sec   Loss 6.3634   LearningRate 0.0077   Epoch: 28   Global Step: 146260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:24,295-Speed 10909.20 samples/sec   Loss 6.2066   LearningRate 0.0077   Epoch: 28   Global Step: 146270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:25,286-Speed 10334.01 samples/sec   Loss 6.3254   LearningRate 0.0077   Epoch: 28   Global Step: 146280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:26,239-Speed 10760.77 samples/sec   Loss 6.4408   LearningRate 0.0077   Epoch: 28   Global Step: 146290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:27,224-Speed 10401.90 samples/sec   Loss 6.3763   LearningRate 0.0077   Epoch: 28   Global Step: 146300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:28,172-Speed 10813.62 samples/sec   Loss 6.3485   LearningRate 0.0077   Epoch: 28   Global Step: 146310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:42:29,126-Speed 10753.61 samples/sec   Loss 6.3787   LearningRate 0.0077   Epoch: 28   Global Step: 146320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:42:30,080-Speed 10748.40 samples/sec   Loss 6.3673   LearningRate 0.0077   Epoch: 28   Global Step: 146330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:42:31,074-Speed 10305.66 samples/sec   Loss 6.3871   LearningRate 0.0077   Epoch: 28   Global Step: 146340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:42:32,094-Speed 10049.02 samples/sec   Loss 6.3000   LearningRate 0.0077   Epoch: 28   Global Step: 146350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:42:33,040-Speed 10841.21 samples/sec   Loss 6.2259   LearningRate 0.0077   Epoch: 28   Global Step: 146360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:42:34,020-Speed 10451.74 samples/sec   Loss 6.4526   LearningRate 0.0076   Epoch: 28   Global Step: 146370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:42:34,973-Speed 10749.80 samples/sec   Loss 6.3903   LearningRate 0.0076   Epoch: 28   Global Step: 146380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:42:35,960-Speed 10385.77 samples/sec   Loss 6.3306   LearningRate 0.0076   Epoch: 28   Global Step: 146390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:42:36,917-Speed 10718.51 samples/sec   Loss 6.3745   LearningRate 0.0076   Epoch: 28   Global Step: 146400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:42:37,874-Speed 10699.20 samples/sec   Loss 6.3513   LearningRate 0.0076   Epoch: 28   Global Step: 146410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:38,857-Speed 10436.11 samples/sec   Loss 6.3930   LearningRate 0.0076   Epoch: 28   Global Step: 146420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:39,807-Speed 10783.88 samples/sec   Loss 6.3659   LearningRate 0.0076   Epoch: 28   Global Step: 146430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:40,804-Speed 10277.96 samples/sec   Loss 6.4725   LearningRate 0.0076   Epoch: 28   Global Step: 146440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:41,807-Speed 10222.42 samples/sec   Loss 6.4964   LearningRate 0.0076   Epoch: 28   Global Step: 146450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:42,784-Speed 10494.61 samples/sec   Loss 6.4352   LearningRate 0.0076   Epoch: 28   Global Step: 146460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:43,747-Speed 10636.74 samples/sec   Loss 6.4814   LearningRate 0.0076   Epoch: 28   Global Step: 146470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:44,708-Speed 10668.21 samples/sec   Loss 6.3931   LearningRate 0.0076   Epoch: 28   Global Step: 146480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:45,667-Speed 10682.28 samples/sec   Loss 6.3945   LearningRate 0.0076   Epoch: 28   Global Step: 146490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:46,643-Speed 10500.84 samples/sec   Loss 6.2010   LearningRate 0.0076   Epoch: 28   Global Step: 146500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:47,641-Speed 10267.01 samples/sec   Loss 6.4708   LearningRate 0.0076   Epoch: 28   Global Step: 146510   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:42:48,602-Speed 10670.76 samples/sec   Loss 6.3136   LearningRate 0.0076   Epoch: 28   Global Step: 146520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:49,629-Speed 9979.70 samples/sec   Loss 6.2483   LearningRate 0.0076   Epoch: 28   Global Step: 146530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:50,627-Speed 10270.34 samples/sec   Loss 6.4719   LearningRate 0.0076   Epoch: 28   Global Step: 146540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:51,573-Speed 10838.42 samples/sec   Loss 6.3411   LearningRate 0.0076   Epoch: 28   Global Step: 146550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:52,552-Speed 10471.88 samples/sec   Loss 6.3930   LearningRate 0.0076   Epoch: 28   Global Step: 146560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:53,567-Speed 10101.84 samples/sec   Loss 6.3066   LearningRate 0.0076   Epoch: 28   Global Step: 146570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:54,481-Speed 11211.99 samples/sec   Loss 6.2876   LearningRate 0.0076   Epoch: 28   Global Step: 146580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:55,439-Speed 10700.09 samples/sec   Loss 6.4037   LearningRate 0.0076   Epoch: 28   Global Step: 146590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:56,412-Speed 10538.45 samples/sec   Loss 6.2949   LearningRate 0.0076   Epoch: 28   Global Step: 146600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:57,355-Speed 10859.84 samples/sec   Loss 6.3589   LearningRate 0.0076   Epoch: 28   Global Step: 146610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:58,283-Speed 11045.45 samples/sec   Loss 6.3894   LearningRate 0.0076   Epoch: 28   Global Step: 146620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:42:59,284-Speed 10240.73 samples/sec   Loss 6.4420   LearningRate 0.0076   Epoch: 28   Global Step: 146630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:00,225-Speed 10891.87 samples/sec   Loss 6.3213   LearningRate 0.0076   Epoch: 28   Global Step: 146640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:01,243-Speed 10070.74 samples/sec   Loss 6.4521   LearningRate 0.0076   Epoch: 28   Global Step: 146650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:43:02,199-Speed 10723.14 samples/sec   Loss 6.4009   LearningRate 0.0076   Epoch: 28   Global Step: 146660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:43:03,175-Speed 10496.64 samples/sec   Loss 6.4614   LearningRate 0.0076   Epoch: 28   Global Step: 146670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:43:04,193-Speed 10072.41 samples/sec   Loss 6.2251   LearningRate 0.0076   Epoch: 28   Global Step: 146680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:43:15,314-Speed 920.85 samples/sec   Loss 5.7598   LearningRate 0.0076   Epoch: 29   Global Step: 146690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:43:16,412-Speed 9341.19 samples/sec   Loss 5.7382   LearningRate 0.0076   Epoch: 29   Global Step: 146700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:43:17,384-Speed 10551.50 samples/sec   Loss 5.6818   LearningRate 0.0076   Epoch: 29   Global Step: 146710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:43:18,336-Speed 10761.34 samples/sec   Loss 5.6595   LearningRate 0.0076   Epoch: 29   Global Step: 146720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:43:19,377-Speed 9847.82 samples/sec   Loss 5.8635   LearningRate 0.0075   Epoch: 29   Global Step: 146730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:43:20,499-Speed 9129.03 samples/sec   Loss 5.7078   LearningRate 0.0075   Epoch: 29   Global Step: 146740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:43:21,554-Speed 9717.84 samples/sec   Loss 5.6800   LearningRate 0.0075   Epoch: 29   Global Step: 146750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:22,612-Speed 9687.60 samples/sec   Loss 5.8153   LearningRate 0.0075   Epoch: 29   Global Step: 146760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:23,565-Speed 10750.75 samples/sec   Loss 5.8240   LearningRate 0.0075   Epoch: 29   Global Step: 146770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:24,519-Speed 10750.48 samples/sec   Loss 5.5992   LearningRate 0.0075   Epoch: 29   Global Step: 146780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:25,480-Speed 10655.29 samples/sec   Loss 5.6646   LearningRate 0.0075   Epoch: 29   Global Step: 146790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:26,866-Speed 7394.94 samples/sec   Loss 5.7434   LearningRate 0.0075   Epoch: 29   Global Step: 146800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:28,140-Speed 8039.87 samples/sec   Loss 5.6385   LearningRate 0.0075   Epoch: 29   Global Step: 146810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:30,402-Speed 4528.85 samples/sec   Loss 5.7793   LearningRate 0.0075   Epoch: 29   Global Step: 146820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:31,386-Speed 10413.12 samples/sec   Loss 5.6699   LearningRate 0.0075   Epoch: 29   Global Step: 146830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:32,433-Speed 9792.62 samples/sec   Loss 5.7509   LearningRate 0.0075   Epoch: 29   Global Step: 146840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:33,389-Speed 10722.55 samples/sec   Loss 5.6782   LearningRate 0.0075   Epoch: 29   Global Step: 146850   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:43:34,376-Speed 10381.56 samples/sec   Loss 5.6770   LearningRate 0.0075   Epoch: 29   Global Step: 146860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:35,359-Speed 10430.31 samples/sec   Loss 5.7631   LearningRate 0.0075   Epoch: 29   Global Step: 146870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:36,305-Speed 10823.72 samples/sec   Loss 5.6344   LearningRate 0.0075   Epoch: 29   Global Step: 146880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:37,405-Speed 9325.31 samples/sec   Loss 5.8224   LearningRate 0.0075   Epoch: 29   Global Step: 146890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:38,360-Speed 10731.76 samples/sec   Loss 5.7509   LearningRate 0.0075   Epoch: 29   Global Step: 146900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:39,358-Speed 10263.43 samples/sec   Loss 5.6708   LearningRate 0.0075   Epoch: 29   Global Step: 146910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:40,383-Speed 10003.01 samples/sec   Loss 5.7205   LearningRate 0.0075   Epoch: 29   Global Step: 146920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:41,343-Speed 10674.82 samples/sec   Loss 5.7723   LearningRate 0.0075   Epoch: 29   Global Step: 146930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:42,318-Speed 10508.16 samples/sec   Loss 5.8059   LearningRate 0.0075   Epoch: 29   Global Step: 146940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:43,297-Speed 10470.10 samples/sec   Loss 5.8490   LearningRate 0.0075   Epoch: 29   Global Step: 146950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:44,291-Speed 10319.90 samples/sec   Loss 5.7709   LearningRate 0.0075   Epoch: 29   Global Step: 146960   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:43:45,251-Speed 10667.17 samples/sec   Loss 5.7585   LearningRate 0.0075   Epoch: 29   Global Step: 146970   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:43:46,203-Speed 10769.97 samples/sec   Loss 5.7110   LearningRate 0.0075   Epoch: 29   Global Step: 146980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:47,219-Speed 10086.63 samples/sec   Loss 5.8297   LearningRate 0.0075   Epoch: 29   Global Step: 146990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:48,192-Speed 10538.54 samples/sec   Loss 5.8626   LearningRate 0.0075   Epoch: 29   Global Step: 147000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:49,179-Speed 10383.40 samples/sec   Loss 5.8511   LearningRate 0.0075   Epoch: 29   Global Step: 147010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:43:50,121-Speed 10872.95 samples/sec   Loss 5.7931   LearningRate 0.0075   Epoch: 29   Global Step: 147020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:43:51,091-Speed 10566.93 samples/sec   Loss 5.8039   LearningRate 0.0075   Epoch: 29   Global Step: 147030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:43:52,068-Speed 10497.08 samples/sec   Loss 5.7671   LearningRate 0.0075   Epoch: 29   Global Step: 147040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:43:53,031-Speed 10640.66 samples/sec   Loss 5.7003   LearningRate 0.0075   Epoch: 29   Global Step: 147050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:43:54,003-Speed 10546.11 samples/sec   Loss 5.6461   LearningRate 0.0075   Epoch: 29   Global Step: 147060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:43:55,038-Speed 9899.16 samples/sec   Loss 5.5937   LearningRate 0.0075   Epoch: 29   Global Step: 147070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:43:56,008-Speed 10562.29 samples/sec   Loss 5.6321   LearningRate 0.0075   Epoch: 29   Global Step: 147080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:43:56,994-Speed 10413.65 samples/sec   Loss 5.8595   LearningRate 0.0075   Epoch: 29   Global Step: 147090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:43:58,011-Speed 10078.69 samples/sec   Loss 5.7222   LearningRate 0.0074   Epoch: 29   Global Step: 147100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:43:58,927-Speed 11182.38 samples/sec   Loss 5.7618   LearningRate 0.0074   Epoch: 29   Global Step: 147110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:43:59,910-Speed 10434.51 samples/sec   Loss 5.8927   LearningRate 0.0074   Epoch: 29   Global Step: 147120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:00,903-Speed 10318.24 samples/sec   Loss 5.9357   LearningRate 0.0074   Epoch: 29   Global Step: 147130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:01,862-Speed 10685.99 samples/sec   Loss 5.8075   LearningRate 0.0074   Epoch: 29   Global Step: 147140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:02,799-Speed 10940.92 samples/sec   Loss 5.5834   LearningRate 0.0074   Epoch: 29   Global Step: 147150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:03,749-Speed 10796.32 samples/sec   Loss 5.7654   LearningRate 0.0074   Epoch: 29   Global Step: 147160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:04,762-Speed 10116.80 samples/sec   Loss 5.7349   LearningRate 0.0074   Epoch: 29   Global Step: 147170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:05,746-Speed 10412.50 samples/sec   Loss 5.8433   LearningRate 0.0074   Epoch: 29   Global Step: 147180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:06,668-Speed 11112.49 samples/sec   Loss 5.9132   LearningRate 0.0074   Epoch: 29   Global Step: 147190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:07,642-Speed 10532.08 samples/sec   Loss 5.8322   LearningRate 0.0074   Epoch: 29   Global Step: 147200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:08,598-Speed 10721.77 samples/sec   Loss 5.8918   LearningRate 0.0074   Epoch: 29   Global Step: 147210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:09,584-Speed 10391.97 samples/sec   Loss 5.8263   LearningRate 0.0074   Epoch: 29   Global Step: 147220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:10,562-Speed 10485.52 samples/sec   Loss 5.7283   LearningRate 0.0074   Epoch: 29   Global Step: 147230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:11,534-Speed 10538.79 samples/sec   Loss 5.9019   LearningRate 0.0074   Epoch: 29   Global Step: 147240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:12,533-Speed 10256.58 samples/sec   Loss 5.8468   LearningRate 0.0074   Epoch: 29   Global Step: 147250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:13,505-Speed 10546.82 samples/sec   Loss 5.7107   LearningRate 0.0074   Epoch: 29   Global Step: 147260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:14,446-Speed 10895.16 samples/sec   Loss 5.8785   LearningRate 0.0074   Epoch: 29   Global Step: 147270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:15,403-Speed 10708.45 samples/sec   Loss 5.9057   LearningRate 0.0074   Epoch: 29   Global Step: 147280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:16,379-Speed 10504.61 samples/sec   Loss 6.0827   LearningRate 0.0074   Epoch: 29   Global Step: 147290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:17,383-Speed 10200.51 samples/sec   Loss 5.7999   LearningRate 0.0074   Epoch: 29   Global Step: 147300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:18,326-Speed 10881.91 samples/sec   Loss 5.8471   LearningRate 0.0074   Epoch: 29   Global Step: 147310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:19,361-Speed 9904.29 samples/sec   Loss 5.7795   LearningRate 0.0074   Epoch: 29   Global Step: 147320   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:44:20,310-Speed 10802.92 samples/sec   Loss 5.9393   LearningRate 0.0074   Epoch: 29   Global Step: 147330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:21,238-Speed 11048.11 samples/sec   Loss 5.8506   LearningRate 0.0074   Epoch: 29   Global Step: 147340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:22,226-Speed 10375.24 samples/sec   Loss 5.9552   LearningRate 0.0074   Epoch: 29   Global Step: 147350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:23,255-Speed 9965.52 samples/sec   Loss 5.8264   LearningRate 0.0074   Epoch: 29   Global Step: 147360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:24,201-Speed 10830.28 samples/sec   Loss 5.9784   LearningRate 0.0074   Epoch: 29   Global Step: 147370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:44:25,185-Speed 10416.93 samples/sec   Loss 5.7665   LearningRate 0.0074   Epoch: 29   Global Step: 147380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:44:26,176-Speed 10344.06 samples/sec   Loss 5.9201   LearningRate 0.0074   Epoch: 29   Global Step: 147390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:44:27,128-Speed 10762.61 samples/sec   Loss 5.8878   LearningRate 0.0074   Epoch: 29   Global Step: 147400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:44:28,110-Speed 10444.65 samples/sec   Loss 6.0206   LearningRate 0.0074   Epoch: 29   Global Step: 147410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:44:29,078-Speed 10589.44 samples/sec   Loss 5.8735   LearningRate 0.0074   Epoch: 29   Global Step: 147420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:44:30,029-Speed 10785.67 samples/sec   Loss 5.9535   LearningRate 0.0074   Epoch: 29   Global Step: 147430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:44:30,973-Speed 10852.89 samples/sec   Loss 5.8474   LearningRate 0.0074   Epoch: 29   Global Step: 147440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:44:32,041-Speed 9591.78 samples/sec   Loss 5.7892   LearningRate 0.0074   Epoch: 29   Global Step: 147450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:44:33,010-Speed 10594.86 samples/sec   Loss 5.8951   LearningRate 0.0074   Epoch: 29   Global Step: 147460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:44:34,056-Speed 9795.98 samples/sec   Loss 5.8689   LearningRate 0.0073   Epoch: 29   Global Step: 147470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:35,092-Speed 9887.11 samples/sec   Loss 5.9603   LearningRate 0.0073   Epoch: 29   Global Step: 147480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:36,035-Speed 10866.56 samples/sec   Loss 5.9153   LearningRate 0.0073   Epoch: 29   Global Step: 147490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:37,003-Speed 10594.26 samples/sec   Loss 5.9643   LearningRate 0.0073   Epoch: 29   Global Step: 147500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:37,980-Speed 10497.21 samples/sec   Loss 5.8836   LearningRate 0.0073   Epoch: 29   Global Step: 147510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:38,961-Speed 10439.76 samples/sec   Loss 6.0277   LearningRate 0.0073   Epoch: 29   Global Step: 147520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:39,898-Speed 10948.34 samples/sec   Loss 6.0400   LearningRate 0.0073   Epoch: 29   Global Step: 147530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:40,892-Speed 10313.31 samples/sec   Loss 5.8443   LearningRate 0.0073   Epoch: 29   Global Step: 147540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:41,854-Speed 10646.48 samples/sec   Loss 5.7507   LearningRate 0.0073   Epoch: 29   Global Step: 147550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:42,870-Speed 10091.44 samples/sec   Loss 5.9069   LearningRate 0.0073   Epoch: 29   Global Step: 147560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:43,838-Speed 10633.24 samples/sec   Loss 6.0225   LearningRate 0.0073   Epoch: 29   Global Step: 147570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:45,113-Speed 8038.42 samples/sec   Loss 5.9540   LearningRate 0.0073   Epoch: 29   Global Step: 147580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:46,382-Speed 8070.16 samples/sec   Loss 5.9624   LearningRate 0.0073   Epoch: 29   Global Step: 147590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:47,320-Speed 10936.33 samples/sec   Loss 5.9166   LearningRate 0.0073   Epoch: 29   Global Step: 147600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:48,292-Speed 10546.79 samples/sec   Loss 5.9492   LearningRate 0.0073   Epoch: 29   Global Step: 147610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:49,278-Speed 10386.10 samples/sec   Loss 5.9983   LearningRate 0.0073   Epoch: 29   Global Step: 147620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:50,291-Speed 10122.47 samples/sec   Loss 6.0608   LearningRate 0.0073   Epoch: 29   Global Step: 147630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:51,250-Speed 10690.91 samples/sec   Loss 5.8428   LearningRate 0.0073   Epoch: 29   Global Step: 147640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:52,240-Speed 10354.20 samples/sec   Loss 5.8642   LearningRate 0.0073   Epoch: 29   Global Step: 147650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:53,211-Speed 10544.45 samples/sec   Loss 5.9042   LearningRate 0.0073   Epoch: 29   Global Step: 147660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:54,261-Speed 9762.00 samples/sec   Loss 5.8689   LearningRate 0.0073   Epoch: 29   Global Step: 147670   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:44:55,204-Speed 10873.42 samples/sec   Loss 5.9174   LearningRate 0.0073   Epoch: 29   Global Step: 147680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:56,147-Speed 10879.40 samples/sec   Loss 5.9766   LearningRate 0.0073   Epoch: 29   Global Step: 147690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:57,098-Speed 10765.12 samples/sec   Loss 5.9322   LearningRate 0.0073   Epoch: 29   Global Step: 147700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:58,097-Speed 10259.82 samples/sec   Loss 5.7339   LearningRate 0.0073   Epoch: 29   Global Step: 147710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:59,081-Speed 10421.43 samples/sec   Loss 5.7443   LearningRate 0.0073   Epoch: 29   Global Step: 147720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:44:59,994-Speed 11217.55 samples/sec   Loss 5.8049   LearningRate 0.0073   Epoch: 29   Global Step: 147730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:45:00,954-Speed 10674.73 samples/sec   Loss 5.9925   LearningRate 0.0073   Epoch: 29   Global Step: 147740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:45:01,951-Speed 10289.40 samples/sec   Loss 6.0119   LearningRate 0.0073   Epoch: 29   Global Step: 147750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:45:02,897-Speed 10842.08 samples/sec   Loss 5.9059   LearningRate 0.0073   Epoch: 29   Global Step: 147760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:45:03,827-Speed 11023.55 samples/sec   Loss 5.9577   LearningRate 0.0073   Epoch: 29   Global Step: 147770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:45:04,837-Speed 10147.42 samples/sec   Loss 5.7430   LearningRate 0.0073   Epoch: 29   Global Step: 147780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:45:05,832-Speed 10299.20 samples/sec   Loss 5.8623   LearningRate 0.0073   Epoch: 29   Global Step: 147790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:45:06,789-Speed 10722.97 samples/sec   Loss 5.9406   LearningRate 0.0073   Epoch: 29   Global Step: 147800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:45:07,775-Speed 10385.01 samples/sec   Loss 5.9859   LearningRate 0.0073   Epoch: 29   Global Step: 147810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:45:08,790-Speed 10104.49 samples/sec   Loss 6.0278   LearningRate 0.0073   Epoch: 29   Global Step: 147820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:45:09,761-Speed 10555.53 samples/sec   Loss 5.9807   LearningRate 0.0073   Epoch: 29   Global Step: 147830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:45:10,749-Speed 10381.97 samples/sec   Loss 6.0073   LearningRate 0.0073   Epoch: 29   Global Step: 147840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:45:11,750-Speed 10234.21 samples/sec   Loss 5.9854   LearningRate 0.0072   Epoch: 29   Global Step: 147850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:45:12,679-Speed 11038.46 samples/sec   Loss 6.0512   LearningRate 0.0072   Epoch: 29   Global Step: 147860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:45:13,612-Speed 10981.34 samples/sec   Loss 5.9284   LearningRate 0.0072   Epoch: 29   Global Step: 147870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:45:14,563-Speed 10774.48 samples/sec   Loss 5.7848   LearningRate 0.0072   Epoch: 29   Global Step: 147880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:45:15,647-Speed 9452.75 samples/sec   Loss 6.0152   LearningRate 0.0072   Epoch: 29   Global Step: 147890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:45:16,642-Speed 10303.80 samples/sec   Loss 6.0137   LearningRate 0.0072   Epoch: 29   Global Step: 147900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:45:17,653-Speed 10138.79 samples/sec   Loss 5.9276   LearningRate 0.0072   Epoch: 29   Global Step: 147910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:45:18,608-Speed 10732.23 samples/sec   Loss 5.9399   LearningRate 0.0072   Epoch: 29   Global Step: 147920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:45:19,571-Speed 10645.13 samples/sec   Loss 6.0777   LearningRate 0.0072   Epoch: 29   Global Step: 147930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:45:20,606-Speed 10126.33 samples/sec   Loss 5.9505   LearningRate 0.0072   Epoch: 29   Global Step: 147940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:45:21,551-Speed 10843.53 samples/sec   Loss 6.0590   LearningRate 0.0072   Epoch: 29   Global Step: 147950   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:45:22,504-Speed 10753.59 samples/sec   Loss 5.8779   LearningRate 0.0072   Epoch: 29   Global Step: 147960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:45:23,462-Speed 10698.07 samples/sec   Loss 5.9775   LearningRate 0.0072   Epoch: 29   Global Step: 147970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:45:24,428-Speed 10609.33 samples/sec   Loss 5.9019   LearningRate 0.0072   Epoch: 29   Global Step: 147980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:45:25,433-Speed 10206.33 samples/sec   Loss 5.9232   LearningRate 0.0072   Epoch: 29   Global Step: 147990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:45:26,439-Speed 10184.39 samples/sec   Loss 6.0851   LearningRate 0.0072   Epoch: 29   Global Step: 148000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:45:48,930-[lfw][148000]XNorm: 8.750307
Training: 2022-04-11 04:45:48,931-[lfw][148000]Accuracy-Flip: 0.99617+-0.00279
Training: 2022-04-11 04:45:48,932-[lfw][148000]Accuracy-Highest: 0.99700
Training: 2022-04-11 04:46:14,731-[cfp_fp][148000]XNorm: 7.484924
Training: 2022-04-11 04:46:14,732-[cfp_fp][148000]Accuracy-Flip: 0.96657+-0.00988
Training: 2022-04-11 04:46:14,733-[cfp_fp][148000]Accuracy-Highest: 0.96914
Training: 2022-04-11 04:46:37,500-[agedb_30][148000]XNorm: 8.534926
Training: 2022-04-11 04:46:37,501-[agedb_30][148000]Accuracy-Flip: 0.97033+-0.00645
Training: 2022-04-11 04:46:37,502-[agedb_30][148000]Accuracy-Highest: 0.97250
Training: 2022-04-11 04:46:38,476-Speed 142.15 samples/sec   Loss 6.0022   LearningRate 0.0072   Epoch: 29   Global Step: 148010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:46:39,408-Speed 10999.81 samples/sec   Loss 5.9537   LearningRate 0.0072   Epoch: 29   Global Step: 148020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:46:40,427-Speed 10051.14 samples/sec   Loss 6.1044   LearningRate 0.0072   Epoch: 29   Global Step: 148030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:46:41,452-Speed 10005.57 samples/sec   Loss 5.9085   LearningRate 0.0072   Epoch: 29   Global Step: 148040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:46:42,471-Speed 10053.11 samples/sec   Loss 6.1174   LearningRate 0.0072   Epoch: 29   Global Step: 148050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:46:43,420-Speed 10799.55 samples/sec   Loss 6.0855   LearningRate 0.0072   Epoch: 29   Global Step: 148060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:46:44,413-Speed 10328.63 samples/sec   Loss 5.9083   LearningRate 0.0072   Epoch: 29   Global Step: 148070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:46:45,369-Speed 10731.43 samples/sec   Loss 6.0439   LearningRate 0.0072   Epoch: 29   Global Step: 148080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:46:46,310-Speed 10889.22 samples/sec   Loss 5.9721   LearningRate 0.0072   Epoch: 29   Global Step: 148090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:46:47,282-Speed 10545.98 samples/sec   Loss 5.9467   LearningRate 0.0072   Epoch: 29   Global Step: 148100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:46:48,287-Speed 10191.66 samples/sec   Loss 6.0229   LearningRate 0.0072   Epoch: 29   Global Step: 148110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:46:49,220-Speed 10994.41 samples/sec   Loss 6.0490   LearningRate 0.0072   Epoch: 29   Global Step: 148120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:46:50,165-Speed 10851.88 samples/sec   Loss 5.9940   LearningRate 0.0072   Epoch: 29   Global Step: 148130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:46:51,143-Speed 10480.57 samples/sec   Loss 5.9417   LearningRate 0.0072   Epoch: 29   Global Step: 148140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:46:52,107-Speed 10627.73 samples/sec   Loss 5.9877   LearningRate 0.0072   Epoch: 29   Global Step: 148150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:46:53,098-Speed 10339.82 samples/sec   Loss 5.9557   LearningRate 0.0072   Epoch: 29   Global Step: 148160   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:46:54,109-Speed 10136.70 samples/sec   Loss 5.9510   LearningRate 0.0072   Epoch: 29   Global Step: 148170   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:46:55,091-Speed 10438.56 samples/sec   Loss 5.9921   LearningRate 0.0072   Epoch: 29   Global Step: 148180   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:46:56,017-Speed 11075.76 samples/sec   Loss 6.0215   LearningRate 0.0072   Epoch: 29   Global Step: 148190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:46:56,976-Speed 10687.66 samples/sec   Loss 5.8879   LearningRate 0.0072   Epoch: 29   Global Step: 148200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:46:57,981-Speed 10207.71 samples/sec   Loss 5.8916   LearningRate 0.0072   Epoch: 29   Global Step: 148210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:46:58,980-Speed 10256.64 samples/sec   Loss 6.0031   LearningRate 0.0072   Epoch: 29   Global Step: 148220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:46:59,950-Speed 10570.61 samples/sec   Loss 6.1615   LearningRate 0.0071   Epoch: 29   Global Step: 148230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:00,929-Speed 10467.83 samples/sec   Loss 6.0843   LearningRate 0.0071   Epoch: 29   Global Step: 148240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:01,946-Speed 10079.72 samples/sec   Loss 6.2290   LearningRate 0.0071   Epoch: 29   Global Step: 148250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:02,881-Speed 10970.75 samples/sec   Loss 6.0588   LearningRate 0.0071   Epoch: 29   Global Step: 148260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:03,849-Speed 10579.04 samples/sec   Loss 5.9787   LearningRate 0.0071   Epoch: 29   Global Step: 148270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:04,834-Speed 10415.22 samples/sec   Loss 6.1214   LearningRate 0.0071   Epoch: 29   Global Step: 148280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:47:05,776-Speed 10871.92 samples/sec   Loss 5.9968   LearningRate 0.0071   Epoch: 29   Global Step: 148290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:47:06,729-Speed 10757.49 samples/sec   Loss 6.0607   LearningRate 0.0071   Epoch: 29   Global Step: 148300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:47:07,719-Speed 10356.12 samples/sec   Loss 6.0591   LearningRate 0.0071   Epoch: 29   Global Step: 148310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:47:08,699-Speed 10454.23 samples/sec   Loss 6.0445   LearningRate 0.0071   Epoch: 29   Global Step: 148320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:47:09,679-Speed 10465.02 samples/sec   Loss 6.0199   LearningRate 0.0071   Epoch: 29   Global Step: 148330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:47:10,676-Speed 10277.07 samples/sec   Loss 6.0124   LearningRate 0.0071   Epoch: 29   Global Step: 148340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:47:11,657-Speed 10450.49 samples/sec   Loss 6.0319   LearningRate 0.0071   Epoch: 29   Global Step: 148350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:47:12,612-Speed 10729.41 samples/sec   Loss 5.9418   LearningRate 0.0071   Epoch: 29   Global Step: 148360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:47:13,570-Speed 10690.39 samples/sec   Loss 6.0432   LearningRate 0.0071   Epoch: 29   Global Step: 148370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:47:14,560-Speed 10359.38 samples/sec   Loss 5.9421   LearningRate 0.0071   Epoch: 29   Global Step: 148380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:47:15,521-Speed 10682.18 samples/sec   Loss 5.9130   LearningRate 0.0071   Epoch: 29   Global Step: 148390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:47:16,504-Speed 10425.50 samples/sec   Loss 5.9460   LearningRate 0.0071   Epoch: 29   Global Step: 148400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:47:17,506-Speed 10219.68 samples/sec   Loss 6.0570   LearningRate 0.0071   Epoch: 29   Global Step: 148410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:47:18,511-Speed 10205.23 samples/sec   Loss 6.0007   LearningRate 0.0071   Epoch: 29   Global Step: 148420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:47:19,512-Speed 10238.57 samples/sec   Loss 5.9217   LearningRate 0.0071   Epoch: 29   Global Step: 148430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:47:20,466-Speed 10741.77 samples/sec   Loss 5.9489   LearningRate 0.0071   Epoch: 29   Global Step: 148440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:47:21,452-Speed 10397.84 samples/sec   Loss 6.0151   LearningRate 0.0071   Epoch: 29   Global Step: 148450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:47:22,393-Speed 10924.63 samples/sec   Loss 5.9856   LearningRate 0.0071   Epoch: 29   Global Step: 148460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:47:23,337-Speed 10856.86 samples/sec   Loss 6.0927   LearningRate 0.0071   Epoch: 29   Global Step: 148470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:47:24,380-Speed 9832.80 samples/sec   Loss 6.1413   LearningRate 0.0071   Epoch: 29   Global Step: 148480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:25,320-Speed 10911.76 samples/sec   Loss 6.1197   LearningRate 0.0071   Epoch: 29   Global Step: 148490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:26,293-Speed 10526.14 samples/sec   Loss 5.9346   LearningRate 0.0071   Epoch: 29   Global Step: 148500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:27,252-Speed 10684.84 samples/sec   Loss 6.0773   LearningRate 0.0071   Epoch: 29   Global Step: 148510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:28,301-Speed 9770.66 samples/sec   Loss 6.0556   LearningRate 0.0071   Epoch: 29   Global Step: 148520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:29,267-Speed 10616.40 samples/sec   Loss 5.8966   LearningRate 0.0071   Epoch: 29   Global Step: 148530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:30,226-Speed 10687.28 samples/sec   Loss 6.0878   LearningRate 0.0071   Epoch: 29   Global Step: 148540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:31,211-Speed 10404.24 samples/sec   Loss 5.9427   LearningRate 0.0071   Epoch: 29   Global Step: 148550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:32,211-Speed 10249.10 samples/sec   Loss 5.9670   LearningRate 0.0071   Epoch: 29   Global Step: 148560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:33,180-Speed 10582.35 samples/sec   Loss 6.1315   LearningRate 0.0071   Epoch: 29   Global Step: 148570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:34,166-Speed 10389.71 samples/sec   Loss 6.0492   LearningRate 0.0071   Epoch: 29   Global Step: 148580   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:47:35,100-Speed 10972.04 samples/sec   Loss 5.9780   LearningRate 0.0071   Epoch: 29   Global Step: 148590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:36,067-Speed 10596.04 samples/sec   Loss 6.0158   LearningRate 0.0071   Epoch: 29   Global Step: 148600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:37,064-Speed 10293.20 samples/sec   Loss 6.0683   LearningRate 0.0070   Epoch: 29   Global Step: 148610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:38,021-Speed 10709.43 samples/sec   Loss 6.0124   LearningRate 0.0070   Epoch: 29   Global Step: 148620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:38,953-Speed 10991.82 samples/sec   Loss 6.0852   LearningRate 0.0070   Epoch: 29   Global Step: 148630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:39,956-Speed 10218.55 samples/sec   Loss 6.0141   LearningRate 0.0070   Epoch: 29   Global Step: 148640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:40,906-Speed 10785.70 samples/sec   Loss 5.9532   LearningRate 0.0070   Epoch: 29   Global Step: 148650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:41,904-Speed 10275.94 samples/sec   Loss 6.1933   LearningRate 0.0070   Epoch: 29   Global Step: 148660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:42,910-Speed 10181.17 samples/sec   Loss 6.1037   LearningRate 0.0070   Epoch: 29   Global Step: 148670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:43,894-Speed 10418.77 samples/sec   Loss 5.9892   LearningRate 0.0070   Epoch: 29   Global Step: 148680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:44,866-Speed 10542.35 samples/sec   Loss 6.1257   LearningRate 0.0070   Epoch: 29   Global Step: 148690   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:47:45,814-Speed 10823.14 samples/sec   Loss 6.1451   LearningRate 0.0070   Epoch: 29   Global Step: 148700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:46,766-Speed 10758.81 samples/sec   Loss 6.0245   LearningRate 0.0070   Epoch: 29   Global Step: 148710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:47,751-Speed 10406.45 samples/sec   Loss 6.0654   LearningRate 0.0070   Epoch: 29   Global Step: 148720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:48,712-Speed 10655.63 samples/sec   Loss 6.1360   LearningRate 0.0070   Epoch: 29   Global Step: 148730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:49,721-Speed 10169.76 samples/sec   Loss 6.0386   LearningRate 0.0070   Epoch: 29   Global Step: 148740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:50,670-Speed 10794.04 samples/sec   Loss 6.0150   LearningRate 0.0070   Epoch: 29   Global Step: 148750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:51,669-Speed 10256.75 samples/sec   Loss 6.0582   LearningRate 0.0070   Epoch: 29   Global Step: 148760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:52,603-Speed 10974.79 samples/sec   Loss 6.0077   LearningRate 0.0070   Epoch: 29   Global Step: 148770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:53,592-Speed 10361.39 samples/sec   Loss 6.0583   LearningRate 0.0070   Epoch: 29   Global Step: 148780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:54,541-Speed 10802.03 samples/sec   Loss 6.1475   LearningRate 0.0070   Epoch: 29   Global Step: 148790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:55,489-Speed 10817.58 samples/sec   Loss 5.9945   LearningRate 0.0070   Epoch: 29   Global Step: 148800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:56,486-Speed 10276.76 samples/sec   Loss 5.9558   LearningRate 0.0070   Epoch: 29   Global Step: 148810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:57,512-Speed 9987.58 samples/sec   Loss 6.0241   LearningRate 0.0070   Epoch: 29   Global Step: 148820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:58,494-Speed 10435.06 samples/sec   Loss 6.0540   LearningRate 0.0070   Epoch: 29   Global Step: 148830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:47:59,419-Speed 11083.13 samples/sec   Loss 5.9802   LearningRate 0.0070   Epoch: 29   Global Step: 148840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:00,357-Speed 10920.00 samples/sec   Loss 6.0879   LearningRate 0.0070   Epoch: 29   Global Step: 148850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:01,380-Speed 10024.01 samples/sec   Loss 5.9775   LearningRate 0.0070   Epoch: 29   Global Step: 148860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:02,377-Speed 10277.03 samples/sec   Loss 6.1355   LearningRate 0.0070   Epoch: 29   Global Step: 148870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:03,356-Speed 10471.00 samples/sec   Loss 6.0769   LearningRate 0.0070   Epoch: 29   Global Step: 148880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:04,344-Speed 10370.01 samples/sec   Loss 5.9935   LearningRate 0.0070   Epoch: 29   Global Step: 148890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:05,261-Speed 11177.92 samples/sec   Loss 6.1124   LearningRate 0.0070   Epoch: 29   Global Step: 148900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:06,231-Speed 10561.79 samples/sec   Loss 6.0439   LearningRate 0.0070   Epoch: 29   Global Step: 148910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:07,195-Speed 10632.84 samples/sec   Loss 6.0347   LearningRate 0.0070   Epoch: 29   Global Step: 148920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:08,196-Speed 10249.96 samples/sec   Loss 6.1005   LearningRate 0.0070   Epoch: 29   Global Step: 148930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:09,199-Speed 10220.54 samples/sec   Loss 6.0198   LearningRate 0.0070   Epoch: 29   Global Step: 148940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:10,167-Speed 10583.24 samples/sec   Loss 5.9443   LearningRate 0.0070   Epoch: 29   Global Step: 148950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:11,151-Speed 10420.64 samples/sec   Loss 6.0427   LearningRate 0.0070   Epoch: 29   Global Step: 148960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:12,102-Speed 10778.43 samples/sec   Loss 6.1914   LearningRate 0.0070   Epoch: 29   Global Step: 148970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:13,069-Speed 10613.84 samples/sec   Loss 6.1435   LearningRate 0.0070   Epoch: 29   Global Step: 148980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:14,063-Speed 10313.91 samples/sec   Loss 6.1222   LearningRate 0.0069   Epoch: 29   Global Step: 148990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:15,041-Speed 10485.29 samples/sec   Loss 6.1745   LearningRate 0.0069   Epoch: 29   Global Step: 149000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:16,012-Speed 10549.64 samples/sec   Loss 6.1704   LearningRate 0.0069   Epoch: 29   Global Step: 149010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:16,972-Speed 10674.18 samples/sec   Loss 5.9312   LearningRate 0.0069   Epoch: 29   Global Step: 149020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:48:17,960-Speed 10388.91 samples/sec   Loss 6.1769   LearningRate 0.0069   Epoch: 29   Global Step: 149030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:48:18,956-Speed 10298.01 samples/sec   Loss 6.1039   LearningRate 0.0069   Epoch: 29   Global Step: 149040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:48:19,933-Speed 10492.79 samples/sec   Loss 6.1948   LearningRate 0.0069   Epoch: 29   Global Step: 149050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:48:20,916-Speed 10421.10 samples/sec   Loss 6.2287   LearningRate 0.0069   Epoch: 29   Global Step: 149060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:48:21,922-Speed 10182.21 samples/sec   Loss 6.2387   LearningRate 0.0069   Epoch: 29   Global Step: 149070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:48:22,930-Speed 10173.40 samples/sec   Loss 6.2698   LearningRate 0.0069   Epoch: 29   Global Step: 149080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:48:23,914-Speed 10410.43 samples/sec   Loss 6.0001   LearningRate 0.0069   Epoch: 29   Global Step: 149090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:48:24,915-Speed 10242.40 samples/sec   Loss 6.0578   LearningRate 0.0069   Epoch: 29   Global Step: 149100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:48:25,882-Speed 10603.01 samples/sec   Loss 6.0787   LearningRate 0.0069   Epoch: 29   Global Step: 149110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:48:26,863-Speed 10463.08 samples/sec   Loss 6.2210   LearningRate 0.0069   Epoch: 29   Global Step: 149120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:27,810-Speed 10820.84 samples/sec   Loss 6.0040   LearningRate 0.0069   Epoch: 29   Global Step: 149130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:28,793-Speed 10429.67 samples/sec   Loss 6.1856   LearningRate 0.0069   Epoch: 29   Global Step: 149140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:29,776-Speed 10423.32 samples/sec   Loss 5.9723   LearningRate 0.0069   Epoch: 29   Global Step: 149150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:30,747-Speed 10553.56 samples/sec   Loss 6.1729   LearningRate 0.0069   Epoch: 29   Global Step: 149160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:31,740-Speed 10322.66 samples/sec   Loss 6.1182   LearningRate 0.0069   Epoch: 29   Global Step: 149170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:32,724-Speed 10423.13 samples/sec   Loss 6.2049   LearningRate 0.0069   Epoch: 29   Global Step: 149180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:33,696-Speed 10547.26 samples/sec   Loss 6.1636   LearningRate 0.0069   Epoch: 29   Global Step: 149190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:34,679-Speed 10427.44 samples/sec   Loss 6.0387   LearningRate 0.0069   Epoch: 29   Global Step: 149200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:35,651-Speed 10538.58 samples/sec   Loss 6.0467   LearningRate 0.0069   Epoch: 29   Global Step: 149210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:36,616-Speed 10626.86 samples/sec   Loss 6.0782   LearningRate 0.0069   Epoch: 29   Global Step: 149220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:37,662-Speed 9795.87 samples/sec   Loss 6.0712   LearningRate 0.0069   Epoch: 29   Global Step: 149230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:38,623-Speed 10674.42 samples/sec   Loss 6.2190   LearningRate 0.0069   Epoch: 29   Global Step: 149240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:39,613-Speed 10348.56 samples/sec   Loss 6.0688   LearningRate 0.0069   Epoch: 29   Global Step: 149250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:40,607-Speed 10315.91 samples/sec   Loss 6.0936   LearningRate 0.0069   Epoch: 29   Global Step: 149260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:41,575-Speed 10587.92 samples/sec   Loss 6.0054   LearningRate 0.0069   Epoch: 29   Global Step: 149270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:42,634-Speed 9676.77 samples/sec   Loss 6.1497   LearningRate 0.0069   Epoch: 29   Global Step: 149280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:43,652-Speed 10076.88 samples/sec   Loss 6.1102   LearningRate 0.0069   Epoch: 29   Global Step: 149290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:44,612-Speed 10674.24 samples/sec   Loss 6.1913   LearningRate 0.0069   Epoch: 29   Global Step: 149300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:45,573-Speed 10663.97 samples/sec   Loss 5.9653   LearningRate 0.0069   Epoch: 29   Global Step: 149310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:46,524-Speed 10776.41 samples/sec   Loss 6.0471   LearningRate 0.0069   Epoch: 29   Global Step: 149320   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:48:47,467-Speed 10873.94 samples/sec   Loss 6.0783   LearningRate 0.0069   Epoch: 29   Global Step: 149330   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:48:48,397-Speed 11025.13 samples/sec   Loss 6.1518   LearningRate 0.0069   Epoch: 29   Global Step: 149340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:49,362-Speed 10616.19 samples/sec   Loss 6.0350   LearningRate 0.0069   Epoch: 29   Global Step: 149350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:50,415-Speed 9735.53 samples/sec   Loss 6.0762   LearningRate 0.0069   Epoch: 29   Global Step: 149360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:51,379-Speed 10632.83 samples/sec   Loss 6.2036   LearningRate 0.0068   Epoch: 29   Global Step: 149370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:52,355-Speed 10491.63 samples/sec   Loss 6.0911   LearningRate 0.0068   Epoch: 29   Global Step: 149380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:53,358-Speed 10225.46 samples/sec   Loss 6.0582   LearningRate 0.0068   Epoch: 29   Global Step: 149390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:54,323-Speed 10623.54 samples/sec   Loss 6.0509   LearningRate 0.0068   Epoch: 29   Global Step: 149400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:55,322-Speed 10258.30 samples/sec   Loss 6.1711   LearningRate 0.0068   Epoch: 29   Global Step: 149410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:56,330-Speed 10165.33 samples/sec   Loss 6.1666   LearningRate 0.0068   Epoch: 29   Global Step: 149420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:57,280-Speed 10791.83 samples/sec   Loss 6.1045   LearningRate 0.0068   Epoch: 29   Global Step: 149430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:48:58,265-Speed 10408.68 samples/sec   Loss 6.1775   LearningRate 0.0068   Epoch: 29   Global Step: 149440   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:48:59,234-Speed 10587.79 samples/sec   Loss 6.2149   LearningRate 0.0068   Epoch: 29   Global Step: 149450   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:49:00,194-Speed 10669.65 samples/sec   Loss 6.0446   LearningRate 0.0068   Epoch: 29   Global Step: 149460   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:49:01,186-Speed 10362.97 samples/sec   Loss 6.1645   LearningRate 0.0068   Epoch: 29   Global Step: 149470   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:49:02,123-Speed 10944.62 samples/sec   Loss 6.2294   LearningRate 0.0068   Epoch: 29   Global Step: 149480   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:49:03,061-Speed 10927.71 samples/sec   Loss 6.0361   LearningRate 0.0068   Epoch: 29   Global Step: 149490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:04,062-Speed 10238.84 samples/sec   Loss 6.1603   LearningRate 0.0068   Epoch: 29   Global Step: 149500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:05,070-Speed 10172.16 samples/sec   Loss 6.2268   LearningRate 0.0068   Epoch: 29   Global Step: 149510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:06,025-Speed 10735.03 samples/sec   Loss 6.0549   LearningRate 0.0068   Epoch: 29   Global Step: 149520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:06,981-Speed 10718.14 samples/sec   Loss 6.2281   LearningRate 0.0068   Epoch: 29   Global Step: 149530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:08,005-Speed 10014.47 samples/sec   Loss 6.2122   LearningRate 0.0068   Epoch: 29   Global Step: 149540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:09,059-Speed 9729.44 samples/sec   Loss 6.0082   LearningRate 0.0068   Epoch: 29   Global Step: 149550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:10,034-Speed 10523.09 samples/sec   Loss 6.2413   LearningRate 0.0068   Epoch: 29   Global Step: 149560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:11,013-Speed 10463.16 samples/sec   Loss 6.1086   LearningRate 0.0068   Epoch: 29   Global Step: 149570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:12,011-Speed 10279.28 samples/sec   Loss 6.1753   LearningRate 0.0068   Epoch: 29   Global Step: 149580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:12,918-Speed 11303.04 samples/sec   Loss 6.0269   LearningRate 0.0068   Epoch: 29   Global Step: 149590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:13,924-Speed 10191.09 samples/sec   Loss 5.9978   LearningRate 0.0068   Epoch: 29   Global Step: 149600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:14,925-Speed 10242.67 samples/sec   Loss 6.2514   LearningRate 0.0068   Epoch: 29   Global Step: 149610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:15,895-Speed 10563.82 samples/sec   Loss 6.1221   LearningRate 0.0068   Epoch: 29   Global Step: 149620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:16,893-Speed 10269.51 samples/sec   Loss 6.2340   LearningRate 0.0068   Epoch: 29   Global Step: 149630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:17,831-Speed 10927.67 samples/sec   Loss 6.1363   LearningRate 0.0068   Epoch: 29   Global Step: 149640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:18,768-Speed 10943.74 samples/sec   Loss 6.1735   LearningRate 0.0068   Epoch: 29   Global Step: 149650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:19,746-Speed 10484.41 samples/sec   Loss 6.1733   LearningRate 0.0068   Epoch: 29   Global Step: 149660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:20,718-Speed 10541.16 samples/sec   Loss 6.1074   LearningRate 0.0068   Epoch: 29   Global Step: 149670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:21,684-Speed 10603.38 samples/sec   Loss 6.1437   LearningRate 0.0068   Epoch: 29   Global Step: 149680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:22,697-Speed 10123.06 samples/sec   Loss 6.1665   LearningRate 0.0068   Epoch: 29   Global Step: 149690   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:49:23,699-Speed 10226.73 samples/sec   Loss 6.1085   LearningRate 0.0068   Epoch: 29   Global Step: 149700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:24,697-Speed 10270.24 samples/sec   Loss 6.0473   LearningRate 0.0068   Epoch: 29   Global Step: 149710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:25,690-Speed 10322.43 samples/sec   Loss 6.1851   LearningRate 0.0068   Epoch: 29   Global Step: 149720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:26,672-Speed 10431.93 samples/sec   Loss 6.1809   LearningRate 0.0068   Epoch: 29   Global Step: 149730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:27,671-Speed 10258.87 samples/sec   Loss 6.2352   LearningRate 0.0068   Epoch: 29   Global Step: 149740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:28,638-Speed 10606.21 samples/sec   Loss 6.0845   LearningRate 0.0068   Epoch: 29   Global Step: 149750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:29,596-Speed 10700.50 samples/sec   Loss 6.2628   LearningRate 0.0067   Epoch: 29   Global Step: 149760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:30,583-Speed 10382.22 samples/sec   Loss 6.0211   LearningRate 0.0067   Epoch: 29   Global Step: 149770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:31,571-Speed 10372.92 samples/sec   Loss 6.1423   LearningRate 0.0067   Epoch: 29   Global Step: 149780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:32,572-Speed 10244.47 samples/sec   Loss 6.1705   LearningRate 0.0067   Epoch: 29   Global Step: 149790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:33,546-Speed 10530.34 samples/sec   Loss 6.0701   LearningRate 0.0067   Epoch: 29   Global Step: 149800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:34,529-Speed 10427.30 samples/sec   Loss 6.1216   LearningRate 0.0067   Epoch: 29   Global Step: 149810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:35,510-Speed 10437.65 samples/sec   Loss 6.2163   LearningRate 0.0067   Epoch: 29   Global Step: 149820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:36,500-Speed 10355.86 samples/sec   Loss 6.1463   LearningRate 0.0067   Epoch: 29   Global Step: 149830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:37,537-Speed 9889.28 samples/sec   Loss 6.1184   LearningRate 0.0067   Epoch: 29   Global Step: 149840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:38,538-Speed 10243.99 samples/sec   Loss 6.2418   LearningRate 0.0067   Epoch: 29   Global Step: 149850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:39,507-Speed 10567.50 samples/sec   Loss 6.1113   LearningRate 0.0067   Epoch: 29   Global Step: 149860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:40,504-Speed 10287.13 samples/sec   Loss 6.1555   LearningRate 0.0067   Epoch: 29   Global Step: 149870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:41,468-Speed 10637.40 samples/sec   Loss 6.1037   LearningRate 0.0067   Epoch: 29   Global Step: 149880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:42,407-Speed 10918.49 samples/sec   Loss 6.1630   LearningRate 0.0067   Epoch: 29   Global Step: 149890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:43,480-Speed 9551.69 samples/sec   Loss 6.1073   LearningRate 0.0067   Epoch: 29   Global Step: 149900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:44,471-Speed 10344.45 samples/sec   Loss 6.1114   LearningRate 0.0067   Epoch: 29   Global Step: 149910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:45,420-Speed 10811.89 samples/sec   Loss 6.2473   LearningRate 0.0067   Epoch: 29   Global Step: 149920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:46,367-Speed 10813.80 samples/sec   Loss 6.2895   LearningRate 0.0067   Epoch: 29   Global Step: 149930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:47,335-Speed 10589.68 samples/sec   Loss 6.0977   LearningRate 0.0067   Epoch: 29   Global Step: 149940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:48,365-Speed 9958.20 samples/sec   Loss 6.0885   LearningRate 0.0067   Epoch: 29   Global Step: 149950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:49,345-Speed 10465.90 samples/sec   Loss 6.1882   LearningRate 0.0067   Epoch: 29   Global Step: 149960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:50,260-Speed 11198.34 samples/sec   Loss 6.1546   LearningRate 0.0067   Epoch: 29   Global Step: 149970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:51,232-Speed 10541.16 samples/sec   Loss 6.2037   LearningRate 0.0067   Epoch: 29   Global Step: 149980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:49:52,259-Speed 9987.36 samples/sec   Loss 6.0739   LearningRate 0.0067   Epoch: 29   Global Step: 149990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:49:53,184-Speed 11081.32 samples/sec   Loss 6.1096   LearningRate 0.0067   Epoch: 29   Global Step: 150000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:50:15,518-[lfw][150000]XNorm: 8.751093
Training: 2022-04-11 04:50:15,519-[lfw][150000]Accuracy-Flip: 0.99700+-0.00332
Training: 2022-04-11 04:50:15,519-[lfw][150000]Accuracy-Highest: 0.99700
Training: 2022-04-11 04:50:41,158-[cfp_fp][150000]XNorm: 7.525670
Training: 2022-04-11 04:50:41,158-[cfp_fp][150000]Accuracy-Flip: 0.96971+-0.01190
Training: 2022-04-11 04:50:41,159-[cfp_fp][150000]Accuracy-Highest: 0.96971
Training: 2022-04-11 04:51:03,429-[agedb_30][150000]XNorm: 8.543677
Training: 2022-04-11 04:51:03,430-[agedb_30][150000]Accuracy-Flip: 0.96750+-0.00790
Training: 2022-04-11 04:51:03,430-[agedb_30][150000]Accuracy-Highest: 0.97250
Training: 2022-04-11 04:51:04,441-Speed 143.71 samples/sec   Loss 6.1926   LearningRate 0.0067   Epoch: 29   Global Step: 150010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:05,439-Speed 10269.80 samples/sec   Loss 6.1937   LearningRate 0.0067   Epoch: 29   Global Step: 150020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:06,432-Speed 10321.78 samples/sec   Loss 6.1774   LearningRate 0.0067   Epoch: 29   Global Step: 150030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:07,372-Speed 10900.59 samples/sec   Loss 6.1681   LearningRate 0.0067   Epoch: 29   Global Step: 150040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:08,333-Speed 10666.35 samples/sec   Loss 6.2453   LearningRate 0.0067   Epoch: 29   Global Step: 150050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:09,397-Speed 9637.30 samples/sec   Loss 6.2009   LearningRate 0.0067   Epoch: 29   Global Step: 150060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:10,345-Speed 10811.75 samples/sec   Loss 6.2449   LearningRate 0.0067   Epoch: 29   Global Step: 150070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:11,394-Speed 9777.05 samples/sec   Loss 6.1298   LearningRate 0.0067   Epoch: 29   Global Step: 150080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:12,388-Speed 10302.75 samples/sec   Loss 6.0522   LearningRate 0.0067   Epoch: 29   Global Step: 150090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:13,430-Speed 9848.72 samples/sec   Loss 6.1367   LearningRate 0.0067   Epoch: 29   Global Step: 150100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:14,452-Speed 10038.14 samples/sec   Loss 6.1103   LearningRate 0.0067   Epoch: 29   Global Step: 150110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:15,462-Speed 10141.43 samples/sec   Loss 6.0984   LearningRate 0.0067   Epoch: 29   Global Step: 150120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:16,458-Speed 10285.13 samples/sec   Loss 6.1856   LearningRate 0.0067   Epoch: 29   Global Step: 150130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:17,475-Speed 10080.38 samples/sec   Loss 6.2239   LearningRate 0.0067   Epoch: 29   Global Step: 150140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:18,434-Speed 10690.38 samples/sec   Loss 6.3136   LearningRate 0.0066   Epoch: 29   Global Step: 150150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:19,431-Speed 10277.81 samples/sec   Loss 6.1608   LearningRate 0.0066   Epoch: 29   Global Step: 150160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:20,444-Speed 10121.29 samples/sec   Loss 6.2146   LearningRate 0.0066   Epoch: 29   Global Step: 150170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:21,481-Speed 9883.47 samples/sec   Loss 6.1092   LearningRate 0.0066   Epoch: 29   Global Step: 150180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:22,456-Speed 10509.80 samples/sec   Loss 6.1339   LearningRate 0.0066   Epoch: 29   Global Step: 150190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:51:23,439-Speed 10429.37 samples/sec   Loss 6.1750   LearningRate 0.0066   Epoch: 29   Global Step: 150200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:51:24,409-Speed 10567.40 samples/sec   Loss 6.2620   LearningRate 0.0066   Epoch: 29   Global Step: 150210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:51:25,366-Speed 10709.51 samples/sec   Loss 6.1747   LearningRate 0.0066   Epoch: 29   Global Step: 150220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:51:26,400-Speed 9913.10 samples/sec   Loss 6.0906   LearningRate 0.0066   Epoch: 29   Global Step: 150230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:51:27,378-Speed 10474.83 samples/sec   Loss 6.3215   LearningRate 0.0066   Epoch: 29   Global Step: 150240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:28,365-Speed 10385.40 samples/sec   Loss 6.1979   LearningRate 0.0066   Epoch: 29   Global Step: 150250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:29,333-Speed 10587.39 samples/sec   Loss 6.1780   LearningRate 0.0066   Epoch: 29   Global Step: 150260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:30,301-Speed 10585.23 samples/sec   Loss 6.2709   LearningRate 0.0066   Epoch: 29   Global Step: 150270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:31,256-Speed 10736.44 samples/sec   Loss 6.2240   LearningRate 0.0066   Epoch: 29   Global Step: 150280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:32,203-Speed 10825.75 samples/sec   Loss 6.1324   LearningRate 0.0066   Epoch: 29   Global Step: 150290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:33,178-Speed 10511.62 samples/sec   Loss 6.2847   LearningRate 0.0066   Epoch: 29   Global Step: 150300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:34,122-Speed 10868.57 samples/sec   Loss 6.1373   LearningRate 0.0066   Epoch: 29   Global Step: 150310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:35,060-Speed 10919.03 samples/sec   Loss 6.1013   LearningRate 0.0066   Epoch: 29   Global Step: 150320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:36,009-Speed 10797.02 samples/sec   Loss 6.1254   LearningRate 0.0066   Epoch: 29   Global Step: 150330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:37,056-Speed 9786.08 samples/sec   Loss 6.2336   LearningRate 0.0066   Epoch: 29   Global Step: 150340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:51:38,021-Speed 10623.78 samples/sec   Loss 6.1476   LearningRate 0.0066   Epoch: 29   Global Step: 150350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:51:39,019-Speed 10274.56 samples/sec   Loss 6.0772   LearningRate 0.0066   Epoch: 29   Global Step: 150360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:51:40,065-Speed 9796.84 samples/sec   Loss 6.0925   LearningRate 0.0066   Epoch: 29   Global Step: 150370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:51:41,070-Speed 10195.25 samples/sec   Loss 6.0336   LearningRate 0.0066   Epoch: 29   Global Step: 150380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:51:42,034-Speed 10633.95 samples/sec   Loss 6.1523   LearningRate 0.0066   Epoch: 29   Global Step: 150390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:51:43,031-Speed 10281.35 samples/sec   Loss 6.2892   LearningRate 0.0066   Epoch: 29   Global Step: 150400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:51:44,036-Speed 10199.43 samples/sec   Loss 6.2509   LearningRate 0.0066   Epoch: 29   Global Step: 150410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:45,003-Speed 10602.66 samples/sec   Loss 6.1147   LearningRate 0.0066   Epoch: 29   Global Step: 150420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:45,946-Speed 10865.02 samples/sec   Loss 6.1469   LearningRate 0.0066   Epoch: 29   Global Step: 150430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:46,920-Speed 10523.16 samples/sec   Loss 6.2007   LearningRate 0.0066   Epoch: 29   Global Step: 150440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:47,927-Speed 10175.41 samples/sec   Loss 6.1975   LearningRate 0.0066   Epoch: 29   Global Step: 150450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:48,876-Speed 10814.61 samples/sec   Loss 6.1220   LearningRate 0.0066   Epoch: 29   Global Step: 150460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:49,898-Speed 10021.11 samples/sec   Loss 6.2243   LearningRate 0.0066   Epoch: 29   Global Step: 150470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:50,930-Speed 9937.04 samples/sec   Loss 6.0789   LearningRate 0.0066   Epoch: 29   Global Step: 150480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:51,883-Speed 10758.96 samples/sec   Loss 6.2245   LearningRate 0.0066   Epoch: 29   Global Step: 150490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:52,827-Speed 10863.29 samples/sec   Loss 6.2307   LearningRate 0.0066   Epoch: 29   Global Step: 150500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:51:53,901-Speed 9541.20 samples/sec   Loss 6.2991   LearningRate 0.0066   Epoch: 29   Global Step: 150510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:51:54,850-Speed 10807.12 samples/sec   Loss 6.3040   LearningRate 0.0066   Epoch: 29   Global Step: 150520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:51:55,854-Speed 10207.58 samples/sec   Loss 6.1872   LearningRate 0.0066   Epoch: 29   Global Step: 150530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:51:56,814-Speed 10676.34 samples/sec   Loss 6.0927   LearningRate 0.0066   Epoch: 29   Global Step: 150540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:51:57,854-Speed 9858.88 samples/sec   Loss 6.3144   LearningRate 0.0065   Epoch: 29   Global Step: 150550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:51:58,830-Speed 10493.55 samples/sec   Loss 6.3990   LearningRate 0.0065   Epoch: 29   Global Step: 150560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:51:59,814-Speed 10417.43 samples/sec   Loss 6.2959   LearningRate 0.0065   Epoch: 29   Global Step: 150570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:00,821-Speed 10178.17 samples/sec   Loss 6.1567   LearningRate 0.0065   Epoch: 29   Global Step: 150580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:01,785-Speed 10635.33 samples/sec   Loss 6.2736   LearningRate 0.0065   Epoch: 29   Global Step: 150590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:02,715-Speed 11017.47 samples/sec   Loss 6.1183   LearningRate 0.0065   Epoch: 29   Global Step: 150600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:03,680-Speed 10625.37 samples/sec   Loss 6.1486   LearningRate 0.0065   Epoch: 29   Global Step: 150610   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:52:04,620-Speed 10897.54 samples/sec   Loss 6.1493   LearningRate 0.0065   Epoch: 29   Global Step: 150620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:05,621-Speed 10238.06 samples/sec   Loss 6.2345   LearningRate 0.0065   Epoch: 29   Global Step: 150630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:52:06,633-Speed 10132.80 samples/sec   Loss 6.1866   LearningRate 0.0065   Epoch: 29   Global Step: 150640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:52:07,642-Speed 10161.80 samples/sec   Loss 6.3222   LearningRate 0.0065   Epoch: 29   Global Step: 150650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:52:08,595-Speed 10748.00 samples/sec   Loss 6.0893   LearningRate 0.0065   Epoch: 29   Global Step: 150660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:52:09,619-Speed 10008.69 samples/sec   Loss 6.3226   LearningRate 0.0065   Epoch: 29   Global Step: 150670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:52:10,661-Speed 9838.51 samples/sec   Loss 6.1863   LearningRate 0.0065   Epoch: 29   Global Step: 150680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:52:11,654-Speed 10315.44 samples/sec   Loss 6.1916   LearningRate 0.0065   Epoch: 29   Global Step: 150690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:52:12,622-Speed 10591.95 samples/sec   Loss 6.2516   LearningRate 0.0065   Epoch: 29   Global Step: 150700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:52:13,623-Speed 10235.61 samples/sec   Loss 6.1291   LearningRate 0.0065   Epoch: 29   Global Step: 150710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:52:14,569-Speed 10835.57 samples/sec   Loss 6.1802   LearningRate 0.0065   Epoch: 29   Global Step: 150720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:52:15,519-Speed 10782.72 samples/sec   Loss 6.0242   LearningRate 0.0065   Epoch: 29   Global Step: 150730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:16,506-Speed 10393.15 samples/sec   Loss 6.1268   LearningRate 0.0065   Epoch: 29   Global Step: 150740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:17,439-Speed 10978.50 samples/sec   Loss 6.2212   LearningRate 0.0065   Epoch: 29   Global Step: 150750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:18,487-Speed 9786.45 samples/sec   Loss 6.2877   LearningRate 0.0065   Epoch: 29   Global Step: 150760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:19,489-Speed 10226.57 samples/sec   Loss 6.2077   LearningRate 0.0065   Epoch: 29   Global Step: 150770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:20,498-Speed 10154.99 samples/sec   Loss 6.2338   LearningRate 0.0065   Epoch: 29   Global Step: 150780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:21,534-Speed 9895.20 samples/sec   Loss 6.2450   LearningRate 0.0065   Epoch: 29   Global Step: 150790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:22,501-Speed 10596.43 samples/sec   Loss 6.0015   LearningRate 0.0065   Epoch: 29   Global Step: 150800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:23,496-Speed 10301.82 samples/sec   Loss 6.2500   LearningRate 0.0065   Epoch: 29   Global Step: 150810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:24,513-Speed 10076.58 samples/sec   Loss 6.2025   LearningRate 0.0065   Epoch: 29   Global Step: 150820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:25,447-Speed 10977.57 samples/sec   Loss 6.1077   LearningRate 0.0065   Epoch: 29   Global Step: 150830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:26,439-Speed 10330.23 samples/sec   Loss 6.1502   LearningRate 0.0065   Epoch: 29   Global Step: 150840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:27,451-Speed 10129.98 samples/sec   Loss 6.1575   LearningRate 0.0065   Epoch: 29   Global Step: 150850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:28,448-Speed 10284.03 samples/sec   Loss 6.2358   LearningRate 0.0065   Epoch: 29   Global Step: 150860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:29,429-Speed 10448.43 samples/sec   Loss 6.3638   LearningRate 0.0065   Epoch: 29   Global Step: 150870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:30,410-Speed 10441.78 samples/sec   Loss 6.1001   LearningRate 0.0065   Epoch: 29   Global Step: 150880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:52:31,429-Speed 10060.13 samples/sec   Loss 6.1268   LearningRate 0.0065   Epoch: 29   Global Step: 150890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:52:32,408-Speed 10471.64 samples/sec   Loss 6.1927   LearningRate 0.0065   Epoch: 29   Global Step: 150900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:52:33,392-Speed 10420.08 samples/sec   Loss 6.1656   LearningRate 0.0065   Epoch: 29   Global Step: 150910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:52:34,360-Speed 10581.24 samples/sec   Loss 6.0843   LearningRate 0.0065   Epoch: 29   Global Step: 150920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:52:35,369-Speed 10154.09 samples/sec   Loss 6.1967   LearningRate 0.0065   Epoch: 29   Global Step: 150930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:52:36,300-Speed 11016.91 samples/sec   Loss 6.1604   LearningRate 0.0064   Epoch: 29   Global Step: 150940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:52:37,297-Speed 10279.17 samples/sec   Loss 6.1617   LearningRate 0.0064   Epoch: 29   Global Step: 150950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:52:38,241-Speed 10850.27 samples/sec   Loss 6.1538   LearningRate 0.0064   Epoch: 29   Global Step: 150960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:52:39,256-Speed 10102.11 samples/sec   Loss 6.3746   LearningRate 0.0064   Epoch: 29   Global Step: 150970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:52:40,227-Speed 10567.21 samples/sec   Loss 6.1286   LearningRate 0.0064   Epoch: 29   Global Step: 150980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:41,270-Speed 9828.26 samples/sec   Loss 6.2500   LearningRate 0.0064   Epoch: 29   Global Step: 150990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:42,238-Speed 10587.29 samples/sec   Loss 6.2017   LearningRate 0.0064   Epoch: 29   Global Step: 151000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:43,227-Speed 10366.55 samples/sec   Loss 6.1240   LearningRate 0.0064   Epoch: 29   Global Step: 151010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:44,256-Speed 9961.38 samples/sec   Loss 6.3357   LearningRate 0.0064   Epoch: 29   Global Step: 151020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:45,277-Speed 10036.44 samples/sec   Loss 6.2822   LearningRate 0.0064   Epoch: 29   Global Step: 151030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:46,214-Speed 10933.76 samples/sec   Loss 6.0695   LearningRate 0.0064   Epoch: 29   Global Step: 151040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:47,173-Speed 10681.55 samples/sec   Loss 6.1548   LearningRate 0.0064   Epoch: 29   Global Step: 151050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:48,148-Speed 10520.14 samples/sec   Loss 6.2195   LearningRate 0.0064   Epoch: 29   Global Step: 151060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:49,113-Speed 10627.09 samples/sec   Loss 6.2722   LearningRate 0.0064   Epoch: 29   Global Step: 151070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:50,059-Speed 10848.95 samples/sec   Loss 6.1578   LearningRate 0.0064   Epoch: 29   Global Step: 151080   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:52:51,073-Speed 10102.08 samples/sec   Loss 6.1019   LearningRate 0.0064   Epoch: 29   Global Step: 151090   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:52:52,088-Speed 10103.23 samples/sec   Loss 6.1314   LearningRate 0.0064   Epoch: 29   Global Step: 151100   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:52:53,029-Speed 10899.49 samples/sec   Loss 6.1986   LearningRate 0.0064   Epoch: 29   Global Step: 151110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:53,977-Speed 10807.66 samples/sec   Loss 6.0596   LearningRate 0.0064   Epoch: 29   Global Step: 151120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:54,960-Speed 10430.07 samples/sec   Loss 6.1679   LearningRate 0.0064   Epoch: 29   Global Step: 151130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:55,980-Speed 10056.94 samples/sec   Loss 6.2677   LearningRate 0.0064   Epoch: 29   Global Step: 151140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:56,903-Speed 11101.20 samples/sec   Loss 6.1234   LearningRate 0.0064   Epoch: 29   Global Step: 151150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:57,843-Speed 10896.30 samples/sec   Loss 5.9753   LearningRate 0.0064   Epoch: 29   Global Step: 151160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:58,871-Speed 9970.48 samples/sec   Loss 6.1888   LearningRate 0.0064   Epoch: 29   Global Step: 151170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:52:59,849-Speed 10485.86 samples/sec   Loss 6.3963   LearningRate 0.0064   Epoch: 29   Global Step: 151180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:00,864-Speed 10111.80 samples/sec   Loss 6.1453   LearningRate 0.0064   Epoch: 29   Global Step: 151190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:01,891-Speed 9975.08 samples/sec   Loss 6.2206   LearningRate 0.0064   Epoch: 29   Global Step: 151200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:02,912-Speed 10044.07 samples/sec   Loss 6.1998   LearningRate 0.0064   Epoch: 29   Global Step: 151210   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:53:03,936-Speed 10011.98 samples/sec   Loss 6.1900   LearningRate 0.0064   Epoch: 29   Global Step: 151220   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:53:04,950-Speed 10103.70 samples/sec   Loss 6.0957   LearningRate 0.0064   Epoch: 29   Global Step: 151230   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:53:05,914-Speed 10632.38 samples/sec   Loss 6.2301   LearningRate 0.0064   Epoch: 29   Global Step: 151240   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:53:06,858-Speed 10863.68 samples/sec   Loss 6.2173   LearningRate 0.0064   Epoch: 29   Global Step: 151250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:07,812-Speed 10737.50 samples/sec   Loss 6.1967   LearningRate 0.0064   Epoch: 29   Global Step: 151260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:08,771-Speed 10693.52 samples/sec   Loss 6.1721   LearningRate 0.0064   Epoch: 29   Global Step: 151270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:09,762-Speed 10334.45 samples/sec   Loss 6.2940   LearningRate 0.0064   Epoch: 29   Global Step: 151280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:10,702-Speed 10912.42 samples/sec   Loss 6.1863   LearningRate 0.0064   Epoch: 29   Global Step: 151290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:53:11,673-Speed 10551.56 samples/sec   Loss 6.2483   LearningRate 0.0064   Epoch: 29   Global Step: 151300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:53:12,660-Speed 10386.24 samples/sec   Loss 6.2211   LearningRate 0.0064   Epoch: 29   Global Step: 151310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:53:13,630-Speed 10571.42 samples/sec   Loss 6.2084   LearningRate 0.0064   Epoch: 29   Global Step: 151320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:53:14,661-Speed 9937.42 samples/sec   Loss 6.2066   LearningRate 0.0064   Epoch: 29   Global Step: 151330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:53:15,635-Speed 10521.66 samples/sec   Loss 6.3322   LearningRate 0.0063   Epoch: 29   Global Step: 151340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:53:16,614-Speed 10475.47 samples/sec   Loss 6.2669   LearningRate 0.0063   Epoch: 29   Global Step: 151350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:53:17,563-Speed 10801.25 samples/sec   Loss 6.2297   LearningRate 0.0063   Epoch: 29   Global Step: 151360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:53:18,592-Speed 9958.97 samples/sec   Loss 6.1667   LearningRate 0.0063   Epoch: 29   Global Step: 151370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:53:19,599-Speed 10187.00 samples/sec   Loss 6.3588   LearningRate 0.0063   Epoch: 29   Global Step: 151380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:53:20,646-Speed 9792.19 samples/sec   Loss 6.1493   LearningRate 0.0063   Epoch: 29   Global Step: 151390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:21,734-Speed 9414.86 samples/sec   Loss 6.3098   LearningRate 0.0063   Epoch: 29   Global Step: 151400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:22,685-Speed 10781.39 samples/sec   Loss 6.1625   LearningRate 0.0063   Epoch: 29   Global Step: 151410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:23,639-Speed 10742.00 samples/sec   Loss 6.4007   LearningRate 0.0063   Epoch: 29   Global Step: 151420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:24,688-Speed 9768.14 samples/sec   Loss 6.1407   LearningRate 0.0063   Epoch: 29   Global Step: 151430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:25,648-Speed 10679.96 samples/sec   Loss 6.0926   LearningRate 0.0063   Epoch: 29   Global Step: 151440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:26,594-Speed 10835.98 samples/sec   Loss 6.1419   LearningRate 0.0063   Epoch: 29   Global Step: 151450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:27,578-Speed 10415.65 samples/sec   Loss 6.1469   LearningRate 0.0063   Epoch: 29   Global Step: 151460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:28,567-Speed 10360.47 samples/sec   Loss 6.2922   LearningRate 0.0063   Epoch: 29   Global Step: 151470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:29,561-Speed 10320.26 samples/sec   Loss 6.3573   LearningRate 0.0063   Epoch: 29   Global Step: 151480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:30,526-Speed 10618.33 samples/sec   Loss 6.0916   LearningRate 0.0063   Epoch: 29   Global Step: 151490   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:53:31,511-Speed 10400.67 samples/sec   Loss 6.1777   LearningRate 0.0063   Epoch: 29   Global Step: 151500   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:53:32,488-Speed 10498.30 samples/sec   Loss 6.2339   LearningRate 0.0063   Epoch: 29   Global Step: 151510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:33,454-Speed 10614.47 samples/sec   Loss 6.1732   LearningRate 0.0063   Epoch: 29   Global Step: 151520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:34,359-Speed 11325.84 samples/sec   Loss 6.1747   LearningRate 0.0063   Epoch: 29   Global Step: 151530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:35,348-Speed 10358.82 samples/sec   Loss 6.1573   LearningRate 0.0063   Epoch: 29   Global Step: 151540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:36,318-Speed 10561.13 samples/sec   Loss 6.0513   LearningRate 0.0063   Epoch: 29   Global Step: 151550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:37,304-Speed 10398.82 samples/sec   Loss 6.2294   LearningRate 0.0063   Epoch: 29   Global Step: 151560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:38,300-Speed 10340.71 samples/sec   Loss 6.4576   LearningRate 0.0063   Epoch: 29   Global Step: 151570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:39,285-Speed 10404.57 samples/sec   Loss 6.0908   LearningRate 0.0063   Epoch: 29   Global Step: 151580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:40,300-Speed 10103.76 samples/sec   Loss 6.2596   LearningRate 0.0063   Epoch: 29   Global Step: 151590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:41,280-Speed 10453.98 samples/sec   Loss 6.2641   LearningRate 0.0063   Epoch: 29   Global Step: 151600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:42,270-Speed 10350.28 samples/sec   Loss 6.0488   LearningRate 0.0063   Epoch: 29   Global Step: 151610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:43,277-Speed 10173.30 samples/sec   Loss 6.2368   LearningRate 0.0063   Epoch: 29   Global Step: 151620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:44,263-Speed 10399.08 samples/sec   Loss 6.1363   LearningRate 0.0063   Epoch: 29   Global Step: 151630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:45,243-Speed 10468.58 samples/sec   Loss 6.1869   LearningRate 0.0063   Epoch: 29   Global Step: 151640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:46,222-Speed 10465.00 samples/sec   Loss 6.3089   LearningRate 0.0063   Epoch: 29   Global Step: 151650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:47,146-Speed 11085.21 samples/sec   Loss 6.1655   LearningRate 0.0063   Epoch: 29   Global Step: 151660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:48,115-Speed 10581.39 samples/sec   Loss 6.0768   LearningRate 0.0063   Epoch: 29   Global Step: 151670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:49,109-Speed 10312.72 samples/sec   Loss 6.2559   LearningRate 0.0063   Epoch: 29   Global Step: 151680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:50,086-Speed 10490.72 samples/sec   Loss 6.2739   LearningRate 0.0063   Epoch: 29   Global Step: 151690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:51,038-Speed 10762.85 samples/sec   Loss 6.1064   LearningRate 0.0063   Epoch: 29   Global Step: 151700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:52,024-Speed 10397.45 samples/sec   Loss 6.1528   LearningRate 0.0063   Epoch: 29   Global Step: 151710   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:53:53,018-Speed 10315.87 samples/sec   Loss 6.1169   LearningRate 0.0063   Epoch: 29   Global Step: 151720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:54,144-Speed 9103.38 samples/sec   Loss 6.2095   LearningRate 0.0063   Epoch: 29   Global Step: 151730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:53:55,017-Speed 11740.33 samples/sec   Loss 6.1770   LearningRate 0.0063   Epoch: 29   Global Step: 151740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:54:04,374-Speed 1094.49 samples/sec   Loss 5.6045   LearningRate 0.0062   Epoch: 30   Global Step: 151750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:54:05,386-Speed 10152.85 samples/sec   Loss 5.5417   LearningRate 0.0062   Epoch: 30   Global Step: 151760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:54:06,451-Speed 9621.68 samples/sec   Loss 5.6240   LearningRate 0.0062   Epoch: 30   Global Step: 151770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:54:07,466-Speed 10095.80 samples/sec   Loss 5.6401   LearningRate 0.0062   Epoch: 30   Global Step: 151780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:54:08,711-Speed 8231.02 samples/sec   Loss 5.6078   LearningRate 0.0062   Epoch: 30   Global Step: 151790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:54:09,808-Speed 9348.20 samples/sec   Loss 5.6530   LearningRate 0.0062   Epoch: 30   Global Step: 151800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:54:10,845-Speed 9877.60 samples/sec   Loss 5.6298   LearningRate 0.0062   Epoch: 30   Global Step: 151810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:54:11,816-Speed 10557.81 samples/sec   Loss 5.6078   LearningRate 0.0062   Epoch: 30   Global Step: 151820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:54:12,788-Speed 10547.16 samples/sec   Loss 5.5102   LearningRate 0.0062   Epoch: 30   Global Step: 151830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:54:13,778-Speed 10353.44 samples/sec   Loss 5.6628   LearningRate 0.0062   Epoch: 30   Global Step: 151840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:54:14,732-Speed 10745.49 samples/sec   Loss 5.6760   LearningRate 0.0062   Epoch: 30   Global Step: 151850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:54:15,828-Speed 9345.33 samples/sec   Loss 5.5955   LearningRate 0.0062   Epoch: 30   Global Step: 151860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:54:16,776-Speed 10823.00 samples/sec   Loss 5.7012   LearningRate 0.0062   Epoch: 30   Global Step: 151870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:54:17,722-Speed 10834.74 samples/sec   Loss 5.6595   LearningRate 0.0062   Epoch: 30   Global Step: 151880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:54:18,734-Speed 10133.60 samples/sec   Loss 5.6547   LearningRate 0.0062   Epoch: 30   Global Step: 151890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:54:19,683-Speed 10789.40 samples/sec   Loss 5.6002   LearningRate 0.0062   Epoch: 30   Global Step: 151900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:54:20,767-Speed 9460.41 samples/sec   Loss 5.7496   LearningRate 0.0062   Epoch: 30   Global Step: 151910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:54:21,788-Speed 10033.13 samples/sec   Loss 5.5569   LearningRate 0.0062   Epoch: 30   Global Step: 151920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:54:22,783-Speed 10311.93 samples/sec   Loss 5.6107   LearningRate 0.0062   Epoch: 30   Global Step: 151930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:54:23,715-Speed 10991.01 samples/sec   Loss 5.5897   LearningRate 0.0062   Epoch: 30   Global Step: 151940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:54:24,638-Speed 11102.37 samples/sec   Loss 5.6379   LearningRate 0.0062   Epoch: 30   Global Step: 151950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:54:25,611-Speed 10537.70 samples/sec   Loss 5.5539   LearningRate 0.0062   Epoch: 30   Global Step: 151960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:54:26,638-Speed 9980.90 samples/sec   Loss 5.6902   LearningRate 0.0062   Epoch: 30   Global Step: 151970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:54:27,630-Speed 10327.06 samples/sec   Loss 5.8290   LearningRate 0.0062   Epoch: 30   Global Step: 151980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:54:28,660-Speed 9969.65 samples/sec   Loss 5.6626   LearningRate 0.0062   Epoch: 30   Global Step: 151990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:54:29,636-Speed 10502.96 samples/sec   Loss 5.5213   LearningRate 0.0062   Epoch: 30   Global Step: 152000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:54:51,703-[lfw][152000]XNorm: 8.586445
Training: 2022-04-11 04:54:51,703-[lfw][152000]Accuracy-Flip: 0.99667+-0.00333
Training: 2022-04-11 04:54:51,704-[lfw][152000]Accuracy-Highest: 0.99700
Training: 2022-04-11 04:55:17,305-[cfp_fp][152000]XNorm: 7.402166
Training: 2022-04-11 04:55:17,306-[cfp_fp][152000]Accuracy-Flip: 0.96829+-0.01047
Training: 2022-04-11 04:55:17,307-[cfp_fp][152000]Accuracy-Highest: 0.96971
Training: 2022-04-11 04:55:39,572-[agedb_30][152000]XNorm: 8.408008
Training: 2022-04-11 04:55:39,573-[agedb_30][152000]Accuracy-Flip: 0.96983+-0.00743
Training: 2022-04-11 04:55:39,573-[agedb_30][152000]Accuracy-Highest: 0.97250
Training: 2022-04-11 04:55:40,544-Speed 144.41 samples/sec   Loss 5.7118   LearningRate 0.0062   Epoch: 30   Global Step: 152010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:55:41,515-Speed 10559.71 samples/sec   Loss 5.7859   LearningRate 0.0062   Epoch: 30   Global Step: 152020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:55:42,493-Speed 10482.10 samples/sec   Loss 5.6833   LearningRate 0.0062   Epoch: 30   Global Step: 152030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:55:43,481-Speed 10371.41 samples/sec   Loss 5.7187   LearningRate 0.0062   Epoch: 30   Global Step: 152040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:55:44,444-Speed 10644.41 samples/sec   Loss 5.7781   LearningRate 0.0062   Epoch: 30   Global Step: 152050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:55:45,415-Speed 10553.58 samples/sec   Loss 5.5578   LearningRate 0.0062   Epoch: 30   Global Step: 152060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:55:46,358-Speed 10866.39 samples/sec   Loss 5.7163   LearningRate 0.0062   Epoch: 30   Global Step: 152070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:55:47,290-Speed 11000.78 samples/sec   Loss 5.6070   LearningRate 0.0062   Epoch: 30   Global Step: 152080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:55:48,284-Speed 10319.18 samples/sec   Loss 5.5359   LearningRate 0.0062   Epoch: 30   Global Step: 152090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:55:49,241-Speed 10718.63 samples/sec   Loss 5.6748   LearningRate 0.0062   Epoch: 30   Global Step: 152100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:55:50,247-Speed 10187.42 samples/sec   Loss 5.8854   LearningRate 0.0062   Epoch: 30   Global Step: 152110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:55:51,242-Speed 10307.66 samples/sec   Loss 5.6069   LearningRate 0.0062   Epoch: 30   Global Step: 152120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:55:52,351-Speed 9245.33 samples/sec   Loss 5.7322   LearningRate 0.0062   Epoch: 30   Global Step: 152130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:55:53,289-Speed 10930.55 samples/sec   Loss 5.5924   LearningRate 0.0062   Epoch: 30   Global Step: 152140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:55:54,283-Speed 10313.43 samples/sec   Loss 5.6465   LearningRate 0.0061   Epoch: 30   Global Step: 152150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:55:55,261-Speed 10478.54 samples/sec   Loss 5.7072   LearningRate 0.0061   Epoch: 30   Global Step: 152160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:55:56,274-Speed 10155.66 samples/sec   Loss 5.7665   LearningRate 0.0061   Epoch: 30   Global Step: 152170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:55:57,280-Speed 10184.07 samples/sec   Loss 5.6580   LearningRate 0.0061   Epoch: 30   Global Step: 152180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:55:58,263-Speed 10428.51 samples/sec   Loss 5.7566   LearningRate 0.0061   Epoch: 30   Global Step: 152190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:55:59,262-Speed 10260.34 samples/sec   Loss 5.6815   LearningRate 0.0061   Epoch: 30   Global Step: 152200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:00,254-Speed 10334.48 samples/sec   Loss 5.6987   LearningRate 0.0061   Epoch: 30   Global Step: 152210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:01,226-Speed 10546.73 samples/sec   Loss 5.5908   LearningRate 0.0061   Epoch: 30   Global Step: 152220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:02,159-Speed 10973.83 samples/sec   Loss 5.6871   LearningRate 0.0061   Epoch: 30   Global Step: 152230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:03,138-Speed 10467.99 samples/sec   Loss 5.7485   LearningRate 0.0061   Epoch: 30   Global Step: 152240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:04,112-Speed 10529.16 samples/sec   Loss 5.8272   LearningRate 0.0061   Epoch: 30   Global Step: 152250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:05,070-Speed 10714.34 samples/sec   Loss 5.7472   LearningRate 0.0061   Epoch: 30   Global Step: 152260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:06,027-Speed 10703.52 samples/sec   Loss 5.7839   LearningRate 0.0061   Epoch: 30   Global Step: 152270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:06,970-Speed 10862.82 samples/sec   Loss 5.6430   LearningRate 0.0061   Epoch: 30   Global Step: 152280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:07,957-Speed 10399.88 samples/sec   Loss 5.6117   LearningRate 0.0061   Epoch: 30   Global Step: 152290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:08,935-Speed 10476.40 samples/sec   Loss 5.8642   LearningRate 0.0061   Epoch: 30   Global Step: 152300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:09,899-Speed 10633.47 samples/sec   Loss 5.7002   LearningRate 0.0061   Epoch: 30   Global Step: 152310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:10,854-Speed 10730.21 samples/sec   Loss 5.7615   LearningRate 0.0061   Epoch: 30   Global Step: 152320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:11,840-Speed 10394.72 samples/sec   Loss 5.8119   LearningRate 0.0061   Epoch: 30   Global Step: 152330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:12,787-Speed 10813.61 samples/sec   Loss 5.6793   LearningRate 0.0061   Epoch: 30   Global Step: 152340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:13,768-Speed 10461.11 samples/sec   Loss 5.7831   LearningRate 0.0061   Epoch: 30   Global Step: 152350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:14,745-Speed 10488.36 samples/sec   Loss 5.8705   LearningRate 0.0061   Epoch: 30   Global Step: 152360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:15,720-Speed 10508.77 samples/sec   Loss 5.6947   LearningRate 0.0061   Epoch: 30   Global Step: 152370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:16,681-Speed 10673.25 samples/sec   Loss 5.7941   LearningRate 0.0061   Epoch: 30   Global Step: 152380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:17,650-Speed 10572.82 samples/sec   Loss 5.7010   LearningRate 0.0061   Epoch: 30   Global Step: 152390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:18,664-Speed 10112.88 samples/sec   Loss 5.7390   LearningRate 0.0061   Epoch: 30   Global Step: 152400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:19,626-Speed 10650.00 samples/sec   Loss 5.6807   LearningRate 0.0061   Epoch: 30   Global Step: 152410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:20,608-Speed 10435.67 samples/sec   Loss 5.6529   LearningRate 0.0061   Epoch: 30   Global Step: 152420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:21,530-Speed 11113.95 samples/sec   Loss 5.7394   LearningRate 0.0061   Epoch: 30   Global Step: 152430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:22,506-Speed 10497.88 samples/sec   Loss 5.8159   LearningRate 0.0061   Epoch: 30   Global Step: 152440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:23,541-Speed 9909.75 samples/sec   Loss 5.6517   LearningRate 0.0061   Epoch: 30   Global Step: 152450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:24,499-Speed 10713.87 samples/sec   Loss 5.7748   LearningRate 0.0061   Epoch: 30   Global Step: 152460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:25,499-Speed 10243.35 samples/sec   Loss 5.5770   LearningRate 0.0061   Epoch: 30   Global Step: 152470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:26,491-Speed 10333.18 samples/sec   Loss 5.7856   LearningRate 0.0061   Epoch: 30   Global Step: 152480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:27,465-Speed 10516.86 samples/sec   Loss 5.6970   LearningRate 0.0061   Epoch: 30   Global Step: 152490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:28,418-Speed 10759.48 samples/sec   Loss 5.8666   LearningRate 0.0061   Epoch: 30   Global Step: 152500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:29,378-Speed 10673.02 samples/sec   Loss 5.8004   LearningRate 0.0061   Epoch: 30   Global Step: 152510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:30,339-Speed 10665.86 samples/sec   Loss 5.6869   LearningRate 0.0061   Epoch: 30   Global Step: 152520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:31,400-Speed 9657.08 samples/sec   Loss 5.6827   LearningRate 0.0061   Epoch: 30   Global Step: 152530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:32,414-Speed 10107.10 samples/sec   Loss 5.7076   LearningRate 0.0061   Epoch: 30   Global Step: 152540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:33,409-Speed 10302.70 samples/sec   Loss 5.7595   LearningRate 0.0061   Epoch: 30   Global Step: 152550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:34,384-Speed 10515.76 samples/sec   Loss 5.6906   LearningRate 0.0060   Epoch: 30   Global Step: 152560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:35,373-Speed 10370.50 samples/sec   Loss 5.6958   LearningRate 0.0060   Epoch: 30   Global Step: 152570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:36,355-Speed 10430.89 samples/sec   Loss 5.8591   LearningRate 0.0060   Epoch: 30   Global Step: 152580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:37,366-Speed 10143.84 samples/sec   Loss 5.7852   LearningRate 0.0060   Epoch: 30   Global Step: 152590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:38,334-Speed 10577.91 samples/sec   Loss 5.7638   LearningRate 0.0060   Epoch: 30   Global Step: 152600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:39,311-Speed 10493.36 samples/sec   Loss 5.7520   LearningRate 0.0060   Epoch: 30   Global Step: 152610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:40,319-Speed 10169.39 samples/sec   Loss 5.7724   LearningRate 0.0060   Epoch: 30   Global Step: 152620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:41,255-Speed 10947.02 samples/sec   Loss 5.7237   LearningRate 0.0060   Epoch: 30   Global Step: 152630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:42,226-Speed 10560.25 samples/sec   Loss 5.8267   LearningRate 0.0060   Epoch: 30   Global Step: 152640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:43,259-Speed 9918.37 samples/sec   Loss 5.8385   LearningRate 0.0060   Epoch: 30   Global Step: 152650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:44,267-Speed 10177.04 samples/sec   Loss 5.6737   LearningRate 0.0060   Epoch: 30   Global Step: 152660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:45,254-Speed 10379.80 samples/sec   Loss 5.7175   LearningRate 0.0060   Epoch: 30   Global Step: 152670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:46,262-Speed 10165.59 samples/sec   Loss 5.7245   LearningRate 0.0060   Epoch: 30   Global Step: 152680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:47,252-Speed 10359.77 samples/sec   Loss 5.8485   LearningRate 0.0060   Epoch: 30   Global Step: 152690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:48,260-Speed 10163.66 samples/sec   Loss 5.9340   LearningRate 0.0060   Epoch: 30   Global Step: 152700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:49,223-Speed 10647.88 samples/sec   Loss 5.7628   LearningRate 0.0060   Epoch: 30   Global Step: 152710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:50,206-Speed 10423.50 samples/sec   Loss 5.7573   LearningRate 0.0060   Epoch: 30   Global Step: 152720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:51,208-Speed 10227.07 samples/sec   Loss 5.7582   LearningRate 0.0060   Epoch: 30   Global Step: 152730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:52,217-Speed 10149.53 samples/sec   Loss 5.9335   LearningRate 0.0060   Epoch: 30   Global Step: 152740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:53,145-Speed 11053.89 samples/sec   Loss 5.7043   LearningRate 0.0060   Epoch: 30   Global Step: 152750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:54,103-Speed 10701.25 samples/sec   Loss 5.7725   LearningRate 0.0060   Epoch: 30   Global Step: 152760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:55,049-Speed 10827.25 samples/sec   Loss 5.7595   LearningRate 0.0060   Epoch: 30   Global Step: 152770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:56,018-Speed 10581.73 samples/sec   Loss 5.8794   LearningRate 0.0060   Epoch: 30   Global Step: 152780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:57,013-Speed 10299.20 samples/sec   Loss 5.9341   LearningRate 0.0060   Epoch: 30   Global Step: 152790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:56:57,949-Speed 10951.22 samples/sec   Loss 5.9571   LearningRate 0.0060   Epoch: 30   Global Step: 152800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:58,922-Speed 10531.88 samples/sec   Loss 5.7599   LearningRate 0.0060   Epoch: 30   Global Step: 152810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:56:59,980-Speed 9691.96 samples/sec   Loss 5.9301   LearningRate 0.0060   Epoch: 30   Global Step: 152820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:00,916-Speed 10959.83 samples/sec   Loss 5.8857   LearningRate 0.0060   Epoch: 30   Global Step: 152830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:01,870-Speed 10734.79 samples/sec   Loss 5.8705   LearningRate 0.0060   Epoch: 30   Global Step: 152840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:02,848-Speed 10484.35 samples/sec   Loss 5.6244   LearningRate 0.0060   Epoch: 30   Global Step: 152850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:03,844-Speed 10289.37 samples/sec   Loss 5.8437   LearningRate 0.0060   Epoch: 30   Global Step: 152860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:04,790-Speed 10839.74 samples/sec   Loss 5.8204   LearningRate 0.0060   Epoch: 30   Global Step: 152870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:05,781-Speed 10339.91 samples/sec   Loss 5.8361   LearningRate 0.0060   Epoch: 30   Global Step: 152880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:06,784-Speed 10214.27 samples/sec   Loss 5.8103   LearningRate 0.0060   Epoch: 30   Global Step: 152890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:07,755-Speed 10551.88 samples/sec   Loss 5.7608   LearningRate 0.0060   Epoch: 30   Global Step: 152900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:57:08,871-Speed 9187.88 samples/sec   Loss 5.8312   LearningRate 0.0060   Epoch: 30   Global Step: 152910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:57:09,835-Speed 10633.17 samples/sec   Loss 5.8880   LearningRate 0.0060   Epoch: 30   Global Step: 152920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:57:10,767-Speed 11001.37 samples/sec   Loss 5.8706   LearningRate 0.0060   Epoch: 30   Global Step: 152930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:57:11,724-Speed 10703.60 samples/sec   Loss 5.9495   LearningRate 0.0060   Epoch: 30   Global Step: 152940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:57:12,700-Speed 10503.79 samples/sec   Loss 5.8996   LearningRate 0.0060   Epoch: 30   Global Step: 152950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:57:13,681-Speed 10445.83 samples/sec   Loss 5.6733   LearningRate 0.0060   Epoch: 30   Global Step: 152960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:57:14,719-Speed 9877.50 samples/sec   Loss 5.7901   LearningRate 0.0059   Epoch: 30   Global Step: 152970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:57:15,615-Speed 11431.26 samples/sec   Loss 5.8385   LearningRate 0.0059   Epoch: 30   Global Step: 152980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:57:16,607-Speed 10328.52 samples/sec   Loss 5.8145   LearningRate 0.0059   Epoch: 30   Global Step: 152990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:57:17,563-Speed 10718.74 samples/sec   Loss 5.6698   LearningRate 0.0059   Epoch: 30   Global Step: 153000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:18,564-Speed 10243.74 samples/sec   Loss 5.7581   LearningRate 0.0059   Epoch: 30   Global Step: 153010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:19,541-Speed 10490.26 samples/sec   Loss 5.7497   LearningRate 0.0059   Epoch: 30   Global Step: 153020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:20,462-Speed 11135.84 samples/sec   Loss 5.6771   LearningRate 0.0059   Epoch: 30   Global Step: 153030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:21,388-Speed 11062.00 samples/sec   Loss 5.9254   LearningRate 0.0059   Epoch: 30   Global Step: 153040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:22,360-Speed 10548.69 samples/sec   Loss 5.7608   LearningRate 0.0059   Epoch: 30   Global Step: 153050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:23,352-Speed 10337.28 samples/sec   Loss 5.8021   LearningRate 0.0059   Epoch: 30   Global Step: 153060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:24,295-Speed 10870.72 samples/sec   Loss 5.9029   LearningRate 0.0059   Epoch: 30   Global Step: 153070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:25,228-Speed 10984.22 samples/sec   Loss 5.7246   LearningRate 0.0059   Epoch: 30   Global Step: 153080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:26,192-Speed 10626.17 samples/sec   Loss 5.8869   LearningRate 0.0059   Epoch: 30   Global Step: 153090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:27,159-Speed 10603.10 samples/sec   Loss 5.7470   LearningRate 0.0059   Epoch: 30   Global Step: 153100   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:57:28,092-Speed 10983.65 samples/sec   Loss 5.8103   LearningRate 0.0059   Epoch: 30   Global Step: 153110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:29,021-Speed 11028.16 samples/sec   Loss 5.7878   LearningRate 0.0059   Epoch: 30   Global Step: 153120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:29,967-Speed 10884.48 samples/sec   Loss 5.7708   LearningRate 0.0059   Epoch: 30   Global Step: 153130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:30,973-Speed 10192.38 samples/sec   Loss 5.7353   LearningRate 0.0059   Epoch: 30   Global Step: 153140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:31,909-Speed 10952.23 samples/sec   Loss 5.7926   LearningRate 0.0059   Epoch: 30   Global Step: 153150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:32,885-Speed 10500.11 samples/sec   Loss 5.8365   LearningRate 0.0059   Epoch: 30   Global Step: 153160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:33,836-Speed 10773.10 samples/sec   Loss 5.8572   LearningRate 0.0059   Epoch: 30   Global Step: 153170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:34,843-Speed 10182.47 samples/sec   Loss 5.7729   LearningRate 0.0059   Epoch: 30   Global Step: 153180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:35,792-Speed 10800.64 samples/sec   Loss 5.7149   LearningRate 0.0059   Epoch: 30   Global Step: 153190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:57:36,697-Speed 11332.81 samples/sec   Loss 5.8467   LearningRate 0.0059   Epoch: 30   Global Step: 153200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:57:37,674-Speed 10483.96 samples/sec   Loss 5.8743   LearningRate 0.0059   Epoch: 30   Global Step: 153210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:57:38,682-Speed 10171.63 samples/sec   Loss 5.8085   LearningRate 0.0059   Epoch: 30   Global Step: 153220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:57:39,657-Speed 10513.39 samples/sec   Loss 5.8706   LearningRate 0.0059   Epoch: 30   Global Step: 153230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:57:40,656-Speed 10255.11 samples/sec   Loss 5.8389   LearningRate 0.0059   Epoch: 30   Global Step: 153240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:57:41,636-Speed 10465.69 samples/sec   Loss 5.7652   LearningRate 0.0059   Epoch: 30   Global Step: 153250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:57:42,639-Speed 10217.43 samples/sec   Loss 5.7263   LearningRate 0.0059   Epoch: 30   Global Step: 153260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:57:43,604-Speed 10629.00 samples/sec   Loss 5.9205   LearningRate 0.0059   Epoch: 30   Global Step: 153270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:57:44,587-Speed 10423.37 samples/sec   Loss 5.7996   LearningRate 0.0059   Epoch: 30   Global Step: 153280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:57:45,544-Speed 10701.35 samples/sec   Loss 5.9391   LearningRate 0.0059   Epoch: 30   Global Step: 153290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:46,514-Speed 10565.84 samples/sec   Loss 5.7566   LearningRate 0.0059   Epoch: 30   Global Step: 153300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:47,535-Speed 10039.98 samples/sec   Loss 5.9437   LearningRate 0.0059   Epoch: 30   Global Step: 153310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:48,509-Speed 10516.43 samples/sec   Loss 5.8961   LearningRate 0.0059   Epoch: 30   Global Step: 153320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:49,466-Speed 10712.18 samples/sec   Loss 5.8960   LearningRate 0.0059   Epoch: 30   Global Step: 153330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:50,402-Speed 10974.82 samples/sec   Loss 5.7655   LearningRate 0.0059   Epoch: 30   Global Step: 153340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:51,351-Speed 10795.57 samples/sec   Loss 5.9572   LearningRate 0.0059   Epoch: 30   Global Step: 153350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:52,299-Speed 10807.95 samples/sec   Loss 5.8317   LearningRate 0.0059   Epoch: 30   Global Step: 153360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:53,312-Speed 10117.74 samples/sec   Loss 5.7546   LearningRate 0.0059   Epoch: 30   Global Step: 153370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:54,282-Speed 10575.29 samples/sec   Loss 5.9544   LearningRate 0.0059   Epoch: 30   Global Step: 153380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:55,273-Speed 10330.44 samples/sec   Loss 6.0005   LearningRate 0.0058   Epoch: 30   Global Step: 153390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:56,254-Speed 10445.61 samples/sec   Loss 5.8842   LearningRate 0.0058   Epoch: 30   Global Step: 153400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:57,222-Speed 10596.70 samples/sec   Loss 5.8038   LearningRate 0.0058   Epoch: 30   Global Step: 153410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:58,204-Speed 10437.58 samples/sec   Loss 5.9553   LearningRate 0.0058   Epoch: 30   Global Step: 153420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:57:59,191-Speed 10383.22 samples/sec   Loss 5.8922   LearningRate 0.0058   Epoch: 30   Global Step: 153430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:00,172-Speed 10452.42 samples/sec   Loss 5.8626   LearningRate 0.0058   Epoch: 30   Global Step: 153440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:01,165-Speed 10314.04 samples/sec   Loss 5.8539   LearningRate 0.0058   Epoch: 30   Global Step: 153450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:02,157-Speed 10338.19 samples/sec   Loss 5.9172   LearningRate 0.0058   Epoch: 30   Global Step: 153460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:03,135-Speed 10474.33 samples/sec   Loss 5.8555   LearningRate 0.0058   Epoch: 30   Global Step: 153470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:04,107-Speed 10550.10 samples/sec   Loss 5.9808   LearningRate 0.0058   Epoch: 30   Global Step: 153480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:05,068-Speed 10666.74 samples/sec   Loss 5.9175   LearningRate 0.0058   Epoch: 30   Global Step: 153490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:05,996-Speed 11041.05 samples/sec   Loss 5.8806   LearningRate 0.0058   Epoch: 30   Global Step: 153500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:58:06,964-Speed 10591.14 samples/sec   Loss 6.0037   LearningRate 0.0058   Epoch: 30   Global Step: 153510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:58:07,906-Speed 10871.41 samples/sec   Loss 5.9992   LearningRate 0.0058   Epoch: 30   Global Step: 153520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:58:08,886-Speed 10453.26 samples/sec   Loss 5.8368   LearningRate 0.0058   Epoch: 30   Global Step: 153530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:58:09,882-Speed 10294.36 samples/sec   Loss 5.8240   LearningRate 0.0058   Epoch: 30   Global Step: 153540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:58:10,863-Speed 10449.19 samples/sec   Loss 5.8299   LearningRate 0.0058   Epoch: 30   Global Step: 153550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:58:11,851-Speed 10369.96 samples/sec   Loss 5.8964   LearningRate 0.0058   Epoch: 30   Global Step: 153560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:58:12,801-Speed 10786.98 samples/sec   Loss 5.9775   LearningRate 0.0058   Epoch: 30   Global Step: 153570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:58:13,814-Speed 10122.79 samples/sec   Loss 5.9392   LearningRate 0.0058   Epoch: 30   Global Step: 153580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:58:14,785-Speed 10554.35 samples/sec   Loss 5.8900   LearningRate 0.0058   Epoch: 30   Global Step: 153590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:58:15,768-Speed 10423.88 samples/sec   Loss 5.9747   LearningRate 0.0058   Epoch: 30   Global Step: 153600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:58:16,749-Speed 10447.61 samples/sec   Loss 5.8545   LearningRate 0.0058   Epoch: 30   Global Step: 153610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:58:17,782-Speed 9922.83 samples/sec   Loss 5.7984   LearningRate 0.0058   Epoch: 30   Global Step: 153620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:58:18,844-Speed 9652.62 samples/sec   Loss 5.9066   LearningRate 0.0058   Epoch: 30   Global Step: 153630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:58:19,833-Speed 10359.16 samples/sec   Loss 5.8709   LearningRate 0.0058   Epoch: 30   Global Step: 153640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:58:20,772-Speed 10916.46 samples/sec   Loss 5.6443   LearningRate 0.0058   Epoch: 30   Global Step: 153650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:58:21,804-Speed 9931.23 samples/sec   Loss 5.8264   LearningRate 0.0058   Epoch: 30   Global Step: 153660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:58:22,814-Speed 10147.88 samples/sec   Loss 5.8032   LearningRate 0.0058   Epoch: 30   Global Step: 153670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:58:23,788-Speed 10525.82 samples/sec   Loss 5.9978   LearningRate 0.0058   Epoch: 30   Global Step: 153680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:58:24,804-Speed 10083.99 samples/sec   Loss 5.9704   LearningRate 0.0058   Epoch: 30   Global Step: 153690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 04:58:25,787-Speed 10424.42 samples/sec   Loss 5.7744   LearningRate 0.0058   Epoch: 30   Global Step: 153700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:26,751-Speed 10632.08 samples/sec   Loss 5.7013   LearningRate 0.0058   Epoch: 30   Global Step: 153710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:27,689-Speed 10925.82 samples/sec   Loss 5.8236   LearningRate 0.0058   Epoch: 30   Global Step: 153720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:28,667-Speed 10488.18 samples/sec   Loss 5.9851   LearningRate 0.0058   Epoch: 30   Global Step: 153730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:29,648-Speed 10448.39 samples/sec   Loss 6.0349   LearningRate 0.0058   Epoch: 30   Global Step: 153740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:30,602-Speed 10748.84 samples/sec   Loss 5.9155   LearningRate 0.0058   Epoch: 30   Global Step: 153750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:31,582-Speed 10450.83 samples/sec   Loss 6.1434   LearningRate 0.0058   Epoch: 30   Global Step: 153760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:32,588-Speed 10188.52 samples/sec   Loss 5.9628   LearningRate 0.0058   Epoch: 30   Global Step: 153770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:33,556-Speed 10585.49 samples/sec   Loss 6.0501   LearningRate 0.0058   Epoch: 30   Global Step: 153780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:34,504-Speed 10817.39 samples/sec   Loss 5.8031   LearningRate 0.0058   Epoch: 30   Global Step: 153790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:35,517-Speed 10115.55 samples/sec   Loss 5.7932   LearningRate 0.0058   Epoch: 30   Global Step: 153800   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:58:36,510-Speed 10324.67 samples/sec   Loss 6.0301   LearningRate 0.0057   Epoch: 30   Global Step: 153810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:37,485-Speed 10518.37 samples/sec   Loss 5.8989   LearningRate 0.0057   Epoch: 30   Global Step: 153820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:38,454-Speed 10569.78 samples/sec   Loss 6.0494   LearningRate 0.0057   Epoch: 30   Global Step: 153830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:39,451-Speed 10276.99 samples/sec   Loss 5.9805   LearningRate 0.0057   Epoch: 30   Global Step: 153840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:40,475-Speed 10008.17 samples/sec   Loss 5.8604   LearningRate 0.0057   Epoch: 30   Global Step: 153850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:41,478-Speed 10225.15 samples/sec   Loss 5.8804   LearningRate 0.0057   Epoch: 30   Global Step: 153860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:42,432-Speed 10736.82 samples/sec   Loss 5.9585   LearningRate 0.0057   Epoch: 30   Global Step: 153870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:43,398-Speed 10618.93 samples/sec   Loss 5.9760   LearningRate 0.0057   Epoch: 30   Global Step: 153880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:44,412-Speed 10109.40 samples/sec   Loss 5.9765   LearningRate 0.0057   Epoch: 30   Global Step: 153890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:45,340-Speed 11051.56 samples/sec   Loss 5.8371   LearningRate 0.0057   Epoch: 30   Global Step: 153900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:46,319-Speed 10465.97 samples/sec   Loss 5.9491   LearningRate 0.0057   Epoch: 30   Global Step: 153910   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 04:58:47,283-Speed 10624.52 samples/sec   Loss 5.8564   LearningRate 0.0057   Epoch: 30   Global Step: 153920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:48,287-Speed 10214.61 samples/sec   Loss 5.9493   LearningRate 0.0057   Epoch: 30   Global Step: 153930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:49,245-Speed 10689.43 samples/sec   Loss 5.9248   LearningRate 0.0057   Epoch: 30   Global Step: 153940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:50,212-Speed 10598.85 samples/sec   Loss 5.9655   LearningRate 0.0057   Epoch: 30   Global Step: 153950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:51,182-Speed 10562.70 samples/sec   Loss 5.8729   LearningRate 0.0057   Epoch: 30   Global Step: 153960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:52,138-Speed 10726.20 samples/sec   Loss 5.8534   LearningRate 0.0057   Epoch: 30   Global Step: 153970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:53,146-Speed 10162.23 samples/sec   Loss 5.8296   LearningRate 0.0057   Epoch: 30   Global Step: 153980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:54,211-Speed 9624.78 samples/sec   Loss 5.9822   LearningRate 0.0057   Epoch: 30   Global Step: 153990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:58:55,174-Speed 10650.13 samples/sec   Loss 6.1019   LearningRate 0.0057   Epoch: 30   Global Step: 154000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 04:59:17,357-[lfw][154000]XNorm: 8.549296
Training: 2022-04-11 04:59:17,358-[lfw][154000]Accuracy-Flip: 0.99683+-0.00283
Training: 2022-04-11 04:59:17,359-[lfw][154000]Accuracy-Highest: 0.99700
Training: 2022-04-11 04:59:42,798-[cfp_fp][154000]XNorm: 7.354668
Training: 2022-04-11 04:59:42,799-[cfp_fp][154000]Accuracy-Flip: 0.97057+-0.00740
Training: 2022-04-11 04:59:42,800-[cfp_fp][154000]Accuracy-Highest: 0.97057
Training: 2022-04-11 05:00:04,822-[agedb_30][154000]XNorm: 8.389275
Training: 2022-04-11 05:00:04,822-[agedb_30][154000]Accuracy-Flip: 0.97033+-0.00777
Training: 2022-04-11 05:00:04,823-[agedb_30][154000]Accuracy-Highest: 0.97250
Training: 2022-04-11 05:00:05,800-Speed 144.99 samples/sec   Loss 6.0473   LearningRate 0.0057   Epoch: 30   Global Step: 154010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:06,762-Speed 10647.21 samples/sec   Loss 6.0508   LearningRate 0.0057   Epoch: 30   Global Step: 154020   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:00:07,736-Speed 10515.30 samples/sec   Loss 5.8757   LearningRate 0.0057   Epoch: 30   Global Step: 154030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:08,727-Speed 10347.05 samples/sec   Loss 5.8991   LearningRate 0.0057   Epoch: 30   Global Step: 154040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:09,669-Speed 10882.18 samples/sec   Loss 5.9246   LearningRate 0.0057   Epoch: 30   Global Step: 154050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:10,663-Speed 10306.81 samples/sec   Loss 6.0285   LearningRate 0.0057   Epoch: 30   Global Step: 154060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:11,601-Speed 10930.97 samples/sec   Loss 5.9147   LearningRate 0.0057   Epoch: 30   Global Step: 154070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:00:12,646-Speed 9812.84 samples/sec   Loss 5.8816   LearningRate 0.0057   Epoch: 30   Global Step: 154080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:00:13,621-Speed 10520.94 samples/sec   Loss 6.0604   LearningRate 0.0057   Epoch: 30   Global Step: 154090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:00:14,603-Speed 10430.89 samples/sec   Loss 5.8219   LearningRate 0.0057   Epoch: 30   Global Step: 154100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:00:15,604-Speed 10257.37 samples/sec   Loss 5.9136   LearningRate 0.0057   Epoch: 30   Global Step: 154110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:00:16,574-Speed 10563.71 samples/sec   Loss 5.7659   LearningRate 0.0057   Epoch: 30   Global Step: 154120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:00:17,594-Speed 10045.89 samples/sec   Loss 5.9626   LearningRate 0.0057   Epoch: 30   Global Step: 154130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:00:18,603-Speed 10155.05 samples/sec   Loss 5.8987   LearningRate 0.0057   Epoch: 30   Global Step: 154140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:00:19,595-Speed 10337.46 samples/sec   Loss 5.9550   LearningRate 0.0057   Epoch: 30   Global Step: 154150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:00:20,595-Speed 10246.10 samples/sec   Loss 6.1360   LearningRate 0.0057   Epoch: 30   Global Step: 154160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:00:21,557-Speed 10650.57 samples/sec   Loss 5.8069   LearningRate 0.0057   Epoch: 30   Global Step: 154170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:22,601-Speed 9833.08 samples/sec   Loss 5.9166   LearningRate 0.0057   Epoch: 30   Global Step: 154180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:23,565-Speed 10627.85 samples/sec   Loss 6.0051   LearningRate 0.0057   Epoch: 30   Global Step: 154190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:24,527-Speed 10661.55 samples/sec   Loss 5.9083   LearningRate 0.0057   Epoch: 30   Global Step: 154200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:25,456-Speed 11024.49 samples/sec   Loss 6.0190   LearningRate 0.0057   Epoch: 30   Global Step: 154210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:26,419-Speed 10637.13 samples/sec   Loss 5.9455   LearningRate 0.0057   Epoch: 30   Global Step: 154220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:27,404-Speed 10410.74 samples/sec   Loss 5.8267   LearningRate 0.0056   Epoch: 30   Global Step: 154230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:28,406-Speed 10225.08 samples/sec   Loss 5.8884   LearningRate 0.0056   Epoch: 30   Global Step: 154240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:29,351-Speed 10851.50 samples/sec   Loss 6.0324   LearningRate 0.0056   Epoch: 30   Global Step: 154250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:30,318-Speed 10595.30 samples/sec   Loss 5.8530   LearningRate 0.0056   Epoch: 30   Global Step: 154260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:31,308-Speed 10358.13 samples/sec   Loss 5.9376   LearningRate 0.0056   Epoch: 30   Global Step: 154270   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:00:32,296-Speed 10362.55 samples/sec   Loss 5.9268   LearningRate 0.0056   Epoch: 30   Global Step: 154280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:33,270-Speed 10528.40 samples/sec   Loss 5.9765   LearningRate 0.0056   Epoch: 30   Global Step: 154290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:34,296-Speed 9989.59 samples/sec   Loss 5.9184   LearningRate 0.0056   Epoch: 30   Global Step: 154300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:35,265-Speed 10578.93 samples/sec   Loss 5.8954   LearningRate 0.0056   Epoch: 30   Global Step: 154310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:36,232-Speed 10600.47 samples/sec   Loss 5.9162   LearningRate 0.0056   Epoch: 30   Global Step: 154320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:37,184-Speed 10760.81 samples/sec   Loss 6.0411   LearningRate 0.0056   Epoch: 30   Global Step: 154330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:38,160-Speed 10502.41 samples/sec   Loss 5.8950   LearningRate 0.0056   Epoch: 30   Global Step: 154340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:39,138-Speed 10485.51 samples/sec   Loss 5.9735   LearningRate 0.0056   Epoch: 30   Global Step: 154350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:40,107-Speed 10574.82 samples/sec   Loss 5.9524   LearningRate 0.0056   Epoch: 30   Global Step: 154360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:41,065-Speed 10700.25 samples/sec   Loss 5.9308   LearningRate 0.0056   Epoch: 30   Global Step: 154370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:42,073-Speed 10166.57 samples/sec   Loss 6.0913   LearningRate 0.0056   Epoch: 30   Global Step: 154380   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:00:43,042-Speed 10572.15 samples/sec   Loss 5.9263   LearningRate 0.0056   Epoch: 30   Global Step: 154390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:43,974-Speed 11008.03 samples/sec   Loss 5.9382   LearningRate 0.0056   Epoch: 30   Global Step: 154400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:44,968-Speed 10305.05 samples/sec   Loss 5.9307   LearningRate 0.0056   Epoch: 30   Global Step: 154410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:45,974-Speed 10192.42 samples/sec   Loss 5.8455   LearningRate 0.0056   Epoch: 30   Global Step: 154420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:46,999-Speed 9998.29 samples/sec   Loss 5.8419   LearningRate 0.0056   Epoch: 30   Global Step: 154430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:48,011-Speed 10126.32 samples/sec   Loss 5.9826   LearningRate 0.0056   Epoch: 30   Global Step: 154440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:48,963-Speed 10774.10 samples/sec   Loss 5.8936   LearningRate 0.0056   Epoch: 30   Global Step: 154450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:49,917-Speed 10749.39 samples/sec   Loss 5.9944   LearningRate 0.0056   Epoch: 30   Global Step: 154460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:50,955-Speed 9871.82 samples/sec   Loss 5.8453   LearningRate 0.0056   Epoch: 30   Global Step: 154470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:51,884-Speed 11026.31 samples/sec   Loss 6.0858   LearningRate 0.0056   Epoch: 30   Global Step: 154480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:52,844-Speed 10680.36 samples/sec   Loss 5.8742   LearningRate 0.0056   Epoch: 30   Global Step: 154490   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:00:53,874-Speed 9950.92 samples/sec   Loss 5.9262   LearningRate 0.0056   Epoch: 30   Global Step: 154500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:54,896-Speed 10029.76 samples/sec   Loss 6.0043   LearningRate 0.0056   Epoch: 30   Global Step: 154510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:55,878-Speed 10435.90 samples/sec   Loss 5.8965   LearningRate 0.0056   Epoch: 30   Global Step: 154520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:56,846-Speed 10585.07 samples/sec   Loss 6.0574   LearningRate 0.0056   Epoch: 30   Global Step: 154530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:57,809-Speed 10649.18 samples/sec   Loss 6.0233   LearningRate 0.0056   Epoch: 30   Global Step: 154540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:58,861-Speed 9749.95 samples/sec   Loss 5.9499   LearningRate 0.0056   Epoch: 30   Global Step: 154550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:00:59,868-Speed 10184.64 samples/sec   Loss 6.0316   LearningRate 0.0056   Epoch: 30   Global Step: 154560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:00,832-Speed 10628.86 samples/sec   Loss 5.8548   LearningRate 0.0056   Epoch: 30   Global Step: 154570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:01,807-Speed 10513.97 samples/sec   Loss 5.9858   LearningRate 0.0056   Epoch: 30   Global Step: 154580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:02,921-Speed 9195.68 samples/sec   Loss 5.9641   LearningRate 0.0056   Epoch: 30   Global Step: 154590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:03,906-Speed 10413.13 samples/sec   Loss 5.9562   LearningRate 0.0056   Epoch: 30   Global Step: 154600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:04,888-Speed 10430.90 samples/sec   Loss 6.0690   LearningRate 0.0056   Epoch: 30   Global Step: 154610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:05,860-Speed 10542.48 samples/sec   Loss 5.8523   LearningRate 0.0056   Epoch: 30   Global Step: 154620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:06,807-Speed 10827.90 samples/sec   Loss 5.9432   LearningRate 0.0056   Epoch: 30   Global Step: 154630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:07,803-Speed 10282.28 samples/sec   Loss 5.8602   LearningRate 0.0056   Epoch: 30   Global Step: 154640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:08,804-Speed 10239.50 samples/sec   Loss 6.1556   LearningRate 0.0056   Epoch: 30   Global Step: 154650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:09,790-Speed 10394.56 samples/sec   Loss 5.9736   LearningRate 0.0055   Epoch: 30   Global Step: 154660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:10,778-Speed 10372.14 samples/sec   Loss 5.9625   LearningRate 0.0055   Epoch: 30   Global Step: 154670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:11,743-Speed 10620.32 samples/sec   Loss 5.9985   LearningRate 0.0055   Epoch: 30   Global Step: 154680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:12,759-Speed 10082.57 samples/sec   Loss 6.1124   LearningRate 0.0055   Epoch: 30   Global Step: 154690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:13,755-Speed 10309.52 samples/sec   Loss 5.9785   LearningRate 0.0055   Epoch: 30   Global Step: 154700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:14,734-Speed 10478.93 samples/sec   Loss 5.8650   LearningRate 0.0055   Epoch: 30   Global Step: 154710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:15,667-Speed 10978.76 samples/sec   Loss 5.9516   LearningRate 0.0055   Epoch: 30   Global Step: 154720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:16,642-Speed 10509.55 samples/sec   Loss 6.0294   LearningRate 0.0055   Epoch: 30   Global Step: 154730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:17,651-Speed 10167.12 samples/sec   Loss 5.9764   LearningRate 0.0055   Epoch: 30   Global Step: 154740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:18,629-Speed 10481.57 samples/sec   Loss 6.0041   LearningRate 0.0055   Epoch: 30   Global Step: 154750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:19,600-Speed 10553.26 samples/sec   Loss 6.0807   LearningRate 0.0055   Epoch: 30   Global Step: 154760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:20,637-Speed 9884.19 samples/sec   Loss 6.0700   LearningRate 0.0055   Epoch: 30   Global Step: 154770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:21,629-Speed 10337.94 samples/sec   Loss 5.9206   LearningRate 0.0055   Epoch: 30   Global Step: 154780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:22,564-Speed 10954.94 samples/sec   Loss 5.9480   LearningRate 0.0055   Epoch: 30   Global Step: 154790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:23,541-Speed 10489.02 samples/sec   Loss 6.0927   LearningRate 0.0055   Epoch: 30   Global Step: 154800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:24,533-Speed 10330.54 samples/sec   Loss 5.9269   LearningRate 0.0055   Epoch: 30   Global Step: 154810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:25,478-Speed 10857.15 samples/sec   Loss 5.9290   LearningRate 0.0055   Epoch: 30   Global Step: 154820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:26,468-Speed 10353.04 samples/sec   Loss 5.7823   LearningRate 0.0055   Epoch: 30   Global Step: 154830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:27,470-Speed 10221.20 samples/sec   Loss 5.8841   LearningRate 0.0055   Epoch: 30   Global Step: 154840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:28,485-Speed 10101.44 samples/sec   Loss 5.9632   LearningRate 0.0055   Epoch: 30   Global Step: 154850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:29,437-Speed 10761.16 samples/sec   Loss 5.8729   LearningRate 0.0055   Epoch: 30   Global Step: 154860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:30,413-Speed 10502.96 samples/sec   Loss 5.9142   LearningRate 0.0055   Epoch: 30   Global Step: 154870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:31,388-Speed 10515.16 samples/sec   Loss 6.0772   LearningRate 0.0055   Epoch: 30   Global Step: 154880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:32,365-Speed 10494.07 samples/sec   Loss 6.1338   LearningRate 0.0055   Epoch: 30   Global Step: 154890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:33,359-Speed 10306.77 samples/sec   Loss 5.9440   LearningRate 0.0055   Epoch: 30   Global Step: 154900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:34,367-Speed 10167.73 samples/sec   Loss 5.9834   LearningRate 0.0055   Epoch: 30   Global Step: 154910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:35,347-Speed 10462.31 samples/sec   Loss 5.9843   LearningRate 0.0055   Epoch: 30   Global Step: 154920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:36,341-Speed 10311.13 samples/sec   Loss 5.9484   LearningRate 0.0055   Epoch: 30   Global Step: 154930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:37,299-Speed 10696.16 samples/sec   Loss 6.0208   LearningRate 0.0055   Epoch: 30   Global Step: 154940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:38,267-Speed 10585.07 samples/sec   Loss 6.0433   LearningRate 0.0055   Epoch: 30   Global Step: 154950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:39,259-Speed 10332.73 samples/sec   Loss 5.9172   LearningRate 0.0055   Epoch: 30   Global Step: 154960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:40,225-Speed 10601.20 samples/sec   Loss 5.9764   LearningRate 0.0055   Epoch: 30   Global Step: 154970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:41,247-Speed 10036.94 samples/sec   Loss 5.8803   LearningRate 0.0055   Epoch: 30   Global Step: 154980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:42,212-Speed 10615.87 samples/sec   Loss 5.9685   LearningRate 0.0055   Epoch: 30   Global Step: 154990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:43,318-Speed 9274.14 samples/sec   Loss 6.0355   LearningRate 0.0055   Epoch: 30   Global Step: 155000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:44,277-Speed 10687.24 samples/sec   Loss 5.9257   LearningRate 0.0055   Epoch: 30   Global Step: 155010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:45,274-Speed 10276.02 samples/sec   Loss 5.9662   LearningRate 0.0055   Epoch: 30   Global Step: 155020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:46,276-Speed 10223.13 samples/sec   Loss 5.9877   LearningRate 0.0055   Epoch: 30   Global Step: 155030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:47,284-Speed 10177.10 samples/sec   Loss 6.0042   LearningRate 0.0055   Epoch: 30   Global Step: 155040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:48,230-Speed 10833.42 samples/sec   Loss 6.2017   LearningRate 0.0055   Epoch: 30   Global Step: 155050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:49,235-Speed 10201.28 samples/sec   Loss 5.9690   LearningRate 0.0055   Epoch: 30   Global Step: 155060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:50,199-Speed 10625.77 samples/sec   Loss 6.0791   LearningRate 0.0055   Epoch: 30   Global Step: 155070   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:01:51,127-Speed 11059.41 samples/sec   Loss 5.9537   LearningRate 0.0055   Epoch: 30   Global Step: 155080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:52,080-Speed 10754.79 samples/sec   Loss 5.9255   LearningRate 0.0054   Epoch: 30   Global Step: 155090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:53,016-Speed 10947.70 samples/sec   Loss 6.0724   LearningRate 0.0054   Epoch: 30   Global Step: 155100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:01:53,995-Speed 10466.78 samples/sec   Loss 6.0308   LearningRate 0.0054   Epoch: 30   Global Step: 155110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:54,945-Speed 10795.93 samples/sec   Loss 5.9259   LearningRate 0.0054   Epoch: 30   Global Step: 155120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:55,938-Speed 10311.78 samples/sec   Loss 6.0018   LearningRate 0.0054   Epoch: 30   Global Step: 155130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:56,937-Speed 10265.93 samples/sec   Loss 6.0068   LearningRate 0.0054   Epoch: 30   Global Step: 155140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:57,976-Speed 9870.43 samples/sec   Loss 5.9633   LearningRate 0.0054   Epoch: 30   Global Step: 155150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:58,974-Speed 10264.23 samples/sec   Loss 5.9713   LearningRate 0.0054   Epoch: 30   Global Step: 155160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:01:59,951-Speed 10496.31 samples/sec   Loss 5.8854   LearningRate 0.0054   Epoch: 30   Global Step: 155170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:02:00,960-Speed 10156.82 samples/sec   Loss 5.9086   LearningRate 0.0054   Epoch: 30   Global Step: 155180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:02:01,993-Speed 9922.68 samples/sec   Loss 6.1342   LearningRate 0.0054   Epoch: 30   Global Step: 155190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:02:02,976-Speed 10434.53 samples/sec   Loss 6.0772   LearningRate 0.0054   Epoch: 30   Global Step: 155200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:02:03,943-Speed 10592.39 samples/sec   Loss 5.9232   LearningRate 0.0054   Epoch: 30   Global Step: 155210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:04,889-Speed 10828.93 samples/sec   Loss 5.9604   LearningRate 0.0054   Epoch: 30   Global Step: 155220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:05,849-Speed 10682.90 samples/sec   Loss 6.1663   LearningRate 0.0054   Epoch: 30   Global Step: 155230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:06,889-Speed 9851.36 samples/sec   Loss 6.0897   LearningRate 0.0054   Epoch: 30   Global Step: 155240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:07,887-Speed 10273.58 samples/sec   Loss 5.9677   LearningRate 0.0054   Epoch: 30   Global Step: 155250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:08,822-Speed 10961.50 samples/sec   Loss 5.9882   LearningRate 0.0054   Epoch: 30   Global Step: 155260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:09,794-Speed 10545.97 samples/sec   Loss 6.0086   LearningRate 0.0054   Epoch: 30   Global Step: 155270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:10,759-Speed 10622.68 samples/sec   Loss 6.0483   LearningRate 0.0054   Epoch: 30   Global Step: 155280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:11,729-Speed 10566.54 samples/sec   Loss 6.0970   LearningRate 0.0054   Epoch: 30   Global Step: 155290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:12,713-Speed 10405.56 samples/sec   Loss 6.0676   LearningRate 0.0054   Epoch: 30   Global Step: 155300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:13,719-Speed 10197.54 samples/sec   Loss 6.0722   LearningRate 0.0054   Epoch: 30   Global Step: 155310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:14,689-Speed 10560.62 samples/sec   Loss 5.9974   LearningRate 0.0054   Epoch: 30   Global Step: 155320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:15,642-Speed 10755.32 samples/sec   Loss 6.0070   LearningRate 0.0054   Epoch: 30   Global Step: 155330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:16,613-Speed 10555.11 samples/sec   Loss 6.1027   LearningRate 0.0054   Epoch: 30   Global Step: 155340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:17,643-Speed 9941.00 samples/sec   Loss 6.0477   LearningRate 0.0054   Epoch: 30   Global Step: 155350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:18,606-Speed 10652.30 samples/sec   Loss 5.9642   LearningRate 0.0054   Epoch: 30   Global Step: 155360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:19,580-Speed 10516.01 samples/sec   Loss 5.9710   LearningRate 0.0054   Epoch: 30   Global Step: 155370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:20,511-Speed 11013.59 samples/sec   Loss 6.0463   LearningRate 0.0054   Epoch: 30   Global Step: 155380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:21,446-Speed 10967.49 samples/sec   Loss 5.9370   LearningRate 0.0054   Epoch: 30   Global Step: 155390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:22,429-Speed 10420.47 samples/sec   Loss 6.0477   LearningRate 0.0054   Epoch: 30   Global Step: 155400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:23,391-Speed 10661.66 samples/sec   Loss 5.9890   LearningRate 0.0054   Epoch: 30   Global Step: 155410   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:02:24,391-Speed 10242.34 samples/sec   Loss 5.8570   LearningRate 0.0054   Epoch: 30   Global Step: 155420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:25,428-Speed 9891.05 samples/sec   Loss 6.0823   LearningRate 0.0054   Epoch: 30   Global Step: 155430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:26,415-Speed 10381.54 samples/sec   Loss 6.0486   LearningRate 0.0054   Epoch: 30   Global Step: 155440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:27,397-Speed 10432.12 samples/sec   Loss 5.8600   LearningRate 0.0054   Epoch: 30   Global Step: 155450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:28,445-Speed 9783.75 samples/sec   Loss 6.0726   LearningRate 0.0054   Epoch: 30   Global Step: 155460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:29,502-Speed 9697.15 samples/sec   Loss 6.0371   LearningRate 0.0054   Epoch: 30   Global Step: 155470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:30,490-Speed 10375.74 samples/sec   Loss 5.9740   LearningRate 0.0054   Epoch: 30   Global Step: 155480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:31,415-Speed 11086.77 samples/sec   Loss 6.1972   LearningRate 0.0054   Epoch: 30   Global Step: 155490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:32,366-Speed 10769.69 samples/sec   Loss 6.0958   LearningRate 0.0054   Epoch: 30   Global Step: 155500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:33,303-Speed 10939.57 samples/sec   Loss 6.1701   LearningRate 0.0054   Epoch: 30   Global Step: 155510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:34,378-Speed 9535.00 samples/sec   Loss 5.9696   LearningRate 0.0054   Epoch: 30   Global Step: 155520   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:02:35,370-Speed 10327.32 samples/sec   Loss 6.0404   LearningRate 0.0053   Epoch: 30   Global Step: 155530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:36,324-Speed 10747.12 samples/sec   Loss 6.0268   LearningRate 0.0053   Epoch: 30   Global Step: 155540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:37,307-Speed 10430.30 samples/sec   Loss 6.0225   LearningRate 0.0053   Epoch: 30   Global Step: 155550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:38,310-Speed 10217.96 samples/sec   Loss 5.9701   LearningRate 0.0053   Epoch: 30   Global Step: 155560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:39,333-Speed 10018.95 samples/sec   Loss 6.0972   LearningRate 0.0053   Epoch: 30   Global Step: 155570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:40,330-Speed 10270.27 samples/sec   Loss 5.9744   LearningRate 0.0053   Epoch: 30   Global Step: 155580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:41,307-Speed 10498.17 samples/sec   Loss 6.0810   LearningRate 0.0053   Epoch: 30   Global Step: 155590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:02:42,233-Speed 11075.15 samples/sec   Loss 5.9935   LearningRate 0.0053   Epoch: 30   Global Step: 155600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:02:43,261-Speed 9972.09 samples/sec   Loss 5.9984   LearningRate 0.0053   Epoch: 30   Global Step: 155610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:02:44,248-Speed 10381.76 samples/sec   Loss 5.9748   LearningRate 0.0053   Epoch: 30   Global Step: 155620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:02:45,246-Speed 10265.23 samples/sec   Loss 5.9565   LearningRate 0.0053   Epoch: 30   Global Step: 155630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:02:46,240-Speed 10313.09 samples/sec   Loss 6.0905   LearningRate 0.0053   Epoch: 30   Global Step: 155640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:02:47,241-Speed 10235.63 samples/sec   Loss 5.9924   LearningRate 0.0053   Epoch: 30   Global Step: 155650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:02:48,188-Speed 10820.61 samples/sec   Loss 6.1098   LearningRate 0.0053   Epoch: 30   Global Step: 155660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:02:49,181-Speed 10326.17 samples/sec   Loss 6.1002   LearningRate 0.0053   Epoch: 30   Global Step: 155670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:02:50,178-Speed 10272.20 samples/sec   Loss 5.9504   LearningRate 0.0053   Epoch: 30   Global Step: 155680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:02:51,190-Speed 10131.30 samples/sec   Loss 6.1115   LearningRate 0.0053   Epoch: 30   Global Step: 155690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:52,174-Speed 10409.49 samples/sec   Loss 5.9772   LearningRate 0.0053   Epoch: 30   Global Step: 155700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:53,259-Speed 9445.71 samples/sec   Loss 6.0054   LearningRate 0.0053   Epoch: 30   Global Step: 155710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:54,272-Speed 10128.63 samples/sec   Loss 5.9804   LearningRate 0.0053   Epoch: 30   Global Step: 155720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:55,252-Speed 10456.89 samples/sec   Loss 5.9154   LearningRate 0.0053   Epoch: 30   Global Step: 155730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:56,197-Speed 10840.35 samples/sec   Loss 6.0346   LearningRate 0.0053   Epoch: 30   Global Step: 155740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:02:57,156-Speed 10686.51 samples/sec   Loss 6.0435   LearningRate 0.0053   Epoch: 30   Global Step: 155750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:02:58,106-Speed 10795.11 samples/sec   Loss 6.0815   LearningRate 0.0053   Epoch: 30   Global Step: 155760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:02:59,059-Speed 10754.20 samples/sec   Loss 6.2219   LearningRate 0.0053   Epoch: 30   Global Step: 155770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:03:00,018-Speed 10686.59 samples/sec   Loss 6.0309   LearningRate 0.0053   Epoch: 30   Global Step: 155780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:03:01,002-Speed 10426.99 samples/sec   Loss 6.1051   LearningRate 0.0053   Epoch: 30   Global Step: 155790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:03:02,003-Speed 10240.09 samples/sec   Loss 6.0077   LearningRate 0.0053   Epoch: 30   Global Step: 155800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:03:02,927-Speed 11087.54 samples/sec   Loss 6.2574   LearningRate 0.0053   Epoch: 30   Global Step: 155810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:03:03,911-Speed 10416.64 samples/sec   Loss 6.1311   LearningRate 0.0053   Epoch: 30   Global Step: 155820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:03:04,912-Speed 10228.92 samples/sec   Loss 5.9627   LearningRate 0.0053   Epoch: 30   Global Step: 155830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:03:05,879-Speed 10608.21 samples/sec   Loss 5.9953   LearningRate 0.0053   Epoch: 30   Global Step: 155840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:03:06,845-Speed 10615.79 samples/sec   Loss 5.9897   LearningRate 0.0053   Epoch: 30   Global Step: 155850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:03:07,784-Speed 10906.55 samples/sec   Loss 6.0697   LearningRate 0.0053   Epoch: 30   Global Step: 155860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:03:08,788-Speed 10207.50 samples/sec   Loss 6.0298   LearningRate 0.0053   Epoch: 30   Global Step: 155870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:03:09,804-Speed 10089.76 samples/sec   Loss 6.1085   LearningRate 0.0053   Epoch: 30   Global Step: 155880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:03:10,804-Speed 10249.52 samples/sec   Loss 5.9598   LearningRate 0.0053   Epoch: 30   Global Step: 155890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:03:11,814-Speed 10142.23 samples/sec   Loss 5.8807   LearningRate 0.0053   Epoch: 30   Global Step: 155900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:03:12,814-Speed 10247.33 samples/sec   Loss 5.8467   LearningRate 0.0053   Epoch: 30   Global Step: 155910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:03:13,759-Speed 10847.25 samples/sec   Loss 6.0176   LearningRate 0.0053   Epoch: 30   Global Step: 155920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:03:14,728-Speed 10584.34 samples/sec   Loss 5.9763   LearningRate 0.0053   Epoch: 30   Global Step: 155930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:03:15,716-Speed 10363.42 samples/sec   Loss 5.8403   LearningRate 0.0053   Epoch: 30   Global Step: 155940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:03:16,774-Speed 9690.71 samples/sec   Loss 6.0469   LearningRate 0.0053   Epoch: 30   Global Step: 155950   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:03:17,768-Speed 10316.34 samples/sec   Loss 6.0102   LearningRate 0.0053   Epoch: 30   Global Step: 155960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:03:18,739-Speed 10548.74 samples/sec   Loss 5.9416   LearningRate 0.0052   Epoch: 30   Global Step: 155970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:03:19,755-Speed 10091.81 samples/sec   Loss 5.9981   LearningRate 0.0052   Epoch: 30   Global Step: 155980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:03:20,768-Speed 10119.29 samples/sec   Loss 5.9869   LearningRate 0.0052   Epoch: 30   Global Step: 155990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:03:21,754-Speed 10390.74 samples/sec   Loss 5.9570   LearningRate 0.0052   Epoch: 30   Global Step: 156000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:03:43,940-[lfw][156000]XNorm: 8.465717
Training: 2022-04-11 05:03:43,941-[lfw][156000]Accuracy-Flip: 0.99617+-0.00334
Training: 2022-04-11 05:03:43,942-[lfw][156000]Accuracy-Highest: 0.99700
Training: 2022-04-11 05:04:09,620-[cfp_fp][156000]XNorm: 7.276210
Training: 2022-04-11 05:04:09,621-[cfp_fp][156000]Accuracy-Flip: 0.97071+-0.01046
Training: 2022-04-11 05:04:09,622-[cfp_fp][156000]Accuracy-Highest: 0.97071
Training: 2022-04-11 05:04:31,916-[agedb_30][156000]XNorm: 8.264810
Training: 2022-04-11 05:04:31,917-[agedb_30][156000]Accuracy-Flip: 0.97033+-0.00802
Training: 2022-04-11 05:04:31,918-[agedb_30][156000]Accuracy-Highest: 0.97250
Training: 2022-04-11 05:04:32,899-Speed 143.93 samples/sec   Loss 5.9817   LearningRate 0.0052   Epoch: 30   Global Step: 156010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:04:33,887-Speed 10365.33 samples/sec   Loss 5.9953   LearningRate 0.0052   Epoch: 30   Global Step: 156020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:04:34,882-Speed 10300.60 samples/sec   Loss 6.0371   LearningRate 0.0052   Epoch: 30   Global Step: 156030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:04:35,862-Speed 10464.03 samples/sec   Loss 6.1698   LearningRate 0.0052   Epoch: 30   Global Step: 156040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:04:36,806-Speed 10857.60 samples/sec   Loss 5.9773   LearningRate 0.0052   Epoch: 30   Global Step: 156050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:04:37,855-Speed 9762.99 samples/sec   Loss 6.0714   LearningRate 0.0052   Epoch: 30   Global Step: 156060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:04:38,845-Speed 10369.29 samples/sec   Loss 6.0420   LearningRate 0.0052   Epoch: 30   Global Step: 156070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:04:39,845-Speed 10269.82 samples/sec   Loss 6.0987   LearningRate 0.0052   Epoch: 30   Global Step: 156080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:04:40,802-Speed 10705.36 samples/sec   Loss 6.0641   LearningRate 0.0052   Epoch: 30   Global Step: 156090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:04:41,805-Speed 10218.52 samples/sec   Loss 6.1984   LearningRate 0.0052   Epoch: 30   Global Step: 156100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:04:42,773-Speed 10582.76 samples/sec   Loss 6.0879   LearningRate 0.0052   Epoch: 30   Global Step: 156110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:04:43,747-Speed 10526.07 samples/sec   Loss 6.0040   LearningRate 0.0052   Epoch: 30   Global Step: 156120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:04:44,712-Speed 10624.19 samples/sec   Loss 6.0802   LearningRate 0.0052   Epoch: 30   Global Step: 156130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:04:45,674-Speed 10650.12 samples/sec   Loss 6.0710   LearningRate 0.0052   Epoch: 30   Global Step: 156140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:04:46,669-Speed 10295.94 samples/sec   Loss 5.9036   LearningRate 0.0052   Epoch: 30   Global Step: 156150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:04:47,686-Speed 10078.91 samples/sec   Loss 6.1168   LearningRate 0.0052   Epoch: 30   Global Step: 156160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:04:48,670-Speed 10420.50 samples/sec   Loss 6.1118   LearningRate 0.0052   Epoch: 30   Global Step: 156170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:04:49,634-Speed 10632.21 samples/sec   Loss 6.1020   LearningRate 0.0052   Epoch: 30   Global Step: 156180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:04:50,658-Speed 10007.55 samples/sec   Loss 6.1414   LearningRate 0.0052   Epoch: 30   Global Step: 156190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:04:51,678-Speed 10041.75 samples/sec   Loss 6.0592   LearningRate 0.0052   Epoch: 30   Global Step: 156200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:04:52,671-Speed 10329.20 samples/sec   Loss 6.0927   LearningRate 0.0052   Epoch: 30   Global Step: 156210   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:04:53,660-Speed 10355.57 samples/sec   Loss 5.9959   LearningRate 0.0052   Epoch: 30   Global Step: 156220   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:04:54,591-Speed 11017.16 samples/sec   Loss 6.1413   LearningRate 0.0052   Epoch: 30   Global Step: 156230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:04:55,523-Speed 10998.84 samples/sec   Loss 6.0928   LearningRate 0.0052   Epoch: 30   Global Step: 156240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:04:56,455-Speed 10999.13 samples/sec   Loss 6.0545   LearningRate 0.0052   Epoch: 30   Global Step: 156250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:04:57,412-Speed 10697.60 samples/sec   Loss 6.0568   LearningRate 0.0052   Epoch: 30   Global Step: 156260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:04:58,391-Speed 10470.36 samples/sec   Loss 6.1293   LearningRate 0.0052   Epoch: 30   Global Step: 156270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:04:59,360-Speed 10578.48 samples/sec   Loss 6.0369   LearningRate 0.0052   Epoch: 30   Global Step: 156280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:00,343-Speed 10429.35 samples/sec   Loss 5.8982   LearningRate 0.0052   Epoch: 30   Global Step: 156290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:01,353-Speed 10154.35 samples/sec   Loss 5.9431   LearningRate 0.0052   Epoch: 30   Global Step: 156300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:02,366-Speed 10111.88 samples/sec   Loss 6.0350   LearningRate 0.0052   Epoch: 30   Global Step: 156310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:03,367-Speed 10240.86 samples/sec   Loss 6.1545   LearningRate 0.0052   Epoch: 30   Global Step: 156320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:04,321-Speed 10740.61 samples/sec   Loss 6.1438   LearningRate 0.0052   Epoch: 30   Global Step: 156330   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:05:05,331-Speed 10157.37 samples/sec   Loss 5.9460   LearningRate 0.0052   Epoch: 30   Global Step: 156340   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:05:06,294-Speed 10644.64 samples/sec   Loss 6.0268   LearningRate 0.0052   Epoch: 30   Global Step: 156350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:07,217-Speed 11097.73 samples/sec   Loss 6.1126   LearningRate 0.0052   Epoch: 30   Global Step: 156360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:08,310-Speed 9385.36 samples/sec   Loss 6.0391   LearningRate 0.0052   Epoch: 30   Global Step: 156370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:09,266-Speed 10722.48 samples/sec   Loss 6.0647   LearningRate 0.0052   Epoch: 30   Global Step: 156380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:10,278-Speed 10124.21 samples/sec   Loss 5.9944   LearningRate 0.0052   Epoch: 30   Global Step: 156390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:11,293-Speed 10098.00 samples/sec   Loss 6.1141   LearningRate 0.0052   Epoch: 30   Global Step: 156400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:12,275-Speed 10444.69 samples/sec   Loss 6.0970   LearningRate 0.0051   Epoch: 30   Global Step: 156410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:13,253-Speed 10470.49 samples/sec   Loss 5.9935   LearningRate 0.0051   Epoch: 30   Global Step: 156420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:14,245-Speed 10337.18 samples/sec   Loss 6.1421   LearningRate 0.0051   Epoch: 30   Global Step: 156430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:15,265-Speed 10044.10 samples/sec   Loss 6.0442   LearningRate 0.0051   Epoch: 30   Global Step: 156440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:16,218-Speed 10777.21 samples/sec   Loss 5.9604   LearningRate 0.0051   Epoch: 30   Global Step: 156450   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:05:17,198-Speed 10453.23 samples/sec   Loss 6.1046   LearningRate 0.0051   Epoch: 30   Global Step: 156460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:18,162-Speed 10637.20 samples/sec   Loss 6.1057   LearningRate 0.0051   Epoch: 30   Global Step: 156470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:19,097-Speed 10956.78 samples/sec   Loss 6.1417   LearningRate 0.0051   Epoch: 30   Global Step: 156480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:20,104-Speed 10178.90 samples/sec   Loss 6.0588   LearningRate 0.0051   Epoch: 30   Global Step: 156490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:21,086-Speed 10433.18 samples/sec   Loss 6.0785   LearningRate 0.0051   Epoch: 30   Global Step: 156500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:22,046-Speed 10673.80 samples/sec   Loss 6.1197   LearningRate 0.0051   Epoch: 30   Global Step: 156510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:23,016-Speed 10573.29 samples/sec   Loss 6.0066   LearningRate 0.0051   Epoch: 30   Global Step: 156520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:24,021-Speed 10196.35 samples/sec   Loss 5.9626   LearningRate 0.0051   Epoch: 30   Global Step: 156530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:24,998-Speed 10493.72 samples/sec   Loss 6.1460   LearningRate 0.0051   Epoch: 30   Global Step: 156540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:25,993-Speed 10295.65 samples/sec   Loss 5.9482   LearningRate 0.0051   Epoch: 30   Global Step: 156550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:26,935-Speed 10880.75 samples/sec   Loss 6.1470   LearningRate 0.0051   Epoch: 30   Global Step: 156560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:27,876-Speed 10891.61 samples/sec   Loss 5.9615   LearningRate 0.0051   Epoch: 30   Global Step: 156570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:28,900-Speed 10001.03 samples/sec   Loss 6.1416   LearningRate 0.0051   Epoch: 30   Global Step: 156580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:29,857-Speed 10714.28 samples/sec   Loss 6.0590   LearningRate 0.0051   Epoch: 30   Global Step: 156590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:30,812-Speed 10729.54 samples/sec   Loss 6.0507   LearningRate 0.0051   Epoch: 30   Global Step: 156600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:31,797-Speed 10410.70 samples/sec   Loss 6.0418   LearningRate 0.0051   Epoch: 30   Global Step: 156610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:32,818-Speed 10039.96 samples/sec   Loss 6.0786   LearningRate 0.0051   Epoch: 30   Global Step: 156620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:33,838-Speed 10057.32 samples/sec   Loss 5.9823   LearningRate 0.0051   Epoch: 30   Global Step: 156630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:34,809-Speed 10552.42 samples/sec   Loss 5.9446   LearningRate 0.0051   Epoch: 30   Global Step: 156640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:35,751-Speed 10874.29 samples/sec   Loss 6.0887   LearningRate 0.0051   Epoch: 30   Global Step: 156650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:05:36,758-Speed 10174.31 samples/sec   Loss 6.0706   LearningRate 0.0051   Epoch: 30   Global Step: 156660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:05:37,845-Speed 9439.20 samples/sec   Loss 5.9151   LearningRate 0.0051   Epoch: 30   Global Step: 156670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:05:38,819-Speed 10517.86 samples/sec   Loss 5.9946   LearningRate 0.0051   Epoch: 30   Global Step: 156680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:05:39,806-Speed 10382.87 samples/sec   Loss 5.9585   LearningRate 0.0051   Epoch: 30   Global Step: 156690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:05:40,816-Speed 10145.88 samples/sec   Loss 6.1336   LearningRate 0.0051   Epoch: 30   Global Step: 156700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:05:41,811-Speed 10309.11 samples/sec   Loss 6.0423   LearningRate 0.0051   Epoch: 30   Global Step: 156710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:05:42,795-Speed 10418.72 samples/sec   Loss 6.1341   LearningRate 0.0051   Epoch: 30   Global Step: 156720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:05:43,725-Speed 11016.28 samples/sec   Loss 6.0317   LearningRate 0.0051   Epoch: 30   Global Step: 156730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:05:44,733-Speed 10169.56 samples/sec   Loss 6.0669   LearningRate 0.0051   Epoch: 30   Global Step: 156740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:05:45,738-Speed 10195.72 samples/sec   Loss 6.1047   LearningRate 0.0051   Epoch: 30   Global Step: 156750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:46,714-Speed 10502.23 samples/sec   Loss 6.0083   LearningRate 0.0051   Epoch: 30   Global Step: 156760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:47,649-Speed 10964.35 samples/sec   Loss 6.1570   LearningRate 0.0051   Epoch: 30   Global Step: 156770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:48,653-Speed 10203.17 samples/sec   Loss 6.0689   LearningRate 0.0051   Epoch: 30   Global Step: 156780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:05:49,696-Speed 9829.83 samples/sec   Loss 5.9622   LearningRate 0.0051   Epoch: 30   Global Step: 156790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:00,249-Speed 970.50 samples/sec   Loss 5.8815   LearningRate 0.0051   Epoch: 31   Global Step: 156800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:01,318-Speed 9593.60 samples/sec   Loss 5.5980   LearningRate 0.0051   Epoch: 31   Global Step: 156810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:02,343-Speed 9997.63 samples/sec   Loss 5.6201   LearningRate 0.0051   Epoch: 31   Global Step: 156820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:03,368-Speed 9997.91 samples/sec   Loss 5.5392   LearningRate 0.0051   Epoch: 31   Global Step: 156830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:04,839-Speed 6965.79 samples/sec   Loss 5.5640   LearningRate 0.0051   Epoch: 31   Global Step: 156840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:05,903-Speed 9642.01 samples/sec   Loss 5.5097   LearningRate 0.0051   Epoch: 31   Global Step: 156850   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:06:07,202-Speed 7888.96 samples/sec   Loss 5.5664   LearningRate 0.0050   Epoch: 31   Global Step: 156860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:08,257-Speed 9712.42 samples/sec   Loss 5.5753   LearningRate 0.0050   Epoch: 31   Global Step: 156870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:09,242-Speed 10409.70 samples/sec   Loss 5.4417   LearningRate 0.0050   Epoch: 31   Global Step: 156880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:10,267-Speed 10000.95 samples/sec   Loss 5.5376   LearningRate 0.0050   Epoch: 31   Global Step: 156890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:11,241-Speed 10522.80 samples/sec   Loss 5.6216   LearningRate 0.0050   Epoch: 31   Global Step: 156900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:12,290-Speed 9771.82 samples/sec   Loss 5.4414   LearningRate 0.0050   Epoch: 31   Global Step: 156910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:13,334-Speed 9822.41 samples/sec   Loss 5.3548   LearningRate 0.0050   Epoch: 31   Global Step: 156920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:14,285-Speed 10775.42 samples/sec   Loss 5.4589   LearningRate 0.0050   Epoch: 31   Global Step: 156930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:15,258-Speed 10530.96 samples/sec   Loss 5.5244   LearningRate 0.0050   Epoch: 31   Global Step: 156940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:16,256-Speed 10270.85 samples/sec   Loss 5.5342   LearningRate 0.0050   Epoch: 31   Global Step: 156950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:17,274-Speed 10072.16 samples/sec   Loss 5.5221   LearningRate 0.0050   Epoch: 31   Global Step: 156960   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:06:18,271-Speed 10279.56 samples/sec   Loss 5.4125   LearningRate 0.0050   Epoch: 31   Global Step: 156970   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:06:19,208-Speed 10939.95 samples/sec   Loss 5.5516   LearningRate 0.0050   Epoch: 31   Global Step: 156980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:20,272-Speed 9636.35 samples/sec   Loss 5.5259   LearningRate 0.0050   Epoch: 31   Global Step: 156990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:21,271-Speed 10258.97 samples/sec   Loss 5.5378   LearningRate 0.0050   Epoch: 31   Global Step: 157000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:22,262-Speed 10334.24 samples/sec   Loss 5.5376   LearningRate 0.0050   Epoch: 31   Global Step: 157010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:23,269-Speed 10180.05 samples/sec   Loss 5.5078   LearningRate 0.0050   Epoch: 31   Global Step: 157020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:24,276-Speed 10177.85 samples/sec   Loss 5.3785   LearningRate 0.0050   Epoch: 31   Global Step: 157030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:25,273-Speed 10273.77 samples/sec   Loss 5.4724   LearningRate 0.0050   Epoch: 31   Global Step: 157040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:26,301-Speed 9974.32 samples/sec   Loss 5.5245   LearningRate 0.0050   Epoch: 31   Global Step: 157050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:27,271-Speed 10564.80 samples/sec   Loss 5.5281   LearningRate 0.0050   Epoch: 31   Global Step: 157060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:28,345-Speed 9542.09 samples/sec   Loss 5.5633   LearningRate 0.0050   Epoch: 31   Global Step: 157070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:29,340-Speed 10299.73 samples/sec   Loss 5.4789   LearningRate 0.0050   Epoch: 31   Global Step: 157080   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:06:30,316-Speed 10496.88 samples/sec   Loss 5.5238   LearningRate 0.0050   Epoch: 31   Global Step: 157090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:31,296-Speed 10462.33 samples/sec   Loss 5.7855   LearningRate 0.0050   Epoch: 31   Global Step: 157100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:32,278-Speed 10435.32 samples/sec   Loss 5.5910   LearningRate 0.0050   Epoch: 31   Global Step: 157110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:33,307-Speed 9960.39 samples/sec   Loss 5.4475   LearningRate 0.0050   Epoch: 31   Global Step: 157120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:34,271-Speed 10634.58 samples/sec   Loss 5.4664   LearningRate 0.0050   Epoch: 31   Global Step: 157130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:35,244-Speed 10529.58 samples/sec   Loss 5.6902   LearningRate 0.0050   Epoch: 31   Global Step: 157140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:36,217-Speed 10534.78 samples/sec   Loss 5.5978   LearningRate 0.0050   Epoch: 31   Global Step: 157150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:37,172-Speed 10733.69 samples/sec   Loss 5.4620   LearningRate 0.0050   Epoch: 31   Global Step: 157160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:38,223-Speed 9754.49 samples/sec   Loss 5.5553   LearningRate 0.0050   Epoch: 31   Global Step: 157170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:39,182-Speed 10698.89 samples/sec   Loss 5.6769   LearningRate 0.0050   Epoch: 31   Global Step: 157180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:40,123-Speed 10898.58 samples/sec   Loss 5.5906   LearningRate 0.0050   Epoch: 31   Global Step: 157190   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:06:41,109-Speed 10382.16 samples/sec   Loss 5.3496   LearningRate 0.0050   Epoch: 31   Global Step: 157200   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:06:42,164-Speed 9719.57 samples/sec   Loss 5.5014   LearningRate 0.0050   Epoch: 31   Global Step: 157210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:43,131-Speed 10606.10 samples/sec   Loss 5.5618   LearningRate 0.0050   Epoch: 31   Global Step: 157220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:44,109-Speed 10477.70 samples/sec   Loss 5.5927   LearningRate 0.0050   Epoch: 31   Global Step: 157230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:06:45,149-Speed 9849.76 samples/sec   Loss 5.5859   LearningRate 0.0050   Epoch: 31   Global Step: 157240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:06:46,155-Speed 10185.12 samples/sec   Loss 5.4807   LearningRate 0.0050   Epoch: 31   Global Step: 157250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:06:47,112-Speed 10708.85 samples/sec   Loss 5.6409   LearningRate 0.0050   Epoch: 31   Global Step: 157260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:06:48,100-Speed 10390.25 samples/sec   Loss 5.6406   LearningRate 0.0050   Epoch: 31   Global Step: 157270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:06:49,033-Speed 10984.67 samples/sec   Loss 5.5832   LearningRate 0.0050   Epoch: 31   Global Step: 157280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:06:49,982-Speed 10791.89 samples/sec   Loss 5.6030   LearningRate 0.0050   Epoch: 31   Global Step: 157290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:06:51,021-Speed 9887.71 samples/sec   Loss 5.5193   LearningRate 0.0050   Epoch: 31   Global Step: 157300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:06:51,990-Speed 10577.96 samples/sec   Loss 5.6119   LearningRate 0.0049   Epoch: 31   Global Step: 157310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:06:52,984-Speed 10308.17 samples/sec   Loss 5.5351   LearningRate 0.0049   Epoch: 31   Global Step: 157320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:06:53,966-Speed 10446.00 samples/sec   Loss 5.6412   LearningRate 0.0049   Epoch: 31   Global Step: 157330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:54,898-Speed 11001.79 samples/sec   Loss 5.6136   LearningRate 0.0049   Epoch: 31   Global Step: 157340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:55,847-Speed 10803.46 samples/sec   Loss 5.5675   LearningRate 0.0049   Epoch: 31   Global Step: 157350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:56,808-Speed 10669.14 samples/sec   Loss 5.6884   LearningRate 0.0049   Epoch: 31   Global Step: 157360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:57,809-Speed 10229.44 samples/sec   Loss 5.6112   LearningRate 0.0049   Epoch: 31   Global Step: 157370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:58,801-Speed 10346.17 samples/sec   Loss 5.6113   LearningRate 0.0049   Epoch: 31   Global Step: 157380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:06:59,805-Speed 10220.96 samples/sec   Loss 5.6539   LearningRate 0.0049   Epoch: 31   Global Step: 157390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:00,787-Speed 10435.33 samples/sec   Loss 5.7287   LearningRate 0.0049   Epoch: 31   Global Step: 157400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:01,774-Speed 10387.31 samples/sec   Loss 5.5308   LearningRate 0.0049   Epoch: 31   Global Step: 157410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:02,782-Speed 10162.77 samples/sec   Loss 5.5214   LearningRate 0.0049   Epoch: 31   Global Step: 157420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:03,791-Speed 10160.26 samples/sec   Loss 5.5822   LearningRate 0.0049   Epoch: 31   Global Step: 157430   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:07:04,716-Speed 11079.35 samples/sec   Loss 5.5357   LearningRate 0.0049   Epoch: 31   Global Step: 157440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:05,679-Speed 10651.08 samples/sec   Loss 5.6906   LearningRate 0.0049   Epoch: 31   Global Step: 157450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:06,682-Speed 10226.48 samples/sec   Loss 5.5529   LearningRate 0.0049   Epoch: 31   Global Step: 157460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:07,675-Speed 10324.15 samples/sec   Loss 5.5432   LearningRate 0.0049   Epoch: 31   Global Step: 157470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:08,761-Speed 9439.87 samples/sec   Loss 5.7347   LearningRate 0.0049   Epoch: 31   Global Step: 157480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:09,888-Speed 9088.33 samples/sec   Loss 5.5621   LearningRate 0.0049   Epoch: 31   Global Step: 157490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:10,908-Speed 10056.86 samples/sec   Loss 5.6008   LearningRate 0.0049   Epoch: 31   Global Step: 157500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:11,932-Speed 9999.06 samples/sec   Loss 5.5686   LearningRate 0.0049   Epoch: 31   Global Step: 157510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:12,879-Speed 10834.42 samples/sec   Loss 5.8042   LearningRate 0.0049   Epoch: 31   Global Step: 157520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:13,849-Speed 10568.64 samples/sec   Loss 5.6721   LearningRate 0.0049   Epoch: 31   Global Step: 157530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:14,840-Speed 10332.15 samples/sec   Loss 5.7446   LearningRate 0.0049   Epoch: 31   Global Step: 157540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:15,835-Speed 10304.76 samples/sec   Loss 5.6455   LearningRate 0.0049   Epoch: 31   Global Step: 157550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:16,842-Speed 10179.68 samples/sec   Loss 5.4879   LearningRate 0.0049   Epoch: 31   Global Step: 157560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:17,836-Speed 10317.96 samples/sec   Loss 5.5915   LearningRate 0.0049   Epoch: 31   Global Step: 157570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:18,800-Speed 10628.92 samples/sec   Loss 5.5828   LearningRate 0.0049   Epoch: 31   Global Step: 157580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:19,742-Speed 10880.20 samples/sec   Loss 5.5799   LearningRate 0.0049   Epoch: 31   Global Step: 157590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:20,739-Speed 10272.30 samples/sec   Loss 5.6777   LearningRate 0.0049   Epoch: 31   Global Step: 157600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:21,676-Speed 10939.45 samples/sec   Loss 5.6977   LearningRate 0.0049   Epoch: 31   Global Step: 157610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:22,707-Speed 9946.22 samples/sec   Loss 5.7176   LearningRate 0.0049   Epoch: 31   Global Step: 157620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:23,711-Speed 10203.69 samples/sec   Loss 5.7048   LearningRate 0.0049   Epoch: 31   Global Step: 157630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:07:24,684-Speed 10537.85 samples/sec   Loss 5.6254   LearningRate 0.0049   Epoch: 31   Global Step: 157640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:07:25,642-Speed 10695.73 samples/sec   Loss 5.6924   LearningRate 0.0049   Epoch: 31   Global Step: 157650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:07:26,613-Speed 10559.83 samples/sec   Loss 5.5518   LearningRate 0.0049   Epoch: 31   Global Step: 157660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:07:27,599-Speed 10394.27 samples/sec   Loss 5.5972   LearningRate 0.0049   Epoch: 31   Global Step: 157670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:07:28,599-Speed 10245.01 samples/sec   Loss 5.6723   LearningRate 0.0049   Epoch: 31   Global Step: 157680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:07:29,588-Speed 10364.85 samples/sec   Loss 5.8043   LearningRate 0.0049   Epoch: 31   Global Step: 157690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:07:30,525-Speed 10944.00 samples/sec   Loss 5.5888   LearningRate 0.0049   Epoch: 31   Global Step: 157700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:07:31,511-Speed 10388.51 samples/sec   Loss 5.5751   LearningRate 0.0049   Epoch: 31   Global Step: 157710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:07:32,450-Speed 10922.85 samples/sec   Loss 5.6483   LearningRate 0.0049   Epoch: 31   Global Step: 157720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:07:33,439-Speed 10359.65 samples/sec   Loss 5.7006   LearningRate 0.0049   Epoch: 31   Global Step: 157730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:34,412-Speed 10531.19 samples/sec   Loss 5.7290   LearningRate 0.0049   Epoch: 31   Global Step: 157740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:35,398-Speed 10399.39 samples/sec   Loss 5.6302   LearningRate 0.0049   Epoch: 31   Global Step: 157750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:36,376-Speed 10476.85 samples/sec   Loss 5.7487   LearningRate 0.0049   Epoch: 31   Global Step: 157760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:37,388-Speed 10125.66 samples/sec   Loss 5.6819   LearningRate 0.0048   Epoch: 31   Global Step: 157770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:38,371-Speed 10432.00 samples/sec   Loss 5.5625   LearningRate 0.0048   Epoch: 31   Global Step: 157780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:39,320-Speed 10795.62 samples/sec   Loss 5.5394   LearningRate 0.0048   Epoch: 31   Global Step: 157790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:40,332-Speed 10129.50 samples/sec   Loss 5.6414   LearningRate 0.0048   Epoch: 31   Global Step: 157800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:41,305-Speed 10538.34 samples/sec   Loss 5.5963   LearningRate 0.0048   Epoch: 31   Global Step: 157810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:42,317-Speed 10129.18 samples/sec   Loss 5.6700   LearningRate 0.0048   Epoch: 31   Global Step: 157820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:43,311-Speed 10310.27 samples/sec   Loss 5.6694   LearningRate 0.0048   Epoch: 31   Global Step: 157830   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:07:44,356-Speed 9806.28 samples/sec   Loss 5.8149   LearningRate 0.0048   Epoch: 31   Global Step: 157840   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:07:45,355-Speed 10259.63 samples/sec   Loss 5.7178   LearningRate 0.0048   Epoch: 31   Global Step: 157850   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:07:46,327-Speed 10549.44 samples/sec   Loss 5.6228   LearningRate 0.0048   Epoch: 31   Global Step: 157860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:47,300-Speed 10526.57 samples/sec   Loss 5.6918   LearningRate 0.0048   Epoch: 31   Global Step: 157870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:48,353-Speed 9734.63 samples/sec   Loss 5.6067   LearningRate 0.0048   Epoch: 31   Global Step: 157880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:07:49,303-Speed 10796.34 samples/sec   Loss 5.7104   LearningRate 0.0048   Epoch: 31   Global Step: 157890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:07:50,246-Speed 10862.84 samples/sec   Loss 5.5509   LearningRate 0.0048   Epoch: 31   Global Step: 157900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:07:51,237-Speed 10341.94 samples/sec   Loss 5.6287   LearningRate 0.0048   Epoch: 31   Global Step: 157910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:07:52,218-Speed 10448.96 samples/sec   Loss 5.7045   LearningRate 0.0048   Epoch: 31   Global Step: 157920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:07:53,158-Speed 10902.86 samples/sec   Loss 5.6495   LearningRate 0.0048   Epoch: 31   Global Step: 157930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:07:54,128-Speed 10566.95 samples/sec   Loss 5.6238   LearningRate 0.0048   Epoch: 31   Global Step: 157940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:07:55,162-Speed 9908.70 samples/sec   Loss 5.6999   LearningRate 0.0048   Epoch: 31   Global Step: 157950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:07:56,147-Speed 10410.77 samples/sec   Loss 5.7347   LearningRate 0.0048   Epoch: 31   Global Step: 157960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:07:57,126-Speed 10467.54 samples/sec   Loss 5.7954   LearningRate 0.0048   Epoch: 31   Global Step: 157970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:07:58,069-Speed 10860.39 samples/sec   Loss 5.6750   LearningRate 0.0048   Epoch: 31   Global Step: 157980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:07:59,110-Speed 9847.06 samples/sec   Loss 5.6433   LearningRate 0.0048   Epoch: 31   Global Step: 157990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:08:00,109-Speed 10254.54 samples/sec   Loss 5.7247   LearningRate 0.0048   Epoch: 31   Global Step: 158000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:08:22,432-[lfw][158000]XNorm: 8.522117
Training: 2022-04-11 05:08:22,433-[lfw][158000]Accuracy-Flip: 0.99667+-0.00258
Training: 2022-04-11 05:08:22,434-[lfw][158000]Accuracy-Highest: 0.99700
Training: 2022-04-11 05:08:48,015-[cfp_fp][158000]XNorm: 7.320941
Training: 2022-04-11 05:08:48,016-[cfp_fp][158000]Accuracy-Flip: 0.96929+-0.01009
Training: 2022-04-11 05:08:48,017-[cfp_fp][158000]Accuracy-Highest: 0.97071
Training: 2022-04-11 05:09:10,128-[agedb_30][158000]XNorm: 8.332011
Training: 2022-04-11 05:09:10,129-[agedb_30][158000]Accuracy-Flip: 0.97117+-0.00866
Training: 2022-04-11 05:09:10,129-[agedb_30][158000]Accuracy-Highest: 0.97250
Training: 2022-04-11 05:09:11,152-Speed 144.14 samples/sec   Loss 5.7606   LearningRate 0.0048   Epoch: 31   Global Step: 158010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:12,122-Speed 10554.91 samples/sec   Loss 5.6382   LearningRate 0.0048   Epoch: 31   Global Step: 158020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:13,084-Speed 10658.84 samples/sec   Loss 5.6535   LearningRate 0.0048   Epoch: 31   Global Step: 158030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:14,115-Speed 9934.36 samples/sec   Loss 5.5939   LearningRate 0.0048   Epoch: 31   Global Step: 158040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:15,120-Speed 10201.83 samples/sec   Loss 5.5747   LearningRate 0.0048   Epoch: 31   Global Step: 158050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:09:16,081-Speed 10678.82 samples/sec   Loss 5.7468   LearningRate 0.0048   Epoch: 31   Global Step: 158060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:09:17,070-Speed 10355.71 samples/sec   Loss 5.7235   LearningRate 0.0048   Epoch: 31   Global Step: 158070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:09:18,088-Speed 10073.34 samples/sec   Loss 5.7103   LearningRate 0.0048   Epoch: 31   Global Step: 158080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:09:19,124-Speed 9894.19 samples/sec   Loss 5.5698   LearningRate 0.0048   Epoch: 31   Global Step: 158090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:09:20,074-Speed 10785.46 samples/sec   Loss 5.7113   LearningRate 0.0048   Epoch: 31   Global Step: 158100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:09:21,062-Speed 10377.73 samples/sec   Loss 5.7530   LearningRate 0.0048   Epoch: 31   Global Step: 158110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:09:22,083-Speed 10032.05 samples/sec   Loss 5.6367   LearningRate 0.0048   Epoch: 31   Global Step: 158120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:09:23,070-Speed 10383.94 samples/sec   Loss 5.9011   LearningRate 0.0048   Epoch: 31   Global Step: 158130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:09:24,065-Speed 10301.52 samples/sec   Loss 5.5712   LearningRate 0.0048   Epoch: 31   Global Step: 158140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:09:25,077-Speed 10128.25 samples/sec   Loss 5.7303   LearningRate 0.0048   Epoch: 31   Global Step: 158150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:26,060-Speed 10424.70 samples/sec   Loss 5.5598   LearningRate 0.0048   Epoch: 31   Global Step: 158160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:27,086-Speed 9995.20 samples/sec   Loss 5.5999   LearningRate 0.0048   Epoch: 31   Global Step: 158170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:28,018-Speed 10999.65 samples/sec   Loss 5.7085   LearningRate 0.0048   Epoch: 31   Global Step: 158180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:29,020-Speed 10226.74 samples/sec   Loss 5.6305   LearningRate 0.0048   Epoch: 31   Global Step: 158190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:30,035-Speed 10100.44 samples/sec   Loss 5.7647   LearningRate 0.0048   Epoch: 31   Global Step: 158200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:31,043-Speed 10187.80 samples/sec   Loss 5.8197   LearningRate 0.0048   Epoch: 31   Global Step: 158210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:32,008-Speed 10617.80 samples/sec   Loss 5.8646   LearningRate 0.0048   Epoch: 31   Global Step: 158220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:33,003-Speed 10306.35 samples/sec   Loss 5.6726   LearningRate 0.0047   Epoch: 31   Global Step: 158230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:33,989-Speed 10387.91 samples/sec   Loss 5.6247   LearningRate 0.0047   Epoch: 31   Global Step: 158240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:34,950-Speed 10672.67 samples/sec   Loss 5.6995   LearningRate 0.0047   Epoch: 31   Global Step: 158250   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:09:35,945-Speed 10298.90 samples/sec   Loss 5.6821   LearningRate 0.0047   Epoch: 31   Global Step: 158260   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:09:36,936-Speed 10348.80 samples/sec   Loss 5.7161   LearningRate 0.0047   Epoch: 31   Global Step: 158270   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:09:37,887-Speed 10771.89 samples/sec   Loss 5.6302   LearningRate 0.0047   Epoch: 31   Global Step: 158280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:38,887-Speed 10250.28 samples/sec   Loss 5.6083   LearningRate 0.0047   Epoch: 31   Global Step: 158290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:39,902-Speed 10096.27 samples/sec   Loss 5.7426   LearningRate 0.0047   Epoch: 31   Global Step: 158300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:40,868-Speed 10627.96 samples/sec   Loss 5.6887   LearningRate 0.0047   Epoch: 31   Global Step: 158310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:09:41,836-Speed 10694.87 samples/sec   Loss 5.6685   LearningRate 0.0047   Epoch: 31   Global Step: 158320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:09:42,977-Speed 8977.31 samples/sec   Loss 5.8019   LearningRate 0.0047   Epoch: 31   Global Step: 158330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:09:43,949-Speed 10550.53 samples/sec   Loss 5.7899   LearningRate 0.0047   Epoch: 31   Global Step: 158340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:09:44,968-Speed 10052.07 samples/sec   Loss 5.7436   LearningRate 0.0047   Epoch: 31   Global Step: 158350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:09:45,937-Speed 10576.74 samples/sec   Loss 5.8025   LearningRate 0.0047   Epoch: 31   Global Step: 158360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:09:46,978-Speed 9848.07 samples/sec   Loss 5.7184   LearningRate 0.0047   Epoch: 31   Global Step: 158370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:09:47,979-Speed 10239.17 samples/sec   Loss 5.6786   LearningRate 0.0047   Epoch: 31   Global Step: 158380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:09:48,978-Speed 10255.40 samples/sec   Loss 5.6850   LearningRate 0.0047   Epoch: 31   Global Step: 158390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:09:49,955-Speed 10493.63 samples/sec   Loss 5.6713   LearningRate 0.0047   Epoch: 31   Global Step: 158400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:09:50,960-Speed 10196.55 samples/sec   Loss 5.7545   LearningRate 0.0047   Epoch: 31   Global Step: 158410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:51,935-Speed 10512.05 samples/sec   Loss 5.8337   LearningRate 0.0047   Epoch: 31   Global Step: 158420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:52,905-Speed 10567.54 samples/sec   Loss 5.7196   LearningRate 0.0047   Epoch: 31   Global Step: 158430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:53,923-Speed 10069.53 samples/sec   Loss 5.5966   LearningRate 0.0047   Epoch: 31   Global Step: 158440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:54,919-Speed 10285.36 samples/sec   Loss 5.8647   LearningRate 0.0047   Epoch: 31   Global Step: 158450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:55,905-Speed 10396.55 samples/sec   Loss 5.6880   LearningRate 0.0047   Epoch: 31   Global Step: 158460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:56,900-Speed 10301.74 samples/sec   Loss 5.7956   LearningRate 0.0047   Epoch: 31   Global Step: 158470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:57,918-Speed 10061.60 samples/sec   Loss 5.6443   LearningRate 0.0047   Epoch: 31   Global Step: 158480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:58,936-Speed 10076.51 samples/sec   Loss 5.7022   LearningRate 0.0047   Epoch: 31   Global Step: 158490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:09:59,902-Speed 10614.70 samples/sec   Loss 5.8090   LearningRate 0.0047   Epoch: 31   Global Step: 158500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:00,843-Speed 10892.65 samples/sec   Loss 5.7337   LearningRate 0.0047   Epoch: 31   Global Step: 158510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:01,857-Speed 10107.74 samples/sec   Loss 5.5534   LearningRate 0.0047   Epoch: 31   Global Step: 158520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:02,828-Speed 10547.23 samples/sec   Loss 5.6952   LearningRate 0.0047   Epoch: 31   Global Step: 158530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:03,813-Speed 10407.77 samples/sec   Loss 5.6372   LearningRate 0.0047   Epoch: 31   Global Step: 158540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:04,865-Speed 9741.18 samples/sec   Loss 5.8959   LearningRate 0.0047   Epoch: 31   Global Step: 158550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:05,856-Speed 10343.22 samples/sec   Loss 5.8881   LearningRate 0.0047   Epoch: 31   Global Step: 158560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:06,881-Speed 10005.89 samples/sec   Loss 5.8487   LearningRate 0.0047   Epoch: 31   Global Step: 158570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:07,872-Speed 10344.96 samples/sec   Loss 5.6876   LearningRate 0.0047   Epoch: 31   Global Step: 158580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:08,886-Speed 10107.60 samples/sec   Loss 5.6535   LearningRate 0.0047   Epoch: 31   Global Step: 158590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:09,870-Speed 10413.90 samples/sec   Loss 5.8420   LearningRate 0.0047   Epoch: 31   Global Step: 158600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:10,859-Speed 10357.24 samples/sec   Loss 5.6616   LearningRate 0.0047   Epoch: 31   Global Step: 158610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:11,802-Speed 10874.35 samples/sec   Loss 5.6407   LearningRate 0.0047   Epoch: 31   Global Step: 158620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:12,745-Speed 10864.81 samples/sec   Loss 5.8016   LearningRate 0.0047   Epoch: 31   Global Step: 158630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:13,733-Speed 10373.06 samples/sec   Loss 5.8568   LearningRate 0.0047   Epoch: 31   Global Step: 158640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:14,688-Speed 10734.84 samples/sec   Loss 5.7177   LearningRate 0.0047   Epoch: 31   Global Step: 158650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:15,671-Speed 10428.38 samples/sec   Loss 5.7685   LearningRate 0.0047   Epoch: 31   Global Step: 158660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:16,689-Speed 10068.68 samples/sec   Loss 5.6697   LearningRate 0.0047   Epoch: 31   Global Step: 158670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:17,680-Speed 10338.21 samples/sec   Loss 5.8950   LearningRate 0.0047   Epoch: 31   Global Step: 158680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:18,707-Speed 9987.48 samples/sec   Loss 5.8056   LearningRate 0.0047   Epoch: 31   Global Step: 158690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:19,661-Speed 10738.99 samples/sec   Loss 5.6916   LearningRate 0.0046   Epoch: 31   Global Step: 158700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:20,664-Speed 10217.52 samples/sec   Loss 5.7275   LearningRate 0.0046   Epoch: 31   Global Step: 158710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:21,691-Speed 9981.29 samples/sec   Loss 5.6144   LearningRate 0.0046   Epoch: 31   Global Step: 158720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:22,728-Speed 9884.09 samples/sec   Loss 5.7926   LearningRate 0.0046   Epoch: 31   Global Step: 158730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:23,725-Speed 10281.02 samples/sec   Loss 5.6624   LearningRate 0.0046   Epoch: 31   Global Step: 158740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:24,703-Speed 10471.92 samples/sec   Loss 5.7434   LearningRate 0.0046   Epoch: 31   Global Step: 158750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:25,699-Speed 10292.90 samples/sec   Loss 5.7774   LearningRate 0.0046   Epoch: 31   Global Step: 158760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:26,615-Speed 11192.27 samples/sec   Loss 5.8294   LearningRate 0.0046   Epoch: 31   Global Step: 158770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:27,611-Speed 10480.96 samples/sec   Loss 5.7786   LearningRate 0.0046   Epoch: 31   Global Step: 158780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:28,591-Speed 10462.25 samples/sec   Loss 5.7300   LearningRate 0.0046   Epoch: 31   Global Step: 158790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:29,604-Speed 10122.76 samples/sec   Loss 5.8203   LearningRate 0.0046   Epoch: 31   Global Step: 158800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:30,586-Speed 10432.79 samples/sec   Loss 5.7599   LearningRate 0.0046   Epoch: 31   Global Step: 158810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:31,583-Speed 10273.34 samples/sec   Loss 5.7388   LearningRate 0.0046   Epoch: 31   Global Step: 158820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:32,592-Speed 10163.24 samples/sec   Loss 5.8097   LearningRate 0.0046   Epoch: 31   Global Step: 158830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:33,570-Speed 10474.73 samples/sec   Loss 5.7954   LearningRate 0.0046   Epoch: 31   Global Step: 158840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:34,592-Speed 10032.14 samples/sec   Loss 5.7506   LearningRate 0.0046   Epoch: 31   Global Step: 158850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:35,540-Speed 10805.74 samples/sec   Loss 5.6124   LearningRate 0.0046   Epoch: 31   Global Step: 158860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:36,516-Speed 10508.80 samples/sec   Loss 5.7240   LearningRate 0.0046   Epoch: 31   Global Step: 158870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:37,494-Speed 10476.64 samples/sec   Loss 5.9073   LearningRate 0.0046   Epoch: 31   Global Step: 158880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:38,501-Speed 10179.80 samples/sec   Loss 5.7233   LearningRate 0.0046   Epoch: 31   Global Step: 158890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:39,461-Speed 10673.07 samples/sec   Loss 5.8021   LearningRate 0.0046   Epoch: 31   Global Step: 158900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:40,430-Speed 10572.54 samples/sec   Loss 5.8230   LearningRate 0.0046   Epoch: 31   Global Step: 158910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:41,457-Speed 9978.48 samples/sec   Loss 5.7224   LearningRate 0.0046   Epoch: 31   Global Step: 158920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:42,507-Speed 9765.41 samples/sec   Loss 5.8320   LearningRate 0.0046   Epoch: 31   Global Step: 158930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:43,512-Speed 10217.23 samples/sec   Loss 5.6410   LearningRate 0.0046   Epoch: 31   Global Step: 158940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:44,509-Speed 10278.95 samples/sec   Loss 5.8448   LearningRate 0.0046   Epoch: 31   Global Step: 158950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:45,491-Speed 10428.63 samples/sec   Loss 5.6908   LearningRate 0.0046   Epoch: 31   Global Step: 158960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:46,446-Speed 10742.96 samples/sec   Loss 5.7807   LearningRate 0.0046   Epoch: 31   Global Step: 158970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:47,483-Speed 9885.04 samples/sec   Loss 5.8512   LearningRate 0.0046   Epoch: 31   Global Step: 158980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:48,441-Speed 10689.98 samples/sec   Loss 5.5674   LearningRate 0.0046   Epoch: 31   Global Step: 158990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:10:49,435-Speed 10310.41 samples/sec   Loss 5.8056   LearningRate 0.0046   Epoch: 31   Global Step: 159000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:50,459-Speed 10013.58 samples/sec   Loss 5.6865   LearningRate 0.0046   Epoch: 31   Global Step: 159010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:51,489-Speed 9948.87 samples/sec   Loss 5.8878   LearningRate 0.0046   Epoch: 31   Global Step: 159020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:52,478-Speed 10359.87 samples/sec   Loss 5.7586   LearningRate 0.0046   Epoch: 31   Global Step: 159030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:53,461-Speed 10430.60 samples/sec   Loss 5.7183   LearningRate 0.0046   Epoch: 31   Global Step: 159040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:54,429-Speed 10586.91 samples/sec   Loss 5.6815   LearningRate 0.0046   Epoch: 31   Global Step: 159050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:55,412-Speed 10431.26 samples/sec   Loss 5.7661   LearningRate 0.0046   Epoch: 31   Global Step: 159060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:56,357-Speed 10848.96 samples/sec   Loss 5.8079   LearningRate 0.0046   Epoch: 31   Global Step: 159070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:57,347-Speed 10346.03 samples/sec   Loss 5.7380   LearningRate 0.0046   Epoch: 31   Global Step: 159080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:58,365-Speed 10070.77 samples/sec   Loss 5.7705   LearningRate 0.0046   Epoch: 31   Global Step: 159090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:10:59,375-Speed 10154.26 samples/sec   Loss 5.8976   LearningRate 0.0046   Epoch: 31   Global Step: 159100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:00,373-Speed 10262.40 samples/sec   Loss 5.6486   LearningRate 0.0046   Epoch: 31   Global Step: 159110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:01,351-Speed 10477.95 samples/sec   Loss 5.8596   LearningRate 0.0046   Epoch: 31   Global Step: 159120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:02,364-Speed 10114.38 samples/sec   Loss 5.7100   LearningRate 0.0046   Epoch: 31   Global Step: 159130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:03,364-Speed 10256.43 samples/sec   Loss 5.7744   LearningRate 0.0046   Epoch: 31   Global Step: 159140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:04,359-Speed 10294.19 samples/sec   Loss 5.6676   LearningRate 0.0046   Epoch: 31   Global Step: 159150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:05,366-Speed 10179.25 samples/sec   Loss 5.8379   LearningRate 0.0046   Epoch: 31   Global Step: 159160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:06,335-Speed 10578.07 samples/sec   Loss 5.7581   LearningRate 0.0045   Epoch: 31   Global Step: 159170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:07,331-Speed 10287.22 samples/sec   Loss 5.7161   LearningRate 0.0045   Epoch: 31   Global Step: 159180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:08,411-Speed 9486.73 samples/sec   Loss 5.8026   LearningRate 0.0045   Epoch: 31   Global Step: 159190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:09,390-Speed 10479.96 samples/sec   Loss 5.8107   LearningRate 0.0045   Epoch: 31   Global Step: 159200   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:11:10,375-Speed 10408.74 samples/sec   Loss 5.7857   LearningRate 0.0045   Epoch: 31   Global Step: 159210   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:11:11,369-Speed 10307.41 samples/sec   Loss 5.7855   LearningRate 0.0045   Epoch: 31   Global Step: 159220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:12,344-Speed 10513.39 samples/sec   Loss 5.6406   LearningRate 0.0045   Epoch: 31   Global Step: 159230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:13,307-Speed 10642.99 samples/sec   Loss 5.8739   LearningRate 0.0045   Epoch: 31   Global Step: 159240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:14,310-Speed 10217.72 samples/sec   Loss 5.8232   LearningRate 0.0045   Epoch: 31   Global Step: 159250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:15,363-Speed 9732.98 samples/sec   Loss 5.8022   LearningRate 0.0045   Epoch: 31   Global Step: 159260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:16,359-Speed 10295.61 samples/sec   Loss 5.7415   LearningRate 0.0045   Epoch: 31   Global Step: 159270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:17,374-Speed 10101.42 samples/sec   Loss 5.8258   LearningRate 0.0045   Epoch: 31   Global Step: 159280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:18,317-Speed 10877.42 samples/sec   Loss 5.7572   LearningRate 0.0045   Epoch: 31   Global Step: 159290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:19,319-Speed 10223.25 samples/sec   Loss 5.9287   LearningRate 0.0045   Epoch: 31   Global Step: 159300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:20,362-Speed 10005.85 samples/sec   Loss 5.8850   LearningRate 0.0045   Epoch: 31   Global Step: 159310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:21,281-Speed 11157.50 samples/sec   Loss 5.7085   LearningRate 0.0045   Epoch: 31   Global Step: 159320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:22,246-Speed 10620.85 samples/sec   Loss 5.7570   LearningRate 0.0045   Epoch: 31   Global Step: 159330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:23,266-Speed 10049.76 samples/sec   Loss 5.7590   LearningRate 0.0045   Epoch: 31   Global Step: 159340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:24,303-Speed 9886.44 samples/sec   Loss 5.7847   LearningRate 0.0045   Epoch: 31   Global Step: 159350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:25,289-Speed 10394.34 samples/sec   Loss 5.8156   LearningRate 0.0045   Epoch: 31   Global Step: 159360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:26,273-Speed 10411.21 samples/sec   Loss 5.9505   LearningRate 0.0045   Epoch: 31   Global Step: 159370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:27,254-Speed 10449.92 samples/sec   Loss 5.8676   LearningRate 0.0045   Epoch: 31   Global Step: 159380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:28,290-Speed 9885.51 samples/sec   Loss 5.8689   LearningRate 0.0045   Epoch: 31   Global Step: 159390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:29,254-Speed 10637.39 samples/sec   Loss 6.0100   LearningRate 0.0045   Epoch: 31   Global Step: 159400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:30,256-Speed 10228.00 samples/sec   Loss 5.9424   LearningRate 0.0045   Epoch: 31   Global Step: 159410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:31,279-Speed 10026.46 samples/sec   Loss 5.8621   LearningRate 0.0045   Epoch: 31   Global Step: 159420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:32,251-Speed 10546.40 samples/sec   Loss 5.7531   LearningRate 0.0045   Epoch: 31   Global Step: 159430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:33,226-Speed 10512.79 samples/sec   Loss 5.7558   LearningRate 0.0045   Epoch: 31   Global Step: 159440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:34,171-Speed 10835.88 samples/sec   Loss 5.8409   LearningRate 0.0045   Epoch: 31   Global Step: 159450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:35,172-Speed 10241.96 samples/sec   Loss 5.8515   LearningRate 0.0045   Epoch: 31   Global Step: 159460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:36,146-Speed 10523.89 samples/sec   Loss 5.8615   LearningRate 0.0045   Epoch: 31   Global Step: 159470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:37,150-Speed 10211.19 samples/sec   Loss 5.8424   LearningRate 0.0045   Epoch: 31   Global Step: 159480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:38,140-Speed 10352.75 samples/sec   Loss 5.8565   LearningRate 0.0045   Epoch: 31   Global Step: 159490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:39,118-Speed 10479.32 samples/sec   Loss 5.8179   LearningRate 0.0045   Epoch: 31   Global Step: 159500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:40,059-Speed 10892.20 samples/sec   Loss 5.6704   LearningRate 0.0045   Epoch: 31   Global Step: 159510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:41,068-Speed 10160.53 samples/sec   Loss 5.7729   LearningRate 0.0045   Epoch: 31   Global Step: 159520   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:11:42,029-Speed 10664.17 samples/sec   Loss 5.7962   LearningRate 0.0045   Epoch: 31   Global Step: 159530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:42,972-Speed 10859.93 samples/sec   Loss 5.7543   LearningRate 0.0045   Epoch: 31   Global Step: 159540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:44,040-Speed 9600.34 samples/sec   Loss 5.7305   LearningRate 0.0045   Epoch: 31   Global Step: 159550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:45,028-Speed 10376.70 samples/sec   Loss 5.7535   LearningRate 0.0045   Epoch: 31   Global Step: 159560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:46,027-Speed 10253.54 samples/sec   Loss 5.7707   LearningRate 0.0045   Epoch: 31   Global Step: 159570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:47,027-Speed 10250.52 samples/sec   Loss 5.8239   LearningRate 0.0045   Epoch: 31   Global Step: 159580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:48,113-Speed 9433.19 samples/sec   Loss 5.7439   LearningRate 0.0045   Epoch: 31   Global Step: 159590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:49,102-Speed 10364.97 samples/sec   Loss 5.6518   LearningRate 0.0045   Epoch: 31   Global Step: 159600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:50,099-Speed 10286.53 samples/sec   Loss 5.8374   LearningRate 0.0045   Epoch: 31   Global Step: 159610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:51,098-Speed 10257.72 samples/sec   Loss 5.8164   LearningRate 0.0045   Epoch: 31   Global Step: 159620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:52,110-Speed 10127.27 samples/sec   Loss 5.8498   LearningRate 0.0045   Epoch: 31   Global Step: 159630   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:11:53,100-Speed 10360.04 samples/sec   Loss 5.7523   LearningRate 0.0045   Epoch: 31   Global Step: 159640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:54,076-Speed 10492.24 samples/sec   Loss 5.6970   LearningRate 0.0044   Epoch: 31   Global Step: 159650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:55,007-Speed 11009.92 samples/sec   Loss 5.7552   LearningRate 0.0044   Epoch: 31   Global Step: 159660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:56,017-Speed 10152.56 samples/sec   Loss 5.9087   LearningRate 0.0044   Epoch: 31   Global Step: 159670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:56,984-Speed 10597.13 samples/sec   Loss 5.7258   LearningRate 0.0044   Epoch: 31   Global Step: 159680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:57,972-Speed 10372.30 samples/sec   Loss 5.8017   LearningRate 0.0044   Epoch: 31   Global Step: 159690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:11:58,922-Speed 10783.00 samples/sec   Loss 5.8461   LearningRate 0.0044   Epoch: 31   Global Step: 159700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:00,017-Speed 9363.24 samples/sec   Loss 5.7849   LearningRate 0.0044   Epoch: 31   Global Step: 159710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:00,988-Speed 10557.34 samples/sec   Loss 5.8487   LearningRate 0.0044   Epoch: 31   Global Step: 159720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:01,942-Speed 10752.28 samples/sec   Loss 5.8241   LearningRate 0.0044   Epoch: 31   Global Step: 159730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:02,919-Speed 10488.11 samples/sec   Loss 5.7492   LearningRate 0.0044   Epoch: 31   Global Step: 159740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:03,910-Speed 10344.00 samples/sec   Loss 5.7307   LearningRate 0.0044   Epoch: 31   Global Step: 159750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:04,856-Speed 10836.27 samples/sec   Loss 5.8216   LearningRate 0.0044   Epoch: 31   Global Step: 159760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:05,858-Speed 10218.32 samples/sec   Loss 5.9188   LearningRate 0.0044   Epoch: 31   Global Step: 159770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:06,917-Speed 9676.70 samples/sec   Loss 5.7245   LearningRate 0.0044   Epoch: 31   Global Step: 159780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:07,929-Speed 10132.61 samples/sec   Loss 5.7821   LearningRate 0.0044   Epoch: 31   Global Step: 159790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:08,946-Speed 10078.87 samples/sec   Loss 5.7504   LearningRate 0.0044   Epoch: 31   Global Step: 159800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:09,959-Speed 10111.66 samples/sec   Loss 5.8621   LearningRate 0.0044   Epoch: 31   Global Step: 159810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:10,964-Speed 10198.74 samples/sec   Loss 5.6581   LearningRate 0.0044   Epoch: 31   Global Step: 159820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:11,952-Speed 10377.00 samples/sec   Loss 5.8095   LearningRate 0.0044   Epoch: 31   Global Step: 159830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:12,956-Speed 10205.76 samples/sec   Loss 5.7335   LearningRate 0.0044   Epoch: 31   Global Step: 159840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:13,942-Speed 10392.99 samples/sec   Loss 5.7847   LearningRate 0.0044   Epoch: 31   Global Step: 159850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:14,907-Speed 10629.16 samples/sec   Loss 5.9676   LearningRate 0.0044   Epoch: 31   Global Step: 159860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:15,844-Speed 10933.54 samples/sec   Loss 5.8284   LearningRate 0.0044   Epoch: 31   Global Step: 159870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:16,802-Speed 10695.05 samples/sec   Loss 5.9234   LearningRate 0.0044   Epoch: 31   Global Step: 159880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:12:17,879-Speed 9546.09 samples/sec   Loss 5.9067   LearningRate 0.0044   Epoch: 31   Global Step: 159890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:12:18,885-Speed 10184.07 samples/sec   Loss 5.6871   LearningRate 0.0044   Epoch: 31   Global Step: 159900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:12:19,856-Speed 10559.26 samples/sec   Loss 5.8488   LearningRate 0.0044   Epoch: 31   Global Step: 159910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:12:20,808-Speed 10765.28 samples/sec   Loss 5.8004   LearningRate 0.0044   Epoch: 31   Global Step: 159920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:12:21,816-Speed 10166.57 samples/sec   Loss 5.8466   LearningRate 0.0044   Epoch: 31   Global Step: 159930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:12:22,792-Speed 10496.66 samples/sec   Loss 5.8479   LearningRate 0.0044   Epoch: 31   Global Step: 159940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:12:23,815-Speed 10020.06 samples/sec   Loss 5.6962   LearningRate 0.0044   Epoch: 31   Global Step: 159950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:12:24,864-Speed 9764.94 samples/sec   Loss 5.9389   LearningRate 0.0044   Epoch: 31   Global Step: 159960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:12:25,838-Speed 10528.04 samples/sec   Loss 5.8283   LearningRate 0.0044   Epoch: 31   Global Step: 159970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:12:26,791-Speed 10751.21 samples/sec   Loss 5.6472   LearningRate 0.0044   Epoch: 31   Global Step: 159980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:27,740-Speed 10795.53 samples/sec   Loss 5.7619   LearningRate 0.0044   Epoch: 31   Global Step: 159990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:28,774-Speed 9915.56 samples/sec   Loss 5.8483   LearningRate 0.0044   Epoch: 31   Global Step: 160000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:12:51,452-[lfw][160000]XNorm: 8.348365
Training: 2022-04-11 05:12:51,453-[lfw][160000]Accuracy-Flip: 0.99667+-0.00307
Training: 2022-04-11 05:12:51,453-[lfw][160000]Accuracy-Highest: 0.99700
Training: 2022-04-11 05:13:16,893-[cfp_fp][160000]XNorm: 7.147081
Training: 2022-04-11 05:13:16,894-[cfp_fp][160000]Accuracy-Flip: 0.97200+-0.00793
Training: 2022-04-11 05:13:16,895-[cfp_fp][160000]Accuracy-Highest: 0.97200
Training: 2022-04-11 05:13:38,936-[agedb_30][160000]XNorm: 8.150915
Training: 2022-04-11 05:13:38,936-[agedb_30][160000]Accuracy-Flip: 0.97300+-0.00670
Training: 2022-04-11 05:13:38,937-[agedb_30][160000]Accuracy-Highest: 0.97300
Training: 2022-04-11 05:13:39,890-Speed 143.99 samples/sec   Loss 5.7534   LearningRate 0.0044   Epoch: 31   Global Step: 160010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:13:40,858-Speed 10616.12 samples/sec   Loss 5.7914   LearningRate 0.0044   Epoch: 31   Global Step: 160020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:13:41,822-Speed 10631.48 samples/sec   Loss 5.7846   LearningRate 0.0044   Epoch: 31   Global Step: 160030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:13:42,844-Speed 10033.91 samples/sec   Loss 5.8678   LearningRate 0.0044   Epoch: 31   Global Step: 160040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:13:43,762-Speed 11158.71 samples/sec   Loss 5.7990   LearningRate 0.0044   Epoch: 31   Global Step: 160050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:13:44,760-Speed 10267.07 samples/sec   Loss 5.7974   LearningRate 0.0044   Epoch: 31   Global Step: 160060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:13:45,769-Speed 10155.09 samples/sec   Loss 5.6804   LearningRate 0.0044   Epoch: 31   Global Step: 160070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:13:46,721-Speed 10764.64 samples/sec   Loss 5.7749   LearningRate 0.0044   Epoch: 31   Global Step: 160080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:13:47,765-Speed 9832.85 samples/sec   Loss 5.8009   LearningRate 0.0044   Epoch: 31   Global Step: 160090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:13:48,736-Speed 10553.91 samples/sec   Loss 5.9370   LearningRate 0.0044   Epoch: 31   Global Step: 160100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:13:49,714-Speed 10485.25 samples/sec   Loss 5.8810   LearningRate 0.0044   Epoch: 31   Global Step: 160110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:13:50,714-Speed 10262.08 samples/sec   Loss 5.8362   LearningRate 0.0044   Epoch: 31   Global Step: 160120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:13:51,731-Speed 10073.23 samples/sec   Loss 5.9333   LearningRate 0.0043   Epoch: 31   Global Step: 160130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:13:52,735-Speed 10232.30 samples/sec   Loss 5.7823   LearningRate 0.0043   Epoch: 31   Global Step: 160140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:13:53,730-Speed 10292.59 samples/sec   Loss 5.7428   LearningRate 0.0043   Epoch: 31   Global Step: 160150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:13:54,680-Speed 10795.51 samples/sec   Loss 5.8992   LearningRate 0.0043   Epoch: 31   Global Step: 160160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:13:55,624-Speed 10851.31 samples/sec   Loss 5.6918   LearningRate 0.0043   Epoch: 31   Global Step: 160170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:13:56,579-Speed 10729.10 samples/sec   Loss 5.7339   LearningRate 0.0043   Epoch: 31   Global Step: 160180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:13:57,575-Speed 10296.71 samples/sec   Loss 5.8682   LearningRate 0.0043   Epoch: 31   Global Step: 160190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:13:58,559-Speed 10409.43 samples/sec   Loss 5.8704   LearningRate 0.0043   Epoch: 31   Global Step: 160200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:13:59,560-Speed 10246.19 samples/sec   Loss 5.9025   LearningRate 0.0043   Epoch: 31   Global Step: 160210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:00,584-Speed 10009.77 samples/sec   Loss 5.8645   LearningRate 0.0043   Epoch: 31   Global Step: 160220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:01,610-Speed 9987.52 samples/sec   Loss 5.8813   LearningRate 0.0043   Epoch: 31   Global Step: 160230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:02,579-Speed 10583.89 samples/sec   Loss 5.7684   LearningRate 0.0043   Epoch: 31   Global Step: 160240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:03,545-Speed 10605.16 samples/sec   Loss 5.8309   LearningRate 0.0043   Epoch: 31   Global Step: 160250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:04,557-Speed 10123.61 samples/sec   Loss 5.8169   LearningRate 0.0043   Epoch: 31   Global Step: 160260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:05,586-Speed 9964.16 samples/sec   Loss 5.8031   LearningRate 0.0043   Epoch: 31   Global Step: 160270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:06,587-Speed 10236.28 samples/sec   Loss 5.8103   LearningRate 0.0043   Epoch: 31   Global Step: 160280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:07,584-Speed 10277.05 samples/sec   Loss 5.8440   LearningRate 0.0043   Epoch: 31   Global Step: 160290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:08,562-Speed 10487.49 samples/sec   Loss 5.8880   LearningRate 0.0043   Epoch: 31   Global Step: 160300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:09,535-Speed 10528.21 samples/sec   Loss 5.7773   LearningRate 0.0043   Epoch: 31   Global Step: 160310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:10,518-Speed 10433.74 samples/sec   Loss 5.9871   LearningRate 0.0043   Epoch: 31   Global Step: 160320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:11,444-Speed 11066.23 samples/sec   Loss 5.7481   LearningRate 0.0043   Epoch: 31   Global Step: 160330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:12,409-Speed 10622.23 samples/sec   Loss 5.9766   LearningRate 0.0043   Epoch: 31   Global Step: 160340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:13,418-Speed 10162.40 samples/sec   Loss 5.9338   LearningRate 0.0043   Epoch: 31   Global Step: 160350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:14,391-Speed 10536.57 samples/sec   Loss 5.8058   LearningRate 0.0043   Epoch: 31   Global Step: 160360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:15,389-Speed 10275.61 samples/sec   Loss 5.9237   LearningRate 0.0043   Epoch: 31   Global Step: 160370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:16,399-Speed 10147.27 samples/sec   Loss 5.7162   LearningRate 0.0043   Epoch: 31   Global Step: 160380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:17,365-Speed 10606.63 samples/sec   Loss 5.8960   LearningRate 0.0043   Epoch: 31   Global Step: 160390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:18,454-Speed 9412.42 samples/sec   Loss 5.7436   LearningRate 0.0043   Epoch: 31   Global Step: 160400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:19,422-Speed 10586.42 samples/sec   Loss 5.7679   LearningRate 0.0043   Epoch: 31   Global Step: 160410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:20,346-Speed 11090.42 samples/sec   Loss 5.9302   LearningRate 0.0043   Epoch: 31   Global Step: 160420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:21,333-Speed 10381.29 samples/sec   Loss 5.7722   LearningRate 0.0043   Epoch: 31   Global Step: 160430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:22,338-Speed 10202.30 samples/sec   Loss 5.8072   LearningRate 0.0043   Epoch: 31   Global Step: 160440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:23,325-Speed 10391.99 samples/sec   Loss 5.7985   LearningRate 0.0043   Epoch: 31   Global Step: 160450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:24,305-Speed 10465.23 samples/sec   Loss 5.9368   LearningRate 0.0043   Epoch: 31   Global Step: 160460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:25,279-Speed 10517.39 samples/sec   Loss 5.8370   LearningRate 0.0043   Epoch: 31   Global Step: 160470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:26,294-Speed 10090.83 samples/sec   Loss 5.9064   LearningRate 0.0043   Epoch: 31   Global Step: 160480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:27,320-Speed 9997.41 samples/sec   Loss 5.7830   LearningRate 0.0043   Epoch: 31   Global Step: 160490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:28,322-Speed 10219.08 samples/sec   Loss 5.8151   LearningRate 0.0043   Epoch: 31   Global Step: 160500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:29,337-Speed 10105.15 samples/sec   Loss 5.9961   LearningRate 0.0043   Epoch: 31   Global Step: 160510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:30,366-Speed 9959.74 samples/sec   Loss 5.8220   LearningRate 0.0043   Epoch: 31   Global Step: 160520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:31,346-Speed 10455.28 samples/sec   Loss 5.9033   LearningRate 0.0043   Epoch: 31   Global Step: 160530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:32,314-Speed 10590.61 samples/sec   Loss 5.8494   LearningRate 0.0043   Epoch: 31   Global Step: 160540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:33,347-Speed 9917.92 samples/sec   Loss 5.8738   LearningRate 0.0043   Epoch: 31   Global Step: 160550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:34,328-Speed 10448.50 samples/sec   Loss 5.9440   LearningRate 0.0043   Epoch: 31   Global Step: 160560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:35,364-Speed 9895.97 samples/sec   Loss 5.8118   LearningRate 0.0043   Epoch: 31   Global Step: 160570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:36,369-Speed 10191.17 samples/sec   Loss 5.9401   LearningRate 0.0043   Epoch: 31   Global Step: 160580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:37,386-Speed 10075.64 samples/sec   Loss 5.9019   LearningRate 0.0043   Epoch: 31   Global Step: 160590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:38,410-Speed 10007.79 samples/sec   Loss 5.8160   LearningRate 0.0043   Epoch: 31   Global Step: 160600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:39,427-Speed 10086.52 samples/sec   Loss 5.9778   LearningRate 0.0043   Epoch: 31   Global Step: 160610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:40,396-Speed 10571.59 samples/sec   Loss 5.8079   LearningRate 0.0042   Epoch: 31   Global Step: 160620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:41,483-Speed 9429.13 samples/sec   Loss 5.9954   LearningRate 0.0042   Epoch: 31   Global Step: 160630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:42,461-Speed 10480.94 samples/sec   Loss 5.9860   LearningRate 0.0042   Epoch: 31   Global Step: 160640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:43,426-Speed 10627.43 samples/sec   Loss 5.9294   LearningRate 0.0042   Epoch: 31   Global Step: 160650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:44,529-Speed 9293.44 samples/sec   Loss 5.8325   LearningRate 0.0042   Epoch: 31   Global Step: 160660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:45,494-Speed 10623.80 samples/sec   Loss 5.7687   LearningRate 0.0042   Epoch: 31   Global Step: 160670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:46,473-Speed 10471.56 samples/sec   Loss 6.0123   LearningRate 0.0042   Epoch: 31   Global Step: 160680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:47,485-Speed 10124.94 samples/sec   Loss 5.8977   LearningRate 0.0042   Epoch: 31   Global Step: 160690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:48,496-Speed 10143.80 samples/sec   Loss 5.7900   LearningRate 0.0042   Epoch: 31   Global Step: 160700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:49,487-Speed 10341.42 samples/sec   Loss 5.9687   LearningRate 0.0042   Epoch: 31   Global Step: 160710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:50,483-Speed 10295.33 samples/sec   Loss 5.8954   LearningRate 0.0042   Epoch: 31   Global Step: 160720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:51,485-Speed 10221.55 samples/sec   Loss 5.9839   LearningRate 0.0042   Epoch: 31   Global Step: 160730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:52,489-Speed 10214.73 samples/sec   Loss 5.8804   LearningRate 0.0042   Epoch: 31   Global Step: 160740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:53,463-Speed 10522.87 samples/sec   Loss 5.9600   LearningRate 0.0042   Epoch: 31   Global Step: 160750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:54,444-Speed 10447.86 samples/sec   Loss 5.7316   LearningRate 0.0042   Epoch: 31   Global Step: 160760   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:14:55,411-Speed 10593.53 samples/sec   Loss 5.8807   LearningRate 0.0042   Epoch: 31   Global Step: 160770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:56,406-Speed 10305.32 samples/sec   Loss 5.9088   LearningRate 0.0042   Epoch: 31   Global Step: 160780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:14:57,479-Speed 9555.39 samples/sec   Loss 5.9127   LearningRate 0.0042   Epoch: 31   Global Step: 160790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:58,499-Speed 10063.20 samples/sec   Loss 5.8152   LearningRate 0.0042   Epoch: 31   Global Step: 160800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:14:59,500-Speed 10233.11 samples/sec   Loss 5.8841   LearningRate 0.0042   Epoch: 31   Global Step: 160810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:15:00,518-Speed 10068.71 samples/sec   Loss 5.9000   LearningRate 0.0042   Epoch: 31   Global Step: 160820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:15:01,575-Speed 9694.35 samples/sec   Loss 5.8547   LearningRate 0.0042   Epoch: 31   Global Step: 160830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:15:02,552-Speed 10500.27 samples/sec   Loss 5.9430   LearningRate 0.0042   Epoch: 31   Global Step: 160840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:15:03,531-Speed 10458.41 samples/sec   Loss 5.8667   LearningRate 0.0042   Epoch: 31   Global Step: 160850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:15:04,579-Speed 9782.80 samples/sec   Loss 5.8863   LearningRate 0.0042   Epoch: 31   Global Step: 160860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:15:05,546-Speed 10600.46 samples/sec   Loss 5.9487   LearningRate 0.0042   Epoch: 31   Global Step: 160870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:15:06,509-Speed 10650.53 samples/sec   Loss 5.7692   LearningRate 0.0042   Epoch: 31   Global Step: 160880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:15:07,503-Speed 10309.59 samples/sec   Loss 5.7875   LearningRate 0.0042   Epoch: 31   Global Step: 160890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:08,507-Speed 10218.47 samples/sec   Loss 5.8473   LearningRate 0.0042   Epoch: 31   Global Step: 160900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:09,476-Speed 10575.24 samples/sec   Loss 5.9143   LearningRate 0.0042   Epoch: 31   Global Step: 160910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:10,473-Speed 10275.17 samples/sec   Loss 5.8253   LearningRate 0.0042   Epoch: 31   Global Step: 160920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:11,461-Speed 10375.92 samples/sec   Loss 5.7590   LearningRate 0.0042   Epoch: 31   Global Step: 160930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:12,468-Speed 10189.01 samples/sec   Loss 5.8383   LearningRate 0.0042   Epoch: 31   Global Step: 160940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:13,466-Speed 10267.11 samples/sec   Loss 5.8593   LearningRate 0.0042   Epoch: 31   Global Step: 160950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:14,480-Speed 10104.45 samples/sec   Loss 5.7573   LearningRate 0.0042   Epoch: 31   Global Step: 160960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:15,475-Speed 10302.67 samples/sec   Loss 5.9292   LearningRate 0.0042   Epoch: 31   Global Step: 160970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:16,475-Speed 10252.89 samples/sec   Loss 5.7341   LearningRate 0.0042   Epoch: 31   Global Step: 160980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:17,511-Speed 9894.10 samples/sec   Loss 5.9201   LearningRate 0.0042   Epoch: 31   Global Step: 160990   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:15:18,496-Speed 10406.01 samples/sec   Loss 5.8256   LearningRate 0.0042   Epoch: 31   Global Step: 161000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:19,463-Speed 10596.99 samples/sec   Loss 6.0061   LearningRate 0.0042   Epoch: 31   Global Step: 161010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:20,475-Speed 10124.37 samples/sec   Loss 5.9730   LearningRate 0.0042   Epoch: 31   Global Step: 161020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:21,507-Speed 9939.23 samples/sec   Loss 5.8310   LearningRate 0.0042   Epoch: 31   Global Step: 161030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:22,502-Speed 10301.45 samples/sec   Loss 5.9474   LearningRate 0.0042   Epoch: 31   Global Step: 161040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:23,447-Speed 10842.16 samples/sec   Loss 5.8937   LearningRate 0.0042   Epoch: 31   Global Step: 161050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:24,441-Speed 10306.23 samples/sec   Loss 5.9397   LearningRate 0.0042   Epoch: 31   Global Step: 161060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:25,398-Speed 10712.48 samples/sec   Loss 5.9617   LearningRate 0.0042   Epoch: 31   Global Step: 161070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:26,384-Speed 10401.85 samples/sec   Loss 5.9202   LearningRate 0.0042   Epoch: 31   Global Step: 161080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:27,363-Speed 10466.53 samples/sec   Loss 5.8667   LearningRate 0.0042   Epoch: 31   Global Step: 161090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:28,380-Speed 10070.95 samples/sec   Loss 5.9431   LearningRate 0.0042   Epoch: 31   Global Step: 161100   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:15:29,308-Speed 11046.06 samples/sec   Loss 5.9808   LearningRate 0.0041   Epoch: 31   Global Step: 161110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:30,292-Speed 10415.97 samples/sec   Loss 5.7645   LearningRate 0.0041   Epoch: 31   Global Step: 161120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:31,425-Speed 9048.95 samples/sec   Loss 5.8060   LearningRate 0.0041   Epoch: 31   Global Step: 161130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:32,407-Speed 10438.29 samples/sec   Loss 5.8995   LearningRate 0.0041   Epoch: 31   Global Step: 161140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:33,346-Speed 10907.51 samples/sec   Loss 5.9405   LearningRate 0.0041   Epoch: 31   Global Step: 161150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:34,395-Speed 9777.30 samples/sec   Loss 5.9107   LearningRate 0.0041   Epoch: 31   Global Step: 161160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:35,328-Speed 10975.60 samples/sec   Loss 5.9792   LearningRate 0.0041   Epoch: 31   Global Step: 161170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:36,314-Speed 10399.39 samples/sec   Loss 5.9072   LearningRate 0.0041   Epoch: 31   Global Step: 161180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:37,316-Speed 10226.85 samples/sec   Loss 5.7889   LearningRate 0.0041   Epoch: 31   Global Step: 161190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:38,319-Speed 10212.20 samples/sec   Loss 5.9501   LearningRate 0.0041   Epoch: 31   Global Step: 161200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:39,251-Speed 11001.56 samples/sec   Loss 5.9829   LearningRate 0.0041   Epoch: 31   Global Step: 161210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:40,245-Speed 10317.51 samples/sec   Loss 5.8752   LearningRate 0.0041   Epoch: 31   Global Step: 161220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:41,277-Speed 9932.53 samples/sec   Loss 6.1468   LearningRate 0.0041   Epoch: 31   Global Step: 161230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:42,290-Speed 10125.88 samples/sec   Loss 5.8773   LearningRate 0.0041   Epoch: 31   Global Step: 161240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:43,316-Speed 9992.29 samples/sec   Loss 5.8657   LearningRate 0.0041   Epoch: 31   Global Step: 161250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:44,328-Speed 10131.52 samples/sec   Loss 5.8525   LearningRate 0.0041   Epoch: 31   Global Step: 161260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:45,338-Speed 10142.98 samples/sec   Loss 6.0995   LearningRate 0.0041   Epoch: 31   Global Step: 161270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:46,356-Speed 10065.78 samples/sec   Loss 5.8157   LearningRate 0.0041   Epoch: 31   Global Step: 161280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:47,334-Speed 10487.72 samples/sec   Loss 5.9910   LearningRate 0.0041   Epoch: 31   Global Step: 161290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:48,311-Speed 10488.38 samples/sec   Loss 5.8881   LearningRate 0.0041   Epoch: 31   Global Step: 161300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:49,277-Speed 10607.09 samples/sec   Loss 5.8775   LearningRate 0.0041   Epoch: 31   Global Step: 161310   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:15:50,249-Speed 10541.61 samples/sec   Loss 5.9036   LearningRate 0.0041   Epoch: 31   Global Step: 161320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:51,274-Speed 9996.62 samples/sec   Loss 5.7702   LearningRate 0.0041   Epoch: 31   Global Step: 161330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:52,259-Speed 10408.26 samples/sec   Loss 5.9477   LearningRate 0.0041   Epoch: 31   Global Step: 161340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:53,244-Speed 10407.76 samples/sec   Loss 6.0860   LearningRate 0.0041   Epoch: 31   Global Step: 161350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:54,238-Speed 10309.43 samples/sec   Loss 5.7865   LearningRate 0.0041   Epoch: 31   Global Step: 161360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:55,248-Speed 10141.61 samples/sec   Loss 5.9616   LearningRate 0.0041   Epoch: 31   Global Step: 161370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:56,184-Speed 10957.32 samples/sec   Loss 5.9813   LearningRate 0.0041   Epoch: 31   Global Step: 161380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:57,145-Speed 10664.75 samples/sec   Loss 6.0085   LearningRate 0.0041   Epoch: 31   Global Step: 161390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:58,129-Speed 10413.81 samples/sec   Loss 5.7933   LearningRate 0.0041   Epoch: 31   Global Step: 161400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:15:59,137-Speed 10162.44 samples/sec   Loss 5.8653   LearningRate 0.0041   Epoch: 31   Global Step: 161410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:00,062-Speed 11091.62 samples/sec   Loss 5.9737   LearningRate 0.0041   Epoch: 31   Global Step: 161420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:01,047-Speed 10396.91 samples/sec   Loss 5.9657   LearningRate 0.0041   Epoch: 31   Global Step: 161430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:16:02,108-Speed 9658.74 samples/sec   Loss 5.9781   LearningRate 0.0041   Epoch: 31   Global Step: 161440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:16:03,124-Speed 10083.48 samples/sec   Loss 5.9805   LearningRate 0.0041   Epoch: 31   Global Step: 161450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:16:04,104-Speed 10460.28 samples/sec   Loss 5.8647   LearningRate 0.0041   Epoch: 31   Global Step: 161460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:16:05,072-Speed 10588.04 samples/sec   Loss 5.9121   LearningRate 0.0041   Epoch: 31   Global Step: 161470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:16:06,069-Speed 10285.86 samples/sec   Loss 6.0220   LearningRate 0.0041   Epoch: 31   Global Step: 161480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:16:07,065-Speed 10289.41 samples/sec   Loss 6.0687   LearningRate 0.0041   Epoch: 31   Global Step: 161490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:16:08,065-Speed 10249.49 samples/sec   Loss 5.8497   LearningRate 0.0041   Epoch: 31   Global Step: 161500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:16:09,055-Speed 10357.11 samples/sec   Loss 5.9398   LearningRate 0.0041   Epoch: 31   Global Step: 161510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:16:10,042-Speed 10379.56 samples/sec   Loss 5.9076   LearningRate 0.0041   Epoch: 31   Global Step: 161520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 05:16:11,054-Speed 10140.99 samples/sec   Loss 5.8627   LearningRate 0.0041   Epoch: 31   Global Step: 161530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:12,080-Speed 9984.94 samples/sec   Loss 5.9320   LearningRate 0.0041   Epoch: 31   Global Step: 161540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:13,078-Speed 10263.87 samples/sec   Loss 5.6973   LearningRate 0.0041   Epoch: 31   Global Step: 161550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:14,083-Speed 10199.54 samples/sec   Loss 5.8996   LearningRate 0.0041   Epoch: 31   Global Step: 161560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:15,082-Speed 10258.15 samples/sec   Loss 5.9219   LearningRate 0.0041   Epoch: 31   Global Step: 161570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:16,048-Speed 10605.08 samples/sec   Loss 5.8727   LearningRate 0.0041   Epoch: 31   Global Step: 161580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:17,004-Speed 10727.07 samples/sec   Loss 5.9836   LearningRate 0.0041   Epoch: 31   Global Step: 161590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:18,013-Speed 10153.16 samples/sec   Loss 6.0788   LearningRate 0.0041   Epoch: 31   Global Step: 161600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:18,982-Speed 10583.62 samples/sec   Loss 5.8363   LearningRate 0.0040   Epoch: 31   Global Step: 161610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:19,953-Speed 10551.77 samples/sec   Loss 5.9879   LearningRate 0.0040   Epoch: 31   Global Step: 161620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:20,960-Speed 10178.59 samples/sec   Loss 5.8518   LearningRate 0.0040   Epoch: 31   Global Step: 161630   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:16:21,952-Speed 10334.56 samples/sec   Loss 5.9608   LearningRate 0.0040   Epoch: 31   Global Step: 161640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:22,923-Speed 10551.94 samples/sec   Loss 5.8536   LearningRate 0.0040   Epoch: 31   Global Step: 161650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:23,905-Speed 10434.83 samples/sec   Loss 5.9156   LearningRate 0.0040   Epoch: 31   Global Step: 161660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:24,924-Speed 10059.67 samples/sec   Loss 5.8531   LearningRate 0.0040   Epoch: 31   Global Step: 161670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:25,865-Speed 10897.21 samples/sec   Loss 6.0280   LearningRate 0.0040   Epoch: 31   Global Step: 161680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:26,832-Speed 10590.68 samples/sec   Loss 5.9484   LearningRate 0.0040   Epoch: 31   Global Step: 161690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:27,848-Speed 10084.03 samples/sec   Loss 5.9640   LearningRate 0.0040   Epoch: 31   Global Step: 161700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:28,826-Speed 10480.39 samples/sec   Loss 5.8469   LearningRate 0.0040   Epoch: 31   Global Step: 161710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:29,793-Speed 10609.90 samples/sec   Loss 5.9346   LearningRate 0.0040   Epoch: 31   Global Step: 161720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:30,842-Speed 9769.08 samples/sec   Loss 6.0208   LearningRate 0.0040   Epoch: 31   Global Step: 161730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:31,793-Speed 10776.03 samples/sec   Loss 5.8436   LearningRate 0.0040   Epoch: 31   Global Step: 161740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:32,778-Speed 10407.40 samples/sec   Loss 5.8297   LearningRate 0.0040   Epoch: 31   Global Step: 161750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:33,783-Speed 10202.09 samples/sec   Loss 5.8479   LearningRate 0.0040   Epoch: 31   Global Step: 161760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:34,789-Speed 10181.27 samples/sec   Loss 5.8595   LearningRate 0.0040   Epoch: 31   Global Step: 161770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:35,768-Speed 10477.66 samples/sec   Loss 5.8388   LearningRate 0.0040   Epoch: 31   Global Step: 161780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:36,797-Speed 9962.78 samples/sec   Loss 5.9383   LearningRate 0.0040   Epoch: 31   Global Step: 161790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:37,750-Speed 10755.53 samples/sec   Loss 6.0078   LearningRate 0.0040   Epoch: 31   Global Step: 161800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:38,782-Speed 9926.17 samples/sec   Loss 5.9361   LearningRate 0.0040   Epoch: 31   Global Step: 161810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:39,777-Speed 10297.76 samples/sec   Loss 5.8294   LearningRate 0.0040   Epoch: 31   Global Step: 161820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:40,773-Speed 10298.48 samples/sec   Loss 5.8321   LearningRate 0.0040   Epoch: 31   Global Step: 161830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:41,797-Speed 10014.00 samples/sec   Loss 5.8632   LearningRate 0.0040   Epoch: 31   Global Step: 161840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:42,821-Speed 10010.70 samples/sec   Loss 5.9284   LearningRate 0.0040   Epoch: 31   Global Step: 161850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:53,391-Speed 969.14 samples/sec   Loss 5.5897   LearningRate 0.0040   Epoch: 32   Global Step: 161860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:54,392-Speed 10239.62 samples/sec   Loss 5.4691   LearningRate 0.0040   Epoch: 32   Global Step: 161870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:55,511-Speed 9163.35 samples/sec   Loss 5.5656   LearningRate 0.0040   Epoch: 32   Global Step: 161880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:56,612-Speed 9308.39 samples/sec   Loss 5.4634   LearningRate 0.0040   Epoch: 32   Global Step: 161890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:57,861-Speed 8204.13 samples/sec   Loss 5.4644   LearningRate 0.0040   Epoch: 32   Global Step: 161900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:58,818-Speed 10714.62 samples/sec   Loss 5.4004   LearningRate 0.0040   Epoch: 32   Global Step: 161910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:16:59,880-Speed 9654.10 samples/sec   Loss 5.4844   LearningRate 0.0040   Epoch: 32   Global Step: 161920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:17:00,908-Speed 9966.81 samples/sec   Loss 5.4272   LearningRate 0.0040   Epoch: 32   Global Step: 161930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:17:01,901-Speed 10330.71 samples/sec   Loss 5.4110   LearningRate 0.0040   Epoch: 32   Global Step: 161940   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:17:02,887-Speed 10394.02 samples/sec   Loss 5.3859   LearningRate 0.0040   Epoch: 32   Global Step: 161950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:17:03,932-Speed 9806.36 samples/sec   Loss 5.5195   LearningRate 0.0040   Epoch: 32   Global Step: 161960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:17:04,944-Speed 10133.39 samples/sec   Loss 5.2627   LearningRate 0.0040   Epoch: 32   Global Step: 161970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:17:05,907-Speed 10651.79 samples/sec   Loss 5.4628   LearningRate 0.0040   Epoch: 32   Global Step: 161980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:17:06,900-Speed 10326.58 samples/sec   Loss 5.3625   LearningRate 0.0040   Epoch: 32   Global Step: 161990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:17:07,890-Speed 10342.82 samples/sec   Loss 5.3489   LearningRate 0.0040   Epoch: 32   Global Step: 162000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:17:37,286-[lfw][162000]XNorm: 8.326912
Training: 2022-04-11 05:17:37,286-[lfw][162000]Accuracy-Flip: 0.99667+-0.00365
Training: 2022-04-11 05:17:37,286-[lfw][162000]Accuracy-Highest: 0.99700
Training: 2022-04-11 05:18:04,572-[cfp_fp][162000]XNorm: 7.187373
Training: 2022-04-11 05:18:04,572-[cfp_fp][162000]Accuracy-Flip: 0.97043+-0.00874
Training: 2022-04-11 05:18:04,573-[cfp_fp][162000]Accuracy-Highest: 0.97200
Training: 2022-04-11 05:18:26,766-[agedb_30][162000]XNorm: 8.109785
Training: 2022-04-11 05:18:26,767-[agedb_30][162000]Accuracy-Flip: 0.97167+-0.00637
Training: 2022-04-11 05:18:26,768-[agedb_30][162000]Accuracy-Highest: 0.97300
Training: 2022-04-11 05:18:27,764-Speed 128.20 samples/sec   Loss 5.4565   LearningRate 0.0040   Epoch: 32   Global Step: 162010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:18:28,735-Speed 10547.73 samples/sec   Loss 5.4468   LearningRate 0.0040   Epoch: 32   Global Step: 162020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:18:29,735-Speed 10248.50 samples/sec   Loss 5.3915   LearningRate 0.0040   Epoch: 32   Global Step: 162030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:18:30,734-Speed 10263.39 samples/sec   Loss 5.4070   LearningRate 0.0040   Epoch: 32   Global Step: 162040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:18:31,685-Speed 10795.90 samples/sec   Loss 5.3695   LearningRate 0.0040   Epoch: 32   Global Step: 162050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:18:32,663-Speed 10481.18 samples/sec   Loss 5.3714   LearningRate 0.0040   Epoch: 32   Global Step: 162060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:18:33,721-Speed 9694.04 samples/sec   Loss 5.3021   LearningRate 0.0040   Epoch: 32   Global Step: 162070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:18:34,740-Speed 10055.63 samples/sec   Loss 5.4172   LearningRate 0.0040   Epoch: 32   Global Step: 162080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:18:35,928-Speed 8630.76 samples/sec   Loss 5.3668   LearningRate 0.0040   Epoch: 32   Global Step: 162090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:18:36,875-Speed 10814.70 samples/sec   Loss 5.4971   LearningRate 0.0040   Epoch: 32   Global Step: 162100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:18:37,898-Speed 10018.98 samples/sec   Loss 5.4307   LearningRate 0.0039   Epoch: 32   Global Step: 162110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:18:38,883-Speed 10408.23 samples/sec   Loss 5.4740   LearningRate 0.0039   Epoch: 32   Global Step: 162120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:18:39,913-Speed 9946.09 samples/sec   Loss 5.4504   LearningRate 0.0039   Epoch: 32   Global Step: 162130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:18:40,928-Speed 10106.92 samples/sec   Loss 5.4361   LearningRate 0.0039   Epoch: 32   Global Step: 162140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:18:41,924-Speed 10291.75 samples/sec   Loss 5.6077   LearningRate 0.0039   Epoch: 32   Global Step: 162150   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 05:18:42,913-Speed 10365.09 samples/sec   Loss 5.4121   LearningRate 0.0039   Epoch: 32   Global Step: 162160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:18:43,993-Speed 9484.23 samples/sec   Loss 5.5230   LearningRate 0.0039   Epoch: 32   Global Step: 162170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:18:44,988-Speed 10310.39 samples/sec   Loss 5.4152   LearningRate 0.0039   Epoch: 32   Global Step: 162180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:18:45,977-Speed 10361.11 samples/sec   Loss 5.4317   LearningRate 0.0039   Epoch: 32   Global Step: 162190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:18:46,961-Speed 10420.61 samples/sec   Loss 5.3776   LearningRate 0.0039   Epoch: 32   Global Step: 162200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:18:48,003-Speed 9835.28 samples/sec   Loss 5.4675   LearningRate 0.0039   Epoch: 32   Global Step: 162210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 05:18:48,951-Speed 10812.38 samples/sec   Loss 5.4355   LearningRate 0.0039   Epoch: 32   Global Step: 162220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:18:49,918-Speed 10595.83 samples/sec   Loss 5.3654   LearningRate 0.0039   Epoch: 32   Global Step: 162230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:18:50,946-Speed 9968.81 samples/sec   Loss 5.4438   LearningRate 0.0039   Epoch: 32   Global Step: 162240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:18:51,933-Speed 10384.65 samples/sec   Loss 5.3594   LearningRate 0.0039   Epoch: 32   Global Step: 162250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:18:52,898-Speed 10629.42 samples/sec   Loss 5.5147   LearningRate 0.0039   Epoch: 32   Global Step: 162260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:18:53,884-Speed 10389.47 samples/sec   Loss 5.4158   LearningRate 0.0039   Epoch: 32   Global Step: 162270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:18:54,946-Speed 9656.20 samples/sec   Loss 5.6879   LearningRate 0.0039   Epoch: 32   Global Step: 162280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:18:55,915-Speed 10576.84 samples/sec   Loss 5.4906   LearningRate 0.0039   Epoch: 32   Global Step: 162290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:18:56,943-Speed 9964.54 samples/sec   Loss 5.5428   LearningRate 0.0039   Epoch: 32   Global Step: 162300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:18:57,952-Speed 10162.12 samples/sec   Loss 5.5320   LearningRate 0.0039   Epoch: 32   Global Step: 162310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:18:58,941-Speed 10363.46 samples/sec   Loss 5.5518   LearningRate 0.0039   Epoch: 32   Global Step: 162320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:18:59,940-Speed 10253.53 samples/sec   Loss 5.3814   LearningRate 0.0039   Epoch: 32   Global Step: 162330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:00,934-Speed 10308.99 samples/sec   Loss 5.3814   LearningRate 0.0039   Epoch: 32   Global Step: 162340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:01,975-Speed 9850.44 samples/sec   Loss 5.6390   LearningRate 0.0039   Epoch: 32   Global Step: 162350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:02,912-Speed 10938.34 samples/sec   Loss 5.6193   LearningRate 0.0039   Epoch: 32   Global Step: 162360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:03,855-Speed 10865.71 samples/sec   Loss 5.6820   LearningRate 0.0039   Epoch: 32   Global Step: 162370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:04,819-Speed 10633.17 samples/sec   Loss 5.4440   LearningRate 0.0039   Epoch: 32   Global Step: 162380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:05,753-Speed 10983.45 samples/sec   Loss 5.4442   LearningRate 0.0039   Epoch: 32   Global Step: 162390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:06,739-Speed 10393.65 samples/sec   Loss 5.5057   LearningRate 0.0039   Epoch: 32   Global Step: 162400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:19:07,743-Speed 10217.03 samples/sec   Loss 5.3278   LearningRate 0.0039   Epoch: 32   Global Step: 162410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:19:08,666-Speed 11094.49 samples/sec   Loss 5.4488   LearningRate 0.0039   Epoch: 32   Global Step: 162420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:19:09,642-Speed 10498.69 samples/sec   Loss 5.4833   LearningRate 0.0039   Epoch: 32   Global Step: 162430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:19:10,625-Speed 10429.43 samples/sec   Loss 5.3296   LearningRate 0.0039   Epoch: 32   Global Step: 162440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:19:11,681-Speed 9706.18 samples/sec   Loss 5.5064   LearningRate 0.0039   Epoch: 32   Global Step: 162450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:19:12,673-Speed 10331.85 samples/sec   Loss 5.4525   LearningRate 0.0039   Epoch: 32   Global Step: 162460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:19:13,642-Speed 10582.96 samples/sec   Loss 5.5519   LearningRate 0.0039   Epoch: 32   Global Step: 162470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:19:14,687-Speed 9796.55 samples/sec   Loss 5.6892   LearningRate 0.0039   Epoch: 32   Global Step: 162480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:19:15,699-Speed 10127.44 samples/sec   Loss 5.4725   LearningRate 0.0039   Epoch: 32   Global Step: 162490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:19:16,661-Speed 10658.72 samples/sec   Loss 5.5349   LearningRate 0.0039   Epoch: 32   Global Step: 162500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:17,653-Speed 10335.22 samples/sec   Loss 5.5332   LearningRate 0.0039   Epoch: 32   Global Step: 162510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:18,743-Speed 9399.47 samples/sec   Loss 5.4568   LearningRate 0.0039   Epoch: 32   Global Step: 162520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:19,768-Speed 10000.17 samples/sec   Loss 5.5462   LearningRate 0.0039   Epoch: 32   Global Step: 162530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:20,769-Speed 10239.05 samples/sec   Loss 5.5938   LearningRate 0.0039   Epoch: 32   Global Step: 162540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:21,789-Speed 10049.14 samples/sec   Loss 5.4835   LearningRate 0.0039   Epoch: 32   Global Step: 162550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:22,828-Speed 9859.48 samples/sec   Loss 5.5006   LearningRate 0.0039   Epoch: 32   Global Step: 162560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:23,828-Speed 10262.10 samples/sec   Loss 5.5471   LearningRate 0.0039   Epoch: 32   Global Step: 162570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:24,816-Speed 10372.31 samples/sec   Loss 5.4845   LearningRate 0.0039   Epoch: 32   Global Step: 162580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:25,804-Speed 10373.01 samples/sec   Loss 5.4594   LearningRate 0.0039   Epoch: 32   Global Step: 162590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:26,786-Speed 10434.00 samples/sec   Loss 5.4802   LearningRate 0.0039   Epoch: 32   Global Step: 162600   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:19:27,759-Speed 10531.65 samples/sec   Loss 5.5661   LearningRate 0.0039   Epoch: 32   Global Step: 162610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:28,720-Speed 10672.47 samples/sec   Loss 5.4281   LearningRate 0.0039   Epoch: 32   Global Step: 162620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:29,710-Speed 10342.41 samples/sec   Loss 5.5234   LearningRate 0.0038   Epoch: 32   Global Step: 162630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:30,762-Speed 9749.82 samples/sec   Loss 5.4674   LearningRate 0.0038   Epoch: 32   Global Step: 162640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:31,762-Speed 10242.57 samples/sec   Loss 5.5812   LearningRate 0.0038   Epoch: 32   Global Step: 162650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:32,719-Speed 10711.80 samples/sec   Loss 5.5322   LearningRate 0.0038   Epoch: 32   Global Step: 162660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:33,821-Speed 9297.21 samples/sec   Loss 5.5948   LearningRate 0.0038   Epoch: 32   Global Step: 162670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:34,785-Speed 10633.40 samples/sec   Loss 5.6421   LearningRate 0.0038   Epoch: 32   Global Step: 162680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:35,714-Speed 11032.77 samples/sec   Loss 5.5719   LearningRate 0.0038   Epoch: 32   Global Step: 162690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:36,700-Speed 10394.29 samples/sec   Loss 5.6558   LearningRate 0.0038   Epoch: 32   Global Step: 162700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:37,661-Speed 10666.70 samples/sec   Loss 5.5447   LearningRate 0.0038   Epoch: 32   Global Step: 162710   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:19:38,716-Speed 9719.18 samples/sec   Loss 5.4798   LearningRate 0.0038   Epoch: 32   Global Step: 162720   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:19:39,679-Speed 10643.49 samples/sec   Loss 5.5800   LearningRate 0.0038   Epoch: 32   Global Step: 162730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:40,674-Speed 10292.60 samples/sec   Loss 5.4527   LearningRate 0.0038   Epoch: 32   Global Step: 162740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:41,735-Speed 9661.51 samples/sec   Loss 5.4499   LearningRate 0.0038   Epoch: 32   Global Step: 162750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:42,711-Speed 10504.72 samples/sec   Loss 5.6290   LearningRate 0.0038   Epoch: 32   Global Step: 162760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:43,654-Speed 10868.44 samples/sec   Loss 5.5214   LearningRate 0.0038   Epoch: 32   Global Step: 162770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:44,659-Speed 10202.28 samples/sec   Loss 5.5266   LearningRate 0.0038   Epoch: 32   Global Step: 162780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:45,632-Speed 10535.10 samples/sec   Loss 5.4890   LearningRate 0.0038   Epoch: 32   Global Step: 162790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:46,630-Speed 10258.94 samples/sec   Loss 5.5908   LearningRate 0.0038   Epoch: 32   Global Step: 162800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:47,645-Speed 10102.99 samples/sec   Loss 5.5278   LearningRate 0.0038   Epoch: 32   Global Step: 162810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:48,630-Speed 10402.43 samples/sec   Loss 5.4290   LearningRate 0.0038   Epoch: 32   Global Step: 162820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:49,643-Speed 10121.71 samples/sec   Loss 5.5241   LearningRate 0.0038   Epoch: 32   Global Step: 162830   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:19:50,683-Speed 9858.10 samples/sec   Loss 5.5511   LearningRate 0.0038   Epoch: 32   Global Step: 162840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:51,626-Speed 10872.28 samples/sec   Loss 5.4501   LearningRate 0.0038   Epoch: 32   Global Step: 162850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:52,597-Speed 10544.10 samples/sec   Loss 5.6679   LearningRate 0.0038   Epoch: 32   Global Step: 162860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:53,585-Speed 10375.86 samples/sec   Loss 5.4968   LearningRate 0.0038   Epoch: 32   Global Step: 162870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:54,578-Speed 10318.98 samples/sec   Loss 5.5085   LearningRate 0.0038   Epoch: 32   Global Step: 162880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:55,617-Speed 9859.38 samples/sec   Loss 5.5280   LearningRate 0.0038   Epoch: 32   Global Step: 162890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:56,596-Speed 10467.65 samples/sec   Loss 5.3737   LearningRate 0.0038   Epoch: 32   Global Step: 162900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:57,600-Speed 10213.18 samples/sec   Loss 5.5766   LearningRate 0.0038   Epoch: 32   Global Step: 162910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:58,610-Speed 10182.26 samples/sec   Loss 5.4957   LearningRate 0.0038   Epoch: 32   Global Step: 162920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:19:59,614-Speed 10203.95 samples/sec   Loss 5.7126   LearningRate 0.0038   Epoch: 32   Global Step: 162930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:00,551-Speed 10937.64 samples/sec   Loss 5.6326   LearningRate 0.0038   Epoch: 32   Global Step: 162940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:01,532-Speed 10449.73 samples/sec   Loss 5.5322   LearningRate 0.0038   Epoch: 32   Global Step: 162950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:02,477-Speed 10837.51 samples/sec   Loss 5.6694   LearningRate 0.0038   Epoch: 32   Global Step: 162960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:03,493-Speed 10095.26 samples/sec   Loss 5.4952   LearningRate 0.0038   Epoch: 32   Global Step: 162970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:04,473-Speed 10457.81 samples/sec   Loss 5.5190   LearningRate 0.0038   Epoch: 32   Global Step: 162980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:05,447-Speed 10523.35 samples/sec   Loss 5.6497   LearningRate 0.0038   Epoch: 32   Global Step: 162990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:06,443-Speed 10283.44 samples/sec   Loss 5.5915   LearningRate 0.0038   Epoch: 32   Global Step: 163000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:07,443-Speed 10246.95 samples/sec   Loss 5.6598   LearningRate 0.0038   Epoch: 32   Global Step: 163010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:08,417-Speed 10523.28 samples/sec   Loss 5.5804   LearningRate 0.0038   Epoch: 32   Global Step: 163020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:09,399-Speed 10444.05 samples/sec   Loss 5.6268   LearningRate 0.0038   Epoch: 32   Global Step: 163030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:10,355-Speed 10713.47 samples/sec   Loss 5.6767   LearningRate 0.0038   Epoch: 32   Global Step: 163040   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:20:11,323-Speed 10586.23 samples/sec   Loss 5.5514   LearningRate 0.0038   Epoch: 32   Global Step: 163050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:12,389-Speed 9622.13 samples/sec   Loss 5.5685   LearningRate 0.0038   Epoch: 32   Global Step: 163060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:13,377-Speed 10362.95 samples/sec   Loss 5.4708   LearningRate 0.0038   Epoch: 32   Global Step: 163070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:14,399-Speed 10033.40 samples/sec   Loss 5.7754   LearningRate 0.0038   Epoch: 32   Global Step: 163080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:15,462-Speed 9647.13 samples/sec   Loss 5.5368   LearningRate 0.0038   Epoch: 32   Global Step: 163090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:16,451-Speed 10360.06 samples/sec   Loss 5.5003   LearningRate 0.0038   Epoch: 32   Global Step: 163100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:17,453-Speed 10234.20 samples/sec   Loss 5.5304   LearningRate 0.0038   Epoch: 32   Global Step: 163110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:18,455-Speed 10224.32 samples/sec   Loss 5.4167   LearningRate 0.0038   Epoch: 32   Global Step: 163120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:19,526-Speed 9574.07 samples/sec   Loss 5.4813   LearningRate 0.0038   Epoch: 32   Global Step: 163130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:20,493-Speed 10600.43 samples/sec   Loss 5.6522   LearningRate 0.0038   Epoch: 32   Global Step: 163140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:21,477-Speed 10426.94 samples/sec   Loss 5.5527   LearningRate 0.0037   Epoch: 32   Global Step: 163150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:22,530-Speed 9732.02 samples/sec   Loss 5.5666   LearningRate 0.0037   Epoch: 32   Global Step: 163160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:23,550-Speed 10047.84 samples/sec   Loss 5.5950   LearningRate 0.0037   Epoch: 32   Global Step: 163170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:24,532-Speed 10438.35 samples/sec   Loss 5.6095   LearningRate 0.0037   Epoch: 32   Global Step: 163180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:25,570-Speed 9872.16 samples/sec   Loss 5.5683   LearningRate 0.0037   Epoch: 32   Global Step: 163190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:26,572-Speed 10242.12 samples/sec   Loss 5.5992   LearningRate 0.0037   Epoch: 32   Global Step: 163200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:27,588-Speed 10091.46 samples/sec   Loss 5.4141   LearningRate 0.0037   Epoch: 32   Global Step: 163210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:28,603-Speed 10103.10 samples/sec   Loss 5.5761   LearningRate 0.0037   Epoch: 32   Global Step: 163220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:29,604-Speed 10233.01 samples/sec   Loss 5.5621   LearningRate 0.0037   Epoch: 32   Global Step: 163230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:30,605-Speed 10241.86 samples/sec   Loss 5.6977   LearningRate 0.0037   Epoch: 32   Global Step: 163240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:31,608-Speed 10221.64 samples/sec   Loss 5.6034   LearningRate 0.0037   Epoch: 32   Global Step: 163250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:32,649-Speed 9835.98 samples/sec   Loss 5.6096   LearningRate 0.0037   Epoch: 32   Global Step: 163260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:33,780-Speed 9060.74 samples/sec   Loss 5.6621   LearningRate 0.0037   Epoch: 32   Global Step: 163270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:34,759-Speed 10476.11 samples/sec   Loss 5.5600   LearningRate 0.0037   Epoch: 32   Global Step: 163280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:35,705-Speed 10830.52 samples/sec   Loss 5.6856   LearningRate 0.0037   Epoch: 32   Global Step: 163290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:36,695-Speed 10347.84 samples/sec   Loss 5.5770   LearningRate 0.0037   Epoch: 32   Global Step: 163300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:37,740-Speed 9806.95 samples/sec   Loss 5.6368   LearningRate 0.0037   Epoch: 32   Global Step: 163310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:38,736-Speed 10297.21 samples/sec   Loss 5.5110   LearningRate 0.0037   Epoch: 32   Global Step: 163320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:39,718-Speed 10441.24 samples/sec   Loss 5.5494   LearningRate 0.0037   Epoch: 32   Global Step: 163330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:40,746-Speed 9967.95 samples/sec   Loss 5.4696   LearningRate 0.0037   Epoch: 32   Global Step: 163340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:41,715-Speed 10573.54 samples/sec   Loss 5.6215   LearningRate 0.0037   Epoch: 32   Global Step: 163350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:42,708-Speed 10315.96 samples/sec   Loss 5.5505   LearningRate 0.0037   Epoch: 32   Global Step: 163360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:43,686-Speed 10487.66 samples/sec   Loss 5.6027   LearningRate 0.0037   Epoch: 32   Global Step: 163370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:44,717-Speed 9937.64 samples/sec   Loss 5.5353   LearningRate 0.0037   Epoch: 32   Global Step: 163380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:45,698-Speed 10450.65 samples/sec   Loss 5.7261   LearningRate 0.0037   Epoch: 32   Global Step: 163390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:46,694-Speed 10288.04 samples/sec   Loss 5.5999   LearningRate 0.0037   Epoch: 32   Global Step: 163400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:47,677-Speed 10428.12 samples/sec   Loss 5.7086   LearningRate 0.0037   Epoch: 32   Global Step: 163410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:48,684-Speed 10175.40 samples/sec   Loss 5.5919   LearningRate 0.0037   Epoch: 32   Global Step: 163420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:49,716-Speed 9932.35 samples/sec   Loss 5.4999   LearningRate 0.0037   Epoch: 32   Global Step: 163430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:50,734-Speed 10072.21 samples/sec   Loss 5.6629   LearningRate 0.0037   Epoch: 32   Global Step: 163440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:51,737-Speed 10217.34 samples/sec   Loss 5.4911   LearningRate 0.0037   Epoch: 32   Global Step: 163450   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:20:52,740-Speed 10222.79 samples/sec   Loss 5.5700   LearningRate 0.0037   Epoch: 32   Global Step: 163460   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:20:53,735-Speed 10302.35 samples/sec   Loss 5.5299   LearningRate 0.0037   Epoch: 32   Global Step: 163470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:54,750-Speed 10099.39 samples/sec   Loss 5.5887   LearningRate 0.0037   Epoch: 32   Global Step: 163480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:55,706-Speed 10725.64 samples/sec   Loss 5.5841   LearningRate 0.0037   Epoch: 32   Global Step: 163490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:56,693-Speed 10379.51 samples/sec   Loss 5.6804   LearningRate 0.0037   Epoch: 32   Global Step: 163500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:57,651-Speed 10697.21 samples/sec   Loss 5.5554   LearningRate 0.0037   Epoch: 32   Global Step: 163510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:58,662-Speed 10131.51 samples/sec   Loss 5.7394   LearningRate 0.0037   Epoch: 32   Global Step: 163520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:20:59,658-Speed 10298.68 samples/sec   Loss 5.6166   LearningRate 0.0037   Epoch: 32   Global Step: 163530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:21:00,665-Speed 10176.38 samples/sec   Loss 5.6038   LearningRate 0.0037   Epoch: 32   Global Step: 163540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:21:01,687-Speed 10025.14 samples/sec   Loss 5.5724   LearningRate 0.0037   Epoch: 32   Global Step: 163550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:21:02,687-Speed 10244.07 samples/sec   Loss 5.4793   LearningRate 0.0037   Epoch: 32   Global Step: 163560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:21:03,668-Speed 10458.38 samples/sec   Loss 5.6226   LearningRate 0.0037   Epoch: 32   Global Step: 163570   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:21:04,686-Speed 10060.59 samples/sec   Loss 5.6234   LearningRate 0.0037   Epoch: 32   Global Step: 163580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:21:05,685-Speed 10265.38 samples/sec   Loss 5.6222   LearningRate 0.0037   Epoch: 32   Global Step: 163590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:21:06,650-Speed 10612.96 samples/sec   Loss 5.6905   LearningRate 0.0037   Epoch: 32   Global Step: 163600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:21:07,658-Speed 10170.42 samples/sec   Loss 5.6371   LearningRate 0.0037   Epoch: 32   Global Step: 163610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:21:08,663-Speed 10196.07 samples/sec   Loss 5.4904   LearningRate 0.0037   Epoch: 32   Global Step: 163620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:09,639-Speed 10500.20 samples/sec   Loss 5.6027   LearningRate 0.0037   Epoch: 32   Global Step: 163630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:10,645-Speed 10195.51 samples/sec   Loss 5.8002   LearningRate 0.0037   Epoch: 32   Global Step: 163640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:11,671-Speed 9994.69 samples/sec   Loss 5.6671   LearningRate 0.0037   Epoch: 32   Global Step: 163650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:12,665-Speed 10304.71 samples/sec   Loss 5.6739   LearningRate 0.0037   Epoch: 32   Global Step: 163660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:13,644-Speed 10475.74 samples/sec   Loss 5.6819   LearningRate 0.0036   Epoch: 32   Global Step: 163670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:14,622-Speed 10469.58 samples/sec   Loss 5.7928   LearningRate 0.0036   Epoch: 32   Global Step: 163680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:15,646-Speed 10017.50 samples/sec   Loss 5.4893   LearningRate 0.0036   Epoch: 32   Global Step: 163690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:16,674-Speed 9969.61 samples/sec   Loss 5.6501   LearningRate 0.0036   Epoch: 32   Global Step: 163700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:17,631-Speed 10709.43 samples/sec   Loss 5.5904   LearningRate 0.0036   Epoch: 32   Global Step: 163710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:18,637-Speed 10206.63 samples/sec   Loss 5.6154   LearningRate 0.0036   Epoch: 32   Global Step: 163720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:21:19,625-Speed 10364.60 samples/sec   Loss 5.4747   LearningRate 0.0036   Epoch: 32   Global Step: 163730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:21:20,639-Speed 10113.48 samples/sec   Loss 5.5038   LearningRate 0.0036   Epoch: 32   Global Step: 163740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:21,610-Speed 10549.48 samples/sec   Loss 5.5347   LearningRate 0.0036   Epoch: 32   Global Step: 163750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:22,618-Speed 10173.74 samples/sec   Loss 5.5626   LearningRate 0.0036   Epoch: 32   Global Step: 163760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:23,607-Speed 10360.46 samples/sec   Loss 5.5861   LearningRate 0.0036   Epoch: 32   Global Step: 163770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:24,594-Speed 10379.73 samples/sec   Loss 5.6743   LearningRate 0.0036   Epoch: 32   Global Step: 163780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:25,633-Speed 9861.47 samples/sec   Loss 5.4886   LearningRate 0.0036   Epoch: 32   Global Step: 163790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:26,597-Speed 10637.55 samples/sec   Loss 5.5483   LearningRate 0.0036   Epoch: 32   Global Step: 163800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:27,588-Speed 10347.03 samples/sec   Loss 5.5428   LearningRate 0.0036   Epoch: 32   Global Step: 163810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:28,607-Speed 10060.94 samples/sec   Loss 5.4483   LearningRate 0.0036   Epoch: 32   Global Step: 163820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:29,624-Speed 10072.05 samples/sec   Loss 5.6679   LearningRate 0.0036   Epoch: 32   Global Step: 163830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:30,636-Speed 10127.85 samples/sec   Loss 5.4591   LearningRate 0.0036   Epoch: 32   Global Step: 163840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:21:31,614-Speed 10476.98 samples/sec   Loss 5.6086   LearningRate 0.0036   Epoch: 32   Global Step: 163850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:32,599-Speed 10405.43 samples/sec   Loss 5.6275   LearningRate 0.0036   Epoch: 32   Global Step: 163860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:33,604-Speed 10204.30 samples/sec   Loss 5.3834   LearningRate 0.0036   Epoch: 32   Global Step: 163870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:34,682-Speed 9501.61 samples/sec   Loss 5.6670   LearningRate 0.0036   Epoch: 32   Global Step: 163880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:35,696-Speed 10112.10 samples/sec   Loss 5.5833   LearningRate 0.0036   Epoch: 32   Global Step: 163890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:36,678-Speed 10436.26 samples/sec   Loss 5.5707   LearningRate 0.0036   Epoch: 32   Global Step: 163900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:37,655-Speed 10492.79 samples/sec   Loss 5.6091   LearningRate 0.0036   Epoch: 32   Global Step: 163910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:38,673-Speed 10063.58 samples/sec   Loss 5.5156   LearningRate 0.0036   Epoch: 32   Global Step: 163920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:21:39,643-Speed 10565.62 samples/sec   Loss 5.7099   LearningRate 0.0036   Epoch: 32   Global Step: 163930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 05:21:40,633-Speed 10360.33 samples/sec   Loss 5.6100   LearningRate 0.0036   Epoch: 32   Global Step: 163940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 05:21:41,613-Speed 10464.82 samples/sec   Loss 5.5395   LearningRate 0.0036   Epoch: 32   Global Step: 163950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 05:21:42,602-Speed 10373.67 samples/sec   Loss 5.5918   LearningRate 0.0036   Epoch: 32   Global Step: 163960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 05:21:43,576-Speed 10511.85 samples/sec   Loss 5.6570   LearningRate 0.0036   Epoch: 32   Global Step: 163970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 05:21:44,637-Speed 9660.74 samples/sec   Loss 5.6214   LearningRate 0.0036   Epoch: 32   Global Step: 163980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 05:21:45,604-Speed 10608.16 samples/sec   Loss 5.6971   LearningRate 0.0036   Epoch: 32   Global Step: 163990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 05:21:46,591-Speed 10381.08 samples/sec   Loss 5.6784   LearningRate 0.0036   Epoch: 32   Global Step: 164000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 05:22:08,883-[lfw][164000]XNorm: 8.250782
Training: 2022-04-11 05:22:08,884-[lfw][164000]Accuracy-Flip: 0.99667+-0.00333
Training: 2022-04-11 05:22:08,884-[lfw][164000]Accuracy-Highest: 0.99700
Training: 2022-04-11 05:22:34,395-[cfp_fp][164000]XNorm: 7.127077
Training: 2022-04-11 05:22:34,396-[cfp_fp][164000]Accuracy-Flip: 0.97000+-0.00994
Training: 2022-04-11 05:22:34,397-[cfp_fp][164000]Accuracy-Highest: 0.97200
Training: 2022-04-11 05:22:56,604-[agedb_30][164000]XNorm: 8.088189
Training: 2022-04-11 05:22:56,605-[agedb_30][164000]Accuracy-Flip: 0.97100+-0.00768
Training: 2022-04-11 05:22:56,605-[agedb_30][164000]Accuracy-Highest: 0.97300
Training: 2022-04-11 05:22:57,570-Speed 144.27 samples/sec   Loss 5.5557   LearningRate 0.0036   Epoch: 32   Global Step: 164010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 05:22:58,521-Speed 10782.00 samples/sec   Loss 5.6289   LearningRate 0.0036   Epoch: 32   Global Step: 164020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 05:22:59,508-Speed 10381.79 samples/sec   Loss 5.6713   LearningRate 0.0036   Epoch: 32   Global Step: 164030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:00,536-Speed 9969.98 samples/sec   Loss 5.6199   LearningRate 0.0036   Epoch: 32   Global Step: 164040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:01,549-Speed 10113.87 samples/sec   Loss 5.5271   LearningRate 0.0036   Epoch: 32   Global Step: 164050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:02,525-Speed 10500.73 samples/sec   Loss 5.6774   LearningRate 0.0036   Epoch: 32   Global Step: 164060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:03,547-Speed 10029.13 samples/sec   Loss 5.6670   LearningRate 0.0036   Epoch: 32   Global Step: 164070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:04,561-Speed 10120.23 samples/sec   Loss 5.6704   LearningRate 0.0036   Epoch: 32   Global Step: 164080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:05,513-Speed 10761.11 samples/sec   Loss 5.5776   LearningRate 0.0036   Epoch: 32   Global Step: 164090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:06,471-Speed 10695.40 samples/sec   Loss 5.6521   LearningRate 0.0036   Epoch: 32   Global Step: 164100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:07,467-Speed 10285.51 samples/sec   Loss 5.7442   LearningRate 0.0036   Epoch: 32   Global Step: 164110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:08,477-Speed 10154.54 samples/sec   Loss 5.4980   LearningRate 0.0036   Epoch: 32   Global Step: 164120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:09,452-Speed 10503.33 samples/sec   Loss 5.6877   LearningRate 0.0036   Epoch: 32   Global Step: 164130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:10,443-Speed 10338.98 samples/sec   Loss 5.7112   LearningRate 0.0036   Epoch: 32   Global Step: 164140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:11,441-Speed 10266.86 samples/sec   Loss 5.5832   LearningRate 0.0036   Epoch: 32   Global Step: 164150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:12,449-Speed 10165.42 samples/sec   Loss 5.7529   LearningRate 0.0036   Epoch: 32   Global Step: 164160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:13,514-Speed 9623.97 samples/sec   Loss 5.5693   LearningRate 0.0036   Epoch: 32   Global Step: 164170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:14,459-Speed 10848.53 samples/sec   Loss 5.6327   LearningRate 0.0036   Epoch: 32   Global Step: 164180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:15,469-Speed 10151.58 samples/sec   Loss 5.7103   LearningRate 0.0036   Epoch: 32   Global Step: 164190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:16,421-Speed 10762.48 samples/sec   Loss 5.7772   LearningRate 0.0035   Epoch: 32   Global Step: 164200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:17,441-Speed 10047.07 samples/sec   Loss 5.6805   LearningRate 0.0035   Epoch: 32   Global Step: 164210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:18,434-Speed 10319.72 samples/sec   Loss 5.7399   LearningRate 0.0035   Epoch: 32   Global Step: 164220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:19,443-Speed 10164.26 samples/sec   Loss 5.6581   LearningRate 0.0035   Epoch: 32   Global Step: 164230   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:23:20,414-Speed 10555.07 samples/sec   Loss 5.7976   LearningRate 0.0035   Epoch: 32   Global Step: 164240   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:23:21,426-Speed 10129.33 samples/sec   Loss 5.6955   LearningRate 0.0035   Epoch: 32   Global Step: 164250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:22,497-Speed 9566.36 samples/sec   Loss 5.6615   LearningRate 0.0035   Epoch: 32   Global Step: 164260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:23,489-Speed 10336.59 samples/sec   Loss 5.7065   LearningRate 0.0035   Epoch: 32   Global Step: 164270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:24,518-Speed 9957.13 samples/sec   Loss 5.7111   LearningRate 0.0035   Epoch: 32   Global Step: 164280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:25,529-Speed 10143.25 samples/sec   Loss 5.5698   LearningRate 0.0035   Epoch: 32   Global Step: 164290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:26,533-Speed 10212.87 samples/sec   Loss 5.7263   LearningRate 0.0035   Epoch: 32   Global Step: 164300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:27,506-Speed 10533.63 samples/sec   Loss 5.6864   LearningRate 0.0035   Epoch: 32   Global Step: 164310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:28,538-Speed 9934.25 samples/sec   Loss 5.5190   LearningRate 0.0035   Epoch: 32   Global Step: 164320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:29,536-Speed 10262.37 samples/sec   Loss 5.6254   LearningRate 0.0035   Epoch: 32   Global Step: 164330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:30,556-Speed 10047.07 samples/sec   Loss 5.5347   LearningRate 0.0035   Epoch: 32   Global Step: 164340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:31,558-Speed 10231.01 samples/sec   Loss 5.6142   LearningRate 0.0035   Epoch: 32   Global Step: 164350   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:23:32,556-Speed 10590.26 samples/sec   Loss 5.6306   LearningRate 0.0035   Epoch: 32   Global Step: 164360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:33,525-Speed 10580.24 samples/sec   Loss 5.6377   LearningRate 0.0035   Epoch: 32   Global Step: 164370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:34,550-Speed 9998.44 samples/sec   Loss 5.6176   LearningRate 0.0035   Epoch: 32   Global Step: 164380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:35,539-Speed 10363.22 samples/sec   Loss 5.6068   LearningRate 0.0035   Epoch: 32   Global Step: 164390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:36,542-Speed 10225.47 samples/sec   Loss 5.5393   LearningRate 0.0035   Epoch: 32   Global Step: 164400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:37,558-Speed 10078.22 samples/sec   Loss 5.6870   LearningRate 0.0035   Epoch: 32   Global Step: 164410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:38,590-Speed 9938.44 samples/sec   Loss 5.7805   LearningRate 0.0035   Epoch: 32   Global Step: 164420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:39,601-Speed 10136.76 samples/sec   Loss 5.5787   LearningRate 0.0035   Epoch: 32   Global Step: 164430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:40,606-Speed 10200.99 samples/sec   Loss 5.6591   LearningRate 0.0035   Epoch: 32   Global Step: 164440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:41,585-Speed 10462.07 samples/sec   Loss 5.7583   LearningRate 0.0035   Epoch: 32   Global Step: 164450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:42,616-Speed 9943.39 samples/sec   Loss 5.7065   LearningRate 0.0035   Epoch: 32   Global Step: 164460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:43,600-Speed 10411.52 samples/sec   Loss 5.6003   LearningRate 0.0035   Epoch: 32   Global Step: 164470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:44,553-Speed 10749.87 samples/sec   Loss 5.6989   LearningRate 0.0035   Epoch: 32   Global Step: 164480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:45,536-Speed 10432.37 samples/sec   Loss 5.5665   LearningRate 0.0035   Epoch: 32   Global Step: 164490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:46,532-Speed 10288.88 samples/sec   Loss 5.6179   LearningRate 0.0035   Epoch: 32   Global Step: 164500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:47,541-Speed 10156.33 samples/sec   Loss 5.8012   LearningRate 0.0035   Epoch: 32   Global Step: 164510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:48,513-Speed 10546.15 samples/sec   Loss 5.5848   LearningRate 0.0035   Epoch: 32   Global Step: 164520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:49,511-Speed 10269.60 samples/sec   Loss 5.6795   LearningRate 0.0035   Epoch: 32   Global Step: 164530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:50,485-Speed 10518.04 samples/sec   Loss 5.6473   LearningRate 0.0035   Epoch: 32   Global Step: 164540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:51,471-Speed 10401.90 samples/sec   Loss 5.6757   LearningRate 0.0035   Epoch: 32   Global Step: 164550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:52,434-Speed 10644.77 samples/sec   Loss 5.7276   LearningRate 0.0035   Epoch: 32   Global Step: 164560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:53,419-Speed 10395.04 samples/sec   Loss 5.5958   LearningRate 0.0035   Epoch: 32   Global Step: 164570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:54,432-Speed 10123.85 samples/sec   Loss 5.4668   LearningRate 0.0035   Epoch: 32   Global Step: 164580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:23:55,422-Speed 10350.41 samples/sec   Loss 5.6086   LearningRate 0.0035   Epoch: 32   Global Step: 164590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:56,362-Speed 10904.09 samples/sec   Loss 5.6152   LearningRate 0.0035   Epoch: 32   Global Step: 164600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:57,318-Speed 10723.23 samples/sec   Loss 5.7967   LearningRate 0.0035   Epoch: 32   Global Step: 164610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:58,378-Speed 9672.87 samples/sec   Loss 5.6809   LearningRate 0.0035   Epoch: 32   Global Step: 164620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:23:59,417-Speed 9857.75 samples/sec   Loss 5.7108   LearningRate 0.0035   Epoch: 32   Global Step: 164630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:00,416-Speed 10259.37 samples/sec   Loss 5.5010   LearningRate 0.0035   Epoch: 32   Global Step: 164640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:01,354-Speed 10924.68 samples/sec   Loss 5.6666   LearningRate 0.0035   Epoch: 32   Global Step: 164650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:02,368-Speed 10109.28 samples/sec   Loss 5.8066   LearningRate 0.0035   Epoch: 32   Global Step: 164660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:24:03,370-Speed 10232.12 samples/sec   Loss 5.7211   LearningRate 0.0035   Epoch: 32   Global Step: 164670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:24:04,390-Speed 10050.12 samples/sec   Loss 5.6722   LearningRate 0.0035   Epoch: 32   Global Step: 164680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:24:05,413-Speed 10021.17 samples/sec   Loss 5.6174   LearningRate 0.0035   Epoch: 32   Global Step: 164690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:24:06,399-Speed 10398.99 samples/sec   Loss 5.6743   LearningRate 0.0035   Epoch: 32   Global Step: 164700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:24:07,403-Speed 10198.92 samples/sec   Loss 5.7531   LearningRate 0.0035   Epoch: 32   Global Step: 164710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:24:08,436-Speed 9924.01 samples/sec   Loss 5.6889   LearningRate 0.0035   Epoch: 32   Global Step: 164720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:24:09,444-Speed 10172.21 samples/sec   Loss 5.6277   LearningRate 0.0035   Epoch: 32   Global Step: 164730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:24:10,457-Speed 10119.11 samples/sec   Loss 5.5805   LearningRate 0.0035   Epoch: 32   Global Step: 164740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:24:11,473-Speed 10091.15 samples/sec   Loss 5.6165   LearningRate 0.0034   Epoch: 32   Global Step: 164750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:24:12,472-Speed 10259.70 samples/sec   Loss 5.6948   LearningRate 0.0034   Epoch: 32   Global Step: 164760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:13,461-Speed 10367.70 samples/sec   Loss 5.7901   LearningRate 0.0034   Epoch: 32   Global Step: 164770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:14,492-Speed 9938.77 samples/sec   Loss 5.6290   LearningRate 0.0034   Epoch: 32   Global Step: 164780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:15,466-Speed 10538.68 samples/sec   Loss 5.8456   LearningRate 0.0034   Epoch: 32   Global Step: 164790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:16,462-Speed 10291.60 samples/sec   Loss 5.7541   LearningRate 0.0034   Epoch: 32   Global Step: 164800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:17,477-Speed 10100.98 samples/sec   Loss 5.7172   LearningRate 0.0034   Epoch: 32   Global Step: 164810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:18,493-Speed 10087.78 samples/sec   Loss 5.5707   LearningRate 0.0034   Epoch: 32   Global Step: 164820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:19,460-Speed 10593.75 samples/sec   Loss 5.7775   LearningRate 0.0034   Epoch: 32   Global Step: 164830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:20,470-Speed 10144.72 samples/sec   Loss 5.7184   LearningRate 0.0034   Epoch: 32   Global Step: 164840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:21,514-Speed 9822.70 samples/sec   Loss 5.6536   LearningRate 0.0034   Epoch: 32   Global Step: 164850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:22,461-Speed 10819.91 samples/sec   Loss 5.7062   LearningRate 0.0034   Epoch: 32   Global Step: 164860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:23,441-Speed 10459.47 samples/sec   Loss 5.6868   LearningRate 0.0034   Epoch: 32   Global Step: 164870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:24,442-Speed 10238.50 samples/sec   Loss 5.7090   LearningRate 0.0034   Epoch: 32   Global Step: 164880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:25,431-Speed 10365.58 samples/sec   Loss 5.7490   LearningRate 0.0034   Epoch: 32   Global Step: 164890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:26,388-Speed 10703.63 samples/sec   Loss 5.7057   LearningRate 0.0034   Epoch: 32   Global Step: 164900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:27,381-Speed 10319.06 samples/sec   Loss 5.5068   LearningRate 0.0034   Epoch: 32   Global Step: 164910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:28,384-Speed 10222.27 samples/sec   Loss 5.6310   LearningRate 0.0034   Epoch: 32   Global Step: 164920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:29,391-Speed 10184.08 samples/sec   Loss 5.6314   LearningRate 0.0034   Epoch: 32   Global Step: 164930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:30,388-Speed 10278.92 samples/sec   Loss 5.5463   LearningRate 0.0034   Epoch: 32   Global Step: 164940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:31,405-Speed 10078.15 samples/sec   Loss 5.5853   LearningRate 0.0034   Epoch: 32   Global Step: 164950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:32,436-Speed 9943.81 samples/sec   Loss 5.7580   LearningRate 0.0034   Epoch: 32   Global Step: 164960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:33,431-Speed 10297.05 samples/sec   Loss 5.6532   LearningRate 0.0034   Epoch: 32   Global Step: 164970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:34,441-Speed 10145.44 samples/sec   Loss 5.7094   LearningRate 0.0034   Epoch: 32   Global Step: 164980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:35,422-Speed 10447.60 samples/sec   Loss 5.6081   LearningRate 0.0034   Epoch: 32   Global Step: 164990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:36,405-Speed 10432.64 samples/sec   Loss 5.6373   LearningRate 0.0034   Epoch: 32   Global Step: 165000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:37,396-Speed 10335.18 samples/sec   Loss 5.6954   LearningRate 0.0034   Epoch: 32   Global Step: 165010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:38,370-Speed 10534.13 samples/sec   Loss 5.8118   LearningRate 0.0034   Epoch: 32   Global Step: 165020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:39,314-Speed 10859.17 samples/sec   Loss 5.8455   LearningRate 0.0034   Epoch: 32   Global Step: 165030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:24:40,354-Speed 9850.30 samples/sec   Loss 5.5548   LearningRate 0.0034   Epoch: 32   Global Step: 165040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:24:41,362-Speed 10167.11 samples/sec   Loss 5.6423   LearningRate 0.0034   Epoch: 32   Global Step: 165050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:24:42,344-Speed 10444.36 samples/sec   Loss 5.7292   LearningRate 0.0034   Epoch: 32   Global Step: 165060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:24:43,364-Speed 10039.08 samples/sec   Loss 5.7586   LearningRate 0.0034   Epoch: 32   Global Step: 165070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:24:44,383-Speed 10063.55 samples/sec   Loss 5.6637   LearningRate 0.0034   Epoch: 32   Global Step: 165080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:24:45,328-Speed 10862.90 samples/sec   Loss 5.6623   LearningRate 0.0034   Epoch: 32   Global Step: 165090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:24:46,290-Speed 10653.62 samples/sec   Loss 5.5916   LearningRate 0.0034   Epoch: 32   Global Step: 165100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:24:47,290-Speed 10253.81 samples/sec   Loss 5.7701   LearningRate 0.0034   Epoch: 32   Global Step: 165110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:24:48,298-Speed 10167.31 samples/sec   Loss 5.7165   LearningRate 0.0034   Epoch: 32   Global Step: 165120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:24:49,301-Speed 10213.30 samples/sec   Loss 5.7135   LearningRate 0.0034   Epoch: 32   Global Step: 165130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:50,320-Speed 10064.10 samples/sec   Loss 5.7181   LearningRate 0.0034   Epoch: 32   Global Step: 165140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:51,307-Speed 10378.85 samples/sec   Loss 5.6705   LearningRate 0.0034   Epoch: 32   Global Step: 165150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:52,314-Speed 10186.14 samples/sec   Loss 5.7586   LearningRate 0.0034   Epoch: 32   Global Step: 165160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:53,315-Speed 10231.69 samples/sec   Loss 5.6028   LearningRate 0.0034   Epoch: 32   Global Step: 165170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:54,318-Speed 10221.80 samples/sec   Loss 5.7056   LearningRate 0.0034   Epoch: 32   Global Step: 165180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:55,312-Speed 10308.69 samples/sec   Loss 5.7463   LearningRate 0.0034   Epoch: 32   Global Step: 165190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:56,286-Speed 10515.66 samples/sec   Loss 5.7438   LearningRate 0.0034   Epoch: 32   Global Step: 165200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:57,282-Speed 10286.85 samples/sec   Loss 5.6926   LearningRate 0.0034   Epoch: 32   Global Step: 165210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:58,214-Speed 11011.25 samples/sec   Loss 5.7616   LearningRate 0.0034   Epoch: 32   Global Step: 165220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:24:59,194-Speed 10451.39 samples/sec   Loss 5.6988   LearningRate 0.0034   Epoch: 32   Global Step: 165230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:00,291-Speed 9341.90 samples/sec   Loss 5.6205   LearningRate 0.0034   Epoch: 32   Global Step: 165240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:01,281-Speed 10355.07 samples/sec   Loss 5.6694   LearningRate 0.0034   Epoch: 32   Global Step: 165250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:02,284-Speed 10214.40 samples/sec   Loss 5.7626   LearningRate 0.0034   Epoch: 32   Global Step: 165260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:03,261-Speed 10486.56 samples/sec   Loss 5.6433   LearningRate 0.0034   Epoch: 32   Global Step: 165270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:04,296-Speed 9905.95 samples/sec   Loss 5.7779   LearningRate 0.0034   Epoch: 32   Global Step: 165280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:05,376-Speed 9483.01 samples/sec   Loss 5.7051   LearningRate 0.0033   Epoch: 32   Global Step: 165290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:06,344-Speed 10589.25 samples/sec   Loss 5.6925   LearningRate 0.0033   Epoch: 32   Global Step: 165300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:07,377-Speed 9924.12 samples/sec   Loss 5.7917   LearningRate 0.0033   Epoch: 32   Global Step: 165310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:08,378-Speed 10234.08 samples/sec   Loss 5.7046   LearningRate 0.0033   Epoch: 32   Global Step: 165320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:09,347-Speed 10574.84 samples/sec   Loss 5.7434   LearningRate 0.0033   Epoch: 32   Global Step: 165330   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:25:10,345-Speed 10276.47 samples/sec   Loss 5.6301   LearningRate 0.0033   Epoch: 32   Global Step: 165340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:11,289-Speed 10855.84 samples/sec   Loss 5.6547   LearningRate 0.0033   Epoch: 32   Global Step: 165350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:12,278-Speed 10368.61 samples/sec   Loss 5.6577   LearningRate 0.0033   Epoch: 32   Global Step: 165360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:13,276-Speed 10267.99 samples/sec   Loss 5.5996   LearningRate 0.0033   Epoch: 32   Global Step: 165370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:14,269-Speed 10318.75 samples/sec   Loss 5.7377   LearningRate 0.0033   Epoch: 32   Global Step: 165380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:15,251-Speed 10442.48 samples/sec   Loss 5.7013   LearningRate 0.0033   Epoch: 32   Global Step: 165390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:16,281-Speed 9953.58 samples/sec   Loss 5.6139   LearningRate 0.0033   Epoch: 32   Global Step: 165400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:17,274-Speed 10314.53 samples/sec   Loss 5.7287   LearningRate 0.0033   Epoch: 32   Global Step: 165410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:18,294-Speed 10050.97 samples/sec   Loss 5.7001   LearningRate 0.0033   Epoch: 32   Global Step: 165420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:19,264-Speed 10567.30 samples/sec   Loss 5.6851   LearningRate 0.0033   Epoch: 32   Global Step: 165430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:20,272-Speed 10165.86 samples/sec   Loss 5.5659   LearningRate 0.0033   Epoch: 32   Global Step: 165440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:21,320-Speed 9791.25 samples/sec   Loss 5.7337   LearningRate 0.0033   Epoch: 32   Global Step: 165450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:22,309-Speed 10357.36 samples/sec   Loss 5.6762   LearningRate 0.0033   Epoch: 32   Global Step: 165460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:23,272-Speed 10640.28 samples/sec   Loss 5.6858   LearningRate 0.0033   Epoch: 32   Global Step: 165470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:24,263-Speed 10340.30 samples/sec   Loss 5.8217   LearningRate 0.0033   Epoch: 32   Global Step: 165480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:25,241-Speed 10481.03 samples/sec   Loss 5.6382   LearningRate 0.0033   Epoch: 32   Global Step: 165490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:26,229-Speed 10373.80 samples/sec   Loss 5.7281   LearningRate 0.0033   Epoch: 32   Global Step: 165500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:27,234-Speed 10193.09 samples/sec   Loss 5.8097   LearningRate 0.0033   Epoch: 32   Global Step: 165510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:28,222-Speed 10381.87 samples/sec   Loss 5.6778   LearningRate 0.0033   Epoch: 32   Global Step: 165520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:29,195-Speed 10537.61 samples/sec   Loss 5.5925   LearningRate 0.0033   Epoch: 32   Global Step: 165530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:30,227-Speed 9927.06 samples/sec   Loss 5.6359   LearningRate 0.0033   Epoch: 32   Global Step: 165540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:31,235-Speed 10169.35 samples/sec   Loss 5.7282   LearningRate 0.0033   Epoch: 32   Global Step: 165550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:32,224-Speed 10364.75 samples/sec   Loss 5.6904   LearningRate 0.0033   Epoch: 32   Global Step: 165560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:33,193-Speed 10581.37 samples/sec   Loss 5.7740   LearningRate 0.0033   Epoch: 32   Global Step: 165570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:34,313-Speed 9144.67 samples/sec   Loss 5.8170   LearningRate 0.0033   Epoch: 32   Global Step: 165580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:35,310-Speed 10281.80 samples/sec   Loss 5.8461   LearningRate 0.0033   Epoch: 32   Global Step: 165590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:36,250-Speed 10906.51 samples/sec   Loss 5.6459   LearningRate 0.0033   Epoch: 32   Global Step: 165600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:37,237-Speed 10381.79 samples/sec   Loss 5.7418   LearningRate 0.0033   Epoch: 32   Global Step: 165610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:38,272-Speed 9897.89 samples/sec   Loss 5.5385   LearningRate 0.0033   Epoch: 32   Global Step: 165620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:39,264-Speed 10341.64 samples/sec   Loss 5.7129   LearningRate 0.0033   Epoch: 32   Global Step: 165630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:40,258-Speed 10305.61 samples/sec   Loss 5.5276   LearningRate 0.0033   Epoch: 32   Global Step: 165640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:41,244-Speed 10397.98 samples/sec   Loss 5.6905   LearningRate 0.0033   Epoch: 32   Global Step: 165650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:42,280-Speed 9887.08 samples/sec   Loss 5.8014   LearningRate 0.0033   Epoch: 32   Global Step: 165660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:43,272-Speed 10330.65 samples/sec   Loss 5.6865   LearningRate 0.0033   Epoch: 32   Global Step: 165670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:44,231-Speed 10689.22 samples/sec   Loss 5.6550   LearningRate 0.0033   Epoch: 32   Global Step: 165680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:45,172-Speed 10893.97 samples/sec   Loss 5.7677   LearningRate 0.0033   Epoch: 32   Global Step: 165690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:25:46,163-Speed 10340.36 samples/sec   Loss 5.7192   LearningRate 0.0033   Epoch: 32   Global Step: 165700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:25:47,134-Speed 10554.51 samples/sec   Loss 5.7066   LearningRate 0.0033   Epoch: 32   Global Step: 165710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:25:48,096-Speed 10661.65 samples/sec   Loss 5.9491   LearningRate 0.0033   Epoch: 32   Global Step: 165720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:25:49,047-Speed 10769.87 samples/sec   Loss 5.6132   LearningRate 0.0033   Epoch: 32   Global Step: 165730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:25:50,039-Speed 10336.42 samples/sec   Loss 5.8047   LearningRate 0.0033   Epoch: 32   Global Step: 165740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:25:51,018-Speed 10472.96 samples/sec   Loss 5.6858   LearningRate 0.0033   Epoch: 32   Global Step: 165750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:25:52,043-Speed 10002.60 samples/sec   Loss 5.6551   LearningRate 0.0033   Epoch: 32   Global Step: 165760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:25:53,087-Speed 9817.47 samples/sec   Loss 5.8506   LearningRate 0.0033   Epoch: 32   Global Step: 165770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:25:54,138-Speed 9744.19 samples/sec   Loss 5.6087   LearningRate 0.0033   Epoch: 32   Global Step: 165780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:25:55,088-Speed 10800.34 samples/sec   Loss 5.8600   LearningRate 0.0033   Epoch: 32   Global Step: 165790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:56,082-Speed 10306.49 samples/sec   Loss 5.7066   LearningRate 0.0033   Epoch: 32   Global Step: 165800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:57,040-Speed 10696.36 samples/sec   Loss 5.6523   LearningRate 0.0033   Epoch: 32   Global Step: 165810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:58,028-Speed 10387.06 samples/sec   Loss 5.6679   LearningRate 0.0033   Epoch: 32   Global Step: 165820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:59,009-Speed 10453.71 samples/sec   Loss 5.6691   LearningRate 0.0033   Epoch: 32   Global Step: 165830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:25:59,976-Speed 10599.09 samples/sec   Loss 5.7202   LearningRate 0.0033   Epoch: 32   Global Step: 165840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:26:01,012-Speed 9890.48 samples/sec   Loss 5.5861   LearningRate 0.0032   Epoch: 32   Global Step: 165850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:26:01,994-Speed 10437.75 samples/sec   Loss 5.9043   LearningRate 0.0032   Epoch: 32   Global Step: 165860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:26:02,970-Speed 10499.41 samples/sec   Loss 5.6043   LearningRate 0.0032   Epoch: 32   Global Step: 165870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:26:04,008-Speed 9882.05 samples/sec   Loss 5.6473   LearningRate 0.0032   Epoch: 32   Global Step: 165880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:26:05,026-Speed 10064.12 samples/sec   Loss 5.7765   LearningRate 0.0032   Epoch: 32   Global Step: 165890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:26:06,007-Speed 10455.16 samples/sec   Loss 5.7121   LearningRate 0.0032   Epoch: 32   Global Step: 165900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:26:07,003-Speed 10284.96 samples/sec   Loss 5.6757   LearningRate 0.0032   Epoch: 32   Global Step: 165910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:26:08,025-Speed 10024.68 samples/sec   Loss 5.6893   LearningRate 0.0032   Epoch: 32   Global Step: 165920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:26:09,033-Speed 10166.12 samples/sec   Loss 5.6809   LearningRate 0.0032   Epoch: 32   Global Step: 165930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:26:10,040-Speed 10178.67 samples/sec   Loss 5.7061   LearningRate 0.0032   Epoch: 32   Global Step: 165940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:26:11,046-Speed 10183.40 samples/sec   Loss 5.6111   LearningRate 0.0032   Epoch: 32   Global Step: 165950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:26:12,093-Speed 10000.95 samples/sec   Loss 5.6921   LearningRate 0.0032   Epoch: 32   Global Step: 165960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:26:13,103-Speed 10156.21 samples/sec   Loss 5.6399   LearningRate 0.0032   Epoch: 32   Global Step: 165970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:26:14,109-Speed 10189.74 samples/sec   Loss 5.7896   LearningRate 0.0032   Epoch: 32   Global Step: 165980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:26:15,192-Speed 9459.24 samples/sec   Loss 5.8196   LearningRate 0.0032   Epoch: 32   Global Step: 165990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:26:16,208-Speed 10085.86 samples/sec   Loss 5.6464   LearningRate 0.0032   Epoch: 32   Global Step: 166000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:26:38,544-[lfw][166000]XNorm: 8.237125
Training: 2022-04-11 05:26:38,545-[lfw][166000]Accuracy-Flip: 0.99617+-0.00334
Training: 2022-04-11 05:26:38,545-[lfw][166000]Accuracy-Highest: 0.99700
Training: 2022-04-11 05:27:04,171-[cfp_fp][166000]XNorm: 7.136419
Training: 2022-04-11 05:27:04,172-[cfp_fp][166000]Accuracy-Flip: 0.97100+-0.00836
Training: 2022-04-11 05:27:04,172-[cfp_fp][166000]Accuracy-Highest: 0.97200
Training: 2022-04-11 05:27:26,360-[agedb_30][166000]XNorm: 8.045265
Training: 2022-04-11 05:27:26,360-[agedb_30][166000]Accuracy-Flip: 0.97017+-0.00621
Training: 2022-04-11 05:27:26,361-[agedb_30][166000]Accuracy-Highest: 0.97300
Training: 2022-04-11 05:27:27,315-Speed 144.01 samples/sec   Loss 5.7145   LearningRate 0.0032   Epoch: 32   Global Step: 166010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:27:28,301-Speed 10386.20 samples/sec   Loss 5.7525   LearningRate 0.0032   Epoch: 32   Global Step: 166020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:27:29,275-Speed 10525.86 samples/sec   Loss 5.7585   LearningRate 0.0032   Epoch: 32   Global Step: 166030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:27:30,281-Speed 10195.45 samples/sec   Loss 5.6683   LearningRate 0.0032   Epoch: 32   Global Step: 166040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:27:31,276-Speed 10295.85 samples/sec   Loss 5.8015   LearningRate 0.0032   Epoch: 32   Global Step: 166050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:27:32,314-Speed 9880.14 samples/sec   Loss 5.8497   LearningRate 0.0032   Epoch: 32   Global Step: 166060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:27:33,257-Speed 10859.41 samples/sec   Loss 5.7231   LearningRate 0.0032   Epoch: 32   Global Step: 166070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:27:34,253-Speed 10290.56 samples/sec   Loss 5.7895   LearningRate 0.0032   Epoch: 32   Global Step: 166080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:27:35,266-Speed 10120.00 samples/sec   Loss 5.7452   LearningRate 0.0032   Epoch: 32   Global Step: 166090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:27:36,235-Speed 10570.75 samples/sec   Loss 5.7842   LearningRate 0.0032   Epoch: 32   Global Step: 166100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:27:37,239-Speed 10214.64 samples/sec   Loss 5.7638   LearningRate 0.0032   Epoch: 32   Global Step: 166110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:27:38,206-Speed 10599.17 samples/sec   Loss 5.8065   LearningRate 0.0032   Epoch: 32   Global Step: 166120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:27:39,183-Speed 10497.37 samples/sec   Loss 5.7585   LearningRate 0.0032   Epoch: 32   Global Step: 166130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:27:40,238-Speed 9706.04 samples/sec   Loss 5.8318   LearningRate 0.0032   Epoch: 32   Global Step: 166140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:27:41,243-Speed 10199.65 samples/sec   Loss 5.6665   LearningRate 0.0032   Epoch: 32   Global Step: 166150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:27:42,231-Speed 10373.04 samples/sec   Loss 5.6915   LearningRate 0.0032   Epoch: 32   Global Step: 166160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:27:43,242-Speed 10133.23 samples/sec   Loss 5.6177   LearningRate 0.0032   Epoch: 32   Global Step: 166170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:27:44,244-Speed 10231.54 samples/sec   Loss 5.8291   LearningRate 0.0032   Epoch: 32   Global Step: 166180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:27:45,203-Speed 10684.24 samples/sec   Loss 5.7607   LearningRate 0.0032   Epoch: 32   Global Step: 166190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:27:46,164-Speed 10668.72 samples/sec   Loss 5.7346   LearningRate 0.0032   Epoch: 32   Global Step: 166200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:27:47,160-Speed 10288.13 samples/sec   Loss 5.6995   LearningRate 0.0032   Epoch: 32   Global Step: 166210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:27:48,145-Speed 10404.81 samples/sec   Loss 5.6952   LearningRate 0.0032   Epoch: 32   Global Step: 166220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:27:49,157-Speed 10124.92 samples/sec   Loss 5.7140   LearningRate 0.0032   Epoch: 32   Global Step: 166230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:27:50,177-Speed 10046.64 samples/sec   Loss 5.6748   LearningRate 0.0032   Epoch: 32   Global Step: 166240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:27:51,209-Speed 9937.21 samples/sec   Loss 5.7491   LearningRate 0.0032   Epoch: 32   Global Step: 166250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:27:52,222-Speed 10115.97 samples/sec   Loss 5.9127   LearningRate 0.0032   Epoch: 32   Global Step: 166260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:27:53,209-Speed 10390.18 samples/sec   Loss 5.8036   LearningRate 0.0032   Epoch: 32   Global Step: 166270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:27:54,185-Speed 10493.85 samples/sec   Loss 5.8448   LearningRate 0.0032   Epoch: 32   Global Step: 166280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:27:55,220-Speed 9904.57 samples/sec   Loss 5.5735   LearningRate 0.0032   Epoch: 32   Global Step: 166290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:27:56,237-Speed 10079.96 samples/sec   Loss 5.8758   LearningRate 0.0032   Epoch: 32   Global Step: 166300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:27:57,213-Speed 10493.13 samples/sec   Loss 5.7642   LearningRate 0.0032   Epoch: 32   Global Step: 166310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:27:58,195-Speed 10440.03 samples/sec   Loss 5.6902   LearningRate 0.0032   Epoch: 32   Global Step: 166320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:27:59,210-Speed 10094.88 samples/sec   Loss 5.6890   LearningRate 0.0032   Epoch: 32   Global Step: 166330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:00,200-Speed 10347.30 samples/sec   Loss 5.7504   LearningRate 0.0032   Epoch: 32   Global Step: 166340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:01,165-Speed 10631.79 samples/sec   Loss 5.8174   LearningRate 0.0032   Epoch: 32   Global Step: 166350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:02,183-Speed 10067.89 samples/sec   Loss 5.7429   LearningRate 0.0032   Epoch: 32   Global Step: 166360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:03,213-Speed 9951.13 samples/sec   Loss 5.7656   LearningRate 0.0032   Epoch: 32   Global Step: 166370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:04,201-Speed 10371.84 samples/sec   Loss 5.7414   LearningRate 0.0032   Epoch: 32   Global Step: 166380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:05,199-Speed 10267.33 samples/sec   Loss 5.6611   LearningRate 0.0032   Epoch: 32   Global Step: 166390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:06,195-Speed 10285.94 samples/sec   Loss 5.6943   LearningRate 0.0032   Epoch: 32   Global Step: 166400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:07,221-Speed 9998.62 samples/sec   Loss 5.7069   LearningRate 0.0032   Epoch: 32   Global Step: 166410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:08,198-Speed 10485.41 samples/sec   Loss 5.7704   LearningRate 0.0031   Epoch: 32   Global Step: 166420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:09,174-Speed 10496.00 samples/sec   Loss 5.6385   LearningRate 0.0031   Epoch: 32   Global Step: 166430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:10,166-Speed 10332.59 samples/sec   Loss 5.8378   LearningRate 0.0031   Epoch: 32   Global Step: 166440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:11,164-Speed 10270.85 samples/sec   Loss 5.7662   LearningRate 0.0031   Epoch: 32   Global Step: 166450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:12,156-Speed 10334.02 samples/sec   Loss 5.7539   LearningRate 0.0031   Epoch: 32   Global Step: 166460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:13,144-Speed 10383.04 samples/sec   Loss 5.7063   LearningRate 0.0031   Epoch: 32   Global Step: 166470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:14,164-Speed 10054.10 samples/sec   Loss 5.6778   LearningRate 0.0031   Epoch: 32   Global Step: 166480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:15,197-Speed 9927.61 samples/sec   Loss 5.7741   LearningRate 0.0031   Epoch: 32   Global Step: 166490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:16,174-Speed 10485.94 samples/sec   Loss 5.7675   LearningRate 0.0031   Epoch: 32   Global Step: 166500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:17,169-Speed 10298.35 samples/sec   Loss 5.7408   LearningRate 0.0031   Epoch: 32   Global Step: 166510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:18,182-Speed 10124.65 samples/sec   Loss 5.7304   LearningRate 0.0031   Epoch: 32   Global Step: 166520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:19,189-Speed 10174.16 samples/sec   Loss 5.6310   LearningRate 0.0031   Epoch: 32   Global Step: 166530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:20,163-Speed 10524.50 samples/sec   Loss 5.7039   LearningRate 0.0031   Epoch: 32   Global Step: 166540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:21,169-Speed 10187.79 samples/sec   Loss 5.6524   LearningRate 0.0031   Epoch: 32   Global Step: 166550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:22,189-Speed 10045.42 samples/sec   Loss 5.6506   LearningRate 0.0031   Epoch: 32   Global Step: 166560   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:28:23,182-Speed 10323.42 samples/sec   Loss 5.7313   LearningRate 0.0031   Epoch: 32   Global Step: 166570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:24,123-Speed 10894.17 samples/sec   Loss 5.7364   LearningRate 0.0031   Epoch: 32   Global Step: 166580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:25,094-Speed 10547.08 samples/sec   Loss 5.6792   LearningRate 0.0031   Epoch: 32   Global Step: 166590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:26,082-Speed 10378.90 samples/sec   Loss 5.7621   LearningRate 0.0031   Epoch: 32   Global Step: 166600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:27,112-Speed 9953.50 samples/sec   Loss 5.7183   LearningRate 0.0031   Epoch: 32   Global Step: 166610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:28,109-Speed 10281.57 samples/sec   Loss 5.7566   LearningRate 0.0031   Epoch: 32   Global Step: 166620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:29,076-Speed 10595.56 samples/sec   Loss 5.8620   LearningRate 0.0031   Epoch: 32   Global Step: 166630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:30,106-Speed 9949.49 samples/sec   Loss 5.8314   LearningRate 0.0031   Epoch: 32   Global Step: 166640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:31,122-Speed 10087.47 samples/sec   Loss 5.8363   LearningRate 0.0031   Epoch: 32   Global Step: 166650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:32,099-Speed 10497.09 samples/sec   Loss 5.7202   LearningRate 0.0031   Epoch: 32   Global Step: 166660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:33,080-Speed 10438.46 samples/sec   Loss 5.6402   LearningRate 0.0031   Epoch: 32   Global Step: 166670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:34,240-Speed 8837.61 samples/sec   Loss 5.7051   LearningRate 0.0031   Epoch: 32   Global Step: 166680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:35,216-Speed 10509.41 samples/sec   Loss 5.6924   LearningRate 0.0031   Epoch: 32   Global Step: 166690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:36,202-Speed 10391.41 samples/sec   Loss 5.7579   LearningRate 0.0031   Epoch: 32   Global Step: 166700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:37,154-Speed 10770.71 samples/sec   Loss 5.6908   LearningRate 0.0031   Epoch: 32   Global Step: 166710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:38,214-Speed 9670.65 samples/sec   Loss 5.7360   LearningRate 0.0031   Epoch: 32   Global Step: 166720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:39,188-Speed 10530.28 samples/sec   Loss 5.7621   LearningRate 0.0031   Epoch: 32   Global Step: 166730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:40,214-Speed 9984.11 samples/sec   Loss 5.5920   LearningRate 0.0031   Epoch: 32   Global Step: 166740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:41,320-Speed 9272.56 samples/sec   Loss 5.8236   LearningRate 0.0031   Epoch: 32   Global Step: 166750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:42,291-Speed 10545.74 samples/sec   Loss 5.8080   LearningRate 0.0031   Epoch: 32   Global Step: 166760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:43,281-Speed 10351.78 samples/sec   Loss 5.6906   LearningRate 0.0031   Epoch: 32   Global Step: 166770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:44,268-Speed 10391.00 samples/sec   Loss 5.7397   LearningRate 0.0031   Epoch: 32   Global Step: 166780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:45,337-Speed 9586.26 samples/sec   Loss 5.7886   LearningRate 0.0031   Epoch: 32   Global Step: 166790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:46,308-Speed 10553.19 samples/sec   Loss 5.9363   LearningRate 0.0031   Epoch: 32   Global Step: 166800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:47,265-Speed 10709.19 samples/sec   Loss 5.8862   LearningRate 0.0031   Epoch: 32   Global Step: 166810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:48,277-Speed 10126.22 samples/sec   Loss 5.7765   LearningRate 0.0031   Epoch: 32   Global Step: 166820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:49,229-Speed 10772.52 samples/sec   Loss 5.7779   LearningRate 0.0031   Epoch: 32   Global Step: 166830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:50,263-Speed 9907.67 samples/sec   Loss 5.6436   LearningRate 0.0031   Epoch: 32   Global Step: 166840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:51,307-Speed 9822.07 samples/sec   Loss 5.8107   LearningRate 0.0031   Epoch: 32   Global Step: 166850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:28:52,300-Speed 10321.75 samples/sec   Loss 5.7633   LearningRate 0.0031   Epoch: 32   Global Step: 166860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:53,304-Speed 10213.80 samples/sec   Loss 5.7179   LearningRate 0.0031   Epoch: 32   Global Step: 166870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:54,348-Speed 9813.03 samples/sec   Loss 5.8606   LearningRate 0.0031   Epoch: 32   Global Step: 166880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:55,364-Speed 10087.95 samples/sec   Loss 5.6281   LearningRate 0.0031   Epoch: 32   Global Step: 166890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:56,333-Speed 10585.31 samples/sec   Loss 5.7931   LearningRate 0.0031   Epoch: 32   Global Step: 166900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:28:57,389-Speed 9706.60 samples/sec   Loss 5.7034   LearningRate 0.0031   Epoch: 32   Global Step: 166910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:07,103-Speed 1054.24 samples/sec   Loss 5.4280   LearningRate 0.0031   Epoch: 33   Global Step: 166920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:08,175-Speed 9569.12 samples/sec   Loss 5.3004   LearningRate 0.0031   Epoch: 33   Global Step: 166930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:09,436-Speed 8124.20 samples/sec   Loss 5.1747   LearningRate 0.0031   Epoch: 33   Global Step: 166940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:10,446-Speed 10157.22 samples/sec   Loss 5.3753   LearningRate 0.0031   Epoch: 33   Global Step: 166950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:11,441-Speed 10297.66 samples/sec   Loss 5.0617   LearningRate 0.0031   Epoch: 33   Global Step: 166960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:12,458-Speed 10069.87 samples/sec   Loss 5.2761   LearningRate 0.0031   Epoch: 33   Global Step: 166970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:13,477-Speed 10062.75 samples/sec   Loss 5.3268   LearningRate 0.0031   Epoch: 33   Global Step: 166980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:14,573-Speed 9357.42 samples/sec   Loss 5.2441   LearningRate 0.0030   Epoch: 33   Global Step: 166990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:15,547-Speed 10517.90 samples/sec   Loss 5.3203   LearningRate 0.0030   Epoch: 33   Global Step: 167000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:16,552-Speed 10200.55 samples/sec   Loss 5.3402   LearningRate 0.0030   Epoch: 33   Global Step: 167010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:17,542-Speed 10354.48 samples/sec   Loss 5.3690   LearningRate 0.0030   Epoch: 33   Global Step: 167020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:18,511-Speed 10577.16 samples/sec   Loss 5.2134   LearningRate 0.0030   Epoch: 33   Global Step: 167030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:19,557-Speed 9808.75 samples/sec   Loss 5.2568   LearningRate 0.0030   Epoch: 33   Global Step: 167040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:20,575-Speed 10066.29 samples/sec   Loss 5.3962   LearningRate 0.0030   Epoch: 33   Global Step: 167050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:21,580-Speed 10201.84 samples/sec   Loss 5.3473   LearningRate 0.0030   Epoch: 33   Global Step: 167060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:22,558-Speed 10488.82 samples/sec   Loss 5.4365   LearningRate 0.0030   Epoch: 33   Global Step: 167070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:23,710-Speed 8895.69 samples/sec   Loss 5.2426   LearningRate 0.0030   Epoch: 33   Global Step: 167080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:24,731-Speed 10036.25 samples/sec   Loss 5.4169   LearningRate 0.0030   Epoch: 33   Global Step: 167090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:25,722-Speed 10348.37 samples/sec   Loss 5.3657   LearningRate 0.0030   Epoch: 33   Global Step: 167100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:26,715-Speed 10320.38 samples/sec   Loss 5.3716   LearningRate 0.0030   Epoch: 33   Global Step: 167110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:27,721-Speed 10189.93 samples/sec   Loss 5.2897   LearningRate 0.0030   Epoch: 33   Global Step: 167120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:28,715-Speed 10304.37 samples/sec   Loss 5.2524   LearningRate 0.0030   Epoch: 33   Global Step: 167130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:29:29,691-Speed 10502.44 samples/sec   Loss 5.3932   LearningRate 0.0030   Epoch: 33   Global Step: 167140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:29:30,682-Speed 10346.39 samples/sec   Loss 5.4411   LearningRate 0.0030   Epoch: 33   Global Step: 167150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:29:31,725-Speed 9828.60 samples/sec   Loss 5.3755   LearningRate 0.0030   Epoch: 33   Global Step: 167160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:29:32,730-Speed 10202.44 samples/sec   Loss 5.3332   LearningRate 0.0030   Epoch: 33   Global Step: 167170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:29:33,751-Speed 10042.81 samples/sec   Loss 5.3639   LearningRate 0.0030   Epoch: 33   Global Step: 167180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:29:34,718-Speed 10593.53 samples/sec   Loss 5.3557   LearningRate 0.0030   Epoch: 33   Global Step: 167190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:29:35,714-Speed 10292.89 samples/sec   Loss 5.3275   LearningRate 0.0030   Epoch: 33   Global Step: 167200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:29:36,692-Speed 10474.79 samples/sec   Loss 5.5176   LearningRate 0.0030   Epoch: 33   Global Step: 167210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:29:37,658-Speed 10616.40 samples/sec   Loss 5.2485   LearningRate 0.0030   Epoch: 33   Global Step: 167220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:29:38,641-Speed 10419.40 samples/sec   Loss 5.4350   LearningRate 0.0030   Epoch: 33   Global Step: 167230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:39,653-Speed 10131.00 samples/sec   Loss 5.4332   LearningRate 0.0030   Epoch: 33   Global Step: 167240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:40,654-Speed 10234.30 samples/sec   Loss 5.4224   LearningRate 0.0030   Epoch: 33   Global Step: 167250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:41,640-Speed 10396.53 samples/sec   Loss 5.3355   LearningRate 0.0030   Epoch: 33   Global Step: 167260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:42,629-Speed 10368.96 samples/sec   Loss 5.5006   LearningRate 0.0030   Epoch: 33   Global Step: 167270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:43,708-Speed 9493.73 samples/sec   Loss 5.3001   LearningRate 0.0030   Epoch: 33   Global Step: 167280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:44,719-Speed 10139.33 samples/sec   Loss 5.3917   LearningRate 0.0030   Epoch: 33   Global Step: 167290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:45,745-Speed 9986.13 samples/sec   Loss 5.4197   LearningRate 0.0030   Epoch: 33   Global Step: 167300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:46,719-Speed 10527.48 samples/sec   Loss 5.3269   LearningRate 0.0030   Epoch: 33   Global Step: 167310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:47,696-Speed 10487.13 samples/sec   Loss 5.3208   LearningRate 0.0030   Epoch: 33   Global Step: 167320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:48,671-Speed 10505.76 samples/sec   Loss 5.4296   LearningRate 0.0030   Epoch: 33   Global Step: 167330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:49,683-Speed 10133.39 samples/sec   Loss 5.3833   LearningRate 0.0030   Epoch: 33   Global Step: 167340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:50,658-Speed 10516.98 samples/sec   Loss 5.3550   LearningRate 0.0030   Epoch: 33   Global Step: 167350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:51,667-Speed 10152.35 samples/sec   Loss 5.2282   LearningRate 0.0030   Epoch: 33   Global Step: 167360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:52,666-Speed 10256.71 samples/sec   Loss 5.4430   LearningRate 0.0030   Epoch: 33   Global Step: 167370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:53,746-Speed 9493.44 samples/sec   Loss 5.3818   LearningRate 0.0030   Epoch: 33   Global Step: 167380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:54,786-Speed 9853.22 samples/sec   Loss 5.2571   LearningRate 0.0030   Epoch: 33   Global Step: 167390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:55,780-Speed 10309.88 samples/sec   Loss 5.5762   LearningRate 0.0030   Epoch: 33   Global Step: 167400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:56,783-Speed 10210.48 samples/sec   Loss 5.5245   LearningRate 0.0030   Epoch: 33   Global Step: 167410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:57,787-Speed 10217.20 samples/sec   Loss 5.2995   LearningRate 0.0030   Epoch: 33   Global Step: 167420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:58,880-Speed 9380.58 samples/sec   Loss 5.5481   LearningRate 0.0030   Epoch: 33   Global Step: 167430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:29:59,900-Speed 10046.33 samples/sec   Loss 5.2066   LearningRate 0.0030   Epoch: 33   Global Step: 167440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:01,058-Speed 8850.12 samples/sec   Loss 5.4137   LearningRate 0.0030   Epoch: 33   Global Step: 167450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:02,063-Speed 10203.03 samples/sec   Loss 5.4413   LearningRate 0.0030   Epoch: 33   Global Step: 167460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:03,029-Speed 10608.11 samples/sec   Loss 5.3553   LearningRate 0.0030   Epoch: 33   Global Step: 167470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:04,033-Speed 10210.75 samples/sec   Loss 5.3761   LearningRate 0.0030   Epoch: 33   Global Step: 167480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:05,042-Speed 10158.70 samples/sec   Loss 5.2796   LearningRate 0.0030   Epoch: 33   Global Step: 167490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:06,073-Speed 9935.62 samples/sec   Loss 5.3747   LearningRate 0.0030   Epoch: 33   Global Step: 167500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:07,051-Speed 10482.17 samples/sec   Loss 5.3268   LearningRate 0.0030   Epoch: 33   Global Step: 167510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:08,085-Speed 9908.06 samples/sec   Loss 5.3823   LearningRate 0.0030   Epoch: 33   Global Step: 167520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:09,081-Speed 10286.42 samples/sec   Loss 5.4768   LearningRate 0.0030   Epoch: 33   Global Step: 167530   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:30:10,132-Speed 9753.17 samples/sec   Loss 5.3263   LearningRate 0.0030   Epoch: 33   Global Step: 167540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:11,215-Speed 9469.85 samples/sec   Loss 5.3689   LearningRate 0.0030   Epoch: 33   Global Step: 167550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:12,186-Speed 10555.77 samples/sec   Loss 5.3363   LearningRate 0.0030   Epoch: 33   Global Step: 167560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:13,180-Speed 10310.01 samples/sec   Loss 5.3290   LearningRate 0.0030   Epoch: 33   Global Step: 167570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:14,266-Speed 9439.48 samples/sec   Loss 5.3631   LearningRate 0.0029   Epoch: 33   Global Step: 167580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:15,262-Speed 10297.85 samples/sec   Loss 5.3222   LearningRate 0.0029   Epoch: 33   Global Step: 167590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:16,249-Speed 10385.70 samples/sec   Loss 5.3992   LearningRate 0.0029   Epoch: 33   Global Step: 167600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:17,275-Speed 9981.75 samples/sec   Loss 5.5602   LearningRate 0.0029   Epoch: 33   Global Step: 167610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:18,259-Speed 10420.61 samples/sec   Loss 5.3311   LearningRate 0.0029   Epoch: 33   Global Step: 167620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:19,253-Speed 10317.99 samples/sec   Loss 5.4054   LearningRate 0.0029   Epoch: 33   Global Step: 167630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:20,255-Speed 10229.34 samples/sec   Loss 5.3333   LearningRate 0.0029   Epoch: 33   Global Step: 167640   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:30:21,296-Speed 9843.71 samples/sec   Loss 5.3100   LearningRate 0.0029   Epoch: 33   Global Step: 167650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:22,338-Speed 9833.91 samples/sec   Loss 5.4329   LearningRate 0.0029   Epoch: 33   Global Step: 167660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:23,359-Speed 10042.00 samples/sec   Loss 5.3792   LearningRate 0.0029   Epoch: 33   Global Step: 167670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:24,395-Speed 9888.21 samples/sec   Loss 5.2932   LearningRate 0.0029   Epoch: 33   Global Step: 167680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:25,414-Speed 10061.07 samples/sec   Loss 5.3425   LearningRate 0.0029   Epoch: 33   Global Step: 167690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:26,435-Speed 10031.04 samples/sec   Loss 5.3426   LearningRate 0.0029   Epoch: 33   Global Step: 167700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:27,454-Speed 10063.99 samples/sec   Loss 5.5131   LearningRate 0.0029   Epoch: 33   Global Step: 167710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:28,453-Speed 10258.71 samples/sec   Loss 5.4612   LearningRate 0.0029   Epoch: 33   Global Step: 167720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:29,518-Speed 9619.62 samples/sec   Loss 5.4225   LearningRate 0.0029   Epoch: 33   Global Step: 167730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:30,531-Speed 10125.23 samples/sec   Loss 5.3437   LearningRate 0.0029   Epoch: 33   Global Step: 167740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:31,531-Speed 10245.00 samples/sec   Loss 5.4087   LearningRate 0.0029   Epoch: 33   Global Step: 167750   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:30:32,536-Speed 10192.48 samples/sec   Loss 5.4135   LearningRate 0.0029   Epoch: 33   Global Step: 167760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:33,525-Speed 10368.71 samples/sec   Loss 5.4719   LearningRate 0.0029   Epoch: 33   Global Step: 167770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:34,518-Speed 10329.80 samples/sec   Loss 5.4432   LearningRate 0.0029   Epoch: 33   Global Step: 167780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:35,518-Speed 10244.71 samples/sec   Loss 5.2204   LearningRate 0.0029   Epoch: 33   Global Step: 167790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:36,512-Speed 10307.54 samples/sec   Loss 5.3348   LearningRate 0.0029   Epoch: 33   Global Step: 167800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:37,559-Speed 9786.14 samples/sec   Loss 5.5657   LearningRate 0.0029   Epoch: 33   Global Step: 167810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:38,546-Speed 10391.48 samples/sec   Loss 5.4700   LearningRate 0.0029   Epoch: 33   Global Step: 167820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:39,533-Speed 10394.22 samples/sec   Loss 5.2372   LearningRate 0.0029   Epoch: 33   Global Step: 167830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:40,486-Speed 10759.91 samples/sec   Loss 5.3129   LearningRate 0.0029   Epoch: 33   Global Step: 167840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:41,499-Speed 10109.20 samples/sec   Loss 5.4293   LearningRate 0.0029   Epoch: 33   Global Step: 167850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:42,453-Speed 10750.60 samples/sec   Loss 5.2938   LearningRate 0.0029   Epoch: 33   Global Step: 167860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:43,453-Speed 10253.57 samples/sec   Loss 5.4318   LearningRate 0.0029   Epoch: 33   Global Step: 167870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:30:44,472-Speed 10057.09 samples/sec   Loss 5.5259   LearningRate 0.0029   Epoch: 33   Global Step: 167880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:30:45,497-Speed 9995.08 samples/sec   Loss 5.5088   LearningRate 0.0029   Epoch: 33   Global Step: 167890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:30:46,511-Speed 10111.98 samples/sec   Loss 5.3694   LearningRate 0.0029   Epoch: 33   Global Step: 167900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:30:47,503-Speed 10327.27 samples/sec   Loss 5.4109   LearningRate 0.0029   Epoch: 33   Global Step: 167910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:30:48,467-Speed 10636.29 samples/sec   Loss 5.2187   LearningRate 0.0029   Epoch: 33   Global Step: 167920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:30:49,481-Speed 10106.46 samples/sec   Loss 5.4891   LearningRate 0.0029   Epoch: 33   Global Step: 167930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:30:50,463-Speed 10441.15 samples/sec   Loss 5.4650   LearningRate 0.0029   Epoch: 33   Global Step: 167940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:30:51,478-Speed 10100.38 samples/sec   Loss 5.5111   LearningRate 0.0029   Epoch: 33   Global Step: 167950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:30:52,480-Speed 10218.12 samples/sec   Loss 5.4322   LearningRate 0.0029   Epoch: 33   Global Step: 167960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:30:53,440-Speed 10676.81 samples/sec   Loss 5.3815   LearningRate 0.0029   Epoch: 33   Global Step: 167970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:54,444-Speed 10209.66 samples/sec   Loss 5.4699   LearningRate 0.0029   Epoch: 33   Global Step: 167980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:55,418-Speed 10533.31 samples/sec   Loss 5.2500   LearningRate 0.0029   Epoch: 33   Global Step: 167990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:30:56,423-Speed 10188.16 samples/sec   Loss 5.5283   LearningRate 0.0029   Epoch: 33   Global Step: 168000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:31:18,837-[lfw][168000]XNorm: 8.201423
Training: 2022-04-11 05:31:18,837-[lfw][168000]Accuracy-Flip: 0.99633+-0.00323
Training: 2022-04-11 05:31:18,838-[lfw][168000]Accuracy-Highest: 0.99700
Training: 2022-04-11 05:31:44,898-[cfp_fp][168000]XNorm: 7.080536
Training: 2022-04-11 05:31:44,899-[cfp_fp][168000]Accuracy-Flip: 0.97043+-0.01000
Training: 2022-04-11 05:31:44,900-[cfp_fp][168000]Accuracy-Highest: 0.97200
Training: 2022-04-11 05:32:07,251-[agedb_30][168000]XNorm: 8.010437
Training: 2022-04-11 05:32:07,252-[agedb_30][168000]Accuracy-Flip: 0.97100+-0.00708
Training: 2022-04-11 05:32:07,253-[agedb_30][168000]Accuracy-Highest: 0.97300
Training: 2022-04-11 05:32:08,227-Speed 142.61 samples/sec   Loss 5.4818   LearningRate 0.0029   Epoch: 33   Global Step: 168010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:09,221-Speed 10305.26 samples/sec   Loss 5.6679   LearningRate 0.0029   Epoch: 33   Global Step: 168020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:10,282-Speed 9660.12 samples/sec   Loss 5.4451   LearningRate 0.0029   Epoch: 33   Global Step: 168030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:11,329-Speed 9789.56 samples/sec   Loss 5.5525   LearningRate 0.0029   Epoch: 33   Global Step: 168040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:12,314-Speed 10417.97 samples/sec   Loss 5.4306   LearningRate 0.0029   Epoch: 33   Global Step: 168050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:13,311-Speed 10277.23 samples/sec   Loss 5.4094   LearningRate 0.0029   Epoch: 33   Global Step: 168060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:14,328-Speed 10077.00 samples/sec   Loss 5.5246   LearningRate 0.0029   Epoch: 33   Global Step: 168070   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:32:15,318-Speed 10362.10 samples/sec   Loss 5.4076   LearningRate 0.0029   Epoch: 33   Global Step: 168080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:16,303-Speed 10404.00 samples/sec   Loss 5.4985   LearningRate 0.0029   Epoch: 33   Global Step: 168090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:17,302-Speed 10262.02 samples/sec   Loss 5.4252   LearningRate 0.0029   Epoch: 33   Global Step: 168100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:18,292-Speed 10349.60 samples/sec   Loss 5.5940   LearningRate 0.0029   Epoch: 33   Global Step: 168110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:19,295-Speed 10229.24 samples/sec   Loss 5.3941   LearningRate 0.0029   Epoch: 33   Global Step: 168120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:20,320-Speed 9993.56 samples/sec   Loss 5.5479   LearningRate 0.0029   Epoch: 33   Global Step: 168130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:21,302-Speed 10432.48 samples/sec   Loss 5.5698   LearningRate 0.0029   Epoch: 33   Global Step: 168140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:22,334-Speed 9933.95 samples/sec   Loss 5.4209   LearningRate 0.0029   Epoch: 33   Global Step: 168150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:23,338-Speed 10212.44 samples/sec   Loss 5.3608   LearningRate 0.0029   Epoch: 33   Global Step: 168160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:24,354-Speed 10081.93 samples/sec   Loss 5.4470   LearningRate 0.0028   Epoch: 33   Global Step: 168170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:25,360-Speed 10194.11 samples/sec   Loss 5.4727   LearningRate 0.0028   Epoch: 33   Global Step: 168180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:26,313-Speed 10749.22 samples/sec   Loss 5.4332   LearningRate 0.0028   Epoch: 33   Global Step: 168190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:27,303-Speed 10354.85 samples/sec   Loss 5.4061   LearningRate 0.0028   Epoch: 33   Global Step: 168200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:28,278-Speed 10518.40 samples/sec   Loss 5.3457   LearningRate 0.0028   Epoch: 33   Global Step: 168210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:29,268-Speed 10352.60 samples/sec   Loss 5.4951   LearningRate 0.0028   Epoch: 33   Global Step: 168220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:30,307-Speed 9853.88 samples/sec   Loss 5.4421   LearningRate 0.0028   Epoch: 33   Global Step: 168230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:32:31,317-Speed 10160.73 samples/sec   Loss 5.5353   LearningRate 0.0028   Epoch: 33   Global Step: 168240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:32:32,302-Speed 10405.50 samples/sec   Loss 5.3101   LearningRate 0.0028   Epoch: 33   Global Step: 168250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:32:33,300-Speed 10270.89 samples/sec   Loss 5.5115   LearningRate 0.0028   Epoch: 33   Global Step: 168260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:32:34,285-Speed 10401.63 samples/sec   Loss 5.3553   LearningRate 0.0028   Epoch: 33   Global Step: 168270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:32:35,277-Speed 10339.30 samples/sec   Loss 5.4187   LearningRate 0.0028   Epoch: 33   Global Step: 168280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:32:36,292-Speed 10088.90 samples/sec   Loss 5.5299   LearningRate 0.0028   Epoch: 33   Global Step: 168290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:32:37,256-Speed 10640.01 samples/sec   Loss 5.4542   LearningRate 0.0028   Epoch: 33   Global Step: 168300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:32:38,297-Speed 9848.64 samples/sec   Loss 5.3858   LearningRate 0.0028   Epoch: 33   Global Step: 168310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:32:39,293-Speed 10336.62 samples/sec   Loss 5.4353   LearningRate 0.0028   Epoch: 33   Global Step: 168320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:32:40,281-Speed 10374.83 samples/sec   Loss 5.4628   LearningRate 0.0028   Epoch: 33   Global Step: 168330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:41,318-Speed 9877.20 samples/sec   Loss 5.4950   LearningRate 0.0028   Epoch: 33   Global Step: 168340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:42,305-Speed 10401.73 samples/sec   Loss 5.6013   LearningRate 0.0028   Epoch: 33   Global Step: 168350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:43,286-Speed 10449.28 samples/sec   Loss 5.5507   LearningRate 0.0028   Epoch: 33   Global Step: 168360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:44,276-Speed 10351.93 samples/sec   Loss 5.3973   LearningRate 0.0028   Epoch: 33   Global Step: 168370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:45,268-Speed 10335.18 samples/sec   Loss 5.5205   LearningRate 0.0028   Epoch: 33   Global Step: 168380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:46,232-Speed 10634.40 samples/sec   Loss 5.4304   LearningRate 0.0028   Epoch: 33   Global Step: 168390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:32:47,262-Speed 9945.69 samples/sec   Loss 5.4672   LearningRate 0.0028   Epoch: 33   Global Step: 168400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:32:48,271-Speed 10153.82 samples/sec   Loss 5.6859   LearningRate 0.0028   Epoch: 33   Global Step: 168410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:32:49,311-Speed 9859.66 samples/sec   Loss 5.4075   LearningRate 0.0028   Epoch: 33   Global Step: 168420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:32:50,287-Speed 10509.60 samples/sec   Loss 5.4760   LearningRate 0.0028   Epoch: 33   Global Step: 168430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:32:51,298-Speed 10135.05 samples/sec   Loss 5.2923   LearningRate 0.0028   Epoch: 33   Global Step: 168440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:32:52,388-Speed 9402.70 samples/sec   Loss 5.4467   LearningRate 0.0028   Epoch: 33   Global Step: 168450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:32:53,366-Speed 10479.30 samples/sec   Loss 5.5006   LearningRate 0.0028   Epoch: 33   Global Step: 168460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:32:54,401-Speed 9900.61 samples/sec   Loss 5.4103   LearningRate 0.0028   Epoch: 33   Global Step: 168470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:32:55,480-Speed 9495.84 samples/sec   Loss 5.3403   LearningRate 0.0028   Epoch: 33   Global Step: 168480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:32:56,446-Speed 10615.98 samples/sec   Loss 5.4452   LearningRate 0.0028   Epoch: 33   Global Step: 168490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:57,441-Speed 10301.35 samples/sec   Loss 5.3536   LearningRate 0.0028   Epoch: 33   Global Step: 168500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:58,433-Speed 10323.40 samples/sec   Loss 5.5690   LearningRate 0.0028   Epoch: 33   Global Step: 168510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:32:59,443-Speed 10154.18 samples/sec   Loss 5.6993   LearningRate 0.0028   Epoch: 33   Global Step: 168520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:00,419-Speed 10499.11 samples/sec   Loss 5.3904   LearningRate 0.0028   Epoch: 33   Global Step: 168530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:01,369-Speed 10787.02 samples/sec   Loss 5.3825   LearningRate 0.0028   Epoch: 33   Global Step: 168540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:02,376-Speed 10179.17 samples/sec   Loss 5.4224   LearningRate 0.0028   Epoch: 33   Global Step: 168550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:03,432-Speed 9703.83 samples/sec   Loss 5.4988   LearningRate 0.0028   Epoch: 33   Global Step: 168560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:04,414-Speed 10442.82 samples/sec   Loss 5.5312   LearningRate 0.0028   Epoch: 33   Global Step: 168570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:05,390-Speed 10494.63 samples/sec   Loss 5.4297   LearningRate 0.0028   Epoch: 33   Global Step: 168580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:06,365-Speed 10515.49 samples/sec   Loss 5.4705   LearningRate 0.0028   Epoch: 33   Global Step: 168590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:07,375-Speed 10142.69 samples/sec   Loss 5.4219   LearningRate 0.0028   Epoch: 33   Global Step: 168600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:08,368-Speed 10324.55 samples/sec   Loss 5.4014   LearningRate 0.0028   Epoch: 33   Global Step: 168610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:09,408-Speed 9855.89 samples/sec   Loss 5.4211   LearningRate 0.0028   Epoch: 33   Global Step: 168620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:10,393-Speed 10398.85 samples/sec   Loss 5.5450   LearningRate 0.0028   Epoch: 33   Global Step: 168630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:11,410-Speed 10079.61 samples/sec   Loss 5.6749   LearningRate 0.0028   Epoch: 33   Global Step: 168640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:12,394-Speed 10449.30 samples/sec   Loss 5.5689   LearningRate 0.0028   Epoch: 33   Global Step: 168650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:13,438-Speed 9813.89 samples/sec   Loss 5.5197   LearningRate 0.0028   Epoch: 33   Global Step: 168660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:14,494-Speed 9702.70 samples/sec   Loss 5.4580   LearningRate 0.0028   Epoch: 33   Global Step: 168670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:15,481-Speed 10385.50 samples/sec   Loss 5.4350   LearningRate 0.0028   Epoch: 33   Global Step: 168680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:16,475-Speed 10305.97 samples/sec   Loss 5.4597   LearningRate 0.0028   Epoch: 33   Global Step: 168690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:17,491-Speed 10088.41 samples/sec   Loss 5.3245   LearningRate 0.0028   Epoch: 33   Global Step: 168700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:18,474-Speed 10430.92 samples/sec   Loss 5.5545   LearningRate 0.0028   Epoch: 33   Global Step: 168710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:19,481-Speed 10177.67 samples/sec   Loss 5.4107   LearningRate 0.0028   Epoch: 33   Global Step: 168720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:20,547-Speed 9618.70 samples/sec   Loss 5.3593   LearningRate 0.0028   Epoch: 33   Global Step: 168730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:21,518-Speed 10550.09 samples/sec   Loss 5.4094   LearningRate 0.0028   Epoch: 33   Global Step: 168740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:22,483-Speed 10621.72 samples/sec   Loss 5.4671   LearningRate 0.0028   Epoch: 33   Global Step: 168750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:23,478-Speed 10304.46 samples/sec   Loss 5.4014   LearningRate 0.0028   Epoch: 33   Global Step: 168760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:24,478-Speed 10243.56 samples/sec   Loss 5.4665   LearningRate 0.0027   Epoch: 33   Global Step: 168770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:25,496-Speed 10069.67 samples/sec   Loss 5.3868   LearningRate 0.0027   Epoch: 33   Global Step: 168780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:26,499-Speed 10222.16 samples/sec   Loss 5.5891   LearningRate 0.0027   Epoch: 33   Global Step: 168790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:27,471-Speed 10545.73 samples/sec   Loss 5.4765   LearningRate 0.0027   Epoch: 33   Global Step: 168800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:28,466-Speed 10297.65 samples/sec   Loss 5.4859   LearningRate 0.0027   Epoch: 33   Global Step: 168810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:29,518-Speed 9745.51 samples/sec   Loss 5.4661   LearningRate 0.0027   Epoch: 33   Global Step: 168820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:30,558-Speed 9861.27 samples/sec   Loss 5.5338   LearningRate 0.0027   Epoch: 33   Global Step: 168830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:31,550-Speed 10333.91 samples/sec   Loss 5.6469   LearningRate 0.0027   Epoch: 33   Global Step: 168840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:32,536-Speed 10394.14 samples/sec   Loss 5.4327   LearningRate 0.0027   Epoch: 33   Global Step: 168850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:33,562-Speed 9991.07 samples/sec   Loss 5.4380   LearningRate 0.0027   Epoch: 33   Global Step: 168860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:34,533-Speed 10554.34 samples/sec   Loss 5.4584   LearningRate 0.0027   Epoch: 33   Global Step: 168870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:35,559-Speed 10002.19 samples/sec   Loss 5.4797   LearningRate 0.0027   Epoch: 33   Global Step: 168880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:36,526-Speed 10591.26 samples/sec   Loss 5.5554   LearningRate 0.0027   Epoch: 33   Global Step: 168890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:37,537-Speed 10136.58 samples/sec   Loss 5.4856   LearningRate 0.0027   Epoch: 33   Global Step: 168900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:38,523-Speed 10392.72 samples/sec   Loss 5.5371   LearningRate 0.0027   Epoch: 33   Global Step: 168910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:39,533-Speed 10151.49 samples/sec   Loss 5.5290   LearningRate 0.0027   Epoch: 33   Global Step: 168920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:40,571-Speed 9879.67 samples/sec   Loss 5.4011   LearningRate 0.0027   Epoch: 33   Global Step: 168930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:41,543-Speed 10548.76 samples/sec   Loss 5.5842   LearningRate 0.0027   Epoch: 33   Global Step: 168940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:42,509-Speed 10610.57 samples/sec   Loss 5.4724   LearningRate 0.0027   Epoch: 33   Global Step: 168950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:43,508-Speed 10258.97 samples/sec   Loss 5.6384   LearningRate 0.0027   Epoch: 33   Global Step: 168960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:44,542-Speed 9906.79 samples/sec   Loss 5.4153   LearningRate 0.0027   Epoch: 33   Global Step: 168970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:45,514-Speed 10553.60 samples/sec   Loss 5.5244   LearningRate 0.0027   Epoch: 33   Global Step: 168980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:46,493-Speed 10465.60 samples/sec   Loss 5.4920   LearningRate 0.0027   Epoch: 33   Global Step: 168990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:47,470-Speed 10487.70 samples/sec   Loss 5.6181   LearningRate 0.0027   Epoch: 33   Global Step: 169000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:48,487-Speed 10075.71 samples/sec   Loss 5.4497   LearningRate 0.0027   Epoch: 33   Global Step: 169010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:49,532-Speed 9812.92 samples/sec   Loss 5.4012   LearningRate 0.0027   Epoch: 33   Global Step: 169020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:33:50,516-Speed 10408.60 samples/sec   Loss 5.5574   LearningRate 0.0027   Epoch: 33   Global Step: 169030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:51,513-Speed 10286.58 samples/sec   Loss 5.5698   LearningRate 0.0027   Epoch: 33   Global Step: 169040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:52,516-Speed 10209.95 samples/sec   Loss 5.4832   LearningRate 0.0027   Epoch: 33   Global Step: 169050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:53,532-Speed 10091.61 samples/sec   Loss 5.4166   LearningRate 0.0027   Epoch: 33   Global Step: 169060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:54,498-Speed 10609.72 samples/sec   Loss 5.4256   LearningRate 0.0027   Epoch: 33   Global Step: 169070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:55,465-Speed 10594.17 samples/sec   Loss 5.5113   LearningRate 0.0027   Epoch: 33   Global Step: 169080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:56,473-Speed 10164.59 samples/sec   Loss 5.5309   LearningRate 0.0027   Epoch: 33   Global Step: 169090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:57,455-Speed 10435.68 samples/sec   Loss 5.4394   LearningRate 0.0027   Epoch: 33   Global Step: 169100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:58,485-Speed 9949.23 samples/sec   Loss 5.5164   LearningRate 0.0027   Epoch: 33   Global Step: 169110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:33:59,532-Speed 9789.01 samples/sec   Loss 5.4346   LearningRate 0.0027   Epoch: 33   Global Step: 169120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:00,492-Speed 10677.93 samples/sec   Loss 5.4548   LearningRate 0.0027   Epoch: 33   Global Step: 169130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:01,508-Speed 10097.63 samples/sec   Loss 5.4988   LearningRate 0.0027   Epoch: 33   Global Step: 169140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:02,471-Speed 10645.81 samples/sec   Loss 5.3244   LearningRate 0.0027   Epoch: 33   Global Step: 169150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:03,471-Speed 10250.15 samples/sec   Loss 5.3937   LearningRate 0.0027   Epoch: 33   Global Step: 169160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:04,470-Speed 10253.65 samples/sec   Loss 5.5363   LearningRate 0.0027   Epoch: 33   Global Step: 169170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:05,466-Speed 10287.30 samples/sec   Loss 5.5721   LearningRate 0.0027   Epoch: 33   Global Step: 169180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:06,458-Speed 10333.43 samples/sec   Loss 5.5470   LearningRate 0.0027   Epoch: 33   Global Step: 169190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:07,488-Speed 9952.41 samples/sec   Loss 5.5661   LearningRate 0.0027   Epoch: 33   Global Step: 169200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:08,491-Speed 10223.86 samples/sec   Loss 5.5202   LearningRate 0.0027   Epoch: 33   Global Step: 169210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:09,523-Speed 9937.60 samples/sec   Loss 5.5035   LearningRate 0.0027   Epoch: 33   Global Step: 169220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:10,493-Speed 10572.56 samples/sec   Loss 5.5598   LearningRate 0.0027   Epoch: 33   Global Step: 169230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:11,505-Speed 10118.91 samples/sec   Loss 5.5622   LearningRate 0.0027   Epoch: 33   Global Step: 169240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:12,472-Speed 10608.37 samples/sec   Loss 5.5262   LearningRate 0.0027   Epoch: 33   Global Step: 169250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:13,482-Speed 10148.03 samples/sec   Loss 5.5370   LearningRate 0.0027   Epoch: 33   Global Step: 169260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:14,467-Speed 10397.80 samples/sec   Loss 5.6546   LearningRate 0.0027   Epoch: 33   Global Step: 169270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:15,438-Speed 10558.62 samples/sec   Loss 5.5236   LearningRate 0.0027   Epoch: 33   Global Step: 169280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:16,447-Speed 10163.95 samples/sec   Loss 5.5841   LearningRate 0.0027   Epoch: 33   Global Step: 169290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:17,463-Speed 10088.83 samples/sec   Loss 5.6008   LearningRate 0.0027   Epoch: 33   Global Step: 169300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:18,449-Speed 10394.39 samples/sec   Loss 5.6424   LearningRate 0.0027   Epoch: 33   Global Step: 169310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:19,478-Speed 9961.19 samples/sec   Loss 5.5764   LearningRate 0.0027   Epoch: 33   Global Step: 169320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:20,497-Speed 10059.89 samples/sec   Loss 5.4325   LearningRate 0.0027   Epoch: 33   Global Step: 169330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:21,495-Speed 10274.56 samples/sec   Loss 5.5862   LearningRate 0.0027   Epoch: 33   Global Step: 169340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:22,488-Speed 10331.53 samples/sec   Loss 5.6789   LearningRate 0.0027   Epoch: 33   Global Step: 169350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:23,455-Speed 10599.02 samples/sec   Loss 5.5309   LearningRate 0.0027   Epoch: 33   Global Step: 169360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:24,461-Speed 10195.56 samples/sec   Loss 5.6082   LearningRate 0.0027   Epoch: 33   Global Step: 169370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:25,410-Speed 10795.74 samples/sec   Loss 5.4605   LearningRate 0.0027   Epoch: 33   Global Step: 169380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:26,377-Speed 10598.57 samples/sec   Loss 5.6971   LearningRate 0.0026   Epoch: 33   Global Step: 169390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:27,371-Speed 10315.40 samples/sec   Loss 5.4813   LearningRate 0.0026   Epoch: 33   Global Step: 169400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:28,383-Speed 10122.70 samples/sec   Loss 5.5270   LearningRate 0.0026   Epoch: 33   Global Step: 169410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:29,384-Speed 10242.00 samples/sec   Loss 5.5659   LearningRate 0.0026   Epoch: 33   Global Step: 169420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:30,394-Speed 10156.64 samples/sec   Loss 5.4948   LearningRate 0.0026   Epoch: 33   Global Step: 169430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:31,381-Speed 10381.41 samples/sec   Loss 5.5869   LearningRate 0.0026   Epoch: 33   Global Step: 169440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:32,410-Speed 9960.01 samples/sec   Loss 5.5597   LearningRate 0.0026   Epoch: 33   Global Step: 169450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:33,428-Speed 10064.73 samples/sec   Loss 5.6254   LearningRate 0.0026   Epoch: 33   Global Step: 169460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:34,423-Speed 10298.23 samples/sec   Loss 5.4648   LearningRate 0.0026   Epoch: 33   Global Step: 169470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:35,408-Speed 10409.62 samples/sec   Loss 5.5012   LearningRate 0.0026   Epoch: 33   Global Step: 169480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:36,409-Speed 10241.25 samples/sec   Loss 5.4923   LearningRate 0.0026   Epoch: 33   Global Step: 169490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:37,406-Speed 10275.27 samples/sec   Loss 5.5397   LearningRate 0.0026   Epoch: 33   Global Step: 169500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:38,364-Speed 10696.74 samples/sec   Loss 5.6971   LearningRate 0.0026   Epoch: 33   Global Step: 169510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:39,346-Speed 10449.32 samples/sec   Loss 5.4962   LearningRate 0.0026   Epoch: 33   Global Step: 169520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:40,404-Speed 9692.65 samples/sec   Loss 5.5685   LearningRate 0.0026   Epoch: 33   Global Step: 169530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:41,372-Speed 10582.90 samples/sec   Loss 5.5115   LearningRate 0.0026   Epoch: 33   Global Step: 169540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:42,309-Speed 10950.10 samples/sec   Loss 5.4688   LearningRate 0.0026   Epoch: 33   Global Step: 169550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:43,267-Speed 10693.69 samples/sec   Loss 5.5447   LearningRate 0.0026   Epoch: 33   Global Step: 169560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:44,263-Speed 10296.80 samples/sec   Loss 5.4168   LearningRate 0.0026   Epoch: 33   Global Step: 169570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:34:45,262-Speed 10259.89 samples/sec   Loss 5.5303   LearningRate 0.0026   Epoch: 33   Global Step: 169580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:46,247-Speed 10402.00 samples/sec   Loss 5.4953   LearningRate 0.0026   Epoch: 33   Global Step: 169590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:47,267-Speed 10045.91 samples/sec   Loss 5.5675   LearningRate 0.0026   Epoch: 33   Global Step: 169600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:48,295-Speed 9971.65 samples/sec   Loss 5.5356   LearningRate 0.0026   Epoch: 33   Global Step: 169610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:49,331-Speed 9888.81 samples/sec   Loss 5.5454   LearningRate 0.0026   Epoch: 33   Global Step: 169620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:50,375-Speed 9821.15 samples/sec   Loss 5.4226   LearningRate 0.0026   Epoch: 33   Global Step: 169630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:51,398-Speed 10014.14 samples/sec   Loss 5.4036   LearningRate 0.0026   Epoch: 33   Global Step: 169640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:52,408-Speed 10144.73 samples/sec   Loss 5.5264   LearningRate 0.0026   Epoch: 33   Global Step: 169650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:53,398-Speed 10357.04 samples/sec   Loss 5.5947   LearningRate 0.0026   Epoch: 33   Global Step: 169660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:54,391-Speed 10328.20 samples/sec   Loss 5.5623   LearningRate 0.0026   Epoch: 33   Global Step: 169670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:55,376-Speed 10397.08 samples/sec   Loss 5.4861   LearningRate 0.0026   Epoch: 33   Global Step: 169680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:56,384-Speed 10165.17 samples/sec   Loss 5.5996   LearningRate 0.0026   Epoch: 33   Global Step: 169690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:57,371-Speed 10391.32 samples/sec   Loss 5.5269   LearningRate 0.0026   Epoch: 33   Global Step: 169700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:58,317-Speed 10830.27 samples/sec   Loss 5.6344   LearningRate 0.0026   Epoch: 33   Global Step: 169710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:34:59,322-Speed 10204.07 samples/sec   Loss 5.5108   LearningRate 0.0026   Epoch: 33   Global Step: 169720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:00,383-Speed 9652.81 samples/sec   Loss 5.2863   LearningRate 0.0026   Epoch: 33   Global Step: 169730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:01,341-Speed 10700.20 samples/sec   Loss 5.6497   LearningRate 0.0026   Epoch: 33   Global Step: 169740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:02,315-Speed 10522.69 samples/sec   Loss 5.4082   LearningRate 0.0026   Epoch: 33   Global Step: 169750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:03,337-Speed 10030.83 samples/sec   Loss 5.5305   LearningRate 0.0026   Epoch: 33   Global Step: 169760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:04,337-Speed 10249.47 samples/sec   Loss 5.5390   LearningRate 0.0026   Epoch: 33   Global Step: 169770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:05,308-Speed 10556.32 samples/sec   Loss 5.3601   LearningRate 0.0026   Epoch: 33   Global Step: 169780   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:35:06,245-Speed 10934.74 samples/sec   Loss 5.4636   LearningRate 0.0026   Epoch: 33   Global Step: 169790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:07,213-Speed 10587.29 samples/sec   Loss 5.5994   LearningRate 0.0026   Epoch: 33   Global Step: 169800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:08,199-Speed 10400.39 samples/sec   Loss 5.4326   LearningRate 0.0026   Epoch: 33   Global Step: 169810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:09,151-Speed 10762.50 samples/sec   Loss 5.5774   LearningRate 0.0026   Epoch: 33   Global Step: 169820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:10,168-Speed 10075.57 samples/sec   Loss 5.5736   LearningRate 0.0026   Epoch: 33   Global Step: 169830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:11,171-Speed 10220.95 samples/sec   Loss 5.4710   LearningRate 0.0026   Epoch: 33   Global Step: 169840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:12,174-Speed 10225.62 samples/sec   Loss 5.5103   LearningRate 0.0026   Epoch: 33   Global Step: 169850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:13,184-Speed 10141.44 samples/sec   Loss 5.6179   LearningRate 0.0026   Epoch: 33   Global Step: 169860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:14,251-Speed 9606.37 samples/sec   Loss 5.5899   LearningRate 0.0026   Epoch: 33   Global Step: 169870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:15,287-Speed 9896.46 samples/sec   Loss 5.4194   LearningRate 0.0026   Epoch: 33   Global Step: 169880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:16,290-Speed 10219.12 samples/sec   Loss 5.5229   LearningRate 0.0026   Epoch: 33   Global Step: 169890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:17,296-Speed 10194.95 samples/sec   Loss 5.5508   LearningRate 0.0026   Epoch: 33   Global Step: 169900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:18,313-Speed 10067.14 samples/sec   Loss 5.5461   LearningRate 0.0026   Epoch: 33   Global Step: 169910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:19,314-Speed 10247.44 samples/sec   Loss 5.4674   LearningRate 0.0026   Epoch: 33   Global Step: 169920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:20,285-Speed 10547.64 samples/sec   Loss 5.5287   LearningRate 0.0026   Epoch: 33   Global Step: 169930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:21,253-Speed 10584.63 samples/sec   Loss 5.5635   LearningRate 0.0026   Epoch: 33   Global Step: 169940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:22,264-Speed 10147.53 samples/sec   Loss 5.5279   LearningRate 0.0026   Epoch: 33   Global Step: 169950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:23,270-Speed 10184.79 samples/sec   Loss 5.5852   LearningRate 0.0026   Epoch: 33   Global Step: 169960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:24,248-Speed 10500.58 samples/sec   Loss 5.4873   LearningRate 0.0026   Epoch: 33   Global Step: 169970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:25,195-Speed 10817.90 samples/sec   Loss 5.5227   LearningRate 0.0026   Epoch: 33   Global Step: 169980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:35:26,138-Speed 10869.24 samples/sec   Loss 5.5543   LearningRate 0.0026   Epoch: 33   Global Step: 169990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:35:27,144-Speed 10181.79 samples/sec   Loss 5.7128   LearningRate 0.0026   Epoch: 33   Global Step: 170000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:35:49,430-[lfw][170000]XNorm: 8.270940
Training: 2022-04-11 05:35:49,431-[lfw][170000]Accuracy-Flip: 0.99617+-0.00334
Training: 2022-04-11 05:35:49,431-[lfw][170000]Accuracy-Highest: 0.99700
Training: 2022-04-11 05:36:15,172-[cfp_fp][170000]XNorm: 7.115096
Training: 2022-04-11 05:36:15,173-[cfp_fp][170000]Accuracy-Flip: 0.97300+-0.00768
Training: 2022-04-11 05:36:15,173-[cfp_fp][170000]Accuracy-Highest: 0.97300
Training: 2022-04-11 05:36:37,396-[agedb_30][170000]XNorm: 8.069886
Training: 2022-04-11 05:36:37,397-[agedb_30][170000]Accuracy-Flip: 0.97217+-0.00619
Training: 2022-04-11 05:36:37,398-[agedb_30][170000]Accuracy-Highest: 0.97300
Training: 2022-04-11 05:36:38,347-Speed 143.82 samples/sec   Loss 5.5651   LearningRate 0.0026   Epoch: 33   Global Step: 170010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:36:39,350-Speed 10217.17 samples/sec   Loss 5.5340   LearningRate 0.0025   Epoch: 33   Global Step: 170020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:36:40,316-Speed 10612.49 samples/sec   Loss 5.4994   LearningRate 0.0025   Epoch: 33   Global Step: 170030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:36:41,327-Speed 10133.98 samples/sec   Loss 5.4860   LearningRate 0.0025   Epoch: 33   Global Step: 170040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:36:42,322-Speed 10298.76 samples/sec   Loss 5.5230   LearningRate 0.0025   Epoch: 33   Global Step: 170050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:36:43,414-Speed 9383.58 samples/sec   Loss 5.4851   LearningRate 0.0025   Epoch: 33   Global Step: 170060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:36:44,426-Speed 10130.48 samples/sec   Loss 5.5465   LearningRate 0.0025   Epoch: 33   Global Step: 170070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:36:45,438-Speed 10134.08 samples/sec   Loss 5.6141   LearningRate 0.0025   Epoch: 33   Global Step: 170080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:36:46,413-Speed 10516.86 samples/sec   Loss 5.5281   LearningRate 0.0025   Epoch: 33   Global Step: 170090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:36:47,388-Speed 10504.29 samples/sec   Loss 5.4730   LearningRate 0.0025   Epoch: 33   Global Step: 170100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:36:48,399-Speed 10148.37 samples/sec   Loss 5.5653   LearningRate 0.0025   Epoch: 33   Global Step: 170110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:36:49,414-Speed 10097.37 samples/sec   Loss 5.5562   LearningRate 0.0025   Epoch: 33   Global Step: 170120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:36:50,439-Speed 9993.20 samples/sec   Loss 5.6096   LearningRate 0.0025   Epoch: 33   Global Step: 170130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:36:51,401-Speed 10658.92 samples/sec   Loss 5.5711   LearningRate 0.0025   Epoch: 33   Global Step: 170140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:36:52,399-Speed 10267.36 samples/sec   Loss 5.4143   LearningRate 0.0025   Epoch: 33   Global Step: 170150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:36:53,402-Speed 10217.48 samples/sec   Loss 5.5340   LearningRate 0.0025   Epoch: 33   Global Step: 170160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:36:54,365-Speed 10642.92 samples/sec   Loss 5.4599   LearningRate 0.0025   Epoch: 33   Global Step: 170170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:36:55,372-Speed 10175.72 samples/sec   Loss 5.5123   LearningRate 0.0025   Epoch: 33   Global Step: 170180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:36:56,377-Speed 10198.56 samples/sec   Loss 5.5951   LearningRate 0.0025   Epoch: 33   Global Step: 170190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:36:57,404-Speed 9980.69 samples/sec   Loss 5.6752   LearningRate 0.0025   Epoch: 33   Global Step: 170200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:36:58,377-Speed 10541.32 samples/sec   Loss 5.6201   LearningRate 0.0025   Epoch: 33   Global Step: 170210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:36:59,396-Speed 10052.86 samples/sec   Loss 5.5331   LearningRate 0.0025   Epoch: 33   Global Step: 170220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:00,374-Speed 10480.33 samples/sec   Loss 5.5047   LearningRate 0.0025   Epoch: 33   Global Step: 170230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:01,373-Speed 10259.59 samples/sec   Loss 5.5246   LearningRate 0.0025   Epoch: 33   Global Step: 170240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:02,333-Speed 10670.03 samples/sec   Loss 5.6599   LearningRate 0.0025   Epoch: 33   Global Step: 170250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:03,328-Speed 10302.67 samples/sec   Loss 5.5141   LearningRate 0.0025   Epoch: 33   Global Step: 170260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:04,370-Speed 9840.51 samples/sec   Loss 5.5848   LearningRate 0.0025   Epoch: 33   Global Step: 170270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:05,300-Speed 11017.24 samples/sec   Loss 5.6771   LearningRate 0.0025   Epoch: 33   Global Step: 170280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:06,264-Speed 10635.67 samples/sec   Loss 5.5336   LearningRate 0.0025   Epoch: 33   Global Step: 170290   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:37:07,227-Speed 10643.89 samples/sec   Loss 5.7519   LearningRate 0.0025   Epoch: 33   Global Step: 170300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:08,265-Speed 9874.37 samples/sec   Loss 5.6468   LearningRate 0.0025   Epoch: 33   Global Step: 170310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:09,290-Speed 9998.62 samples/sec   Loss 5.6578   LearningRate 0.0025   Epoch: 33   Global Step: 170320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:10,272-Speed 10431.32 samples/sec   Loss 5.5403   LearningRate 0.0025   Epoch: 33   Global Step: 170330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:11,360-Speed 9421.14 samples/sec   Loss 5.5350   LearningRate 0.0025   Epoch: 33   Global Step: 170340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:12,353-Speed 10321.81 samples/sec   Loss 5.4322   LearningRate 0.0025   Epoch: 33   Global Step: 170350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:13,365-Speed 10125.36 samples/sec   Loss 5.4709   LearningRate 0.0025   Epoch: 33   Global Step: 170360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:14,373-Speed 10168.97 samples/sec   Loss 5.4806   LearningRate 0.0025   Epoch: 33   Global Step: 170370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:15,391-Speed 10058.57 samples/sec   Loss 5.5831   LearningRate 0.0025   Epoch: 33   Global Step: 170380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:16,445-Speed 9725.27 samples/sec   Loss 5.5109   LearningRate 0.0025   Epoch: 33   Global Step: 170390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:17,471-Speed 9997.86 samples/sec   Loss 5.4676   LearningRate 0.0025   Epoch: 33   Global Step: 170400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:18,477-Speed 10183.49 samples/sec   Loss 5.5726   LearningRate 0.0025   Epoch: 33   Global Step: 170410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:19,516-Speed 9870.92 samples/sec   Loss 5.5354   LearningRate 0.0025   Epoch: 33   Global Step: 170420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:20,526-Speed 10149.88 samples/sec   Loss 5.5375   LearningRate 0.0025   Epoch: 33   Global Step: 170430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:21,520-Speed 10306.16 samples/sec   Loss 5.6072   LearningRate 0.0025   Epoch: 33   Global Step: 170440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:22,505-Speed 10400.95 samples/sec   Loss 5.3820   LearningRate 0.0025   Epoch: 33   Global Step: 170450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:23,508-Speed 10218.91 samples/sec   Loss 5.5591   LearningRate 0.0025   Epoch: 33   Global Step: 170460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:24,515-Speed 10183.55 samples/sec   Loss 5.5845   LearningRate 0.0025   Epoch: 33   Global Step: 170470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:25,528-Speed 10121.81 samples/sec   Loss 5.4064   LearningRate 0.0025   Epoch: 33   Global Step: 170480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:26,513-Speed 10395.84 samples/sec   Loss 5.6602   LearningRate 0.0025   Epoch: 33   Global Step: 170490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:27,510-Speed 10276.79 samples/sec   Loss 5.6878   LearningRate 0.0025   Epoch: 33   Global Step: 170500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:28,515-Speed 10201.02 samples/sec   Loss 5.7192   LearningRate 0.0025   Epoch: 33   Global Step: 170510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:29,484-Speed 10572.63 samples/sec   Loss 5.6352   LearningRate 0.0025   Epoch: 33   Global Step: 170520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:30,514-Speed 9966.51 samples/sec   Loss 5.5879   LearningRate 0.0025   Epoch: 33   Global Step: 170530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:31,509-Speed 10291.83 samples/sec   Loss 5.6206   LearningRate 0.0025   Epoch: 33   Global Step: 170540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:32,542-Speed 9931.13 samples/sec   Loss 5.4168   LearningRate 0.0025   Epoch: 33   Global Step: 170550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:33,519-Speed 10487.21 samples/sec   Loss 5.5157   LearningRate 0.0025   Epoch: 33   Global Step: 170560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:34,485-Speed 10608.65 samples/sec   Loss 5.6153   LearningRate 0.0025   Epoch: 33   Global Step: 170570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:35,474-Speed 10353.97 samples/sec   Loss 5.4994   LearningRate 0.0025   Epoch: 33   Global Step: 170580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:36,476-Speed 10234.87 samples/sec   Loss 5.5807   LearningRate 0.0025   Epoch: 33   Global Step: 170590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:37,534-Speed 9683.72 samples/sec   Loss 5.5362   LearningRate 0.0025   Epoch: 33   Global Step: 170600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:38,488-Speed 10744.50 samples/sec   Loss 5.5350   LearningRate 0.0025   Epoch: 33   Global Step: 170610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:39,513-Speed 9994.05 samples/sec   Loss 5.5380   LearningRate 0.0025   Epoch: 33   Global Step: 170620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:40,528-Speed 10104.93 samples/sec   Loss 5.5545   LearningRate 0.0025   Epoch: 33   Global Step: 170630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:41,506-Speed 10476.71 samples/sec   Loss 5.5101   LearningRate 0.0025   Epoch: 33   Global Step: 170640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:42,470-Speed 10632.22 samples/sec   Loss 5.5636   LearningRate 0.0025   Epoch: 33   Global Step: 170650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:43,473-Speed 10219.92 samples/sec   Loss 5.6985   LearningRate 0.0024   Epoch: 33   Global Step: 170660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:44,546-Speed 9550.77 samples/sec   Loss 5.5251   LearningRate 0.0024   Epoch: 33   Global Step: 170670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:45,546-Speed 10240.17 samples/sec   Loss 5.6281   LearningRate 0.0024   Epoch: 33   Global Step: 170680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:46,550-Speed 10207.93 samples/sec   Loss 5.6207   LearningRate 0.0024   Epoch: 33   Global Step: 170690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:47,544-Speed 10314.18 samples/sec   Loss 5.5512   LearningRate 0.0024   Epoch: 33   Global Step: 170700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:48,566-Speed 10027.30 samples/sec   Loss 5.6471   LearningRate 0.0024   Epoch: 33   Global Step: 170710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:49,538-Speed 10552.79 samples/sec   Loss 5.4995   LearningRate 0.0024   Epoch: 33   Global Step: 170720   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:37:50,508-Speed 10555.37 samples/sec   Loss 5.5932   LearningRate 0.0024   Epoch: 33   Global Step: 170730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:51,539-Speed 9942.97 samples/sec   Loss 5.5417   LearningRate 0.0024   Epoch: 33   Global Step: 170740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:52,534-Speed 10313.02 samples/sec   Loss 5.5898   LearningRate 0.0024   Epoch: 33   Global Step: 170750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:53,520-Speed 10390.96 samples/sec   Loss 5.5498   LearningRate 0.0024   Epoch: 33   Global Step: 170760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:54,597-Speed 9515.11 samples/sec   Loss 5.6215   LearningRate 0.0024   Epoch: 33   Global Step: 170770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:55,589-Speed 10328.60 samples/sec   Loss 5.6136   LearningRate 0.0024   Epoch: 33   Global Step: 170780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:56,583-Speed 10322.46 samples/sec   Loss 5.6160   LearningRate 0.0024   Epoch: 33   Global Step: 170790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:57,558-Speed 10510.67 samples/sec   Loss 5.5291   LearningRate 0.0024   Epoch: 33   Global Step: 170800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:37:58,546-Speed 10373.46 samples/sec   Loss 5.6771   LearningRate 0.0024   Epoch: 33   Global Step: 170810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:37:59,514-Speed 10594.74 samples/sec   Loss 5.5889   LearningRate 0.0024   Epoch: 33   Global Step: 170820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:00,575-Speed 9657.81 samples/sec   Loss 5.7055   LearningRate 0.0024   Epoch: 33   Global Step: 170830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:01,607-Speed 9931.40 samples/sec   Loss 5.5810   LearningRate 0.0024   Epoch: 33   Global Step: 170840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:02,604-Speed 10280.41 samples/sec   Loss 5.4098   LearningRate 0.0024   Epoch: 33   Global Step: 170850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:03,594-Speed 10358.43 samples/sec   Loss 5.5971   LearningRate 0.0024   Epoch: 33   Global Step: 170860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:04,569-Speed 10514.95 samples/sec   Loss 5.5967   LearningRate 0.0024   Epoch: 33   Global Step: 170870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:05,523-Speed 10743.53 samples/sec   Loss 5.5152   LearningRate 0.0024   Epoch: 33   Global Step: 170880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:06,503-Speed 10448.18 samples/sec   Loss 5.5263   LearningRate 0.0024   Epoch: 33   Global Step: 170890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:07,513-Speed 10155.05 samples/sec   Loss 5.7011   LearningRate 0.0024   Epoch: 33   Global Step: 170900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:08,497-Speed 10418.61 samples/sec   Loss 5.5367   LearningRate 0.0024   Epoch: 33   Global Step: 170910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:09,523-Speed 9983.44 samples/sec   Loss 5.6423   LearningRate 0.0024   Epoch: 33   Global Step: 170920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:10,529-Speed 10204.35 samples/sec   Loss 5.5483   LearningRate 0.0024   Epoch: 33   Global Step: 170930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:11,521-Speed 10336.34 samples/sec   Loss 5.5744   LearningRate 0.0024   Epoch: 33   Global Step: 170940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:12,522-Speed 10236.46 samples/sec   Loss 5.5494   LearningRate 0.0024   Epoch: 33   Global Step: 170950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:13,484-Speed 10647.96 samples/sec   Loss 5.6343   LearningRate 0.0024   Epoch: 33   Global Step: 170960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:14,500-Speed 10083.61 samples/sec   Loss 5.6389   LearningRate 0.0024   Epoch: 33   Global Step: 170970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:15,530-Speed 9952.17 samples/sec   Loss 5.5412   LearningRate 0.0024   Epoch: 33   Global Step: 170980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:16,515-Speed 10410.05 samples/sec   Loss 5.5513   LearningRate 0.0024   Epoch: 33   Global Step: 170990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:17,512-Speed 10277.51 samples/sec   Loss 5.6282   LearningRate 0.0024   Epoch: 33   Global Step: 171000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:18,507-Speed 10294.50 samples/sec   Loss 5.6471   LearningRate 0.0024   Epoch: 33   Global Step: 171010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:19,653-Speed 8943.55 samples/sec   Loss 5.6849   LearningRate 0.0024   Epoch: 33   Global Step: 171020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:20,679-Speed 9997.92 samples/sec   Loss 5.4972   LearningRate 0.0024   Epoch: 33   Global Step: 171030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:21,684-Speed 10192.95 samples/sec   Loss 5.5858   LearningRate 0.0024   Epoch: 33   Global Step: 171040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:22,682-Speed 10276.08 samples/sec   Loss 5.6311   LearningRate 0.0024   Epoch: 33   Global Step: 171050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:23,686-Speed 10209.28 samples/sec   Loss 5.5852   LearningRate 0.0024   Epoch: 33   Global Step: 171060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:24,725-Speed 9856.98 samples/sec   Loss 5.4930   LearningRate 0.0024   Epoch: 33   Global Step: 171070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:25,733-Speed 10173.93 samples/sec   Loss 5.5789   LearningRate 0.0024   Epoch: 33   Global Step: 171080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:26,687-Speed 10736.92 samples/sec   Loss 5.6484   LearningRate 0.0024   Epoch: 33   Global Step: 171090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:27,672-Speed 10410.50 samples/sec   Loss 5.6777   LearningRate 0.0024   Epoch: 33   Global Step: 171100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:28,696-Speed 10005.14 samples/sec   Loss 5.5972   LearningRate 0.0024   Epoch: 33   Global Step: 171110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:29,737-Speed 9850.98 samples/sec   Loss 5.6453   LearningRate 0.0024   Epoch: 33   Global Step: 171120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:30,753-Speed 10081.84 samples/sec   Loss 5.6945   LearningRate 0.0024   Epoch: 33   Global Step: 171130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:31,727-Speed 10519.99 samples/sec   Loss 5.5739   LearningRate 0.0024   Epoch: 33   Global Step: 171140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:32,730-Speed 10224.92 samples/sec   Loss 5.4975   LearningRate 0.0024   Epoch: 33   Global Step: 171150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:33,720-Speed 10354.46 samples/sec   Loss 5.8050   LearningRate 0.0024   Epoch: 33   Global Step: 171160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:34,681-Speed 10670.17 samples/sec   Loss 5.6549   LearningRate 0.0024   Epoch: 33   Global Step: 171170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:35,649-Speed 10587.80 samples/sec   Loss 5.7220   LearningRate 0.0024   Epoch: 33   Global Step: 171180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:36,641-Speed 10323.99 samples/sec   Loss 5.6100   LearningRate 0.0024   Epoch: 33   Global Step: 171190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:37,657-Speed 10086.47 samples/sec   Loss 5.5719   LearningRate 0.0024   Epoch: 33   Global Step: 171200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:38,685-Speed 9973.50 samples/sec   Loss 5.7297   LearningRate 0.0024   Epoch: 33   Global Step: 171210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:39,669-Speed 10412.33 samples/sec   Loss 5.6422   LearningRate 0.0024   Epoch: 33   Global Step: 171220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:40,684-Speed 10098.38 samples/sec   Loss 5.5693   LearningRate 0.0024   Epoch: 33   Global Step: 171230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:41,707-Speed 10029.66 samples/sec   Loss 5.6848   LearningRate 0.0024   Epoch: 33   Global Step: 171240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:42,661-Speed 10743.34 samples/sec   Loss 5.5563   LearningRate 0.0024   Epoch: 33   Global Step: 171250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:43,657-Speed 10290.99 samples/sec   Loss 5.5928   LearningRate 0.0024   Epoch: 33   Global Step: 171260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:44,670-Speed 10110.58 samples/sec   Loss 5.5473   LearningRate 0.0024   Epoch: 33   Global Step: 171270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:45,749-Speed 9501.85 samples/sec   Loss 5.6801   LearningRate 0.0024   Epoch: 33   Global Step: 171280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:46,751-Speed 10227.14 samples/sec   Loss 5.6178   LearningRate 0.0024   Epoch: 33   Global Step: 171290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:47,734-Speed 10425.81 samples/sec   Loss 5.6940   LearningRate 0.0024   Epoch: 33   Global Step: 171300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:48,773-Speed 9860.54 samples/sec   Loss 5.6436   LearningRate 0.0023   Epoch: 33   Global Step: 171310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:49,827-Speed 9722.60 samples/sec   Loss 5.5065   LearningRate 0.0023   Epoch: 33   Global Step: 171320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:50,837-Speed 10155.42 samples/sec   Loss 5.6217   LearningRate 0.0023   Epoch: 33   Global Step: 171330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:51,833-Speed 10287.47 samples/sec   Loss 5.5994   LearningRate 0.0023   Epoch: 33   Global Step: 171340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:52,832-Speed 10263.34 samples/sec   Loss 5.6929   LearningRate 0.0023   Epoch: 33   Global Step: 171350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:53,821-Speed 10357.05 samples/sec   Loss 5.5998   LearningRate 0.0023   Epoch: 33   Global Step: 171360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:54,801-Speed 10458.17 samples/sec   Loss 5.5131   LearningRate 0.0023   Epoch: 33   Global Step: 171370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:38:55,842-Speed 9846.85 samples/sec   Loss 5.5413   LearningRate 0.0023   Epoch: 33   Global Step: 171380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:56,834-Speed 10324.78 samples/sec   Loss 5.6093   LearningRate 0.0023   Epoch: 33   Global Step: 171390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:57,888-Speed 9732.12 samples/sec   Loss 5.4796   LearningRate 0.0023   Epoch: 33   Global Step: 171400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:58,880-Speed 10345.17 samples/sec   Loss 5.7180   LearningRate 0.0023   Epoch: 33   Global Step: 171410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:38:59,887-Speed 10171.10 samples/sec   Loss 5.7204   LearningRate 0.0023   Epoch: 33   Global Step: 171420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:00,876-Speed 10366.83 samples/sec   Loss 5.6938   LearningRate 0.0023   Epoch: 33   Global Step: 171430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:01,903-Speed 9974.63 samples/sec   Loss 5.6904   LearningRate 0.0023   Epoch: 33   Global Step: 171440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:02,920-Speed 10082.37 samples/sec   Loss 5.5284   LearningRate 0.0023   Epoch: 33   Global Step: 171450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:03,938-Speed 10067.30 samples/sec   Loss 5.5898   LearningRate 0.0023   Epoch: 33   Global Step: 171460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:04,965-Speed 9973.44 samples/sec   Loss 5.7041   LearningRate 0.0023   Epoch: 33   Global Step: 171470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:05,940-Speed 10521.32 samples/sec   Loss 5.5600   LearningRate 0.0023   Epoch: 33   Global Step: 171480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:06,920-Speed 10450.34 samples/sec   Loss 5.5122   LearningRate 0.0023   Epoch: 33   Global Step: 171490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:07,927-Speed 10176.09 samples/sec   Loss 5.5876   LearningRate 0.0023   Epoch: 33   Global Step: 171500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:08,988-Speed 9662.80 samples/sec   Loss 5.5070   LearningRate 0.0023   Epoch: 33   Global Step: 171510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:09,991-Speed 10221.76 samples/sec   Loss 5.5407   LearningRate 0.0023   Epoch: 33   Global Step: 171520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:11,011-Speed 10051.83 samples/sec   Loss 5.6712   LearningRate 0.0023   Epoch: 33   Global Step: 171530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:12,032-Speed 10035.98 samples/sec   Loss 5.6050   LearningRate 0.0023   Epoch: 33   Global Step: 171540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:13,030-Speed 10268.08 samples/sec   Loss 5.6621   LearningRate 0.0023   Epoch: 33   Global Step: 171550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:13,973-Speed 10859.34 samples/sec   Loss 5.6782   LearningRate 0.0023   Epoch: 33   Global Step: 171560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:15,016-Speed 9826.79 samples/sec   Loss 5.6334   LearningRate 0.0023   Epoch: 33   Global Step: 171570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:16,033-Speed 10082.22 samples/sec   Loss 5.7419   LearningRate 0.0023   Epoch: 33   Global Step: 171580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:17,036-Speed 10215.47 samples/sec   Loss 5.6613   LearningRate 0.0023   Epoch: 33   Global Step: 171590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:18,146-Speed 9235.20 samples/sec   Loss 5.6570   LearningRate 0.0023   Epoch: 33   Global Step: 171600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:19,113-Speed 10605.06 samples/sec   Loss 5.4744   LearningRate 0.0023   Epoch: 33   Global Step: 171610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:20,083-Speed 10562.67 samples/sec   Loss 5.5234   LearningRate 0.0023   Epoch: 33   Global Step: 171620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:21,058-Speed 10507.40 samples/sec   Loss 5.7364   LearningRate 0.0023   Epoch: 33   Global Step: 171630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:22,155-Speed 9342.77 samples/sec   Loss 5.7195   LearningRate 0.0023   Epoch: 33   Global Step: 171640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:23,100-Speed 10854.15 samples/sec   Loss 5.5777   LearningRate 0.0023   Epoch: 33   Global Step: 171650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:24,068-Speed 10579.76 samples/sec   Loss 5.6192   LearningRate 0.0023   Epoch: 33   Global Step: 171660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:25,062-Speed 10308.26 samples/sec   Loss 5.6270   LearningRate 0.0023   Epoch: 33   Global Step: 171670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:26,052-Speed 10360.65 samples/sec   Loss 5.6086   LearningRate 0.0023   Epoch: 33   Global Step: 171680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:27,039-Speed 10385.06 samples/sec   Loss 5.4959   LearningRate 0.0023   Epoch: 33   Global Step: 171690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:28,068-Speed 9961.33 samples/sec   Loss 5.6537   LearningRate 0.0023   Epoch: 33   Global Step: 171700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:29,079-Speed 10130.96 samples/sec   Loss 5.6295   LearningRate 0.0023   Epoch: 33   Global Step: 171710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:30,115-Speed 9901.69 samples/sec   Loss 5.5307   LearningRate 0.0023   Epoch: 33   Global Step: 171720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:31,128-Speed 10119.69 samples/sec   Loss 5.6610   LearningRate 0.0023   Epoch: 33   Global Step: 171730   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:39:32,127-Speed 10259.01 samples/sec   Loss 5.5270   LearningRate 0.0023   Epoch: 33   Global Step: 171740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:33,104-Speed 10493.23 samples/sec   Loss 5.5482   LearningRate 0.0023   Epoch: 33   Global Step: 171750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:34,108-Speed 10208.57 samples/sec   Loss 5.5553   LearningRate 0.0023   Epoch: 33   Global Step: 171760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:35,128-Speed 10048.06 samples/sec   Loss 5.6392   LearningRate 0.0023   Epoch: 33   Global Step: 171770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:36,111-Speed 10417.38 samples/sec   Loss 5.8236   LearningRate 0.0023   Epoch: 33   Global Step: 171780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:37,138-Speed 9981.06 samples/sec   Loss 5.6131   LearningRate 0.0023   Epoch: 33   Global Step: 171790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:38,103-Speed 10643.82 samples/sec   Loss 5.6356   LearningRate 0.0023   Epoch: 33   Global Step: 171800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:39,080-Speed 10494.49 samples/sec   Loss 5.5663   LearningRate 0.0023   Epoch: 33   Global Step: 171810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:40,105-Speed 9997.34 samples/sec   Loss 5.6655   LearningRate 0.0023   Epoch: 33   Global Step: 171820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:41,083-Speed 10482.88 samples/sec   Loss 5.6456   LearningRate 0.0023   Epoch: 33   Global Step: 171830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:42,076-Speed 10326.36 samples/sec   Loss 5.6154   LearningRate 0.0023   Epoch: 33   Global Step: 171840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:43,089-Speed 10117.26 samples/sec   Loss 5.7180   LearningRate 0.0023   Epoch: 33   Global Step: 171850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:44,116-Speed 9984.10 samples/sec   Loss 5.4757   LearningRate 0.0023   Epoch: 33   Global Step: 171860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:45,065-Speed 10790.98 samples/sec   Loss 5.5265   LearningRate 0.0023   Epoch: 33   Global Step: 171870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:46,067-Speed 10227.16 samples/sec   Loss 5.7883   LearningRate 0.0023   Epoch: 33   Global Step: 171880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:47,064-Speed 10283.76 samples/sec   Loss 5.5789   LearningRate 0.0023   Epoch: 33   Global Step: 171890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:48,040-Speed 10498.63 samples/sec   Loss 5.5109   LearningRate 0.0023   Epoch: 33   Global Step: 171900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:49,052-Speed 10127.95 samples/sec   Loss 5.4883   LearningRate 0.0023   Epoch: 33   Global Step: 171910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:50,030-Speed 10482.63 samples/sec   Loss 5.8084   LearningRate 0.0023   Epoch: 33   Global Step: 171920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:39:51,011-Speed 10439.16 samples/sec   Loss 5.4776   LearningRate 0.0023   Epoch: 33   Global Step: 171930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:52,059-Speed 9780.91 samples/sec   Loss 5.4668   LearningRate 0.0023   Epoch: 33   Global Step: 171940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:53,065-Speed 10193.04 samples/sec   Loss 5.6596   LearningRate 0.0023   Epoch: 33   Global Step: 171950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:54,076-Speed 10141.89 samples/sec   Loss 5.7414   LearningRate 0.0023   Epoch: 33   Global Step: 171960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:39:55,092-Speed 10084.95 samples/sec   Loss 5.6719   LearningRate 0.0023   Epoch: 33   Global Step: 171970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:40:06,363-Speed 908.62 samples/sec   Loss 5.3750   LearningRate 0.0022   Epoch: 34   Global Step: 171980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:40:07,508-Speed 8952.70 samples/sec   Loss 5.3268   LearningRate 0.0022   Epoch: 34   Global Step: 171990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:40:08,593-Speed 9444.35 samples/sec   Loss 5.2036   LearningRate 0.0022   Epoch: 34   Global Step: 172000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:40:31,023-[lfw][172000]XNorm: 8.100525
Training: 2022-04-11 05:40:31,023-[lfw][172000]Accuracy-Flip: 0.99600+-0.00309
Training: 2022-04-11 05:40:31,024-[lfw][172000]Accuracy-Highest: 0.99700
Training: 2022-04-11 05:40:56,752-[cfp_fp][172000]XNorm: 7.020157
Training: 2022-04-11 05:40:56,753-[cfp_fp][172000]Accuracy-Flip: 0.96971+-0.01055
Training: 2022-04-11 05:40:56,754-[cfp_fp][172000]Accuracy-Highest: 0.97300
Training: 2022-04-11 05:41:19,024-[agedb_30][172000]XNorm: 7.939079
Training: 2022-04-11 05:41:19,025-[agedb_30][172000]Accuracy-Flip: 0.97150+-0.00689
Training: 2022-04-11 05:41:19,026-[agedb_30][172000]Accuracy-Highest: 0.97300
Training: 2022-04-11 05:41:19,986-Speed 143.43 samples/sec   Loss 5.4226   LearningRate 0.0022   Epoch: 34   Global Step: 172010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:41:20,998-Speed 10133.49 samples/sec   Loss 5.1733   LearningRate 0.0022   Epoch: 34   Global Step: 172020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:41:22,133-Speed 9028.09 samples/sec   Loss 5.2289   LearningRate 0.0022   Epoch: 34   Global Step: 172030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:23,186-Speed 9734.99 samples/sec   Loss 5.2637   LearningRate 0.0022   Epoch: 34   Global Step: 172040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:24,262-Speed 9518.18 samples/sec   Loss 5.3161   LearningRate 0.0022   Epoch: 34   Global Step: 172050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:25,261-Speed 10267.91 samples/sec   Loss 5.2009   LearningRate 0.0022   Epoch: 34   Global Step: 172060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:26,236-Speed 10526.41 samples/sec   Loss 5.1486   LearningRate 0.0022   Epoch: 34   Global Step: 172070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:27,232-Speed 10287.06 samples/sec   Loss 5.2221   LearningRate 0.0022   Epoch: 34   Global Step: 172080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:28,240-Speed 10169.63 samples/sec   Loss 5.2694   LearningRate 0.0022   Epoch: 34   Global Step: 172090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:29,222-Speed 10442.14 samples/sec   Loss 5.2723   LearningRate 0.0022   Epoch: 34   Global Step: 172100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:30,235-Speed 10117.76 samples/sec   Loss 5.0907   LearningRate 0.0022   Epoch: 34   Global Step: 172110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:31,222-Speed 10383.52 samples/sec   Loss 5.2477   LearningRate 0.0022   Epoch: 34   Global Step: 172120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:32,247-Speed 10006.29 samples/sec   Loss 5.2168   LearningRate 0.0022   Epoch: 34   Global Step: 172130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:33,215-Speed 10586.55 samples/sec   Loss 5.1950   LearningRate 0.0022   Epoch: 34   Global Step: 172140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:34,192-Speed 10484.27 samples/sec   Loss 5.1698   LearningRate 0.0022   Epoch: 34   Global Step: 172150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:35,233-Speed 9847.04 samples/sec   Loss 5.3709   LearningRate 0.0022   Epoch: 34   Global Step: 172160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:36,270-Speed 9876.58 samples/sec   Loss 5.3446   LearningRate 0.0022   Epoch: 34   Global Step: 172170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:37,278-Speed 10374.25 samples/sec   Loss 5.3220   LearningRate 0.0022   Epoch: 34   Global Step: 172180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:38,266-Speed 10371.70 samples/sec   Loss 5.2211   LearningRate 0.0022   Epoch: 34   Global Step: 172190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:39,345-Speed 9494.80 samples/sec   Loss 5.1274   LearningRate 0.0022   Epoch: 34   Global Step: 172200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:40,356-Speed 10145.21 samples/sec   Loss 5.3126   LearningRate 0.0022   Epoch: 34   Global Step: 172210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:41,350-Speed 10307.61 samples/sec   Loss 5.1957   LearningRate 0.0022   Epoch: 34   Global Step: 172220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:42,453-Speed 9291.05 samples/sec   Loss 5.1880   LearningRate 0.0022   Epoch: 34   Global Step: 172230   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:41:43,457-Speed 10213.07 samples/sec   Loss 5.4139   LearningRate 0.0022   Epoch: 34   Global Step: 172240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:44,455-Speed 10267.26 samples/sec   Loss 5.2517   LearningRate 0.0022   Epoch: 34   Global Step: 172250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:45,443-Speed 10370.00 samples/sec   Loss 5.2101   LearningRate 0.0022   Epoch: 34   Global Step: 172260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:46,429-Speed 10399.37 samples/sec   Loss 5.1680   LearningRate 0.0022   Epoch: 34   Global Step: 172270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:47,436-Speed 10168.86 samples/sec   Loss 5.4032   LearningRate 0.0022   Epoch: 34   Global Step: 172280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:48,431-Speed 10305.60 samples/sec   Loss 5.2819   LearningRate 0.0022   Epoch: 34   Global Step: 172290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:49,433-Speed 10229.25 samples/sec   Loss 5.4039   LearningRate 0.0022   Epoch: 34   Global Step: 172300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:50,393-Speed 10670.17 samples/sec   Loss 5.3365   LearningRate 0.0022   Epoch: 34   Global Step: 172310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:51,383-Speed 10354.58 samples/sec   Loss 5.3270   LearningRate 0.0022   Epoch: 34   Global Step: 172320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:52,395-Speed 10125.81 samples/sec   Loss 5.3381   LearningRate 0.0022   Epoch: 34   Global Step: 172330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:53,381-Speed 10397.11 samples/sec   Loss 5.3154   LearningRate 0.0022   Epoch: 34   Global Step: 172340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:54,375-Speed 10307.93 samples/sec   Loss 5.3244   LearningRate 0.0022   Epoch: 34   Global Step: 172350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:55,379-Speed 10207.61 samples/sec   Loss 5.3435   LearningRate 0.0022   Epoch: 34   Global Step: 172360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:56,334-Speed 10729.31 samples/sec   Loss 5.3421   LearningRate 0.0022   Epoch: 34   Global Step: 172370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:57,363-Speed 9967.50 samples/sec   Loss 5.2362   LearningRate 0.0022   Epoch: 34   Global Step: 172380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:58,371-Speed 10163.37 samples/sec   Loss 5.3358   LearningRate 0.0022   Epoch: 34   Global Step: 172390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:41:59,373-Speed 10231.97 samples/sec   Loss 5.2571   LearningRate 0.0022   Epoch: 34   Global Step: 172400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:42:00,400-Speed 9978.40 samples/sec   Loss 5.3715   LearningRate 0.0022   Epoch: 34   Global Step: 172410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:42:01,391-Speed 10342.15 samples/sec   Loss 5.4190   LearningRate 0.0022   Epoch: 34   Global Step: 172420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:42:02,372-Speed 10449.66 samples/sec   Loss 5.3759   LearningRate 0.0022   Epoch: 34   Global Step: 172430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:03,358-Speed 10385.16 samples/sec   Loss 5.1710   LearningRate 0.0022   Epoch: 34   Global Step: 172440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:04,327-Speed 10588.71 samples/sec   Loss 5.2345   LearningRate 0.0022   Epoch: 34   Global Step: 172450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:05,334-Speed 10182.05 samples/sec   Loss 5.3867   LearningRate 0.0022   Epoch: 34   Global Step: 172460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:06,339-Speed 10195.89 samples/sec   Loss 5.2367   LearningRate 0.0022   Epoch: 34   Global Step: 172470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:07,324-Speed 10405.31 samples/sec   Loss 5.2417   LearningRate 0.0022   Epoch: 34   Global Step: 172480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:08,358-Speed 9920.56 samples/sec   Loss 5.3786   LearningRate 0.0022   Epoch: 34   Global Step: 172490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:09,345-Speed 10377.46 samples/sec   Loss 5.2001   LearningRate 0.0022   Epoch: 34   Global Step: 172500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:10,379-Speed 9914.16 samples/sec   Loss 5.3216   LearningRate 0.0022   Epoch: 34   Global Step: 172510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:11,364-Speed 10397.42 samples/sec   Loss 5.3296   LearningRate 0.0022   Epoch: 34   Global Step: 172520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:12,417-Speed 9740.39 samples/sec   Loss 5.4169   LearningRate 0.0022   Epoch: 34   Global Step: 172530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:42:13,402-Speed 10525.34 samples/sec   Loss 5.2728   LearningRate 0.0022   Epoch: 34   Global Step: 172540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:42:14,382-Speed 10453.47 samples/sec   Loss 5.2985   LearningRate 0.0022   Epoch: 34   Global Step: 172550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:42:15,371-Speed 10361.79 samples/sec   Loss 5.2962   LearningRate 0.0022   Epoch: 34   Global Step: 172560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:42:16,337-Speed 10609.71 samples/sec   Loss 5.2470   LearningRate 0.0022   Epoch: 34   Global Step: 172570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:42:17,355-Speed 10063.99 samples/sec   Loss 5.2943   LearningRate 0.0022   Epoch: 34   Global Step: 172580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:42:18,378-Speed 10018.89 samples/sec   Loss 5.2733   LearningRate 0.0022   Epoch: 34   Global Step: 172590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:42:19,373-Speed 10302.51 samples/sec   Loss 5.3251   LearningRate 0.0022   Epoch: 34   Global Step: 172600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:42:20,371-Speed 10272.89 samples/sec   Loss 5.3797   LearningRate 0.0022   Epoch: 34   Global Step: 172610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:42:21,353-Speed 10435.10 samples/sec   Loss 5.3984   LearningRate 0.0022   Epoch: 34   Global Step: 172620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:22,335-Speed 10432.78 samples/sec   Loss 5.2995   LearningRate 0.0022   Epoch: 34   Global Step: 172630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:23,325-Speed 10359.22 samples/sec   Loss 5.2858   LearningRate 0.0022   Epoch: 34   Global Step: 172640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:24,300-Speed 10510.84 samples/sec   Loss 5.2214   LearningRate 0.0022   Epoch: 34   Global Step: 172650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:25,330-Speed 9945.91 samples/sec   Loss 5.3727   LearningRate 0.0021   Epoch: 34   Global Step: 172660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:26,315-Speed 10414.98 samples/sec   Loss 5.3793   LearningRate 0.0021   Epoch: 34   Global Step: 172670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:27,330-Speed 10094.35 samples/sec   Loss 5.3577   LearningRate 0.0021   Epoch: 34   Global Step: 172680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:28,327-Speed 10285.45 samples/sec   Loss 5.3420   LearningRate 0.0021   Epoch: 34   Global Step: 172690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:29,325-Speed 10268.25 samples/sec   Loss 5.2602   LearningRate 0.0021   Epoch: 34   Global Step: 172700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:30,334-Speed 10157.58 samples/sec   Loss 5.2361   LearningRate 0.0021   Epoch: 34   Global Step: 172710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:31,360-Speed 9993.16 samples/sec   Loss 5.2161   LearningRate 0.0021   Epoch: 34   Global Step: 172720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:42:32,410-Speed 9759.73 samples/sec   Loss 5.3396   LearningRate 0.0021   Epoch: 34   Global Step: 172730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:42:33,399-Speed 10364.98 samples/sec   Loss 5.3624   LearningRate 0.0021   Epoch: 34   Global Step: 172740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:42:34,374-Speed 10516.72 samples/sec   Loss 5.3628   LearningRate 0.0021   Epoch: 34   Global Step: 172750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:42:35,398-Speed 10011.13 samples/sec   Loss 5.2406   LearningRate 0.0021   Epoch: 34   Global Step: 172760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:42:36,349-Speed 10774.62 samples/sec   Loss 5.2919   LearningRate 0.0021   Epoch: 34   Global Step: 172770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:42:37,332-Speed 10431.26 samples/sec   Loss 5.2041   LearningRate 0.0021   Epoch: 34   Global Step: 172780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:38,310-Speed 10475.83 samples/sec   Loss 5.2793   LearningRate 0.0021   Epoch: 34   Global Step: 172790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:39,284-Speed 10524.39 samples/sec   Loss 5.2330   LearningRate 0.0021   Epoch: 34   Global Step: 172800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:40,258-Speed 10516.77 samples/sec   Loss 5.3464   LearningRate 0.0021   Epoch: 34   Global Step: 172810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:41,229-Speed 10558.29 samples/sec   Loss 5.2868   LearningRate 0.0021   Epoch: 34   Global Step: 172820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:42,210-Speed 10440.96 samples/sec   Loss 5.3066   LearningRate 0.0021   Epoch: 34   Global Step: 172830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:43,216-Speed 10192.11 samples/sec   Loss 5.2339   LearningRate 0.0021   Epoch: 34   Global Step: 172840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:44,236-Speed 10042.44 samples/sec   Loss 5.2875   LearningRate 0.0021   Epoch: 34   Global Step: 172850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:45,219-Speed 10422.14 samples/sec   Loss 5.3487   LearningRate 0.0021   Epoch: 34   Global Step: 172860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:46,196-Speed 10493.63 samples/sec   Loss 5.3026   LearningRate 0.0021   Epoch: 34   Global Step: 172870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:47,231-Speed 9902.43 samples/sec   Loss 5.3696   LearningRate 0.0021   Epoch: 34   Global Step: 172880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:42:48,208-Speed 10490.60 samples/sec   Loss 5.3392   LearningRate 0.0021   Epoch: 34   Global Step: 172890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:49,191-Speed 10426.27 samples/sec   Loss 5.3178   LearningRate 0.0021   Epoch: 34   Global Step: 172900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:50,281-Speed 9405.51 samples/sec   Loss 5.3156   LearningRate 0.0021   Epoch: 34   Global Step: 172910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:51,302-Speed 10044.48 samples/sec   Loss 5.2197   LearningRate 0.0021   Epoch: 34   Global Step: 172920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:52,298-Speed 10284.58 samples/sec   Loss 5.3492   LearningRate 0.0021   Epoch: 34   Global Step: 172930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:53,283-Speed 10402.00 samples/sec   Loss 5.3531   LearningRate 0.0021   Epoch: 34   Global Step: 172940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:54,271-Speed 10373.86 samples/sec   Loss 5.4239   LearningRate 0.0021   Epoch: 34   Global Step: 172950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:55,257-Speed 10395.94 samples/sec   Loss 5.2652   LearningRate 0.0021   Epoch: 34   Global Step: 172960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:56,186-Speed 11027.34 samples/sec   Loss 5.3792   LearningRate 0.0021   Epoch: 34   Global Step: 172970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:57,170-Speed 10413.86 samples/sec   Loss 5.3558   LearningRate 0.0021   Epoch: 34   Global Step: 172980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:42:58,184-Speed 10106.78 samples/sec   Loss 5.3016   LearningRate 0.0021   Epoch: 34   Global Step: 172990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:42:59,183-Speed 10263.71 samples/sec   Loss 5.4394   LearningRate 0.0021   Epoch: 34   Global Step: 173000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:00,171-Speed 10376.85 samples/sec   Loss 5.1794   LearningRate 0.0021   Epoch: 34   Global Step: 173010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:01,191-Speed 10049.04 samples/sec   Loss 5.4317   LearningRate 0.0021   Epoch: 34   Global Step: 173020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:02,209-Speed 10063.18 samples/sec   Loss 5.2323   LearningRate 0.0021   Epoch: 34   Global Step: 173030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:03,227-Speed 10073.59 samples/sec   Loss 5.1043   LearningRate 0.0021   Epoch: 34   Global Step: 173040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:04,217-Speed 10346.16 samples/sec   Loss 5.3048   LearningRate 0.0021   Epoch: 34   Global Step: 173050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:05,272-Speed 9718.20 samples/sec   Loss 5.3880   LearningRate 0.0021   Epoch: 34   Global Step: 173060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:06,258-Speed 10395.79 samples/sec   Loss 5.2625   LearningRate 0.0021   Epoch: 34   Global Step: 173070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:07,309-Speed 9749.56 samples/sec   Loss 5.2979   LearningRate 0.0021   Epoch: 34   Global Step: 173080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:08,293-Speed 10420.85 samples/sec   Loss 5.3909   LearningRate 0.0021   Epoch: 34   Global Step: 173090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:09,291-Speed 10264.64 samples/sec   Loss 5.3122   LearningRate 0.0021   Epoch: 34   Global Step: 173100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:10,281-Speed 10343.48 samples/sec   Loss 5.4746   LearningRate 0.0021   Epoch: 34   Global Step: 173110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:11,293-Speed 10127.53 samples/sec   Loss 5.3344   LearningRate 0.0021   Epoch: 34   Global Step: 173120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:12,293-Speed 10249.42 samples/sec   Loss 5.2123   LearningRate 0.0021   Epoch: 34   Global Step: 173130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:13,263-Speed 10569.98 samples/sec   Loss 5.3424   LearningRate 0.0021   Epoch: 34   Global Step: 173140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:14,258-Speed 10302.72 samples/sec   Loss 5.4624   LearningRate 0.0021   Epoch: 34   Global Step: 173150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:15,238-Speed 10457.57 samples/sec   Loss 5.3577   LearningRate 0.0021   Epoch: 34   Global Step: 173160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:16,219-Speed 10445.05 samples/sec   Loss 5.3006   LearningRate 0.0021   Epoch: 34   Global Step: 173170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:17,228-Speed 10155.26 samples/sec   Loss 5.2205   LearningRate 0.0021   Epoch: 34   Global Step: 173180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:18,252-Speed 10008.59 samples/sec   Loss 5.2607   LearningRate 0.0021   Epoch: 34   Global Step: 173190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:19,261-Speed 10159.74 samples/sec   Loss 5.3194   LearningRate 0.0021   Epoch: 34   Global Step: 173200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:20,240-Speed 10474.39 samples/sec   Loss 5.2902   LearningRate 0.0021   Epoch: 34   Global Step: 173210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:21,217-Speed 10483.05 samples/sec   Loss 5.4560   LearningRate 0.0021   Epoch: 34   Global Step: 173220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:22,163-Speed 10826.59 samples/sec   Loss 5.4085   LearningRate 0.0021   Epoch: 34   Global Step: 173230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:23,202-Speed 9864.75 samples/sec   Loss 5.3576   LearningRate 0.0021   Epoch: 34   Global Step: 173240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:24,186-Speed 10415.86 samples/sec   Loss 5.3636   LearningRate 0.0021   Epoch: 34   Global Step: 173250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:25,183-Speed 10281.12 samples/sec   Loss 5.4330   LearningRate 0.0021   Epoch: 34   Global Step: 173260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:43:26,149-Speed 10609.77 samples/sec   Loss 5.2114   LearningRate 0.0021   Epoch: 34   Global Step: 173270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:43:27,114-Speed 10629.73 samples/sec   Loss 5.4394   LearningRate 0.0021   Epoch: 34   Global Step: 173280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:43:28,103-Speed 10354.15 samples/sec   Loss 5.4034   LearningRate 0.0021   Epoch: 34   Global Step: 173290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:43:29,114-Speed 10144.41 samples/sec   Loss 5.3947   LearningRate 0.0021   Epoch: 34   Global Step: 173300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:43:30,112-Speed 10267.63 samples/sec   Loss 5.4411   LearningRate 0.0021   Epoch: 34   Global Step: 173310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:43:31,109-Speed 10287.84 samples/sec   Loss 5.4106   LearningRate 0.0021   Epoch: 34   Global Step: 173320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:43:32,116-Speed 10175.29 samples/sec   Loss 5.4325   LearningRate 0.0021   Epoch: 34   Global Step: 173330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:43:33,095-Speed 10471.96 samples/sec   Loss 5.3216   LearningRate 0.0021   Epoch: 34   Global Step: 173340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:43:34,077-Speed 10440.68 samples/sec   Loss 5.3761   LearningRate 0.0021   Epoch: 34   Global Step: 173350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:43:35,050-Speed 10530.79 samples/sec   Loss 5.4101   LearningRate 0.0020   Epoch: 34   Global Step: 173360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:36,039-Speed 10357.14 samples/sec   Loss 5.2919   LearningRate 0.0020   Epoch: 34   Global Step: 173370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:37,037-Speed 10268.81 samples/sec   Loss 5.2939   LearningRate 0.0020   Epoch: 34   Global Step: 173380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:38,109-Speed 9557.98 samples/sec   Loss 5.5575   LearningRate 0.0020   Epoch: 34   Global Step: 173390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:39,076-Speed 10606.50 samples/sec   Loss 5.4693   LearningRate 0.0020   Epoch: 34   Global Step: 173400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:40,100-Speed 10010.02 samples/sec   Loss 5.4104   LearningRate 0.0020   Epoch: 34   Global Step: 173410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:41,065-Speed 10614.56 samples/sec   Loss 5.2840   LearningRate 0.0020   Epoch: 34   Global Step: 173420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:42,087-Speed 10031.25 samples/sec   Loss 5.3306   LearningRate 0.0020   Epoch: 34   Global Step: 173430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:43,083-Speed 10289.92 samples/sec   Loss 5.5014   LearningRate 0.0020   Epoch: 34   Global Step: 173440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:44,076-Speed 10326.66 samples/sec   Loss 5.4278   LearningRate 0.0020   Epoch: 34   Global Step: 173450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:45,076-Speed 10248.95 samples/sec   Loss 5.3923   LearningRate 0.0020   Epoch: 34   Global Step: 173460   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:43:46,076-Speed 10242.90 samples/sec   Loss 5.4073   LearningRate 0.0020   Epoch: 34   Global Step: 173470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:47,045-Speed 10572.34 samples/sec   Loss 5.4674   LearningRate 0.0020   Epoch: 34   Global Step: 173480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:48,009-Speed 10636.00 samples/sec   Loss 5.2971   LearningRate 0.0020   Epoch: 34   Global Step: 173490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:49,003-Speed 10320.05 samples/sec   Loss 5.4417   LearningRate 0.0020   Epoch: 34   Global Step: 173500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:49,967-Speed 10623.22 samples/sec   Loss 5.3996   LearningRate 0.0020   Epoch: 34   Global Step: 173510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:50,955-Speed 10378.79 samples/sec   Loss 5.3505   LearningRate 0.0020   Epoch: 34   Global Step: 173520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:51,949-Speed 10316.20 samples/sec   Loss 5.4682   LearningRate 0.0020   Epoch: 34   Global Step: 173530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:52,930-Speed 10446.37 samples/sec   Loss 5.3607   LearningRate 0.0020   Epoch: 34   Global Step: 173540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:53,947-Speed 10080.24 samples/sec   Loss 5.4300   LearningRate 0.0020   Epoch: 34   Global Step: 173550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:55,012-Speed 9618.88 samples/sec   Loss 5.3440   LearningRate 0.0020   Epoch: 34   Global Step: 173560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:43:55,960-Speed 10817.01 samples/sec   Loss 5.2254   LearningRate 0.0020   Epoch: 34   Global Step: 173570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:43:57,006-Speed 9801.85 samples/sec   Loss 5.3526   LearningRate 0.0020   Epoch: 34   Global Step: 173580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:43:58,039-Speed 9918.01 samples/sec   Loss 5.1686   LearningRate 0.0020   Epoch: 34   Global Step: 173590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:43:59,006-Speed 10607.38 samples/sec   Loss 5.4778   LearningRate 0.0020   Epoch: 34   Global Step: 173600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:44:00,007-Speed 10236.24 samples/sec   Loss 5.4720   LearningRate 0.0020   Epoch: 34   Global Step: 173610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:44:01,008-Speed 10234.46 samples/sec   Loss 5.3178   LearningRate 0.0020   Epoch: 34   Global Step: 173620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:44:02,010-Speed 10223.16 samples/sec   Loss 5.3062   LearningRate 0.0020   Epoch: 34   Global Step: 173630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:44:03,029-Speed 10060.61 samples/sec   Loss 5.4185   LearningRate 0.0020   Epoch: 34   Global Step: 173640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:44:03,979-Speed 10788.39 samples/sec   Loss 5.5029   LearningRate 0.0020   Epoch: 34   Global Step: 173650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:44:05,001-Speed 10032.61 samples/sec   Loss 5.4378   LearningRate 0.0020   Epoch: 34   Global Step: 173660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:44:05,996-Speed 10299.64 samples/sec   Loss 5.4454   LearningRate 0.0020   Epoch: 34   Global Step: 173670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:06,989-Speed 10322.30 samples/sec   Loss 5.3710   LearningRate 0.0020   Epoch: 34   Global Step: 173680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:07,977-Speed 10365.51 samples/sec   Loss 5.3301   LearningRate 0.0020   Epoch: 34   Global Step: 173690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:08,960-Speed 10430.19 samples/sec   Loss 5.2585   LearningRate 0.0020   Epoch: 34   Global Step: 173700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:09,968-Speed 10164.02 samples/sec   Loss 5.3725   LearningRate 0.0020   Epoch: 34   Global Step: 173710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:11,005-Speed 9883.55 samples/sec   Loss 5.2968   LearningRate 0.0020   Epoch: 34   Global Step: 173720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:11,989-Speed 10416.80 samples/sec   Loss 5.4130   LearningRate 0.0020   Epoch: 34   Global Step: 173730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:12,965-Speed 10503.66 samples/sec   Loss 5.3994   LearningRate 0.0020   Epoch: 34   Global Step: 173740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:13,973-Speed 10167.04 samples/sec   Loss 5.3396   LearningRate 0.0020   Epoch: 34   Global Step: 173750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:14,982-Speed 10154.85 samples/sec   Loss 5.2923   LearningRate 0.0020   Epoch: 34   Global Step: 173760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:15,947-Speed 10626.83 samples/sec   Loss 5.4764   LearningRate 0.0020   Epoch: 34   Global Step: 173770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:16,927-Speed 10458.98 samples/sec   Loss 5.4029   LearningRate 0.0020   Epoch: 34   Global Step: 173780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:17,923-Speed 10286.62 samples/sec   Loss 5.2792   LearningRate 0.0020   Epoch: 34   Global Step: 173790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:18,951-Speed 9969.76 samples/sec   Loss 5.4819   LearningRate 0.0020   Epoch: 34   Global Step: 173800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:19,935-Speed 10409.68 samples/sec   Loss 5.2645   LearningRate 0.0020   Epoch: 34   Global Step: 173810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:20,911-Speed 10506.89 samples/sec   Loss 5.4093   LearningRate 0.0020   Epoch: 34   Global Step: 173820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:21,915-Speed 10209.15 samples/sec   Loss 5.4033   LearningRate 0.0020   Epoch: 34   Global Step: 173830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:22,993-Speed 9504.05 samples/sec   Loss 5.4086   LearningRate 0.0020   Epoch: 34   Global Step: 173840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:23,948-Speed 10741.65 samples/sec   Loss 5.2447   LearningRate 0.0020   Epoch: 34   Global Step: 173850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:24,951-Speed 10216.97 samples/sec   Loss 5.4433   LearningRate 0.0020   Epoch: 34   Global Step: 173860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:25,908-Speed 10705.38 samples/sec   Loss 5.2856   LearningRate 0.0020   Epoch: 34   Global Step: 173870   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:44:26,924-Speed 10084.80 samples/sec   Loss 5.3233   LearningRate 0.0020   Epoch: 34   Global Step: 173880   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:44:27,938-Speed 10112.82 samples/sec   Loss 5.4222   LearningRate 0.0020   Epoch: 34   Global Step: 173890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:28,922-Speed 10416.42 samples/sec   Loss 5.4554   LearningRate 0.0020   Epoch: 34   Global Step: 173900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:44:29,872-Speed 10785.33 samples/sec   Loss 5.3322   LearningRate 0.0020   Epoch: 34   Global Step: 173910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:44:30,906-Speed 9916.62 samples/sec   Loss 5.3299   LearningRate 0.0020   Epoch: 34   Global Step: 173920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:44:31,889-Speed 10431.24 samples/sec   Loss 5.4178   LearningRate 0.0020   Epoch: 34   Global Step: 173930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:44:32,883-Speed 10307.49 samples/sec   Loss 5.4671   LearningRate 0.0020   Epoch: 34   Global Step: 173940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:44:33,942-Speed 9674.07 samples/sec   Loss 5.3779   LearningRate 0.0020   Epoch: 34   Global Step: 173950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:44:34,923-Speed 10449.83 samples/sec   Loss 5.3297   LearningRate 0.0020   Epoch: 34   Global Step: 173960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:44:35,864-Speed 10890.68 samples/sec   Loss 5.3670   LearningRate 0.0020   Epoch: 34   Global Step: 173970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:44:36,859-Speed 10298.84 samples/sec   Loss 5.3970   LearningRate 0.0020   Epoch: 34   Global Step: 173980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:44:37,854-Speed 10303.55 samples/sec   Loss 5.6227   LearningRate 0.0020   Epoch: 34   Global Step: 173990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:44:38,855-Speed 10240.73 samples/sec   Loss 5.2024   LearningRate 0.0020   Epoch: 34   Global Step: 174000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:45:00,944-[lfw][174000]XNorm: 8.122557
Training: 2022-04-11 05:45:00,945-[lfw][174000]Accuracy-Flip: 0.99617+-0.00317
Training: 2022-04-11 05:45:00,946-[lfw][174000]Accuracy-Highest: 0.99700
Training: 2022-04-11 05:45:26,615-[cfp_fp][174000]XNorm: 7.012655
Training: 2022-04-11 05:45:26,616-[cfp_fp][174000]Accuracy-Flip: 0.97157+-0.00877
Training: 2022-04-11 05:45:26,617-[cfp_fp][174000]Accuracy-Highest: 0.97300
Training: 2022-04-11 05:45:48,820-[agedb_30][174000]XNorm: 7.941196
Training: 2022-04-11 05:45:48,820-[agedb_30][174000]Accuracy-Flip: 0.97183+-0.00652
Training: 2022-04-11 05:45:48,821-[agedb_30][174000]Accuracy-Highest: 0.97300
Training: 2022-04-11 05:45:49,807-Speed 144.32 samples/sec   Loss 5.3753   LearningRate 0.0020   Epoch: 34   Global Step: 174010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:45:50,761-Speed 10742.77 samples/sec   Loss 5.3334   LearningRate 0.0020   Epoch: 34   Global Step: 174020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:45:51,764-Speed 10214.07 samples/sec   Loss 5.4255   LearningRate 0.0020   Epoch: 34   Global Step: 174030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:45:52,779-Speed 10094.57 samples/sec   Loss 5.3075   LearningRate 0.0020   Epoch: 34   Global Step: 174040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:45:53,769-Speed 10359.55 samples/sec   Loss 5.4313   LearningRate 0.0020   Epoch: 34   Global Step: 174050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:45:54,723-Speed 10742.48 samples/sec   Loss 5.3629   LearningRate 0.0020   Epoch: 34   Global Step: 174060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:45:55,706-Speed 10432.42 samples/sec   Loss 5.4206   LearningRate 0.0019   Epoch: 34   Global Step: 174070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:45:56,727-Speed 10033.82 samples/sec   Loss 5.3525   LearningRate 0.0019   Epoch: 34   Global Step: 174080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:45:57,684-Speed 10707.02 samples/sec   Loss 5.2914   LearningRate 0.0019   Epoch: 34   Global Step: 174090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:45:58,681-Speed 10302.02 samples/sec   Loss 5.2717   LearningRate 0.0019   Epoch: 34   Global Step: 174100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:45:59,697-Speed 10094.65 samples/sec   Loss 5.3094   LearningRate 0.0019   Epoch: 34   Global Step: 174110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:00,674-Speed 10490.33 samples/sec   Loss 5.4428   LearningRate 0.0019   Epoch: 34   Global Step: 174120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:01,672-Speed 10271.05 samples/sec   Loss 5.4408   LearningRate 0.0019   Epoch: 34   Global Step: 174130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:02,716-Speed 9819.93 samples/sec   Loss 5.4346   LearningRate 0.0019   Epoch: 34   Global Step: 174140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:03,694-Speed 10476.15 samples/sec   Loss 5.3826   LearningRate 0.0019   Epoch: 34   Global Step: 174150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:04,631-Speed 10934.30 samples/sec   Loss 5.3148   LearningRate 0.0019   Epoch: 34   Global Step: 174160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:05,632-Speed 10235.20 samples/sec   Loss 5.4072   LearningRate 0.0019   Epoch: 34   Global Step: 174170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:06,584-Speed 10760.90 samples/sec   Loss 5.4856   LearningRate 0.0019   Epoch: 34   Global Step: 174180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:07,634-Speed 9757.50 samples/sec   Loss 5.3334   LearningRate 0.0019   Epoch: 34   Global Step: 174190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:08,629-Speed 10305.87 samples/sec   Loss 5.3748   LearningRate 0.0019   Epoch: 34   Global Step: 174200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:09,604-Speed 10514.10 samples/sec   Loss 5.4275   LearningRate 0.0019   Epoch: 34   Global Step: 174210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:10,590-Speed 10402.76 samples/sec   Loss 5.4846   LearningRate 0.0019   Epoch: 34   Global Step: 174220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:11,585-Speed 10307.23 samples/sec   Loss 5.4910   LearningRate 0.0019   Epoch: 34   Global Step: 174230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:12,596-Speed 10126.95 samples/sec   Loss 5.3710   LearningRate 0.0019   Epoch: 34   Global Step: 174240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:13,604-Speed 10174.67 samples/sec   Loss 5.3694   LearningRate 0.0019   Epoch: 34   Global Step: 174250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:14,584-Speed 10455.54 samples/sec   Loss 5.4962   LearningRate 0.0019   Epoch: 34   Global Step: 174260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:15,599-Speed 10105.47 samples/sec   Loss 5.4083   LearningRate 0.0019   Epoch: 34   Global Step: 174270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:16,546-Speed 10813.25 samples/sec   Loss 5.3581   LearningRate 0.0019   Epoch: 34   Global Step: 174280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:17,584-Speed 9880.60 samples/sec   Loss 5.3591   LearningRate 0.0019   Epoch: 34   Global Step: 174290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:18,608-Speed 10002.25 samples/sec   Loss 5.4876   LearningRate 0.0019   Epoch: 34   Global Step: 174300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:19,588-Speed 10458.74 samples/sec   Loss 5.5524   LearningRate 0.0019   Epoch: 34   Global Step: 174310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:20,583-Speed 10296.45 samples/sec   Loss 5.2706   LearningRate 0.0019   Epoch: 34   Global Step: 174320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:21,584-Speed 10245.70 samples/sec   Loss 5.3949   LearningRate 0.0019   Epoch: 34   Global Step: 174330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:22,617-Speed 9922.86 samples/sec   Loss 5.4891   LearningRate 0.0019   Epoch: 34   Global Step: 174340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:23,586-Speed 10573.67 samples/sec   Loss 5.4027   LearningRate 0.0019   Epoch: 34   Global Step: 174350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:24,602-Speed 10092.51 samples/sec   Loss 5.2795   LearningRate 0.0019   Epoch: 34   Global Step: 174360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:25,603-Speed 10241.29 samples/sec   Loss 5.3927   LearningRate 0.0019   Epoch: 34   Global Step: 174370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:26,567-Speed 10632.95 samples/sec   Loss 5.3095   LearningRate 0.0019   Epoch: 34   Global Step: 174380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:27,553-Speed 10398.95 samples/sec   Loss 5.3820   LearningRate 0.0019   Epoch: 34   Global Step: 174390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:28,565-Speed 10127.97 samples/sec   Loss 5.4687   LearningRate 0.0019   Epoch: 34   Global Step: 174400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:29,565-Speed 10248.61 samples/sec   Loss 5.4431   LearningRate 0.0019   Epoch: 34   Global Step: 174410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:30,566-Speed 10233.91 samples/sec   Loss 5.4300   LearningRate 0.0019   Epoch: 34   Global Step: 174420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:31,567-Speed 10241.78 samples/sec   Loss 5.4992   LearningRate 0.0019   Epoch: 34   Global Step: 174430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:32,582-Speed 10104.14 samples/sec   Loss 5.3240   LearningRate 0.0019   Epoch: 34   Global Step: 174440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:33,550-Speed 10590.67 samples/sec   Loss 5.5683   LearningRate 0.0019   Epoch: 34   Global Step: 174450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:34,539-Speed 10366.80 samples/sec   Loss 5.3745   LearningRate 0.0019   Epoch: 34   Global Step: 174460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:35,581-Speed 9829.63 samples/sec   Loss 5.4299   LearningRate 0.0019   Epoch: 34   Global Step: 174470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:36,529-Speed 10816.18 samples/sec   Loss 5.3781   LearningRate 0.0019   Epoch: 34   Global Step: 174480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:37,467-Speed 10927.60 samples/sec   Loss 5.5933   LearningRate 0.0019   Epoch: 34   Global Step: 174490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:38,428-Speed 10665.84 samples/sec   Loss 5.4401   LearningRate 0.0019   Epoch: 34   Global Step: 174500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:39,400-Speed 10547.73 samples/sec   Loss 5.5816   LearningRate 0.0019   Epoch: 34   Global Step: 174510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:40,453-Speed 9727.83 samples/sec   Loss 5.3611   LearningRate 0.0019   Epoch: 34   Global Step: 174520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:41,478-Speed 10001.33 samples/sec   Loss 5.4261   LearningRate 0.0019   Epoch: 34   Global Step: 174530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:46:42,443-Speed 10624.68 samples/sec   Loss 5.4414   LearningRate 0.0019   Epoch: 34   Global Step: 174540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:43,446-Speed 10217.38 samples/sec   Loss 5.3356   LearningRate 0.0019   Epoch: 34   Global Step: 174550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:44,455-Speed 10156.08 samples/sec   Loss 5.2968   LearningRate 0.0019   Epoch: 34   Global Step: 174560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:45,452-Speed 10285.20 samples/sec   Loss 5.3725   LearningRate 0.0019   Epoch: 34   Global Step: 174570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:46,451-Speed 10256.51 samples/sec   Loss 5.2398   LearningRate 0.0019   Epoch: 34   Global Step: 174580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:47,434-Speed 10429.02 samples/sec   Loss 5.3658   LearningRate 0.0019   Epoch: 34   Global Step: 174590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:48,519-Speed 9446.74 samples/sec   Loss 5.3612   LearningRate 0.0019   Epoch: 34   Global Step: 174600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:49,500-Speed 10452.07 samples/sec   Loss 5.3995   LearningRate 0.0019   Epoch: 34   Global Step: 174610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:50,519-Speed 10058.50 samples/sec   Loss 5.4984   LearningRate 0.0019   Epoch: 34   Global Step: 174620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:51,561-Speed 9828.79 samples/sec   Loss 5.3432   LearningRate 0.0019   Epoch: 34   Global Step: 174630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:52,539-Speed 10478.72 samples/sec   Loss 5.4409   LearningRate 0.0019   Epoch: 34   Global Step: 174640   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:46:53,508-Speed 10588.07 samples/sec   Loss 5.5008   LearningRate 0.0019   Epoch: 34   Global Step: 174650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:54,505-Speed 10273.58 samples/sec   Loss 5.3249   LearningRate 0.0019   Epoch: 34   Global Step: 174660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:55,523-Speed 10071.44 samples/sec   Loss 5.5477   LearningRate 0.0019   Epoch: 34   Global Step: 174670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:56,557-Speed 9904.69 samples/sec   Loss 5.4832   LearningRate 0.0019   Epoch: 34   Global Step: 174680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:57,507-Speed 10799.91 samples/sec   Loss 5.3509   LearningRate 0.0019   Epoch: 34   Global Step: 174690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:58,438-Speed 11004.66 samples/sec   Loss 5.4081   LearningRate 0.0019   Epoch: 34   Global Step: 174700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:46:59,408-Speed 10571.78 samples/sec   Loss 5.4200   LearningRate 0.0019   Epoch: 34   Global Step: 174710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:00,433-Speed 9999.32 samples/sec   Loss 5.4091   LearningRate 0.0019   Epoch: 34   Global Step: 174720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:01,437-Speed 10206.58 samples/sec   Loss 5.4814   LearningRate 0.0019   Epoch: 34   Global Step: 174730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:02,419-Speed 10438.70 samples/sec   Loss 5.2648   LearningRate 0.0019   Epoch: 34   Global Step: 174740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:03,416-Speed 10278.99 samples/sec   Loss 5.5297   LearningRate 0.0019   Epoch: 34   Global Step: 174750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:04,416-Speed 10251.28 samples/sec   Loss 5.3275   LearningRate 0.0019   Epoch: 34   Global Step: 174760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:05,415-Speed 10255.15 samples/sec   Loss 5.4368   LearningRate 0.0019   Epoch: 34   Global Step: 174770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:06,400-Speed 10401.23 samples/sec   Loss 5.3099   LearningRate 0.0019   Epoch: 34   Global Step: 174780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:07,385-Speed 10408.23 samples/sec   Loss 5.3188   LearningRate 0.0019   Epoch: 34   Global Step: 174790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:08,356-Speed 10556.27 samples/sec   Loss 5.4585   LearningRate 0.0019   Epoch: 34   Global Step: 174800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:09,344-Speed 10378.28 samples/sec   Loss 5.3676   LearningRate 0.0018   Epoch: 34   Global Step: 174810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:10,297-Speed 10747.74 samples/sec   Loss 5.5706   LearningRate 0.0018   Epoch: 34   Global Step: 174820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:11,290-Speed 10328.58 samples/sec   Loss 5.2217   LearningRate 0.0018   Epoch: 34   Global Step: 174830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:12,273-Speed 10429.71 samples/sec   Loss 5.4050   LearningRate 0.0018   Epoch: 34   Global Step: 174840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:13,223-Speed 10786.96 samples/sec   Loss 5.3241   LearningRate 0.0018   Epoch: 34   Global Step: 174850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:14,213-Speed 10357.45 samples/sec   Loss 5.2544   LearningRate 0.0018   Epoch: 34   Global Step: 174860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:15,230-Speed 10074.55 samples/sec   Loss 5.4546   LearningRate 0.0018   Epoch: 34   Global Step: 174870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:16,184-Speed 10741.20 samples/sec   Loss 5.4119   LearningRate 0.0018   Epoch: 34   Global Step: 174880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:17,183-Speed 10262.67 samples/sec   Loss 5.5034   LearningRate 0.0018   Epoch: 34   Global Step: 174890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:18,164-Speed 10447.15 samples/sec   Loss 5.4304   LearningRate 0.0018   Epoch: 34   Global Step: 174900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:19,238-Speed 9541.34 samples/sec   Loss 5.3171   LearningRate 0.0018   Epoch: 34   Global Step: 174910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:20,205-Speed 10621.82 samples/sec   Loss 5.3726   LearningRate 0.0018   Epoch: 34   Global Step: 174920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:21,246-Speed 9841.50 samples/sec   Loss 5.5016   LearningRate 0.0018   Epoch: 34   Global Step: 174930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:22,235-Speed 10365.63 samples/sec   Loss 5.4353   LearningRate 0.0018   Epoch: 34   Global Step: 174940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:23,263-Speed 9979.69 samples/sec   Loss 5.4082   LearningRate 0.0018   Epoch: 34   Global Step: 174950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:24,299-Speed 9884.49 samples/sec   Loss 5.5241   LearningRate 0.0018   Epoch: 34   Global Step: 174960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:25,313-Speed 10117.49 samples/sec   Loss 5.4867   LearningRate 0.0018   Epoch: 34   Global Step: 174970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:26,302-Speed 10363.19 samples/sec   Loss 5.4925   LearningRate 0.0018   Epoch: 34   Global Step: 174980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:27,255-Speed 10756.45 samples/sec   Loss 5.3287   LearningRate 0.0018   Epoch: 34   Global Step: 174990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:28,255-Speed 10256.49 samples/sec   Loss 5.4388   LearningRate 0.0018   Epoch: 34   Global Step: 175000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:29,223-Speed 10591.95 samples/sec   Loss 5.3635   LearningRate 0.0018   Epoch: 34   Global Step: 175010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:30,233-Speed 10145.28 samples/sec   Loss 5.4427   LearningRate 0.0018   Epoch: 34   Global Step: 175020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:31,216-Speed 10417.91 samples/sec   Loss 5.5021   LearningRate 0.0018   Epoch: 34   Global Step: 175030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:32,302-Speed 9437.42 samples/sec   Loss 5.4109   LearningRate 0.0018   Epoch: 34   Global Step: 175040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:33,289-Speed 10395.39 samples/sec   Loss 5.4163   LearningRate 0.0018   Epoch: 34   Global Step: 175050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:34,241-Speed 10767.33 samples/sec   Loss 5.5116   LearningRate 0.0018   Epoch: 34   Global Step: 175060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:35,256-Speed 10094.91 samples/sec   Loss 5.5978   LearningRate 0.0018   Epoch: 34   Global Step: 175070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:36,233-Speed 10488.33 samples/sec   Loss 5.2978   LearningRate 0.0018   Epoch: 34   Global Step: 175080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:37,202-Speed 10574.72 samples/sec   Loss 5.4259   LearningRate 0.0018   Epoch: 34   Global Step: 175090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:38,208-Speed 10197.89 samples/sec   Loss 5.5404   LearningRate 0.0018   Epoch: 34   Global Step: 175100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:39,267-Speed 9668.99 samples/sec   Loss 5.4187   LearningRate 0.0018   Epoch: 34   Global Step: 175110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:40,289-Speed 10033.84 samples/sec   Loss 5.3760   LearningRate 0.0018   Epoch: 34   Global Step: 175120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:41,265-Speed 10500.84 samples/sec   Loss 5.4517   LearningRate 0.0018   Epoch: 34   Global Step: 175130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:42,238-Speed 10530.61 samples/sec   Loss 5.3720   LearningRate 0.0018   Epoch: 34   Global Step: 175140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:43,246-Speed 10174.44 samples/sec   Loss 5.4561   LearningRate 0.0018   Epoch: 34   Global Step: 175150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:44,248-Speed 10232.92 samples/sec   Loss 5.5721   LearningRate 0.0018   Epoch: 34   Global Step: 175160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:45,237-Speed 10357.17 samples/sec   Loss 5.5603   LearningRate 0.0018   Epoch: 34   Global Step: 175170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:46,204-Speed 10606.16 samples/sec   Loss 5.5041   LearningRate 0.0018   Epoch: 34   Global Step: 175180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:47,184-Speed 10446.48 samples/sec   Loss 5.3921   LearningRate 0.0018   Epoch: 34   Global Step: 175190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:48,191-Speed 10184.14 samples/sec   Loss 5.4041   LearningRate 0.0018   Epoch: 34   Global Step: 175200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:49,186-Speed 10301.67 samples/sec   Loss 5.4634   LearningRate 0.0018   Epoch: 34   Global Step: 175210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:50,188-Speed 10232.38 samples/sec   Loss 5.3947   LearningRate 0.0018   Epoch: 34   Global Step: 175220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:51,200-Speed 10128.25 samples/sec   Loss 5.4194   LearningRate 0.0018   Epoch: 34   Global Step: 175230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:47:52,192-Speed 10331.22 samples/sec   Loss 5.4071   LearningRate 0.0018   Epoch: 34   Global Step: 175240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:53,167-Speed 10505.06 samples/sec   Loss 5.3972   LearningRate 0.0018   Epoch: 34   Global Step: 175250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:54,235-Speed 9601.92 samples/sec   Loss 5.4746   LearningRate 0.0018   Epoch: 34   Global Step: 175260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:55,231-Speed 10294.67 samples/sec   Loss 5.3233   LearningRate 0.0018   Epoch: 34   Global Step: 175270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:56,199-Speed 10592.94 samples/sec   Loss 5.6005   LearningRate 0.0018   Epoch: 34   Global Step: 175280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:57,177-Speed 10471.66 samples/sec   Loss 5.4220   LearningRate 0.0018   Epoch: 34   Global Step: 175290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:58,260-Speed 9469.51 samples/sec   Loss 5.4508   LearningRate 0.0018   Epoch: 34   Global Step: 175300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:47:59,210-Speed 10790.96 samples/sec   Loss 5.5299   LearningRate 0.0018   Epoch: 34   Global Step: 175310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:00,200-Speed 10351.63 samples/sec   Loss 5.4646   LearningRate 0.0018   Epoch: 34   Global Step: 175320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:01,238-Speed 9875.24 samples/sec   Loss 5.4373   LearningRate 0.0018   Epoch: 34   Global Step: 175330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:02,239-Speed 10245.57 samples/sec   Loss 5.4653   LearningRate 0.0018   Epoch: 34   Global Step: 175340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:03,221-Speed 10437.63 samples/sec   Loss 5.3623   LearningRate 0.0018   Epoch: 34   Global Step: 175350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:04,229-Speed 10167.86 samples/sec   Loss 5.4181   LearningRate 0.0018   Epoch: 34   Global Step: 175360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:05,220-Speed 10346.91 samples/sec   Loss 5.3737   LearningRate 0.0018   Epoch: 34   Global Step: 175370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:06,200-Speed 10454.47 samples/sec   Loss 5.5174   LearningRate 0.0018   Epoch: 34   Global Step: 175380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:07,171-Speed 10559.36 samples/sec   Loss 5.4042   LearningRate 0.0018   Epoch: 34   Global Step: 175390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:08,167-Speed 10290.84 samples/sec   Loss 5.4964   LearningRate 0.0018   Epoch: 34   Global Step: 175400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:09,152-Speed 10401.02 samples/sec   Loss 5.4591   LearningRate 0.0018   Epoch: 34   Global Step: 175410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:10,174-Speed 10031.32 samples/sec   Loss 5.3772   LearningRate 0.0018   Epoch: 34   Global Step: 175420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:11,110-Speed 10949.23 samples/sec   Loss 5.4000   LearningRate 0.0018   Epoch: 34   Global Step: 175430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:12,079-Speed 10576.91 samples/sec   Loss 5.5142   LearningRate 0.0018   Epoch: 34   Global Step: 175440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:13,032-Speed 10757.54 samples/sec   Loss 5.3649   LearningRate 0.0018   Epoch: 34   Global Step: 175450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:14,022-Speed 10351.22 samples/sec   Loss 5.3926   LearningRate 0.0018   Epoch: 34   Global Step: 175460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:14,993-Speed 10552.17 samples/sec   Loss 5.4466   LearningRate 0.0018   Epoch: 34   Global Step: 175470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:15,984-Speed 10346.95 samples/sec   Loss 5.4418   LearningRate 0.0018   Epoch: 34   Global Step: 175480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:17,009-Speed 9995.09 samples/sec   Loss 5.3649   LearningRate 0.0018   Epoch: 34   Global Step: 175490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:17,995-Speed 10396.47 samples/sec   Loss 5.5279   LearningRate 0.0018   Epoch: 34   Global Step: 175500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:18,946-Speed 10784.81 samples/sec   Loss 5.5018   LearningRate 0.0018   Epoch: 34   Global Step: 175510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:19,966-Speed 10039.31 samples/sec   Loss 5.3219   LearningRate 0.0018   Epoch: 34   Global Step: 175520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:21,018-Speed 9750.98 samples/sec   Loss 5.5991   LearningRate 0.0018   Epoch: 34   Global Step: 175530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:21,985-Speed 10594.00 samples/sec   Loss 5.4541   LearningRate 0.0018   Epoch: 34   Global Step: 175540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:22,979-Speed 10329.31 samples/sec   Loss 5.5001   LearningRate 0.0018   Epoch: 34   Global Step: 175550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:24,013-Speed 9924.79 samples/sec   Loss 5.4582   LearningRate 0.0017   Epoch: 34   Global Step: 175560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:25,015-Speed 10232.13 samples/sec   Loss 5.4354   LearningRate 0.0017   Epoch: 34   Global Step: 175570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:25,992-Speed 10494.78 samples/sec   Loss 5.4628   LearningRate 0.0017   Epoch: 34   Global Step: 175580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:27,041-Speed 9768.40 samples/sec   Loss 5.3069   LearningRate 0.0017   Epoch: 34   Global Step: 175590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:28,046-Speed 10193.90 samples/sec   Loss 5.4739   LearningRate 0.0017   Epoch: 34   Global Step: 175600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:29,033-Speed 10384.97 samples/sec   Loss 5.4889   LearningRate 0.0017   Epoch: 34   Global Step: 175610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:30,006-Speed 10536.43 samples/sec   Loss 5.3602   LearningRate 0.0017   Epoch: 34   Global Step: 175620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:31,092-Speed 9438.41 samples/sec   Loss 5.5074   LearningRate 0.0017   Epoch: 34   Global Step: 175630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:32,054-Speed 10659.31 samples/sec   Loss 5.3703   LearningRate 0.0017   Epoch: 34   Global Step: 175640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:33,057-Speed 10216.28 samples/sec   Loss 5.2902   LearningRate 0.0017   Epoch: 34   Global Step: 175650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:34,072-Speed 10092.54 samples/sec   Loss 5.4466   LearningRate 0.0017   Epoch: 34   Global Step: 175660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:35,120-Speed 9778.78 samples/sec   Loss 5.4599   LearningRate 0.0017   Epoch: 34   Global Step: 175670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:36,077-Speed 10713.51 samples/sec   Loss 5.3329   LearningRate 0.0017   Epoch: 34   Global Step: 175680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:37,011-Speed 10979.07 samples/sec   Loss 5.4856   LearningRate 0.0017   Epoch: 34   Global Step: 175690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:38,005-Speed 10305.33 samples/sec   Loss 5.4577   LearningRate 0.0017   Epoch: 34   Global Step: 175700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:39,039-Speed 9913.05 samples/sec   Loss 5.4429   LearningRate 0.0017   Epoch: 34   Global Step: 175710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:40,036-Speed 10291.53 samples/sec   Loss 5.4208   LearningRate 0.0017   Epoch: 34   Global Step: 175720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:41,034-Speed 10279.31 samples/sec   Loss 5.5335   LearningRate 0.0017   Epoch: 34   Global Step: 175730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:42,029-Speed 10298.73 samples/sec   Loss 5.4073   LearningRate 0.0017   Epoch: 34   Global Step: 175740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:43,014-Speed 10402.29 samples/sec   Loss 5.4702   LearningRate 0.0017   Epoch: 34   Global Step: 175750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:43,979-Speed 10629.77 samples/sec   Loss 5.6114   LearningRate 0.0017   Epoch: 34   Global Step: 175760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:44,962-Speed 10420.80 samples/sec   Loss 5.5181   LearningRate 0.0017   Epoch: 34   Global Step: 175770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:45,973-Speed 10137.75 samples/sec   Loss 5.3803   LearningRate 0.0017   Epoch: 34   Global Step: 175780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:47,076-Speed 9288.39 samples/sec   Loss 5.4764   LearningRate 0.0017   Epoch: 34   Global Step: 175790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:48:48,058-Speed 10442.53 samples/sec   Loss 5.4298   LearningRate 0.0017   Epoch: 34   Global Step: 175800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:49,042-Speed 10421.58 samples/sec   Loss 5.3988   LearningRate 0.0017   Epoch: 34   Global Step: 175810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:50,075-Speed 9924.19 samples/sec   Loss 5.5266   LearningRate 0.0017   Epoch: 34   Global Step: 175820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:51,129-Speed 9721.70 samples/sec   Loss 5.5436   LearningRate 0.0017   Epoch: 34   Global Step: 175830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:52,156-Speed 9981.32 samples/sec   Loss 5.4977   LearningRate 0.0017   Epoch: 34   Global Step: 175840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:53,157-Speed 10237.97 samples/sec   Loss 5.4759   LearningRate 0.0017   Epoch: 34   Global Step: 175850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:54,146-Speed 10362.61 samples/sec   Loss 5.4233   LearningRate 0.0017   Epoch: 34   Global Step: 175860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:55,158-Speed 10138.56 samples/sec   Loss 5.4584   LearningRate 0.0017   Epoch: 34   Global Step: 175870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:56,116-Speed 10702.61 samples/sec   Loss 5.5844   LearningRate 0.0017   Epoch: 34   Global Step: 175880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:57,079-Speed 10647.63 samples/sec   Loss 5.4505   LearningRate 0.0017   Epoch: 34   Global Step: 175890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:48:58,078-Speed 10251.71 samples/sec   Loss 5.4716   LearningRate 0.0017   Epoch: 34   Global Step: 175900   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:48:59,134-Speed 9712.73 samples/sec   Loss 5.4855   LearningRate 0.0017   Epoch: 34   Global Step: 175910   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:49:00,119-Speed 10404.30 samples/sec   Loss 5.4351   LearningRate 0.0017   Epoch: 34   Global Step: 175920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:49:01,173-Speed 9724.57 samples/sec   Loss 5.4664   LearningRate 0.0017   Epoch: 34   Global Step: 175930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:49:02,207-Speed 9914.56 samples/sec   Loss 5.3051   LearningRate 0.0017   Epoch: 34   Global Step: 175940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:49:03,172-Speed 10623.55 samples/sec   Loss 5.4343   LearningRate 0.0017   Epoch: 34   Global Step: 175950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:49:04,217-Speed 9804.52 samples/sec   Loss 5.4357   LearningRate 0.0017   Epoch: 34   Global Step: 175960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:49:05,291-Speed 9548.57 samples/sec   Loss 5.5249   LearningRate 0.0017   Epoch: 34   Global Step: 175970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:49:06,234-Speed 10873.55 samples/sec   Loss 5.3013   LearningRate 0.0017   Epoch: 34   Global Step: 175980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:49:07,188-Speed 10741.77 samples/sec   Loss 5.5376   LearningRate 0.0017   Epoch: 34   Global Step: 175990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:49:08,198-Speed 10143.86 samples/sec   Loss 5.2057   LearningRate 0.0017   Epoch: 34   Global Step: 176000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:49:30,242-[lfw][176000]XNorm: 8.090695
Training: 2022-04-11 05:49:30,243-[lfw][176000]Accuracy-Flip: 0.99667+-0.00333
Training: 2022-04-11 05:49:30,243-[lfw][176000]Accuracy-Highest: 0.99700
Training: 2022-04-11 05:49:55,902-[cfp_fp][176000]XNorm: 6.995538
Training: 2022-04-11 05:49:55,903-[cfp_fp][176000]Accuracy-Flip: 0.97371+-0.00933
Training: 2022-04-11 05:49:55,904-[cfp_fp][176000]Accuracy-Highest: 0.97371
Training: 2022-04-11 05:50:18,176-[agedb_30][176000]XNorm: 7.918708
Training: 2022-04-11 05:50:18,177-[agedb_30][176000]Accuracy-Flip: 0.97317+-0.00639
Training: 2022-04-11 05:50:18,178-[agedb_30][176000]Accuracy-Highest: 0.97317
Training: 2022-04-11 05:50:19,130-Speed 144.36 samples/sec   Loss 5.4692   LearningRate 0.0017   Epoch: 34   Global Step: 176010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:50:20,107-Speed 10496.47 samples/sec   Loss 5.3457   LearningRate 0.0017   Epoch: 34   Global Step: 176020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:50:21,107-Speed 10241.87 samples/sec   Loss 5.4308   LearningRate 0.0017   Epoch: 34   Global Step: 176030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:50:22,160-Speed 9739.69 samples/sec   Loss 5.5051   LearningRate 0.0017   Epoch: 34   Global Step: 176040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:50:23,150-Speed 10352.31 samples/sec   Loss 5.3666   LearningRate 0.0017   Epoch: 34   Global Step: 176050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:50:24,142-Speed 10331.04 samples/sec   Loss 5.4697   LearningRate 0.0017   Epoch: 34   Global Step: 176060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:50:25,147-Speed 10193.82 samples/sec   Loss 5.3595   LearningRate 0.0017   Epoch: 34   Global Step: 176070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:50:26,156-Speed 10174.83 samples/sec   Loss 5.5294   LearningRate 0.0017   Epoch: 34   Global Step: 176080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:50:27,151-Speed 10304.23 samples/sec   Loss 5.5421   LearningRate 0.0017   Epoch: 34   Global Step: 176090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:50:28,180-Speed 9960.62 samples/sec   Loss 5.6091   LearningRate 0.0017   Epoch: 34   Global Step: 176100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:50:29,234-Speed 9721.12 samples/sec   Loss 5.3052   LearningRate 0.0017   Epoch: 34   Global Step: 176110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:50:30,248-Speed 10106.17 samples/sec   Loss 5.4448   LearningRate 0.0017   Epoch: 34   Global Step: 176120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:50:31,255-Speed 10176.64 samples/sec   Loss 5.4938   LearningRate 0.0017   Epoch: 34   Global Step: 176130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:50:32,325-Speed 9575.03 samples/sec   Loss 5.4952   LearningRate 0.0017   Epoch: 34   Global Step: 176140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:50:33,327-Speed 10228.09 samples/sec   Loss 5.4466   LearningRate 0.0017   Epoch: 34   Global Step: 176150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:50:34,318-Speed 10339.49 samples/sec   Loss 5.5978   LearningRate 0.0017   Epoch: 34   Global Step: 176160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:50:35,310-Speed 10334.70 samples/sec   Loss 5.3674   LearningRate 0.0017   Epoch: 34   Global Step: 176170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:50:36,269-Speed 10684.34 samples/sec   Loss 5.4501   LearningRate 0.0017   Epoch: 34   Global Step: 176180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:50:37,317-Speed 9783.15 samples/sec   Loss 5.4846   LearningRate 0.0017   Epoch: 34   Global Step: 176190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:50:38,339-Speed 10028.75 samples/sec   Loss 5.4841   LearningRate 0.0017   Epoch: 34   Global Step: 176200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:50:39,318-Speed 10471.89 samples/sec   Loss 5.4345   LearningRate 0.0017   Epoch: 34   Global Step: 176210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:50:40,323-Speed 10200.37 samples/sec   Loss 5.3115   LearningRate 0.0017   Epoch: 34   Global Step: 176220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:50:41,359-Speed 9886.80 samples/sec   Loss 5.5107   LearningRate 0.0017   Epoch: 34   Global Step: 176230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:50:42,336-Speed 10499.27 samples/sec   Loss 5.4219   LearningRate 0.0017   Epoch: 34   Global Step: 176240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:50:43,292-Speed 10723.08 samples/sec   Loss 5.2809   LearningRate 0.0017   Epoch: 34   Global Step: 176250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:50:44,253-Speed 10661.52 samples/sec   Loss 5.6102   LearningRate 0.0017   Epoch: 34   Global Step: 176260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:50:45,273-Speed 10042.98 samples/sec   Loss 5.4972   LearningRate 0.0017   Epoch: 34   Global Step: 176270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:50:46,222-Speed 10797.04 samples/sec   Loss 5.4108   LearningRate 0.0017   Epoch: 34   Global Step: 176280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:50:47,216-Speed 10316.94 samples/sec   Loss 5.4740   LearningRate 0.0017   Epoch: 34   Global Step: 176290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:50:48,182-Speed 10600.26 samples/sec   Loss 5.2682   LearningRate 0.0017   Epoch: 34   Global Step: 176300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:50:49,184-Speed 10227.07 samples/sec   Loss 5.4102   LearningRate 0.0017   Epoch: 34   Global Step: 176310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:50:50,237-Speed 9731.60 samples/sec   Loss 5.5933   LearningRate 0.0017   Epoch: 34   Global Step: 176320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:50:51,208-Speed 10561.86 samples/sec   Loss 5.4141   LearningRate 0.0017   Epoch: 34   Global Step: 176330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:50:52,194-Speed 10390.60 samples/sec   Loss 5.4331   LearningRate 0.0016   Epoch: 34   Global Step: 176340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:50:53,192-Speed 10271.38 samples/sec   Loss 5.4225   LearningRate 0.0016   Epoch: 34   Global Step: 176350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:50:54,200-Speed 10177.98 samples/sec   Loss 5.2545   LearningRate 0.0016   Epoch: 34   Global Step: 176360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:50:55,207-Speed 10178.98 samples/sec   Loss 5.5286   LearningRate 0.0016   Epoch: 34   Global Step: 176370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:50:56,219-Speed 10119.53 samples/sec   Loss 5.5352   LearningRate 0.0016   Epoch: 34   Global Step: 176380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:50:57,203-Speed 10422.96 samples/sec   Loss 5.4077   LearningRate 0.0016   Epoch: 34   Global Step: 176390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:50:58,376-Speed 8732.67 samples/sec   Loss 5.5385   LearningRate 0.0016   Epoch: 34   Global Step: 176400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:50:59,401-Speed 10007.02 samples/sec   Loss 5.4332   LearningRate 0.0016   Epoch: 34   Global Step: 176410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:00,385-Speed 10414.25 samples/sec   Loss 5.5435   LearningRate 0.0016   Epoch: 34   Global Step: 176420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:01,377-Speed 10331.66 samples/sec   Loss 5.4676   LearningRate 0.0016   Epoch: 34   Global Step: 176430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:02,344-Speed 10592.88 samples/sec   Loss 5.5112   LearningRate 0.0016   Epoch: 34   Global Step: 176440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:03,348-Speed 10218.08 samples/sec   Loss 5.4259   LearningRate 0.0016   Epoch: 34   Global Step: 176450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:04,339-Speed 10338.26 samples/sec   Loss 5.4060   LearningRate 0.0016   Epoch: 34   Global Step: 176460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:05,291-Speed 10770.23 samples/sec   Loss 5.6126   LearningRate 0.0016   Epoch: 34   Global Step: 176470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:06,272-Speed 10444.19 samples/sec   Loss 5.3748   LearningRate 0.0016   Epoch: 34   Global Step: 176480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:07,303-Speed 9939.26 samples/sec   Loss 5.5099   LearningRate 0.0016   Epoch: 34   Global Step: 176490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:08,290-Speed 10387.29 samples/sec   Loss 5.3910   LearningRate 0.0016   Epoch: 34   Global Step: 176500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:09,316-Speed 9989.26 samples/sec   Loss 5.4541   LearningRate 0.0016   Epoch: 34   Global Step: 176510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:10,275-Speed 10686.77 samples/sec   Loss 5.4384   LearningRate 0.0016   Epoch: 34   Global Step: 176520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:11,283-Speed 10161.76 samples/sec   Loss 5.4099   LearningRate 0.0016   Epoch: 34   Global Step: 176530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:12,226-Speed 10877.01 samples/sec   Loss 5.4554   LearningRate 0.0016   Epoch: 34   Global Step: 176540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:13,230-Speed 10202.02 samples/sec   Loss 5.6068   LearningRate 0.0016   Epoch: 34   Global Step: 176550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:14,205-Speed 10512.77 samples/sec   Loss 5.4496   LearningRate 0.0016   Epoch: 34   Global Step: 176560   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:51:15,226-Speed 10036.19 samples/sec   Loss 5.5073   LearningRate 0.0016   Epoch: 34   Global Step: 176570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:16,195-Speed 10572.94 samples/sec   Loss 5.4545   LearningRate 0.0016   Epoch: 34   Global Step: 176580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:17,180-Speed 10403.92 samples/sec   Loss 5.4211   LearningRate 0.0016   Epoch: 34   Global Step: 176590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:18,172-Speed 10335.10 samples/sec   Loss 5.3786   LearningRate 0.0016   Epoch: 34   Global Step: 176600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:19,173-Speed 10242.06 samples/sec   Loss 5.5369   LearningRate 0.0016   Epoch: 34   Global Step: 176610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:20,166-Speed 10322.20 samples/sec   Loss 5.4378   LearningRate 0.0016   Epoch: 34   Global Step: 176620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:21,170-Speed 10205.30 samples/sec   Loss 5.5321   LearningRate 0.0016   Epoch: 34   Global Step: 176630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:22,150-Speed 10455.09 samples/sec   Loss 5.5616   LearningRate 0.0016   Epoch: 34   Global Step: 176640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:23,195-Speed 9813.68 samples/sec   Loss 5.4111   LearningRate 0.0016   Epoch: 34   Global Step: 176650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:24,165-Speed 10564.56 samples/sec   Loss 5.3897   LearningRate 0.0016   Epoch: 34   Global Step: 176660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:25,135-Speed 10569.15 samples/sec   Loss 5.5644   LearningRate 0.0016   Epoch: 34   Global Step: 176670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:26,105-Speed 10570.57 samples/sec   Loss 5.5584   LearningRate 0.0016   Epoch: 34   Global Step: 176680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:27,169-Speed 9631.74 samples/sec   Loss 5.5274   LearningRate 0.0016   Epoch: 34   Global Step: 176690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:28,131-Speed 10658.37 samples/sec   Loss 5.4981   LearningRate 0.0016   Epoch: 34   Global Step: 176700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:29,140-Speed 10149.00 samples/sec   Loss 5.5671   LearningRate 0.0016   Epoch: 34   Global Step: 176710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:30,196-Speed 9704.52 samples/sec   Loss 5.5616   LearningRate 0.0016   Epoch: 34   Global Step: 176720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:31,174-Speed 10484.27 samples/sec   Loss 5.4588   LearningRate 0.0016   Epoch: 34   Global Step: 176730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:32,170-Speed 10299.74 samples/sec   Loss 5.6231   LearningRate 0.0016   Epoch: 34   Global Step: 176740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:33,182-Speed 10124.78 samples/sec   Loss 5.3446   LearningRate 0.0016   Epoch: 34   Global Step: 176750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:34,219-Speed 9876.39 samples/sec   Loss 5.4507   LearningRate 0.0016   Epoch: 34   Global Step: 176760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:35,188-Speed 10582.66 samples/sec   Loss 5.4957   LearningRate 0.0016   Epoch: 34   Global Step: 176770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:36,206-Speed 10070.21 samples/sec   Loss 5.5431   LearningRate 0.0016   Epoch: 34   Global Step: 176780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:37,176-Speed 10561.37 samples/sec   Loss 5.3503   LearningRate 0.0016   Epoch: 34   Global Step: 176790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:38,217-Speed 9837.11 samples/sec   Loss 5.5259   LearningRate 0.0016   Epoch: 34   Global Step: 176800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:39,194-Speed 10500.97 samples/sec   Loss 5.4869   LearningRate 0.0016   Epoch: 34   Global Step: 176810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:40,167-Speed 10533.77 samples/sec   Loss 5.3505   LearningRate 0.0016   Epoch: 34   Global Step: 176820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:41,165-Speed 10264.66 samples/sec   Loss 5.4561   LearningRate 0.0016   Epoch: 34   Global Step: 176830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:42,181-Speed 10101.19 samples/sec   Loss 5.4924   LearningRate 0.0016   Epoch: 34   Global Step: 176840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:43,152-Speed 10553.20 samples/sec   Loss 5.3620   LearningRate 0.0016   Epoch: 34   Global Step: 176850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:44,101-Speed 10797.21 samples/sec   Loss 5.4414   LearningRate 0.0016   Epoch: 34   Global Step: 176860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:45,122-Speed 10040.89 samples/sec   Loss 5.4850   LearningRate 0.0016   Epoch: 34   Global Step: 176870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:46,124-Speed 10232.47 samples/sec   Loss 5.4441   LearningRate 0.0016   Epoch: 34   Global Step: 176880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:47,155-Speed 9948.40 samples/sec   Loss 5.4458   LearningRate 0.0016   Epoch: 34   Global Step: 176890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:48,126-Speed 10558.69 samples/sec   Loss 5.5481   LearningRate 0.0016   Epoch: 34   Global Step: 176900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:49,118-Speed 10325.95 samples/sec   Loss 5.5096   LearningRate 0.0016   Epoch: 34   Global Step: 176910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:50,143-Speed 10003.38 samples/sec   Loss 5.4938   LearningRate 0.0016   Epoch: 34   Global Step: 176920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:51,135-Speed 10340.00 samples/sec   Loss 5.4186   LearningRate 0.0016   Epoch: 34   Global Step: 176930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:52,154-Speed 10065.03 samples/sec   Loss 5.4933   LearningRate 0.0016   Epoch: 34   Global Step: 176940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:53,167-Speed 10115.22 samples/sec   Loss 5.5703   LearningRate 0.0016   Epoch: 34   Global Step: 176950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:54,185-Speed 10073.53 samples/sec   Loss 5.4702   LearningRate 0.0016   Epoch: 34   Global Step: 176960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:55,175-Speed 10354.38 samples/sec   Loss 5.4714   LearningRate 0.0016   Epoch: 34   Global Step: 176970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:56,169-Speed 10307.86 samples/sec   Loss 5.5122   LearningRate 0.0016   Epoch: 34   Global Step: 176980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:57,175-Speed 10196.07 samples/sec   Loss 5.4963   LearningRate 0.0016   Epoch: 34   Global Step: 176990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:51:58,202-Speed 9974.53 samples/sec   Loss 5.5946   LearningRate 0.0016   Epoch: 34   Global Step: 177000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:51:59,174-Speed 10543.39 samples/sec   Loss 5.4540   LearningRate 0.0016   Epoch: 34   Global Step: 177010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:00,267-Speed 9374.84 samples/sec   Loss 5.5399   LearningRate 0.0016   Epoch: 34   Global Step: 177020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:01,162-Speed 11455.52 samples/sec   Loss 5.5744   LearningRate 0.0016   Epoch: 34   Global Step: 177030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:13,911-Speed 803.29 samples/sec   Loss 5.1762   LearningRate 0.0016   Epoch: 35   Global Step: 177040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:14,957-Speed 9803.67 samples/sec   Loss 5.1080   LearningRate 0.0016   Epoch: 35   Global Step: 177050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:16,218-Speed 8127.85 samples/sec   Loss 5.0933   LearningRate 0.0016   Epoch: 35   Global Step: 177060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:17,250-Speed 9932.54 samples/sec   Loss 5.2102   LearningRate 0.0016   Epoch: 35   Global Step: 177070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:18,344-Speed 9367.90 samples/sec   Loss 5.2667   LearningRate 0.0016   Epoch: 35   Global Step: 177080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:19,505-Speed 8819.18 samples/sec   Loss 5.2703   LearningRate 0.0016   Epoch: 35   Global Step: 177090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:20,524-Speed 10062.36 samples/sec   Loss 5.1130   LearningRate 0.0016   Epoch: 35   Global Step: 177100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:21,511-Speed 10391.69 samples/sec   Loss 5.1440   LearningRate 0.0016   Epoch: 35   Global Step: 177110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:22,524-Speed 10116.55 samples/sec   Loss 5.1862   LearningRate 0.0016   Epoch: 35   Global Step: 177120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:23,520-Speed 10289.82 samples/sec   Loss 5.2885   LearningRate 0.0016   Epoch: 35   Global Step: 177130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:24,553-Speed 9916.87 samples/sec   Loss 5.2140   LearningRate 0.0015   Epoch: 35   Global Step: 177140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:25,580-Speed 9979.68 samples/sec   Loss 5.2191   LearningRate 0.0015   Epoch: 35   Global Step: 177150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:26,581-Speed 10234.16 samples/sec   Loss 5.2544   LearningRate 0.0015   Epoch: 35   Global Step: 177160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:27,579-Speed 10391.49 samples/sec   Loss 5.1905   LearningRate 0.0015   Epoch: 35   Global Step: 177170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:28,552-Speed 10537.44 samples/sec   Loss 5.1375   LearningRate 0.0015   Epoch: 35   Global Step: 177180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:29,561-Speed 10159.24 samples/sec   Loss 5.2619   LearningRate 0.0015   Epoch: 35   Global Step: 177190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:30,589-Speed 9966.79 samples/sec   Loss 5.2479   LearningRate 0.0015   Epoch: 35   Global Step: 177200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:31,601-Speed 10122.23 samples/sec   Loss 5.1348   LearningRate 0.0015   Epoch: 35   Global Step: 177210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:32,627-Speed 9992.06 samples/sec   Loss 5.2635   LearningRate 0.0015   Epoch: 35   Global Step: 177220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:33,616-Speed 10366.44 samples/sec   Loss 5.2673   LearningRate 0.0015   Epoch: 35   Global Step: 177230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:34,594-Speed 10474.52 samples/sec   Loss 5.2604   LearningRate 0.0015   Epoch: 35   Global Step: 177240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:35,594-Speed 10248.36 samples/sec   Loss 5.3253   LearningRate 0.0015   Epoch: 35   Global Step: 177250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:36,598-Speed 10203.93 samples/sec   Loss 5.1203   LearningRate 0.0015   Epoch: 35   Global Step: 177260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:37,615-Speed 10078.40 samples/sec   Loss 5.2853   LearningRate 0.0015   Epoch: 35   Global Step: 177270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:38,623-Speed 10167.09 samples/sec   Loss 5.2958   LearningRate 0.0015   Epoch: 35   Global Step: 177280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:39,578-Speed 10732.06 samples/sec   Loss 5.0729   LearningRate 0.0015   Epoch: 35   Global Step: 177290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:40,541-Speed 10644.00 samples/sec   Loss 5.2172   LearningRate 0.0015   Epoch: 35   Global Step: 177300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:41,534-Speed 10317.07 samples/sec   Loss 5.1454   LearningRate 0.0015   Epoch: 35   Global Step: 177310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:42,526-Speed 10341.10 samples/sec   Loss 5.0215   LearningRate 0.0015   Epoch: 35   Global Step: 177320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:43,528-Speed 10225.25 samples/sec   Loss 5.1549   LearningRate 0.0015   Epoch: 35   Global Step: 177330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:44,511-Speed 10430.29 samples/sec   Loss 5.2569   LearningRate 0.0015   Epoch: 35   Global Step: 177340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:45,477-Speed 10608.72 samples/sec   Loss 5.1848   LearningRate 0.0015   Epoch: 35   Global Step: 177350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:46,451-Speed 10522.26 samples/sec   Loss 5.1684   LearningRate 0.0015   Epoch: 35   Global Step: 177360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:47,427-Speed 10498.06 samples/sec   Loss 5.1864   LearningRate 0.0015   Epoch: 35   Global Step: 177370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:48,469-Speed 9831.95 samples/sec   Loss 5.1439   LearningRate 0.0015   Epoch: 35   Global Step: 177380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:49,492-Speed 10014.33 samples/sec   Loss 5.1611   LearningRate 0.0015   Epoch: 35   Global Step: 177390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:50,550-Speed 9684.08 samples/sec   Loss 5.1524   LearningRate 0.0015   Epoch: 35   Global Step: 177400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:51,608-Speed 9693.11 samples/sec   Loss 5.3261   LearningRate 0.0015   Epoch: 35   Global Step: 177410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:52,648-Speed 9852.91 samples/sec   Loss 5.2592   LearningRate 0.0015   Epoch: 35   Global Step: 177420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:53,612-Speed 10628.77 samples/sec   Loss 5.3094   LearningRate 0.0015   Epoch: 35   Global Step: 177430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:54,616-Speed 10198.76 samples/sec   Loss 5.3253   LearningRate 0.0015   Epoch: 35   Global Step: 177440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:55,588-Speed 10540.00 samples/sec   Loss 5.3528   LearningRate 0.0015   Epoch: 35   Global Step: 177450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:56,556-Speed 10590.93 samples/sec   Loss 5.1793   LearningRate 0.0015   Epoch: 35   Global Step: 177460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:52:57,574-Speed 10067.31 samples/sec   Loss 5.1592   LearningRate 0.0015   Epoch: 35   Global Step: 177470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:58,571-Speed 10285.92 samples/sec   Loss 5.2745   LearningRate 0.0015   Epoch: 35   Global Step: 177480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:52:59,547-Speed 10500.86 samples/sec   Loss 5.3049   LearningRate 0.0015   Epoch: 35   Global Step: 177490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:00,517-Speed 10562.11 samples/sec   Loss 5.3353   LearningRate 0.0015   Epoch: 35   Global Step: 177500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:01,520-Speed 10222.71 samples/sec   Loss 5.1865   LearningRate 0.0015   Epoch: 35   Global Step: 177510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:02,500-Speed 10454.39 samples/sec   Loss 5.3364   LearningRate 0.0015   Epoch: 35   Global Step: 177520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:03,477-Speed 10493.90 samples/sec   Loss 5.2484   LearningRate 0.0015   Epoch: 35   Global Step: 177530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:04,506-Speed 9959.13 samples/sec   Loss 5.2988   LearningRate 0.0015   Epoch: 35   Global Step: 177540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:05,511-Speed 10195.94 samples/sec   Loss 5.1804   LearningRate 0.0015   Epoch: 35   Global Step: 177550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:06,486-Speed 10512.92 samples/sec   Loss 5.0993   LearningRate 0.0015   Epoch: 35   Global Step: 177560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:07,460-Speed 10527.41 samples/sec   Loss 5.2389   LearningRate 0.0015   Epoch: 35   Global Step: 177570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:53:08,444-Speed 10415.80 samples/sec   Loss 5.3238   LearningRate 0.0015   Epoch: 35   Global Step: 177580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:53:09,410-Speed 10604.66 samples/sec   Loss 5.1572   LearningRate 0.0015   Epoch: 35   Global Step: 177590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:53:10,385-Speed 10521.77 samples/sec   Loss 5.2332   LearningRate 0.0015   Epoch: 35   Global Step: 177600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:53:11,362-Speed 10484.36 samples/sec   Loss 5.2575   LearningRate 0.0015   Epoch: 35   Global Step: 177610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:53:12,342-Speed 10461.66 samples/sec   Loss 5.0932   LearningRate 0.0015   Epoch: 35   Global Step: 177620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:53:13,322-Speed 10455.44 samples/sec   Loss 5.1465   LearningRate 0.0015   Epoch: 35   Global Step: 177630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:53:14,292-Speed 10561.79 samples/sec   Loss 5.2369   LearningRate 0.0015   Epoch: 35   Global Step: 177640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:53:15,267-Speed 10517.96 samples/sec   Loss 5.0140   LearningRate 0.0015   Epoch: 35   Global Step: 177650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:53:16,232-Speed 10620.77 samples/sec   Loss 5.1666   LearningRate 0.0015   Epoch: 35   Global Step: 177660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:53:17,187-Speed 10737.50 samples/sec   Loss 5.1185   LearningRate 0.0015   Epoch: 35   Global Step: 177670   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:53:18,181-Speed 10300.83 samples/sec   Loss 5.1059   LearningRate 0.0015   Epoch: 35   Global Step: 177680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:53:19,185-Speed 10215.07 samples/sec   Loss 5.1969   LearningRate 0.0015   Epoch: 35   Global Step: 177690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:53:20,183-Speed 10269.65 samples/sec   Loss 5.2447   LearningRate 0.0015   Epoch: 35   Global Step: 177700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:53:21,156-Speed 10528.52 samples/sec   Loss 5.2206   LearningRate 0.0015   Epoch: 35   Global Step: 177710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:53:22,118-Speed 10652.41 samples/sec   Loss 5.3143   LearningRate 0.0015   Epoch: 35   Global Step: 177720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:23,098-Speed 10458.42 samples/sec   Loss 5.2771   LearningRate 0.0015   Epoch: 35   Global Step: 177730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:24,061-Speed 10637.48 samples/sec   Loss 5.3078   LearningRate 0.0015   Epoch: 35   Global Step: 177740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:25,034-Speed 10531.69 samples/sec   Loss 5.2664   LearningRate 0.0015   Epoch: 35   Global Step: 177750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:26,030-Speed 10287.23 samples/sec   Loss 5.2067   LearningRate 0.0015   Epoch: 35   Global Step: 177760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:27,038-Speed 10164.24 samples/sec   Loss 5.4202   LearningRate 0.0015   Epoch: 35   Global Step: 177770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:28,039-Speed 10237.20 samples/sec   Loss 5.2419   LearningRate 0.0015   Epoch: 35   Global Step: 177780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:29,035-Speed 10294.25 samples/sec   Loss 5.3560   LearningRate 0.0015   Epoch: 35   Global Step: 177790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:29,996-Speed 10664.90 samples/sec   Loss 5.3171   LearningRate 0.0015   Epoch: 35   Global Step: 177800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:30,950-Speed 10748.62 samples/sec   Loss 5.1140   LearningRate 0.0015   Epoch: 35   Global Step: 177810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:31,931-Speed 10436.47 samples/sec   Loss 5.3965   LearningRate 0.0015   Epoch: 35   Global Step: 177820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:32,925-Speed 10316.51 samples/sec   Loss 5.2052   LearningRate 0.0015   Epoch: 35   Global Step: 177830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:33,895-Speed 10557.51 samples/sec   Loss 5.3520   LearningRate 0.0015   Epoch: 35   Global Step: 177840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:34,888-Speed 10320.01 samples/sec   Loss 5.0489   LearningRate 0.0015   Epoch: 35   Global Step: 177850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:35,853-Speed 10619.60 samples/sec   Loss 5.2719   LearningRate 0.0015   Epoch: 35   Global Step: 177860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:36,850-Speed 10279.34 samples/sec   Loss 5.3010   LearningRate 0.0015   Epoch: 35   Global Step: 177870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:37,860-Speed 10147.79 samples/sec   Loss 5.1123   LearningRate 0.0015   Epoch: 35   Global Step: 177880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:38,833-Speed 10538.22 samples/sec   Loss 5.2549   LearningRate 0.0015   Epoch: 35   Global Step: 177890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:39,818-Speed 10398.85 samples/sec   Loss 5.2123   LearningRate 0.0015   Epoch: 35   Global Step: 177900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:40,810-Speed 10335.97 samples/sec   Loss 5.0969   LearningRate 0.0015   Epoch: 35   Global Step: 177910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:41,846-Speed 9892.24 samples/sec   Loss 5.4161   LearningRate 0.0015   Epoch: 35   Global Step: 177920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:53:42,879-Speed 9923.11 samples/sec   Loss 5.2346   LearningRate 0.0015   Epoch: 35   Global Step: 177930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:53:43,848-Speed 10570.54 samples/sec   Loss 5.3191   LearningRate 0.0015   Epoch: 35   Global Step: 177940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:44,824-Speed 10497.17 samples/sec   Loss 5.3072   LearningRate 0.0015   Epoch: 35   Global Step: 177950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:45,834-Speed 10151.79 samples/sec   Loss 5.2480   LearningRate 0.0014   Epoch: 35   Global Step: 177960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:46,839-Speed 10201.16 samples/sec   Loss 5.1759   LearningRate 0.0014   Epoch: 35   Global Step: 177970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:47,868-Speed 9962.78 samples/sec   Loss 5.3399   LearningRate 0.0014   Epoch: 35   Global Step: 177980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:48,829-Speed 10665.89 samples/sec   Loss 5.1617   LearningRate 0.0014   Epoch: 35   Global Step: 177990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:53:49,831-Speed 10224.71 samples/sec   Loss 5.1751   LearningRate 0.0014   Epoch: 35   Global Step: 178000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:54:12,253-[lfw][178000]XNorm: 8.065321
Training: 2022-04-11 05:54:12,254-[lfw][178000]Accuracy-Flip: 0.99600+-0.00300
Training: 2022-04-11 05:54:12,255-[lfw][178000]Accuracy-Highest: 0.99700
Training: 2022-04-11 05:54:38,035-[cfp_fp][178000]XNorm: 6.953108
Training: 2022-04-11 05:54:38,036-[cfp_fp][178000]Accuracy-Flip: 0.97257+-0.00964
Training: 2022-04-11 05:54:38,036-[cfp_fp][178000]Accuracy-Highest: 0.97371
Training: 2022-04-11 05:55:00,302-[agedb_30][178000]XNorm: 7.886579
Training: 2022-04-11 05:55:00,303-[agedb_30][178000]Accuracy-Flip: 0.97350+-0.00617
Training: 2022-04-11 05:55:00,303-[agedb_30][178000]Accuracy-Highest: 0.97350
Training: 2022-04-11 05:55:01,278-Speed 143.33 samples/sec   Loss 5.4552   LearningRate 0.0014   Epoch: 35   Global Step: 178010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:02,234-Speed 10717.44 samples/sec   Loss 5.3684   LearningRate 0.0014   Epoch: 35   Global Step: 178020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:03,236-Speed 10225.62 samples/sec   Loss 5.2310   LearningRate 0.0014   Epoch: 35   Global Step: 178030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:04,199-Speed 10633.61 samples/sec   Loss 5.2679   LearningRate 0.0014   Epoch: 35   Global Step: 178040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:55:05,160-Speed 10670.16 samples/sec   Loss 5.2834   LearningRate 0.0014   Epoch: 35   Global Step: 178050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:55:06,135-Speed 10505.24 samples/sec   Loss 5.1590   LearningRate 0.0014   Epoch: 35   Global Step: 178060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:55:07,137-Speed 10225.86 samples/sec   Loss 5.1972   LearningRate 0.0014   Epoch: 35   Global Step: 178070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:55:08,141-Speed 10212.83 samples/sec   Loss 5.2680   LearningRate 0.0014   Epoch: 35   Global Step: 178080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:55:09,140-Speed 10259.18 samples/sec   Loss 5.3001   LearningRate 0.0014   Epoch: 35   Global Step: 178090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:55:10,148-Speed 10168.10 samples/sec   Loss 5.0398   LearningRate 0.0014   Epoch: 35   Global Step: 178100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:11,119-Speed 10547.79 samples/sec   Loss 5.3516   LearningRate 0.0014   Epoch: 35   Global Step: 178110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:12,137-Speed 10070.69 samples/sec   Loss 5.2099   LearningRate 0.0014   Epoch: 35   Global Step: 178120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:13,106-Speed 10570.94 samples/sec   Loss 5.2511   LearningRate 0.0014   Epoch: 35   Global Step: 178130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:14,124-Speed 10069.18 samples/sec   Loss 4.9891   LearningRate 0.0014   Epoch: 35   Global Step: 178140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:15,091-Speed 10593.24 samples/sec   Loss 5.1918   LearningRate 0.0014   Epoch: 35   Global Step: 178150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:16,112-Speed 10044.35 samples/sec   Loss 5.2687   LearningRate 0.0014   Epoch: 35   Global Step: 178160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:17,098-Speed 10392.97 samples/sec   Loss 5.1832   LearningRate 0.0014   Epoch: 35   Global Step: 178170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:18,093-Speed 10306.68 samples/sec   Loss 5.2901   LearningRate 0.0014   Epoch: 35   Global Step: 178180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:19,083-Speed 10351.76 samples/sec   Loss 5.1899   LearningRate 0.0014   Epoch: 35   Global Step: 178190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:20,089-Speed 10189.55 samples/sec   Loss 5.2514   LearningRate 0.0014   Epoch: 35   Global Step: 178200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:55:21,036-Speed 10820.99 samples/sec   Loss 5.3003   LearningRate 0.0014   Epoch: 35   Global Step: 178210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:22,022-Speed 10391.06 samples/sec   Loss 5.2440   LearningRate 0.0014   Epoch: 35   Global Step: 178220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:23,050-Speed 9967.18 samples/sec   Loss 5.2244   LearningRate 0.0014   Epoch: 35   Global Step: 178230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:24,020-Speed 10572.14 samples/sec   Loss 5.1653   LearningRate 0.0014   Epoch: 35   Global Step: 178240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:24,980-Speed 10671.27 samples/sec   Loss 5.2134   LearningRate 0.0014   Epoch: 35   Global Step: 178250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:25,919-Speed 10916.05 samples/sec   Loss 5.1248   LearningRate 0.0014   Epoch: 35   Global Step: 178260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:26,910-Speed 10345.98 samples/sec   Loss 5.2541   LearningRate 0.0014   Epoch: 35   Global Step: 178270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:27,887-Speed 10482.68 samples/sec   Loss 5.3018   LearningRate 0.0014   Epoch: 35   Global Step: 178280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:28,850-Speed 10644.03 samples/sec   Loss 5.2108   LearningRate 0.0014   Epoch: 35   Global Step: 178290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:29,805-Speed 10736.52 samples/sec   Loss 5.2236   LearningRate 0.0014   Epoch: 35   Global Step: 178300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:30,776-Speed 10551.75 samples/sec   Loss 5.3432   LearningRate 0.0014   Epoch: 35   Global Step: 178310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:55:31,777-Speed 10235.96 samples/sec   Loss 5.1950   LearningRate 0.0014   Epoch: 35   Global Step: 178320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:55:32,793-Speed 10084.50 samples/sec   Loss 5.4074   LearningRate 0.0014   Epoch: 35   Global Step: 178330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:55:33,767-Speed 10523.61 samples/sec   Loss 5.3648   LearningRate 0.0014   Epoch: 35   Global Step: 178340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:55:34,745-Speed 10476.00 samples/sec   Loss 5.2775   LearningRate 0.0014   Epoch: 35   Global Step: 178350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:35,730-Speed 10408.22 samples/sec   Loss 5.3751   LearningRate 0.0014   Epoch: 35   Global Step: 178360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:36,712-Speed 10434.98 samples/sec   Loss 5.1437   LearningRate 0.0014   Epoch: 35   Global Step: 178370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:37,680-Speed 10588.55 samples/sec   Loss 5.3475   LearningRate 0.0014   Epoch: 35   Global Step: 178380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:38,692-Speed 10125.94 samples/sec   Loss 5.4011   LearningRate 0.0014   Epoch: 35   Global Step: 178390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:39,623-Speed 11012.76 samples/sec   Loss 5.1931   LearningRate 0.0014   Epoch: 35   Global Step: 178400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:40,601-Speed 10479.19 samples/sec   Loss 5.3634   LearningRate 0.0014   Epoch: 35   Global Step: 178410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:41,580-Speed 10468.26 samples/sec   Loss 5.2923   LearningRate 0.0014   Epoch: 35   Global Step: 178420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:42,617-Speed 9879.63 samples/sec   Loss 5.2366   LearningRate 0.0014   Epoch: 35   Global Step: 178430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:43,636-Speed 10060.99 samples/sec   Loss 5.2797   LearningRate 0.0014   Epoch: 35   Global Step: 178440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:44,579-Speed 10859.93 samples/sec   Loss 5.2769   LearningRate 0.0014   Epoch: 35   Global Step: 178450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:55:45,522-Speed 10872.23 samples/sec   Loss 5.3493   LearningRate 0.0014   Epoch: 35   Global Step: 178460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:55:46,517-Speed 10298.17 samples/sec   Loss 5.2547   LearningRate 0.0014   Epoch: 35   Global Step: 178470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:55:47,504-Speed 10376.78 samples/sec   Loss 5.2276   LearningRate 0.0014   Epoch: 35   Global Step: 178480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:55:48,460-Speed 10720.13 samples/sec   Loss 5.3398   LearningRate 0.0014   Epoch: 35   Global Step: 178490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:49,414-Speed 10744.45 samples/sec   Loss 5.3701   LearningRate 0.0014   Epoch: 35   Global Step: 178500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:50,434-Speed 10052.90 samples/sec   Loss 5.3643   LearningRate 0.0014   Epoch: 35   Global Step: 178510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:51,421-Speed 10375.79 samples/sec   Loss 5.2096   LearningRate 0.0014   Epoch: 35   Global Step: 178520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:52,420-Speed 10256.91 samples/sec   Loss 5.2166   LearningRate 0.0014   Epoch: 35   Global Step: 178530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:53,417-Speed 10282.42 samples/sec   Loss 5.1529   LearningRate 0.0014   Epoch: 35   Global Step: 178540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:54,405-Speed 10382.52 samples/sec   Loss 5.3239   LearningRate 0.0014   Epoch: 35   Global Step: 178550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:55,402-Speed 10279.07 samples/sec   Loss 5.2156   LearningRate 0.0014   Epoch: 35   Global Step: 178560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:56,373-Speed 10544.25 samples/sec   Loss 5.2698   LearningRate 0.0014   Epoch: 35   Global Step: 178570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:57,389-Speed 10088.65 samples/sec   Loss 5.2219   LearningRate 0.0014   Epoch: 35   Global Step: 178580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:55:58,388-Speed 10257.21 samples/sec   Loss 5.3492   LearningRate 0.0014   Epoch: 35   Global Step: 178590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:55:59,398-Speed 10148.83 samples/sec   Loss 5.1932   LearningRate 0.0014   Epoch: 35   Global Step: 178600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:00,369-Speed 10559.49 samples/sec   Loss 5.3314   LearningRate 0.0014   Epoch: 35   Global Step: 178610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:01,320-Speed 10768.56 samples/sec   Loss 5.1382   LearningRate 0.0014   Epoch: 35   Global Step: 178620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:02,282-Speed 10655.77 samples/sec   Loss 5.2851   LearningRate 0.0014   Epoch: 35   Global Step: 178630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:03,265-Speed 10423.03 samples/sec   Loss 5.3940   LearningRate 0.0014   Epoch: 35   Global Step: 178640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:04,233-Speed 10591.56 samples/sec   Loss 5.1243   LearningRate 0.0014   Epoch: 35   Global Step: 178650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:05,222-Speed 10364.32 samples/sec   Loss 5.2879   LearningRate 0.0014   Epoch: 35   Global Step: 178660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:06,176-Speed 10738.27 samples/sec   Loss 5.3470   LearningRate 0.0014   Epoch: 35   Global Step: 178670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:07,153-Speed 10486.35 samples/sec   Loss 5.1607   LearningRate 0.0014   Epoch: 35   Global Step: 178680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:08,149-Speed 10294.75 samples/sec   Loss 5.3227   LearningRate 0.0014   Epoch: 35   Global Step: 178690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:09,156-Speed 10170.71 samples/sec   Loss 5.1493   LearningRate 0.0014   Epoch: 35   Global Step: 178700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:10,146-Speed 10354.89 samples/sec   Loss 5.3627   LearningRate 0.0014   Epoch: 35   Global Step: 178710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:11,140-Speed 10311.70 samples/sec   Loss 5.3126   LearningRate 0.0014   Epoch: 35   Global Step: 178720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:12,142-Speed 10229.69 samples/sec   Loss 5.3095   LearningRate 0.0014   Epoch: 35   Global Step: 178730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:13,132-Speed 10346.90 samples/sec   Loss 5.3235   LearningRate 0.0014   Epoch: 35   Global Step: 178740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:14,134-Speed 10226.96 samples/sec   Loss 5.1905   LearningRate 0.0014   Epoch: 35   Global Step: 178750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:15,097-Speed 10644.43 samples/sec   Loss 5.4273   LearningRate 0.0014   Epoch: 35   Global Step: 178760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:16,103-Speed 10187.53 samples/sec   Loss 5.2876   LearningRate 0.0014   Epoch: 35   Global Step: 178770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:17,063-Speed 10687.46 samples/sec   Loss 5.2654   LearningRate 0.0014   Epoch: 35   Global Step: 178780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:18,021-Speed 10697.57 samples/sec   Loss 5.3328   LearningRate 0.0014   Epoch: 35   Global Step: 178790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:19,020-Speed 10256.01 samples/sec   Loss 5.3403   LearningRate 0.0014   Epoch: 35   Global Step: 178800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:20,011-Speed 10352.76 samples/sec   Loss 5.1534   LearningRate 0.0014   Epoch: 35   Global Step: 178810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:20,980-Speed 10569.89 samples/sec   Loss 5.3849   LearningRate 0.0013   Epoch: 35   Global Step: 178820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:21,942-Speed 10652.30 samples/sec   Loss 5.2696   LearningRate 0.0013   Epoch: 35   Global Step: 178830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:22,934-Speed 10326.52 samples/sec   Loss 5.3476   LearningRate 0.0013   Epoch: 35   Global Step: 178840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:23,942-Speed 10164.38 samples/sec   Loss 5.4434   LearningRate 0.0013   Epoch: 35   Global Step: 178850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:24,979-Speed 9888.61 samples/sec   Loss 5.3347   LearningRate 0.0013   Epoch: 35   Global Step: 178860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:25,937-Speed 10703.01 samples/sec   Loss 5.4529   LearningRate 0.0013   Epoch: 35   Global Step: 178870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:26,892-Speed 10725.40 samples/sec   Loss 5.2750   LearningRate 0.0013   Epoch: 35   Global Step: 178880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:27,856-Speed 10630.23 samples/sec   Loss 5.3648   LearningRate 0.0013   Epoch: 35   Global Step: 178890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:28,831-Speed 10518.04 samples/sec   Loss 5.2984   LearningRate 0.0013   Epoch: 35   Global Step: 178900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:29,879-Speed 9780.91 samples/sec   Loss 5.3180   LearningRate 0.0013   Epoch: 35   Global Step: 178910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:30,891-Speed 10121.75 samples/sec   Loss 5.2127   LearningRate 0.0013   Epoch: 35   Global Step: 178920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:31,862-Speed 10559.31 samples/sec   Loss 5.2399   LearningRate 0.0013   Epoch: 35   Global Step: 178930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:32,848-Speed 10397.11 samples/sec   Loss 5.2317   LearningRate 0.0013   Epoch: 35   Global Step: 178940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:33,792-Speed 10850.85 samples/sec   Loss 5.2759   LearningRate 0.0013   Epoch: 35   Global Step: 178950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:34,796-Speed 10206.67 samples/sec   Loss 5.3353   LearningRate 0.0013   Epoch: 35   Global Step: 178960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:35,775-Speed 10475.14 samples/sec   Loss 5.2653   LearningRate 0.0013   Epoch: 35   Global Step: 178970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:36,748-Speed 10527.12 samples/sec   Loss 5.2024   LearningRate 0.0013   Epoch: 35   Global Step: 178980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:37,767-Speed 10055.82 samples/sec   Loss 5.1808   LearningRate 0.0013   Epoch: 35   Global Step: 178990   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:56:38,796-Speed 9962.32 samples/sec   Loss 5.3013   LearningRate 0.0013   Epoch: 35   Global Step: 179000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:39,734-Speed 10924.45 samples/sec   Loss 5.2722   LearningRate 0.0013   Epoch: 35   Global Step: 179010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:40,678-Speed 10854.70 samples/sec   Loss 5.2063   LearningRate 0.0013   Epoch: 35   Global Step: 179020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:41,648-Speed 10564.70 samples/sec   Loss 5.2539   LearningRate 0.0013   Epoch: 35   Global Step: 179030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:42,622-Speed 10521.68 samples/sec   Loss 5.2937   LearningRate 0.0013   Epoch: 35   Global Step: 179040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:43,638-Speed 10091.91 samples/sec   Loss 5.2790   LearningRate 0.0013   Epoch: 35   Global Step: 179050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:44,653-Speed 10095.00 samples/sec   Loss 5.3184   LearningRate 0.0013   Epoch: 35   Global Step: 179060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:45,588-Speed 10968.06 samples/sec   Loss 5.2230   LearningRate 0.0013   Epoch: 35   Global Step: 179070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:46,582-Speed 10303.81 samples/sec   Loss 5.4120   LearningRate 0.0013   Epoch: 35   Global Step: 179080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:56:47,536-Speed 10746.16 samples/sec   Loss 5.2379   LearningRate 0.0013   Epoch: 35   Global Step: 179090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:56:48,498-Speed 10647.84 samples/sec   Loss 5.2437   LearningRate 0.0013   Epoch: 35   Global Step: 179100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:56:49,495-Speed 10271.75 samples/sec   Loss 5.3405   LearningRate 0.0013   Epoch: 35   Global Step: 179110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:56:50,491-Speed 10290.38 samples/sec   Loss 5.2448   LearningRate 0.0013   Epoch: 35   Global Step: 179120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:56:51,481-Speed 10360.70 samples/sec   Loss 5.3053   LearningRate 0.0013   Epoch: 35   Global Step: 179130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:56:52,457-Speed 10499.69 samples/sec   Loss 5.1298   LearningRate 0.0013   Epoch: 35   Global Step: 179140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:56:53,413-Speed 10715.96 samples/sec   Loss 5.3844   LearningRate 0.0013   Epoch: 35   Global Step: 179150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:56:54,413-Speed 10245.67 samples/sec   Loss 5.2712   LearningRate 0.0013   Epoch: 35   Global Step: 179160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:56:55,391-Speed 10490.14 samples/sec   Loss 5.3481   LearningRate 0.0013   Epoch: 35   Global Step: 179170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:56:56,346-Speed 10732.20 samples/sec   Loss 5.3261   LearningRate 0.0013   Epoch: 35   Global Step: 179180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:57,344-Speed 10259.98 samples/sec   Loss 5.0597   LearningRate 0.0013   Epoch: 35   Global Step: 179190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:58,310-Speed 10608.69 samples/sec   Loss 5.3978   LearningRate 0.0013   Epoch: 35   Global Step: 179200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:56:59,311-Speed 10233.43 samples/sec   Loss 5.2747   LearningRate 0.0013   Epoch: 35   Global Step: 179210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:00,259-Speed 10818.08 samples/sec   Loss 5.3586   LearningRate 0.0013   Epoch: 35   Global Step: 179220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:01,240-Speed 10443.91 samples/sec   Loss 5.3164   LearningRate 0.0013   Epoch: 35   Global Step: 179230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:02,279-Speed 9862.76 samples/sec   Loss 5.1800   LearningRate 0.0013   Epoch: 35   Global Step: 179240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:03,264-Speed 10409.26 samples/sec   Loss 5.3082   LearningRate 0.0013   Epoch: 35   Global Step: 179250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:04,279-Speed 10100.83 samples/sec   Loss 5.2130   LearningRate 0.0013   Epoch: 35   Global Step: 179260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:05,282-Speed 10214.99 samples/sec   Loss 5.3967   LearningRate 0.0013   Epoch: 35   Global Step: 179270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:06,301-Speed 10056.88 samples/sec   Loss 5.4128   LearningRate 0.0013   Epoch: 35   Global Step: 179280   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 05:57:07,259-Speed 10693.47 samples/sec   Loss 5.3156   LearningRate 0.0013   Epoch: 35   Global Step: 179290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:08,255-Speed 10289.83 samples/sec   Loss 5.3004   LearningRate 0.0013   Epoch: 35   Global Step: 179300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:09,246-Speed 10377.58 samples/sec   Loss 5.2556   LearningRate 0.0013   Epoch: 35   Global Step: 179310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:10,217-Speed 10547.66 samples/sec   Loss 5.2587   LearningRate 0.0013   Epoch: 35   Global Step: 179320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:11,213-Speed 10295.48 samples/sec   Loss 5.3698   LearningRate 0.0013   Epoch: 35   Global Step: 179330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:12,174-Speed 10663.38 samples/sec   Loss 5.3659   LearningRate 0.0013   Epoch: 35   Global Step: 179340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:13,170-Speed 10283.99 samples/sec   Loss 5.4024   LearningRate 0.0013   Epoch: 35   Global Step: 179350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:14,167-Speed 10280.92 samples/sec   Loss 5.2955   LearningRate 0.0013   Epoch: 35   Global Step: 179360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:15,163-Speed 10293.83 samples/sec   Loss 5.3142   LearningRate 0.0013   Epoch: 35   Global Step: 179370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:16,129-Speed 10609.75 samples/sec   Loss 5.4353   LearningRate 0.0013   Epoch: 35   Global Step: 179380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:17,099-Speed 10564.10 samples/sec   Loss 5.4194   LearningRate 0.0013   Epoch: 35   Global Step: 179390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:18,086-Speed 10378.25 samples/sec   Loss 5.3592   LearningRate 0.0013   Epoch: 35   Global Step: 179400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:19,050-Speed 10631.15 samples/sec   Loss 5.2702   LearningRate 0.0013   Epoch: 35   Global Step: 179410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:20,052-Speed 10230.37 samples/sec   Loss 5.3922   LearningRate 0.0013   Epoch: 35   Global Step: 179420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:21,030-Speed 10479.00 samples/sec   Loss 5.2791   LearningRate 0.0013   Epoch: 35   Global Step: 179430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:22,050-Speed 10046.74 samples/sec   Loss 5.3410   LearningRate 0.0013   Epoch: 35   Global Step: 179440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:23,040-Speed 10351.26 samples/sec   Loss 5.2233   LearningRate 0.0013   Epoch: 35   Global Step: 179450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:24,039-Speed 10256.58 samples/sec   Loss 5.3526   LearningRate 0.0013   Epoch: 35   Global Step: 179460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:25,040-Speed 10243.29 samples/sec   Loss 5.3194   LearningRate 0.0013   Epoch: 35   Global Step: 179470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:26,010-Speed 10557.28 samples/sec   Loss 5.2309   LearningRate 0.0013   Epoch: 35   Global Step: 179480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:27,050-Speed 9863.61 samples/sec   Loss 5.4123   LearningRate 0.0013   Epoch: 35   Global Step: 179490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:28,026-Speed 10503.11 samples/sec   Loss 5.1931   LearningRate 0.0013   Epoch: 35   Global Step: 179500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:28,979-Speed 10743.24 samples/sec   Loss 5.2409   LearningRate 0.0013   Epoch: 35   Global Step: 179510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:29,988-Speed 10165.65 samples/sec   Loss 5.2710   LearningRate 0.0013   Epoch: 35   Global Step: 179520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:30,958-Speed 10565.44 samples/sec   Loss 5.3190   LearningRate 0.0013   Epoch: 35   Global Step: 179530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:31,941-Speed 10425.60 samples/sec   Loss 5.4373   LearningRate 0.0013   Epoch: 35   Global Step: 179540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:32,918-Speed 10489.43 samples/sec   Loss 5.1457   LearningRate 0.0013   Epoch: 35   Global Step: 179550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:33,892-Speed 10521.52 samples/sec   Loss 5.2277   LearningRate 0.0013   Epoch: 35   Global Step: 179560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:34,905-Speed 10114.02 samples/sec   Loss 5.2543   LearningRate 0.0013   Epoch: 35   Global Step: 179570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:35,866-Speed 10657.11 samples/sec   Loss 5.2903   LearningRate 0.0013   Epoch: 35   Global Step: 179580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:36,841-Speed 10512.28 samples/sec   Loss 5.3170   LearningRate 0.0013   Epoch: 35   Global Step: 179590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:37,839-Speed 10273.21 samples/sec   Loss 5.2980   LearningRate 0.0013   Epoch: 35   Global Step: 179600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:38,841-Speed 10218.61 samples/sec   Loss 5.3813   LearningRate 0.0013   Epoch: 35   Global Step: 179610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:39,831-Speed 10356.90 samples/sec   Loss 5.2867   LearningRate 0.0013   Epoch: 35   Global Step: 179620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:40,866-Speed 9902.23 samples/sec   Loss 5.2759   LearningRate 0.0013   Epoch: 35   Global Step: 179630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:41,841-Speed 10511.30 samples/sec   Loss 5.3703   LearningRate 0.0013   Epoch: 35   Global Step: 179640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:42,826-Speed 10403.26 samples/sec   Loss 5.1995   LearningRate 0.0013   Epoch: 35   Global Step: 179650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:43,833-Speed 10177.62 samples/sec   Loss 5.4127   LearningRate 0.0013   Epoch: 35   Global Step: 179660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:44,824-Speed 10346.47 samples/sec   Loss 5.3537   LearningRate 0.0013   Epoch: 35   Global Step: 179670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:45,784-Speed 10670.06 samples/sec   Loss 5.3296   LearningRate 0.0013   Epoch: 35   Global Step: 179680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:46,771-Speed 10384.78 samples/sec   Loss 5.3456   LearningRate 0.0013   Epoch: 35   Global Step: 179690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:47,728-Speed 10713.76 samples/sec   Loss 5.2710   LearningRate 0.0012   Epoch: 35   Global Step: 179700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:48,725-Speed 10278.68 samples/sec   Loss 5.2897   LearningRate 0.0012   Epoch: 35   Global Step: 179710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:57:49,679-Speed 10738.00 samples/sec   Loss 5.3431   LearningRate 0.0012   Epoch: 35   Global Step: 179720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:50,700-Speed 10040.61 samples/sec   Loss 5.3410   LearningRate 0.0012   Epoch: 35   Global Step: 179730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:51,713-Speed 10115.21 samples/sec   Loss 5.4663   LearningRate 0.0012   Epoch: 35   Global Step: 179740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:52,713-Speed 10246.94 samples/sec   Loss 5.3587   LearningRate 0.0012   Epoch: 35   Global Step: 179750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:53,688-Speed 10508.37 samples/sec   Loss 5.3159   LearningRate 0.0012   Epoch: 35   Global Step: 179760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:54,673-Speed 10408.57 samples/sec   Loss 5.3256   LearningRate 0.0012   Epoch: 35   Global Step: 179770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:55,680-Speed 10181.94 samples/sec   Loss 5.3584   LearningRate 0.0012   Epoch: 35   Global Step: 179780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:56,667-Speed 10387.79 samples/sec   Loss 5.2755   LearningRate 0.0012   Epoch: 35   Global Step: 179790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:57,664-Speed 10279.94 samples/sec   Loss 5.3474   LearningRate 0.0012   Epoch: 35   Global Step: 179800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:58,680-Speed 10088.67 samples/sec   Loss 5.2917   LearningRate 0.0012   Epoch: 35   Global Step: 179810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:57:59,688-Speed 10166.73 samples/sec   Loss 5.3186   LearningRate 0.0012   Epoch: 35   Global Step: 179820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:58:00,716-Speed 9966.81 samples/sec   Loss 5.3176   LearningRate 0.0012   Epoch: 35   Global Step: 179830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:58:01,701-Speed 10406.74 samples/sec   Loss 5.3902   LearningRate 0.0012   Epoch: 35   Global Step: 179840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:58:02,697-Speed 10302.09 samples/sec   Loss 5.4556   LearningRate 0.0012   Epoch: 35   Global Step: 179850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:58:03,655-Speed 10696.22 samples/sec   Loss 5.3606   LearningRate 0.0012   Epoch: 35   Global Step: 179860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:58:04,614-Speed 10679.44 samples/sec   Loss 5.2907   LearningRate 0.0012   Epoch: 35   Global Step: 179870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:58:05,631-Speed 10081.98 samples/sec   Loss 5.2504   LearningRate 0.0012   Epoch: 35   Global Step: 179880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:58:06,597-Speed 10609.83 samples/sec   Loss 5.2884   LearningRate 0.0012   Epoch: 35   Global Step: 179890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:58:07,562-Speed 10619.95 samples/sec   Loss 5.2052   LearningRate 0.0012   Epoch: 35   Global Step: 179900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:58:08,547-Speed 10410.52 samples/sec   Loss 5.3567   LearningRate 0.0012   Epoch: 35   Global Step: 179910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:58:09,525-Speed 10469.58 samples/sec   Loss 5.3533   LearningRate 0.0012   Epoch: 35   Global Step: 179920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:58:10,517-Speed 10329.85 samples/sec   Loss 5.1166   LearningRate 0.0012   Epoch: 35   Global Step: 179930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:58:11,502-Speed 10413.20 samples/sec   Loss 5.1801   LearningRate 0.0012   Epoch: 35   Global Step: 179940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:58:12,436-Speed 10964.32 samples/sec   Loss 5.1850   LearningRate 0.0012   Epoch: 35   Global Step: 179950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:58:13,411-Speed 10508.32 samples/sec   Loss 5.3761   LearningRate 0.0012   Epoch: 35   Global Step: 179960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:58:14,359-Speed 10810.97 samples/sec   Loss 5.3540   LearningRate 0.0012   Epoch: 35   Global Step: 179970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:58:15,380-Speed 10041.13 samples/sec   Loss 5.3931   LearningRate 0.0012   Epoch: 35   Global Step: 179980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:58:16,359-Speed 10471.05 samples/sec   Loss 5.2908   LearningRate 0.0012   Epoch: 35   Global Step: 179990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:58:17,337-Speed 10478.07 samples/sec   Loss 5.3531   LearningRate 0.0012   Epoch: 35   Global Step: 180000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:58:39,593-[lfw][180000]XNorm: 8.040342
Training: 2022-04-11 05:58:39,596-[lfw][180000]Accuracy-Flip: 0.99667+-0.00333
Training: 2022-04-11 05:58:39,596-[lfw][180000]Accuracy-Highest: 0.99700
Training: 2022-04-11 05:59:05,203-[cfp_fp][180000]XNorm: 6.941148
Training: 2022-04-11 05:59:05,204-[cfp_fp][180000]Accuracy-Flip: 0.97229+-0.00872
Training: 2022-04-11 05:59:05,205-[cfp_fp][180000]Accuracy-Highest: 0.97371
Training: 2022-04-11 05:59:27,383-[agedb_30][180000]XNorm: 7.855420
Training: 2022-04-11 05:59:27,384-[agedb_30][180000]Accuracy-Flip: 0.97100+-0.00688
Training: 2022-04-11 05:59:27,385-[agedb_30][180000]Accuracy-Highest: 0.97350
Training: 2022-04-11 05:59:28,389-Speed 144.12 samples/sec   Loss 5.3522   LearningRate 0.0012   Epoch: 35   Global Step: 180010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:59:29,378-Speed 10361.53 samples/sec   Loss 5.2889   LearningRate 0.0012   Epoch: 35   Global Step: 180020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:59:30,351-Speed 10530.06 samples/sec   Loss 5.2178   LearningRate 0.0012   Epoch: 35   Global Step: 180030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:59:31,335-Speed 10412.67 samples/sec   Loss 5.3398   LearningRate 0.0012   Epoch: 35   Global Step: 180040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:59:32,305-Speed 10569.39 samples/sec   Loss 5.3246   LearningRate 0.0012   Epoch: 35   Global Step: 180050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:59:33,279-Speed 10526.52 samples/sec   Loss 5.3551   LearningRate 0.0012   Epoch: 35   Global Step: 180060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:59:34,307-Speed 9976.85 samples/sec   Loss 5.4285   LearningRate 0.0012   Epoch: 35   Global Step: 180070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:59:35,304-Speed 10282.27 samples/sec   Loss 5.2992   LearningRate 0.0012   Epoch: 35   Global Step: 180080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:59:36,306-Speed 10231.59 samples/sec   Loss 5.3576   LearningRate 0.0012   Epoch: 35   Global Step: 180090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:59:37,298-Speed 10335.90 samples/sec   Loss 5.3408   LearningRate 0.0012   Epoch: 35   Global Step: 180100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:59:38,298-Speed 10251.52 samples/sec   Loss 5.3430   LearningRate 0.0012   Epoch: 35   Global Step: 180110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:59:39,261-Speed 10639.33 samples/sec   Loss 5.2346   LearningRate 0.0012   Epoch: 35   Global Step: 180120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:59:40,261-Speed 10251.45 samples/sec   Loss 5.2159   LearningRate 0.0012   Epoch: 35   Global Step: 180130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:59:41,179-Speed 11158.55 samples/sec   Loss 5.4104   LearningRate 0.0012   Epoch: 35   Global Step: 180140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:59:42,168-Speed 10364.19 samples/sec   Loss 5.3198   LearningRate 0.0012   Epoch: 35   Global Step: 180150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:59:43,126-Speed 10707.58 samples/sec   Loss 5.1973   LearningRate 0.0012   Epoch: 35   Global Step: 180160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:59:44,123-Speed 10279.27 samples/sec   Loss 5.2781   LearningRate 0.0012   Epoch: 35   Global Step: 180170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:59:45,121-Speed 10261.62 samples/sec   Loss 5.4570   LearningRate 0.0012   Epoch: 35   Global Step: 180180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:59:46,162-Speed 9845.44 samples/sec   Loss 5.2254   LearningRate 0.0012   Epoch: 35   Global Step: 180190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:59:47,124-Speed 10662.36 samples/sec   Loss 5.4923   LearningRate 0.0012   Epoch: 35   Global Step: 180200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:59:48,132-Speed 10169.34 samples/sec   Loss 5.2473   LearningRate 0.0012   Epoch: 35   Global Step: 180210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:59:49,119-Speed 10379.94 samples/sec   Loss 5.3136   LearningRate 0.0012   Epoch: 35   Global Step: 180220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:59:50,128-Speed 10156.48 samples/sec   Loss 5.2957   LearningRate 0.0012   Epoch: 35   Global Step: 180230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:59:51,105-Speed 10496.68 samples/sec   Loss 5.1829   LearningRate 0.0012   Epoch: 35   Global Step: 180240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:59:52,094-Speed 10358.44 samples/sec   Loss 5.3369   LearningRate 0.0012   Epoch: 35   Global Step: 180250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:59:53,068-Speed 10523.95 samples/sec   Loss 5.4872   LearningRate 0.0012   Epoch: 35   Global Step: 180260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:59:54,035-Speed 10599.21 samples/sec   Loss 5.2538   LearningRate 0.0012   Epoch: 35   Global Step: 180270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:59:55,045-Speed 10147.59 samples/sec   Loss 5.1946   LearningRate 0.0012   Epoch: 35   Global Step: 180280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:59:56,039-Speed 10318.17 samples/sec   Loss 5.3812   LearningRate 0.0012   Epoch: 35   Global Step: 180290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:59:56,989-Speed 10782.24 samples/sec   Loss 5.3825   LearningRate 0.0012   Epoch: 35   Global Step: 180300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:59:57,983-Speed 10311.37 samples/sec   Loss 5.2490   LearningRate 0.0012   Epoch: 35   Global Step: 180310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 05:59:58,991-Speed 10163.24 samples/sec   Loss 5.2184   LearningRate 0.0012   Epoch: 35   Global Step: 180320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 05:59:59,986-Speed 10307.12 samples/sec   Loss 5.3307   LearningRate 0.0012   Epoch: 35   Global Step: 180330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:00:00,973-Speed 10376.04 samples/sec   Loss 5.2502   LearningRate 0.0012   Epoch: 35   Global Step: 180340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:00:01,944-Speed 10557.79 samples/sec   Loss 5.3188   LearningRate 0.0012   Epoch: 35   Global Step: 180350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:00:02,963-Speed 10085.93 samples/sec   Loss 5.2218   LearningRate 0.0012   Epoch: 35   Global Step: 180360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:00:03,946-Speed 10420.37 samples/sec   Loss 5.3096   LearningRate 0.0012   Epoch: 35   Global Step: 180370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:00:04,901-Speed 10732.03 samples/sec   Loss 5.4493   LearningRate 0.0012   Epoch: 35   Global Step: 180380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:00:05,876-Speed 10515.93 samples/sec   Loss 5.3579   LearningRate 0.0012   Epoch: 35   Global Step: 180390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:00:06,915-Speed 9860.59 samples/sec   Loss 5.2894   LearningRate 0.0012   Epoch: 35   Global Step: 180400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:00:07,895-Speed 10459.05 samples/sec   Loss 5.3858   LearningRate 0.0012   Epoch: 35   Global Step: 180410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:00:08,913-Speed 10067.08 samples/sec   Loss 5.2497   LearningRate 0.0012   Epoch: 35   Global Step: 180420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:00:09,925-Speed 10122.59 samples/sec   Loss 5.4349   LearningRate 0.0012   Epoch: 35   Global Step: 180430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:00:10,930-Speed 10199.94 samples/sec   Loss 5.3858   LearningRate 0.0012   Epoch: 35   Global Step: 180440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:00:11,926-Speed 10283.55 samples/sec   Loss 5.3839   LearningRate 0.0012   Epoch: 35   Global Step: 180450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:12,914-Speed 10378.82 samples/sec   Loss 5.3270   LearningRate 0.0012   Epoch: 35   Global Step: 180460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:13,877-Speed 10635.91 samples/sec   Loss 5.3371   LearningRate 0.0012   Epoch: 35   Global Step: 180470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:14,852-Speed 10508.00 samples/sec   Loss 5.3861   LearningRate 0.0012   Epoch: 35   Global Step: 180480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:15,794-Speed 10888.74 samples/sec   Loss 5.4127   LearningRate 0.0012   Epoch: 35   Global Step: 180490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:16,787-Speed 10314.14 samples/sec   Loss 5.4157   LearningRate 0.0012   Epoch: 35   Global Step: 180500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:17,816-Speed 9964.84 samples/sec   Loss 5.3978   LearningRate 0.0012   Epoch: 35   Global Step: 180510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:18,805-Speed 10360.03 samples/sec   Loss 5.2633   LearningRate 0.0012   Epoch: 35   Global Step: 180520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:19,789-Speed 10416.38 samples/sec   Loss 5.3367   LearningRate 0.0012   Epoch: 35   Global Step: 180530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:20,799-Speed 10141.20 samples/sec   Loss 5.3081   LearningRate 0.0012   Epoch: 35   Global Step: 180540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:21,771-Speed 10550.42 samples/sec   Loss 5.3072   LearningRate 0.0012   Epoch: 35   Global Step: 180550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:00:22,749-Speed 10480.75 samples/sec   Loss 5.3870   LearningRate 0.0012   Epoch: 35   Global Step: 180560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:23,736-Speed 10381.01 samples/sec   Loss 5.4116   LearningRate 0.0012   Epoch: 35   Global Step: 180570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:24,742-Speed 10181.02 samples/sec   Loss 5.4057   LearningRate 0.0012   Epoch: 35   Global Step: 180580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:25,705-Speed 10650.27 samples/sec   Loss 5.4593   LearningRate 0.0012   Epoch: 35   Global Step: 180590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:26,681-Speed 10500.69 samples/sec   Loss 5.3778   LearningRate 0.0012   Epoch: 35   Global Step: 180600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:27,682-Speed 10229.16 samples/sec   Loss 5.3289   LearningRate 0.0012   Epoch: 35   Global Step: 180610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:28,716-Speed 9915.68 samples/sec   Loss 5.2491   LearningRate 0.0012   Epoch: 35   Global Step: 180620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:29,729-Speed 10111.90 samples/sec   Loss 5.4441   LearningRate 0.0011   Epoch: 35   Global Step: 180630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:30,682-Speed 10752.48 samples/sec   Loss 5.4528   LearningRate 0.0011   Epoch: 35   Global Step: 180640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:31,687-Speed 10203.59 samples/sec   Loss 5.3919   LearningRate 0.0011   Epoch: 35   Global Step: 180650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:32,652-Speed 10622.83 samples/sec   Loss 5.3861   LearningRate 0.0011   Epoch: 35   Global Step: 180660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:00:33,633-Speed 10447.90 samples/sec   Loss 5.1959   LearningRate 0.0011   Epoch: 35   Global Step: 180670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:34,591-Speed 10700.53 samples/sec   Loss 5.3410   LearningRate 0.0011   Epoch: 35   Global Step: 180680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:35,571-Speed 10453.08 samples/sec   Loss 5.3222   LearningRate 0.0011   Epoch: 35   Global Step: 180690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:36,570-Speed 10262.05 samples/sec   Loss 5.2947   LearningRate 0.0011   Epoch: 35   Global Step: 180700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:37,559-Speed 10361.21 samples/sec   Loss 5.3579   LearningRate 0.0011   Epoch: 35   Global Step: 180710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:38,528-Speed 10569.91 samples/sec   Loss 5.2962   LearningRate 0.0011   Epoch: 35   Global Step: 180720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:39,517-Speed 10368.89 samples/sec   Loss 5.2850   LearningRate 0.0011   Epoch: 35   Global Step: 180730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:40,536-Speed 10056.36 samples/sec   Loss 5.3814   LearningRate 0.0011   Epoch: 35   Global Step: 180740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:41,549-Speed 10126.16 samples/sec   Loss 5.1395   LearningRate 0.0011   Epoch: 35   Global Step: 180750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:42,543-Speed 10306.33 samples/sec   Loss 5.4686   LearningRate 0.0011   Epoch: 35   Global Step: 180760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:43,523-Speed 10462.87 samples/sec   Loss 5.3874   LearningRate 0.0011   Epoch: 35   Global Step: 180770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:44,495-Speed 10539.24 samples/sec   Loss 5.3620   LearningRate 0.0011   Epoch: 35   Global Step: 180780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:45,506-Speed 10142.17 samples/sec   Loss 5.3270   LearningRate 0.0011   Epoch: 35   Global Step: 180790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:46,489-Speed 10426.59 samples/sec   Loss 5.3134   LearningRate 0.0011   Epoch: 35   Global Step: 180800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:47,484-Speed 10295.61 samples/sec   Loss 5.4067   LearningRate 0.0011   Epoch: 35   Global Step: 180810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:48,438-Speed 10747.41 samples/sec   Loss 5.2590   LearningRate 0.0011   Epoch: 35   Global Step: 180820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:49,436-Speed 10264.38 samples/sec   Loss 5.3257   LearningRate 0.0011   Epoch: 35   Global Step: 180830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:50,432-Speed 10292.27 samples/sec   Loss 5.4946   LearningRate 0.0011   Epoch: 35   Global Step: 180840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:51,426-Speed 10312.34 samples/sec   Loss 5.4467   LearningRate 0.0011   Epoch: 35   Global Step: 180850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:52,398-Speed 10542.13 samples/sec   Loss 5.4094   LearningRate 0.0011   Epoch: 35   Global Step: 180860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:00:53,357-Speed 10691.63 samples/sec   Loss 5.3263   LearningRate 0.0011   Epoch: 35   Global Step: 180870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:00:54,350-Speed 10312.38 samples/sec   Loss 5.4647   LearningRate 0.0011   Epoch: 35   Global Step: 180880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:00:55,343-Speed 10330.57 samples/sec   Loss 5.2107   LearningRate 0.0011   Epoch: 35   Global Step: 180890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:00:56,322-Speed 10463.47 samples/sec   Loss 5.3256   LearningRate 0.0011   Epoch: 35   Global Step: 180900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:00:57,297-Speed 10514.32 samples/sec   Loss 5.2436   LearningRate 0.0011   Epoch: 35   Global Step: 180910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:00:58,261-Speed 10629.81 samples/sec   Loss 5.3105   LearningRate 0.0011   Epoch: 35   Global Step: 180920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:00:59,258-Speed 10284.54 samples/sec   Loss 5.3479   LearningRate 0.0011   Epoch: 35   Global Step: 180930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:00,226-Speed 10586.46 samples/sec   Loss 5.2285   LearningRate 0.0011   Epoch: 35   Global Step: 180940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:01,182-Speed 10723.45 samples/sec   Loss 5.3008   LearningRate 0.0011   Epoch: 35   Global Step: 180950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:02,168-Speed 10391.55 samples/sec   Loss 5.4406   LearningRate 0.0011   Epoch: 35   Global Step: 180960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:03,134-Speed 10610.11 samples/sec   Loss 5.2940   LearningRate 0.0011   Epoch: 35   Global Step: 180970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:04,125-Speed 10342.72 samples/sec   Loss 5.2630   LearningRate 0.0011   Epoch: 35   Global Step: 180980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:05,114-Speed 10360.46 samples/sec   Loss 5.3152   LearningRate 0.0011   Epoch: 35   Global Step: 180990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:06,096-Speed 10435.83 samples/sec   Loss 5.3931   LearningRate 0.0011   Epoch: 35   Global Step: 181000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:07,059-Speed 10640.51 samples/sec   Loss 5.4643   LearningRate 0.0011   Epoch: 35   Global Step: 181010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:08,083-Speed 10013.12 samples/sec   Loss 5.3076   LearningRate 0.0011   Epoch: 35   Global Step: 181020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:09,071-Speed 10368.98 samples/sec   Loss 5.4361   LearningRate 0.0011   Epoch: 35   Global Step: 181030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:10,025-Speed 10742.91 samples/sec   Loss 5.3726   LearningRate 0.0011   Epoch: 35   Global Step: 181040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:10,988-Speed 10642.26 samples/sec   Loss 5.3109   LearningRate 0.0011   Epoch: 35   Global Step: 181050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:11,990-Speed 10233.51 samples/sec   Loss 5.2859   LearningRate 0.0011   Epoch: 35   Global Step: 181060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:12,981-Speed 10342.93 samples/sec   Loss 5.3379   LearningRate 0.0011   Epoch: 35   Global Step: 181070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:13,977-Speed 10290.12 samples/sec   Loss 5.4411   LearningRate 0.0011   Epoch: 35   Global Step: 181080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:14,929-Speed 10770.73 samples/sec   Loss 5.1581   LearningRate 0.0011   Epoch: 35   Global Step: 181090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:15,889-Speed 10667.88 samples/sec   Loss 5.3114   LearningRate 0.0011   Epoch: 35   Global Step: 181100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:16,874-Speed 10409.29 samples/sec   Loss 5.2306   LearningRate 0.0011   Epoch: 35   Global Step: 181110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:17,832-Speed 10694.02 samples/sec   Loss 5.3658   LearningRate 0.0011   Epoch: 35   Global Step: 181120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:18,843-Speed 10134.08 samples/sec   Loss 5.4429   LearningRate 0.0011   Epoch: 35   Global Step: 181130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:19,819-Speed 10508.03 samples/sec   Loss 5.3645   LearningRate 0.0011   Epoch: 35   Global Step: 181140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:20,802-Speed 10430.08 samples/sec   Loss 5.2834   LearningRate 0.0011   Epoch: 35   Global Step: 181150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:21,775-Speed 10530.80 samples/sec   Loss 5.4635   LearningRate 0.0011   Epoch: 35   Global Step: 181160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:22,772-Speed 10280.24 samples/sec   Loss 5.4040   LearningRate 0.0011   Epoch: 35   Global Step: 181170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:23,759-Speed 10380.33 samples/sec   Loss 5.4648   LearningRate 0.0011   Epoch: 35   Global Step: 181180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:24,764-Speed 10192.83 samples/sec   Loss 5.3731   LearningRate 0.0011   Epoch: 35   Global Step: 181190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:25,766-Speed 10235.01 samples/sec   Loss 5.3565   LearningRate 0.0011   Epoch: 35   Global Step: 181200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:26,765-Speed 10257.36 samples/sec   Loss 5.3777   LearningRate 0.0011   Epoch: 35   Global Step: 181210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:27,746-Speed 10450.11 samples/sec   Loss 5.2314   LearningRate 0.0011   Epoch: 35   Global Step: 181220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:28,755-Speed 10156.91 samples/sec   Loss 5.3205   LearningRate 0.0011   Epoch: 35   Global Step: 181230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:29,723-Speed 10589.25 samples/sec   Loss 5.4295   LearningRate 0.0011   Epoch: 35   Global Step: 181240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:30,713-Speed 10350.31 samples/sec   Loss 5.4291   LearningRate 0.0011   Epoch: 35   Global Step: 181250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:31,683-Speed 10567.32 samples/sec   Loss 5.3594   LearningRate 0.0011   Epoch: 35   Global Step: 181260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:32,662-Speed 10465.42 samples/sec   Loss 5.3142   LearningRate 0.0011   Epoch: 35   Global Step: 181270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:33,670-Speed 10165.11 samples/sec   Loss 5.3360   LearningRate 0.0011   Epoch: 35   Global Step: 181280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:01:34,649-Speed 10475.95 samples/sec   Loss 5.3078   LearningRate 0.0011   Epoch: 35   Global Step: 181290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:35,625-Speed 10500.13 samples/sec   Loss 5.3754   LearningRate 0.0011   Epoch: 35   Global Step: 181300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:36,563-Speed 10921.00 samples/sec   Loss 5.3370   LearningRate 0.0011   Epoch: 35   Global Step: 181310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:37,568-Speed 10201.65 samples/sec   Loss 5.3298   LearningRate 0.0011   Epoch: 35   Global Step: 181320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:38,610-Speed 9830.54 samples/sec   Loss 5.3674   LearningRate 0.0011   Epoch: 35   Global Step: 181330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:39,566-Speed 10721.83 samples/sec   Loss 5.4224   LearningRate 0.0011   Epoch: 35   Global Step: 181340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:40,556-Speed 10355.38 samples/sec   Loss 5.4396   LearningRate 0.0011   Epoch: 35   Global Step: 181350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:41,572-Speed 10082.25 samples/sec   Loss 5.4202   LearningRate 0.0011   Epoch: 35   Global Step: 181360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:42,564-Speed 10335.40 samples/sec   Loss 5.3244   LearningRate 0.0011   Epoch: 35   Global Step: 181370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:43,570-Speed 10186.67 samples/sec   Loss 5.2017   LearningRate 0.0011   Epoch: 35   Global Step: 181380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:44,548-Speed 10484.01 samples/sec   Loss 5.3831   LearningRate 0.0011   Epoch: 35   Global Step: 181390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:45,535-Speed 10380.28 samples/sec   Loss 5.1993   LearningRate 0.0011   Epoch: 35   Global Step: 181400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:46,500-Speed 10623.39 samples/sec   Loss 5.2611   LearningRate 0.0011   Epoch: 35   Global Step: 181410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:47,511-Speed 10126.92 samples/sec   Loss 5.3790   LearningRate 0.0011   Epoch: 35   Global Step: 181420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:48,527-Speed 10093.43 samples/sec   Loss 5.3582   LearningRate 0.0011   Epoch: 35   Global Step: 181430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:49,546-Speed 10056.64 samples/sec   Loss 5.5618   LearningRate 0.0011   Epoch: 35   Global Step: 181440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:50,498-Speed 10772.53 samples/sec   Loss 5.2075   LearningRate 0.0011   Epoch: 35   Global Step: 181450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:51,422-Speed 11095.40 samples/sec   Loss 5.3387   LearningRate 0.0011   Epoch: 35   Global Step: 181460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:52,393-Speed 10550.47 samples/sec   Loss 5.2511   LearningRate 0.0011   Epoch: 35   Global Step: 181470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:53,408-Speed 10095.31 samples/sec   Loss 5.3373   LearningRate 0.0011   Epoch: 35   Global Step: 181480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:54,390-Speed 10440.09 samples/sec   Loss 5.3576   LearningRate 0.0011   Epoch: 35   Global Step: 181490   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 06:01:55,360-Speed 10566.55 samples/sec   Loss 5.4458   LearningRate 0.0011   Epoch: 35   Global Step: 181500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:56,311-Speed 10788.80 samples/sec   Loss 5.4178   LearningRate 0.0011   Epoch: 35   Global Step: 181510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:57,270-Speed 10805.64 samples/sec   Loss 5.3705   LearningRate 0.0011   Epoch: 35   Global Step: 181520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:58,286-Speed 10093.30 samples/sec   Loss 5.3386   LearningRate 0.0011   Epoch: 35   Global Step: 181530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:01:59,275-Speed 10361.55 samples/sec   Loss 5.3290   LearningRate 0.0011   Epoch: 35   Global Step: 181540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:02:00,302-Speed 9974.63 samples/sec   Loss 5.2953   LearningRate 0.0011   Epoch: 35   Global Step: 181550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:02:01,301-Speed 10267.24 samples/sec   Loss 5.2663   LearningRate 0.0011   Epoch: 35   Global Step: 181560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:02,346-Speed 9803.47 samples/sec   Loss 5.3391   LearningRate 0.0011   Epoch: 35   Global Step: 181570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:03,350-Speed 10208.54 samples/sec   Loss 5.3198   LearningRate 0.0011   Epoch: 35   Global Step: 181580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:04,335-Speed 10404.68 samples/sec   Loss 5.4721   LearningRate 0.0010   Epoch: 35   Global Step: 181590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:05,324-Speed 10365.32 samples/sec   Loss 5.2889   LearningRate 0.0010   Epoch: 35   Global Step: 181600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:06,343-Speed 10065.97 samples/sec   Loss 5.3747   LearningRate 0.0010   Epoch: 35   Global Step: 181610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:07,346-Speed 10214.75 samples/sec   Loss 5.2939   LearningRate 0.0010   Epoch: 35   Global Step: 181620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:08,346-Speed 10240.87 samples/sec   Loss 5.3289   LearningRate 0.0010   Epoch: 35   Global Step: 181630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:09,312-Speed 10613.81 samples/sec   Loss 5.4371   LearningRate 0.0010   Epoch: 35   Global Step: 181640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:10,325-Speed 10122.94 samples/sec   Loss 5.3142   LearningRate 0.0010   Epoch: 35   Global Step: 181650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:11,301-Speed 10498.43 samples/sec   Loss 5.4310   LearningRate 0.0010   Epoch: 35   Global Step: 181660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:02:12,270-Speed 10581.57 samples/sec   Loss 5.4072   LearningRate 0.0010   Epoch: 35   Global Step: 181670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:02:13,264-Speed 10314.66 samples/sec   Loss 5.3246   LearningRate 0.0010   Epoch: 35   Global Step: 181680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:02:14,274-Speed 10147.47 samples/sec   Loss 5.3811   LearningRate 0.0010   Epoch: 35   Global Step: 181690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:02:15,243-Speed 10574.96 samples/sec   Loss 5.4601   LearningRate 0.0010   Epoch: 35   Global Step: 181700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:02:16,246-Speed 10219.96 samples/sec   Loss 5.4090   LearningRate 0.0010   Epoch: 35   Global Step: 181710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:02:17,202-Speed 10724.13 samples/sec   Loss 5.3038   LearningRate 0.0010   Epoch: 35   Global Step: 181720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:02:18,207-Speed 10195.77 samples/sec   Loss 5.2857   LearningRate 0.0010   Epoch: 35   Global Step: 181730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:19,183-Speed 10498.44 samples/sec   Loss 5.3609   LearningRate 0.0010   Epoch: 35   Global Step: 181740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:20,166-Speed 10435.23 samples/sec   Loss 5.3473   LearningRate 0.0010   Epoch: 35   Global Step: 181750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:21,193-Speed 9980.53 samples/sec   Loss 5.3293   LearningRate 0.0010   Epoch: 35   Global Step: 181760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:22,230-Speed 9915.74 samples/sec   Loss 5.3725   LearningRate 0.0010   Epoch: 35   Global Step: 181770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:23,181-Speed 10774.91 samples/sec   Loss 5.3516   LearningRate 0.0010   Epoch: 35   Global Step: 181780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:24,132-Speed 10776.03 samples/sec   Loss 5.3208   LearningRate 0.0010   Epoch: 35   Global Step: 181790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:25,118-Speed 10399.17 samples/sec   Loss 5.4153   LearningRate 0.0010   Epoch: 35   Global Step: 181800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:26,126-Speed 10177.26 samples/sec   Loss 5.3491   LearningRate 0.0010   Epoch: 35   Global Step: 181810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:27,153-Speed 9977.19 samples/sec   Loss 5.4244   LearningRate 0.0010   Epoch: 35   Global Step: 181820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:28,169-Speed 10088.68 samples/sec   Loss 5.2895   LearningRate 0.0010   Epoch: 35   Global Step: 181830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:02:29,189-Speed 10060.03 samples/sec   Loss 5.3417   LearningRate 0.0010   Epoch: 35   Global Step: 181840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:30,148-Speed 10683.58 samples/sec   Loss 5.3024   LearningRate 0.0010   Epoch: 35   Global Step: 181850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:31,142-Speed 10316.16 samples/sec   Loss 5.4278   LearningRate 0.0010   Epoch: 35   Global Step: 181860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:32,085-Speed 10861.77 samples/sec   Loss 5.3453   LearningRate 0.0010   Epoch: 35   Global Step: 181870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:33,102-Speed 10084.69 samples/sec   Loss 5.4478   LearningRate 0.0010   Epoch: 35   Global Step: 181880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:34,069-Speed 10598.95 samples/sec   Loss 5.3559   LearningRate 0.0010   Epoch: 35   Global Step: 181890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:35,040-Speed 10552.66 samples/sec   Loss 5.2526   LearningRate 0.0010   Epoch: 35   Global Step: 181900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:36,034-Speed 10316.83 samples/sec   Loss 5.3282   LearningRate 0.0010   Epoch: 35   Global Step: 181910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:37,027-Speed 10490.49 samples/sec   Loss 5.4628   LearningRate 0.0010   Epoch: 35   Global Step: 181920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:37,994-Speed 10604.17 samples/sec   Loss 5.3499   LearningRate 0.0010   Epoch: 35   Global Step: 181930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:02:38,953-Speed 10686.25 samples/sec   Loss 5.3796   LearningRate 0.0010   Epoch: 35   Global Step: 181940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:02:39,890-Speed 10931.11 samples/sec   Loss 5.3195   LearningRate 0.0010   Epoch: 35   Global Step: 181950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:02:40,897-Speed 10205.11 samples/sec   Loss 5.3335   LearningRate 0.0010   Epoch: 35   Global Step: 181960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:02:41,889-Speed 10336.00 samples/sec   Loss 5.4113   LearningRate 0.0010   Epoch: 35   Global Step: 181970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:02:42,847-Speed 10700.23 samples/sec   Loss 5.2982   LearningRate 0.0010   Epoch: 35   Global Step: 181980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:02:43,821-Speed 10531.93 samples/sec   Loss 5.2300   LearningRate 0.0010   Epoch: 35   Global Step: 181990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:02:44,814-Speed 10311.13 samples/sec   Loss 5.1695   LearningRate 0.0010   Epoch: 35   Global Step: 182000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:03:06,892-[lfw][182000]XNorm: 8.027318
Training: 2022-04-11 06:03:06,893-[lfw][182000]Accuracy-Flip: 0.99617+-0.00317
Training: 2022-04-11 06:03:06,893-[lfw][182000]Accuracy-Highest: 0.99700
Training: 2022-04-11 06:03:32,424-[cfp_fp][182000]XNorm: 6.932548
Training: 2022-04-11 06:03:32,425-[cfp_fp][182000]Accuracy-Flip: 0.97200+-0.00881
Training: 2022-04-11 06:03:32,426-[cfp_fp][182000]Accuracy-Highest: 0.97371
Training: 2022-04-11 06:03:54,550-[agedb_30][182000]XNorm: 7.849429
Training: 2022-04-11 06:03:54,551-[agedb_30][182000]Accuracy-Flip: 0.96967+-0.00748
Training: 2022-04-11 06:03:54,551-[agedb_30][182000]Accuracy-Highest: 0.97350
Training: 2022-04-11 06:03:55,520-Speed 144.84 samples/sec   Loss 5.4716   LearningRate 0.0010   Epoch: 35   Global Step: 182010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:03:56,537-Speed 10083.33 samples/sec   Loss 5.2808   LearningRate 0.0010   Epoch: 35   Global Step: 182020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:03:57,558-Speed 10036.67 samples/sec   Loss 5.5428   LearningRate 0.0010   Epoch: 35   Global Step: 182030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:03:58,596-Speed 9867.48 samples/sec   Loss 5.2196   LearningRate 0.0010   Epoch: 35   Global Step: 182040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:03:59,580-Speed 10415.55 samples/sec   Loss 5.3558   LearningRate 0.0010   Epoch: 35   Global Step: 182050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:04:00,546-Speed 10611.71 samples/sec   Loss 5.2759   LearningRate 0.0010   Epoch: 35   Global Step: 182060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:04:01,595-Speed 9772.67 samples/sec   Loss 5.3388   LearningRate 0.0010   Epoch: 35   Global Step: 182070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:04:02,632-Speed 9877.35 samples/sec   Loss 5.4365   LearningRate 0.0010   Epoch: 35   Global Step: 182080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:04:14,143-Speed 889.75 samples/sec   Loss 5.2290   LearningRate 0.0010   Epoch: 36   Global Step: 182090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:04:15,243-Speed 9323.19 samples/sec   Loss 5.2071   LearningRate 0.0010   Epoch: 36   Global Step: 182100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:16,314-Speed 9567.29 samples/sec   Loss 5.1624   LearningRate 0.0010   Epoch: 36   Global Step: 182110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:17,317-Speed 10211.34 samples/sec   Loss 5.0214   LearningRate 0.0010   Epoch: 36   Global Step: 182120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:18,415-Speed 9332.45 samples/sec   Loss 5.1929   LearningRate 0.0010   Epoch: 36   Global Step: 182130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:19,477-Speed 9660.82 samples/sec   Loss 5.2517   LearningRate 0.0010   Epoch: 36   Global Step: 182140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:20,511-Speed 9919.39 samples/sec   Loss 5.1852   LearningRate 0.0010   Epoch: 36   Global Step: 182150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:21,524-Speed 10122.96 samples/sec   Loss 5.1983   LearningRate 0.0010   Epoch: 36   Global Step: 182160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:22,543-Speed 10065.83 samples/sec   Loss 5.2549   LearningRate 0.0010   Epoch: 36   Global Step: 182170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:23,530-Speed 10386.23 samples/sec   Loss 5.1672   LearningRate 0.0010   Epoch: 36   Global Step: 182180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:24,585-Speed 9712.82 samples/sec   Loss 5.2124   LearningRate 0.0010   Epoch: 36   Global Step: 182190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:25,615-Speed 9953.30 samples/sec   Loss 5.1991   LearningRate 0.0010   Epoch: 36   Global Step: 182200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:26,631-Speed 10082.77 samples/sec   Loss 5.1138   LearningRate 0.0010   Epoch: 36   Global Step: 182210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:27,593-Speed 10649.94 samples/sec   Loss 5.3120   LearningRate 0.0010   Epoch: 36   Global Step: 182220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:28,754-Speed 8831.39 samples/sec   Loss 5.1548   LearningRate 0.0010   Epoch: 36   Global Step: 182230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:29,797-Speed 9831.64 samples/sec   Loss 5.1989   LearningRate 0.0010   Epoch: 36   Global Step: 182240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:30,830-Speed 9922.94 samples/sec   Loss 5.2663   LearningRate 0.0010   Epoch: 36   Global Step: 182250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:31,822-Speed 10330.82 samples/sec   Loss 5.2027   LearningRate 0.0010   Epoch: 36   Global Step: 182260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:32,830-Speed 10175.28 samples/sec   Loss 5.2413   LearningRate 0.0010   Epoch: 36   Global Step: 182270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:33,821-Speed 10341.65 samples/sec   Loss 5.1616   LearningRate 0.0010   Epoch: 36   Global Step: 182280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:34,932-Speed 9227.90 samples/sec   Loss 5.3736   LearningRate 0.0010   Epoch: 36   Global Step: 182290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:35,891-Speed 10678.83 samples/sec   Loss 5.0895   LearningRate 0.0010   Epoch: 36   Global Step: 182300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:36,890-Speed 10268.31 samples/sec   Loss 5.2795   LearningRate 0.0010   Epoch: 36   Global Step: 182310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:37,879-Speed 10356.08 samples/sec   Loss 5.0381   LearningRate 0.0010   Epoch: 36   Global Step: 182320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:38,906-Speed 9988.24 samples/sec   Loss 5.1000   LearningRate 0.0010   Epoch: 36   Global Step: 182330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:39,892-Speed 10391.52 samples/sec   Loss 5.2398   LearningRate 0.0010   Epoch: 36   Global Step: 182340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:40,881-Speed 10369.49 samples/sec   Loss 5.2865   LearningRate 0.0010   Epoch: 36   Global Step: 182350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:41,910-Speed 9955.84 samples/sec   Loss 5.2721   LearningRate 0.0010   Epoch: 36   Global Step: 182360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:42,881-Speed 10553.18 samples/sec   Loss 5.1472   LearningRate 0.0010   Epoch: 36   Global Step: 182370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:43,885-Speed 10210.19 samples/sec   Loss 5.2439   LearningRate 0.0010   Epoch: 36   Global Step: 182380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:44,890-Speed 10194.36 samples/sec   Loss 5.2475   LearningRate 0.0010   Epoch: 36   Global Step: 182390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:45,854-Speed 10630.70 samples/sec   Loss 5.2541   LearningRate 0.0010   Epoch: 36   Global Step: 182400   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 06:04:46,821-Speed 10608.73 samples/sec   Loss 5.2052   LearningRate 0.0010   Epoch: 36   Global Step: 182410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:47,821-Speed 10253.27 samples/sec   Loss 5.1513   LearningRate 0.0010   Epoch: 36   Global Step: 182420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:48,822-Speed 10230.22 samples/sec   Loss 5.2375   LearningRate 0.0010   Epoch: 36   Global Step: 182430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:49,872-Speed 9765.91 samples/sec   Loss 5.2700   LearningRate 0.0010   Epoch: 36   Global Step: 182440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:50,850-Speed 10493.72 samples/sec   Loss 5.2332   LearningRate 0.0010   Epoch: 36   Global Step: 182450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:51,822-Speed 10540.86 samples/sec   Loss 5.1031   LearningRate 0.0010   Epoch: 36   Global Step: 182460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:52,887-Speed 9622.84 samples/sec   Loss 5.2655   LearningRate 0.0010   Epoch: 36   Global Step: 182470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:53,858-Speed 10559.45 samples/sec   Loss 5.1656   LearningRate 0.0010   Epoch: 36   Global Step: 182480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:54,853-Speed 10304.22 samples/sec   Loss 5.1643   LearningRate 0.0010   Epoch: 36   Global Step: 182490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:55,822-Speed 10576.78 samples/sec   Loss 5.2262   LearningRate 0.0010   Epoch: 36   Global Step: 182500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:56,816-Speed 10311.69 samples/sec   Loss 5.1137   LearningRate 0.0010   Epoch: 36   Global Step: 182510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:04:57,830-Speed 10108.17 samples/sec   Loss 5.1570   LearningRate 0.0010   Epoch: 36   Global Step: 182520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:04:58,854-Speed 10012.49 samples/sec   Loss 5.1881   LearningRate 0.0010   Epoch: 36   Global Step: 182530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:04:59,830-Speed 10495.83 samples/sec   Loss 5.2197   LearningRate 0.0010   Epoch: 36   Global Step: 182540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:00,802-Speed 10546.73 samples/sec   Loss 5.2534   LearningRate 0.0010   Epoch: 36   Global Step: 182550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:01,791-Speed 10363.95 samples/sec   Loss 5.1948   LearningRate 0.0010   Epoch: 36   Global Step: 182560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:02,761-Speed 10564.34 samples/sec   Loss 5.1211   LearningRate 0.0010   Epoch: 36   Global Step: 182570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:03,762-Speed 10255.84 samples/sec   Loss 5.1222   LearningRate 0.0010   Epoch: 36   Global Step: 182580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:04,792-Speed 9949.85 samples/sec   Loss 5.1752   LearningRate 0.0010   Epoch: 36   Global Step: 182590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:05,794-Speed 10219.59 samples/sec   Loss 5.1152   LearningRate 0.0010   Epoch: 36   Global Step: 182600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:06,758-Speed 10637.20 samples/sec   Loss 5.0707   LearningRate 0.0009   Epoch: 36   Global Step: 182610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:07,747-Speed 10357.61 samples/sec   Loss 5.2029   LearningRate 0.0009   Epoch: 36   Global Step: 182620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:08,715-Speed 10590.34 samples/sec   Loss 5.2586   LearningRate 0.0009   Epoch: 36   Global Step: 182630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:09,722-Speed 10173.66 samples/sec   Loss 5.1627   LearningRate 0.0009   Epoch: 36   Global Step: 182640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:10,784-Speed 9651.30 samples/sec   Loss 5.1806   LearningRate 0.0009   Epoch: 36   Global Step: 182650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:11,784-Speed 10259.05 samples/sec   Loss 5.2427   LearningRate 0.0009   Epoch: 36   Global Step: 182660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:12,825-Speed 9838.81 samples/sec   Loss 5.0196   LearningRate 0.0009   Epoch: 36   Global Step: 182670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:13,801-Speed 10505.38 samples/sec   Loss 5.1922   LearningRate 0.0009   Epoch: 36   Global Step: 182680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:14,815-Speed 10102.87 samples/sec   Loss 5.2573   LearningRate 0.0009   Epoch: 36   Global Step: 182690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:15,861-Speed 9801.78 samples/sec   Loss 5.2478   LearningRate 0.0009   Epoch: 36   Global Step: 182700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:16,852-Speed 10340.86 samples/sec   Loss 5.1737   LearningRate 0.0009   Epoch: 36   Global Step: 182710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:17,870-Speed 10071.78 samples/sec   Loss 5.2441   LearningRate 0.0009   Epoch: 36   Global Step: 182720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:18,824-Speed 10739.39 samples/sec   Loss 5.1878   LearningRate 0.0009   Epoch: 36   Global Step: 182730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:19,942-Speed 9169.75 samples/sec   Loss 5.1249   LearningRate 0.0009   Epoch: 36   Global Step: 182740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:20,967-Speed 10003.41 samples/sec   Loss 5.1424   LearningRate 0.0009   Epoch: 36   Global Step: 182750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:22,053-Speed 9434.86 samples/sec   Loss 5.1393   LearningRate 0.0009   Epoch: 36   Global Step: 182760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:23,194-Speed 8982.61 samples/sec   Loss 5.1741   LearningRate 0.0009   Epoch: 36   Global Step: 182770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:24,196-Speed 10225.69 samples/sec   Loss 5.2613   LearningRate 0.0009   Epoch: 36   Global Step: 182780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:25,217-Speed 10044.48 samples/sec   Loss 5.1774   LearningRate 0.0009   Epoch: 36   Global Step: 182790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:26,232-Speed 10087.43 samples/sec   Loss 5.1517   LearningRate 0.0009   Epoch: 36   Global Step: 182800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:27,254-Speed 10035.45 samples/sec   Loss 5.3196   LearningRate 0.0009   Epoch: 36   Global Step: 182810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:28,220-Speed 10613.21 samples/sec   Loss 5.0702   LearningRate 0.0009   Epoch: 36   Global Step: 182820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:29,197-Speed 10483.49 samples/sec   Loss 5.2353   LearningRate 0.0009   Epoch: 36   Global Step: 182830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:30,216-Speed 10056.50 samples/sec   Loss 5.2158   LearningRate 0.0009   Epoch: 36   Global Step: 182840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:31,245-Speed 9955.81 samples/sec   Loss 5.1917   LearningRate 0.0009   Epoch: 36   Global Step: 182850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:32,279-Speed 9923.52 samples/sec   Loss 5.0642   LearningRate 0.0009   Epoch: 36   Global Step: 182860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:33,261-Speed 10437.58 samples/sec   Loss 5.2085   LearningRate 0.0009   Epoch: 36   Global Step: 182870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:34,269-Speed 10169.29 samples/sec   Loss 5.1389   LearningRate 0.0009   Epoch: 36   Global Step: 182880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:35,321-Speed 9749.49 samples/sec   Loss 5.1504   LearningRate 0.0009   Epoch: 36   Global Step: 182890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:36,316-Speed 10302.36 samples/sec   Loss 5.1627   LearningRate 0.0009   Epoch: 36   Global Step: 182900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:37,317-Speed 10238.53 samples/sec   Loss 5.1926   LearningRate 0.0009   Epoch: 36   Global Step: 182910   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 06:05:38,275-Speed 10697.22 samples/sec   Loss 5.1835   LearningRate 0.0009   Epoch: 36   Global Step: 182920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:39,299-Speed 10006.57 samples/sec   Loss 5.1877   LearningRate 0.0009   Epoch: 36   Global Step: 182930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:40,237-Speed 10937.13 samples/sec   Loss 5.2402   LearningRate 0.0009   Epoch: 36   Global Step: 182940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:41,243-Speed 10189.03 samples/sec   Loss 5.1594   LearningRate 0.0009   Epoch: 36   Global Step: 182950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:42,242-Speed 10260.68 samples/sec   Loss 5.1461   LearningRate 0.0009   Epoch: 36   Global Step: 182960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:43,270-Speed 9968.21 samples/sec   Loss 5.1940   LearningRate 0.0009   Epoch: 36   Global Step: 182970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:44,285-Speed 10102.45 samples/sec   Loss 5.2815   LearningRate 0.0009   Epoch: 36   Global Step: 182980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:45,301-Speed 10086.21 samples/sec   Loss 5.1155   LearningRate 0.0009   Epoch: 36   Global Step: 182990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:46,272-Speed 10554.76 samples/sec   Loss 5.1585   LearningRate 0.0009   Epoch: 36   Global Step: 183000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:47,217-Speed 10845.58 samples/sec   Loss 5.2023   LearningRate 0.0009   Epoch: 36   Global Step: 183010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:05:48,236-Speed 10052.78 samples/sec   Loss 5.0845   LearningRate 0.0009   Epoch: 36   Global Step: 183020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:49,212-Speed 10512.63 samples/sec   Loss 5.1962   LearningRate 0.0009   Epoch: 36   Global Step: 183030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:50,240-Speed 9966.69 samples/sec   Loss 5.0612   LearningRate 0.0009   Epoch: 36   Global Step: 183040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:51,233-Speed 10322.22 samples/sec   Loss 5.1834   LearningRate 0.0009   Epoch: 36   Global Step: 183050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:52,304-Speed 9572.08 samples/sec   Loss 5.1382   LearningRate 0.0009   Epoch: 36   Global Step: 183060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:53,253-Speed 10793.60 samples/sec   Loss 5.3679   LearningRate 0.0009   Epoch: 36   Global Step: 183070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:54,217-Speed 10638.14 samples/sec   Loss 5.2748   LearningRate 0.0009   Epoch: 36   Global Step: 183080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:55,209-Speed 10328.05 samples/sec   Loss 5.1731   LearningRate 0.0009   Epoch: 36   Global Step: 183090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:56,172-Speed 10642.84 samples/sec   Loss 5.2264   LearningRate 0.0009   Epoch: 36   Global Step: 183100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:57,173-Speed 10237.24 samples/sec   Loss 5.2212   LearningRate 0.0009   Epoch: 36   Global Step: 183110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:58,171-Speed 10260.62 samples/sec   Loss 5.0604   LearningRate 0.0009   Epoch: 36   Global Step: 183120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:05:59,175-Speed 10211.50 samples/sec   Loss 5.1465   LearningRate 0.0009   Epoch: 36   Global Step: 183130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:00,196-Speed 10045.13 samples/sec   Loss 5.0985   LearningRate 0.0009   Epoch: 36   Global Step: 183140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:01,216-Speed 10042.00 samples/sec   Loss 5.0835   LearningRate 0.0009   Epoch: 36   Global Step: 183150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:02,229-Speed 10117.86 samples/sec   Loss 5.2133   LearningRate 0.0009   Epoch: 36   Global Step: 183160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:03,232-Speed 10217.26 samples/sec   Loss 5.0836   LearningRate 0.0009   Epoch: 36   Global Step: 183170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:04,208-Speed 10510.25 samples/sec   Loss 5.1304   LearningRate 0.0009   Epoch: 36   Global Step: 183180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:05,145-Speed 10941.10 samples/sec   Loss 5.1762   LearningRate 0.0009   Epoch: 36   Global Step: 183190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:06,105-Speed 10666.52 samples/sec   Loss 5.2484   LearningRate 0.0009   Epoch: 36   Global Step: 183200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:07,091-Speed 10403.43 samples/sec   Loss 5.0294   LearningRate 0.0009   Epoch: 36   Global Step: 183210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:08,088-Speed 10278.40 samples/sec   Loss 5.1620   LearningRate 0.0009   Epoch: 36   Global Step: 183220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:09,054-Speed 10617.33 samples/sec   Loss 5.1885   LearningRate 0.0009   Epoch: 36   Global Step: 183230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:10,064-Speed 10141.87 samples/sec   Loss 5.2522   LearningRate 0.0009   Epoch: 36   Global Step: 183240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:11,037-Speed 10533.46 samples/sec   Loss 5.1829   LearningRate 0.0009   Epoch: 36   Global Step: 183250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:12,123-Speed 9437.03 samples/sec   Loss 5.1535   LearningRate 0.0009   Epoch: 36   Global Step: 183260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:13,098-Speed 10516.39 samples/sec   Loss 5.0362   LearningRate 0.0009   Epoch: 36   Global Step: 183270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:14,075-Speed 10494.88 samples/sec   Loss 5.1262   LearningRate 0.0009   Epoch: 36   Global Step: 183280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:15,109-Speed 9909.64 samples/sec   Loss 5.1682   LearningRate 0.0009   Epoch: 36   Global Step: 183290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:16,105-Speed 10288.81 samples/sec   Loss 5.1966   LearningRate 0.0009   Epoch: 36   Global Step: 183300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:17,071-Speed 10614.79 samples/sec   Loss 5.1231   LearningRate 0.0009   Epoch: 36   Global Step: 183310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:18,076-Speed 10195.42 samples/sec   Loss 5.3167   LearningRate 0.0009   Epoch: 36   Global Step: 183320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:19,076-Speed 10249.94 samples/sec   Loss 5.0709   LearningRate 0.0009   Epoch: 36   Global Step: 183330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:20,059-Speed 10428.29 samples/sec   Loss 5.2532   LearningRate 0.0009   Epoch: 36   Global Step: 183340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:21,054-Speed 10292.69 samples/sec   Loss 5.2776   LearningRate 0.0009   Epoch: 36   Global Step: 183350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:22,052-Speed 10277.22 samples/sec   Loss 5.1305   LearningRate 0.0009   Epoch: 36   Global Step: 183360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:23,171-Speed 9161.85 samples/sec   Loss 5.1990   LearningRate 0.0009   Epoch: 36   Global Step: 183370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:24,177-Speed 10187.18 samples/sec   Loss 5.1744   LearningRate 0.0009   Epoch: 36   Global Step: 183380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:25,139-Speed 10655.07 samples/sec   Loss 5.0266   LearningRate 0.0009   Epoch: 36   Global Step: 183390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:26,138-Speed 10257.39 samples/sec   Loss 5.0692   LearningRate 0.0009   Epoch: 36   Global Step: 183400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:27,143-Speed 10196.08 samples/sec   Loss 5.3206   LearningRate 0.0009   Epoch: 36   Global Step: 183410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:28,139-Speed 10288.33 samples/sec   Loss 5.2689   LearningRate 0.0009   Epoch: 36   Global Step: 183420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:29,175-Speed 9897.43 samples/sec   Loss 5.0921   LearningRate 0.0009   Epoch: 36   Global Step: 183430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:30,160-Speed 10401.23 samples/sec   Loss 5.3016   LearningRate 0.0009   Epoch: 36   Global Step: 183440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:31,136-Speed 10507.10 samples/sec   Loss 5.1695   LearningRate 0.0009   Epoch: 36   Global Step: 183450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:32,146-Speed 10146.94 samples/sec   Loss 5.2425   LearningRate 0.0009   Epoch: 36   Global Step: 183460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:33,144-Speed 10271.90 samples/sec   Loss 5.1864   LearningRate 0.0009   Epoch: 36   Global Step: 183470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:34,123-Speed 10467.60 samples/sec   Loss 5.2794   LearningRate 0.0009   Epoch: 36   Global Step: 183480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:35,103-Speed 10460.02 samples/sec   Loss 5.2162   LearningRate 0.0009   Epoch: 36   Global Step: 183490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:36,094-Speed 10343.22 samples/sec   Loss 5.2243   LearningRate 0.0009   Epoch: 36   Global Step: 183500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:37,142-Speed 9777.60 samples/sec   Loss 5.0755   LearningRate 0.0009   Epoch: 36   Global Step: 183510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:38,156-Speed 10108.25 samples/sec   Loss 5.0616   LearningRate 0.0009   Epoch: 36   Global Step: 183520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:39,131-Speed 10515.76 samples/sec   Loss 5.1832   LearningRate 0.0009   Epoch: 36   Global Step: 183530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:40,199-Speed 9593.15 samples/sec   Loss 5.2487   LearningRate 0.0009   Epoch: 36   Global Step: 183540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:41,180-Speed 10449.87 samples/sec   Loss 5.3519   LearningRate 0.0009   Epoch: 36   Global Step: 183550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:42,135-Speed 10732.60 samples/sec   Loss 5.1853   LearningRate 0.0009   Epoch: 36   Global Step: 183560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:43,144-Speed 10160.65 samples/sec   Loss 5.1087   LearningRate 0.0009   Epoch: 36   Global Step: 183570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:44,194-Speed 9765.08 samples/sec   Loss 5.1844   LearningRate 0.0009   Epoch: 36   Global Step: 183580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:45,208-Speed 10099.89 samples/sec   Loss 5.3707   LearningRate 0.0009   Epoch: 36   Global Step: 183590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:46,169-Speed 10672.61 samples/sec   Loss 5.3269   LearningRate 0.0009   Epoch: 36   Global Step: 183600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:47,178-Speed 10152.39 samples/sec   Loss 5.0584   LearningRate 0.0009   Epoch: 36   Global Step: 183610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:48,194-Speed 10098.49 samples/sec   Loss 5.1949   LearningRate 0.0009   Epoch: 36   Global Step: 183620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:49,198-Speed 10200.44 samples/sec   Loss 5.2250   LearningRate 0.0009   Epoch: 36   Global Step: 183630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:50,197-Speed 10260.70 samples/sec   Loss 5.1113   LearningRate 0.0009   Epoch: 36   Global Step: 183640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:06:51,181-Speed 10424.87 samples/sec   Loss 5.1868   LearningRate 0.0009   Epoch: 36   Global Step: 183650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:52,135-Speed 10737.57 samples/sec   Loss 5.2914   LearningRate 0.0009   Epoch: 36   Global Step: 183660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:53,116-Speed 10453.87 samples/sec   Loss 5.2917   LearningRate 0.0008   Epoch: 36   Global Step: 183670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:54,105-Speed 10369.94 samples/sec   Loss 5.3920   LearningRate 0.0008   Epoch: 36   Global Step: 183680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:55,093-Speed 10375.13 samples/sec   Loss 5.2397   LearningRate 0.0008   Epoch: 36   Global Step: 183690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:56,075-Speed 10432.61 samples/sec   Loss 5.2695   LearningRate 0.0008   Epoch: 36   Global Step: 183700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:57,055-Speed 10460.75 samples/sec   Loss 5.1882   LearningRate 0.0008   Epoch: 36   Global Step: 183710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:58,023-Speed 10580.30 samples/sec   Loss 5.2449   LearningRate 0.0008   Epoch: 36   Global Step: 183720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:59,012-Speed 10362.70 samples/sec   Loss 5.2092   LearningRate 0.0008   Epoch: 36   Global Step: 183730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:06:59,982-Speed 10566.61 samples/sec   Loss 5.1521   LearningRate 0.0008   Epoch: 36   Global Step: 183740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:07:00,961-Speed 10469.09 samples/sec   Loss 5.0390   LearningRate 0.0008   Epoch: 36   Global Step: 183750   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 06:07:01,959-Speed 10268.06 samples/sec   Loss 5.3388   LearningRate 0.0008   Epoch: 36   Global Step: 183760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:07:02,926-Speed 10602.92 samples/sec   Loss 5.3390   LearningRate 0.0008   Epoch: 36   Global Step: 183770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:07:03,925-Speed 10256.80 samples/sec   Loss 5.2150   LearningRate 0.0008   Epoch: 36   Global Step: 183780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:07:04,909-Speed 10413.14 samples/sec   Loss 5.2983   LearningRate 0.0008   Epoch: 36   Global Step: 183790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:07:05,929-Speed 10052.95 samples/sec   Loss 5.2874   LearningRate 0.0008   Epoch: 36   Global Step: 183800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:07:06,943-Speed 10104.83 samples/sec   Loss 5.1466   LearningRate 0.0008   Epoch: 36   Global Step: 183810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:07:07,956-Speed 10125.34 samples/sec   Loss 5.2050   LearningRate 0.0008   Epoch: 36   Global Step: 183820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:07:08,994-Speed 9871.22 samples/sec   Loss 5.3337   LearningRate 0.0008   Epoch: 36   Global Step: 183830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:07:10,023-Speed 9959.40 samples/sec   Loss 5.0059   LearningRate 0.0008   Epoch: 36   Global Step: 183840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:07:10,968-Speed 10840.19 samples/sec   Loss 5.2754   LearningRate 0.0008   Epoch: 36   Global Step: 183850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:07:11,973-Speed 10203.12 samples/sec   Loss 5.2268   LearningRate 0.0008   Epoch: 36   Global Step: 183860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:07:12,925-Speed 10759.30 samples/sec   Loss 5.3168   LearningRate 0.0008   Epoch: 36   Global Step: 183870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:07:13,922-Speed 10281.78 samples/sec   Loss 5.2285   LearningRate 0.0008   Epoch: 36   Global Step: 183880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:07:14,961-Speed 9866.27 samples/sec   Loss 5.2044   LearningRate 0.0008   Epoch: 36   Global Step: 183890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:07:15,956-Speed 10306.02 samples/sec   Loss 5.3447   LearningRate 0.0008   Epoch: 36   Global Step: 183900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:07:16,934-Speed 10477.77 samples/sec   Loss 5.1516   LearningRate 0.0008   Epoch: 36   Global Step: 183910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:07:17,915-Speed 10442.77 samples/sec   Loss 5.2597   LearningRate 0.0008   Epoch: 36   Global Step: 183920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:07:18,933-Speed 10073.37 samples/sec   Loss 5.1830   LearningRate 0.0008   Epoch: 36   Global Step: 183930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:07:19,933-Speed 10240.71 samples/sec   Loss 5.1072   LearningRate 0.0008   Epoch: 36   Global Step: 183940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:07:20,923-Speed 10364.27 samples/sec   Loss 5.1319   LearningRate 0.0008   Epoch: 36   Global Step: 183950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:07:21,930-Speed 10181.46 samples/sec   Loss 5.0650   LearningRate 0.0008   Epoch: 36   Global Step: 183960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:07:22,942-Speed 10125.76 samples/sec   Loss 5.1968   LearningRate 0.0008   Epoch: 36   Global Step: 183970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:07:23,955-Speed 10117.10 samples/sec   Loss 5.2688   LearningRate 0.0008   Epoch: 36   Global Step: 183980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:07:24,968-Speed 10113.49 samples/sec   Loss 5.2746   LearningRate 0.0008   Epoch: 36   Global Step: 183990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:07:25,942-Speed 10527.39 samples/sec   Loss 5.1393   LearningRate 0.0008   Epoch: 36   Global Step: 184000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:07:48,250-[lfw][184000]XNorm: 7.989775
Training: 2022-04-11 06:07:48,251-[lfw][184000]Accuracy-Flip: 0.99583+-0.00318
Training: 2022-04-11 06:07:48,251-[lfw][184000]Accuracy-Highest: 0.99700
Training: 2022-04-11 06:08:14,063-[cfp_fp][184000]XNorm: 6.910501
Training: 2022-04-11 06:08:14,063-[cfp_fp][184000]Accuracy-Flip: 0.97343+-0.00718
Training: 2022-04-11 06:08:14,065-[cfp_fp][184000]Accuracy-Highest: 0.97371
Training: 2022-04-11 06:08:36,362-[agedb_30][184000]XNorm: 7.817207
Training: 2022-04-11 06:08:36,363-[agedb_30][184000]Accuracy-Flip: 0.97200+-0.00726
Training: 2022-04-11 06:08:36,364-[agedb_30][184000]Accuracy-Highest: 0.97350
Training: 2022-04-11 06:08:37,371-Speed 143.36 samples/sec   Loss 5.1371   LearningRate 0.0008   Epoch: 36   Global Step: 184010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:08:38,345-Speed 10528.80 samples/sec   Loss 5.2739   LearningRate 0.0008   Epoch: 36   Global Step: 184020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:08:39,375-Speed 9964.19 samples/sec   Loss 5.1367   LearningRate 0.0008   Epoch: 36   Global Step: 184030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:08:40,387-Speed 10125.55 samples/sec   Loss 5.2620   LearningRate 0.0008   Epoch: 36   Global Step: 184040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:08:41,371-Speed 10419.11 samples/sec   Loss 5.1727   LearningRate 0.0008   Epoch: 36   Global Step: 184050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:08:42,408-Speed 9881.13 samples/sec   Loss 5.1895   LearningRate 0.0008   Epoch: 36   Global Step: 184060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:08:43,411-Speed 10229.49 samples/sec   Loss 5.1051   LearningRate 0.0008   Epoch: 36   Global Step: 184070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:08:44,367-Speed 10724.82 samples/sec   Loss 5.2730   LearningRate 0.0008   Epoch: 36   Global Step: 184080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:08:45,370-Speed 10213.62 samples/sec   Loss 5.1597   LearningRate 0.0008   Epoch: 36   Global Step: 184090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:08:46,372-Speed 10231.51 samples/sec   Loss 5.1933   LearningRate 0.0008   Epoch: 36   Global Step: 184100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:08:47,374-Speed 10229.61 samples/sec   Loss 5.2251   LearningRate 0.0008   Epoch: 36   Global Step: 184110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:08:48,372-Speed 10270.22 samples/sec   Loss 5.3927   LearningRate 0.0008   Epoch: 36   Global Step: 184120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:08:49,411-Speed 9867.42 samples/sec   Loss 5.1450   LearningRate 0.0008   Epoch: 36   Global Step: 184130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:08:50,404-Speed 10323.59 samples/sec   Loss 5.2115   LearningRate 0.0008   Epoch: 36   Global Step: 184140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:08:51,380-Speed 10510.75 samples/sec   Loss 5.2764   LearningRate 0.0008   Epoch: 36   Global Step: 184150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:08:52,482-Speed 9494.87 samples/sec   Loss 5.1779   LearningRate 0.0008   Epoch: 36   Global Step: 184160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:08:53,556-Speed 9544.53 samples/sec   Loss 5.0986   LearningRate 0.0008   Epoch: 36   Global Step: 184170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:08:54,553-Speed 10286.20 samples/sec   Loss 5.1653   LearningRate 0.0008   Epoch: 36   Global Step: 184180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:08:55,512-Speed 10687.23 samples/sec   Loss 5.2400   LearningRate 0.0008   Epoch: 36   Global Step: 184190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:08:56,498-Speed 10394.70 samples/sec   Loss 5.3213   LearningRate 0.0008   Epoch: 36   Global Step: 184200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:08:57,546-Speed 9781.21 samples/sec   Loss 5.1349   LearningRate 0.0008   Epoch: 36   Global Step: 184210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:08:58,565-Speed 10061.62 samples/sec   Loss 5.2188   LearningRate 0.0008   Epoch: 36   Global Step: 184220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:08:59,614-Speed 9772.12 samples/sec   Loss 5.1195   LearningRate 0.0008   Epoch: 36   Global Step: 184230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:00,616-Speed 10230.05 samples/sec   Loss 5.2596   LearningRate 0.0008   Epoch: 36   Global Step: 184240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:01,611-Speed 10303.64 samples/sec   Loss 5.2861   LearningRate 0.0008   Epoch: 36   Global Step: 184250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:02,616-Speed 10194.80 samples/sec   Loss 5.4233   LearningRate 0.0008   Epoch: 36   Global Step: 184260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:03,615-Speed 10267.04 samples/sec   Loss 5.1626   LearningRate 0.0008   Epoch: 36   Global Step: 184270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:04,614-Speed 10257.87 samples/sec   Loss 5.3278   LearningRate 0.0008   Epoch: 36   Global Step: 184280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:05,604-Speed 10359.79 samples/sec   Loss 5.2094   LearningRate 0.0008   Epoch: 36   Global Step: 184290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:06,567-Speed 10647.57 samples/sec   Loss 5.1688   LearningRate 0.0008   Epoch: 36   Global Step: 184300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:07,542-Speed 10511.98 samples/sec   Loss 5.2953   LearningRate 0.0008   Epoch: 36   Global Step: 184310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:08,528-Speed 10390.74 samples/sec   Loss 5.2589   LearningRate 0.0008   Epoch: 36   Global Step: 184320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:09,505-Speed 10489.36 samples/sec   Loss 5.1964   LearningRate 0.0008   Epoch: 36   Global Step: 184330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:10,539-Speed 9916.30 samples/sec   Loss 5.1605   LearningRate 0.0008   Epoch: 36   Global Step: 184340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:11,554-Speed 10098.53 samples/sec   Loss 5.3086   LearningRate 0.0008   Epoch: 36   Global Step: 184350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:12,558-Speed 10214.44 samples/sec   Loss 5.3309   LearningRate 0.0008   Epoch: 36   Global Step: 184360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:13,553-Speed 10297.23 samples/sec   Loss 5.1201   LearningRate 0.0008   Epoch: 36   Global Step: 184370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:14,489-Speed 10959.04 samples/sec   Loss 5.1089   LearningRate 0.0008   Epoch: 36   Global Step: 184380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:15,497-Speed 10161.07 samples/sec   Loss 5.2187   LearningRate 0.0008   Epoch: 36   Global Step: 184390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:16,521-Speed 10008.60 samples/sec   Loss 5.3501   LearningRate 0.0008   Epoch: 36   Global Step: 184400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:17,481-Speed 10677.13 samples/sec   Loss 5.2713   LearningRate 0.0008   Epoch: 36   Global Step: 184410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:18,466-Speed 10400.39 samples/sec   Loss 5.1877   LearningRate 0.0008   Epoch: 36   Global Step: 184420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:19,455-Speed 10365.95 samples/sec   Loss 5.3467   LearningRate 0.0008   Epoch: 36   Global Step: 184430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:20,467-Speed 10132.27 samples/sec   Loss 5.1964   LearningRate 0.0008   Epoch: 36   Global Step: 184440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:21,433-Speed 10608.23 samples/sec   Loss 5.0971   LearningRate 0.0008   Epoch: 36   Global Step: 184450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:22,391-Speed 10698.63 samples/sec   Loss 5.1833   LearningRate 0.0008   Epoch: 36   Global Step: 184460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:23,427-Speed 9886.25 samples/sec   Loss 5.1017   LearningRate 0.0008   Epoch: 36   Global Step: 184470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:24,470-Speed 9835.53 samples/sec   Loss 5.3021   LearningRate 0.0008   Epoch: 36   Global Step: 184480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:25,459-Speed 10353.63 samples/sec   Loss 5.1853   LearningRate 0.0008   Epoch: 36   Global Step: 184490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:26,468-Speed 10160.18 samples/sec   Loss 5.0913   LearningRate 0.0008   Epoch: 36   Global Step: 184500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:27,457-Speed 10361.02 samples/sec   Loss 5.2721   LearningRate 0.0008   Epoch: 36   Global Step: 184510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:28,502-Speed 9804.09 samples/sec   Loss 5.2809   LearningRate 0.0008   Epoch: 36   Global Step: 184520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:29,469-Speed 10612.50 samples/sec   Loss 5.2292   LearningRate 0.0008   Epoch: 36   Global Step: 184530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:30,424-Speed 10735.69 samples/sec   Loss 5.2501   LearningRate 0.0008   Epoch: 36   Global Step: 184540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:31,412-Speed 10375.11 samples/sec   Loss 5.2165   LearningRate 0.0008   Epoch: 36   Global Step: 184550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:32,458-Speed 9801.03 samples/sec   Loss 5.1454   LearningRate 0.0008   Epoch: 36   Global Step: 184560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:33,462-Speed 10207.50 samples/sec   Loss 5.2651   LearningRate 0.0008   Epoch: 36   Global Step: 184570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:34,492-Speed 9951.82 samples/sec   Loss 5.2213   LearningRate 0.0008   Epoch: 36   Global Step: 184580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:35,506-Speed 10102.81 samples/sec   Loss 5.2060   LearningRate 0.0008   Epoch: 36   Global Step: 184590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:36,461-Speed 10728.59 samples/sec   Loss 5.0788   LearningRate 0.0008   Epoch: 36   Global Step: 184600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:37,426-Speed 10626.84 samples/sec   Loss 5.2544   LearningRate 0.0008   Epoch: 36   Global Step: 184610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:38,401-Speed 10512.50 samples/sec   Loss 5.2364   LearningRate 0.0008   Epoch: 36   Global Step: 184620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:39,413-Speed 10128.17 samples/sec   Loss 5.2262   LearningRate 0.0008   Epoch: 36   Global Step: 184630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:40,452-Speed 9864.22 samples/sec   Loss 5.2638   LearningRate 0.0008   Epoch: 36   Global Step: 184640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:41,427-Speed 10525.49 samples/sec   Loss 5.1336   LearningRate 0.0008   Epoch: 36   Global Step: 184650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:42,451-Speed 10002.21 samples/sec   Loss 5.1621   LearningRate 0.0008   Epoch: 36   Global Step: 184660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:43,421-Speed 10573.89 samples/sec   Loss 5.2954   LearningRate 0.0008   Epoch: 36   Global Step: 184670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:44,395-Speed 10515.39 samples/sec   Loss 5.3487   LearningRate 0.0008   Epoch: 36   Global Step: 184680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:45,399-Speed 10206.26 samples/sec   Loss 5.2175   LearningRate 0.0008   Epoch: 36   Global Step: 184690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:46,393-Speed 10312.52 samples/sec   Loss 5.2575   LearningRate 0.0008   Epoch: 36   Global Step: 184700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:47,397-Speed 10215.32 samples/sec   Loss 5.2325   LearningRate 0.0008   Epoch: 36   Global Step: 184710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:48,372-Speed 10509.93 samples/sec   Loss 5.1566   LearningRate 0.0008   Epoch: 36   Global Step: 184720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:49,372-Speed 10253.99 samples/sec   Loss 5.1633   LearningRate 0.0008   Epoch: 36   Global Step: 184730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:50,359-Speed 10382.20 samples/sec   Loss 5.2829   LearningRate 0.0008   Epoch: 36   Global Step: 184740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:51,393-Speed 9909.87 samples/sec   Loss 5.2421   LearningRate 0.0008   Epoch: 36   Global Step: 184750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:52,390-Speed 10282.63 samples/sec   Loss 5.3048   LearningRate 0.0008   Epoch: 36   Global Step: 184760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:53,405-Speed 10098.58 samples/sec   Loss 5.1273   LearningRate 0.0008   Epoch: 36   Global Step: 184770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:54,422-Speed 10073.53 samples/sec   Loss 5.2999   LearningRate 0.0008   Epoch: 36   Global Step: 184780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:55,440-Speed 10080.28 samples/sec   Loss 5.2058   LearningRate 0.0008   Epoch: 36   Global Step: 184790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:56,446-Speed 10189.32 samples/sec   Loss 5.2819   LearningRate 0.0007   Epoch: 36   Global Step: 184800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:57,410-Speed 10625.66 samples/sec   Loss 5.2586   LearningRate 0.0007   Epoch: 36   Global Step: 184810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:09:58,379-Speed 10579.83 samples/sec   Loss 5.2636   LearningRate 0.0007   Epoch: 36   Global Step: 184820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:09:59,382-Speed 10224.28 samples/sec   Loss 5.2640   LearningRate 0.0007   Epoch: 36   Global Step: 184830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:00,342-Speed 10669.41 samples/sec   Loss 5.2319   LearningRate 0.0007   Epoch: 36   Global Step: 184840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:01,282-Speed 10909.36 samples/sec   Loss 5.2751   LearningRate 0.0007   Epoch: 36   Global Step: 184850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:02,275-Speed 10319.15 samples/sec   Loss 5.1586   LearningRate 0.0007   Epoch: 36   Global Step: 184860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:03,299-Speed 10009.03 samples/sec   Loss 5.3056   LearningRate 0.0007   Epoch: 36   Global Step: 184870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:04,330-Speed 9938.70 samples/sec   Loss 5.2634   LearningRate 0.0007   Epoch: 36   Global Step: 184880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:05,323-Speed 10325.71 samples/sec   Loss 5.1892   LearningRate 0.0007   Epoch: 36   Global Step: 184890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:06,284-Speed 10661.99 samples/sec   Loss 5.2207   LearningRate 0.0007   Epoch: 36   Global Step: 184900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:07,319-Speed 9901.83 samples/sec   Loss 5.2531   LearningRate 0.0007   Epoch: 36   Global Step: 184910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:08,274-Speed 10742.46 samples/sec   Loss 5.2054   LearningRate 0.0007   Epoch: 36   Global Step: 184920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:09,297-Speed 10031.32 samples/sec   Loss 5.2121   LearningRate 0.0007   Epoch: 36   Global Step: 184930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:10,316-Speed 10046.67 samples/sec   Loss 5.1303   LearningRate 0.0007   Epoch: 36   Global Step: 184940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:11,355-Speed 9873.20 samples/sec   Loss 5.1727   LearningRate 0.0007   Epoch: 36   Global Step: 184950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:12,315-Speed 10683.13 samples/sec   Loss 5.3219   LearningRate 0.0007   Epoch: 36   Global Step: 184960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:13,296-Speed 10444.36 samples/sec   Loss 5.2548   LearningRate 0.0007   Epoch: 36   Global Step: 184970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:14,285-Speed 10355.40 samples/sec   Loss 5.1744   LearningRate 0.0007   Epoch: 36   Global Step: 184980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:15,325-Speed 9861.95 samples/sec   Loss 5.3276   LearningRate 0.0007   Epoch: 36   Global Step: 184990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:16,301-Speed 10502.24 samples/sec   Loss 5.2612   LearningRate 0.0007   Epoch: 36   Global Step: 185000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:17,309-Speed 10162.58 samples/sec   Loss 5.2408   LearningRate 0.0007   Epoch: 36   Global Step: 185010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:18,281-Speed 10540.82 samples/sec   Loss 5.2839   LearningRate 0.0007   Epoch: 36   Global Step: 185020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:19,308-Speed 9983.97 samples/sec   Loss 5.0822   LearningRate 0.0007   Epoch: 36   Global Step: 185030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:20,309-Speed 10236.97 samples/sec   Loss 5.2064   LearningRate 0.0007   Epoch: 36   Global Step: 185040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:21,290-Speed 10441.05 samples/sec   Loss 5.2856   LearningRate 0.0007   Epoch: 36   Global Step: 185050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:22,276-Speed 10402.54 samples/sec   Loss 5.1982   LearningRate 0.0007   Epoch: 36   Global Step: 185060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:23,224-Speed 10810.72 samples/sec   Loss 5.1321   LearningRate 0.0007   Epoch: 36   Global Step: 185070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:24,205-Speed 10456.95 samples/sec   Loss 5.2481   LearningRate 0.0007   Epoch: 36   Global Step: 185080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:25,205-Speed 10240.92 samples/sec   Loss 5.2291   LearningRate 0.0007   Epoch: 36   Global Step: 185090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:26,208-Speed 10219.43 samples/sec   Loss 5.3315   LearningRate 0.0007   Epoch: 36   Global Step: 185100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:27,179-Speed 10556.36 samples/sec   Loss 5.2323   LearningRate 0.0007   Epoch: 36   Global Step: 185110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:28,152-Speed 10545.24 samples/sec   Loss 5.2127   LearningRate 0.0007   Epoch: 36   Global Step: 185120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:29,144-Speed 10333.86 samples/sec   Loss 5.2885   LearningRate 0.0007   Epoch: 36   Global Step: 185130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:30,143-Speed 10258.20 samples/sec   Loss 5.0970   LearningRate 0.0007   Epoch: 36   Global Step: 185140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:31,192-Speed 9766.41 samples/sec   Loss 5.1860   LearningRate 0.0007   Epoch: 36   Global Step: 185150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:32,188-Speed 10285.30 samples/sec   Loss 5.2967   LearningRate 0.0007   Epoch: 36   Global Step: 185160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:33,156-Speed 10595.61 samples/sec   Loss 5.1593   LearningRate 0.0007   Epoch: 36   Global Step: 185170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:34,156-Speed 10243.74 samples/sec   Loss 5.2139   LearningRate 0.0007   Epoch: 36   Global Step: 185180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:35,124-Speed 10593.84 samples/sec   Loss 5.2028   LearningRate 0.0007   Epoch: 36   Global Step: 185190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:36,102-Speed 10481.78 samples/sec   Loss 5.1260   LearningRate 0.0007   Epoch: 36   Global Step: 185200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:37,136-Speed 9918.37 samples/sec   Loss 5.2462   LearningRate 0.0007   Epoch: 36   Global Step: 185210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:38,120-Speed 10415.15 samples/sec   Loss 5.2295   LearningRate 0.0007   Epoch: 36   Global Step: 185220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:39,082-Speed 10661.54 samples/sec   Loss 5.2062   LearningRate 0.0007   Epoch: 36   Global Step: 185230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:40,074-Speed 10324.61 samples/sec   Loss 5.2245   LearningRate 0.0007   Epoch: 36   Global Step: 185240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:41,057-Speed 10441.05 samples/sec   Loss 5.3157   LearningRate 0.0007   Epoch: 36   Global Step: 185250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:42,057-Speed 10249.61 samples/sec   Loss 5.2429   LearningRate 0.0007   Epoch: 36   Global Step: 185260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:43,079-Speed 10030.36 samples/sec   Loss 5.1376   LearningRate 0.0007   Epoch: 36   Global Step: 185270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:44,045-Speed 10604.15 samples/sec   Loss 5.3448   LearningRate 0.0007   Epoch: 36   Global Step: 185280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:45,040-Speed 10299.37 samples/sec   Loss 5.1484   LearningRate 0.0007   Epoch: 36   Global Step: 185290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:45,983-Speed 10873.33 samples/sec   Loss 5.2287   LearningRate 0.0007   Epoch: 36   Global Step: 185300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:46,965-Speed 10439.66 samples/sec   Loss 5.2744   LearningRate 0.0007   Epoch: 36   Global Step: 185310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:47,957-Speed 10336.55 samples/sec   Loss 5.0948   LearningRate 0.0007   Epoch: 36   Global Step: 185320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:48,964-Speed 10171.46 samples/sec   Loss 5.2418   LearningRate 0.0007   Epoch: 36   Global Step: 185330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:49,977-Speed 10123.38 samples/sec   Loss 5.1332   LearningRate 0.0007   Epoch: 36   Global Step: 185340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:50,921-Speed 10847.82 samples/sec   Loss 5.2855   LearningRate 0.0007   Epoch: 36   Global Step: 185350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:51,936-Speed 10100.22 samples/sec   Loss 5.1489   LearningRate 0.0007   Epoch: 36   Global Step: 185360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:52,957-Speed 10045.41 samples/sec   Loss 5.3047   LearningRate 0.0007   Epoch: 36   Global Step: 185370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:53,946-Speed 10355.76 samples/sec   Loss 5.3315   LearningRate 0.0007   Epoch: 36   Global Step: 185380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:54,963-Speed 10098.83 samples/sec   Loss 5.3688   LearningRate 0.0007   Epoch: 36   Global Step: 185390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:10:55,922-Speed 10686.72 samples/sec   Loss 5.1359   LearningRate 0.0007   Epoch: 36   Global Step: 185400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:56,947-Speed 9997.69 samples/sec   Loss 5.2368   LearningRate 0.0007   Epoch: 36   Global Step: 185410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:57,935-Speed 10372.67 samples/sec   Loss 5.2656   LearningRate 0.0007   Epoch: 36   Global Step: 185420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:58,909-Speed 10531.39 samples/sec   Loss 5.3634   LearningRate 0.0007   Epoch: 36   Global Step: 185430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:10:59,924-Speed 10091.32 samples/sec   Loss 5.4207   LearningRate 0.0007   Epoch: 36   Global Step: 185440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:00,939-Speed 10102.69 samples/sec   Loss 5.3135   LearningRate 0.0007   Epoch: 36   Global Step: 185450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:01,951-Speed 10126.81 samples/sec   Loss 5.2805   LearningRate 0.0007   Epoch: 36   Global Step: 185460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:02,921-Speed 10573.84 samples/sec   Loss 5.1586   LearningRate 0.0007   Epoch: 36   Global Step: 185470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:03,868-Speed 10824.39 samples/sec   Loss 5.1761   LearningRate 0.0007   Epoch: 36   Global Step: 185480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:04,906-Speed 9864.48 samples/sec   Loss 5.1346   LearningRate 0.0007   Epoch: 36   Global Step: 185490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:05,930-Speed 10017.60 samples/sec   Loss 5.2548   LearningRate 0.0007   Epoch: 36   Global Step: 185500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:06,899-Speed 10573.60 samples/sec   Loss 5.2040   LearningRate 0.0007   Epoch: 36   Global Step: 185510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:07,903-Speed 10211.52 samples/sec   Loss 5.2272   LearningRate 0.0007   Epoch: 36   Global Step: 185520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:08,996-Speed 9373.06 samples/sec   Loss 5.4737   LearningRate 0.0007   Epoch: 36   Global Step: 185530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:10,006-Speed 10150.19 samples/sec   Loss 5.2506   LearningRate 0.0007   Epoch: 36   Global Step: 185540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:11,005-Speed 10257.30 samples/sec   Loss 5.0715   LearningRate 0.0007   Epoch: 36   Global Step: 185550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:11,985-Speed 10457.90 samples/sec   Loss 5.1504   LearningRate 0.0007   Epoch: 36   Global Step: 185560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:13,000-Speed 10106.99 samples/sec   Loss 5.3016   LearningRate 0.0007   Epoch: 36   Global Step: 185570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:14,028-Speed 9970.46 samples/sec   Loss 5.1925   LearningRate 0.0007   Epoch: 36   Global Step: 185580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:15,055-Speed 9976.19 samples/sec   Loss 5.2837   LearningRate 0.0007   Epoch: 36   Global Step: 185590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:16,002-Speed 10830.87 samples/sec   Loss 5.1549   LearningRate 0.0007   Epoch: 36   Global Step: 185600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:16,986-Speed 10414.66 samples/sec   Loss 5.3167   LearningRate 0.0007   Epoch: 36   Global Step: 185610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:17,995-Speed 10148.81 samples/sec   Loss 5.1377   LearningRate 0.0007   Epoch: 36   Global Step: 185620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:19,059-Speed 9636.75 samples/sec   Loss 5.1988   LearningRate 0.0007   Epoch: 36   Global Step: 185630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:20,092-Speed 9923.06 samples/sec   Loss 5.3773   LearningRate 0.0007   Epoch: 36   Global Step: 185640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:21,055-Speed 10638.66 samples/sec   Loss 5.1537   LearningRate 0.0007   Epoch: 36   Global Step: 185650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:21,974-Speed 11148.04 samples/sec   Loss 5.2905   LearningRate 0.0007   Epoch: 36   Global Step: 185660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:23,036-Speed 9659.01 samples/sec   Loss 5.2069   LearningRate 0.0007   Epoch: 36   Global Step: 185670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:24,059-Speed 10017.53 samples/sec   Loss 5.3460   LearningRate 0.0007   Epoch: 36   Global Step: 185680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:25,047-Speed 10369.27 samples/sec   Loss 5.1751   LearningRate 0.0007   Epoch: 36   Global Step: 185690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:26,046-Speed 10260.39 samples/sec   Loss 5.2017   LearningRate 0.0007   Epoch: 36   Global Step: 185700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:27,101-Speed 9711.36 samples/sec   Loss 5.1686   LearningRate 0.0007   Epoch: 36   Global Step: 185710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:28,077-Speed 10505.61 samples/sec   Loss 5.1655   LearningRate 0.0007   Epoch: 36   Global Step: 185720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:29,055-Speed 10478.18 samples/sec   Loss 5.2498   LearningRate 0.0007   Epoch: 36   Global Step: 185730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:30,071-Speed 10095.67 samples/sec   Loss 5.2141   LearningRate 0.0007   Epoch: 36   Global Step: 185740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:31,072-Speed 10232.55 samples/sec   Loss 5.3428   LearningRate 0.0007   Epoch: 36   Global Step: 185750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:32,099-Speed 9979.98 samples/sec   Loss 5.2462   LearningRate 0.0007   Epoch: 36   Global Step: 185760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:33,079-Speed 10466.41 samples/sec   Loss 5.2759   LearningRate 0.0007   Epoch: 36   Global Step: 185770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:34,112-Speed 9915.63 samples/sec   Loss 5.3013   LearningRate 0.0007   Epoch: 36   Global Step: 185780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:35,086-Speed 10525.74 samples/sec   Loss 5.3298   LearningRate 0.0007   Epoch: 36   Global Step: 185790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:36,069-Speed 10429.27 samples/sec   Loss 5.3278   LearningRate 0.0007   Epoch: 36   Global Step: 185800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:37,049-Speed 10451.04 samples/sec   Loss 5.2086   LearningRate 0.0007   Epoch: 36   Global Step: 185810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:38,081-Speed 9932.77 samples/sec   Loss 5.2066   LearningRate 0.0007   Epoch: 36   Global Step: 185820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:39,082-Speed 10239.87 samples/sec   Loss 5.1822   LearningRate 0.0007   Epoch: 36   Global Step: 185830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:40,129-Speed 9787.10 samples/sec   Loss 5.2956   LearningRate 0.0007   Epoch: 36   Global Step: 185840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:41,088-Speed 10680.84 samples/sec   Loss 5.2432   LearningRate 0.0007   Epoch: 36   Global Step: 185850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:42,140-Speed 9745.16 samples/sec   Loss 5.3546   LearningRate 0.0007   Epoch: 36   Global Step: 185860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:43,118-Speed 10479.55 samples/sec   Loss 5.2616   LearningRate 0.0007   Epoch: 36   Global Step: 185870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:11:44,086-Speed 10589.91 samples/sec   Loss 5.3007   LearningRate 0.0007   Epoch: 36   Global Step: 185880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:45,099-Speed 10116.54 samples/sec   Loss 5.2458   LearningRate 0.0007   Epoch: 36   Global Step: 185890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:46,075-Speed 10513.54 samples/sec   Loss 5.2347   LearningRate 0.0007   Epoch: 36   Global Step: 185900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:47,101-Speed 9987.88 samples/sec   Loss 5.1681   LearningRate 0.0007   Epoch: 36   Global Step: 185910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:48,101-Speed 10246.31 samples/sec   Loss 5.1188   LearningRate 0.0007   Epoch: 36   Global Step: 185920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:49,141-Speed 9859.77 samples/sec   Loss 5.2508   LearningRate 0.0007   Epoch: 36   Global Step: 185930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:50,118-Speed 10497.68 samples/sec   Loss 5.2371   LearningRate 0.0007   Epoch: 36   Global Step: 185940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:51,131-Speed 10113.37 samples/sec   Loss 5.1312   LearningRate 0.0007   Epoch: 36   Global Step: 185950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:52,211-Speed 9489.16 samples/sec   Loss 5.1884   LearningRate 0.0007   Epoch: 36   Global Step: 185960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:53,218-Speed 10181.39 samples/sec   Loss 5.2801   LearningRate 0.0007   Epoch: 36   Global Step: 185970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:54,183-Speed 10619.75 samples/sec   Loss 5.3479   LearningRate 0.0007   Epoch: 36   Global Step: 185980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:55,147-Speed 10633.67 samples/sec   Loss 5.3858   LearningRate 0.0007   Epoch: 36   Global Step: 185990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:11:56,145-Speed 10263.35 samples/sec   Loss 5.1528   LearningRate 0.0007   Epoch: 36   Global Step: 186000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:12:18,258-[lfw][186000]XNorm: 7.999444
Training: 2022-04-11 06:12:18,259-[lfw][186000]Accuracy-Flip: 0.99633+-0.00332
Training: 2022-04-11 06:12:18,260-[lfw][186000]Accuracy-Highest: 0.99700
Training: 2022-04-11 06:12:43,806-[cfp_fp][186000]XNorm: 6.917609
Training: 2022-04-11 06:12:43,807-[cfp_fp][186000]Accuracy-Flip: 0.97271+-0.00944
Training: 2022-04-11 06:12:43,808-[cfp_fp][186000]Accuracy-Highest: 0.97371
Training: 2022-04-11 06:13:05,927-[agedb_30][186000]XNorm: 7.818758
Training: 2022-04-11 06:13:05,927-[agedb_30][186000]Accuracy-Flip: 0.97033+-0.00741
Training: 2022-04-11 06:13:05,928-[agedb_30][186000]Accuracy-Highest: 0.97350
Training: 2022-04-11 06:13:06,910-Speed 144.71 samples/sec   Loss 5.3776   LearningRate 0.0006   Epoch: 36   Global Step: 186010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:07,883-Speed 10541.80 samples/sec   Loss 5.1327   LearningRate 0.0006   Epoch: 36   Global Step: 186020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:08,868-Speed 10417.37 samples/sec   Loss 5.1571   LearningRate 0.0006   Epoch: 36   Global Step: 186030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:09,888-Speed 10045.04 samples/sec   Loss 5.2234   LearningRate 0.0006   Epoch: 36   Global Step: 186040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:10,903-Speed 10097.09 samples/sec   Loss 5.2583   LearningRate 0.0006   Epoch: 36   Global Step: 186050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:11,905-Speed 10231.30 samples/sec   Loss 5.3343   LearningRate 0.0006   Epoch: 36   Global Step: 186060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:12,891-Speed 10396.85 samples/sec   Loss 5.5055   LearningRate 0.0006   Epoch: 36   Global Step: 186070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:13,870-Speed 10463.20 samples/sec   Loss 5.3932   LearningRate 0.0006   Epoch: 36   Global Step: 186080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:14,910-Speed 9860.38 samples/sec   Loss 5.2403   LearningRate 0.0006   Epoch: 36   Global Step: 186090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:15,919-Speed 10157.16 samples/sec   Loss 5.2300   LearningRate 0.0006   Epoch: 36   Global Step: 186100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:16,881-Speed 10654.10 samples/sec   Loss 5.2288   LearningRate 0.0006   Epoch: 36   Global Step: 186110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:17,864-Speed 10422.53 samples/sec   Loss 5.3884   LearningRate 0.0006   Epoch: 36   Global Step: 186120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:18,826-Speed 10655.97 samples/sec   Loss 5.2897   LearningRate 0.0006   Epoch: 36   Global Step: 186130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:19,820-Speed 10345.33 samples/sec   Loss 5.2602   LearningRate 0.0006   Epoch: 36   Global Step: 186140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:20,780-Speed 10669.89 samples/sec   Loss 5.2546   LearningRate 0.0006   Epoch: 36   Global Step: 186150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:21,809-Speed 9958.84 samples/sec   Loss 5.2454   LearningRate 0.0006   Epoch: 36   Global Step: 186160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:22,780-Speed 10565.11 samples/sec   Loss 5.2555   LearningRate 0.0006   Epoch: 36   Global Step: 186170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:23,735-Speed 10732.84 samples/sec   Loss 5.2680   LearningRate 0.0006   Epoch: 36   Global Step: 186180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:24,687-Speed 10765.22 samples/sec   Loss 5.3165   LearningRate 0.0006   Epoch: 36   Global Step: 186190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:25,665-Speed 10486.23 samples/sec   Loss 5.2567   LearningRate 0.0006   Epoch: 36   Global Step: 186200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:26,660-Speed 10290.22 samples/sec   Loss 5.2595   LearningRate 0.0006   Epoch: 36   Global Step: 186210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:27,673-Speed 10121.01 samples/sec   Loss 5.2363   LearningRate 0.0006   Epoch: 36   Global Step: 186220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:28,728-Speed 9718.14 samples/sec   Loss 5.2824   LearningRate 0.0006   Epoch: 36   Global Step: 186230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:29,744-Speed 10084.21 samples/sec   Loss 5.1749   LearningRate 0.0006   Epoch: 36   Global Step: 186240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:30,719-Speed 10511.83 samples/sec   Loss 5.3127   LearningRate 0.0006   Epoch: 36   Global Step: 186250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:31,711-Speed 10333.85 samples/sec   Loss 5.1467   LearningRate 0.0006   Epoch: 36   Global Step: 186260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:32,723-Speed 10135.90 samples/sec   Loss 5.1798   LearningRate 0.0006   Epoch: 36   Global Step: 186270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:33,708-Speed 10400.71 samples/sec   Loss 5.2115   LearningRate 0.0006   Epoch: 36   Global Step: 186280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:34,725-Speed 10078.05 samples/sec   Loss 5.2518   LearningRate 0.0006   Epoch: 36   Global Step: 186290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:35,767-Speed 9835.30 samples/sec   Loss 5.2657   LearningRate 0.0006   Epoch: 36   Global Step: 186300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:36,748-Speed 10473.26 samples/sec   Loss 5.3787   LearningRate 0.0006   Epoch: 36   Global Step: 186310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:37,742-Speed 10309.90 samples/sec   Loss 5.1270   LearningRate 0.0006   Epoch: 36   Global Step: 186320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:38,777-Speed 9906.66 samples/sec   Loss 5.3165   LearningRate 0.0006   Epoch: 36   Global Step: 186330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:39,793-Speed 10082.88 samples/sec   Loss 5.3216   LearningRate 0.0006   Epoch: 36   Global Step: 186340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:40,807-Speed 10109.49 samples/sec   Loss 5.3394   LearningRate 0.0006   Epoch: 36   Global Step: 186350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:41,814-Speed 10176.56 samples/sec   Loss 5.1526   LearningRate 0.0006   Epoch: 36   Global Step: 186360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:42,851-Speed 9883.61 samples/sec   Loss 5.3017   LearningRate 0.0006   Epoch: 36   Global Step: 186370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:43,827-Speed 10503.19 samples/sec   Loss 5.2216   LearningRate 0.0006   Epoch: 36   Global Step: 186380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:44,806-Speed 10472.34 samples/sec   Loss 5.3469   LearningRate 0.0006   Epoch: 36   Global Step: 186390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:45,822-Speed 10084.78 samples/sec   Loss 5.1645   LearningRate 0.0006   Epoch: 36   Global Step: 186400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:46,825-Speed 10212.51 samples/sec   Loss 5.2226   LearningRate 0.0006   Epoch: 36   Global Step: 186410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:47,831-Speed 10191.53 samples/sec   Loss 5.2019   LearningRate 0.0006   Epoch: 36   Global Step: 186420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:48,874-Speed 9825.99 samples/sec   Loss 5.1483   LearningRate 0.0006   Epoch: 36   Global Step: 186430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:49,895-Speed 10039.88 samples/sec   Loss 5.2745   LearningRate 0.0006   Epoch: 36   Global Step: 186440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:13:50,887-Speed 10328.78 samples/sec   Loss 5.2766   LearningRate 0.0006   Epoch: 36   Global Step: 186450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:51,875-Speed 10376.67 samples/sec   Loss 5.3424   LearningRate 0.0006   Epoch: 36   Global Step: 186460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:52,906-Speed 9940.02 samples/sec   Loss 5.2443   LearningRate 0.0006   Epoch: 36   Global Step: 186470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:53,917-Speed 10137.16 samples/sec   Loss 5.2930   LearningRate 0.0006   Epoch: 36   Global Step: 186480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:54,934-Speed 10074.90 samples/sec   Loss 5.1428   LearningRate 0.0006   Epoch: 36   Global Step: 186490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:55,889-Speed 10732.30 samples/sec   Loss 5.2548   LearningRate 0.0006   Epoch: 36   Global Step: 186500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:56,859-Speed 10559.98 samples/sec   Loss 5.2833   LearningRate 0.0006   Epoch: 36   Global Step: 186510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:57,835-Speed 10504.37 samples/sec   Loss 5.3664   LearningRate 0.0006   Epoch: 36   Global Step: 186520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:58,815-Speed 10463.35 samples/sec   Loss 5.1356   LearningRate 0.0006   Epoch: 36   Global Step: 186530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:13:59,823-Speed 10171.20 samples/sec   Loss 5.1717   LearningRate 0.0006   Epoch: 36   Global Step: 186540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:00,859-Speed 9889.97 samples/sec   Loss 5.0531   LearningRate 0.0006   Epoch: 36   Global Step: 186550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:01,820-Speed 10669.90 samples/sec   Loss 5.2355   LearningRate 0.0006   Epoch: 36   Global Step: 186560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:02,803-Speed 10424.54 samples/sec   Loss 5.2354   LearningRate 0.0006   Epoch: 36   Global Step: 186570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:03,788-Speed 10405.08 samples/sec   Loss 5.1721   LearningRate 0.0006   Epoch: 36   Global Step: 186580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:04,814-Speed 9991.11 samples/sec   Loss 5.1554   LearningRate 0.0006   Epoch: 36   Global Step: 186590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:05,793-Speed 10469.41 samples/sec   Loss 5.2214   LearningRate 0.0006   Epoch: 36   Global Step: 186600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:06,789-Speed 10294.93 samples/sec   Loss 5.2579   LearningRate 0.0006   Epoch: 36   Global Step: 186610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:07,804-Speed 10102.05 samples/sec   Loss 5.2670   LearningRate 0.0006   Epoch: 36   Global Step: 186620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:08,857-Speed 9727.55 samples/sec   Loss 5.2057   LearningRate 0.0006   Epoch: 36   Global Step: 186630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:09,840-Speed 10429.64 samples/sec   Loss 5.1893   LearningRate 0.0006   Epoch: 36   Global Step: 186640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:10,794-Speed 10748.10 samples/sec   Loss 5.1474   LearningRate 0.0006   Epoch: 36   Global Step: 186650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:11,799-Speed 10192.93 samples/sec   Loss 5.2366   LearningRate 0.0006   Epoch: 36   Global Step: 186660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:12,818-Speed 10057.71 samples/sec   Loss 5.1764   LearningRate 0.0006   Epoch: 36   Global Step: 186670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:13,809-Speed 10345.52 samples/sec   Loss 5.3171   LearningRate 0.0006   Epoch: 36   Global Step: 186680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:14,822-Speed 10114.06 samples/sec   Loss 4.9645   LearningRate 0.0006   Epoch: 36   Global Step: 186690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:15,812-Speed 10354.93 samples/sec   Loss 5.2214   LearningRate 0.0006   Epoch: 36   Global Step: 186700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:16,849-Speed 9879.81 samples/sec   Loss 5.2569   LearningRate 0.0006   Epoch: 36   Global Step: 186710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:17,868-Speed 10050.13 samples/sec   Loss 5.2728   LearningRate 0.0006   Epoch: 36   Global Step: 186720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:18,830-Speed 10662.02 samples/sec   Loss 5.2116   LearningRate 0.0006   Epoch: 36   Global Step: 186730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:19,804-Speed 10545.69 samples/sec   Loss 5.2783   LearningRate 0.0006   Epoch: 36   Global Step: 186740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:20,789-Speed 10401.40 samples/sec   Loss 5.4170   LearningRate 0.0006   Epoch: 36   Global Step: 186750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:21,787-Speed 10272.95 samples/sec   Loss 5.3617   LearningRate 0.0006   Epoch: 36   Global Step: 186760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:22,786-Speed 10267.96 samples/sec   Loss 5.2547   LearningRate 0.0006   Epoch: 36   Global Step: 186770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:23,781-Speed 10296.01 samples/sec   Loss 5.3137   LearningRate 0.0006   Epoch: 36   Global Step: 186780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:24,781-Speed 10249.09 samples/sec   Loss 5.2465   LearningRate 0.0006   Epoch: 36   Global Step: 186790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:25,806-Speed 9999.30 samples/sec   Loss 5.3201   LearningRate 0.0006   Epoch: 36   Global Step: 186800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:26,758-Speed 10767.90 samples/sec   Loss 5.0479   LearningRate 0.0006   Epoch: 36   Global Step: 186810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:27,743-Speed 10403.41 samples/sec   Loss 5.2968   LearningRate 0.0006   Epoch: 36   Global Step: 186820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:28,751-Speed 10165.35 samples/sec   Loss 5.2184   LearningRate 0.0006   Epoch: 36   Global Step: 186830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:29,777-Speed 9987.87 samples/sec   Loss 5.3060   LearningRate 0.0006   Epoch: 36   Global Step: 186840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:30,788-Speed 10145.91 samples/sec   Loss 5.2288   LearningRate 0.0006   Epoch: 36   Global Step: 186850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:31,726-Speed 10926.42 samples/sec   Loss 5.2742   LearningRate 0.0006   Epoch: 36   Global Step: 186860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:32,743-Speed 10076.29 samples/sec   Loss 5.2696   LearningRate 0.0006   Epoch: 36   Global Step: 186870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:33,808-Speed 9625.39 samples/sec   Loss 5.1715   LearningRate 0.0006   Epoch: 36   Global Step: 186880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:34,768-Speed 10685.35 samples/sec   Loss 5.1716   LearningRate 0.0006   Epoch: 36   Global Step: 186890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:35,723-Speed 10727.64 samples/sec   Loss 5.1780   LearningRate 0.0006   Epoch: 36   Global Step: 186900   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 06:14:36,700-Speed 10488.18 samples/sec   Loss 5.1408   LearningRate 0.0006   Epoch: 36   Global Step: 186910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:37,692-Speed 10337.70 samples/sec   Loss 5.2265   LearningRate 0.0006   Epoch: 36   Global Step: 186920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:38,685-Speed 10317.13 samples/sec   Loss 5.3060   LearningRate 0.0006   Epoch: 36   Global Step: 186930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:39,684-Speed 10260.90 samples/sec   Loss 5.1496   LearningRate 0.0006   Epoch: 36   Global Step: 186940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:40,677-Speed 10317.32 samples/sec   Loss 5.2090   LearningRate 0.0006   Epoch: 36   Global Step: 186950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:41,755-Speed 9514.12 samples/sec   Loss 5.2906   LearningRate 0.0006   Epoch: 36   Global Step: 186960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:42,726-Speed 10555.15 samples/sec   Loss 5.2929   LearningRate 0.0006   Epoch: 36   Global Step: 186970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:43,764-Speed 9871.94 samples/sec   Loss 5.2899   LearningRate 0.0006   Epoch: 36   Global Step: 186980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:44,821-Speed 9714.34 samples/sec   Loss 5.3282   LearningRate 0.0006   Epoch: 36   Global Step: 186990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:45,801-Speed 10469.49 samples/sec   Loss 5.1109   LearningRate 0.0006   Epoch: 36   Global Step: 187000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:46,761-Speed 10683.48 samples/sec   Loss 5.3129   LearningRate 0.0006   Epoch: 36   Global Step: 187010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:47,759-Speed 10266.05 samples/sec   Loss 5.2793   LearningRate 0.0006   Epoch: 36   Global Step: 187020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:48,737-Speed 10480.92 samples/sec   Loss 5.2338   LearningRate 0.0006   Epoch: 36   Global Step: 187030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:49,686-Speed 10799.20 samples/sec   Loss 5.1014   LearningRate 0.0006   Epoch: 36   Global Step: 187040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:50,721-Speed 9906.25 samples/sec   Loss 5.3074   LearningRate 0.0006   Epoch: 36   Global Step: 187050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:51,758-Speed 9900.20 samples/sec   Loss 5.2004   LearningRate 0.0006   Epoch: 36   Global Step: 187060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:52,744-Speed 10422.53 samples/sec   Loss 5.2019   LearningRate 0.0006   Epoch: 36   Global Step: 187070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:53,728-Speed 10415.06 samples/sec   Loss 5.2683   LearningRate 0.0006   Epoch: 36   Global Step: 187080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:54,743-Speed 10091.42 samples/sec   Loss 5.3225   LearningRate 0.0006   Epoch: 36   Global Step: 187090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:55,751-Speed 10174.74 samples/sec   Loss 5.0986   LearningRate 0.0006   Epoch: 36   Global Step: 187100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:56,728-Speed 10491.14 samples/sec   Loss 5.1823   LearningRate 0.0006   Epoch: 36   Global Step: 187110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:14:57,664-Speed 10945.89 samples/sec   Loss 5.2875   LearningRate 0.0006   Epoch: 36   Global Step: 187120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:58,638-Speed 10521.67 samples/sec   Loss 5.2222   LearningRate 0.0006   Epoch: 36   Global Step: 187130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:14:59,679-Speed 9850.24 samples/sec   Loss 5.3796   LearningRate 0.0006   Epoch: 36   Global Step: 187140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:15:11,519-Speed 865.05 samples/sec   Loss 5.3079   LearningRate 0.0006   Epoch: 37   Global Step: 187150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:15:12,719-Speed 8537.71 samples/sec   Loss 5.1544   LearningRate 0.0006   Epoch: 37   Global Step: 187160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:15:13,958-Speed 8275.58 samples/sec   Loss 5.2136   LearningRate 0.0006   Epoch: 37   Global Step: 187170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:15:15,009-Speed 9753.35 samples/sec   Loss 5.0573   LearningRate 0.0006   Epoch: 37   Global Step: 187180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:15:15,969-Speed 10668.11 samples/sec   Loss 5.1795   LearningRate 0.0006   Epoch: 37   Global Step: 187190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:15:17,212-Speed 8244.62 samples/sec   Loss 5.1351   LearningRate 0.0006   Epoch: 37   Global Step: 187200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:15:18,321-Speed 9243.87 samples/sec   Loss 5.1669   LearningRate 0.0006   Epoch: 37   Global Step: 187210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:15:19,343-Speed 10030.91 samples/sec   Loss 5.1041   LearningRate 0.0006   Epoch: 37   Global Step: 187220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:20,327-Speed 10409.44 samples/sec   Loss 5.0714   LearningRate 0.0006   Epoch: 37   Global Step: 187230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:21,343-Speed 10093.70 samples/sec   Loss 5.0392   LearningRate 0.0006   Epoch: 37   Global Step: 187240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:22,388-Speed 9810.63 samples/sec   Loss 5.1445   LearningRate 0.0006   Epoch: 37   Global Step: 187250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:23,380-Speed 10333.91 samples/sec   Loss 5.1486   LearningRate 0.0006   Epoch: 37   Global Step: 187260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:24,354-Speed 10517.33 samples/sec   Loss 5.1500   LearningRate 0.0006   Epoch: 37   Global Step: 187270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:25,411-Speed 9695.17 samples/sec   Loss 5.1318   LearningRate 0.0006   Epoch: 37   Global Step: 187280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:26,438-Speed 9987.34 samples/sec   Loss 5.2081   LearningRate 0.0006   Epoch: 37   Global Step: 187290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:27,530-Speed 9389.44 samples/sec   Loss 5.2706   LearningRate 0.0006   Epoch: 37   Global Step: 187300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:28,508-Speed 10477.66 samples/sec   Loss 5.1419   LearningRate 0.0006   Epoch: 37   Global Step: 187310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:15:29,520-Speed 10122.09 samples/sec   Loss 5.2170   LearningRate 0.0005   Epoch: 37   Global Step: 187320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:15:30,547-Speed 9977.39 samples/sec   Loss 5.2075   LearningRate 0.0005   Epoch: 37   Global Step: 187330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:15:31,640-Speed 9378.13 samples/sec   Loss 5.0881   LearningRate 0.0005   Epoch: 37   Global Step: 187340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:15:32,664-Speed 10007.90 samples/sec   Loss 5.1390   LearningRate 0.0005   Epoch: 37   Global Step: 187350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:15:33,805-Speed 8982.16 samples/sec   Loss 5.1677   LearningRate 0.0005   Epoch: 37   Global Step: 187360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:15:34,884-Speed 9491.30 samples/sec   Loss 5.0601   LearningRate 0.0005   Epoch: 37   Global Step: 187370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:15:35,873-Speed 10367.70 samples/sec   Loss 5.2278   LearningRate 0.0005   Epoch: 37   Global Step: 187380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:15:36,920-Speed 9782.97 samples/sec   Loss 5.1399   LearningRate 0.0005   Epoch: 37   Global Step: 187390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:15:37,942-Speed 10028.42 samples/sec   Loss 5.2274   LearningRate 0.0005   Epoch: 37   Global Step: 187400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:15:38,945-Speed 10219.37 samples/sec   Loss 5.1347   LearningRate 0.0005   Epoch: 37   Global Step: 187410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:39,924-Speed 10468.74 samples/sec   Loss 5.1985   LearningRate 0.0005   Epoch: 37   Global Step: 187420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:40,882-Speed 10699.29 samples/sec   Loss 5.1249   LearningRate 0.0005   Epoch: 37   Global Step: 187430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:41,923-Speed 9844.42 samples/sec   Loss 5.0816   LearningRate 0.0005   Epoch: 37   Global Step: 187440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:42,927-Speed 10212.02 samples/sec   Loss 5.0830   LearningRate 0.0005   Epoch: 37   Global Step: 187450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:43,897-Speed 10567.24 samples/sec   Loss 5.2363   LearningRate 0.0005   Epoch: 37   Global Step: 187460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:44,873-Speed 10499.13 samples/sec   Loss 5.1696   LearningRate 0.0005   Epoch: 37   Global Step: 187470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:45,856-Speed 10423.79 samples/sec   Loss 5.0924   LearningRate 0.0005   Epoch: 37   Global Step: 187480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:46,874-Speed 10063.72 samples/sec   Loss 5.0270   LearningRate 0.0005   Epoch: 37   Global Step: 187490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:47,858-Speed 10417.25 samples/sec   Loss 5.1540   LearningRate 0.0005   Epoch: 37   Global Step: 187500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:48,876-Speed 10072.39 samples/sec   Loss 5.2164   LearningRate 0.0005   Epoch: 37   Global Step: 187510   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 06:15:49,853-Speed 10489.62 samples/sec   Loss 5.1729   LearningRate 0.0005   Epoch: 37   Global Step: 187520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:50,903-Speed 9757.84 samples/sec   Loss 5.1227   LearningRate 0.0005   Epoch: 37   Global Step: 187530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:51,886-Speed 10425.74 samples/sec   Loss 5.2489   LearningRate 0.0005   Epoch: 37   Global Step: 187540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:52,844-Speed 10700.14 samples/sec   Loss 5.2156   LearningRate 0.0005   Epoch: 37   Global Step: 187550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:53,869-Speed 10002.52 samples/sec   Loss 5.0119   LearningRate 0.0005   Epoch: 37   Global Step: 187560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:54,919-Speed 9759.88 samples/sec   Loss 5.2230   LearningRate 0.0005   Epoch: 37   Global Step: 187570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:55,927-Speed 10169.26 samples/sec   Loss 5.1761   LearningRate 0.0005   Epoch: 37   Global Step: 187580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:56,895-Speed 10581.89 samples/sec   Loss 5.1313   LearningRate 0.0005   Epoch: 37   Global Step: 187590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:57,928-Speed 9920.84 samples/sec   Loss 5.2035   LearningRate 0.0005   Epoch: 37   Global Step: 187600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:58,901-Speed 10539.59 samples/sec   Loss 5.2588   LearningRate 0.0005   Epoch: 37   Global Step: 187610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:15:59,895-Speed 10309.70 samples/sec   Loss 5.1044   LearningRate 0.0005   Epoch: 37   Global Step: 187620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:00,824-Speed 11029.79 samples/sec   Loss 5.1411   LearningRate 0.0005   Epoch: 37   Global Step: 187630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:01,811-Speed 10389.60 samples/sec   Loss 5.1082   LearningRate 0.0005   Epoch: 37   Global Step: 187640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:02,807-Speed 10292.15 samples/sec   Loss 5.0866   LearningRate 0.0005   Epoch: 37   Global Step: 187650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:03,829-Speed 10036.23 samples/sec   Loss 5.1682   LearningRate 0.0005   Epoch: 37   Global Step: 187660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:04,852-Speed 10015.85 samples/sec   Loss 5.1749   LearningRate 0.0005   Epoch: 37   Global Step: 187670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:05,787-Speed 10963.06 samples/sec   Loss 5.2687   LearningRate 0.0005   Epoch: 37   Global Step: 187680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:06,751-Speed 10634.90 samples/sec   Loss 5.1181   LearningRate 0.0005   Epoch: 37   Global Step: 187690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:07,750-Speed 10251.14 samples/sec   Loss 5.0355   LearningRate 0.0005   Epoch: 37   Global Step: 187700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:08,788-Speed 9874.74 samples/sec   Loss 4.9924   LearningRate 0.0005   Epoch: 37   Global Step: 187710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:09,809-Speed 10038.95 samples/sec   Loss 5.1984   LearningRate 0.0005   Epoch: 37   Global Step: 187720   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 06:16:10,767-Speed 10699.80 samples/sec   Loss 5.1484   LearningRate 0.0005   Epoch: 37   Global Step: 187730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:11,772-Speed 10192.58 samples/sec   Loss 5.2005   LearningRate 0.0005   Epoch: 37   Global Step: 187740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:12,842-Speed 9583.96 samples/sec   Loss 5.2404   LearningRate 0.0005   Epoch: 37   Global Step: 187750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:13,824-Speed 10436.48 samples/sec   Loss 5.1911   LearningRate 0.0005   Epoch: 37   Global Step: 187760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:14,824-Speed 10242.10 samples/sec   Loss 5.1976   LearningRate 0.0005   Epoch: 37   Global Step: 187770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:15,880-Speed 9709.51 samples/sec   Loss 5.0520   LearningRate 0.0005   Epoch: 37   Global Step: 187780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:16,880-Speed 10249.32 samples/sec   Loss 5.0977   LearningRate 0.0005   Epoch: 37   Global Step: 187790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:17,820-Speed 10910.01 samples/sec   Loss 5.2346   LearningRate 0.0005   Epoch: 37   Global Step: 187800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:18,770-Speed 10780.19 samples/sec   Loss 5.0293   LearningRate 0.0005   Epoch: 37   Global Step: 187810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:19,716-Speed 10838.78 samples/sec   Loss 5.1486   LearningRate 0.0005   Epoch: 37   Global Step: 187820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:20,698-Speed 10427.75 samples/sec   Loss 5.1334   LearningRate 0.0005   Epoch: 37   Global Step: 187830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:21,690-Speed 10336.51 samples/sec   Loss 5.1966   LearningRate 0.0005   Epoch: 37   Global Step: 187840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:22,708-Speed 10064.43 samples/sec   Loss 5.3663   LearningRate 0.0005   Epoch: 37   Global Step: 187850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:23,663-Speed 10735.26 samples/sec   Loss 5.0638   LearningRate 0.0005   Epoch: 37   Global Step: 187860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:16:24,652-Speed 10357.02 samples/sec   Loss 5.1963   LearningRate 0.0005   Epoch: 37   Global Step: 187870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:16:25,683-Speed 9942.85 samples/sec   Loss 5.0966   LearningRate 0.0005   Epoch: 37   Global Step: 187880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:16:26,656-Speed 10532.07 samples/sec   Loss 5.1527   LearningRate 0.0005   Epoch: 37   Global Step: 187890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:16:27,647-Speed 10341.78 samples/sec   Loss 5.3073   LearningRate 0.0005   Epoch: 37   Global Step: 187900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:16:28,616-Speed 10569.99 samples/sec   Loss 5.1554   LearningRate 0.0005   Epoch: 37   Global Step: 187910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:16:29,610-Speed 10310.32 samples/sec   Loss 5.1276   LearningRate 0.0005   Epoch: 37   Global Step: 187920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:16:30,634-Speed 10186.78 samples/sec   Loss 5.1364   LearningRate 0.0005   Epoch: 37   Global Step: 187930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:16:31,604-Speed 10566.49 samples/sec   Loss 5.1787   LearningRate 0.0005   Epoch: 37   Global Step: 187940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:16:32,586-Speed 10435.35 samples/sec   Loss 5.1314   LearningRate 0.0005   Epoch: 37   Global Step: 187950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:16:33,527-Speed 10886.76 samples/sec   Loss 5.2494   LearningRate 0.0005   Epoch: 37   Global Step: 187960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:34,609-Speed 9469.33 samples/sec   Loss 5.1503   LearningRate 0.0005   Epoch: 37   Global Step: 187970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:35,580-Speed 10560.45 samples/sec   Loss 5.1217   LearningRate 0.0005   Epoch: 37   Global Step: 187980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:36,577-Speed 10273.41 samples/sec   Loss 5.2082   LearningRate 0.0005   Epoch: 37   Global Step: 187990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:16:37,568-Speed 10347.96 samples/sec   Loss 5.2707   LearningRate 0.0005   Epoch: 37   Global Step: 188000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:17:00,343-[lfw][188000]XNorm: 8.005090
Training: 2022-04-11 06:17:00,344-[lfw][188000]Accuracy-Flip: 0.99583+-0.00352
Training: 2022-04-11 06:17:00,345-[lfw][188000]Accuracy-Highest: 0.99700
Training: 2022-04-11 06:17:25,947-[cfp_fp][188000]XNorm: 6.923191
Training: 2022-04-11 06:17:25,948-[cfp_fp][188000]Accuracy-Flip: 0.97214+-0.00855
Training: 2022-04-11 06:17:25,949-[cfp_fp][188000]Accuracy-Highest: 0.97371
Training: 2022-04-11 06:17:48,044-[agedb_30][188000]XNorm: 7.835423
Training: 2022-04-11 06:17:48,044-[agedb_30][188000]Accuracy-Flip: 0.97250+-0.00672
Training: 2022-04-11 06:17:48,045-[agedb_30][188000]Accuracy-Highest: 0.97350
Training: 2022-04-11 06:17:49,026-Speed 143.30 samples/sec   Loss 5.2279   LearningRate 0.0005   Epoch: 37   Global Step: 188010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:17:50,019-Speed 10311.13 samples/sec   Loss 5.1553   LearningRate 0.0005   Epoch: 37   Global Step: 188020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:17:50,989-Speed 10574.30 samples/sec   Loss 5.1908   LearningRate 0.0005   Epoch: 37   Global Step: 188030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:17:51,974-Speed 10426.68 samples/sec   Loss 5.0871   LearningRate 0.0005   Epoch: 37   Global Step: 188040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:17:52,942-Speed 10591.91 samples/sec   Loss 5.1922   LearningRate 0.0005   Epoch: 37   Global Step: 188050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:17:53,967-Speed 9998.21 samples/sec   Loss 5.0151   LearningRate 0.0005   Epoch: 37   Global Step: 188060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:17:54,942-Speed 10520.76 samples/sec   Loss 5.2661   LearningRate 0.0005   Epoch: 37   Global Step: 188070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:17:55,896-Speed 10738.93 samples/sec   Loss 5.1157   LearningRate 0.0005   Epoch: 37   Global Step: 188080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:17:56,860-Speed 10630.12 samples/sec   Loss 5.0052   LearningRate 0.0005   Epoch: 37   Global Step: 188090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:17:57,874-Speed 10119.08 samples/sec   Loss 5.1667   LearningRate 0.0005   Epoch: 37   Global Step: 188100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:17:58,875-Speed 10249.14 samples/sec   Loss 5.1180   LearningRate 0.0005   Epoch: 37   Global Step: 188110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:17:59,868-Speed 10317.57 samples/sec   Loss 5.2240   LearningRate 0.0005   Epoch: 37   Global Step: 188120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:18:00,829-Speed 10661.61 samples/sec   Loss 5.1808   LearningRate 0.0005   Epoch: 37   Global Step: 188130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:18:01,822-Speed 10326.22 samples/sec   Loss 5.1484   LearningRate 0.0005   Epoch: 37   Global Step: 188140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:18:02,783-Speed 10670.70 samples/sec   Loss 5.1529   LearningRate 0.0005   Epoch: 37   Global Step: 188150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:18:03,778-Speed 10294.87 samples/sec   Loss 5.1281   LearningRate 0.0005   Epoch: 37   Global Step: 188160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:04,746-Speed 10591.86 samples/sec   Loss 5.1495   LearningRate 0.0005   Epoch: 37   Global Step: 188170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:05,751-Speed 10193.81 samples/sec   Loss 5.0974   LearningRate 0.0005   Epoch: 37   Global Step: 188180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:06,740-Speed 10370.76 samples/sec   Loss 5.0406   LearningRate 0.0005   Epoch: 37   Global Step: 188190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:07,752-Speed 10145.19 samples/sec   Loss 5.1194   LearningRate 0.0005   Epoch: 37   Global Step: 188200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:08,711-Speed 10691.52 samples/sec   Loss 5.0717   LearningRate 0.0005   Epoch: 37   Global Step: 188210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:09,737-Speed 9992.04 samples/sec   Loss 5.1073   LearningRate 0.0005   Epoch: 37   Global Step: 188220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:10,774-Speed 9883.40 samples/sec   Loss 5.0077   LearningRate 0.0005   Epoch: 37   Global Step: 188230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:11,734-Speed 10681.20 samples/sec   Loss 5.2627   LearningRate 0.0005   Epoch: 37   Global Step: 188240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:12,730-Speed 10284.94 samples/sec   Loss 5.2412   LearningRate 0.0005   Epoch: 37   Global Step: 188250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:13,735-Speed 10211.87 samples/sec   Loss 5.0978   LearningRate 0.0005   Epoch: 37   Global Step: 188260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:14,726-Speed 10347.37 samples/sec   Loss 5.0853   LearningRate 0.0005   Epoch: 37   Global Step: 188270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:15,688-Speed 10656.73 samples/sec   Loss 5.1448   LearningRate 0.0005   Epoch: 37   Global Step: 188280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:16,663-Speed 10513.13 samples/sec   Loss 5.1188   LearningRate 0.0005   Epoch: 37   Global Step: 188290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:17,678-Speed 10092.84 samples/sec   Loss 5.0941   LearningRate 0.0005   Epoch: 37   Global Step: 188300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:18,637-Speed 10696.52 samples/sec   Loss 5.1560   LearningRate 0.0005   Epoch: 37   Global Step: 188310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:19,652-Speed 10093.77 samples/sec   Loss 5.1716   LearningRate 0.0005   Epoch: 37   Global Step: 188320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:20,662-Speed 10141.72 samples/sec   Loss 5.1804   LearningRate 0.0005   Epoch: 37   Global Step: 188330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:21,690-Speed 9972.61 samples/sec   Loss 5.1204   LearningRate 0.0005   Epoch: 37   Global Step: 188340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:22,690-Speed 10248.70 samples/sec   Loss 5.1155   LearningRate 0.0005   Epoch: 37   Global Step: 188350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:23,657-Speed 10608.94 samples/sec   Loss 5.1574   LearningRate 0.0005   Epoch: 37   Global Step: 188360   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 06:18:24,643-Speed 10387.83 samples/sec   Loss 5.2107   LearningRate 0.0005   Epoch: 37   Global Step: 188370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:25,637-Speed 10327.15 samples/sec   Loss 5.2030   LearningRate 0.0005   Epoch: 37   Global Step: 188380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:18:26,592-Speed 10732.92 samples/sec   Loss 5.2687   LearningRate 0.0005   Epoch: 37   Global Step: 188390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:18:27,593-Speed 10246.18 samples/sec   Loss 5.0867   LearningRate 0.0005   Epoch: 37   Global Step: 188400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:18:28,608-Speed 10098.08 samples/sec   Loss 5.0950   LearningRate 0.0005   Epoch: 37   Global Step: 188410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:18:29,625-Speed 10073.05 samples/sec   Loss 5.0853   LearningRate 0.0005   Epoch: 37   Global Step: 188420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:18:30,606-Speed 10456.95 samples/sec   Loss 5.1349   LearningRate 0.0005   Epoch: 37   Global Step: 188430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:18:31,601-Speed 10298.57 samples/sec   Loss 5.1391   LearningRate 0.0005   Epoch: 37   Global Step: 188440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:18:32,642-Speed 9855.69 samples/sec   Loss 5.0878   LearningRate 0.0005   Epoch: 37   Global Step: 188450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:18:33,614-Speed 10546.66 samples/sec   Loss 5.1943   LearningRate 0.0005   Epoch: 37   Global Step: 188460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:18:34,576-Speed 10655.66 samples/sec   Loss 5.1121   LearningRate 0.0005   Epoch: 37   Global Step: 188470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:18:35,628-Speed 9736.61 samples/sec   Loss 5.2541   LearningRate 0.0005   Epoch: 37   Global Step: 188480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:36,590-Speed 10654.66 samples/sec   Loss 5.0333   LearningRate 0.0005   Epoch: 37   Global Step: 188490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:37,576-Speed 10402.88 samples/sec   Loss 5.0950   LearningRate 0.0005   Epoch: 37   Global Step: 188500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:38,579-Speed 10224.46 samples/sec   Loss 5.1319   LearningRate 0.0005   Epoch: 37   Global Step: 188510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:39,601-Speed 10025.57 samples/sec   Loss 5.3075   LearningRate 0.0005   Epoch: 37   Global Step: 188520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:40,643-Speed 9832.00 samples/sec   Loss 5.0487   LearningRate 0.0005   Epoch: 37   Global Step: 188530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:41,636-Speed 10322.92 samples/sec   Loss 5.2374   LearningRate 0.0005   Epoch: 37   Global Step: 188540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:42,663-Speed 9984.17 samples/sec   Loss 5.1172   LearningRate 0.0005   Epoch: 37   Global Step: 188550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:43,632-Speed 10579.40 samples/sec   Loss 5.0725   LearningRate 0.0005   Epoch: 37   Global Step: 188560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:44,649-Speed 10082.89 samples/sec   Loss 4.9712   LearningRate 0.0005   Epoch: 37   Global Step: 188570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:45,619-Speed 10565.57 samples/sec   Loss 5.1346   LearningRate 0.0005   Epoch: 37   Global Step: 188580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:46,580-Speed 10673.42 samples/sec   Loss 5.1420   LearningRate 0.0005   Epoch: 37   Global Step: 188590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:47,584-Speed 10202.72 samples/sec   Loss 5.2234   LearningRate 0.0005   Epoch: 37   Global Step: 188600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:48,596-Speed 10139.31 samples/sec   Loss 5.1403   LearningRate 0.0005   Epoch: 37   Global Step: 188610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:49,610-Speed 10097.70 samples/sec   Loss 5.0474   LearningRate 0.0005   Epoch: 37   Global Step: 188620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:50,685-Speed 9531.37 samples/sec   Loss 5.1679   LearningRate 0.0005   Epoch: 37   Global Step: 188630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:51,683-Speed 10273.97 samples/sec   Loss 5.1564   LearningRate 0.0005   Epoch: 37   Global Step: 188640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:52,684-Speed 10246.47 samples/sec   Loss 5.1726   LearningRate 0.0005   Epoch: 37   Global Step: 188650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:53,703-Speed 10054.37 samples/sec   Loss 5.2498   LearningRate 0.0005   Epoch: 37   Global Step: 188660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:54,709-Speed 10184.45 samples/sec   Loss 5.0010   LearningRate 0.0005   Epoch: 37   Global Step: 188670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:55,714-Speed 10207.50 samples/sec   Loss 5.1926   LearningRate 0.0005   Epoch: 37   Global Step: 188680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:18:56,686-Speed 10536.56 samples/sec   Loss 5.2409   LearningRate 0.0005   Epoch: 37   Global Step: 188690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:18:57,639-Speed 10757.46 samples/sec   Loss 5.0961   LearningRate 0.0005   Epoch: 37   Global Step: 188700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:18:58,658-Speed 10059.68 samples/sec   Loss 5.1331   LearningRate 0.0005   Epoch: 37   Global Step: 188710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:18:59,634-Speed 10508.34 samples/sec   Loss 5.1692   LearningRate 0.0005   Epoch: 37   Global Step: 188720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:19:00,621-Speed 10381.60 samples/sec   Loss 5.1451   LearningRate 0.0005   Epoch: 37   Global Step: 188730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:19:01,626-Speed 10205.22 samples/sec   Loss 5.0267   LearningRate 0.0005   Epoch: 37   Global Step: 188740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:19:02,607-Speed 10456.38 samples/sec   Loss 5.1220   LearningRate 0.0004   Epoch: 37   Global Step: 188750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:19:03,559-Speed 10762.38 samples/sec   Loss 5.0665   LearningRate 0.0004   Epoch: 37   Global Step: 188760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:19:04,660-Speed 9324.38 samples/sec   Loss 5.1705   LearningRate 0.0004   Epoch: 37   Global Step: 188770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:19:05,644-Speed 10420.35 samples/sec   Loss 5.2382   LearningRate 0.0004   Epoch: 37   Global Step: 188780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:19:06,623-Speed 10467.12 samples/sec   Loss 5.0842   LearningRate 0.0004   Epoch: 37   Global Step: 188790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:19:07,642-Speed 10059.63 samples/sec   Loss 5.1477   LearningRate 0.0004   Epoch: 37   Global Step: 188800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:19:08,736-Speed 9375.39 samples/sec   Loss 5.2765   LearningRate 0.0004   Epoch: 37   Global Step: 188810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:19:09,738-Speed 10244.80 samples/sec   Loss 5.2098   LearningRate 0.0004   Epoch: 37   Global Step: 188820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:19:10,715-Speed 10488.12 samples/sec   Loss 5.1423   LearningRate 0.0004   Epoch: 37   Global Step: 188830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:19:11,677-Speed 10649.06 samples/sec   Loss 5.1632   LearningRate 0.0004   Epoch: 37   Global Step: 188840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:19:12,701-Speed 10037.66 samples/sec   Loss 5.1715   LearningRate 0.0004   Epoch: 37   Global Step: 188850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:19:13,679-Speed 10482.31 samples/sec   Loss 5.1090   LearningRate 0.0004   Epoch: 37   Global Step: 188860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:19:14,691-Speed 10119.31 samples/sec   Loss 5.0647   LearningRate 0.0004   Epoch: 37   Global Step: 188870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:19:15,655-Speed 10641.28 samples/sec   Loss 5.1078   LearningRate 0.0004   Epoch: 37   Global Step: 188880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:19:16,659-Speed 10200.17 samples/sec   Loss 5.1537   LearningRate 0.0004   Epoch: 37   Global Step: 188890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:19:17,723-Speed 9639.80 samples/sec   Loss 5.0822   LearningRate 0.0004   Epoch: 37   Global Step: 188900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:19:18,725-Speed 10232.17 samples/sec   Loss 5.0339   LearningRate 0.0004   Epoch: 37   Global Step: 188910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:19:19,807-Speed 9469.36 samples/sec   Loss 5.1010   LearningRate 0.0004   Epoch: 37   Global Step: 188920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:19:20,778-Speed 10549.19 samples/sec   Loss 5.1050   LearningRate 0.0004   Epoch: 37   Global Step: 188930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:19:21,805-Speed 9984.60 samples/sec   Loss 5.0611   LearningRate 0.0004   Epoch: 37   Global Step: 188940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:19:22,801-Speed 10290.75 samples/sec   Loss 5.1691   LearningRate 0.0004   Epoch: 37   Global Step: 188950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 06:19:23,860-Speed 9679.80 samples/sec   Loss 5.1424   LearningRate 0.0004   Epoch: 37   Global Step: 188960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 06:19:24,869-Speed 10152.05 samples/sec   Loss 5.1979   LearningRate 0.0004   Epoch: 37   Global Step: 188970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:25,885-Speed 10085.68 samples/sec   Loss 5.1304   LearningRate 0.0004   Epoch: 37   Global Step: 188980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:26,902-Speed 10078.01 samples/sec   Loss 5.0848   LearningRate 0.0004   Epoch: 37   Global Step: 188990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:27,952-Speed 9769.93 samples/sec   Loss 5.0928   LearningRate 0.0004   Epoch: 37   Global Step: 189000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:28,894-Speed 10885.33 samples/sec   Loss 5.0138   LearningRate 0.0004   Epoch: 37   Global Step: 189010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:29,876-Speed 10431.56 samples/sec   Loss 5.1682   LearningRate 0.0004   Epoch: 37   Global Step: 189020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:30,847-Speed 10553.32 samples/sec   Loss 5.1930   LearningRate 0.0004   Epoch: 37   Global Step: 189030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:31,853-Speed 10186.50 samples/sec   Loss 5.2219   LearningRate 0.0004   Epoch: 37   Global Step: 189040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:32,829-Speed 10508.57 samples/sec   Loss 5.1360   LearningRate 0.0004   Epoch: 37   Global Step: 189050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:19:33,810-Speed 10444.39 samples/sec   Loss 5.2048   LearningRate 0.0004   Epoch: 37   Global Step: 189060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:19:34,796-Speed 10394.28 samples/sec   Loss 5.1830   LearningRate 0.0004   Epoch: 37   Global Step: 189070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:19:35,825-Speed 9954.63 samples/sec   Loss 5.2344   LearningRate 0.0004   Epoch: 37   Global Step: 189080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:19:36,811-Speed 10401.84 samples/sec   Loss 5.2146   LearningRate 0.0004   Epoch: 37   Global Step: 189090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:19:37,869-Speed 9693.08 samples/sec   Loss 5.1779   LearningRate 0.0004   Epoch: 37   Global Step: 189100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:19:38,862-Speed 10315.57 samples/sec   Loss 5.0567   LearningRate 0.0004   Epoch: 37   Global Step: 189110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:19:39,850-Speed 10383.41 samples/sec   Loss 5.2551   LearningRate 0.0004   Epoch: 37   Global Step: 189120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:19:40,873-Speed 10014.74 samples/sec   Loss 5.0594   LearningRate 0.0004   Epoch: 37   Global Step: 189130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:19:41,860-Speed 10384.39 samples/sec   Loss 5.1278   LearningRate 0.0004   Epoch: 37   Global Step: 189140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:19:42,859-Speed 10259.74 samples/sec   Loss 5.2080   LearningRate 0.0004   Epoch: 37   Global Step: 189150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:43,821-Speed 10657.19 samples/sec   Loss 5.1795   LearningRate 0.0004   Epoch: 37   Global Step: 189160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:44,837-Speed 10083.84 samples/sec   Loss 5.1315   LearningRate 0.0004   Epoch: 37   Global Step: 189170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:45,837-Speed 10248.12 samples/sec   Loss 5.1541   LearningRate 0.0004   Epoch: 37   Global Step: 189180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:46,848-Speed 10142.45 samples/sec   Loss 5.1262   LearningRate 0.0004   Epoch: 37   Global Step: 189190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:47,871-Speed 10016.82 samples/sec   Loss 5.0639   LearningRate 0.0004   Epoch: 37   Global Step: 189200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:48,867-Speed 10289.42 samples/sec   Loss 5.1355   LearningRate 0.0004   Epoch: 37   Global Step: 189210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:49,889-Speed 10030.48 samples/sec   Loss 5.0733   LearningRate 0.0004   Epoch: 37   Global Step: 189220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:50,830-Speed 10887.48 samples/sec   Loss 4.9623   LearningRate 0.0004   Epoch: 37   Global Step: 189230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:51,840-Speed 10156.79 samples/sec   Loss 5.1249   LearningRate 0.0004   Epoch: 37   Global Step: 189240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:52,868-Speed 9966.07 samples/sec   Loss 5.1101   LearningRate 0.0004   Epoch: 37   Global Step: 189250   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-11 06:19:53,832-Speed 10639.26 samples/sec   Loss 5.0656   LearningRate 0.0004   Epoch: 37   Global Step: 189260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:54,847-Speed 10097.03 samples/sec   Loss 5.0977   LearningRate 0.0004   Epoch: 37   Global Step: 189270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:55,850-Speed 10223.73 samples/sec   Loss 5.1375   LearningRate 0.0004   Epoch: 37   Global Step: 189280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:56,830-Speed 10455.82 samples/sec   Loss 5.3188   LearningRate 0.0004   Epoch: 37   Global Step: 189290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:57,844-Speed 10109.51 samples/sec   Loss 5.1182   LearningRate 0.0004   Epoch: 37   Global Step: 189300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:58,846-Speed 10224.03 samples/sec   Loss 5.1994   LearningRate 0.0004   Epoch: 37   Global Step: 189310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:19:59,814-Speed 10592.05 samples/sec   Loss 5.1252   LearningRate 0.0004   Epoch: 37   Global Step: 189320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:00,777-Speed 10644.17 samples/sec   Loss 5.2567   LearningRate 0.0004   Epoch: 37   Global Step: 189330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:01,838-Speed 9654.84 samples/sec   Loss 5.1404   LearningRate 0.0004   Epoch: 37   Global Step: 189340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:02,856-Speed 10070.23 samples/sec   Loss 5.2803   LearningRate 0.0004   Epoch: 37   Global Step: 189350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:03,861-Speed 10206.25 samples/sec   Loss 5.0419   LearningRate 0.0004   Epoch: 37   Global Step: 189360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:04,885-Speed 10021.36 samples/sec   Loss 5.1082   LearningRate 0.0004   Epoch: 37   Global Step: 189370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:05,910-Speed 9988.82 samples/sec   Loss 5.1916   LearningRate 0.0004   Epoch: 37   Global Step: 189380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:06,920-Speed 10155.19 samples/sec   Loss 5.2364   LearningRate 0.0004   Epoch: 37   Global Step: 189390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:20:07,924-Speed 10210.55 samples/sec   Loss 5.0985   LearningRate 0.0004   Epoch: 37   Global Step: 189400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:20:08,959-Speed 9895.71 samples/sec   Loss 5.1806   LearningRate 0.0004   Epoch: 37   Global Step: 189410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:20:09,943-Speed 10417.10 samples/sec   Loss 5.2825   LearningRate 0.0004   Epoch: 37   Global Step: 189420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:20:10,948-Speed 10206.68 samples/sec   Loss 5.2218   LearningRate 0.0004   Epoch: 37   Global Step: 189430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:20:11,950-Speed 10219.89 samples/sec   Loss 5.0390   LearningRate 0.0004   Epoch: 37   Global Step: 189440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:20:12,930-Speed 10460.31 samples/sec   Loss 5.1416   LearningRate 0.0004   Epoch: 37   Global Step: 189450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:20:13,962-Speed 9942.26 samples/sec   Loss 5.1793   LearningRate 0.0004   Epoch: 37   Global Step: 189460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:20:14,937-Speed 10515.10 samples/sec   Loss 5.0964   LearningRate 0.0004   Epoch: 37   Global Step: 189470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:20:15,911-Speed 10518.63 samples/sec   Loss 5.1699   LearningRate 0.0004   Epoch: 37   Global Step: 189480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:20:16,890-Speed 10465.14 samples/sec   Loss 5.1881   LearningRate 0.0004   Epoch: 37   Global Step: 189490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:17,901-Speed 10140.00 samples/sec   Loss 5.0687   LearningRate 0.0004   Epoch: 37   Global Step: 189500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:18,866-Speed 10626.71 samples/sec   Loss 5.2499   LearningRate 0.0004   Epoch: 37   Global Step: 189510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:19,825-Speed 10682.76 samples/sec   Loss 5.2092   LearningRate 0.0004   Epoch: 37   Global Step: 189520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:20,830-Speed 10194.39 samples/sec   Loss 5.2388   LearningRate 0.0004   Epoch: 37   Global Step: 189530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:21,890-Speed 9689.37 samples/sec   Loss 5.2382   LearningRate 0.0004   Epoch: 37   Global Step: 189540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:22,849-Speed 10694.49 samples/sec   Loss 5.2875   LearningRate 0.0004   Epoch: 37   Global Step: 189550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:23,879-Speed 9957.41 samples/sec   Loss 5.2405   LearningRate 0.0004   Epoch: 37   Global Step: 189560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:24,862-Speed 10426.95 samples/sec   Loss 5.1815   LearningRate 0.0004   Epoch: 37   Global Step: 189570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:25,833-Speed 10555.90 samples/sec   Loss 5.0828   LearningRate 0.0004   Epoch: 37   Global Step: 189580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:20:26,838-Speed 10187.04 samples/sec   Loss 5.0818   LearningRate 0.0004   Epoch: 37   Global Step: 189590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:20:27,876-Speed 9881.87 samples/sec   Loss 5.1560   LearningRate 0.0004   Epoch: 37   Global Step: 189600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:20:28,939-Speed 9639.81 samples/sec   Loss 5.2438   LearningRate 0.0004   Epoch: 37   Global Step: 189610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:20:29,913-Speed 10530.19 samples/sec   Loss 5.1465   LearningRate 0.0004   Epoch: 37   Global Step: 189620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:20:30,851-Speed 10924.60 samples/sec   Loss 5.1965   LearningRate 0.0004   Epoch: 37   Global Step: 189630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:20:31,823-Speed 10552.77 samples/sec   Loss 5.1456   LearningRate 0.0004   Epoch: 37   Global Step: 189640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:20:32,817-Speed 10315.05 samples/sec   Loss 5.1821   LearningRate 0.0004   Epoch: 37   Global Step: 189650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:20:33,789-Speed 10552.61 samples/sec   Loss 5.1553   LearningRate 0.0004   Epoch: 37   Global Step: 189660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:20:34,812-Speed 10021.50 samples/sec   Loss 5.1672   LearningRate 0.0004   Epoch: 37   Global Step: 189670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:20:35,807-Speed 10302.24 samples/sec   Loss 5.2376   LearningRate 0.0004   Epoch: 37   Global Step: 189680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:36,776-Speed 10574.09 samples/sec   Loss 5.2106   LearningRate 0.0004   Epoch: 37   Global Step: 189690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:37,763-Speed 10387.11 samples/sec   Loss 5.1249   LearningRate 0.0004   Epoch: 37   Global Step: 189700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:38,834-Speed 9576.65 samples/sec   Loss 5.2188   LearningRate 0.0004   Epoch: 37   Global Step: 189710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:39,799-Speed 10619.84 samples/sec   Loss 5.2102   LearningRate 0.0004   Epoch: 37   Global Step: 189720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:40,765-Speed 10609.75 samples/sec   Loss 5.0782   LearningRate 0.0004   Epoch: 37   Global Step: 189730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:41,787-Speed 10024.37 samples/sec   Loss 5.1065   LearningRate 0.0004   Epoch: 37   Global Step: 189740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:42,781-Speed 10328.23 samples/sec   Loss 5.2377   LearningRate 0.0004   Epoch: 37   Global Step: 189750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:43,744-Speed 10637.48 samples/sec   Loss 5.1705   LearningRate 0.0004   Epoch: 37   Global Step: 189760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:44,765-Speed 10048.02 samples/sec   Loss 5.2288   LearningRate 0.0004   Epoch: 37   Global Step: 189770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:45,740-Speed 10519.35 samples/sec   Loss 5.2340   LearningRate 0.0004   Epoch: 37   Global Step: 189780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:46,703-Speed 10632.83 samples/sec   Loss 5.1576   LearningRate 0.0004   Epoch: 37   Global Step: 189790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:47,759-Speed 9725.72 samples/sec   Loss 5.2063   LearningRate 0.0004   Epoch: 37   Global Step: 189800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:48,832-Speed 9556.16 samples/sec   Loss 5.0928   LearningRate 0.0004   Epoch: 37   Global Step: 189810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:49,805-Speed 10557.77 samples/sec   Loss 5.0442   LearningRate 0.0004   Epoch: 37   Global Step: 189820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:50,817-Speed 10125.17 samples/sec   Loss 5.0432   LearningRate 0.0004   Epoch: 37   Global Step: 189830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:51,824-Speed 10185.19 samples/sec   Loss 4.9275   LearningRate 0.0004   Epoch: 37   Global Step: 189840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:52,783-Speed 10689.66 samples/sec   Loss 5.3620   LearningRate 0.0004   Epoch: 37   Global Step: 189850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:53,764-Speed 10444.22 samples/sec   Loss 5.2627   LearningRate 0.0004   Epoch: 37   Global Step: 189860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:54,782-Speed 10070.07 samples/sec   Loss 5.1680   LearningRate 0.0004   Epoch: 37   Global Step: 189870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:55,738-Speed 10719.84 samples/sec   Loss 5.1574   LearningRate 0.0004   Epoch: 37   Global Step: 189880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:56,754-Speed 10086.21 samples/sec   Loss 5.2279   LearningRate 0.0004   Epoch: 37   Global Step: 189890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:57,782-Speed 9971.35 samples/sec   Loss 5.0694   LearningRate 0.0004   Epoch: 37   Global Step: 189900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:58,804-Speed 10025.51 samples/sec   Loss 5.1123   LearningRate 0.0004   Epoch: 37   Global Step: 189910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:20:59,778-Speed 10528.19 samples/sec   Loss 5.3602   LearningRate 0.0004   Epoch: 37   Global Step: 189920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:21:00,768-Speed 10350.70 samples/sec   Loss 5.1411   LearningRate 0.0004   Epoch: 37   Global Step: 189930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:21:01,839-Speed 9576.48 samples/sec   Loss 5.0655   LearningRate 0.0004   Epoch: 37   Global Step: 189940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:21:02,834-Speed 10306.62 samples/sec   Loss 5.1526   LearningRate 0.0004   Epoch: 37   Global Step: 189950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:21:03,876-Speed 9836.63 samples/sec   Loss 5.1670   LearningRate 0.0004   Epoch: 37   Global Step: 189960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:21:04,872-Speed 10288.94 samples/sec   Loss 5.0471   LearningRate 0.0004   Epoch: 37   Global Step: 189970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:21:05,835-Speed 10648.65 samples/sec   Loss 5.2070   LearningRate 0.0004   Epoch: 37   Global Step: 189980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:21:06,825-Speed 10350.40 samples/sec   Loss 5.1078   LearningRate 0.0004   Epoch: 37   Global Step: 189990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:21:07,826-Speed 10251.19 samples/sec   Loss 5.2457   LearningRate 0.0004   Epoch: 37   Global Step: 190000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:21:30,538-[lfw][190000]XNorm: 7.975839
Training: 2022-04-11 06:21:30,539-[lfw][190000]Accuracy-Flip: 0.99667+-0.00333
Training: 2022-04-11 06:21:30,539-[lfw][190000]Accuracy-Highest: 0.99700
Training: 2022-04-11 06:21:55,953-[cfp_fp][190000]XNorm: 6.900032
Training: 2022-04-11 06:21:55,954-[cfp_fp][190000]Accuracy-Flip: 0.97214+-0.00931
Training: 2022-04-11 06:21:55,955-[cfp_fp][190000]Accuracy-Highest: 0.97371
Training: 2022-04-11 06:22:18,226-[agedb_30][190000]XNorm: 7.801801
Training: 2022-04-11 06:22:18,226-[agedb_30][190000]Accuracy-Flip: 0.97050+-0.00796
Training: 2022-04-11 06:22:18,227-[agedb_30][190000]Accuracy-Highest: 0.97350
Training: 2022-04-11 06:22:19,180-Speed 143.51 samples/sec   Loss 5.1708   LearningRate 0.0004   Epoch: 37   Global Step: 190010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:20,188-Speed 10160.40 samples/sec   Loss 5.0994   LearningRate 0.0004   Epoch: 37   Global Step: 190020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:21,221-Speed 9926.69 samples/sec   Loss 5.2592   LearningRate 0.0004   Epoch: 37   Global Step: 190030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:22:22,174-Speed 10756.09 samples/sec   Loss 5.2181   LearningRate 0.0004   Epoch: 37   Global Step: 190040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:22:23,184-Speed 10143.20 samples/sec   Loss 5.1454   LearningRate 0.0004   Epoch: 37   Global Step: 190050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:22:24,150-Speed 10608.44 samples/sec   Loss 5.1429   LearningRate 0.0004   Epoch: 37   Global Step: 190060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:25,197-Speed 9790.92 samples/sec   Loss 5.1205   LearningRate 0.0004   Epoch: 37   Global Step: 190070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:26,178-Speed 10443.41 samples/sec   Loss 5.0178   LearningRate 0.0004   Epoch: 37   Global Step: 190080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:27,113-Speed 10969.37 samples/sec   Loss 5.2998   LearningRate 0.0004   Epoch: 37   Global Step: 190090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:28,039-Speed 11072.56 samples/sec   Loss 5.2894   LearningRate 0.0004   Epoch: 37   Global Step: 190100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:29,094-Speed 9717.07 samples/sec   Loss 5.1422   LearningRate 0.0004   Epoch: 37   Global Step: 190110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:30,134-Speed 9852.50 samples/sec   Loss 5.1105   LearningRate 0.0004   Epoch: 37   Global Step: 190120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:31,115-Speed 10451.78 samples/sec   Loss 5.1272   LearningRate 0.0004   Epoch: 37   Global Step: 190130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:32,167-Speed 9741.81 samples/sec   Loss 5.0588   LearningRate 0.0004   Epoch: 37   Global Step: 190140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:33,210-Speed 9823.01 samples/sec   Loss 5.2095   LearningRate 0.0004   Epoch: 37   Global Step: 190150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:34,176-Speed 10617.19 samples/sec   Loss 5.1760   LearningRate 0.0004   Epoch: 37   Global Step: 190160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:22:35,173-Speed 10274.26 samples/sec   Loss 5.2608   LearningRate 0.0004   Epoch: 37   Global Step: 190170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:22:36,157-Speed 10418.41 samples/sec   Loss 5.2987   LearningRate 0.0004   Epoch: 37   Global Step: 190180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:22:37,155-Speed 10263.60 samples/sec   Loss 5.1780   LearningRate 0.0004   Epoch: 37   Global Step: 190190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:22:38,152-Speed 10285.68 samples/sec   Loss 5.1745   LearningRate 0.0004   Epoch: 37   Global Step: 190200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:22:39,122-Speed 10564.06 samples/sec   Loss 5.1849   LearningRate 0.0004   Epoch: 37   Global Step: 190210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:22:40,122-Speed 10256.08 samples/sec   Loss 5.1027   LearningRate 0.0004   Epoch: 37   Global Step: 190220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:22:41,156-Speed 9912.44 samples/sec   Loss 5.2063   LearningRate 0.0004   Epoch: 37   Global Step: 190230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:42,167-Speed 10140.91 samples/sec   Loss 5.1725   LearningRate 0.0004   Epoch: 37   Global Step: 190240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:43,155-Speed 10378.59 samples/sec   Loss 5.1124   LearningRate 0.0004   Epoch: 37   Global Step: 190250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:44,186-Speed 9939.76 samples/sec   Loss 5.2582   LearningRate 0.0004   Epoch: 37   Global Step: 190260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:45,187-Speed 10234.74 samples/sec   Loss 4.9969   LearningRate 0.0004   Epoch: 37   Global Step: 190270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:46,149-Speed 10660.72 samples/sec   Loss 5.1432   LearningRate 0.0004   Epoch: 37   Global Step: 190280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:47,182-Speed 9924.73 samples/sec   Loss 5.2777   LearningRate 0.0004   Epoch: 37   Global Step: 190290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:48,164-Speed 10448.06 samples/sec   Loss 5.1073   LearningRate 0.0004   Epoch: 37   Global Step: 190300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:49,175-Speed 10136.14 samples/sec   Loss 5.1940   LearningRate 0.0004   Epoch: 37   Global Step: 190310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:50,186-Speed 10131.97 samples/sec   Loss 5.1171   LearningRate 0.0004   Epoch: 37   Global Step: 190320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:51,196-Speed 10154.58 samples/sec   Loss 5.0104   LearningRate 0.0004   Epoch: 37   Global Step: 190330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:22:52,217-Speed 10029.66 samples/sec   Loss 5.2494   LearningRate 0.0004   Epoch: 37   Global Step: 190340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:22:53,209-Speed 10332.89 samples/sec   Loss 5.1205   LearningRate 0.0004   Epoch: 37   Global Step: 190350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:54,232-Speed 10026.79 samples/sec   Loss 5.2204   LearningRate 0.0003   Epoch: 37   Global Step: 190360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:55,181-Speed 10801.31 samples/sec   Loss 5.1932   LearningRate 0.0003   Epoch: 37   Global Step: 190370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:56,194-Speed 10117.44 samples/sec   Loss 5.1931   LearningRate 0.0003   Epoch: 37   Global Step: 190380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:57,160-Speed 10608.43 samples/sec   Loss 5.0440   LearningRate 0.0003   Epoch: 37   Global Step: 190390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:58,166-Speed 10180.61 samples/sec   Loss 5.1451   LearningRate 0.0003   Epoch: 37   Global Step: 190400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:22:59,157-Speed 10348.48 samples/sec   Loss 5.2081   LearningRate 0.0003   Epoch: 37   Global Step: 190410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:00,133-Speed 10497.68 samples/sec   Loss 5.2839   LearningRate 0.0003   Epoch: 37   Global Step: 190420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:01,114-Speed 10453.38 samples/sec   Loss 5.1175   LearningRate 0.0003   Epoch: 37   Global Step: 190430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:02,084-Speed 10574.18 samples/sec   Loss 5.0837   LearningRate 0.0003   Epoch: 37   Global Step: 190440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:03,091-Speed 10174.64 samples/sec   Loss 5.1290   LearningRate 0.0003   Epoch: 37   Global Step: 190450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:04,061-Speed 10567.33 samples/sec   Loss 5.1369   LearningRate 0.0003   Epoch: 37   Global Step: 190460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:05,114-Speed 9731.14 samples/sec   Loss 5.1675   LearningRate 0.0003   Epoch: 37   Global Step: 190470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:06,070-Speed 10732.92 samples/sec   Loss 4.9408   LearningRate 0.0003   Epoch: 37   Global Step: 190480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:07,027-Speed 10700.25 samples/sec   Loss 5.0635   LearningRate 0.0003   Epoch: 37   Global Step: 190490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:08,023-Speed 10298.99 samples/sec   Loss 5.2038   LearningRate 0.0003   Epoch: 37   Global Step: 190500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:09,007-Speed 10419.26 samples/sec   Loss 5.1908   LearningRate 0.0003   Epoch: 37   Global Step: 190510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:10,014-Speed 10187.65 samples/sec   Loss 5.1511   LearningRate 0.0003   Epoch: 37   Global Step: 190520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:10,983-Speed 10572.83 samples/sec   Loss 5.2565   LearningRate 0.0003   Epoch: 37   Global Step: 190530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:12,009-Speed 9985.04 samples/sec   Loss 5.1973   LearningRate 0.0003   Epoch: 37   Global Step: 190540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:12,984-Speed 10518.16 samples/sec   Loss 5.1397   LearningRate 0.0003   Epoch: 37   Global Step: 190550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:13,975-Speed 10335.90 samples/sec   Loss 4.9429   LearningRate 0.0003   Epoch: 37   Global Step: 190560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:14,968-Speed 10321.85 samples/sec   Loss 5.1677   LearningRate 0.0003   Epoch: 37   Global Step: 190570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:15,963-Speed 10306.24 samples/sec   Loss 5.1550   LearningRate 0.0003   Epoch: 37   Global Step: 190580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:16,953-Speed 10351.23 samples/sec   Loss 5.2567   LearningRate 0.0003   Epoch: 37   Global Step: 190590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:17,951-Speed 10272.88 samples/sec   Loss 5.2572   LearningRate 0.0003   Epoch: 37   Global Step: 190600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:18,955-Speed 10208.35 samples/sec   Loss 5.0927   LearningRate 0.0003   Epoch: 37   Global Step: 190610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:19,979-Speed 10004.54 samples/sec   Loss 5.2188   LearningRate 0.0003   Epoch: 37   Global Step: 190620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:20,968-Speed 10365.72 samples/sec   Loss 5.0719   LearningRate 0.0003   Epoch: 37   Global Step: 190630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:21,969-Speed 10235.09 samples/sec   Loss 5.2528   LearningRate 0.0003   Epoch: 37   Global Step: 190640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:22,944-Speed 10522.84 samples/sec   Loss 5.1682   LearningRate 0.0003   Epoch: 37   Global Step: 190650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:23,935-Speed 10346.76 samples/sec   Loss 5.1375   LearningRate 0.0003   Epoch: 37   Global Step: 190660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:24,928-Speed 10314.52 samples/sec   Loss 5.2260   LearningRate 0.0003   Epoch: 37   Global Step: 190670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:25,905-Speed 10489.79 samples/sec   Loss 5.0978   LearningRate 0.0003   Epoch: 37   Global Step: 190680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:26,892-Speed 10384.81 samples/sec   Loss 5.1435   LearningRate 0.0003   Epoch: 37   Global Step: 190690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:27,861-Speed 10581.97 samples/sec   Loss 5.1453   LearningRate 0.0003   Epoch: 37   Global Step: 190700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:28,880-Speed 10062.09 samples/sec   Loss 5.1555   LearningRate 0.0003   Epoch: 37   Global Step: 190710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:29,870-Speed 10354.61 samples/sec   Loss 5.1940   LearningRate 0.0003   Epoch: 37   Global Step: 190720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:30,862-Speed 10325.18 samples/sec   Loss 5.0871   LearningRate 0.0003   Epoch: 37   Global Step: 190730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:31,871-Speed 10160.81 samples/sec   Loss 5.1660   LearningRate 0.0003   Epoch: 37   Global Step: 190740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:32,825-Speed 10747.31 samples/sec   Loss 4.9964   LearningRate 0.0003   Epoch: 37   Global Step: 190750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:33,783-Speed 10700.80 samples/sec   Loss 5.3106   LearningRate 0.0003   Epoch: 37   Global Step: 190760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:34,770-Speed 10387.90 samples/sec   Loss 5.1279   LearningRate 0.0003   Epoch: 37   Global Step: 190770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:35,719-Speed 10793.57 samples/sec   Loss 5.3870   LearningRate 0.0003   Epoch: 37   Global Step: 190780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:36,746-Speed 9978.43 samples/sec   Loss 5.1003   LearningRate 0.0003   Epoch: 37   Global Step: 190790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:37,741-Speed 10316.08 samples/sec   Loss 5.1651   LearningRate 0.0003   Epoch: 37   Global Step: 190800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:38,727-Speed 10396.15 samples/sec   Loss 5.1891   LearningRate 0.0003   Epoch: 37   Global Step: 190810   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-11 06:23:39,677-Speed 10786.78 samples/sec   Loss 5.1955   LearningRate 0.0003   Epoch: 37   Global Step: 190820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:40,681-Speed 10207.64 samples/sec   Loss 5.2142   LearningRate 0.0003   Epoch: 37   Global Step: 190830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:41,671-Speed 10345.62 samples/sec   Loss 5.1064   LearningRate 0.0003   Epoch: 37   Global Step: 190840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:42,674-Speed 10226.48 samples/sec   Loss 5.1798   LearningRate 0.0003   Epoch: 37   Global Step: 190850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:43,650-Speed 10499.36 samples/sec   Loss 5.3792   LearningRate 0.0003   Epoch: 37   Global Step: 190860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:44,603-Speed 10753.92 samples/sec   Loss 5.2796   LearningRate 0.0003   Epoch: 37   Global Step: 190870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:45,602-Speed 10254.93 samples/sec   Loss 5.1997   LearningRate 0.0003   Epoch: 37   Global Step: 190880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:46,618-Speed 10089.54 samples/sec   Loss 5.1049   LearningRate 0.0003   Epoch: 37   Global Step: 190890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:47,598-Speed 10463.62 samples/sec   Loss 5.0696   LearningRate 0.0003   Epoch: 37   Global Step: 190900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:48,569-Speed 10559.21 samples/sec   Loss 5.1155   LearningRate 0.0003   Epoch: 37   Global Step: 190910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:49,578-Speed 10153.54 samples/sec   Loss 5.2254   LearningRate 0.0003   Epoch: 37   Global Step: 190920   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-11 06:23:50,554-Speed 10503.67 samples/sec   Loss 5.2328   LearningRate 0.0003   Epoch: 37   Global Step: 190930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:51,547-Speed 10323.68 samples/sec   Loss 5.0858   LearningRate 0.0003   Epoch: 37   Global Step: 190940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:23:52,551-Speed 10203.38 samples/sec   Loss 5.0991   LearningRate 0.0003   Epoch: 37   Global Step: 190950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:53,526-Speed 10513.20 samples/sec   Loss 5.1548   LearningRate 0.0003   Epoch: 37   Global Step: 190960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:54,509-Speed 10426.83 samples/sec   Loss 4.9692   LearningRate 0.0003   Epoch: 37   Global Step: 190970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:55,517-Speed 10171.71 samples/sec   Loss 5.1876   LearningRate 0.0003   Epoch: 37   Global Step: 190980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:56,479-Speed 10654.23 samples/sec   Loss 5.1307   LearningRate 0.0003   Epoch: 37   Global Step: 190990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:57,419-Speed 10898.95 samples/sec   Loss 5.0654   LearningRate 0.0003   Epoch: 37   Global Step: 191000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:58,437-Speed 10071.19 samples/sec   Loss 5.2139   LearningRate 0.0003   Epoch: 37   Global Step: 191010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:23:59,408-Speed 10552.89 samples/sec   Loss 5.0502   LearningRate 0.0003   Epoch: 37   Global Step: 191020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:00,449-Speed 9840.31 samples/sec   Loss 5.1371   LearningRate 0.0003   Epoch: 37   Global Step: 191030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:01,420-Speed 10555.67 samples/sec   Loss 5.0921   LearningRate 0.0003   Epoch: 37   Global Step: 191040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:02,422-Speed 10234.95 samples/sec   Loss 5.0042   LearningRate 0.0003   Epoch: 37   Global Step: 191050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:03,434-Speed 10129.79 samples/sec   Loss 5.2123   LearningRate 0.0003   Epoch: 37   Global Step: 191060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:04,560-Speed 9102.26 samples/sec   Loss 5.1625   LearningRate 0.0003   Epoch: 37   Global Step: 191070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:05,516-Speed 10722.36 samples/sec   Loss 5.1713   LearningRate 0.0003   Epoch: 37   Global Step: 191080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:06,487-Speed 10551.88 samples/sec   Loss 5.1241   LearningRate 0.0003   Epoch: 37   Global Step: 191090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:07,501-Speed 10109.61 samples/sec   Loss 5.2586   LearningRate 0.0003   Epoch: 37   Global Step: 191100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:08,543-Speed 9837.08 samples/sec   Loss 5.2361   LearningRate 0.0003   Epoch: 37   Global Step: 191110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:09,544-Speed 10232.89 samples/sec   Loss 5.1440   LearningRate 0.0003   Epoch: 37   Global Step: 191120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:10,506-Speed 10658.10 samples/sec   Loss 5.0595   LearningRate 0.0003   Epoch: 37   Global Step: 191130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:11,485-Speed 10465.85 samples/sec   Loss 5.2473   LearningRate 0.0003   Epoch: 37   Global Step: 191140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:12,512-Speed 9980.37 samples/sec   Loss 5.0970   LearningRate 0.0003   Epoch: 37   Global Step: 191150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:13,525-Speed 10122.51 samples/sec   Loss 5.1871   LearningRate 0.0003   Epoch: 37   Global Step: 191160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:14,496-Speed 10555.10 samples/sec   Loss 5.2172   LearningRate 0.0003   Epoch: 37   Global Step: 191170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:15,526-Speed 9945.98 samples/sec   Loss 5.1864   LearningRate 0.0003   Epoch: 37   Global Step: 191180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:16,493-Speed 10602.77 samples/sec   Loss 5.1315   LearningRate 0.0003   Epoch: 37   Global Step: 191190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:17,481-Speed 10378.43 samples/sec   Loss 5.2038   LearningRate 0.0003   Epoch: 37   Global Step: 191200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:18,428-Speed 10818.13 samples/sec   Loss 5.1355   LearningRate 0.0003   Epoch: 37   Global Step: 191210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:19,458-Speed 9954.81 samples/sec   Loss 5.2528   LearningRate 0.0003   Epoch: 37   Global Step: 191220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:20,438-Speed 10461.47 samples/sec   Loss 5.2563   LearningRate 0.0003   Epoch: 37   Global Step: 191230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:21,393-Speed 10721.33 samples/sec   Loss 5.1326   LearningRate 0.0003   Epoch: 37   Global Step: 191240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:22,394-Speed 10241.84 samples/sec   Loss 5.1014   LearningRate 0.0003   Epoch: 37   Global Step: 191250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:23,363-Speed 10619.84 samples/sec   Loss 5.2970   LearningRate 0.0003   Epoch: 37   Global Step: 191260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:24,332-Speed 10579.08 samples/sec   Loss 5.1768   LearningRate 0.0003   Epoch: 37   Global Step: 191270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:25,301-Speed 10581.86 samples/sec   Loss 5.1418   LearningRate 0.0003   Epoch: 37   Global Step: 191280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:26,274-Speed 10535.08 samples/sec   Loss 5.1539   LearningRate 0.0003   Epoch: 37   Global Step: 191290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:27,252-Speed 10481.41 samples/sec   Loss 5.2179   LearningRate 0.0003   Epoch: 37   Global Step: 191300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:28,261-Speed 10163.09 samples/sec   Loss 5.1372   LearningRate 0.0003   Epoch: 37   Global Step: 191310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:29,207-Speed 10837.89 samples/sec   Loss 5.1737   LearningRate 0.0003   Epoch: 37   Global Step: 191320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:30,175-Speed 10587.44 samples/sec   Loss 5.1916   LearningRate 0.0003   Epoch: 37   Global Step: 191330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:31,268-Speed 9376.75 samples/sec   Loss 5.1189   LearningRate 0.0003   Epoch: 37   Global Step: 191340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:32,301-Speed 9930.49 samples/sec   Loss 5.1757   LearningRate 0.0003   Epoch: 37   Global Step: 191350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:33,279-Speed 10483.93 samples/sec   Loss 5.1153   LearningRate 0.0003   Epoch: 37   Global Step: 191360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:34,329-Speed 9767.87 samples/sec   Loss 5.2037   LearningRate 0.0003   Epoch: 37   Global Step: 191370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:35,324-Speed 10304.35 samples/sec   Loss 5.0267   LearningRate 0.0003   Epoch: 37   Global Step: 191380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:36,317-Speed 10327.46 samples/sec   Loss 5.2615   LearningRate 0.0003   Epoch: 37   Global Step: 191390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:37,308-Speed 10354.45 samples/sec   Loss 5.2172   LearningRate 0.0003   Epoch: 37   Global Step: 191400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:38,300-Speed 10335.46 samples/sec   Loss 5.2198   LearningRate 0.0003   Epoch: 37   Global Step: 191410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:39,332-Speed 9928.62 samples/sec   Loss 5.0820   LearningRate 0.0003   Epoch: 37   Global Step: 191420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:40,327-Speed 10309.83 samples/sec   Loss 5.1653   LearningRate 0.0003   Epoch: 37   Global Step: 191430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:41,304-Speed 10480.13 samples/sec   Loss 5.1911   LearningRate 0.0003   Epoch: 37   Global Step: 191440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:42,357-Speed 9735.05 samples/sec   Loss 5.1729   LearningRate 0.0003   Epoch: 37   Global Step: 191450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:43,325-Speed 10596.36 samples/sec   Loss 5.1448   LearningRate 0.0003   Epoch: 37   Global Step: 191460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:44,342-Speed 10078.17 samples/sec   Loss 5.0846   LearningRate 0.0003   Epoch: 37   Global Step: 191470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:45,370-Speed 9974.83 samples/sec   Loss 5.0628   LearningRate 0.0003   Epoch: 37   Global Step: 191480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:46,374-Speed 10208.74 samples/sec   Loss 5.0985   LearningRate 0.0003   Epoch: 37   Global Step: 191490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:47,323-Speed 10797.78 samples/sec   Loss 5.0950   LearningRate 0.0003   Epoch: 37   Global Step: 191500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:48,310-Speed 10384.03 samples/sec   Loss 5.2076   LearningRate 0.0003   Epoch: 37   Global Step: 191510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:49,307-Speed 10285.62 samples/sec   Loss 5.2262   LearningRate 0.0003   Epoch: 37   Global Step: 191520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:50,221-Speed 11217.00 samples/sec   Loss 5.2969   LearningRate 0.0003   Epoch: 37   Global Step: 191530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:51,165-Speed 10857.97 samples/sec   Loss 5.2850   LearningRate 0.0003   Epoch: 37   Global Step: 191540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:52,197-Speed 9933.98 samples/sec   Loss 5.1680   LearningRate 0.0003   Epoch: 37   Global Step: 191550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:24:53,172-Speed 10505.37 samples/sec   Loss 5.1478   LearningRate 0.0003   Epoch: 37   Global Step: 191560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:54,195-Speed 10017.05 samples/sec   Loss 5.1350   LearningRate 0.0003   Epoch: 37   Global Step: 191570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:55,174-Speed 10477.17 samples/sec   Loss 5.1247   LearningRate 0.0003   Epoch: 37   Global Step: 191580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:56,172-Speed 10264.52 samples/sec   Loss 5.0052   LearningRate 0.0003   Epoch: 37   Global Step: 191590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:57,190-Speed 10067.94 samples/sec   Loss 5.1573   LearningRate 0.0003   Epoch: 37   Global Step: 191600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:58,166-Speed 10498.07 samples/sec   Loss 5.3635   LearningRate 0.0003   Epoch: 37   Global Step: 191610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:24:59,104-Speed 10927.59 samples/sec   Loss 5.1547   LearningRate 0.0003   Epoch: 37   Global Step: 191620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:00,087-Speed 10434.44 samples/sec   Loss 5.1361   LearningRate 0.0003   Epoch: 37   Global Step: 191630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:01,055-Speed 10580.93 samples/sec   Loss 5.2225   LearningRate 0.0003   Epoch: 37   Global Step: 191640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:02,057-Speed 10227.08 samples/sec   Loss 5.1707   LearningRate 0.0003   Epoch: 37   Global Step: 191650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:03,014-Speed 10720.89 samples/sec   Loss 5.1812   LearningRate 0.0003   Epoch: 37   Global Step: 191660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:25:03,971-Speed 10711.95 samples/sec   Loss 5.1179   LearningRate 0.0003   Epoch: 37   Global Step: 191670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:25:04,978-Speed 10174.16 samples/sec   Loss 5.1680   LearningRate 0.0003   Epoch: 37   Global Step: 191680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:25:05,991-Speed 10111.20 samples/sec   Loss 5.1144   LearningRate 0.0003   Epoch: 37   Global Step: 191690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:25:06,985-Speed 10315.80 samples/sec   Loss 5.1181   LearningRate 0.0003   Epoch: 37   Global Step: 191700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:07,953-Speed 10591.23 samples/sec   Loss 5.1966   LearningRate 0.0003   Epoch: 37   Global Step: 191710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:08,952-Speed 10258.74 samples/sec   Loss 5.2218   LearningRate 0.0003   Epoch: 37   Global Step: 191720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:09,995-Speed 9818.51 samples/sec   Loss 5.1586   LearningRate 0.0003   Epoch: 37   Global Step: 191730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:11,050-Speed 9727.83 samples/sec   Loss 5.1851   LearningRate 0.0003   Epoch: 37   Global Step: 191740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:12,013-Speed 10640.13 samples/sec   Loss 5.1850   LearningRate 0.0003   Epoch: 37   Global Step: 191750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:13,035-Speed 10032.93 samples/sec   Loss 5.2257   LearningRate 0.0003   Epoch: 37   Global Step: 191760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:14,011-Speed 10512.84 samples/sec   Loss 5.1082   LearningRate 0.0003   Epoch: 37   Global Step: 191770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:14,985-Speed 10525.01 samples/sec   Loss 5.3050   LearningRate 0.0003   Epoch: 37   Global Step: 191780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:15,965-Speed 10454.06 samples/sec   Loss 5.0632   LearningRate 0.0003   Epoch: 37   Global Step: 191790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:16,942-Speed 10494.74 samples/sec   Loss 5.1746   LearningRate 0.0003   Epoch: 37   Global Step: 191800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:25:17,948-Speed 10187.61 samples/sec   Loss 5.3221   LearningRate 0.0003   Epoch: 37   Global Step: 191810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:25:18,916-Speed 10595.94 samples/sec   Loss 5.1471   LearningRate 0.0003   Epoch: 37   Global Step: 191820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:25:19,926-Speed 10140.06 samples/sec   Loss 5.1947   LearningRate 0.0003   Epoch: 37   Global Step: 191830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:25:20,904-Speed 10482.71 samples/sec   Loss 5.2247   LearningRate 0.0003   Epoch: 37   Global Step: 191840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:25:21,934-Speed 9946.52 samples/sec   Loss 5.1746   LearningRate 0.0003   Epoch: 37   Global Step: 191850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:25:22,920-Speed 10400.33 samples/sec   Loss 5.0833   LearningRate 0.0003   Epoch: 37   Global Step: 191860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:23,913-Speed 10317.95 samples/sec   Loss 5.1234   LearningRate 0.0003   Epoch: 37   Global Step: 191870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:24,889-Speed 10503.44 samples/sec   Loss 5.1265   LearningRate 0.0003   Epoch: 37   Global Step: 191880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:25,884-Speed 10301.89 samples/sec   Loss 5.1182   LearningRate 0.0003   Epoch: 37   Global Step: 191890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:26,951-Speed 9607.31 samples/sec   Loss 5.2067   LearningRate 0.0003   Epoch: 37   Global Step: 191900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:27,933-Speed 10433.86 samples/sec   Loss 5.2395   LearningRate 0.0003   Epoch: 37   Global Step: 191910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:28,905-Speed 10570.67 samples/sec   Loss 5.1792   LearningRate 0.0003   Epoch: 37   Global Step: 191920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:29,915-Speed 10155.16 samples/sec   Loss 5.2243   LearningRate 0.0003   Epoch: 37   Global Step: 191930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:30,896-Speed 10451.69 samples/sec   Loss 5.2619   LearningRate 0.0003   Epoch: 37   Global Step: 191940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:31,901-Speed 10189.52 samples/sec   Loss 5.2148   LearningRate 0.0003   Epoch: 37   Global Step: 191950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:25:32,897-Speed 10295.89 samples/sec   Loss 5.2238   LearningRate 0.0003   Epoch: 37   Global Step: 191960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:25:33,876-Speed 10465.98 samples/sec   Loss 5.1459   LearningRate 0.0003   Epoch: 37   Global Step: 191970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:25:34,839-Speed 10647.69 samples/sec   Loss 5.0191   LearningRate 0.0003   Epoch: 37   Global Step: 191980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:25:35,764-Speed 11083.39 samples/sec   Loss 5.0407   LearningRate 0.0003   Epoch: 37   Global Step: 191990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:25:36,772-Speed 10167.73 samples/sec   Loss 5.1931   LearningRate 0.0003   Epoch: 37   Global Step: 192000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:25:59,396-[lfw][192000]XNorm: 7.992332
Training: 2022-04-11 06:25:59,396-[lfw][192000]Accuracy-Flip: 0.99633+-0.00332
Training: 2022-04-11 06:25:59,397-[lfw][192000]Accuracy-Highest: 0.99700
Training: 2022-04-11 06:26:25,150-[cfp_fp][192000]XNorm: 6.921082
Training: 2022-04-11 06:26:25,151-[cfp_fp][192000]Accuracy-Flip: 0.97214+-0.00955
Training: 2022-04-11 06:26:25,152-[cfp_fp][192000]Accuracy-Highest: 0.97371
Training: 2022-04-11 06:26:47,495-[agedb_30][192000]XNorm: 7.819824
Training: 2022-04-11 06:26:47,496-[agedb_30][192000]Accuracy-Flip: 0.97250+-0.00638
Training: 2022-04-11 06:26:47,497-[agedb_30][192000]Accuracy-Highest: 0.97350
Training: 2022-04-11 06:26:48,460-Speed 142.84 samples/sec   Loss 5.1868   LearningRate 0.0003   Epoch: 37   Global Step: 192010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:26:49,476-Speed 10086.05 samples/sec   Loss 5.0754   LearningRate 0.0003   Epoch: 37   Global Step: 192020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:26:50,493-Speed 10074.78 samples/sec   Loss 5.1187   LearningRate 0.0003   Epoch: 37   Global Step: 192030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:26:51,474-Speed 10451.56 samples/sec   Loss 5.2518   LearningRate 0.0003   Epoch: 37   Global Step: 192040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:26:52,476-Speed 10235.27 samples/sec   Loss 5.2223   LearningRate 0.0003   Epoch: 37   Global Step: 192050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:26:53,430-Speed 10734.81 samples/sec   Loss 5.2062   LearningRate 0.0003   Epoch: 37   Global Step: 192060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:26:54,457-Speed 9974.79 samples/sec   Loss 5.0940   LearningRate 0.0003   Epoch: 37   Global Step: 192070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:26:55,454-Speed 10292.22 samples/sec   Loss 5.2242   LearningRate 0.0003   Epoch: 37   Global Step: 192080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:26:56,433-Speed 10468.02 samples/sec   Loss 5.2815   LearningRate 0.0003   Epoch: 37   Global Step: 192090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:26:57,388-Speed 10735.52 samples/sec   Loss 5.2228   LearningRate 0.0003   Epoch: 37   Global Step: 192100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:26:58,362-Speed 10529.79 samples/sec   Loss 5.2342   LearningRate 0.0003   Epoch: 37   Global Step: 192110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:26:59,357-Speed 10297.08 samples/sec   Loss 5.2582   LearningRate 0.0003   Epoch: 37   Global Step: 192120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:00,311-Speed 10742.24 samples/sec   Loss 5.1997   LearningRate 0.0003   Epoch: 37   Global Step: 192130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:01,336-Speed 10002.59 samples/sec   Loss 5.1548   LearningRate 0.0003   Epoch: 37   Global Step: 192140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:02,336-Speed 10246.68 samples/sec   Loss 5.1694   LearningRate 0.0003   Epoch: 37   Global Step: 192150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:03,313-Speed 10499.14 samples/sec   Loss 5.1915   LearningRate 0.0003   Epoch: 37   Global Step: 192160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:04,296-Speed 10424.13 samples/sec   Loss 5.1755   LearningRate 0.0003   Epoch: 37   Global Step: 192170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:05,280-Speed 10411.17 samples/sec   Loss 5.2458   LearningRate 0.0003   Epoch: 37   Global Step: 192180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:06,225-Speed 10843.11 samples/sec   Loss 5.1546   LearningRate 0.0003   Epoch: 37   Global Step: 192190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:07,312-Speed 9439.83 samples/sec   Loss 5.0938   LearningRate 0.0003   Epoch: 37   Global Step: 192200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:17,034-Speed 1053.51 samples/sec   Loss 5.2625   LearningRate 0.0002   Epoch: 38   Global Step: 192210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:18,270-Speed 8293.96 samples/sec   Loss 4.9929   LearningRate 0.0002   Epoch: 38   Global Step: 192220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:19,391-Speed 9140.87 samples/sec   Loss 4.9880   LearningRate 0.0002   Epoch: 38   Global Step: 192230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:20,377-Speed 10400.34 samples/sec   Loss 5.0069   LearningRate 0.0002   Epoch: 38   Global Step: 192240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:21,658-Speed 7999.15 samples/sec   Loss 5.2150   LearningRate 0.0002   Epoch: 38   Global Step: 192250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:22,716-Speed 9691.34 samples/sec   Loss 5.0869   LearningRate 0.0002   Epoch: 38   Global Step: 192260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:23,804-Speed 9422.94 samples/sec   Loss 4.9339   LearningRate 0.0002   Epoch: 38   Global Step: 192270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:24,789-Speed 10408.64 samples/sec   Loss 5.1217   LearningRate 0.0002   Epoch: 38   Global Step: 192280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:25,737-Speed 10802.74 samples/sec   Loss 5.0768   LearningRate 0.0002   Epoch: 38   Global Step: 192290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:26,806-Speed 9592.65 samples/sec   Loss 5.2384   LearningRate 0.0002   Epoch: 38   Global Step: 192300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:27,776-Speed 10576.36 samples/sec   Loss 5.1340   LearningRate 0.0002   Epoch: 38   Global Step: 192310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:28,768-Speed 10328.76 samples/sec   Loss 5.0266   LearningRate 0.0002   Epoch: 38   Global Step: 192320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:29,769-Speed 10236.07 samples/sec   Loss 5.0408   LearningRate 0.0002   Epoch: 38   Global Step: 192330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:30,745-Speed 10494.19 samples/sec   Loss 5.1204   LearningRate 0.0002   Epoch: 38   Global Step: 192340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:31,778-Speed 9938.33 samples/sec   Loss 5.0741   LearningRate 0.0002   Epoch: 38   Global Step: 192350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:32,797-Speed 10057.44 samples/sec   Loss 5.0352   LearningRate 0.0002   Epoch: 38   Global Step: 192360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:33,810-Speed 10115.51 samples/sec   Loss 5.0447   LearningRate 0.0002   Epoch: 38   Global Step: 192370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:34,895-Speed 9451.63 samples/sec   Loss 5.0697   LearningRate 0.0002   Epoch: 38   Global Step: 192380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:35,836-Speed 10901.27 samples/sec   Loss 5.1558   LearningRate 0.0002   Epoch: 38   Global Step: 192390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:36,810-Speed 10515.17 samples/sec   Loss 5.0587   LearningRate 0.0002   Epoch: 38   Global Step: 192400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:37,818-Speed 10171.80 samples/sec   Loss 4.9890   LearningRate 0.0002   Epoch: 38   Global Step: 192410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:38,869-Speed 9759.06 samples/sec   Loss 5.1253   LearningRate 0.0002   Epoch: 38   Global Step: 192420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:39,842-Speed 10532.94 samples/sec   Loss 5.2101   LearningRate 0.0002   Epoch: 38   Global Step: 192430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:40,848-Speed 10186.51 samples/sec   Loss 5.1358   LearningRate 0.0002   Epoch: 38   Global Step: 192440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:41,878-Speed 9952.92 samples/sec   Loss 5.1640   LearningRate 0.0002   Epoch: 38   Global Step: 192450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:42,842-Speed 10647.08 samples/sec   Loss 5.0733   LearningRate 0.0002   Epoch: 38   Global Step: 192460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:43,828-Speed 10386.15 samples/sec   Loss 5.1947   LearningRate 0.0002   Epoch: 38   Global Step: 192470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:44,904-Speed 9527.89 samples/sec   Loss 5.1845   LearningRate 0.0002   Epoch: 38   Global Step: 192480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:45,911-Speed 10185.23 samples/sec   Loss 5.1454   LearningRate 0.0002   Epoch: 38   Global Step: 192490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:46,886-Speed 10511.60 samples/sec   Loss 5.0341   LearningRate 0.0002   Epoch: 38   Global Step: 192500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:47,914-Speed 9964.85 samples/sec   Loss 5.0744   LearningRate 0.0002   Epoch: 38   Global Step: 192510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:48,916-Speed 10243.13 samples/sec   Loss 5.0685   LearningRate 0.0002   Epoch: 38   Global Step: 192520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:49,909-Speed 10357.97 samples/sec   Loss 5.1443   LearningRate 0.0002   Epoch: 38   Global Step: 192530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:50,964-Speed 9712.36 samples/sec   Loss 5.0534   LearningRate 0.0002   Epoch: 38   Global Step: 192540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:27:51,921-Speed 10714.67 samples/sec   Loss 5.2488   LearningRate 0.0002   Epoch: 38   Global Step: 192550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:52,914-Speed 10329.00 samples/sec   Loss 5.1079   LearningRate 0.0002   Epoch: 38   Global Step: 192560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:53,907-Speed 10318.24 samples/sec   Loss 5.0619   LearningRate 0.0002   Epoch: 38   Global Step: 192570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:54,937-Speed 9953.46 samples/sec   Loss 5.0784   LearningRate 0.0002   Epoch: 38   Global Step: 192580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:55,898-Speed 10673.72 samples/sec   Loss 5.1662   LearningRate 0.0002   Epoch: 38   Global Step: 192590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:56,885-Speed 10374.76 samples/sec   Loss 5.1014   LearningRate 0.0002   Epoch: 38   Global Step: 192600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:57,871-Speed 10390.88 samples/sec   Loss 5.1506   LearningRate 0.0002   Epoch: 38   Global Step: 192610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:58,862-Speed 10347.75 samples/sec   Loss 5.1864   LearningRate 0.0002   Epoch: 38   Global Step: 192620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:27:59,837-Speed 10512.60 samples/sec   Loss 5.1252   LearningRate 0.0002   Epoch: 38   Global Step: 192630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:00,849-Speed 10131.07 samples/sec   Loss 5.0608   LearningRate 0.0002   Epoch: 38   Global Step: 192640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:01,865-Speed 10083.31 samples/sec   Loss 5.1044   LearningRate 0.0002   Epoch: 38   Global Step: 192650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:02,870-Speed 10203.59 samples/sec   Loss 5.0375   LearningRate 0.0002   Epoch: 38   Global Step: 192660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:03,786-Speed 11189.32 samples/sec   Loss 5.1119   LearningRate 0.0002   Epoch: 38   Global Step: 192670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:04,735-Speed 10796.48 samples/sec   Loss 5.0257   LearningRate 0.0002   Epoch: 38   Global Step: 192680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:05,729-Speed 10313.93 samples/sec   Loss 4.9601   LearningRate 0.0002   Epoch: 38   Global Step: 192690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:06,862-Speed 9048.67 samples/sec   Loss 5.0376   LearningRate 0.0002   Epoch: 38   Global Step: 192700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:07,922-Speed 9674.51 samples/sec   Loss 5.1910   LearningRate 0.0002   Epoch: 38   Global Step: 192710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:09,093-Speed 8743.95 samples/sec   Loss 5.0373   LearningRate 0.0002   Epoch: 38   Global Step: 192720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:10,082-Speed 10367.26 samples/sec   Loss 5.0934   LearningRate 0.0002   Epoch: 38   Global Step: 192730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:11,106-Speed 10009.12 samples/sec   Loss 5.1019   LearningRate 0.0002   Epoch: 38   Global Step: 192740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:12,067-Speed 10672.15 samples/sec   Loss 5.0573   LearningRate 0.0002   Epoch: 38   Global Step: 192750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:13,149-Speed 9462.75 samples/sec   Loss 5.1148   LearningRate 0.0002   Epoch: 38   Global Step: 192760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:14,212-Speed 9644.41 samples/sec   Loss 5.1418   LearningRate 0.0002   Epoch: 38   Global Step: 192770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:15,187-Speed 10517.30 samples/sec   Loss 5.0352   LearningRate 0.0002   Epoch: 38   Global Step: 192780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:16,179-Speed 10333.68 samples/sec   Loss 5.2264   LearningRate 0.0002   Epoch: 38   Global Step: 192790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:17,096-Speed 11175.29 samples/sec   Loss 5.1789   LearningRate 0.0002   Epoch: 38   Global Step: 192800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:18,160-Speed 9640.58 samples/sec   Loss 5.0907   LearningRate 0.0002   Epoch: 38   Global Step: 192810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:19,170-Speed 10144.75 samples/sec   Loss 5.1745   LearningRate 0.0002   Epoch: 38   Global Step: 192820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:20,148-Speed 10486.95 samples/sec   Loss 5.1774   LearningRate 0.0002   Epoch: 38   Global Step: 192830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:21,129-Speed 10442.63 samples/sec   Loss 5.1552   LearningRate 0.0002   Epoch: 38   Global Step: 192840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:22,218-Speed 9412.36 samples/sec   Loss 5.1572   LearningRate 0.0002   Epoch: 38   Global Step: 192850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:23,178-Speed 10673.21 samples/sec   Loss 5.0504   LearningRate 0.0002   Epoch: 38   Global Step: 192860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:24,135-Speed 10709.96 samples/sec   Loss 5.1676   LearningRate 0.0002   Epoch: 38   Global Step: 192870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:25,108-Speed 10529.43 samples/sec   Loss 5.1090   LearningRate 0.0002   Epoch: 38   Global Step: 192880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:26,119-Speed 10138.32 samples/sec   Loss 5.1619   LearningRate 0.0002   Epoch: 38   Global Step: 192890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:27,113-Speed 10319.70 samples/sec   Loss 5.0954   LearningRate 0.0002   Epoch: 38   Global Step: 192900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:28,032-Speed 11148.92 samples/sec   Loss 5.0960   LearningRate 0.0002   Epoch: 38   Global Step: 192910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:29,038-Speed 10190.56 samples/sec   Loss 5.1248   LearningRate 0.0002   Epoch: 38   Global Step: 192920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:30,038-Speed 10248.33 samples/sec   Loss 5.1899   LearningRate 0.0002   Epoch: 38   Global Step: 192930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:31,017-Speed 10476.72 samples/sec   Loss 5.1972   LearningRate 0.0002   Epoch: 38   Global Step: 192940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:32,019-Speed 10229.17 samples/sec   Loss 5.1883   LearningRate 0.0002   Epoch: 38   Global Step: 192950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:33,024-Speed 10194.27 samples/sec   Loss 4.9824   LearningRate 0.0002   Epoch: 38   Global Step: 192960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:34,004-Speed 10463.60 samples/sec   Loss 5.1034   LearningRate 0.0002   Epoch: 38   Global Step: 192970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:34,992-Speed 10380.47 samples/sec   Loss 5.0857   LearningRate 0.0002   Epoch: 38   Global Step: 192980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:35,995-Speed 10213.02 samples/sec   Loss 5.1364   LearningRate 0.0002   Epoch: 38   Global Step: 192990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:36,995-Speed 10257.94 samples/sec   Loss 5.2875   LearningRate 0.0002   Epoch: 38   Global Step: 193000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:37,980-Speed 10403.34 samples/sec   Loss 5.1279   LearningRate 0.0002   Epoch: 38   Global Step: 193010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:39,016-Speed 9895.78 samples/sec   Loss 5.2195   LearningRate 0.0002   Epoch: 38   Global Step: 193020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:40,021-Speed 10201.11 samples/sec   Loss 5.0392   LearningRate 0.0002   Epoch: 38   Global Step: 193030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:41,040-Speed 10054.40 samples/sec   Loss 5.0265   LearningRate 0.0002   Epoch: 38   Global Step: 193040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:42,043-Speed 10222.92 samples/sec   Loss 5.1289   LearningRate 0.0002   Epoch: 38   Global Step: 193050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:43,007-Speed 10638.17 samples/sec   Loss 5.1825   LearningRate 0.0002   Epoch: 38   Global Step: 193060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:44,003-Speed 10293.99 samples/sec   Loss 5.0611   LearningRate 0.0002   Epoch: 38   Global Step: 193070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:45,040-Speed 9890.52 samples/sec   Loss 5.1358   LearningRate 0.0002   Epoch: 38   Global Step: 193080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:45,995-Speed 10729.68 samples/sec   Loss 5.1434   LearningRate 0.0002   Epoch: 38   Global Step: 193090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:46,992-Speed 10282.15 samples/sec   Loss 5.0235   LearningRate 0.0002   Epoch: 38   Global Step: 193100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:47,973-Speed 10441.34 samples/sec   Loss 5.0230   LearningRate 0.0002   Epoch: 38   Global Step: 193110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:49,000-Speed 9982.59 samples/sec   Loss 5.1298   LearningRate 0.0002   Epoch: 38   Global Step: 193120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:49,949-Speed 10802.66 samples/sec   Loss 5.1191   LearningRate 0.0002   Epoch: 38   Global Step: 193130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:50,964-Speed 10095.09 samples/sec   Loss 5.1465   LearningRate 0.0002   Epoch: 38   Global Step: 193140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:51,957-Speed 10325.61 samples/sec   Loss 5.0946   LearningRate 0.0002   Epoch: 38   Global Step: 193150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:52,946-Speed 10368.82 samples/sec   Loss 5.1495   LearningRate 0.0002   Epoch: 38   Global Step: 193160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:28:53,898-Speed 10760.14 samples/sec   Loss 5.1946   LearningRate 0.0002   Epoch: 38   Global Step: 193170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:54,898-Speed 10252.63 samples/sec   Loss 5.2743   LearningRate 0.0002   Epoch: 38   Global Step: 193180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:55,878-Speed 10454.74 samples/sec   Loss 5.1250   LearningRate 0.0002   Epoch: 38   Global Step: 193190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:56,864-Speed 10402.73 samples/sec   Loss 5.0875   LearningRate 0.0002   Epoch: 38   Global Step: 193200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:57,841-Speed 10490.26 samples/sec   Loss 5.1630   LearningRate 0.0002   Epoch: 38   Global Step: 193210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:58,878-Speed 9887.72 samples/sec   Loss 4.9691   LearningRate 0.0002   Epoch: 38   Global Step: 193220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:28:59,873-Speed 10304.90 samples/sec   Loss 4.9734   LearningRate 0.0002   Epoch: 38   Global Step: 193230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:00,897-Speed 10007.83 samples/sec   Loss 5.1388   LearningRate 0.0002   Epoch: 38   Global Step: 193240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:01,946-Speed 9779.54 samples/sec   Loss 5.0886   LearningRate 0.0002   Epoch: 38   Global Step: 193250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:02,920-Speed 10516.25 samples/sec   Loss 5.0473   LearningRate 0.0002   Epoch: 38   Global Step: 193260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:03,919-Speed 10266.24 samples/sec   Loss 5.1660   LearningRate 0.0002   Epoch: 38   Global Step: 193270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:04,965-Speed 9791.25 samples/sec   Loss 5.0352   LearningRate 0.0002   Epoch: 38   Global Step: 193280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:05,926-Speed 10679.18 samples/sec   Loss 5.0451   LearningRate 0.0002   Epoch: 38   Global Step: 193290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:06,880-Speed 10744.72 samples/sec   Loss 5.0778   LearningRate 0.0002   Epoch: 38   Global Step: 193300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:07,872-Speed 10329.65 samples/sec   Loss 5.1681   LearningRate 0.0002   Epoch: 38   Global Step: 193310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:08,886-Speed 10116.61 samples/sec   Loss 5.0766   LearningRate 0.0002   Epoch: 38   Global Step: 193320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:09,878-Speed 10323.44 samples/sec   Loss 5.0856   LearningRate 0.0002   Epoch: 38   Global Step: 193330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:10,884-Speed 10195.38 samples/sec   Loss 5.0831   LearningRate 0.0002   Epoch: 38   Global Step: 193340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:11,937-Speed 9731.29 samples/sec   Loss 5.0319   LearningRate 0.0002   Epoch: 38   Global Step: 193350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:12,931-Speed 10327.60 samples/sec   Loss 5.1008   LearningRate 0.0002   Epoch: 38   Global Step: 193360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:13,909-Speed 10475.27 samples/sec   Loss 5.0249   LearningRate 0.0002   Epoch: 38   Global Step: 193370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:14,914-Speed 10194.62 samples/sec   Loss 5.1816   LearningRate 0.0002   Epoch: 38   Global Step: 193380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:15,903-Speed 10368.23 samples/sec   Loss 5.1585   LearningRate 0.0002   Epoch: 38   Global Step: 193390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:16,893-Speed 10347.55 samples/sec   Loss 5.1735   LearningRate 0.0002   Epoch: 38   Global Step: 193400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:17,904-Speed 10149.51 samples/sec   Loss 5.1978   LearningRate 0.0002   Epoch: 38   Global Step: 193410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:18,912-Speed 10162.32 samples/sec   Loss 5.1252   LearningRate 0.0002   Epoch: 38   Global Step: 193420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:19,938-Speed 9988.06 samples/sec   Loss 5.0872   LearningRate 0.0002   Epoch: 38   Global Step: 193430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:20,961-Speed 10018.82 samples/sec   Loss 5.1518   LearningRate 0.0002   Epoch: 38   Global Step: 193440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:21,959-Speed 10268.64 samples/sec   Loss 5.2057   LearningRate 0.0002   Epoch: 38   Global Step: 193450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:23,012-Speed 9737.32 samples/sec   Loss 5.1048   LearningRate 0.0002   Epoch: 38   Global Step: 193460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:24,017-Speed 10191.51 samples/sec   Loss 5.1204   LearningRate 0.0002   Epoch: 38   Global Step: 193470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:24,960-Speed 10874.46 samples/sec   Loss 5.0986   LearningRate 0.0002   Epoch: 38   Global Step: 193480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:25,960-Speed 10256.86 samples/sec   Loss 5.1800   LearningRate 0.0002   Epoch: 38   Global Step: 193490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:26,941-Speed 10446.01 samples/sec   Loss 5.0531   LearningRate 0.0002   Epoch: 38   Global Step: 193500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:27,961-Speed 10059.32 samples/sec   Loss 5.2480   LearningRate 0.0002   Epoch: 38   Global Step: 193510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:28,951-Speed 10356.24 samples/sec   Loss 5.0797   LearningRate 0.0002   Epoch: 38   Global Step: 193520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:29,965-Speed 10105.89 samples/sec   Loss 5.1021   LearningRate 0.0002   Epoch: 38   Global Step: 193530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:30,945-Speed 10457.76 samples/sec   Loss 4.9806   LearningRate 0.0002   Epoch: 38   Global Step: 193540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:31,924-Speed 10468.86 samples/sec   Loss 5.0772   LearningRate 0.0002   Epoch: 38   Global Step: 193550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:32,872-Speed 10822.14 samples/sec   Loss 5.2645   LearningRate 0.0002   Epoch: 38   Global Step: 193560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:33,850-Speed 10477.10 samples/sec   Loss 5.0472   LearningRate 0.0002   Epoch: 38   Global Step: 193570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:34,809-Speed 10679.58 samples/sec   Loss 5.1044   LearningRate 0.0002   Epoch: 38   Global Step: 193580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:35,792-Speed 10429.44 samples/sec   Loss 5.1139   LearningRate 0.0002   Epoch: 38   Global Step: 193590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:36,767-Speed 10512.24 samples/sec   Loss 5.0519   LearningRate 0.0002   Epoch: 38   Global Step: 193600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:37,764-Speed 10280.57 samples/sec   Loss 5.0954   LearningRate 0.0002   Epoch: 38   Global Step: 193610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:38,758-Speed 10319.48 samples/sec   Loss 5.1025   LearningRate 0.0002   Epoch: 38   Global Step: 193620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:39,737-Speed 10469.82 samples/sec   Loss 5.0843   LearningRate 0.0002   Epoch: 38   Global Step: 193630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:40,776-Speed 9857.74 samples/sec   Loss 5.0400   LearningRate 0.0002   Epoch: 38   Global Step: 193640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:41,754-Speed 10489.96 samples/sec   Loss 5.0723   LearningRate 0.0002   Epoch: 38   Global Step: 193650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:42,713-Speed 10681.30 samples/sec   Loss 5.0635   LearningRate 0.0002   Epoch: 38   Global Step: 193660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:43,719-Speed 10191.16 samples/sec   Loss 4.9946   LearningRate 0.0002   Epoch: 38   Global Step: 193670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:44,710-Speed 10344.12 samples/sec   Loss 4.9980   LearningRate 0.0002   Epoch: 38   Global Step: 193680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:45,663-Speed 10755.11 samples/sec   Loss 5.1632   LearningRate 0.0002   Epoch: 38   Global Step: 193690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:46,661-Speed 10274.28 samples/sec   Loss 5.1034   LearningRate 0.0002   Epoch: 38   Global Step: 193700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:29:47,666-Speed 10201.65 samples/sec   Loss 5.2434   LearningRate 0.0002   Epoch: 38   Global Step: 193710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:48,662-Speed 10288.32 samples/sec   Loss 5.0837   LearningRate 0.0002   Epoch: 38   Global Step: 193720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:49,662-Speed 10248.96 samples/sec   Loss 5.1118   LearningRate 0.0002   Epoch: 38   Global Step: 193730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:50,725-Speed 9642.78 samples/sec   Loss 5.1369   LearningRate 0.0002   Epoch: 38   Global Step: 193740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:51,740-Speed 10099.14 samples/sec   Loss 5.1679   LearningRate 0.0002   Epoch: 38   Global Step: 193750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:52,714-Speed 10516.25 samples/sec   Loss 5.2287   LearningRate 0.0002   Epoch: 38   Global Step: 193760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:53,713-Speed 10260.93 samples/sec   Loss 4.9028   LearningRate 0.0002   Epoch: 38   Global Step: 193770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:54,683-Speed 10569.98 samples/sec   Loss 5.0036   LearningRate 0.0002   Epoch: 38   Global Step: 193780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:55,676-Speed 10323.15 samples/sec   Loss 5.1261   LearningRate 0.0002   Epoch: 38   Global Step: 193790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:56,669-Speed 10324.71 samples/sec   Loss 4.9770   LearningRate 0.0002   Epoch: 38   Global Step: 193800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:57,628-Speed 10684.77 samples/sec   Loss 5.1296   LearningRate 0.0002   Epoch: 38   Global Step: 193810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:58,689-Speed 9662.85 samples/sec   Loss 5.0040   LearningRate 0.0002   Epoch: 38   Global Step: 193820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:29:59,637-Speed 10810.60 samples/sec   Loss 5.0478   LearningRate 0.0002   Epoch: 38   Global Step: 193830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:30:00,648-Speed 10135.77 samples/sec   Loss 5.1159   LearningRate 0.0002   Epoch: 38   Global Step: 193840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:30:01,656-Speed 10167.59 samples/sec   Loss 5.2435   LearningRate 0.0002   Epoch: 38   Global Step: 193850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:30:02,650-Speed 10315.67 samples/sec   Loss 5.0441   LearningRate 0.0002   Epoch: 38   Global Step: 193860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:30:03,705-Speed 9715.20 samples/sec   Loss 5.1137   LearningRate 0.0002   Epoch: 38   Global Step: 193870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:30:04,673-Speed 10585.33 samples/sec   Loss 5.0669   LearningRate 0.0002   Epoch: 38   Global Step: 193880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:30:05,671-Speed 10268.82 samples/sec   Loss 5.2158   LearningRate 0.0002   Epoch: 38   Global Step: 193890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:30:06,640-Speed 10578.58 samples/sec   Loss 5.1559   LearningRate 0.0002   Epoch: 38   Global Step: 193900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:30:07,632-Speed 10328.16 samples/sec   Loss 5.1495   LearningRate 0.0002   Epoch: 38   Global Step: 193910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:30:08,654-Speed 10032.09 samples/sec   Loss 5.1277   LearningRate 0.0002   Epoch: 38   Global Step: 193920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:30:09,686-Speed 9935.38 samples/sec   Loss 5.1003   LearningRate 0.0002   Epoch: 38   Global Step: 193930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:30:10,717-Speed 9940.41 samples/sec   Loss 5.0998   LearningRate 0.0002   Epoch: 38   Global Step: 193940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:30:11,785-Speed 9596.76 samples/sec   Loss 5.0835   LearningRate 0.0002   Epoch: 38   Global Step: 193950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:30:12,760-Speed 10517.61 samples/sec   Loss 5.1107   LearningRate 0.0002   Epoch: 38   Global Step: 193960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:30:13,729-Speed 10577.51 samples/sec   Loss 5.0691   LearningRate 0.0002   Epoch: 38   Global Step: 193970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:30:14,735-Speed 10184.91 samples/sec   Loss 5.1710   LearningRate 0.0002   Epoch: 38   Global Step: 193980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:30:15,748-Speed 10122.04 samples/sec   Loss 5.0520   LearningRate 0.0002   Epoch: 38   Global Step: 193990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:30:16,729-Speed 10448.96 samples/sec   Loss 5.1569   LearningRate 0.0002   Epoch: 38   Global Step: 194000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:30:39,279-[lfw][194000]XNorm: 7.987350
Training: 2022-04-11 06:30:39,279-[lfw][194000]Accuracy-Flip: 0.99617+-0.00317
Training: 2022-04-11 06:30:39,280-[lfw][194000]Accuracy-Highest: 0.99700
Training: 2022-04-11 06:31:04,584-[cfp_fp][194000]XNorm: 6.905072
Training: 2022-04-11 06:31:04,585-[cfp_fp][194000]Accuracy-Flip: 0.97143+-0.00874
Training: 2022-04-11 06:31:04,586-[cfp_fp][194000]Accuracy-Highest: 0.97371
Training: 2022-04-11 06:31:26,551-[agedb_30][194000]XNorm: 7.801820
Training: 2022-04-11 06:31:26,552-[agedb_30][194000]Accuracy-Flip: 0.97300+-0.00653
Training: 2022-04-11 06:31:26,552-[agedb_30][194000]Accuracy-Highest: 0.97350
Training: 2022-04-11 06:31:27,540-Speed 144.61 samples/sec   Loss 5.1946   LearningRate 0.0002   Epoch: 38   Global Step: 194010   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-11 06:31:28,502-Speed 10661.87 samples/sec   Loss 5.1217   LearningRate 0.0002   Epoch: 38   Global Step: 194020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:31:29,465-Speed 10642.19 samples/sec   Loss 5.1670   LearningRate 0.0002   Epoch: 38   Global Step: 194030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:31:30,492-Speed 9977.43 samples/sec   Loss 5.0419   LearningRate 0.0002   Epoch: 38   Global Step: 194040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:31:31,518-Speed 9995.37 samples/sec   Loss 5.2474   LearningRate 0.0002   Epoch: 38   Global Step: 194050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:31:32,515-Speed 10286.17 samples/sec   Loss 5.2816   LearningRate 0.0002   Epoch: 38   Global Step: 194060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:31:33,477-Speed 10657.69 samples/sec   Loss 5.1070   LearningRate 0.0002   Epoch: 38   Global Step: 194070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:31:34,435-Speed 10919.96 samples/sec   Loss 5.1496   LearningRate 0.0002   Epoch: 38   Global Step: 194080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:31:35,448-Speed 10119.68 samples/sec   Loss 5.0117   LearningRate 0.0002   Epoch: 38   Global Step: 194090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:31:36,410-Speed 10668.54 samples/sec   Loss 5.0062   LearningRate 0.0002   Epoch: 38   Global Step: 194100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:31:37,359-Speed 10801.36 samples/sec   Loss 5.0896   LearningRate 0.0002   Epoch: 38   Global Step: 194110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:31:38,378-Speed 10061.32 samples/sec   Loss 5.1027   LearningRate 0.0002   Epoch: 38   Global Step: 194120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:31:39,398-Speed 10042.95 samples/sec   Loss 5.2179   LearningRate 0.0002   Epoch: 38   Global Step: 194130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:31:40,385-Speed 10392.14 samples/sec   Loss 5.1068   LearningRate 0.0002   Epoch: 38   Global Step: 194140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:31:41,350-Speed 10615.99 samples/sec   Loss 5.0882   LearningRate 0.0002   Epoch: 38   Global Step: 194150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:31:42,403-Speed 9729.89 samples/sec   Loss 4.8847   LearningRate 0.0002   Epoch: 38   Global Step: 194160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:31:43,416-Speed 10124.99 samples/sec   Loss 5.3020   LearningRate 0.0002   Epoch: 38   Global Step: 194170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:31:44,394-Speed 10474.51 samples/sec   Loss 5.1165   LearningRate 0.0002   Epoch: 38   Global Step: 194180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:31:45,363-Speed 10576.06 samples/sec   Loss 5.0543   LearningRate 0.0002   Epoch: 38   Global Step: 194190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:31:46,366-Speed 10228.42 samples/sec   Loss 5.1376   LearningRate 0.0002   Epoch: 38   Global Step: 194200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:31:47,443-Speed 9509.14 samples/sec   Loss 5.0700   LearningRate 0.0002   Epoch: 38   Global Step: 194210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:31:48,445-Speed 10233.10 samples/sec   Loss 5.0570   LearningRate 0.0002   Epoch: 38   Global Step: 194220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:31:49,480-Speed 9901.40 samples/sec   Loss 5.1324   LearningRate 0.0002   Epoch: 38   Global Step: 194230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:31:50,529-Speed 9777.07 samples/sec   Loss 5.1619   LearningRate 0.0002   Epoch: 38   Global Step: 194240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:31:51,531-Speed 10230.09 samples/sec   Loss 5.2312   LearningRate 0.0002   Epoch: 38   Global Step: 194250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:31:52,520-Speed 10360.23 samples/sec   Loss 5.0488   LearningRate 0.0002   Epoch: 38   Global Step: 194260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:31:53,532-Speed 10123.27 samples/sec   Loss 5.1082   LearningRate 0.0002   Epoch: 38   Global Step: 194270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:31:54,521-Speed 10365.52 samples/sec   Loss 5.1206   LearningRate 0.0002   Epoch: 38   Global Step: 194280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:31:55,512-Speed 10342.74 samples/sec   Loss 5.2609   LearningRate 0.0002   Epoch: 38   Global Step: 194290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:31:56,495-Speed 10428.13 samples/sec   Loss 5.0545   LearningRate 0.0002   Epoch: 38   Global Step: 194300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:31:57,448-Speed 10751.60 samples/sec   Loss 5.0832   LearningRate 0.0002   Epoch: 38   Global Step: 194310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:31:58,390-Speed 10882.95 samples/sec   Loss 5.0700   LearningRate 0.0002   Epoch: 38   Global Step: 194320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:31:59,379-Speed 10357.38 samples/sec   Loss 5.0929   LearningRate 0.0002   Epoch: 38   Global Step: 194330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:00,420-Speed 9850.19 samples/sec   Loss 5.0944   LearningRate 0.0002   Epoch: 38   Global Step: 194340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:01,369-Speed 10805.51 samples/sec   Loss 5.1791   LearningRate 0.0002   Epoch: 38   Global Step: 194350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:02,351-Speed 10432.43 samples/sec   Loss 5.1308   LearningRate 0.0002   Epoch: 38   Global Step: 194360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:03,408-Speed 9695.62 samples/sec   Loss 5.2089   LearningRate 0.0002   Epoch: 38   Global Step: 194370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:04,402-Speed 10311.02 samples/sec   Loss 5.1497   LearningRate 0.0002   Epoch: 38   Global Step: 194380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:05,375-Speed 10535.25 samples/sec   Loss 5.0389   LearningRate 0.0002   Epoch: 38   Global Step: 194390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:06,343-Speed 10582.50 samples/sec   Loss 5.2069   LearningRate 0.0002   Epoch: 38   Global Step: 194400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:07,346-Speed 10216.30 samples/sec   Loss 5.0253   LearningRate 0.0002   Epoch: 38   Global Step: 194410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:08,387-Speed 9850.46 samples/sec   Loss 5.1495   LearningRate 0.0002   Epoch: 38   Global Step: 194420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:09,393-Speed 10187.77 samples/sec   Loss 5.1164   LearningRate 0.0002   Epoch: 38   Global Step: 194430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:10,388-Speed 10303.84 samples/sec   Loss 5.1757   LearningRate 0.0002   Epoch: 38   Global Step: 194440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:11,403-Speed 10096.27 samples/sec   Loss 5.1854   LearningRate 0.0002   Epoch: 38   Global Step: 194450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:12,399-Speed 10280.98 samples/sec   Loss 5.1843   LearningRate 0.0002   Epoch: 38   Global Step: 194460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:13,426-Speed 9986.32 samples/sec   Loss 5.1829   LearningRate 0.0002   Epoch: 38   Global Step: 194470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:14,385-Speed 10683.08 samples/sec   Loss 5.2126   LearningRate 0.0002   Epoch: 38   Global Step: 194480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:15,370-Speed 10405.81 samples/sec   Loss 5.1241   LearningRate 0.0001   Epoch: 38   Global Step: 194490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:16,382-Speed 10124.97 samples/sec   Loss 5.1266   LearningRate 0.0001   Epoch: 38   Global Step: 194500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:17,436-Speed 9722.02 samples/sec   Loss 5.0565   LearningRate 0.0001   Epoch: 38   Global Step: 194510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:18,425-Speed 10361.31 samples/sec   Loss 5.0686   LearningRate 0.0001   Epoch: 38   Global Step: 194520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:19,423-Speed 10275.03 samples/sec   Loss 5.0910   LearningRate 0.0001   Epoch: 38   Global Step: 194530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:20,389-Speed 10606.44 samples/sec   Loss 5.1592   LearningRate 0.0001   Epoch: 38   Global Step: 194540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:21,371-Speed 10435.70 samples/sec   Loss 5.1863   LearningRate 0.0001   Epoch: 38   Global Step: 194550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:22,351-Speed 10466.40 samples/sec   Loss 5.0110   LearningRate 0.0001   Epoch: 38   Global Step: 194560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:23,301-Speed 10784.46 samples/sec   Loss 5.1732   LearningRate 0.0001   Epoch: 38   Global Step: 194570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:24,280-Speed 10659.70 samples/sec   Loss 5.0965   LearningRate 0.0001   Epoch: 38   Global Step: 194580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:25,320-Speed 9858.54 samples/sec   Loss 5.0608   LearningRate 0.0001   Epoch: 38   Global Step: 194590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:26,322-Speed 10231.46 samples/sec   Loss 5.1353   LearningRate 0.0001   Epoch: 38   Global Step: 194600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:27,306-Speed 10414.38 samples/sec   Loss 5.1129   LearningRate 0.0001   Epoch: 38   Global Step: 194610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:28,370-Speed 9634.46 samples/sec   Loss 5.2204   LearningRate 0.0001   Epoch: 38   Global Step: 194620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:29,367-Speed 10277.27 samples/sec   Loss 5.1834   LearningRate 0.0001   Epoch: 38   Global Step: 194630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:30,356-Speed 10360.09 samples/sec   Loss 5.0380   LearningRate 0.0001   Epoch: 38   Global Step: 194640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:31,328-Speed 10543.71 samples/sec   Loss 5.1640   LearningRate 0.0001   Epoch: 38   Global Step: 194650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:32,383-Speed 9722.70 samples/sec   Loss 5.1224   LearningRate 0.0001   Epoch: 38   Global Step: 194660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:33,375-Speed 10337.54 samples/sec   Loss 5.1649   LearningRate 0.0001   Epoch: 38   Global Step: 194670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:34,358-Speed 10431.70 samples/sec   Loss 5.0852   LearningRate 0.0001   Epoch: 38   Global Step: 194680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:35,387-Speed 9958.24 samples/sec   Loss 5.0448   LearningRate 0.0001   Epoch: 38   Global Step: 194690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:36,366-Speed 10465.98 samples/sec   Loss 4.9858   LearningRate 0.0001   Epoch: 38   Global Step: 194700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:37,375-Speed 10167.56 samples/sec   Loss 5.2222   LearningRate 0.0001   Epoch: 38   Global Step: 194710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:38,388-Speed 10125.72 samples/sec   Loss 5.1997   LearningRate 0.0001   Epoch: 38   Global Step: 194720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:39,357-Speed 10578.20 samples/sec   Loss 5.1404   LearningRate 0.0001   Epoch: 38   Global Step: 194730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:40,396-Speed 9860.26 samples/sec   Loss 5.2501   LearningRate 0.0001   Epoch: 38   Global Step: 194740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:41,399-Speed 10211.50 samples/sec   Loss 5.2336   LearningRate 0.0001   Epoch: 38   Global Step: 194750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:42,357-Speed 10698.55 samples/sec   Loss 5.0306   LearningRate 0.0001   Epoch: 38   Global Step: 194760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:43,344-Speed 10385.97 samples/sec   Loss 5.0122   LearningRate 0.0001   Epoch: 38   Global Step: 194770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:44,304-Speed 10677.90 samples/sec   Loss 5.2270   LearningRate 0.0001   Epoch: 38   Global Step: 194780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:45,298-Speed 10313.65 samples/sec   Loss 5.2156   LearningRate 0.0001   Epoch: 38   Global Step: 194790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:46,283-Speed 10410.47 samples/sec   Loss 5.0594   LearningRate 0.0001   Epoch: 38   Global Step: 194800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:47,247-Speed 10636.66 samples/sec   Loss 5.1761   LearningRate 0.0001   Epoch: 38   Global Step: 194810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:48,379-Speed 9045.72 samples/sec   Loss 5.0004   LearningRate 0.0001   Epoch: 38   Global Step: 194820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:49,377-Speed 10274.60 samples/sec   Loss 5.1022   LearningRate 0.0001   Epoch: 38   Global Step: 194830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:50,343-Speed 10609.50 samples/sec   Loss 5.2727   LearningRate 0.0001   Epoch: 38   Global Step: 194840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:51,360-Speed 10077.02 samples/sec   Loss 5.0328   LearningRate 0.0001   Epoch: 38   Global Step: 194850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:52,391-Speed 10139.85 samples/sec   Loss 5.1259   LearningRate 0.0001   Epoch: 38   Global Step: 194860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:32:53,354-Speed 10644.35 samples/sec   Loss 5.1264   LearningRate 0.0001   Epoch: 38   Global Step: 194870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:54,352-Speed 10271.82 samples/sec   Loss 5.0166   LearningRate 0.0001   Epoch: 38   Global Step: 194880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:55,350-Speed 10271.99 samples/sec   Loss 5.0811   LearningRate 0.0001   Epoch: 38   Global Step: 194890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:56,285-Speed 11056.44 samples/sec   Loss 5.1617   LearningRate 0.0001   Epoch: 38   Global Step: 194900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:57,400-Speed 9194.96 samples/sec   Loss 5.2727   LearningRate 0.0001   Epoch: 38   Global Step: 194910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:58,376-Speed 10498.71 samples/sec   Loss 5.1331   LearningRate 0.0001   Epoch: 38   Global Step: 194920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:32:59,361-Speed 10413.51 samples/sec   Loss 5.1141   LearningRate 0.0001   Epoch: 38   Global Step: 194930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:00,401-Speed 9856.43 samples/sec   Loss 5.1027   LearningRate 0.0001   Epoch: 38   Global Step: 194940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:01,412-Speed 10141.56 samples/sec   Loss 5.2757   LearningRate 0.0001   Epoch: 38   Global Step: 194950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:02,412-Speed 10241.95 samples/sec   Loss 5.1327   LearningRate 0.0001   Epoch: 38   Global Step: 194960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:03,401-Speed 10365.68 samples/sec   Loss 5.1778   LearningRate 0.0001   Epoch: 38   Global Step: 194970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:04,382-Speed 10443.65 samples/sec   Loss 5.0917   LearningRate 0.0001   Epoch: 38   Global Step: 194980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:05,386-Speed 10216.08 samples/sec   Loss 5.1099   LearningRate 0.0001   Epoch: 38   Global Step: 194990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:06,350-Speed 10635.43 samples/sec   Loss 5.1124   LearningRate 0.0001   Epoch: 38   Global Step: 195000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:07,314-Speed 10632.14 samples/sec   Loss 5.2181   LearningRate 0.0001   Epoch: 38   Global Step: 195010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:08,351-Speed 9884.86 samples/sec   Loss 5.0429   LearningRate 0.0001   Epoch: 38   Global Step: 195020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:09,351-Speed 10259.44 samples/sec   Loss 5.1200   LearningRate 0.0001   Epoch: 38   Global Step: 195030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:10,350-Speed 10253.09 samples/sec   Loss 5.0667   LearningRate 0.0001   Epoch: 38   Global Step: 195040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:11,397-Speed 9798.30 samples/sec   Loss 5.1371   LearningRate 0.0001   Epoch: 38   Global Step: 195050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:12,352-Speed 10727.87 samples/sec   Loss 5.0567   LearningRate 0.0001   Epoch: 38   Global Step: 195060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:13,352-Speed 10250.65 samples/sec   Loss 5.0066   LearningRate 0.0001   Epoch: 38   Global Step: 195070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:33:14,324-Speed 10555.77 samples/sec   Loss 5.1584   LearningRate 0.0001   Epoch: 38   Global Step: 195080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:33:15,359-Speed 9905.15 samples/sec   Loss 5.2046   LearningRate 0.0001   Epoch: 38   Global Step: 195090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:33:16,373-Speed 10112.75 samples/sec   Loss 5.0749   LearningRate 0.0001   Epoch: 38   Global Step: 195100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:33:17,355-Speed 10444.56 samples/sec   Loss 5.2442   LearningRate 0.0001   Epoch: 38   Global Step: 195110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:33:18,343-Speed 10369.36 samples/sec   Loss 5.1451   LearningRate 0.0001   Epoch: 38   Global Step: 195120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:33:19,345-Speed 10231.44 samples/sec   Loss 5.1080   LearningRate 0.0001   Epoch: 38   Global Step: 195130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:33:20,313-Speed 10582.55 samples/sec   Loss 5.1249   LearningRate 0.0001   Epoch: 38   Global Step: 195140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:33:21,266-Speed 10755.61 samples/sec   Loss 5.1517   LearningRate 0.0001   Epoch: 38   Global Step: 195150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:33:22,271-Speed 10207.98 samples/sec   Loss 5.0761   LearningRate 0.0001   Epoch: 38   Global Step: 195160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:33:23,241-Speed 10560.72 samples/sec   Loss 5.0714   LearningRate 0.0001   Epoch: 38   Global Step: 195170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:33:24,326-Speed 9445.66 samples/sec   Loss 5.0075   LearningRate 0.0001   Epoch: 38   Global Step: 195180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:33:25,291-Speed 10629.44 samples/sec   Loss 5.0758   LearningRate 0.0001   Epoch: 38   Global Step: 195190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:26,290-Speed 10259.39 samples/sec   Loss 5.0789   LearningRate 0.0001   Epoch: 38   Global Step: 195200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:27,280-Speed 10350.98 samples/sec   Loss 5.1313   LearningRate 0.0001   Epoch: 38   Global Step: 195210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:28,292-Speed 10134.27 samples/sec   Loss 5.1512   LearningRate 0.0001   Epoch: 38   Global Step: 195220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:29,332-Speed 9850.22 samples/sec   Loss 5.1556   LearningRate 0.0001   Epoch: 38   Global Step: 195230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:30,335-Speed 10221.50 samples/sec   Loss 5.0464   LearningRate 0.0001   Epoch: 38   Global Step: 195240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:31,361-Speed 9990.53 samples/sec   Loss 5.1583   LearningRate 0.0001   Epoch: 38   Global Step: 195250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:32,373-Speed 10127.71 samples/sec   Loss 5.2115   LearningRate 0.0001   Epoch: 38   Global Step: 195260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:33,350-Speed 10497.10 samples/sec   Loss 5.2803   LearningRate 0.0001   Epoch: 38   Global Step: 195270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:34,317-Speed 10592.61 samples/sec   Loss 5.0771   LearningRate 0.0001   Epoch: 38   Global Step: 195280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:35,327-Speed 10147.59 samples/sec   Loss 5.1479   LearningRate 0.0001   Epoch: 38   Global Step: 195290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:33:36,327-Speed 10249.76 samples/sec   Loss 5.2631   LearningRate 0.0001   Epoch: 38   Global Step: 195300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:33:37,350-Speed 10013.09 samples/sec   Loss 5.0272   LearningRate 0.0001   Epoch: 38   Global Step: 195310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:38,336-Speed 10403.16 samples/sec   Loss 5.1283   LearningRate 0.0001   Epoch: 38   Global Step: 195320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:39,322-Speed 10395.33 samples/sec   Loss 5.0659   LearningRate 0.0001   Epoch: 38   Global Step: 195330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:40,359-Speed 9875.92 samples/sec   Loss 5.1021   LearningRate 0.0001   Epoch: 38   Global Step: 195340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:41,385-Speed 9988.21 samples/sec   Loss 5.1059   LearningRate 0.0001   Epoch: 38   Global Step: 195350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:42,383-Speed 10271.50 samples/sec   Loss 5.1968   LearningRate 0.0001   Epoch: 38   Global Step: 195360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:43,342-Speed 10692.60 samples/sec   Loss 5.0977   LearningRate 0.0001   Epoch: 38   Global Step: 195370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:44,320-Speed 10483.79 samples/sec   Loss 5.0282   LearningRate 0.0001   Epoch: 38   Global Step: 195380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:45,267-Speed 10822.00 samples/sec   Loss 5.0035   LearningRate 0.0001   Epoch: 38   Global Step: 195390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:46,258-Speed 10344.89 samples/sec   Loss 5.0751   LearningRate 0.0001   Epoch: 38   Global Step: 195400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:47,232-Speed 10520.47 samples/sec   Loss 5.1120   LearningRate 0.0001   Epoch: 38   Global Step: 195410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:33:48,262-Speed 9955.99 samples/sec   Loss 5.2494   LearningRate 0.0001   Epoch: 38   Global Step: 195420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:33:49,232-Speed 10562.66 samples/sec   Loss 5.2681   LearningRate 0.0001   Epoch: 38   Global Step: 195430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:50,239-Speed 10182.14 samples/sec   Loss 5.1632   LearningRate 0.0001   Epoch: 38   Global Step: 195440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:51,342-Speed 9298.17 samples/sec   Loss 5.0896   LearningRate 0.0001   Epoch: 38   Global Step: 195450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:52,358-Speed 10093.53 samples/sec   Loss 5.0335   LearningRate 0.0001   Epoch: 38   Global Step: 195460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:53,350-Speed 10324.47 samples/sec   Loss 4.9564   LearningRate 0.0001   Epoch: 38   Global Step: 195470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:54,349-Speed 10266.04 samples/sec   Loss 5.1573   LearningRate 0.0001   Epoch: 38   Global Step: 195480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:55,311-Speed 10661.39 samples/sec   Loss 5.1400   LearningRate 0.0001   Epoch: 38   Global Step: 195490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:56,274-Speed 10644.14 samples/sec   Loss 5.0791   LearningRate 0.0001   Epoch: 38   Global Step: 195500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:57,300-Speed 9981.15 samples/sec   Loss 5.0788   LearningRate 0.0001   Epoch: 38   Global Step: 195510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:58,319-Speed 10059.51 samples/sec   Loss 5.2014   LearningRate 0.0001   Epoch: 38   Global Step: 195520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:33:59,333-Speed 10116.76 samples/sec   Loss 5.1273   LearningRate 0.0001   Epoch: 38   Global Step: 195530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:00,321-Speed 10363.20 samples/sec   Loss 5.0425   LearningRate 0.0001   Epoch: 38   Global Step: 195540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:01,347-Speed 10007.52 samples/sec   Loss 5.0859   LearningRate 0.0001   Epoch: 38   Global Step: 195550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:02,402-Speed 9713.73 samples/sec   Loss 5.1751   LearningRate 0.0001   Epoch: 38   Global Step: 195560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:03,401-Speed 10259.61 samples/sec   Loss 5.0711   LearningRate 0.0001   Epoch: 38   Global Step: 195570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:04,368-Speed 10606.30 samples/sec   Loss 5.0635   LearningRate 0.0001   Epoch: 38   Global Step: 195580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:05,344-Speed 10514.62 samples/sec   Loss 4.9220   LearningRate 0.0001   Epoch: 38   Global Step: 195590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:06,348-Speed 10206.91 samples/sec   Loss 5.0932   LearningRate 0.0001   Epoch: 38   Global Step: 195600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:07,326-Speed 10481.54 samples/sec   Loss 5.2386   LearningRate 0.0001   Epoch: 38   Global Step: 195610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:08,338-Speed 10122.75 samples/sec   Loss 5.0488   LearningRate 0.0001   Epoch: 38   Global Step: 195620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:09,309-Speed 10559.78 samples/sec   Loss 5.1143   LearningRate 0.0001   Epoch: 38   Global Step: 195630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:10,287-Speed 10475.69 samples/sec   Loss 5.0498   LearningRate 0.0001   Epoch: 38   Global Step: 195640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:11,306-Speed 10063.11 samples/sec   Loss 5.0679   LearningRate 0.0001   Epoch: 38   Global Step: 195650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:12,333-Speed 9975.37 samples/sec   Loss 5.1196   LearningRate 0.0001   Epoch: 38   Global Step: 195660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:13,385-Speed 9744.84 samples/sec   Loss 5.0818   LearningRate 0.0001   Epoch: 38   Global Step: 195670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:34:14,381-Speed 10295.73 samples/sec   Loss 5.0435   LearningRate 0.0001   Epoch: 38   Global Step: 195680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:34:15,353-Speed 10547.40 samples/sec   Loss 5.1408   LearningRate 0.0001   Epoch: 38   Global Step: 195690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:34:16,337-Speed 10412.44 samples/sec   Loss 5.1701   LearningRate 0.0001   Epoch: 38   Global Step: 195700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:34:17,334-Speed 10275.16 samples/sec   Loss 5.1925   LearningRate 0.0001   Epoch: 38   Global Step: 195710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:34:18,368-Speed 9910.47 samples/sec   Loss 5.1169   LearningRate 0.0001   Epoch: 38   Global Step: 195720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:34:19,380-Speed 10132.14 samples/sec   Loss 5.0885   LearningRate 0.0001   Epoch: 38   Global Step: 195730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:34:20,358-Speed 10478.28 samples/sec   Loss 5.0842   LearningRate 0.0001   Epoch: 38   Global Step: 195740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:34:21,370-Speed 10128.40 samples/sec   Loss 5.0701   LearningRate 0.0001   Epoch: 38   Global Step: 195750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:34:22,399-Speed 9957.55 samples/sec   Loss 5.1478   LearningRate 0.0001   Epoch: 38   Global Step: 195760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:34:23,389-Speed 10356.03 samples/sec   Loss 5.1280   LearningRate 0.0001   Epoch: 38   Global Step: 195770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:24,454-Speed 9618.68 samples/sec   Loss 5.1195   LearningRate 0.0001   Epoch: 38   Global Step: 195780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:25,470-Speed 10091.36 samples/sec   Loss 5.1434   LearningRate 0.0001   Epoch: 38   Global Step: 195790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:26,443-Speed 10532.74 samples/sec   Loss 4.9546   LearningRate 0.0001   Epoch: 38   Global Step: 195800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:27,437-Speed 10314.54 samples/sec   Loss 5.1899   LearningRate 0.0001   Epoch: 38   Global Step: 195810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:28,473-Speed 9892.32 samples/sec   Loss 5.1969   LearningRate 0.0001   Epoch: 38   Global Step: 195820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:29,469-Speed 10290.93 samples/sec   Loss 5.1915   LearningRate 0.0001   Epoch: 38   Global Step: 195830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:30,554-Speed 9447.28 samples/sec   Loss 5.1987   LearningRate 0.0001   Epoch: 38   Global Step: 195840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:31,533-Speed 10479.41 samples/sec   Loss 5.0911   LearningRate 0.0001   Epoch: 38   Global Step: 195850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:32,501-Speed 10585.11 samples/sec   Loss 5.1766   LearningRate 0.0001   Epoch: 38   Global Step: 195860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:34:33,490-Speed 10363.24 samples/sec   Loss 5.1184   LearningRate 0.0001   Epoch: 38   Global Step: 195870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:34:34,545-Speed 9715.64 samples/sec   Loss 5.1564   LearningRate 0.0001   Epoch: 38   Global Step: 195880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:34:35,545-Speed 10250.88 samples/sec   Loss 5.1718   LearningRate 0.0001   Epoch: 38   Global Step: 195890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:34:36,470-Speed 11079.67 samples/sec   Loss 5.1100   LearningRate 0.0001   Epoch: 38   Global Step: 195900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:34:37,541-Speed 9566.68 samples/sec   Loss 5.2400   LearningRate 0.0001   Epoch: 38   Global Step: 195910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:34:38,540-Speed 10269.32 samples/sec   Loss 5.1112   LearningRate 0.0001   Epoch: 38   Global Step: 195920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:34:39,547-Speed 10189.81 samples/sec   Loss 5.0804   LearningRate 0.0001   Epoch: 38   Global Step: 195930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:34:40,573-Speed 9981.57 samples/sec   Loss 5.0771   LearningRate 0.0001   Epoch: 38   Global Step: 195940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:34:41,589-Speed 10094.30 samples/sec   Loss 5.1594   LearningRate 0.0001   Epoch: 38   Global Step: 195950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:34:42,549-Speed 10677.04 samples/sec   Loss 5.2672   LearningRate 0.0001   Epoch: 38   Global Step: 195960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:43,557-Speed 10172.34 samples/sec   Loss 4.9892   LearningRate 0.0001   Epoch: 38   Global Step: 195970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:44,588-Speed 9947.31 samples/sec   Loss 5.1515   LearningRate 0.0001   Epoch: 38   Global Step: 195980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:45,553-Speed 10625.23 samples/sec   Loss 5.0477   LearningRate 0.0001   Epoch: 38   Global Step: 195990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:34:46,555-Speed 10230.85 samples/sec   Loss 5.0534   LearningRate 0.0001   Epoch: 38   Global Step: 196000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:35:09,207-[lfw][196000]XNorm: 7.974712
Training: 2022-04-11 06:35:09,208-[lfw][196000]Accuracy-Flip: 0.99633+-0.00332
Training: 2022-04-11 06:35:09,208-[lfw][196000]Accuracy-Highest: 0.99700
Training: 2022-04-11 06:35:34,734-[cfp_fp][196000]XNorm: 6.901116
Training: 2022-04-11 06:35:34,735-[cfp_fp][196000]Accuracy-Flip: 0.97286+-0.00945
Training: 2022-04-11 06:35:34,736-[cfp_fp][196000]Accuracy-Highest: 0.97371
Training: 2022-04-11 06:35:56,995-[agedb_30][196000]XNorm: 7.807797
Training: 2022-04-11 06:35:56,996-[agedb_30][196000]Accuracy-Flip: 0.97083+-0.00680
Training: 2022-04-11 06:35:56,996-[agedb_30][196000]Accuracy-Highest: 0.97350
Training: 2022-04-11 06:35:57,945-Speed 143.44 samples/sec   Loss 5.1410   LearningRate 0.0001   Epoch: 38   Global Step: 196010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:35:58,918-Speed 10535.45 samples/sec   Loss 5.0969   LearningRate 0.0001   Epoch: 38   Global Step: 196020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:35:59,914-Speed 10306.80 samples/sec   Loss 5.0632   LearningRate 0.0001   Epoch: 38   Global Step: 196030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:36:00,864-Speed 10787.98 samples/sec   Loss 5.1575   LearningRate 0.0001   Epoch: 38   Global Step: 196040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:36:01,848-Speed 10408.88 samples/sec   Loss 5.0408   LearningRate 0.0001   Epoch: 38   Global Step: 196050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:36:02,857-Speed 10174.56 samples/sec   Loss 5.1235   LearningRate 0.0001   Epoch: 38   Global Step: 196060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:36:03,843-Speed 10389.23 samples/sec   Loss 5.0287   LearningRate 0.0001   Epoch: 38   Global Step: 196070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:36:04,824-Speed 10447.02 samples/sec   Loss 5.1471   LearningRate 0.0001   Epoch: 38   Global Step: 196080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:36:05,797-Speed 10541.98 samples/sec   Loss 5.0733   LearningRate 0.0001   Epoch: 38   Global Step: 196090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:36:06,722-Speed 11066.77 samples/sec   Loss 5.1863   LearningRate 0.0001   Epoch: 38   Global Step: 196100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:07,780-Speed 9688.46 samples/sec   Loss 5.1931   LearningRate 0.0001   Epoch: 38   Global Step: 196110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:08,818-Speed 9880.46 samples/sec   Loss 5.0524   LearningRate 0.0001   Epoch: 38   Global Step: 196120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:09,809-Speed 10346.75 samples/sec   Loss 5.1193   LearningRate 0.0001   Epoch: 38   Global Step: 196130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:10,817-Speed 10160.44 samples/sec   Loss 5.2382   LearningRate 0.0001   Epoch: 38   Global Step: 196140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:11,839-Speed 10039.59 samples/sec   Loss 5.1739   LearningRate 0.0001   Epoch: 38   Global Step: 196150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:12,856-Speed 10078.31 samples/sec   Loss 5.0956   LearningRate 0.0001   Epoch: 38   Global Step: 196160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:13,803-Speed 10832.06 samples/sec   Loss 5.1084   LearningRate 0.0001   Epoch: 38   Global Step: 196170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:14,912-Speed 9236.09 samples/sec   Loss 5.1913   LearningRate 0.0001   Epoch: 38   Global Step: 196180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:15,903-Speed 10347.50 samples/sec   Loss 5.1338   LearningRate 0.0001   Epoch: 38   Global Step: 196190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:16,907-Speed 10208.95 samples/sec   Loss 5.0809   LearningRate 0.0001   Epoch: 38   Global Step: 196200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:36:17,910-Speed 10217.90 samples/sec   Loss 4.9616   LearningRate 0.0001   Epoch: 38   Global Step: 196210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:36:18,889-Speed 10472.29 samples/sec   Loss 5.1123   LearningRate 0.0001   Epoch: 38   Global Step: 196220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:36:19,876-Speed 10384.45 samples/sec   Loss 5.1707   LearningRate 0.0001   Epoch: 38   Global Step: 196230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:36:20,861-Speed 10404.19 samples/sec   Loss 5.2318   LearningRate 0.0001   Epoch: 38   Global Step: 196240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:36:21,864-Speed 10226.51 samples/sec   Loss 5.1211   LearningRate 0.0001   Epoch: 38   Global Step: 196250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:22,908-Speed 9812.95 samples/sec   Loss 5.1653   LearningRate 0.0001   Epoch: 38   Global Step: 196260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:23,935-Speed 9979.93 samples/sec   Loss 5.1423   LearningRate 0.0001   Epoch: 38   Global Step: 196270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:24,912-Speed 10500.63 samples/sec   Loss 5.0628   LearningRate 0.0001   Epoch: 38   Global Step: 196280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:25,902-Speed 10353.71 samples/sec   Loss 5.0180   LearningRate 0.0001   Epoch: 38   Global Step: 196290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:26,898-Speed 10285.38 samples/sec   Loss 5.1011   LearningRate 0.0001   Epoch: 38   Global Step: 196300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:27,896-Speed 10261.83 samples/sec   Loss 5.1244   LearningRate 0.0001   Epoch: 38   Global Step: 196310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:28,883-Speed 10392.12 samples/sec   Loss 5.0567   LearningRate 0.0001   Epoch: 38   Global Step: 196320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:29,872-Speed 10387.19 samples/sec   Loss 4.9366   LearningRate 0.0001   Epoch: 38   Global Step: 196330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:30,793-Speed 11139.29 samples/sec   Loss 5.2237   LearningRate 0.0001   Epoch: 38   Global Step: 196340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:31,922-Speed 9081.07 samples/sec   Loss 5.0964   LearningRate 0.0001   Epoch: 38   Global Step: 196350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:36:32,930-Speed 10167.34 samples/sec   Loss 5.0836   LearningRate 0.0001   Epoch: 38   Global Step: 196360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:36:33,940-Speed 10149.71 samples/sec   Loss 5.1496   LearningRate 0.0001   Epoch: 38   Global Step: 196370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:34,931-Speed 10342.70 samples/sec   Loss 5.1862   LearningRate 0.0001   Epoch: 38   Global Step: 196380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:35,942-Speed 10137.92 samples/sec   Loss 5.0902   LearningRate 0.0001   Epoch: 38   Global Step: 196390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:36,951-Speed 10152.25 samples/sec   Loss 4.9844   LearningRate 0.0001   Epoch: 38   Global Step: 196400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:37,948-Speed 10291.68 samples/sec   Loss 5.2579   LearningRate 0.0001   Epoch: 38   Global Step: 196410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:38,951-Speed 10216.49 samples/sec   Loss 5.0852   LearningRate 0.0001   Epoch: 38   Global Step: 196420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:40,003-Speed 9748.07 samples/sec   Loss 4.9663   LearningRate 0.0001   Epoch: 38   Global Step: 196430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:41,023-Speed 10045.12 samples/sec   Loss 5.0577   LearningRate 0.0001   Epoch: 38   Global Step: 196440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:42,037-Speed 10108.06 samples/sec   Loss 5.1764   LearningRate 0.0001   Epoch: 38   Global Step: 196450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:43,036-Speed 10256.17 samples/sec   Loss 5.0896   LearningRate 0.0001   Epoch: 38   Global Step: 196460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:44,068-Speed 9939.06 samples/sec   Loss 5.0076   LearningRate 0.0001   Epoch: 38   Global Step: 196470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:36:45,047-Speed 10479.69 samples/sec   Loss 5.0057   LearningRate 0.0001   Epoch: 38   Global Step: 196480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:36:46,046-Speed 10254.18 samples/sec   Loss 5.1799   LearningRate 0.0001   Epoch: 38   Global Step: 196490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:36:47,029-Speed 10422.94 samples/sec   Loss 5.0718   LearningRate 0.0001   Epoch: 38   Global Step: 196500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:36:48,101-Speed 9557.61 samples/sec   Loss 5.0639   LearningRate 0.0001   Epoch: 38   Global Step: 196510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:36:49,086-Speed 10418.71 samples/sec   Loss 5.2295   LearningRate 0.0001   Epoch: 38   Global Step: 196520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:36:50,043-Speed 10708.21 samples/sec   Loss 5.1686   LearningRate 0.0001   Epoch: 38   Global Step: 196530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:51,032-Speed 10357.77 samples/sec   Loss 5.2123   LearningRate 0.0001   Epoch: 38   Global Step: 196540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:52,064-Speed 9961.28 samples/sec   Loss 5.1196   LearningRate 0.0001   Epoch: 38   Global Step: 196550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:53,062-Speed 10268.45 samples/sec   Loss 5.1660   LearningRate 0.0001   Epoch: 38   Global Step: 196560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:54,095-Speed 9921.95 samples/sec   Loss 5.1848   LearningRate 0.0001   Epoch: 38   Global Step: 196570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:55,029-Speed 10969.79 samples/sec   Loss 5.0960   LearningRate 0.0001   Epoch: 38   Global Step: 196580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:56,042-Speed 10118.58 samples/sec   Loss 5.1644   LearningRate 0.0001   Epoch: 38   Global Step: 196590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:57,072-Speed 9950.70 samples/sec   Loss 5.2417   LearningRate 0.0001   Epoch: 38   Global Step: 196600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:58,026-Speed 10748.50 samples/sec   Loss 5.0924   LearningRate 0.0001   Epoch: 38   Global Step: 196610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:36:59,008-Speed 10437.95 samples/sec   Loss 5.0850   LearningRate 0.0001   Epoch: 38   Global Step: 196620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:00,034-Speed 9988.54 samples/sec   Loss 5.2200   LearningRate 0.0001   Epoch: 38   Global Step: 196630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:00,996-Speed 10649.36 samples/sec   Loss 5.2307   LearningRate 0.0001   Epoch: 38   Global Step: 196640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:01,989-Speed 10329.77 samples/sec   Loss 5.1394   LearningRate 0.0001   Epoch: 38   Global Step: 196650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:02,972-Speed 10422.94 samples/sec   Loss 5.1712   LearningRate 0.0001   Epoch: 38   Global Step: 196660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:04,033-Speed 9654.67 samples/sec   Loss 5.1544   LearningRate 0.0001   Epoch: 38   Global Step: 196670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:05,025-Speed 10339.57 samples/sec   Loss 5.0442   LearningRate 0.0001   Epoch: 38   Global Step: 196680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:05,972-Speed 10836.22 samples/sec   Loss 5.0455   LearningRate 0.0001   Epoch: 38   Global Step: 196690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:06,976-Speed 10200.96 samples/sec   Loss 5.1579   LearningRate 0.0001   Epoch: 38   Global Step: 196700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:07,970-Speed 10309.81 samples/sec   Loss 4.9952   LearningRate 0.0001   Epoch: 38   Global Step: 196710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:08,953-Speed 10439.45 samples/sec   Loss 5.0530   LearningRate 0.0001   Epoch: 38   Global Step: 196720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:09,938-Speed 10399.50 samples/sec   Loss 5.1709   LearningRate 0.0001   Epoch: 38   Global Step: 196730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:10,880-Speed 10875.06 samples/sec   Loss 5.1445   LearningRate 0.0001   Epoch: 38   Global Step: 196740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:11,896-Speed 10094.76 samples/sec   Loss 5.1116   LearningRate 0.0001   Epoch: 38   Global Step: 196750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:12,898-Speed 10223.36 samples/sec   Loss 5.2049   LearningRate 0.0001   Epoch: 38   Global Step: 196760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:13,914-Speed 10098.27 samples/sec   Loss 5.0162   LearningRate 0.0001   Epoch: 38   Global Step: 196770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:14,932-Speed 10059.25 samples/sec   Loss 5.0024   LearningRate 0.0001   Epoch: 38   Global Step: 196780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:15,998-Speed 9612.92 samples/sec   Loss 5.0631   LearningRate 0.0001   Epoch: 38   Global Step: 196790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:17,048-Speed 9775.94 samples/sec   Loss 5.1848   LearningRate 0.0001   Epoch: 38   Global Step: 196800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:18,061-Speed 10131.44 samples/sec   Loss 5.1094   LearningRate 0.0001   Epoch: 38   Global Step: 196810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:19,001-Speed 10896.83 samples/sec   Loss 5.0542   LearningRate 0.0001   Epoch: 38   Global Step: 196820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:20,035-Speed 9914.29 samples/sec   Loss 5.0888   LearningRate 0.0001   Epoch: 38   Global Step: 196830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:21,056-Speed 10039.58 samples/sec   Loss 5.0693   LearningRate 0.0001   Epoch: 38   Global Step: 196840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:22,024-Speed 10589.13 samples/sec   Loss 5.0899   LearningRate 0.0001   Epoch: 38   Global Step: 196850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:23,032-Speed 10166.47 samples/sec   Loss 4.9964   LearningRate 0.0001   Epoch: 38   Global Step: 196860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:24,017-Speed 10399.76 samples/sec   Loss 5.1694   LearningRate 0.0001   Epoch: 38   Global Step: 196870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:24,984-Speed 10603.54 samples/sec   Loss 5.0433   LearningRate 0.0001   Epoch: 38   Global Step: 196880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:25,933-Speed 10796.39 samples/sec   Loss 5.1263   LearningRate 0.0001   Epoch: 38   Global Step: 196890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:26,928-Speed 10298.20 samples/sec   Loss 5.0218   LearningRate 0.0001   Epoch: 38   Global Step: 196900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:27,937-Speed 10167.13 samples/sec   Loss 5.1145   LearningRate 0.0001   Epoch: 38   Global Step: 196910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:28,907-Speed 10563.50 samples/sec   Loss 4.9915   LearningRate 0.0001   Epoch: 38   Global Step: 196920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:29,872-Speed 10628.54 samples/sec   Loss 5.1260   LearningRate 0.0001   Epoch: 38   Global Step: 196930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:30,838-Speed 10609.10 samples/sec   Loss 5.0477   LearningRate 0.0001   Epoch: 38   Global Step: 196940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:31,887-Speed 9764.85 samples/sec   Loss 5.0887   LearningRate 0.0001   Epoch: 38   Global Step: 196950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:32,873-Speed 10402.00 samples/sec   Loss 5.2160   LearningRate 0.0001   Epoch: 38   Global Step: 196960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:33,875-Speed 10225.71 samples/sec   Loss 5.0420   LearningRate 0.0001   Epoch: 38   Global Step: 196970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:34,890-Speed 10092.89 samples/sec   Loss 5.1842   LearningRate 0.0001   Epoch: 38   Global Step: 196980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:35,878-Speed 10375.74 samples/sec   Loss 5.1551   LearningRate 0.0001   Epoch: 38   Global Step: 196990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:36,854-Speed 10505.54 samples/sec   Loss 5.2703   LearningRate 0.0001   Epoch: 38   Global Step: 197000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:37,904-Speed 9762.37 samples/sec   Loss 4.9838   LearningRate 0.0001   Epoch: 38   Global Step: 197010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:38,884-Speed 10466.90 samples/sec   Loss 5.2514   LearningRate 0.0001   Epoch: 38   Global Step: 197020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:39,888-Speed 10207.55 samples/sec   Loss 5.1448   LearningRate 0.0001   Epoch: 38   Global Step: 197030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:40,881-Speed 10323.54 samples/sec   Loss 5.1390   LearningRate 0.0001   Epoch: 38   Global Step: 197040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:41,876-Speed 10301.56 samples/sec   Loss 5.1770   LearningRate 0.0001   Epoch: 38   Global Step: 197050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:42,876-Speed 10241.42 samples/sec   Loss 5.0136   LearningRate 0.0001   Epoch: 38   Global Step: 197060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:43,892-Speed 10092.15 samples/sec   Loss 5.0985   LearningRate 0.0001   Epoch: 38   Global Step: 197070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:44,886-Speed 10308.94 samples/sec   Loss 5.1598   LearningRate 0.0001   Epoch: 38   Global Step: 197080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:45,849-Speed 10642.60 samples/sec   Loss 5.1764   LearningRate 0.0001   Epoch: 38   Global Step: 197090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:46,821-Speed 10549.81 samples/sec   Loss 5.3474   LearningRate 0.0001   Epoch: 38   Global Step: 197100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:47,823-Speed 10223.98 samples/sec   Loss 4.9986   LearningRate 0.0001   Epoch: 38   Global Step: 197110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:48,815-Speed 10335.55 samples/sec   Loss 5.1931   LearningRate 0.0001   Epoch: 38   Global Step: 197120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:49,815-Speed 10252.33 samples/sec   Loss 5.0748   LearningRate 0.0001   Epoch: 38   Global Step: 197130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:50,811-Speed 10306.47 samples/sec   Loss 5.1813   LearningRate 0.0001   Epoch: 38   Global Step: 197140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:51,831-Speed 10044.88 samples/sec   Loss 5.0427   LearningRate 0.0001   Epoch: 38   Global Step: 197150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:52,846-Speed 10112.47 samples/sec   Loss 5.1248   LearningRate 0.0001   Epoch: 38   Global Step: 197160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:53,941-Speed 9359.82 samples/sec   Loss 5.0616   LearningRate 0.0001   Epoch: 38   Global Step: 197170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:37:54,977-Speed 9895.41 samples/sec   Loss 5.0934   LearningRate 0.0001   Epoch: 38   Global Step: 197180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:55,944-Speed 10605.67 samples/sec   Loss 5.1654   LearningRate 0.0001   Epoch: 38   Global Step: 197190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:56,918-Speed 10516.95 samples/sec   Loss 5.0453   LearningRate 0.0001   Epoch: 38   Global Step: 197200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:57,942-Speed 10132.72 samples/sec   Loss 5.1966   LearningRate 0.0001   Epoch: 38   Global Step: 197210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:58,977-Speed 9900.31 samples/sec   Loss 5.2504   LearningRate 0.0001   Epoch: 38   Global Step: 197220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:37:59,951-Speed 10523.73 samples/sec   Loss 5.0216   LearningRate 0.0001   Epoch: 38   Global Step: 197230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:00,900-Speed 10796.59 samples/sec   Loss 5.0710   LearningRate 0.0001   Epoch: 38   Global Step: 197240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:01,871-Speed 10561.98 samples/sec   Loss 5.2407   LearningRate 0.0001   Epoch: 38   Global Step: 197250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:02,858-Speed 10386.95 samples/sec   Loss 5.1027   LearningRate 0.0001   Epoch: 38   Global Step: 197260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:14,792-Speed 858.19 samples/sec   Loss 5.1141   LearningRate 0.0001   Epoch: 39   Global Step: 197270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:15,780-Speed 10374.91 samples/sec   Loss 5.0792   LearningRate 0.0001   Epoch: 39   Global Step: 197280   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-11 06:38:16,831-Speed 9748.43 samples/sec   Loss 5.0105   LearningRate 0.0001   Epoch: 39   Global Step: 197290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:17,807-Speed 10505.94 samples/sec   Loss 5.0837   LearningRate 0.0001   Epoch: 39   Global Step: 197300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:19,034-Speed 8354.97 samples/sec   Loss 5.1922   LearningRate 0.0001   Epoch: 39   Global Step: 197310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:20,180-Speed 8943.73 samples/sec   Loss 5.1228   LearningRate 0.0001   Epoch: 39   Global Step: 197320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:21,215-Speed 9895.22 samples/sec   Loss 5.0178   LearningRate 0.0001   Epoch: 39   Global Step: 197330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:22,299-Speed 9455.31 samples/sec   Loss 5.1287   LearningRate 0.0001   Epoch: 39   Global Step: 197340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:23,285-Speed 10403.65 samples/sec   Loss 4.8734   LearningRate 0.0001   Epoch: 39   Global Step: 197350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:24,260-Speed 10524.84 samples/sec   Loss 5.1464   LearningRate 0.0001   Epoch: 39   Global Step: 197360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:25,248-Speed 10368.15 samples/sec   Loss 5.1845   LearningRate 0.0001   Epoch: 39   Global Step: 197370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:26,266-Speed 10062.89 samples/sec   Loss 5.0024   LearningRate 0.0001   Epoch: 39   Global Step: 197380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:27,272-Speed 10201.18 samples/sec   Loss 5.0254   LearningRate 0.0001   Epoch: 39   Global Step: 197390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:28,244-Speed 10549.64 samples/sec   Loss 5.0703   LearningRate 0.0001   Epoch: 39   Global Step: 197400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:29,262-Speed 10062.07 samples/sec   Loss 5.1281   LearningRate 0.0001   Epoch: 39   Global Step: 197410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:30,269-Speed 10184.30 samples/sec   Loss 4.9973   LearningRate 0.0001   Epoch: 39   Global Step: 197420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:31,256-Speed 10386.90 samples/sec   Loss 5.0333   LearningRate 0.0001   Epoch: 39   Global Step: 197430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:32,255-Speed 10262.41 samples/sec   Loss 5.2405   LearningRate 0.0001   Epoch: 39   Global Step: 197440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:33,268-Speed 10108.23 samples/sec   Loss 5.0540   LearningRate 0.0001   Epoch: 39   Global Step: 197450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:34,250-Speed 10446.11 samples/sec   Loss 5.2079   LearningRate 0.0001   Epoch: 39   Global Step: 197460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:35,238-Speed 10377.06 samples/sec   Loss 5.0989   LearningRate 0.0001   Epoch: 39   Global Step: 197470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:36,193-Speed 10742.93 samples/sec   Loss 5.0869   LearningRate 0.0001   Epoch: 39   Global Step: 197480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:38:37,193-Speed 10246.42 samples/sec   Loss 5.2329   LearningRate 0.0001   Epoch: 39   Global Step: 197490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:38:38,248-Speed 9719.03 samples/sec   Loss 5.1265   LearningRate 0.0001   Epoch: 39   Global Step: 197500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:38:39,295-Speed 9796.40 samples/sec   Loss 5.0684   LearningRate 0.0001   Epoch: 39   Global Step: 197510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:38:40,269-Speed 10518.58 samples/sec   Loss 5.1598   LearningRate 0.0001   Epoch: 39   Global Step: 197520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:38:41,250-Speed 10449.58 samples/sec   Loss 4.9145   LearningRate 0.0001   Epoch: 39   Global Step: 197530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:38:42,319-Speed 9590.70 samples/sec   Loss 4.8885   LearningRate 0.0001   Epoch: 39   Global Step: 197540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:38:43,310-Speed 10342.89 samples/sec   Loss 5.1121   LearningRate 0.0001   Epoch: 39   Global Step: 197550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:38:44,284-Speed 10521.31 samples/sec   Loss 5.0134   LearningRate 0.0001   Epoch: 39   Global Step: 197560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:38:45,279-Speed 10308.43 samples/sec   Loss 5.1457   LearningRate 0.0001   Epoch: 39   Global Step: 197570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:38:46,291-Speed 10125.64 samples/sec   Loss 5.0860   LearningRate 0.0001   Epoch: 39   Global Step: 197580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:47,306-Speed 10102.84 samples/sec   Loss 5.1229   LearningRate 0.0001   Epoch: 39   Global Step: 197590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:48,269-Speed 10640.38 samples/sec   Loss 5.0548   LearningRate 0.0001   Epoch: 39   Global Step: 197600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:38:49,292-Speed 10016.38 samples/sec   Loss 5.1077   LearningRate 0.0001   Epoch: 39   Global Step: 197610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:38:50,400-Speed 9258.47 samples/sec   Loss 5.0814   LearningRate 0.0001   Epoch: 39   Global Step: 197620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:38:51,358-Speed 10732.20 samples/sec   Loss 5.0947   LearningRate 0.0001   Epoch: 39   Global Step: 197630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:38:52,368-Speed 10141.09 samples/sec   Loss 4.9452   LearningRate 0.0001   Epoch: 39   Global Step: 197640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:38:53,416-Speed 9789.80 samples/sec   Loss 5.0921   LearningRate 0.0001   Epoch: 39   Global Step: 197650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:38:54,393-Speed 10491.07 samples/sec   Loss 5.0338   LearningRate 0.0001   Epoch: 39   Global Step: 197660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:38:55,409-Speed 10089.25 samples/sec   Loss 4.9769   LearningRate 0.0001   Epoch: 39   Global Step: 197670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:38:56,406-Speed 10269.98 samples/sec   Loss 5.0210   LearningRate 0.0001   Epoch: 39   Global Step: 197680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:38:57,394-Speed 10377.00 samples/sec   Loss 5.2240   LearningRate 0.0001   Epoch: 39   Global Step: 197690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:38:58,372-Speed 10490.45 samples/sec   Loss 5.0531   LearningRate 0.0001   Epoch: 39   Global Step: 197700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:38:59,363-Speed 10335.38 samples/sec   Loss 5.1394   LearningRate 0.0001   Epoch: 39   Global Step: 197710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:39:00,412-Speed 9778.12 samples/sec   Loss 5.0731   LearningRate 0.0001   Epoch: 39   Global Step: 197720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:39:01,380-Speed 10590.48 samples/sec   Loss 5.1205   LearningRate 0.0001   Epoch: 39   Global Step: 197730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:39:02,376-Speed 10292.75 samples/sec   Loss 5.1509   LearningRate 0.0001   Epoch: 39   Global Step: 197740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:39:03,359-Speed 10426.14 samples/sec   Loss 5.1325   LearningRate 0.0001   Epoch: 39   Global Step: 197750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:39:04,394-Speed 9901.57 samples/sec   Loss 5.3243   LearningRate 0.0001   Epoch: 39   Global Step: 197760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:39:05,386-Speed 10336.45 samples/sec   Loss 5.2664   LearningRate 0.0001   Epoch: 39   Global Step: 197770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:39:06,379-Speed 10317.85 samples/sec   Loss 5.1847   LearningRate 0.0001   Epoch: 39   Global Step: 197780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:39:07,365-Speed 10398.45 samples/sec   Loss 5.0480   LearningRate 0.0001   Epoch: 39   Global Step: 197790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:39:08,397-Speed 9922.18 samples/sec   Loss 5.0114   LearningRate 0.0000   Epoch: 39   Global Step: 197800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:39:09,398-Speed 10251.40 samples/sec   Loss 4.9478   LearningRate 0.0000   Epoch: 39   Global Step: 197810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:39:10,405-Speed 10171.24 samples/sec   Loss 4.9429   LearningRate 0.0000   Epoch: 39   Global Step: 197820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:39:11,420-Speed 10096.75 samples/sec   Loss 5.1368   LearningRate 0.0000   Epoch: 39   Global Step: 197830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:39:12,536-Speed 9187.87 samples/sec   Loss 5.0870   LearningRate 0.0000   Epoch: 39   Global Step: 197840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:39:13,495-Speed 10682.09 samples/sec   Loss 5.0869   LearningRate 0.0000   Epoch: 39   Global Step: 197850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:39:14,472-Speed 10495.18 samples/sec   Loss 5.0546   LearningRate 0.0000   Epoch: 39   Global Step: 197860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:39:15,428-Speed 10732.03 samples/sec   Loss 5.0855   LearningRate 0.0000   Epoch: 39   Global Step: 197870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:39:16,391-Speed 10642.01 samples/sec   Loss 5.1468   LearningRate 0.0000   Epoch: 39   Global Step: 197880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:39:17,436-Speed 9816.24 samples/sec   Loss 4.8680   LearningRate 0.0000   Epoch: 39   Global Step: 197890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:39:18,458-Speed 10032.16 samples/sec   Loss 5.0754   LearningRate 0.0000   Epoch: 39   Global Step: 197900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:39:19,473-Speed 10097.26 samples/sec   Loss 5.1003   LearningRate 0.0000   Epoch: 39   Global Step: 197910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:39:20,518-Speed 9812.10 samples/sec   Loss 5.0287   LearningRate 0.0000   Epoch: 39   Global Step: 197920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:39:21,525-Speed 10174.88 samples/sec   Loss 4.9688   LearningRate 0.0000   Epoch: 39   Global Step: 197930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:39:22,539-Speed 10111.25 samples/sec   Loss 5.1110   LearningRate 0.0000   Epoch: 39   Global Step: 197940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:39:23,596-Speed 9695.22 samples/sec   Loss 5.0494   LearningRate 0.0000   Epoch: 39   Global Step: 197950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:39:24,578-Speed 10439.98 samples/sec   Loss 5.1201   LearningRate 0.0000   Epoch: 39   Global Step: 197960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:39:25,588-Speed 10144.95 samples/sec   Loss 5.1666   LearningRate 0.0000   Epoch: 39   Global Step: 197970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:39:26,615-Speed 9977.86 samples/sec   Loss 5.0984   LearningRate 0.0000   Epoch: 39   Global Step: 197980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:39:27,662-Speed 9793.69 samples/sec   Loss 5.0827   LearningRate 0.0000   Epoch: 39   Global Step: 197990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:39:28,630-Speed 10590.73 samples/sec   Loss 5.0804   LearningRate 0.0000   Epoch: 39   Global Step: 198000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:39:51,201-[lfw][198000]XNorm: 7.968239
Training: 2022-04-11 06:39:51,202-[lfw][198000]Accuracy-Flip: 0.99617+-0.00317
Training: 2022-04-11 06:39:51,203-[lfw][198000]Accuracy-Highest: 0.99700
Training: 2022-04-11 06:40:16,860-[cfp_fp][198000]XNorm: 6.893361
Training: 2022-04-11 06:40:16,861-[cfp_fp][198000]Accuracy-Flip: 0.97229+-0.00915
Training: 2022-04-11 06:40:16,862-[cfp_fp][198000]Accuracy-Highest: 0.97371
Training: 2022-04-11 06:40:38,930-[agedb_30][198000]XNorm: 7.789925
Training: 2022-04-11 06:40:38,930-[agedb_30][198000]Accuracy-Flip: 0.97117+-0.00796
Training: 2022-04-11 06:40:38,931-[agedb_30][198000]Accuracy-Highest: 0.97350
Training: 2022-04-11 06:40:39,938-Speed 143.60 samples/sec   Loss 5.0242   LearningRate 0.0000   Epoch: 39   Global Step: 198010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:40:40,908-Speed 10558.42 samples/sec   Loss 5.0541   LearningRate 0.0000   Epoch: 39   Global Step: 198020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:40:41,912-Speed 10204.12 samples/sec   Loss 5.0625   LearningRate 0.0000   Epoch: 39   Global Step: 198030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:40:42,895-Speed 10429.29 samples/sec   Loss 4.8980   LearningRate 0.0000   Epoch: 39   Global Step: 198040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:40:43,875-Speed 10461.25 samples/sec   Loss 5.0027   LearningRate 0.0000   Epoch: 39   Global Step: 198050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:40:44,887-Speed 10125.85 samples/sec   Loss 5.0100   LearningRate 0.0000   Epoch: 39   Global Step: 198060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:40:45,863-Speed 10508.20 samples/sec   Loss 5.1039   LearningRate 0.0000   Epoch: 39   Global Step: 198070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:40:46,879-Speed 10092.90 samples/sec   Loss 5.0818   LearningRate 0.0000   Epoch: 39   Global Step: 198080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:40:47,866-Speed 10382.20 samples/sec   Loss 5.0161   LearningRate 0.0000   Epoch: 39   Global Step: 198090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:40:48,871-Speed 10205.53 samples/sec   Loss 5.0942   LearningRate 0.0000   Epoch: 39   Global Step: 198100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:40:49,865-Speed 10312.52 samples/sec   Loss 5.1442   LearningRate 0.0000   Epoch: 39   Global Step: 198110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:40:50,857-Speed 10336.69 samples/sec   Loss 5.0072   LearningRate 0.0000   Epoch: 39   Global Step: 198120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:40:51,850-Speed 10320.65 samples/sec   Loss 5.0611   LearningRate 0.0000   Epoch: 39   Global Step: 198130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:40:52,830-Speed 10458.68 samples/sec   Loss 5.1932   LearningRate 0.0000   Epoch: 39   Global Step: 198140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:40:53,812-Speed 10437.63 samples/sec   Loss 5.1545   LearningRate 0.0000   Epoch: 39   Global Step: 198150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:40:54,824-Speed 10118.03 samples/sec   Loss 5.2362   LearningRate 0.0000   Epoch: 39   Global Step: 198160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:40:55,911-Speed 9433.66 samples/sec   Loss 5.2196   LearningRate 0.0000   Epoch: 39   Global Step: 198170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:40:56,878-Speed 10601.40 samples/sec   Loss 5.0910   LearningRate 0.0000   Epoch: 39   Global Step: 198180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:40:57,817-Speed 10912.50 samples/sec   Loss 5.1102   LearningRate 0.0000   Epoch: 39   Global Step: 198190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:40:58,786-Speed 10576.35 samples/sec   Loss 5.0402   LearningRate 0.0000   Epoch: 39   Global Step: 198200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:40:59,791-Speed 10197.19 samples/sec   Loss 5.1666   LearningRate 0.0000   Epoch: 39   Global Step: 198210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:00,793-Speed 10242.28 samples/sec   Loss 5.0953   LearningRate 0.0000   Epoch: 39   Global Step: 198220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:01,783-Speed 10353.35 samples/sec   Loss 5.0285   LearningRate 0.0000   Epoch: 39   Global Step: 198230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:02,775-Speed 10337.73 samples/sec   Loss 5.1116   LearningRate 0.0000   Epoch: 39   Global Step: 198240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:03,803-Speed 9967.62 samples/sec   Loss 5.2145   LearningRate 0.0000   Epoch: 39   Global Step: 198250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:04,781-Speed 10484.30 samples/sec   Loss 5.0699   LearningRate 0.0000   Epoch: 39   Global Step: 198260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:05,731-Speed 10792.54 samples/sec   Loss 5.1852   LearningRate 0.0000   Epoch: 39   Global Step: 198270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:06,729-Speed 10270.47 samples/sec   Loss 5.2540   LearningRate 0.0000   Epoch: 39   Global Step: 198280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:07,722-Speed 10320.70 samples/sec   Loss 5.0307   LearningRate 0.0000   Epoch: 39   Global Step: 198290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:08,712-Speed 10358.44 samples/sec   Loss 5.1124   LearningRate 0.0000   Epoch: 39   Global Step: 198300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:09,689-Speed 10491.95 samples/sec   Loss 4.9615   LearningRate 0.0000   Epoch: 39   Global Step: 198310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:10,714-Speed 9995.75 samples/sec   Loss 5.1681   LearningRate 0.0000   Epoch: 39   Global Step: 198320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:11,759-Speed 9799.99 samples/sec   Loss 5.0931   LearningRate 0.0000   Epoch: 39   Global Step: 198330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:12,768-Speed 10164.04 samples/sec   Loss 5.1502   LearningRate 0.0000   Epoch: 39   Global Step: 198340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:13,732-Speed 10632.30 samples/sec   Loss 5.0866   LearningRate 0.0000   Epoch: 39   Global Step: 198350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:14,729-Speed 10285.31 samples/sec   Loss 5.1964   LearningRate 0.0000   Epoch: 39   Global Step: 198360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:15,828-Speed 9323.38 samples/sec   Loss 5.0592   LearningRate 0.0000   Epoch: 39   Global Step: 198370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:16,811-Speed 10423.87 samples/sec   Loss 4.9987   LearningRate 0.0000   Epoch: 39   Global Step: 198380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:17,777-Speed 10610.28 samples/sec   Loss 4.9569   LearningRate 0.0000   Epoch: 39   Global Step: 198390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:18,751-Speed 10520.14 samples/sec   Loss 5.1681   LearningRate 0.0000   Epoch: 39   Global Step: 198400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:19,779-Speed 9976.03 samples/sec   Loss 5.0747   LearningRate 0.0000   Epoch: 39   Global Step: 198410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:20,829-Speed 9763.37 samples/sec   Loss 5.3448   LearningRate 0.0000   Epoch: 39   Global Step: 198420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:21,827-Speed 10268.82 samples/sec   Loss 5.2049   LearningRate 0.0000   Epoch: 39   Global Step: 198430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:22,824-Speed 10285.99 samples/sec   Loss 5.1300   LearningRate 0.0000   Epoch: 39   Global Step: 198440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:23,835-Speed 10131.69 samples/sec   Loss 4.9489   LearningRate 0.0000   Epoch: 39   Global Step: 198450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:24,834-Speed 10259.50 samples/sec   Loss 5.2278   LearningRate 0.0000   Epoch: 39   Global Step: 198460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:25,816-Speed 10442.68 samples/sec   Loss 5.1266   LearningRate 0.0000   Epoch: 39   Global Step: 198470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:26,805-Speed 10364.68 samples/sec   Loss 4.9880   LearningRate 0.0000   Epoch: 39   Global Step: 198480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:27,804-Speed 10254.94 samples/sec   Loss 4.9632   LearningRate 0.0000   Epoch: 39   Global Step: 198490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:28,773-Speed 10584.26 samples/sec   Loss 5.1764   LearningRate 0.0000   Epoch: 39   Global Step: 198500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:29,783-Speed 10148.74 samples/sec   Loss 5.1708   LearningRate 0.0000   Epoch: 39   Global Step: 198510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:30,794-Speed 10142.08 samples/sec   Loss 5.0265   LearningRate 0.0000   Epoch: 39   Global Step: 198520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:31,795-Speed 10242.74 samples/sec   Loss 5.0244   LearningRate 0.0000   Epoch: 39   Global Step: 198530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:32,792-Speed 10279.99 samples/sec   Loss 5.0472   LearningRate 0.0000   Epoch: 39   Global Step: 198540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:33,780-Speed 10363.19 samples/sec   Loss 5.0646   LearningRate 0.0000   Epoch: 39   Global Step: 198550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:34,794-Speed 10116.08 samples/sec   Loss 5.1419   LearningRate 0.0000   Epoch: 39   Global Step: 198560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:35,764-Speed 10570.01 samples/sec   Loss 5.1821   LearningRate 0.0000   Epoch: 39   Global Step: 198570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:36,746-Speed 10440.50 samples/sec   Loss 5.1375   LearningRate 0.0000   Epoch: 39   Global Step: 198580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:37,770-Speed 10000.07 samples/sec   Loss 5.0977   LearningRate 0.0000   Epoch: 39   Global Step: 198590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:38,773-Speed 10227.13 samples/sec   Loss 5.1650   LearningRate 0.0000   Epoch: 39   Global Step: 198600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:39,742-Speed 10583.47 samples/sec   Loss 5.0645   LearningRate 0.0000   Epoch: 39   Global Step: 198610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:40,687-Speed 10845.13 samples/sec   Loss 5.0944   LearningRate 0.0000   Epoch: 39   Global Step: 198620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:41,668-Speed 10440.16 samples/sec   Loss 5.0150   LearningRate 0.0000   Epoch: 39   Global Step: 198630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:42,643-Speed 10511.30 samples/sec   Loss 5.1154   LearningRate 0.0000   Epoch: 39   Global Step: 198640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:43,669-Speed 9990.59 samples/sec   Loss 5.0211   LearningRate 0.0000   Epoch: 39   Global Step: 198650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:44,651-Speed 10442.48 samples/sec   Loss 5.1132   LearningRate 0.0000   Epoch: 39   Global Step: 198660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:45,665-Speed 10110.45 samples/sec   Loss 5.1568   LearningRate 0.0000   Epoch: 39   Global Step: 198670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:46,677-Speed 10122.66 samples/sec   Loss 5.0687   LearningRate 0.0000   Epoch: 39   Global Step: 198680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:47,711-Speed 9913.47 samples/sec   Loss 4.9749   LearningRate 0.0000   Epoch: 39   Global Step: 198690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:48,707-Speed 10298.91 samples/sec   Loss 5.0183   LearningRate 0.0000   Epoch: 39   Global Step: 198700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:49,695-Speed 10365.99 samples/sec   Loss 5.0227   LearningRate 0.0000   Epoch: 39   Global Step: 198710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:50,777-Speed 9472.00 samples/sec   Loss 5.0388   LearningRate 0.0000   Epoch: 39   Global Step: 198720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:51,776-Speed 10269.69 samples/sec   Loss 5.0339   LearningRate 0.0000   Epoch: 39   Global Step: 198730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:52,789-Speed 10116.75 samples/sec   Loss 5.1674   LearningRate 0.0000   Epoch: 39   Global Step: 198740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:53,781-Speed 10328.27 samples/sec   Loss 5.1648   LearningRate 0.0000   Epoch: 39   Global Step: 198750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:54,822-Speed 9856.64 samples/sec   Loss 5.0394   LearningRate 0.0000   Epoch: 39   Global Step: 198760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:55,793-Speed 10566.37 samples/sec   Loss 5.1260   LearningRate 0.0000   Epoch: 39   Global Step: 198770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:41:56,770-Speed 10483.96 samples/sec   Loss 5.0690   LearningRate 0.0000   Epoch: 39   Global Step: 198780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:57,784-Speed 10108.15 samples/sec   Loss 5.0745   LearningRate 0.0000   Epoch: 39   Global Step: 198790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:58,831-Speed 9794.07 samples/sec   Loss 4.8963   LearningRate 0.0000   Epoch: 39   Global Step: 198800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:41:59,798-Speed 10594.86 samples/sec   Loss 4.9557   LearningRate 0.0000   Epoch: 39   Global Step: 198810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:00,777-Speed 10470.43 samples/sec   Loss 5.0809   LearningRate 0.0000   Epoch: 39   Global Step: 198820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:01,796-Speed 10060.63 samples/sec   Loss 5.1377   LearningRate 0.0000   Epoch: 39   Global Step: 198830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:02,784-Speed 10370.59 samples/sec   Loss 5.0720   LearningRate 0.0000   Epoch: 39   Global Step: 198840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:03,784-Speed 10248.38 samples/sec   Loss 5.0209   LearningRate 0.0000   Epoch: 39   Global Step: 198850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:04,771-Speed 10384.46 samples/sec   Loss 5.1312   LearningRate 0.0000   Epoch: 39   Global Step: 198860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:05,776-Speed 10200.68 samples/sec   Loss 5.1570   LearningRate 0.0000   Epoch: 39   Global Step: 198870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:06,761-Speed 10402.88 samples/sec   Loss 5.1901   LearningRate 0.0000   Epoch: 39   Global Step: 198880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:07,762-Speed 10238.91 samples/sec   Loss 5.0900   LearningRate 0.0000   Epoch: 39   Global Step: 198890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:08,801-Speed 9876.27 samples/sec   Loss 5.1277   LearningRate 0.0000   Epoch: 39   Global Step: 198900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:09,772-Speed 10554.78 samples/sec   Loss 4.9703   LearningRate 0.0000   Epoch: 39   Global Step: 198910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:10,766-Speed 10310.15 samples/sec   Loss 4.9725   LearningRate 0.0000   Epoch: 39   Global Step: 198920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:11,834-Speed 9600.93 samples/sec   Loss 5.0503   LearningRate 0.0000   Epoch: 39   Global Step: 198930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:12,777-Speed 10865.33 samples/sec   Loss 5.1193   LearningRate 0.0000   Epoch: 39   Global Step: 198940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:13,748-Speed 10558.60 samples/sec   Loss 5.1052   LearningRate 0.0000   Epoch: 39   Global Step: 198950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:14,725-Speed 10485.98 samples/sec   Loss 4.9317   LearningRate 0.0000   Epoch: 39   Global Step: 198960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:15,711-Speed 10400.21 samples/sec   Loss 5.0744   LearningRate 0.0000   Epoch: 39   Global Step: 198970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:16,716-Speed 10192.29 samples/sec   Loss 5.0147   LearningRate 0.0000   Epoch: 39   Global Step: 198980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:17,723-Speed 10183.80 samples/sec   Loss 5.1805   LearningRate 0.0000   Epoch: 39   Global Step: 198990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:18,757-Speed 9914.48 samples/sec   Loss 5.0106   LearningRate 0.0000   Epoch: 39   Global Step: 199000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:19,726-Speed 10580.04 samples/sec   Loss 5.1679   LearningRate 0.0000   Epoch: 39   Global Step: 199010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:20,683-Speed 10705.19 samples/sec   Loss 5.1030   LearningRate 0.0000   Epoch: 39   Global Step: 199020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:21,724-Speed 9839.82 samples/sec   Loss 5.1869   LearningRate 0.0000   Epoch: 39   Global Step: 199030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:22,730-Speed 10191.74 samples/sec   Loss 4.9404   LearningRate 0.0000   Epoch: 39   Global Step: 199040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:23,667-Speed 10939.47 samples/sec   Loss 5.0339   LearningRate 0.0000   Epoch: 39   Global Step: 199050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:24,689-Speed 10029.89 samples/sec   Loss 5.1148   LearningRate 0.0000   Epoch: 39   Global Step: 199060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:25,694-Speed 10198.14 samples/sec   Loss 4.8961   LearningRate 0.0000   Epoch: 39   Global Step: 199070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:26,677-Speed 10426.84 samples/sec   Loss 5.0084   LearningRate 0.0000   Epoch: 39   Global Step: 199080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:27,650-Speed 10536.08 samples/sec   Loss 5.1345   LearningRate 0.0000   Epoch: 39   Global Step: 199090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:28,650-Speed 10251.71 samples/sec   Loss 5.1430   LearningRate 0.0000   Epoch: 39   Global Step: 199100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:30,033-Speed 7413.58 samples/sec   Loss 5.1208   LearningRate 0.0000   Epoch: 39   Global Step: 199110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:31,003-Speed 10566.02 samples/sec   Loss 5.0820   LearningRate 0.0000   Epoch: 39   Global Step: 199120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:31,965-Speed 10652.62 samples/sec   Loss 5.1608   LearningRate 0.0000   Epoch: 39   Global Step: 199130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:32,979-Speed 10102.29 samples/sec   Loss 5.1314   LearningRate 0.0000   Epoch: 39   Global Step: 199140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:33,951-Speed 10546.39 samples/sec   Loss 5.3483   LearningRate 0.0000   Epoch: 39   Global Step: 199150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:34,939-Speed 10377.36 samples/sec   Loss 5.1635   LearningRate 0.0000   Epoch: 39   Global Step: 199160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:35,946-Speed 10177.76 samples/sec   Loss 5.0489   LearningRate 0.0000   Epoch: 39   Global Step: 199170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:36,950-Speed 10206.53 samples/sec   Loss 4.8618   LearningRate 0.0000   Epoch: 39   Global Step: 199180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:37,955-Speed 10204.54 samples/sec   Loss 5.0566   LearningRate 0.0000   Epoch: 39   Global Step: 199190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:38,935-Speed 10460.46 samples/sec   Loss 5.0808   LearningRate 0.0000   Epoch: 39   Global Step: 199200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:39,904-Speed 10587.96 samples/sec   Loss 5.0791   LearningRate 0.0000   Epoch: 39   Global Step: 199210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:40,845-Speed 10880.06 samples/sec   Loss 5.2316   LearningRate 0.0000   Epoch: 39   Global Step: 199220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:41,862-Speed 10080.31 samples/sec   Loss 5.1089   LearningRate 0.0000   Epoch: 39   Global Step: 199230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:42,849-Speed 10384.31 samples/sec   Loss 5.1210   LearningRate 0.0000   Epoch: 39   Global Step: 199240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:43,837-Speed 10374.70 samples/sec   Loss 5.0391   LearningRate 0.0000   Epoch: 39   Global Step: 199250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:44,797-Speed 10671.15 samples/sec   Loss 5.0824   LearningRate 0.0000   Epoch: 39   Global Step: 199260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:45,795-Speed 10267.81 samples/sec   Loss 5.0362   LearningRate 0.0000   Epoch: 39   Global Step: 199270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:46,760-Speed 10625.86 samples/sec   Loss 5.1147   LearningRate 0.0000   Epoch: 39   Global Step: 199280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:47,725-Speed 10614.42 samples/sec   Loss 5.0687   LearningRate 0.0000   Epoch: 39   Global Step: 199290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:48,662-Speed 10938.99 samples/sec   Loss 5.1230   LearningRate 0.0000   Epoch: 39   Global Step: 199300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:49,645-Speed 10419.97 samples/sec   Loss 5.1424   LearningRate 0.0000   Epoch: 39   Global Step: 199310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:50,609-Speed 10630.67 samples/sec   Loss 5.1262   LearningRate 0.0000   Epoch: 39   Global Step: 199320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:51,568-Speed 10681.14 samples/sec   Loss 5.2037   LearningRate 0.0000   Epoch: 39   Global Step: 199330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:52,565-Speed 10284.66 samples/sec   Loss 5.0371   LearningRate 0.0000   Epoch: 39   Global Step: 199340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:53,561-Speed 10282.37 samples/sec   Loss 5.2479   LearningRate 0.0000   Epoch: 39   Global Step: 199350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:54,552-Speed 10355.98 samples/sec   Loss 5.2450   LearningRate 0.0000   Epoch: 39   Global Step: 199360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:55,526-Speed 10520.09 samples/sec   Loss 5.0344   LearningRate 0.0000   Epoch: 39   Global Step: 199370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:56,541-Speed 10091.08 samples/sec   Loss 5.0817   LearningRate 0.0000   Epoch: 39   Global Step: 199380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:57,528-Speed 10385.40 samples/sec   Loss 5.1551   LearningRate 0.0000   Epoch: 39   Global Step: 199390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:42:58,497-Speed 10570.57 samples/sec   Loss 5.2530   LearningRate 0.0000   Epoch: 39   Global Step: 199400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:42:59,476-Speed 10470.30 samples/sec   Loss 5.1974   LearningRate 0.0000   Epoch: 39   Global Step: 199410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:43:00,448-Speed 10543.93 samples/sec   Loss 5.3177   LearningRate 0.0000   Epoch: 39   Global Step: 199420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:01,442-Speed 10304.64 samples/sec   Loss 5.0495   LearningRate 0.0000   Epoch: 39   Global Step: 199430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:02,435-Speed 10319.54 samples/sec   Loss 5.1781   LearningRate 0.0000   Epoch: 39   Global Step: 199440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:03,387-Speed 10758.50 samples/sec   Loss 5.0957   LearningRate 0.0000   Epoch: 39   Global Step: 199450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:04,393-Speed 10192.85 samples/sec   Loss 5.0369   LearningRate 0.0000   Epoch: 39   Global Step: 199460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:05,378-Speed 10403.68 samples/sec   Loss 5.1113   LearningRate 0.0000   Epoch: 39   Global Step: 199470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:06,348-Speed 10564.06 samples/sec   Loss 4.9817   LearningRate 0.0000   Epoch: 39   Global Step: 199480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:07,357-Speed 10148.36 samples/sec   Loss 5.0662   LearningRate 0.0000   Epoch: 39   Global Step: 199490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:08,362-Speed 10200.65 samples/sec   Loss 5.1279   LearningRate 0.0000   Epoch: 39   Global Step: 199500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:09,321-Speed 10682.90 samples/sec   Loss 5.1349   LearningRate 0.0000   Epoch: 39   Global Step: 199510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:10,323-Speed 10226.60 samples/sec   Loss 4.9608   LearningRate 0.0000   Epoch: 39   Global Step: 199520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:43:11,330-Speed 10174.63 samples/sec   Loss 5.0029   LearningRate 0.0000   Epoch: 39   Global Step: 199530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:43:12,313-Speed 10429.73 samples/sec   Loss 5.0808   LearningRate 0.0000   Epoch: 39   Global Step: 199540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:43:13,264-Speed 10772.85 samples/sec   Loss 4.8945   LearningRate 0.0000   Epoch: 39   Global Step: 199550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:43:14,257-Speed 10314.48 samples/sec   Loss 4.9902   LearningRate 0.0000   Epoch: 39   Global Step: 199560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:43:15,236-Speed 10471.13 samples/sec   Loss 5.0435   LearningRate 0.0000   Epoch: 39   Global Step: 199570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:43:16,228-Speed 10334.33 samples/sec   Loss 5.1612   LearningRate 0.0000   Epoch: 39   Global Step: 199580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:43:17,209-Speed 10440.69 samples/sec   Loss 5.0715   LearningRate 0.0000   Epoch: 39   Global Step: 199590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:43:18,217-Speed 10171.80 samples/sec   Loss 4.9760   LearningRate 0.0000   Epoch: 39   Global Step: 199600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:43:19,169-Speed 10768.10 samples/sec   Loss 5.0602   LearningRate 0.0000   Epoch: 39   Global Step: 199610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:20,150-Speed 10449.16 samples/sec   Loss 4.8939   LearningRate 0.0000   Epoch: 39   Global Step: 199620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:21,100-Speed 10777.20 samples/sec   Loss 5.0212   LearningRate 0.0000   Epoch: 39   Global Step: 199630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:22,038-Speed 10929.12 samples/sec   Loss 5.0363   LearningRate 0.0000   Epoch: 39   Global Step: 199640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:23,039-Speed 10239.16 samples/sec   Loss 5.1303   LearningRate 0.0000   Epoch: 39   Global Step: 199650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:24,062-Speed 10012.35 samples/sec   Loss 5.0526   LearningRate 0.0000   Epoch: 39   Global Step: 199660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:25,044-Speed 10438.10 samples/sec   Loss 5.1052   LearningRate 0.0000   Epoch: 39   Global Step: 199670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:26,008-Speed 10630.90 samples/sec   Loss 5.2089   LearningRate 0.0000   Epoch: 39   Global Step: 199680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:27,006-Speed 10266.26 samples/sec   Loss 5.0776   LearningRate 0.0000   Epoch: 39   Global Step: 199690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:28,064-Speed 9688.34 samples/sec   Loss 5.0388   LearningRate 0.0000   Epoch: 39   Global Step: 199700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:29,020-Speed 10718.26 samples/sec   Loss 5.1250   LearningRate 0.0000   Epoch: 39   Global Step: 199710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:43:30,038-Speed 10063.90 samples/sec   Loss 5.1957   LearningRate 0.0000   Epoch: 39   Global Step: 199720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:43:31,046-Speed 10160.64 samples/sec   Loss 5.0646   LearningRate 0.0000   Epoch: 39   Global Step: 199730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:43:32,057-Speed 10134.14 samples/sec   Loss 5.0999   LearningRate 0.0000   Epoch: 39   Global Step: 199740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:43:33,055-Speed 10269.05 samples/sec   Loss 5.1097   LearningRate 0.0000   Epoch: 39   Global Step: 199750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:34,037-Speed 10432.87 samples/sec   Loss 5.1208   LearningRate 0.0000   Epoch: 39   Global Step: 199760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:35,011-Speed 10537.83 samples/sec   Loss 5.0730   LearningRate 0.0000   Epoch: 39   Global Step: 199770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:36,015-Speed 10197.30 samples/sec   Loss 5.1330   LearningRate 0.0000   Epoch: 39   Global Step: 199780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:36,988-Speed 10531.45 samples/sec   Loss 5.0214   LearningRate 0.0000   Epoch: 39   Global Step: 199790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:37,954-Speed 10615.07 samples/sec   Loss 5.0440   LearningRate 0.0000   Epoch: 39   Global Step: 199800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:38,936-Speed 10429.55 samples/sec   Loss 5.2014   LearningRate 0.0000   Epoch: 39   Global Step: 199810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:39,920-Speed 10411.71 samples/sec   Loss 5.0672   LearningRate 0.0000   Epoch: 39   Global Step: 199820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:40,889-Speed 10578.57 samples/sec   Loss 5.2058   LearningRate 0.0000   Epoch: 39   Global Step: 199830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:41,890-Speed 10228.98 samples/sec   Loss 5.0526   LearningRate 0.0000   Epoch: 39   Global Step: 199840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:42,855-Speed 10619.47 samples/sec   Loss 5.1479   LearningRate 0.0000   Epoch: 39   Global Step: 199850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:43,845-Speed 10353.40 samples/sec   Loss 5.1591   LearningRate 0.0000   Epoch: 39   Global Step: 199860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:44,833-Speed 10367.22 samples/sec   Loss 5.0697   LearningRate 0.0000   Epoch: 39   Global Step: 199870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:45,844-Speed 10138.73 samples/sec   Loss 5.1320   LearningRate 0.0000   Epoch: 39   Global Step: 199880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:46,860-Speed 10089.58 samples/sec   Loss 5.1852   LearningRate 0.0000   Epoch: 39   Global Step: 199890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:47,857-Speed 10274.72 samples/sec   Loss 5.0961   LearningRate 0.0000   Epoch: 39   Global Step: 199900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:48,831-Speed 10512.92 samples/sec   Loss 5.0971   LearningRate 0.0000   Epoch: 39   Global Step: 199910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:49,796-Speed 10620.23 samples/sec   Loss 5.1916   LearningRate 0.0000   Epoch: 39   Global Step: 199920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:50,781-Speed 10403.87 samples/sec   Loss 5.0258   LearningRate 0.0000   Epoch: 39   Global Step: 199930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:51,765-Speed 10423.68 samples/sec   Loss 5.1635   LearningRate 0.0000   Epoch: 39   Global Step: 199940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:52,766-Speed 10237.73 samples/sec   Loss 5.0207   LearningRate 0.0000   Epoch: 39   Global Step: 199950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:43:53,738-Speed 10538.22 samples/sec   Loss 5.0031   LearningRate 0.0000   Epoch: 39   Global Step: 199960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:43:54,746-Speed 10167.97 samples/sec   Loss 5.3316   LearningRate 0.0000   Epoch: 39   Global Step: 199970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:43:55,732-Speed 10388.39 samples/sec   Loss 5.1038   LearningRate 0.0000   Epoch: 39   Global Step: 199980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:56,715-Speed 10418.55 samples/sec   Loss 5.1112   LearningRate 0.0000   Epoch: 39   Global Step: 199990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:43:57,703-Speed 10379.04 samples/sec   Loss 5.2346   LearningRate 0.0000   Epoch: 39   Global Step: 200000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:44:20,404-[lfw][200000]XNorm: 7.981685
Training: 2022-04-11 06:44:20,404-[lfw][200000]Accuracy-Flip: 0.99667+-0.00365
Training: 2022-04-11 06:44:20,405-[lfw][200000]Accuracy-Highest: 0.99700
Training: 2022-04-11 06:44:46,085-[cfp_fp][200000]XNorm: 6.904567
Training: 2022-04-11 06:44:46,086-[cfp_fp][200000]Accuracy-Flip: 0.97143+-0.00963
Training: 2022-04-11 06:44:46,087-[cfp_fp][200000]Accuracy-Highest: 0.97371
Training: 2022-04-11 06:45:08,358-[agedb_30][200000]XNorm: 7.800118
Training: 2022-04-11 06:45:08,359-[agedb_30][200000]Accuracy-Flip: 0.97383+-0.00663
Training: 2022-04-11 06:45:08,360-[agedb_30][200000]Accuracy-Highest: 0.97383
Training: 2022-04-11 06:45:09,317-Speed 142.99 samples/sec   Loss 5.1439   LearningRate 0.0000   Epoch: 39   Global Step: 200010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:10,408-Speed 9393.85 samples/sec   Loss 5.0427   LearningRate 0.0000   Epoch: 39   Global Step: 200020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:11,414-Speed 10197.54 samples/sec   Loss 5.0288   LearningRate 0.0000   Epoch: 39   Global Step: 200030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:12,386-Speed 10538.97 samples/sec   Loss 5.1353   LearningRate 0.0000   Epoch: 39   Global Step: 200040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:13,360-Speed 10529.91 samples/sec   Loss 5.0825   LearningRate 0.0000   Epoch: 39   Global Step: 200050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:14,384-Speed 10007.36 samples/sec   Loss 5.1625   LearningRate 0.0000   Epoch: 39   Global Step: 200060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:15,394-Speed 10153.07 samples/sec   Loss 5.1709   LearningRate 0.0000   Epoch: 39   Global Step: 200070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:16,394-Speed 10246.53 samples/sec   Loss 5.1696   LearningRate 0.0000   Epoch: 39   Global Step: 200080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:45:17,380-Speed 10392.50 samples/sec   Loss 5.0400   LearningRate 0.0000   Epoch: 39   Global Step: 200090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:45:18,386-Speed 10194.22 samples/sec   Loss 5.1919   LearningRate 0.0000   Epoch: 39   Global Step: 200100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:45:19,375-Speed 10367.66 samples/sec   Loss 5.1847   LearningRate 0.0000   Epoch: 39   Global Step: 200110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:45:20,355-Speed 10457.46 samples/sec   Loss 5.0495   LearningRate 0.0000   Epoch: 39   Global Step: 200120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:45:21,383-Speed 10168.99 samples/sec   Loss 4.9882   LearningRate 0.0000   Epoch: 39   Global Step: 200130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:22,347-Speed 10634.87 samples/sec   Loss 5.1775   LearningRate 0.0000   Epoch: 39   Global Step: 200140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:23,351-Speed 10215.30 samples/sec   Loss 5.0213   LearningRate 0.0000   Epoch: 39   Global Step: 200150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:24,294-Speed 10858.30 samples/sec   Loss 5.0889   LearningRate 0.0000   Epoch: 39   Global Step: 200160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:25,275-Speed 10453.45 samples/sec   Loss 5.2274   LearningRate 0.0000   Epoch: 39   Global Step: 200170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:26,273-Speed 10274.62 samples/sec   Loss 5.2062   LearningRate 0.0000   Epoch: 39   Global Step: 200180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:27,294-Speed 10036.45 samples/sec   Loss 5.1448   LearningRate 0.0000   Epoch: 39   Global Step: 200190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:28,275-Speed 10447.87 samples/sec   Loss 5.0443   LearningRate 0.0000   Epoch: 39   Global Step: 200200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:29,247-Speed 10547.02 samples/sec   Loss 5.2290   LearningRate 0.0000   Epoch: 39   Global Step: 200210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:30,293-Speed 9795.69 samples/sec   Loss 5.0376   LearningRate 0.0000   Epoch: 39   Global Step: 200220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:31,274-Speed 10458.97 samples/sec   Loss 5.0807   LearningRate 0.0000   Epoch: 39   Global Step: 200230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:45:32,259-Speed 10420.50 samples/sec   Loss 5.0886   LearningRate 0.0000   Epoch: 39   Global Step: 200240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:45:33,251-Speed 10327.94 samples/sec   Loss 5.0480   LearningRate 0.0000   Epoch: 39   Global Step: 200250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:45:34,301-Speed 9765.27 samples/sec   Loss 5.0482   LearningRate 0.0000   Epoch: 39   Global Step: 200260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:45:35,265-Speed 10628.81 samples/sec   Loss 5.0900   LearningRate 0.0000   Epoch: 39   Global Step: 200270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:45:36,221-Speed 10722.14 samples/sec   Loss 5.0594   LearningRate 0.0000   Epoch: 39   Global Step: 200280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:37,212-Speed 10335.22 samples/sec   Loss 5.1521   LearningRate 0.0000   Epoch: 39   Global Step: 200290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:38,217-Speed 10194.36 samples/sec   Loss 5.1323   LearningRate 0.0000   Epoch: 39   Global Step: 200300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:39,252-Speed 9909.29 samples/sec   Loss 5.1046   LearningRate 0.0000   Epoch: 39   Global Step: 200310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:40,211-Speed 10694.07 samples/sec   Loss 5.0650   LearningRate 0.0000   Epoch: 39   Global Step: 200320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:41,240-Speed 9956.16 samples/sec   Loss 5.0139   LearningRate 0.0000   Epoch: 39   Global Step: 200330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:42,234-Speed 10308.27 samples/sec   Loss 5.0464   LearningRate 0.0000   Epoch: 39   Global Step: 200340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:43,219-Speed 10409.66 samples/sec   Loss 4.9889   LearningRate 0.0000   Epoch: 39   Global Step: 200350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:44,220-Speed 10241.47 samples/sec   Loss 5.1440   LearningRate 0.0000   Epoch: 39   Global Step: 200360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:45,215-Speed 10317.20 samples/sec   Loss 5.0948   LearningRate 0.0000   Epoch: 39   Global Step: 200370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:46,220-Speed 10191.66 samples/sec   Loss 5.1219   LearningRate 0.0000   Epoch: 39   Global Step: 200380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:45:47,241-Speed 10043.33 samples/sec   Loss 5.1588   LearningRate 0.0000   Epoch: 39   Global Step: 200390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:45:48,228-Speed 10394.16 samples/sec   Loss 5.0074   LearningRate 0.0000   Epoch: 39   Global Step: 200400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:45:49,195-Speed 10593.75 samples/sec   Loss 5.1242   LearningRate 0.0000   Epoch: 39   Global Step: 200410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:45:50,171-Speed 10503.06 samples/sec   Loss 5.0493   LearningRate 0.0000   Epoch: 39   Global Step: 200420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:45:51,208-Speed 9881.81 samples/sec   Loss 5.1223   LearningRate 0.0000   Epoch: 39   Global Step: 200430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:45:52,227-Speed 10063.13 samples/sec   Loss 5.1116   LearningRate 0.0000   Epoch: 39   Global Step: 200440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:45:53,227-Speed 10251.12 samples/sec   Loss 5.0635   LearningRate 0.0000   Epoch: 39   Global Step: 200450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:45:54,254-Speed 9975.60 samples/sec   Loss 5.0205   LearningRate 0.0000   Epoch: 39   Global Step: 200460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:45:55,226-Speed 10555.06 samples/sec   Loss 4.9354   LearningRate 0.0000   Epoch: 39   Global Step: 200470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:45:56,147-Speed 11135.85 samples/sec   Loss 5.1963   LearningRate 0.0000   Epoch: 39   Global Step: 200480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:57,110-Speed 10637.38 samples/sec   Loss 5.2174   LearningRate 0.0000   Epoch: 39   Global Step: 200490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:58,075-Speed 10629.72 samples/sec   Loss 5.0317   LearningRate 0.0000   Epoch: 39   Global Step: 200500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:45:59,071-Speed 10293.62 samples/sec   Loss 5.1172   LearningRate 0.0000   Epoch: 39   Global Step: 200510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:00,044-Speed 10523.58 samples/sec   Loss 5.1939   LearningRate 0.0000   Epoch: 39   Global Step: 200520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:01,046-Speed 10231.37 samples/sec   Loss 5.1421   LearningRate 0.0000   Epoch: 39   Global Step: 200530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:02,033-Speed 10390.89 samples/sec   Loss 5.1304   LearningRate 0.0000   Epoch: 39   Global Step: 200540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:03,053-Speed 10050.94 samples/sec   Loss 5.0826   LearningRate 0.0000   Epoch: 39   Global Step: 200550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:04,054-Speed 10236.45 samples/sec   Loss 4.9817   LearningRate 0.0000   Epoch: 39   Global Step: 200560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:05,101-Speed 9808.75 samples/sec   Loss 5.0828   LearningRate 0.0000   Epoch: 39   Global Step: 200570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:06,049-Speed 10816.41 samples/sec   Loss 5.0194   LearningRate 0.0000   Epoch: 39   Global Step: 200580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:07,024-Speed 10512.09 samples/sec   Loss 5.2733   LearningRate 0.0000   Epoch: 39   Global Step: 200590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:07,979-Speed 10751.27 samples/sec   Loss 5.1817   LearningRate 0.0000   Epoch: 39   Global Step: 200600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:08,997-Speed 10071.48 samples/sec   Loss 5.2085   LearningRate 0.0000   Epoch: 39   Global Step: 200610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:09,980-Speed 10429.85 samples/sec   Loss 5.1527   LearningRate 0.0000   Epoch: 39   Global Step: 200620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:10,958-Speed 10473.10 samples/sec   Loss 5.0288   LearningRate 0.0000   Epoch: 39   Global Step: 200630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:11,981-Speed 10023.10 samples/sec   Loss 5.2074   LearningRate 0.0000   Epoch: 39   Global Step: 200640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:12,973-Speed 10335.64 samples/sec   Loss 5.0578   LearningRate 0.0000   Epoch: 39   Global Step: 200650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:13,978-Speed 10193.13 samples/sec   Loss 5.1623   LearningRate 0.0000   Epoch: 39   Global Step: 200660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:14,978-Speed 10246.01 samples/sec   Loss 5.1750   LearningRate 0.0000   Epoch: 39   Global Step: 200670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:15,982-Speed 10206.44 samples/sec   Loss 5.1154   LearningRate 0.0000   Epoch: 39   Global Step: 200680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:16,998-Speed 10088.71 samples/sec   Loss 4.9468   LearningRate 0.0000   Epoch: 39   Global Step: 200690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:17,968-Speed 10567.22 samples/sec   Loss 5.0416   LearningRate 0.0000   Epoch: 39   Global Step: 200700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:18,982-Speed 10113.74 samples/sec   Loss 5.2019   LearningRate 0.0000   Epoch: 39   Global Step: 200710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:20,002-Speed 10043.92 samples/sec   Loss 5.0740   LearningRate 0.0000   Epoch: 39   Global Step: 200720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:20,995-Speed 10318.88 samples/sec   Loss 5.1243   LearningRate 0.0000   Epoch: 39   Global Step: 200730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:22,080-Speed 9452.32 samples/sec   Loss 5.1492   LearningRate 0.0000   Epoch: 39   Global Step: 200740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:23,085-Speed 10195.53 samples/sec   Loss 5.1667   LearningRate 0.0000   Epoch: 39   Global Step: 200750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:24,053-Speed 10586.85 samples/sec   Loss 5.0726   LearningRate 0.0000   Epoch: 39   Global Step: 200760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:25,018-Speed 10625.24 samples/sec   Loss 5.1168   LearningRate 0.0000   Epoch: 39   Global Step: 200770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:26,040-Speed 10026.49 samples/sec   Loss 5.2208   LearningRate 0.0000   Epoch: 39   Global Step: 200780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:27,011-Speed 10558.07 samples/sec   Loss 5.0846   LearningRate 0.0000   Epoch: 39   Global Step: 200790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:28,148-Speed 9020.61 samples/sec   Loss 5.2242   LearningRate 0.0000   Epoch: 39   Global Step: 200800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:29,132-Speed 10414.30 samples/sec   Loss 5.0352   LearningRate 0.0000   Epoch: 39   Global Step: 200810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:30,090-Speed 10700.68 samples/sec   Loss 5.0467   LearningRate 0.0000   Epoch: 39   Global Step: 200820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:31,102-Speed 10128.38 samples/sec   Loss 4.9818   LearningRate 0.0000   Epoch: 39   Global Step: 200830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:32,119-Speed 10077.39 samples/sec   Loss 5.1163   LearningRate 0.0000   Epoch: 39   Global Step: 200840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:33,127-Speed 10166.51 samples/sec   Loss 5.1251   LearningRate 0.0000   Epoch: 39   Global Step: 200850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:34,113-Speed 10399.49 samples/sec   Loss 5.0846   LearningRate 0.0000   Epoch: 39   Global Step: 200860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:35,163-Speed 9751.04 samples/sec   Loss 5.0028   LearningRate 0.0000   Epoch: 39   Global Step: 200870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:36,159-Speed 10298.99 samples/sec   Loss 5.2511   LearningRate 0.0000   Epoch: 39   Global Step: 200880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:37,128-Speed 10570.26 samples/sec   Loss 4.9039   LearningRate 0.0000   Epoch: 39   Global Step: 200890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:38,135-Speed 10184.88 samples/sec   Loss 5.1280   LearningRate 0.0000   Epoch: 39   Global Step: 200900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:39,125-Speed 10346.42 samples/sec   Loss 5.1278   LearningRate 0.0000   Epoch: 39   Global Step: 200910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:40,099-Speed 10528.57 samples/sec   Loss 5.0287   LearningRate 0.0000   Epoch: 39   Global Step: 200920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:41,097-Speed 10266.18 samples/sec   Loss 4.9840   LearningRate 0.0000   Epoch: 39   Global Step: 200930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:42,127-Speed 9954.77 samples/sec   Loss 5.1034   LearningRate 0.0000   Epoch: 39   Global Step: 200940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:43,127-Speed 10249.52 samples/sec   Loss 5.0417   LearningRate 0.0000   Epoch: 39   Global Step: 200950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:44,117-Speed 10351.45 samples/sec   Loss 5.0251   LearningRate 0.0000   Epoch: 39   Global Step: 200960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:45,125-Speed 10167.81 samples/sec   Loss 5.0526   LearningRate 0.0000   Epoch: 39   Global Step: 200970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:46,115-Speed 10349.68 samples/sec   Loss 4.9942   LearningRate 0.0000   Epoch: 39   Global Step: 200980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:47,110-Speed 10307.86 samples/sec   Loss 5.1053   LearningRate 0.0000   Epoch: 39   Global Step: 200990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:48,151-Speed 9838.86 samples/sec   Loss 5.1071   LearningRate 0.0000   Epoch: 39   Global Step: 201000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:49,130-Speed 10471.31 samples/sec   Loss 5.1003   LearningRate 0.0000   Epoch: 39   Global Step: 201010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:50,112-Speed 10442.22 samples/sec   Loss 4.8743   LearningRate 0.0000   Epoch: 39   Global Step: 201020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:51,092-Speed 10451.96 samples/sec   Loss 5.2431   LearningRate 0.0000   Epoch: 39   Global Step: 201030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:46:52,096-Speed 10220.76 samples/sec   Loss 5.0249   LearningRate 0.0000   Epoch: 39   Global Step: 201040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:53,085-Speed 10365.21 samples/sec   Loss 5.1077   LearningRate 0.0000   Epoch: 39   Global Step: 201050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:54,067-Speed 10437.52 samples/sec   Loss 5.1351   LearningRate 0.0000   Epoch: 39   Global Step: 201060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:55,126-Speed 9683.34 samples/sec   Loss 5.1104   LearningRate 0.0000   Epoch: 39   Global Step: 201070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:56,124-Speed 10272.76 samples/sec   Loss 5.1611   LearningRate 0.0000   Epoch: 39   Global Step: 201080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:57,126-Speed 10220.61 samples/sec   Loss 5.0753   LearningRate 0.0000   Epoch: 39   Global Step: 201090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:58,138-Speed 10134.21 samples/sec   Loss 5.2207   LearningRate 0.0000   Epoch: 39   Global Step: 201100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:46:59,105-Speed 10610.73 samples/sec   Loss 5.0505   LearningRate 0.0000   Epoch: 39   Global Step: 201110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:00,095-Speed 10351.94 samples/sec   Loss 5.0351   LearningRate 0.0000   Epoch: 39   Global Step: 201120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:01,194-Speed 9322.32 samples/sec   Loss 5.0554   LearningRate 0.0000   Epoch: 39   Global Step: 201130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:02,145-Speed 10774.16 samples/sec   Loss 5.2077   LearningRate 0.0000   Epoch: 39   Global Step: 201140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:03,140-Speed 10310.44 samples/sec   Loss 5.0739   LearningRate 0.0000   Epoch: 39   Global Step: 201150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:04,120-Speed 10457.01 samples/sec   Loss 4.9610   LearningRate 0.0000   Epoch: 39   Global Step: 201160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:05,140-Speed 10044.51 samples/sec   Loss 4.9301   LearningRate 0.0000   Epoch: 39   Global Step: 201170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:06,097-Speed 10710.27 samples/sec   Loss 5.0391   LearningRate 0.0000   Epoch: 39   Global Step: 201180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:07,094-Speed 10284.72 samples/sec   Loss 5.0761   LearningRate 0.0000   Epoch: 39   Global Step: 201190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:08,056-Speed 10654.03 samples/sec   Loss 5.0946   LearningRate 0.0000   Epoch: 39   Global Step: 201200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:09,115-Speed 9674.48 samples/sec   Loss 5.0546   LearningRate 0.0000   Epoch: 39   Global Step: 201210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:10,142-Speed 9984.79 samples/sec   Loss 4.9053   LearningRate 0.0000   Epoch: 39   Global Step: 201220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:11,119-Speed 10494.66 samples/sec   Loss 5.0984   LearningRate 0.0000   Epoch: 39   Global Step: 201230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:12,112-Speed 10321.34 samples/sec   Loss 5.0963   LearningRate 0.0000   Epoch: 39   Global Step: 201240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:13,139-Speed 9977.05 samples/sec   Loss 5.1178   LearningRate 0.0000   Epoch: 39   Global Step: 201250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:14,150-Speed 10138.46 samples/sec   Loss 5.0775   LearningRate 0.0000   Epoch: 39   Global Step: 201260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:15,146-Speed 10297.51 samples/sec   Loss 5.1445   LearningRate 0.0000   Epoch: 39   Global Step: 201270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:16,104-Speed 10697.31 samples/sec   Loss 5.1216   LearningRate 0.0000   Epoch: 39   Global Step: 201280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:17,115-Speed 10137.55 samples/sec   Loss 5.1382   LearningRate 0.0000   Epoch: 39   Global Step: 201290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:18,079-Speed 10634.79 samples/sec   Loss 5.1382   LearningRate 0.0000   Epoch: 39   Global Step: 201300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:19,028-Speed 10790.45 samples/sec   Loss 5.1457   LearningRate 0.0000   Epoch: 39   Global Step: 201310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:20,095-Speed 9608.99 samples/sec   Loss 5.2440   LearningRate 0.0000   Epoch: 39   Global Step: 201320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:21,064-Speed 10579.01 samples/sec   Loss 5.0438   LearningRate 0.0000   Epoch: 39   Global Step: 201330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:22,048-Speed 10415.22 samples/sec   Loss 5.2002   LearningRate 0.0000   Epoch: 39   Global Step: 201340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:23,157-Speed 9236.99 samples/sec   Loss 5.1469   LearningRate 0.0000   Epoch: 39   Global Step: 201350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:24,188-Speed 9947.20 samples/sec   Loss 5.2388   LearningRate 0.0000   Epoch: 39   Global Step: 201360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:25,197-Speed 10156.26 samples/sec   Loss 5.2444   LearningRate 0.0000   Epoch: 39   Global Step: 201370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:26,145-Speed 10812.46 samples/sec   Loss 5.1293   LearningRate 0.0000   Epoch: 39   Global Step: 201380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:27,155-Speed 10148.60 samples/sec   Loss 5.1863   LearningRate 0.0000   Epoch: 39   Global Step: 201390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:28,162-Speed 10174.38 samples/sec   Loss 5.0730   LearningRate 0.0000   Epoch: 39   Global Step: 201400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:29,142-Speed 10464.83 samples/sec   Loss 5.1334   LearningRate 0.0000   Epoch: 39   Global Step: 201410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:30,158-Speed 10079.24 samples/sec   Loss 5.0875   LearningRate 0.0000   Epoch: 39   Global Step: 201420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:31,184-Speed 10009.14 samples/sec   Loss 5.0822   LearningRate 0.0000   Epoch: 39   Global Step: 201430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:32,162-Speed 10471.80 samples/sec   Loss 5.0540   LearningRate 0.0000   Epoch: 39   Global Step: 201440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:33,150-Speed 10373.43 samples/sec   Loss 5.0001   LearningRate 0.0000   Epoch: 39   Global Step: 201450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:34,141-Speed 10348.11 samples/sec   Loss 4.9390   LearningRate 0.0000   Epoch: 39   Global Step: 201460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:35,152-Speed 10139.19 samples/sec   Loss 5.0411   LearningRate 0.0000   Epoch: 39   Global Step: 201470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:36,119-Speed 10598.47 samples/sec   Loss 5.2085   LearningRate 0.0000   Epoch: 39   Global Step: 201480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:37,085-Speed 10615.40 samples/sec   Loss 5.1267   LearningRate 0.0000   Epoch: 39   Global Step: 201490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:38,083-Speed 10268.65 samples/sec   Loss 5.1160   LearningRate 0.0000   Epoch: 39   Global Step: 201500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:39,042-Speed 10682.13 samples/sec   Loss 5.0774   LearningRate 0.0000   Epoch: 39   Global Step: 201510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:40,100-Speed 9687.65 samples/sec   Loss 4.9669   LearningRate 0.0000   Epoch: 39   Global Step: 201520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:41,055-Speed 10740.72 samples/sec   Loss 5.1475   LearningRate 0.0000   Epoch: 39   Global Step: 201530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:42,041-Speed 10389.09 samples/sec   Loss 5.0591   LearningRate 0.0000   Epoch: 39   Global Step: 201540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:43,151-Speed 9236.76 samples/sec   Loss 5.1498   LearningRate 0.0000   Epoch: 39   Global Step: 201550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:44,162-Speed 10139.28 samples/sec   Loss 5.1391   LearningRate 0.0000   Epoch: 39   Global Step: 201560   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-11 06:47:45,130-Speed 10589.27 samples/sec   Loss 5.1222   LearningRate 0.0000   Epoch: 39   Global Step: 201570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:46,149-Speed 10050.95 samples/sec   Loss 5.0436   LearningRate 0.0000   Epoch: 39   Global Step: 201580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:47,199-Speed 9765.30 samples/sec   Loss 5.0478   LearningRate 0.0000   Epoch: 39   Global Step: 201590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:48,208-Speed 10155.64 samples/sec   Loss 5.0892   LearningRate 0.0000   Epoch: 39   Global Step: 201600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:47:49,169-Speed 10673.58 samples/sec   Loss 5.1192   LearningRate 0.0000   Epoch: 39   Global Step: 201610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:50,162-Speed 10322.66 samples/sec   Loss 5.1248   LearningRate 0.0000   Epoch: 39   Global Step: 201620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:51,263-Speed 9312.59 samples/sec   Loss 5.1411   LearningRate 0.0000   Epoch: 39   Global Step: 201630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:52,265-Speed 10223.19 samples/sec   Loss 5.0260   LearningRate 0.0000   Epoch: 39   Global Step: 201640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:53,226-Speed 10697.12 samples/sec   Loss 5.0134   LearningRate 0.0000   Epoch: 39   Global Step: 201650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:54,235-Speed 10152.32 samples/sec   Loss 5.1981   LearningRate 0.0000   Epoch: 39   Global Step: 201660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:55,267-Speed 9930.48 samples/sec   Loss 5.1709   LearningRate 0.0000   Epoch: 39   Global Step: 201670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:56,227-Speed 10685.27 samples/sec   Loss 5.0740   LearningRate 0.0000   Epoch: 39   Global Step: 201680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:57,235-Speed 10165.12 samples/sec   Loss 5.2662   LearningRate 0.0000   Epoch: 39   Global Step: 201690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:58,173-Speed 10918.42 samples/sec   Loss 5.0789   LearningRate 0.0000   Epoch: 39   Global Step: 201700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:47:59,167-Speed 10311.16 samples/sec   Loss 5.0499   LearningRate 0.0000   Epoch: 39   Global Step: 201710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:48:00,162-Speed 10309.58 samples/sec   Loss 5.0953   LearningRate 0.0000   Epoch: 39   Global Step: 201720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:48:01,200-Speed 9876.59 samples/sec   Loss 5.0949   LearningRate 0.0000   Epoch: 39   Global Step: 201730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:48:02,178-Speed 10480.90 samples/sec   Loss 5.1094   LearningRate 0.0000   Epoch: 39   Global Step: 201740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:48:03,221-Speed 9818.58 samples/sec   Loss 5.1714   LearningRate 0.0000   Epoch: 39   Global Step: 201750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:48:04,223-Speed 10242.03 samples/sec   Loss 5.2033   LearningRate 0.0000   Epoch: 39   Global Step: 201760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:48:05,219-Speed 10290.81 samples/sec   Loss 5.1271   LearningRate 0.0000   Epoch: 39   Global Step: 201770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:48:06,168-Speed 10801.23 samples/sec   Loss 5.1151   LearningRate 0.0000   Epoch: 39   Global Step: 201780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:48:07,137-Speed 10570.11 samples/sec   Loss 4.9204   LearningRate 0.0000   Epoch: 39   Global Step: 201790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:48:08,143-Speed 10188.40 samples/sec   Loss 4.9127   LearningRate 0.0000   Epoch: 39   Global Step: 201800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:48:09,131-Speed 10377.73 samples/sec   Loss 5.1823   LearningRate 0.0000   Epoch: 39   Global Step: 201810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:48:10,101-Speed 10565.03 samples/sec   Loss 5.0756   LearningRate 0.0000   Epoch: 39   Global Step: 201820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:48:11,120-Speed 10060.26 samples/sec   Loss 5.0348   LearningRate 0.0000   Epoch: 39   Global Step: 201830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:48:12,124-Speed 10219.25 samples/sec   Loss 5.1753   LearningRate 0.0000   Epoch: 39   Global Step: 201840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:48:13,124-Speed 10243.51 samples/sec   Loss 5.0072   LearningRate 0.0000   Epoch: 39   Global Step: 201850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:48:14,167-Speed 9833.99 samples/sec   Loss 5.0712   LearningRate 0.0000   Epoch: 39   Global Step: 201860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:48:15,199-Speed 9929.33 samples/sec   Loss 5.1201   LearningRate 0.0000   Epoch: 39   Global Step: 201870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:48:16,188-Speed 10363.64 samples/sec   Loss 5.0861   LearningRate 0.0000   Epoch: 39   Global Step: 201880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:48:17,179-Speed 10334.30 samples/sec   Loss 4.8984   LearningRate 0.0000   Epoch: 39   Global Step: 201890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:48:18,141-Speed 10656.17 samples/sec   Loss 5.0572   LearningRate 0.0000   Epoch: 39   Global Step: 201900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:48:19,119-Speed 10481.90 samples/sec   Loss 5.1041   LearningRate 0.0000   Epoch: 39   Global Step: 201910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:48:20,153-Speed 9916.67 samples/sec   Loss 5.0800   LearningRate 0.0000   Epoch: 39   Global Step: 201920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:48:21,153-Speed 10249.99 samples/sec   Loss 5.1851   LearningRate 0.0000   Epoch: 39   Global Step: 201930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:48:22,188-Speed 9904.02 samples/sec   Loss 5.0872   LearningRate 0.0000   Epoch: 39   Global Step: 201940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:48:23,175-Speed 10395.50 samples/sec   Loss 5.2009   LearningRate 0.0000   Epoch: 39   Global Step: 201950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:48:24,150-Speed 10520.41 samples/sec   Loss 5.2018   LearningRate 0.0000   Epoch: 39   Global Step: 201960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:48:25,175-Speed 9996.79 samples/sec   Loss 5.0524   LearningRate 0.0000   Epoch: 39   Global Step: 201970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:48:26,181-Speed 10189.52 samples/sec   Loss 5.2096   LearningRate 0.0000   Epoch: 39   Global Step: 201980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:48:27,153-Speed 10535.80 samples/sec   Loss 5.2084   LearningRate 0.0000   Epoch: 39   Global Step: 201990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:48:28,124-Speed 10567.36 samples/sec   Loss 5.0136   LearningRate 0.0000   Epoch: 39   Global Step: 202000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:48:53,501-[lfw][202000]XNorm: 7.970208
Training: 2022-04-11 06:48:53,501-[lfw][202000]Accuracy-Flip: 0.99683+-0.00345
Training: 2022-04-11 06:48:53,502-[lfw][202000]Accuracy-Highest: 0.99700
Training: 2022-04-11 06:49:19,180-[cfp_fp][202000]XNorm: 6.889223
Training: 2022-04-11 06:49:19,181-[cfp_fp][202000]Accuracy-Flip: 0.97143+-0.00924
Training: 2022-04-11 06:49:19,182-[cfp_fp][202000]Accuracy-Highest: 0.97371
Training: 2022-04-11 06:49:41,830-[agedb_30][202000]XNorm: 7.788529
Training: 2022-04-11 06:49:41,830-[agedb_30][202000]Accuracy-Flip: 0.97400+-0.00564
Training: 2022-04-11 06:49:41,831-[agedb_30][202000]Accuracy-Highest: 0.97400
Training: 2022-04-11 06:49:42,836-Speed 137.06 samples/sec   Loss 5.0912   LearningRate 0.0000   Epoch: 39   Global Step: 202010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:49:43,829-Speed 10314.38 samples/sec   Loss 5.0383   LearningRate 0.0000   Epoch: 39   Global Step: 202020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:49:44,847-Speed 10073.94 samples/sec   Loss 5.0216   LearningRate 0.0000   Epoch: 39   Global Step: 202030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:49:45,837-Speed 10363.91 samples/sec   Loss 5.0834   LearningRate 0.0000   Epoch: 39   Global Step: 202040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:49:46,819-Speed 10437.14 samples/sec   Loss 5.1877   LearningRate 0.0000   Epoch: 39   Global Step: 202050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:49:47,894-Speed 9531.29 samples/sec   Loss 5.1069   LearningRate 0.0000   Epoch: 39   Global Step: 202060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:49:48,910-Speed 10098.55 samples/sec   Loss 5.0856   LearningRate 0.0000   Epoch: 39   Global Step: 202070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:49:49,874-Speed 10629.63 samples/sec   Loss 5.2220   LearningRate 0.0000   Epoch: 39   Global Step: 202080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:49:50,870-Speed 10289.37 samples/sec   Loss 5.0123   LearningRate 0.0000   Epoch: 39   Global Step: 202090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:49:51,891-Speed 10053.21 samples/sec   Loss 5.0509   LearningRate 0.0000   Epoch: 39   Global Step: 202100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:49:52,860-Speed 10593.09 samples/sec   Loss 5.0282   LearningRate 0.0000   Epoch: 39   Global Step: 202110   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-11 06:49:53,831-Speed 10547.49 samples/sec   Loss 5.0953   LearningRate 0.0000   Epoch: 39   Global Step: 202120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:49:54,858-Speed 9992.86 samples/sec   Loss 5.1568   LearningRate 0.0000   Epoch: 39   Global Step: 202130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:49:55,826-Speed 10600.72 samples/sec   Loss 5.0588   LearningRate 0.0000   Epoch: 39   Global Step: 202140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:49:56,799-Speed 10523.60 samples/sec   Loss 4.9972   LearningRate 0.0000   Epoch: 39   Global Step: 202150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:49:57,771-Speed 10548.09 samples/sec   Loss 4.9262   LearningRate 0.0000   Epoch: 39   Global Step: 202160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:49:58,711-Speed 10905.71 samples/sec   Loss 5.0606   LearningRate 0.0000   Epoch: 39   Global Step: 202170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:49:59,725-Speed 10107.41 samples/sec   Loss 5.0171   LearningRate 0.0000   Epoch: 39   Global Step: 202180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:50:00,670-Speed 10850.03 samples/sec   Loss 5.1885   LearningRate 0.0000   Epoch: 39   Global Step: 202190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:50:01,623-Speed 10749.50 samples/sec   Loss 4.9937   LearningRate 0.0000   Epoch: 39   Global Step: 202200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:50:02,642-Speed 10056.27 samples/sec   Loss 5.1756   LearningRate 0.0000   Epoch: 39   Global Step: 202210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:50:03,629-Speed 10388.70 samples/sec   Loss 4.9630   LearningRate 0.0000   Epoch: 39   Global Step: 202220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:50:04,639-Speed 10154.74 samples/sec   Loss 5.1663   LearningRate 0.0000   Epoch: 39   Global Step: 202230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:50:05,617-Speed 10476.65 samples/sec   Loss 5.1672   LearningRate 0.0000   Epoch: 39   Global Step: 202240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:50:06,626-Speed 10166.26 samples/sec   Loss 5.1161   LearningRate 0.0000   Epoch: 39   Global Step: 202250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:50:07,586-Speed 10669.67 samples/sec   Loss 5.1166   LearningRate 0.0000   Epoch: 39   Global Step: 202260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:50:08,622-Speed 9893.68 samples/sec   Loss 5.0584   LearningRate 0.0000   Epoch: 39   Global Step: 202270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 06:50:09,562-Speed 10911.86 samples/sec   Loss 5.1028   LearningRate 0.0000   Epoch: 39   Global Step: 202280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:50:10,501-Speed 10913.72 samples/sec   Loss 5.1294   LearningRate 0.0000   Epoch: 39   Global Step: 202290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:50:11,472-Speed 10548.42 samples/sec   Loss 5.1206   LearningRate 0.0000   Epoch: 39   Global Step: 202300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:50:12,624-Speed 8902.59 samples/sec   Loss 5.0430   LearningRate 0.0000   Epoch: 39   Global Step: 202310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 06:50:13,530-Speed 11308.94 samples/sec   Loss 5.1661   LearningRate 0.0000   Epoch: 39   Global Step: 202320   Fp16 Grad Scale: 65536   Required: 0 hours